Eventual is building an integrated development experience for data scientists and engineers to query, process and build applications on Complex Data (non-tabular data such as images, video, audio and 3D scans).
Daft (https://www.getdaft.io) is our open-source Python dataframe API for working with Complex Data. With Daft, users can query and transform their data interactively in a notebook environment, running workloads such as analytics, data preprocessing and machine learning model training/inference. The same transformations that are performed on the dataframe can then be deployed as an HTTP service to respond to incoming requests, helping our users go from experimentation to productionization faster than ever before.
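A minimal sketch of the kind of interactive workflow this enables, using the public daft Python API (the file path and column names below are made up, and exact function names may differ between daft versions):

    import daft
    from daft import col

    # Load a table of image metadata from Parquet (hypothetical path and schema)
    df = daft.read_parquet("s3://my-bucket/image-metadata.parquet")

    # Filter and transform the dataframe interactively in a notebook
    df = df.where(col("width") > 512)
    df = df.with_column("aspect_ratio", col("width") / col("height"))

    # Materialize and inspect the results
    df.show()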
The Eventual Cloud Platform provides an integrated development environment for our users to go from local development to production. We provide:
- Notebooks for interactive data science with Daft
- Fully-managed cluster computing infrastructure to run large distributed Daft workloads
- Application deployment as services or automated jobs
Eventual (YC W22) is funded by investors such as Caffeinated Capital, Array.vc and top angels in the valley from Databricks, Meta and Lyft. Our team has deep expertise in high performance computing, big data technologies, cloud infrastructure and machine learning.
At Eventual (YC W22), we are building Daft, the open-source framework for processing complex data.
Data processing systems today are highly optimized for simple tabular data. However, much of the useful data in the world is in a more complex form, such as media (e.g. images, video, audio), scientific formats (e.g. genomes), and ML artifacts (e.g. embeddings). There are many challenges today that make processing complex data much more difficult than working with simple tabular data. Our goal is to make working with complex data as easy as working with simple data, and become the de-facto solution for building applications on top of complex data.
You would be joining a small, fast-moving team of engineers with deep expertise across related domains: big data, distributed systems, machine learning, self-driving, genomics, and high performance computing.
As a software engineering intern, your primary responsibility will be contributing to the development of the open-source Daft distributed dataframe. Development moves quickly, but here are some projects you might expect to work on:
We are a young startup, so do be prepared to wear many hats - tinkering with infrastructure, talking to users, and participating in design processes with the team!
You will want to be comfortable working on a novel distributed data processing system. Some things that will be important are:
Big nice-to-haves are:
Our office setup is hybrid remote, with three in-person days a week. We are in our San Francisco office Wednesday to Friday.
Daft (https://www.getdaft.io) is a Python dataframe library that can run on user-provided resources (e.g. a user’s laptop or their team’s compute cluster) and also on the Eventual Cloud Platform.
- Query planning and optimizations: optimizing the query graph and data blocks for efficient execution of workloads on our various compute engine backends.
- Data lake management: optimal storage, indexing and retrieval of data using open formats such as Apache Parquet and Apache Iceberg.
- Building and serving applications: packaging our users' dataframe code and deploying their algorithms as live services.
The Eventual Cloud Platform is a multi-tenant system that runs on top of Kubernetes.
- Web frontend: a web application (TypeScript/React/FastAPI/Jupyter) that provides a frontend to all of the platform's functionality, such as notebooks, application serving and cluster management.
- Cluster computing: managing and operating Ray clusters to power large distributed Daft queries and data processing (see the sketch after this list).
- Multi-tenancy: ensuring tenant isolation for security while leveraging multi-tenancy for economies of scale.
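A rough sketch of what the cluster-computing piece looks like from the user's side, assuming daft's Ray runner (the cluster address and dataset path below are hypothetical; on the Eventual platform the cluster is provisioned and managed for you):

    import daft
    from daft import col

    # Point daft at a Ray cluster instead of the default local runner
    # (hypothetical address; a managed platform would supply this for you)
    daft.context.set_runner_ray(address="ray://head-node:10001")

    # The same dataframe code now executes distributed across the cluster
    df = daft.read_parquet("s3://my-bucket/large-dataset/*.parquet")
    df = df.where(col("score") > 0.5)
    df.collect()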
A short phone screen over video call with one of our cofounders (either Sammy or Jay) for us to get acquainted, understand your aspirations and evaluate if there is a good fit in terms of the type of role you are looking for.
Our technical interviews for this role focus on your knowledge of distributed data processing.
A technical interview to understand your familiarity with the internals of a distributed data engineering system.
A technical interview to understand your familiarity with systems programming and Linux.
As many chats as necessary to get to know us - come have a coffee with our cofounders and existing employees to understand who we are and our goals, motivations and ambitions.
We look forward to meeting you!
Intern | San Francisco, CA, US | Full Stack | $80K - $100K | Junior and above
Full-time | San Francisco, CA | Backend | $120K - $200K | 0.50% - 2.00% equity | 3+ years