Open Source Software Engineering Intern at Eventual (W22)
$80K - $100K
Data Warehouse for Computer Vision
San Francisco, CA, US
About Eventual

Eventual: The Data Warehouse for Computer Vision

Eventual is building an integrated development experience for data scientists and engineers to query, process and build applications on Complex Data (non-tabular data such as images, video, audio and 3D scans).


Daft is our open-source Python dataframe API for working with Complex Data. With Daft, users can query and transform their data interactively in a notebook environment, running workloads such as analytics, data preprocessing and machine learning model training/inference. The same transformations that are performed on the dataframe can then be deployed as an HTTP service to respond to incoming requests, helping our users go from experimentation to productionization faster than ever before.

Eventual Cloud Platform

The Eventual Cloud Platform provides an integrated development environment for our users to go from local development to production. We provide:

  1. Notebooks for interactive data science with Daft
  2. Fully-managed cluster computing infrastructure to run large distributed Daft workloads
  3. Application deployment as services or automated jobs

About Us

Eventual (YC W22) is funded by investors such as Caffeinated Capital, and top angels in the valley from Databricks, Meta and Lyft. Our team has deep expertise in high performance computing, big data technologies, cloud infrastructure and machine learning.

About the role
Skills: Git, Python, Rust, Distributed Systems

About Eventual:

At Eventual (YC W22), we are building Daft, the open-source framework for processing complex data.

Data processing systems today are highly optimized for simple tabular data. However, much of the useful data in the world is in a more complex form, such as media (e.g. images, video, audio), scientific formats (e.g. genomes), and ML artifacts (e.g. embeddings). There are many challenges today that make processing complex data much more difficult than working with simple tabular data. Our goal is to make working with complex data as easy as working with simple data, and become the de-facto solution for building applications on top of complex data.

You would be joining a small, fast-moving team of engineers with deep expertise across related domains: big data, distributed systems, machine learning, self-driving, genomics, and high performance computing.

About the role:

As a software engineering intern, your primary responsibility would be contributing to the development of the open-source Daft distributed dataframe. Development moves quickly here, but here are some projects you might expect to work on:

  1. Implementing algorithms on complex data types, such as embedding similarity or image kernels
  2. Improving performance of distributed join operations
  3. Building integrations for high-performance data loading
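As an illustrative sketch of the first project area, embedding similarity typically reduces to a vectorized cosine-similarity computation over batches of embeddings. This is a hypothetical, self-contained example using NumPy, not Daft's actual implementation or API:

```python
import numpy as np

def cosine_similarity(queries: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between (m, d) query embeddings
    and (n, d) corpus embeddings; returns an (m, n) matrix."""
    # Normalize each row to unit length, then take dot products.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return q @ c.T

# Toy 2-dimensional embeddings for demonstration.
queries = np.array([[1.0, 0.0], [0.0, 1.0]])
corpus = np.array([[1.0, 0.0], [1.0, 1.0]])
sims = cosine_similarity(queries, corpus)
# sims[0, 0] is 1.0 (identical direction); sims[1, 0] is 0.0 (orthogonal)
```

In a dataframe engine, the interesting work is running this kind of kernel efficiently over columnar batches distributed across a cluster, rather than the arithmetic itself.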

We are a young startup, so be prepared to wear many hats: tinkering with infrastructure, talking to users, and participating in design processes with the team!

About you:

You should be comfortable working on a novel distributed data processing system. Some things that will be important are:

  1. Industry or research experience working with distributed systems, especially data-intensive systems such as Spark, Dask, or Ray.
  2. Experience with Arrow-based frameworks.
  3. Familiarity with Python or Rust.
  4. A strong sense of ownership and autonomy; a desire to build good systems for users.

Big nice-to-haves are:

  1. Experience working on production machine learning systems.
  2. Experience with compilers or query optimization.

Office and benefits:

Our office setup is hybrid remote, with three in-person days a week. We are in our San Francisco office Wednesday to Friday.



Daft is a Python dataframe library that can run on user-provided resources (e.g. a user’s laptop or their team’s compute cluster) and also on the Eventual Cloud Platform.

  1. Query planning and optimization: optimizing the query graph and data blocks for efficient execution of workloads on our various compute engine backends.
  2. Data lake management: optimal storage, indexing and retrieval of data using open formats such as Apache Parquet and Apache Iceberg.
  3. Building and serving applications: packaging our users’ dataframe code and deploying their algorithms as live services.
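One classic query-plan optimization is predicate pushdown: moving a filter as close to the data scan as possible so the storage layer can skip non-matching data blocks entirely. The following is a toy sketch of that idea, with hypothetical plan-node names, not Eventual's actual planner:

```python
from dataclasses import dataclass
from typing import Optional

# Toy logical plan nodes; names are illustrative, not Daft's internals.
@dataclass
class Scan:
    table: str
    predicate: Optional[str] = None  # a filter pushed into the scan

@dataclass
class Filter:
    predicate: str
    child: object

@dataclass
class Project:
    columns: list
    child: object

def push_down_filter(plan):
    """Rewrite Filter(Scan(...)) into Scan(..., predicate=...),
    recursing through projections."""
    if isinstance(plan, Filter) and isinstance(plan.child, Scan):
        return Scan(table=plan.child.table, predicate=plan.predicate)
    if isinstance(plan, Project):
        return Project(columns=plan.columns, child=push_down_filter(plan.child))
    return plan

plan = Project(columns=["url"],
               child=Filter(predicate="size > 1024", child=Scan("images")))
optimized = push_down_filter(plan)
# The filter now lives inside the scan node, where the storage layer
# can use it to prune Parquet row groups before reading them.
```

Real planners apply many such rewrite rules (pushdown, join reordering, column pruning) over a richer plan representation, but the rule-as-tree-rewrite shape is the same.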

Eventual Cloud Platform

The Eventual Cloud Platform is a multi-tenant system that runs on top of Kubernetes.

  1. Web Frontend: a web application (Typescript/React/FastAPI/Jupyter) that provides a frontend to all of the platform’s functionality, such as notebooks, application serving and cluster management.
  2. Cluster computing: managing and operating Ray clusters to power large distributed Daft queries and data processing.
  3. Multi-tenancy: ensuring tenant isolation for security while leveraging multi-tenancy for economies of scale.

Interview Process

15-minute phone screen

A short phone screen over video call with one of our cofounders (either Sammy or Jay) for us to get acquainted, understand your aspirations and evaluate if there is a good fit in terms of the type of role you are looking for.

Technical Interviews

Our technical interviews for this role are focused on understanding your technical knowledge of distributed data processing.

60-minute data engineering design interview

A technical interview to understand your familiarity with the internals of a distributed data engineering system.

60-minute systems programming interview

A technical interview to understand your familiarity with systems programming and Linux.

Get to know us

As many chats as necessary to get to know us - come have a coffee with our cofounders and existing employees to understand who we are and our goals, motivations and ambitions.

We look forward to meeting you!
