Software Engineer, Infrastructure at Aquarium Learning (S20)
We help ML teams improve their models by improving their datasets
Remote (US)
About Aquarium Learning

Aquarium helps deep learning teams improve their model performance by improving their datasets.

A model is only as good as the dataset it’s trained on. We help teams find problems in their datasets and models, then fix them by editing or adding data.

About the role

As the first Infrastructure Software Engineer on the Aquarium team, you will drive development of our core product’s underlying technical infrastructure and systems. These services include data streaming pipelines to ingest customer datasets, search indexes for live queries of unstructured data, and user-facing web applications and APIs. Our current tech stack is primarily Python on GCP, with Apache Beam for most batch and streaming data processing jobs.

You will also contribute to the internal infrastructure that all of engineering uses to develop and operate their services. As an expert on reliable, maintainable systems, you will set the direction for our development processes, including building the initial versions of core internal infrastructure and systems.

What you will do

  • Architect and build core infrastructure platforms to handle production scales of machine learning data.
  • Collaborate with the rest of engineering to establish and enable development best practices.
  • Ensure the uptime and reliability of Aquarium’s distributed systems, through both proactive improvements and by handling live production incidents.
  • Automate away manual, error-prone operational tasks.
  • Diagnose and optimize performance of our infrastructure, reducing costs and improving experience for our end users.

What you should have

  • 3+ years as an infrastructure-focused Software Engineer, Site Reliability Engineer, or similar experience.
  • Demonstrated skills with Python or a similar backend language, and at least one major cloud provider (AWS, GCP, Azure).
  • Experience with production deployments using Docker and Kubernetes (or similar orchestration frameworks).
  • Experience with Infrastructure as Code and modern configuration management.
  • Experience with data engineering, ETL pipelines, and related technologies.
  • Care and empathy for users. We only exist to make our customers successful.
  • Bachelor’s degree in Computer Science or a related field, or equivalent industry experience.

About Aquarium

Machine learning is eating the world. However, though it’s easier than ever to build a prototype of an ML system, it’s still extremely difficult to build, maintain, and improve ML systems in production to solve real-world problems. Aquarium helps teams ship better ML models faster to enable the next generation of revolutionary AI applications.

Aquarium is backed by top investors including Y Combinator and Sequoia Capital. Our customers span many industries, from robotics to agriculture to construction. We’re looking to grow our team with awesome people who’ll shape the future of Aquarium -- both as a product and as a company.


Aquarium’s technology relies on letting your trained ML model do the work of guiding what parts of your dataset to pay attention to.

For example, Aquarium finds examples where your model has the highest loss or disagreement with your labeled dataset, which tends to surface many labeling errors (i.e., the model is right and the label is wrong!).
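
As a rough illustration of the idea (a minimal sketch, not Aquarium's actual implementation), the snippet below ranks examples by per-example cross-entropy loss so that the ones where the model most strongly disagrees with the label come first. The function names `cross_entropy` and `rank_by_loss` and the toy data are hypothetical, chosen only for this example.

```python
import math

def cross_entropy(probs, label):
    """Loss for one example: negative log of the probability the model assigned to its label."""
    # Clamp to avoid log(0) when the model gives the label zero probability.
    return -math.log(max(probs[label], 1e-12))

def rank_by_loss(predictions, labels):
    """Return example indices sorted from highest loss to lowest."""
    losses = [cross_entropy(p, y) for p, y in zip(predictions, labels)]
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)

# Hypothetical toy data: class probabilities over 3 classes and the assigned labels.
predictions = [
    [0.90, 0.05, 0.05],
    [0.02, 0.96, 0.02],  # the model is confident this is class 1 ...
    [0.10, 0.10, 0.80],
]
labels = [0, 2, 2]       # ... but example 1 is labeled class 2

# Indices to review first; the confidently contradicted label rises to the top.
suspects = rank_by_loss(predictions, labels)
print(suspects)
```

A high loss does not prove the label is wrong (the model may simply be mistaken), so a ranking like this is best treated as a review queue for a human.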

Users can also provide their model's embeddings for each entry, which are an anonymized representation of what their model “thought” about the data. The neural network embeddings for a datapoint encode the input data into a relatively short vector of floats. We can then identify outliers and group together examples in a dataset by analyzing the distances between these embeddings. We also provide a nice thousand-foot-view visualization of embeddings that allows users to zoom into interesting parts of their dataset. We heavily use React, WebGL, Python, and Apache Beam in our day-to-day work.
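
One simple way to picture distance-based outlier detection on embeddings is to score each datapoint by its average distance to its nearest neighbors. The sketch below is an assumption made for illustration, not a description of how Aquarium does it; `knn_outlier_scores`, the choice of Euclidean distance, and the 2-D toy vectors are all hypothetical.

```python
import math

def knn_outlier_scores(embeddings, k=2):
    """Score each embedding by its mean Euclidean distance to its k nearest neighbors.

    Assumes at least two embeddings. Larger scores mean the point sits
    farther from the rest of the dataset.
    """
    scores = []
    for i, a in enumerate(embeddings):
        # Distances from point i to every other point, nearest first.
        dists = sorted(math.dist(a, b) for j, b in enumerate(embeddings) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores

# Hypothetical 2-D embeddings: a tight cluster plus one far-away point.
embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0],
]

scores = knn_outlier_scores(embeddings, k=2)
most_unusual = max(range(len(scores)), key=lambda i: scores[i])
print(most_unusual)
```

This brute-force version compares every pair of points, which is fine for a toy example; at production dataset sizes an approximate nearest-neighbor index would be the more likely choice.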

Think about this as a platform for interactive learning. By focusing on the most “important” areas of the dataset that the model is consistently getting wrong, we increase the leverage of ML teams to sift through massive datasets and decide on the proper corrective action to improve their model performance.

Our goal is to build tools to reduce or eliminate the need for ML engineers to hand-hold the process of improving model performance through data curation: basically, Andrej Karpathy’s Operation Vacation concept as a service.
