Software Engineer, Backend at Aquarium Learning (S20)
We help ML teams improve their models by improving their datasets
San Francisco, California / Remote
Full-time
3+ years
About Aquarium Learning

Aquarium helps deep learning teams improve their model performance by improving their datasets.

A model is only as good as the dataset it’s trained on. We help teams find problems in their datasets and models, then fix them by editing or adding data.

About the role

Skills: Python, Distributed Systems, Data Warehousing, ETL

Aquarium is making it easier for teams to build and improve their ML models.

As a backend software engineer, you will drive development of our backend services, data processing pipelines, and customer-facing APIs. Our current backend tech stack is primarily Python on GCP, with Apache Beam for most batch data processing jobs. You’ll own taking our existing backend and data pipelines to a place where they can support hundreds or even thousands of customer organizations. You’ll also drive development of the customer developer experience, such as customer APIs and our Python client library.

What you will do:

  • Drive development of our backend services and data pipelines, improving both robustness and functionality. Improve our existing technical foundations, and influence our technical direction and strategy.
  • Work with our developer customers to offer a great developer experience.
  • Generally be a great person, and help set the tone for future hires!

What you should have:

  • 2+ years of professional development experience.
  • Demonstrated skills with Python or a similar backend language, and at least one major cloud provider (AWS, GCP, Azure). Bonus points for prior experience with data pipelines or B2B SaaS infrastructure.
  • Experience with data engineering, ETL pipelines, and related technologies.
  • The ability to work in an unstructured, self-directed environment.
  • A love for building, especially novel product experiences.
  • Care and empathy for users. We only exist to make our customers successful.
  • Bachelor’s degree in Computer Science or a related field, or equivalent industry experience.

Technology

Aquarium’s technology lets your trained ML model do the work of guiding which parts of your dataset deserve attention.

For example, Aquarium finds examples where your model has the highest loss or disagreement with your labeled dataset, which tends to surface many labeling errors (i.e., the model is right and the label is wrong!).
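As a rough illustration of the loss-based idea above, here is a minimal Python sketch that ranks examples by per-example cross-entropy loss so the most suspicious labels surface first. This is not Aquarium’s actual implementation; the function names (`per_example_loss`, `rank_by_loss`), the choice of cross-entropy, and the toy data are all assumptions made for the example.

```python
import math

def per_example_loss(probs, label):
    """Cross-entropy loss for one example: -log(probability given to the labeled class)."""
    eps = 1e-12  # guard against log(0)
    return -math.log(max(probs[label], eps))

def rank_by_loss(predictions, labels, top_k=3):
    """Return indices of the top_k highest-loss examples, highest loss first."""
    losses = [per_example_loss(p, y) for p, y in zip(predictions, labels)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:top_k]

# Hypothetical toy data: model class probabilities and the dataset's labels.
predictions = [
    [0.90, 0.10],
    [0.05, 0.95],  # model strongly disagrees with the label below
    [0.80, 0.20],
    [0.40, 0.60],
]
labels = [0, 0, 0, 1]

# Indices of the examples most worth a human look.
suspects = rank_by_loss(predictions, labels, top_k=2)
print(suspects)
```

The example at index 1 ranks first because the model assigns only 5% probability to its labeled class. A person would still need to check whether the label or the model is the one that is wrong.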

Users can also provide their model’s embeddings for each entry, which are an anonymized representation of what their model “thought” about the data. The neural network embeddings for a datapoint encode the input data into a relatively short vector of floats. We can then identify outliers and group together examples in a dataset by analyzing the distances between these embeddings. We also provide a nice thousand-foot view visualization of embeddings that allows users to zoom into interesting parts of their dataset (https://youtu.be/DHABgXXe-Fs?t=139). We heavily use React, WebGL, Python, and Apache Beam in our day-to-day work.
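The distance-based outlier idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it scores each datapoint by its mean Euclidean distance to its k nearest neighbors, which is one common heuristic and not necessarily the method Aquarium uses. The function name `knn_outlier_scores` and the toy embeddings are hypothetical.

```python
import math

def knn_outlier_scores(embeddings, k=2):
    """Score each embedding by its mean Euclidean distance to its k nearest neighbors.

    Larger scores mean the point sits far from the rest of the dataset.
    Brute-force O(n^2); fine for a sketch, not for large datasets.
    """
    scores = []
    for i, e in enumerate(embeddings):
        dists = sorted(
            math.dist(e, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        nearest = dists[:k]
        # A dataset with a single point has no neighbors; score it 0.0.
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores

# Hypothetical toy embeddings: three points clustered together, one far away.
embeddings = [
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.1],
    [5.0, 5.0],
]

scores = knn_outlier_scores(embeddings, k=2)
outlier_index = max(range(len(scores)), key=lambda i: scores[i])
print(outlier_index)
```

Real embeddings typically have many more dimensions and far more datapoints, so a production system would likely use an approximate nearest-neighbor index instead of the brute-force loop shown here.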

Think of this as a platform for interactive learning. By focusing on the most “important” areas of the dataset, the ones the model consistently gets wrong, we increase the leverage of ML teams to sift through massive datasets and decide on the proper corrective action to improve their model performance.

Our goal is to build tools that reduce or eliminate the need for ML engineers to handhold the process of improving model performance through data curation. In short, this is Andrej Karpathy’s Operation Vacation concept (https://youtu.be/g2R2T631x7k?t=820) as a service.
