Senior Software Engineer - AI Agents at Datafold (S20)
$175K - $245K  •  0.10% - 0.50%
Automating manual and repetitive data engineering tasks with AI
Remote (US)
Full-time
US citizen/visa only
6+ years
About Datafold

At Datafold, we build tools for data practitioners to automate the most error-prone and time-consuming parts of the data engineering workflow: testing data to guarantee its quality. While data quality (just like software quality) is a complex and multifaceted problem, we draw from decades of our team’s combined experience in the data domain to build opinionated tools our users love. Specifically, we believe that:

Data quality is a byproduct of a great data engineering workflow. That means that, rather than building yet another app for data practitioners to switch to and from, we insert our tools into existing workflows: for example, into CI/CD for deployment testing and into IDEs for testing during development.

Data quality issues should be addressed before deploying the code. Most data quality issues are bugs in the code that processes data, and applying a proactive, shift-left approach is the most effective way to achieve high shipping velocity and data quality simultaneously.

Lack of metadata (data about data) is the biggest gap in the data engineering workflow. We bring powerful tools such as data diffing and column-level lineage to every data engineer’s workflow to help them validate the code and underlying data and fully understand the dependencies in complex data pipelines.

Datafold is used by data teams at Patreon, Thumbtack, Substack, and AngelList, among others, and has raised $22M from YC, NEA & Amplify Partners.

About the role
Skills: Python

About Datafold

Datafold automates manual workflows in data engineering, helping companies unlock the full value of their data and enabling data teams to ship faster without compromising data quality.

Backed by top-tier investors including YC, Amplify, and NEA, we're redefining how companies like Disney, FanDuel, and Perplexity do data engineering. Headquartered in New York, we’re a fully remote team with employees from around the world, spanning three continents.

About the Role

We’re looking for an experienced backend engineer to help build and scale the Datafold Migration Agent (DMA) — an AI-powered product that revolutionizes data platform code migrations. DMA automates the static analysis, translation, and refactoring of analytical codebases at a million-line-of-code scale, reducing migration timelines by 5-10x and eliminating the need for manual work and costly consultants.

Responsibilities

  • Drive the development of DMA, shaping both the technical and architectural foundation for a product that combines AI and data engineering to solve complex migration challenges.
  • Collaborate with our Solution Engineers and directly with customers to refine features and iterate on the product.
  • Make strategic technical decisions to ensure DMA is scalable, robust, and optimized for high-performance code migrations.
  • Take ownership of projects end-to-end, troubleshooting and resolving complex challenges in real time to deliver high-quality, impactful results.

About You

  • Experience: 5+ years as a software engineer with a strong backend focus.
  • Tech Stack: Proficiency in Python is required. Experience building with large language models (LLMs) is a plus, but not required.
  • Ownership: Proven ability to manage projects end-to-end, from design through deployment.
  • Startup Mindset: Thoughtful about balancing speed, quality, and business impact.

If building a high-impact, innovative product at the intersection of AI and data engineering sounds exciting, we’d love to hear from you.

Technology

Datafold is a distributed web application with a rich interactive UI. Our backend is written primarily in Python using FastAPI, SQLAlchemy, and Celery. We rely on Rust for specialized and performance-critical components. Data is stored in PostgreSQL, Neo4j (metadata graph), and ClickHouse (logs and time series). Our frontend is written in TypeScript using React, Redux, and GraphQL. We also use WebAssembly for performance-critical components.
