Founding Research Engineer at AgentHub (S25)
$160K - $350K  •  0.50% - 2.01%
The simulation and evaluation engine for AI agents
San Francisco, CA, US
Full-time
Will sponsor
Any (new grads ok)
About AgentHub

At AgentHub, we’re building the simulation and evaluation engine for AI agents. As agents become more powerful, complex, and widely deployed, the need to efficiently and thoroughly evaluate them grows significantly. Our platform enables teams to measure safety, reliability, and quality by testing agents in realistic environments, surfacing insights, and driving improvements.

We’re an early seed-stage startup founded by the former tech lead of Apple’s Foundation Model Evaluation team together with CMU and MIT graduates, and backed by Y Combinator, leading VCs, and angel investors.

Joining AgentHub means working directly with the founders on high-impact problems, shaping the culture, and building the critical infrastructure layer that will make complex, agent-driven products possible. If you want to work at the frontier of AI, where research meets real-world systems, we'd love to hear from you.

About the role
Skills: Prompt Engineering, Python, TypeScript, Reinforcement learning (RL), Amazon Web Services (AWS)

The Role:

You will work directly with the founders to advance the product roadmap and AgentHub’s core evaluation and simulation capabilities. You’ll have significant ownership and will stay current with state-of-the-art methods in areas like agent evaluation, data generation, and RL, translating them into real features in the hands of real users.

What you will do:

  • Design and build the core methodologies and components for evaluating agents across axes like instruction following, safety, groundedness, and efficiency.
  • Research, experiment with, and implement robust data generation capabilities.
  • Integrate the latest research advances and productionize them to unlock value for our customers.

Signs you might thrive in this role:

  • Bachelor’s, Master’s, or PhD in Computer Science or a related field.
  • Passionate about building category-defining products.
  • Background and experience in reinforcement learning.
  • Demonstrated experience in model/agent evaluation is a major plus.
  • Opinionated and perpetually curious, with a love of owning a problem end-to-end and delivering.

Technology

We’re building the simulation and evaluation engine for AI agents. AgentHub is defining the core infrastructure layer that makes agents reliable enough to run industry-specific tasks in production at scale.

If you’re excited about building a category-defining product that pushes the boundaries of what it means to evaluate and ship agents operating in complex environments, we’d love to have you on board.

We work at the intersection of AI research, large-scale infrastructure, and highly streamlined user experiences. The problems we tackle include:

  • Sandboxing agents and running thousands of concurrent simulations at scale.
  • Designing evaluation pipelines that capture reasoning, safety, reliability, and edge-case behavior.
  • Building scalable systems for ingesting, storing, and analyzing structured and unstructured outputs from agents.
  • Rethinking the user experience so teams can quickly onboard arbitrary agents and take advantage of robust evaluation infrastructure.
  • Exposing tools and interfaces that make evaluation results understandable and actionable.

The work is both research-heavy and systems-heavy. You’ll have the opportunity to wear multiple hats, exercise a variety of skills, own features end-to-end, and meet with customers. You might build a new evaluation method, scale an infra service, and sit with a customer to design a custom environment, all within the same week.

Other jobs at AgentHub

Intern  •  San Francisco, CA, US  •  Full stack  •  $7K - $10K / month  •  Any

Full-time  •  San Francisco, CA, US  •  Full stack  •  $160K - $350K  •  0.50% - 2.00%  •  Any (new grads ok)

Intern  •  San Francisco, CA, US  •  Machine learning  •  $8K - $11K / month  •  Any

Full-time  •  San Francisco, CA, US  •  Machine learning  •  $160K - $350K  •  0.50% - 2.01%  •  Any (new grads ok)
