Product Engineer at Hamming AI (S24)
$140K - $200K  •  0.05% - 0.20%
Complete QA platform for voice agents
San Francisco, CA, US / London, England, GB / Austin, TX, US
Full-time
US citizen/visa only
3+ years
About Hamming AI

Hamming automates the QA of voice agents (both pre-deployment testing and post-deployment analytics).

Making voice AI agents reliable is hard. A small change in prompts, function call definitions, or model providers can cause large changes in LLM outputs.

We have a proven track record of helping enterprises win with AI. Sumanyu (CEO) previously helped Citizen (the safety app) grow its user base 4× and scaled an AI-powered sales program at Tesla to hundreds of millions of dollars in annual revenue.

Our eng team ranks #1 on https://workweave.ai/

About the role
Skills: Next.js, Node.js, React, TypeScript

Location: Remote (North America) or Austin, TX

Employment Type: Full-time (no contractors)

Department: Engineering

Hamming automates QA for voice AI agents. Everyone is building voice agents; we make them reliable. In fact, we invented this category. With one click, thousands of our agents call our customers’ agents across accents, background noise, and personalities, then we generate crisp bug reports and production-grade analytics. Reliability is the moat in voice AI, and that’s our whole job.

We are one of the fastest engineering teams in the world, deploying to production four times a day.

I’m looking for someone who can own reliability and scale across our LLM-enabled platform, shipping precise, outcome-driven improvements to high-availability systems.

Sumanyu (CEO)

Previously: grew Citizen 4× and scaled an AI sales program to $100Ms/yr at Tesla.

Links:

  • Devin case study
  • Ranked #1 engineering team
  • OpenAI Dev Day 100-billion-token list

What you’ll do

  • Own product features end-to-end: spec → prototype → ship → iterate, across frontend and backend.
  • Work closely with customers: onboard new accounts, run weekly check-ins, and act as a high-agency partner to drive adoption and outcomes.
  • Build core customer workflows for voice-agent QA: test creation, scenario management, evaluation results, analytics, debugging, and triage.
  • Turn messy, high-dimensional data (calls, transcripts, tool events, traces, eval outputs) into product experiences that are obvious and actionable.
  • Partner with customers to understand their reliability pain, then translate it into shipped product with measurable outcomes.
  • Tighten the product loop: instrumentation, funnels, and feedback so we know what’s working and what’s not.
  • Maintain high engineering velocity while keeping craftsmanship: clean APIs, strong abstractions, and excellent UI polish.

You might be a fit if you

  1. Have 3+ years building customer-facing products in a high-velocity environment (startup experience a plus).
  2. Are fluent in TypeScript and comfortable across the stack (React/Next.js + Node services).
  3. Ship quickly but with discipline: you write clear code, strong tests where it matters, and avoid accidental complexity.
  4. Have strong product instincts: you can simplify complex workflows into crisp UX and make good tradeoffs under ambiguity.
  5. Love talking to users, diagnosing friction, and iterating until a feature feels “done.”
  6. Care about reliability: you build with observability, failure modes, and data correctness in mind.
  7. Communicate clearly: written specs, crisp PRs, and decisions that scale across a fast-moving team.

Bonus

  • Experience building analytics-heavy products (dashboards, event pipelines, debugging tools).
  • Familiarity with LLM apps, evals, tool calling, or prompt/guardrail systems.
  • Experience with real-time systems, telecom/voice, or high-concurrency workflows.
  • Strong UI craft: interaction design, information architecture, and performance tuning.

Interesting problems you’ll touch

  • Debugging workflows for voice agents: call timelines, transcripts, tool calls, traces, and “what changed?” diffs.
  • Test authoring that scales: scenario libraries, parameterization, coverage, and regression packs.
  • Evaluation UX: turning model-graded / heuristic / human feedback into trustworthy signals and action items.
  • Analytics that matter: reliability metrics customers can run their business on.
  • Enterprise readiness in-product: RBAC, audit trails, data retention, and environment/region controls.

Our stack

  • App: Next.js, TypeScript, Tailwind
  • AI: OpenAI, Anthropic, STT/TTS providers
  • Realtime/Orchestration: LiveKit, Pipecat/Daily, Temporal
  • Infra/DB: AWS, k8s, PostgreSQL, Redis, Terraform
  • Observability: OpenTelemetry, SigNoz

Apply

If you want to build the product layer for reliable voice AI, let’s talk.

Technology

TECH STACK

  • Essentials: Next.js, TypeScript, Python, Tailwind
  • AI: OpenAI, Anthropic, STT, TTS, etc.
  • Infrastructure: PostgreSQL, k8s

INTERESTING TECHNICAL CHALLENGES

  • Create voice simulations that model the real world (background noise, accents, etc.)
  • Generalize DSPy-style prompt optimization to voice
  • Support 10K parallel calls with 99.99% reliability

Other jobs at Hamming AI

  • Full stack  •  Full-time  •  San Francisco, CA, US / London, England, GB / Austin, TX, US  •  $140K - $200K  •  0.05% - 0.20%  •  3+ years

  • Backend  •  Full-time  •  San Francisco, CA, US / London, England, GB / Austin, TX, US  •  $140K - $200K  •  0.05% - 0.20%  •  3+ years
