Staff AI Systems Engineer at Flock Safety (S17)
$200K - $240K
The first public safety operating system that eliminates crime.
US / Remote (US)
Full-time
Will sponsor
6+ years
About Flock Safety

Flock Safety provides the first public safety operating system that empowers private communities and law enforcement to work together to eliminate crime. We are committed to protecting human privacy and mitigating bias in policing by developing best-in-class technology rooted in ethical design, uniting civilians and public servants in pursuit of a safer, more equitable society.

Our Safety-as-a-Service approach includes affordable devices powered by LTE and solar that can be installed anywhere. Our technology detects and captures objective details, decodes evidence in real time, and delivers investigative leads into the hands of those who matter.

While safety is a serious business, we are a supportive team that is optimizing the remote experience to build strong, fun relationships even when we are physically apart. Our flock of hard-working employees thrives in a positive, inclusive environment where a bias toward action is rewarded. Flock Safety is headquartered in Atlanta and operates nationwide. We have raised $150M in our Series E led by Tiger Global at a $3.5B valuation.

About the role
Skills: Python

The Opportunity

We’re hiring a Sr. AI Systems Engineer to help support our emerging product, Night Shift, an AI research assistant that amplifies the impact of investigators by automating the tedious, repetitive steps involved in working a case. This role sits within the Machine Learning team and will work closely with partners in Engineering (Backend, Frontend, and Design) in a fast-paced environment. You will be one of the earliest technical contributors to our system architecture for agentic AI, and will own our AI evaluation framework. The outcome we’re after is clear and ambitious: measurably faster, more accurate leads for every officer and every shift.

The Skillset

Familiarity with Agentic Systems: Hands-on experience with LLM agents including:

  • LLM API use (e.g. LangChain/LangGraph, vLLM, OpenAI/Gemini/Anthropic APIs)
  • Agent design: tool use (e.g. via MCP), retrieval, memory, grounding/attribution for claims, and guardrails
  • Architectural patterns: planning and hand-off for multi-agent systems, and context management
  • RAG: vector/hybrid search and reranking (e.g. pgvector, turbopuffer, rerankers)

ML Platform expertise: 5+ years building and shipping ML systems to production; experience in the following areas:

  • Backend Python and JS familiarity required; TypeScript/Golang familiarity welcome
  • Web services (e.g. Express/FastAPI, REST, SSE, JWTs)
  • Cloud infrastructure (e.g. AWS, Terraform, VPC, networking)
  • Backend databases/stores (e.g. Postgres, Redis)
  • Observability (e.g. Prometheus, Grafana, OpenTelemetry, LangSmith/Langfuse)
  • [Preferred] Durable execution (e.g. Temporal, Hatchet)
  • [Preferred] OLAP (e.g. ClickHouse, BigQuery)
  • [Preferred] ML Inference (e.g. PyTorch, TensorRT, NVIDIA Triton), ideally in multimodal domains (text/image/video)
  • [Preferred] Compute orchestration (e.g. Kubernetes, Prefect, Ray)

Experience with LLM Evaluations at scale: You’ve built offline/online eval harnesses and are familiar with the methodologies and metrics to measure:

  • Search, retrieval, and recommendation performance
  • Safety & robustness (security, compliance, red-teaming, regression testing)
  • Cost, performance and latency trade-offs
  • [Preferred] Agentic task success and trajectory quality (e.g. LLM-as-judge), and fine-tuning/preference learning (e.g. SFT, DPO, RLHF)

Feeling uneasy that you haven’t ticked every box? That’s okay; we’ve felt that way too. Studies have shown women and minorities are less likely to apply unless they meet all qualifications. We encourage you to break the status quo and apply to roles that would make you excited to come to work every day.

90 Days at Flock

We are a results-oriented culture and believe job descriptions are a thing of the past. We prescribe 90-day plans and believe that good days lead to good weeks, which lead to good months. This serves as a preview of the 90-day plan you would receive if hired for this role at Flock Safety.

The First 30 Days

  • Immerse yourself in the current system design and agent/tooling landscape. Understand the core customer use cases and data flows.
  • Support the team by shipping a few quick wins (e.g. refining tool APIs, prompt engineering, fixing bugs).
  • Stand up the foundational eval and observability scaffolding (datasets, metrics, KPIs, reporting).
  • Propose a technical architecture and implementation plan for an agent evaluation framework.

The First 60 Days

  • Deliver the MVP evaluation harness to produce initial metrics, enable debugging and perform regression testing.
  • Ship a system feature that demonstrates measurable improvement on your MVP evaluation suite.

90 Days & Beyond

  • Productionize the evaluation and observability platform and make it the source of truth for quality and safety (e.g. online/offline tracing, alerting, dashboards, evaluations, and a PR-gated regression suite).
  • Own the roadmap for evolving the agent evaluation platform.
  • Lead deeper R&D threads (e.g., lightweight fine-tuned projection layers, specialized embeddings, multimodal understanding) that can improve system performance on core metrics.

If you’re excited to build AI that tangibly amplifies real-world public safety outcomes, and you love making complex systems measurable, dependable, and fast, we’d love to talk.

Salary & Equity

In this role, you’ll receive a starting salary between $200,000 and $225,000, as well as Flock Safety stock options. Base salary is determined by job-related experience, education/training, and market indicators. Your recruiter will discuss this in depth with you during your first chat.

Location

We’re building the impossible, together. To drive innovation through in-person collaboration, we’re prioritizing candidates in our key hubs: Atlanta, Boston, Chicago, Denver, Los Angeles, New York City, San Francisco, and Austin. While we value the energy of our hub communities, we embrace remote work and welcome applications from exceptional talent across the United States.

Technology

Main Stack: TypeScript (Node.js), React, Golang, Postgres, Elasticsearch, DynamoDB. Hosted on AWS (Kubernetes, SNS) using Docker.

Camera Firmware: Embedded Android running on a custom PCB that we develop in house.

Machine Learning: PyTorch, TensorFlow, TF Serving, Computer Vision, Kubernetes
