Cerebrium is a serverless infrastructure platform that makes it easy for companies to build and scale data/AI workloads. With a team of just 4 across the US and South Africa, we have scaled to millions in revenue, serving some of the most ambitious AI teams from Seed to Series C. We are backed by world-class investors like Gradient Ventures (Google's AI fund) and Y Combinator, with over $8m in funding.

We obsess over performance - providing low latency, high scalability, and reliability to our clients. This has led us to engineer core components across our stack from the ground up, including our own content-aware file system and a custom image-building pipeline, as well as optimizing our storage and network layers. Every decision we make is driven by the experience we want to unlock for developers: fast, reliable, and intuitive.

You’ll work alongside a team that has founded and exited companies, led engineering teams of 80+ engineers, and built distributed systems at scale across multiple industries. You’ll learn a lot—but you’ll also be expected to take ownership, move quickly, and directly influence the direction of our product.

Skills: Go, Kubernetes

The Role

As a Systems Engineer at Cerebrium, you’ll be responsible for the core infrastructure that powers thousands of CPU/GPU workloads. You’ll work on low-level systems like custom file systems, container runtimes, and our deployment pipelines, while owning critical components across compute, storage, and networking.

This role demands deep technical expertise in areas like containerization, distributed systems, infrastructure as code (e.g. Terraform), observability, and multi-cloud environments (AWS, GCP, etc.). You should be reliability-obsessed and performance-driven, enjoy solving hard technical problems, and be able to take full ownership of a task, from initial discovery and technical validation through implementation and release.

Responsibilities:

Design, build, and maintain scalable, high-performance systems that deliver seamless functionality to our users.
Write clean, well-structured, and maintainable code with a strong emphasis on performance, reliability, and long-term scalability.
Perform code reviews, unit testing, and system testing to ensure high, durable code quality.
Apply strong cloud experience across environments such as AWS and GCP.
Be versatile and open-minded - learn new technologies, experiment with different frameworks, and adapt as the platform evolves.

Bonus:

Familiarity with CUDA and GPUs
Familiarity with Rust

Stack:

Go
Terraform
Kubernetes
Streaming: Kafka or equivalent
Observability: Prometheus/OpenTelemetry

How we work:

We focus on output. We don’t care what hours you work or where you work from. Just do what it takes to meet your weekly sprint. Finished early, or just not having a good day? Take the day.
We have a flat structure and want to be constantly challenged by you. In terms of product and company decisions - tell us what you think!
We ship multiple times a week - every time we can add value for the customer, we ship it. You also give weekly demos to team members on what you have been building.
This position is in-office in our Manhattan (NYC) location 3 days a week.

Benefits

Competitive salary and meaningful equity
Health, dental, and vision benefits with 80% coverage for you
Unlimited PTO
2 company off-sites a year. We have previously done Rio, Budapest, Tulum and Athens.
Learning budget, and much more

We use the latest technologies and try to experiment where we can in order to add value for our users:

Go, Python, JavaScript, TypeScript

Docker/Kubernetes, KEDA, Prometheus, OpenTelemetry