Site Reliability Engineer
$140k - $210k • 0.10% - 0.40%
We started as app developers who just wanted a developer-friendly API for push notifications. Finding no good solution, we built one ourselves.
Today, we are the leading solution for push notifications, in-app messaging, and email. We support over 900,000 developers. OneSignal is available on every platform and development environment, letting content creators focus on quality user engagement instead of complex implementation.
Covid-19's Impact on OneSignal
Covid-19 has accelerated OneSignal's growth. We've seen a 20%+ increase in new accounts created for OneSignal each day, and a 20%+ increase in daily message delivery volume.
We are growing faster than ever, and hiring in all departments. We hope you'll apply and we look forward to meeting you!
Skills: Go, Redis, Rust, Ruby on Rails
OneSignal has grown rapidly to where we are today, serving billions of HTTP requests and sending upwards of 5 billion messages daily. We achieved this scale by leveraging bare metal cloud and writing scale-sensitive components in languages like Rust and Go. This potent combination of high-performance, low-cost hardware and efficient resource utilization has given us an incredible competitive edge.
We are hiring SREs to help us continue to scale by operating and engineering the future of our infrastructure. We are maintaining 99.95% uptime today, and we are investing to ensure we maintain that as the business continues to grow and as the product evolves.
What you'll do:
Your primary task will be software engineering with a focus on infrastructure, operations, and automation. You'll be building systems to run our product, improving internal services, and advising product teams on architecture as it relates to the operability of the service.
The systems you'll be responsible for include all of the services that power our product. These range from off-the-shelf services like haproxy, nginx, Redis, PostgreSQL, and Kafka to our in-house services such as the Rails web app, various Rust backend services, and our high-performance API layer written in Go.
You'll be working with Kubernetes to automate our datacenter operations and writing operational services to automate database operations. One of the key challenges in this role is not only to understand systems well enough to operate them by hand, but also to understand them in sufficient detail to write software that automates those operations.
For some additional context on how we think about SRE, please see the introductory chapter of the Google SRE book: https://landing.google.com/sre/sre-book/chapters/introduction/
Skills and experience:
- At least 3 years of experience working as a software engineer
- Experience operating reliable production systems at scale
- Knowledge of Linux systems internals
- Experience writing networking applications
- Easily bored by running tasks by hand, with the ability to automate such tasks
Preferred skills and experience:
- Operational experience deploying and managing Kubernetes on bare metal
- Experience writing Kubernetes controllers and operators
- Recent experience writing Go and/or Rust
- Past experience as an SRE
- Experience working with Layers 1-3 of the OSI networking model
- Experience with PostgreSQL
- Experience with any of Redis, Kafka, etcd, ZooKeeper, nginx, haproxy