Lead Site Reliability Engineer

Intellum, Inc.
Atlanta, GA

Job Description

About us

Intellum is the leader in corporate education technology and powers the largest, most successful customer, partner, and employee learning programs in the world. Large brands and fast-moving companies like Google, Meta, Amazon, Walmart, Xero, Atlassian, Mailchimp, Airbnb, Stripe, and TikTok rely on Intellum to engage and educate the audiences they touch.

We have always been a "remote first" company and are proud to have team members located all over the world. We value Curiosity, Creativity, Perseverance, and Kindness and strive to demonstrate these core values every day. Our culture is very important to us. We invest in our people in fun and exciting ways, including personal development budgets and an annual all-company retreat that is focused less on work and more on human connections. We are in growth mode, and our "smart growth" approach ensures that we will continue to scale our company effectively.

Summary

We are seeking a Lead Site Reliability Engineer to spearhead our SRE team. You are not just an operator; you are an experienced software engineer who excels at architecture, code optimization, and deep troubleshooting. In this role, you will drive operational maturity by defining our reliability standards (SLOs), hardening our security posture (WAF/InfraSec), and scaling the Intellum platform.

Our stack

  • Core: Applications written in Ruby on Rails and Node.js, PostgreSQL, MongoDB, Redis, Memcached, Sidekiq, ActiveJob, Elasticsearch, WebSockets
  • Infrastructure: 100% Linux-based cloud infrastructure (AWS, Google Cloud, MongoDB Atlas) and services (ECS/EC2/Kubernetes, ElastiCache, Memorystore, RDS, Cloud SQL, BigQuery, etc.)
  • Infrastructure as Code (IaC): GitHub, Terragrunt, Terraform, Ansible
  • CI/CD: Spinnaker, Jenkins
  • Observability & Alerting: New Relic, AWS CloudWatch, Google Cloud Stackdriver, Squadcast
  • Agile/Scrum practices using Jira

Responsibilities

  • SRE Leadership & Strategy: Set clear goals for the SRE team and partner with Engineering leadership to align platform initiatives with business objectives.
  • Reliability & Observability (SLA/SLO): Lead the definition and enforcement of SLAs, SLIs, and SLOs. Architect observability frameworks to translate telemetry data into actionable roadmaps that reduce toil and enhance resilience.
  • Core Engineering & Performance: Take ownership of critical code components (e.g., Queues, Enrollments) and lead efforts to identify bottlenecks, optimize performance, and improve code quality across the engineering department.
  • Security by Design: Champion infrastructure security. Partner with InfoSec to define hardening standards, manage perimeter defense (WAF/DDoS), and automate vulnerability remediation within the CI/CD pipeline.
  • Incident Command: Participate in the 24x7 on-call rotation and lead post-incident reviews (RCAs), ensuring action items are implemented to improve MTTR and prevent recurrence.
  • Mentorship: Empower developers with better tooling and guidance on performant coding practices, fostering a culture of collaboration, reliability, and "you build it, you run it" ownership.

Required Skills

Experience & Engineering

  • 10+ years of engineering experience, with 5+ years specifically developing Ruby on Rails applications.
  • Expertise in Cloud Computing (AWS/GCP) and Infrastructure as Code (Terraform/Ansible).
  • Strong proficiency with SQL databases (PostgreSQL) and the ability to quickly navigate and optimize complex, unfamiliar codebases.

SRE & Operations

  • Deep Observability: Proven experience designing monitoring solutions (Datadog, New Relic, Prometheus) based on the "Golden Signals".
  • SLO Governance: Demonstrated ability to define SLIs/SLOs from scratch, negotiate Error Budgets, and use data to balance feature velocity with reliability.
  • Security Focus: Experience securing cloud environments and container platforms (Kubernetes), including hands-on management of WAF rules and edge security.
  • Incident Management: Experience leading post-incident reviews (RCAs) and implementing action items that directly improve MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection).

Leadership

  • Proven experience leading technical teams, mentoring engineers, and working in a team-oriented, collaborative environment with strong communication skills.
  • Documentation & Training: Skilled in documenting solutions and training operational teams on how to effectively support and maintain systems.
  • Proactive Problem-Solving: Demonstrated ability to communicate clearly, seek help proactively, and take ownership of tasks, leading them to completion.

Bonus Skills

  • Automation Tools: Experience in developing solutions using server automation tools such as Terraform and Ansible.
  • CI/CD Expertise: Experience in writing and maintaining CI/CD pipelines and services.
  • Kubernetes: Experience in building, deploying, and optimizing Kubernetes-based infrastructure.
  • Perimeter Defense: Experience configuring and managing Web Application Firewalls (WAFs) (e.g., Cloudflare, AWS WAF, Akamai) and DDoS protection mechanisms.

Education

  • Bachelor's degree in Computer Science or related technical field

Benefits

  • Medical - 100% of employee premiums covered for selected individual plans
  • Dental - 100% of employee premiums covered
  • Vision - 100% of employee premiums covered
  • LinkedIn Learning
  • 401(k) plus matching (US-based only)
  • Unlimited PTO
  • Calm subscription
  • Annual Company Retreat

Intellum is an equal-opportunity employer. We're committed to building an inclusive team that celebrates diversity in people, perspectives, and backgrounds regardless of race, color, national origin, gender, sexual orientation, age, religion, disability, citizenship, veteran status, or any other protected status. We encourage you to apply for an open position. If you have questions about whether your job experience and skill set meet the requirements for a specific role, reach out to us directly at [email protected].


If you are an individual applying from CA, NY, CO, CT, MD, NV, or RI, please reach out to [email protected] to inquire about specific pay ranges.

Posted 2026-03-21
