Site Reliability Engineer (SRE)
Hiring: W2 Candidates Only
Visa: Open to any visa type with valid work authorization in the USA
Summary
A Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of software systems and infrastructure. This role bridges the gap between development and operations by applying software engineering principles to IT operations, automating processes, and monitoring system health to prevent downtime and improve system efficiency.
Key Responsibilities
- Design, implement, and maintain reliable, scalable, and highly available infrastructure and services.
- Monitor system performance, availability, and capacity; respond promptly to incidents and outages.
- Develop and maintain automation tools for deployment, monitoring, and infrastructure management.
- Collaborate with software engineers to design systems with reliability and maintainability in mind.
- Troubleshoot, debug, and resolve complex production issues across multiple systems and services.
- Implement and maintain CI/CD pipelines, configuration management, and version control best practices.
- Conduct post-incident reviews, identify root causes, and implement corrective actions to prevent recurrence.
- Define and enforce service-level objectives (SLOs), service-level indicators (SLIs), and service-level agreements (SLAs).
- Optimize system performance, cost, and resource utilization through analysis and continuous improvement.
- Document infrastructure, operational procedures, incident reports, and monitoring configurations.
- Mentor junior engineers and promote best practices for reliability, automation, and observability.
- Stay current with emerging technologies and DevOps practices to improve operational excellence.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 3-6 years of experience in site reliability engineering, DevOps, or system administration.
- Strong understanding of Linux/Unix systems, networking, and cloud platforms (AWS, Azure, GCP).
- Proficiency in scripting and programming languages such as Python, Bash, Go, or Java.
- Experience with monitoring, logging, and observability tools (Prometheus, Grafana, ELK Stack).
- Familiarity with containerization and orchestration tools (Docker, Kubernetes).
Preferred Skills / Duties
- Experience with Infrastructure as Code (Terraform, Ansible, CloudFormation).
- Knowledge of CI/CD tools and pipelines (Jenkins, GitLab, CircleCI).
- Understanding of distributed systems, microservices architecture, and high-availability systems.
- Strong problem-solving, analytical, and communication skills.
- Ability to implement security best practices in operational environments.
- Experience in automating repetitive operational tasks and improving system reliability.