Site Reliability Engineer
Responsibilities
- Be available to respond to critical service incidents outside of business hours on a rotating on-call schedule.
- Proactively monitor application health and performance across cloud infrastructure (AWS).
- Troubleshoot and prevent service interruptions in real time, working closely with development teams to resolve incidents efficiently.
- Lead and participate in disaster recovery drills and security incident simulations.
- Implement Infrastructure as Code (IaC) and maintain scalable deployments using AWS-native tools and services.
- Collaborate with development teams to ensure smooth CI/CD workflows using Git and containerized deployments (Docker).
- Work closely with stakeholders and product teams to ensure technical reliability aligns with business needs.
- Support and improve observability tools, alerting mechanisms, and logging infrastructure to promote transparency and response agility.
- Champion best practices in security, availability, performance, and incident response.
Required Technologies & Tools
- Cloud Infrastructure: Strong proficiency in Amazon Web Services (AWS) with knowledge of services like EC2, ECS, RDS, CloudWatch, and IAM.
- Programming/Scripting: Proficiency in Node.js and scripting for automation and tooling.
- Containerization: Experience with Docker for container-based deployment pipelines.
- Frontend Awareness: Familiarity with React and Ember.js to understand performance implications at the frontend level.
- Backend Stack: Understanding of NestJS and scalable Node-based services.
- Databases: Proficient in MySQL and performance monitoring of relational databases.
- Version Control: Proficiency with Git for collaborative code management and DevOps workflow integration.
Core Competencies
- Incident Response: Calm and focused under pressure with a structured approach to resolving outages and degradation.
- System Design: Ability to contribute to and review architectural designs for scalability and resiliency.
- Collaboration: Strong communication skills to coordinate across developers, QA, and product teams.
- Automation & Efficiency: Passion for automation, repeatability, and continuous improvement.
- Security Mindset: Consistent implementation of security best practices and a strong grasp of data protection standards.
Qualifications
- 3+ years of experience in a Site Reliability, DevOps, or related engineering role.
- Proven track record managing and scaling applications in a production AWS environment.
- Familiarity with full-stack environments, particularly those using Node.js.
- Experience maintaining and deploying databases such as MySQL with performance tuning.
- Experience with container orchestration (e.g., ECS or Kubernetes) is a plus.
- Commitment to uptime, performance, and security in fast-moving SaaS environments.