Site Reliability Engineer
Advantage Tech is hiring! We are looking for a qualified individual to join our client's team.
Our client is seeking an engineer with strong experience in Infrastructure as Code, automation, CI/CD, containers, AWS, and DevOps best practices to join their Site Reliability Engineering team!
The ideal candidate has a very strong sense of ownership and a passion for learning. This position will report directly to the Director of Developer Operations, who will assist in grooming and assigning work but will rely on the Engineer to provide documentation and regular progress updates to the TechOps team and key stakeholders.
- Improve the observability of legacy and newly designed systems.
- Leverage infrastructure-as-code and AWS services to evolve and retire our legacy infrastructure.
- Provide support for emergent problems, identify the root cause, and drive improvements through automation and self-healing services.
- Support network infrastructure including WAN, LAN, and wireless technologies.
- Run team meetings, groom tickets, and manage the team's workload with the assistance of the Director.
- Communicate ideas and feedback clearly and professionally to leadership and other internal team members.
- Handle Tier III escalations from the system administrators as needed.
- Reduce toil for our development teams by training them and building self-service tools alongside our Architect.
- Actively participate in team activities and discussions, such as proposing architecture improvements, best practices, and new processes.
- Perform on-call duty as required (one week at a time, roughly once every six weeks).
- Write Infrastructure-as-Code in such a way that it can be leveraged across multiple application stacks.
- Perform additional tasks as assigned.
- Education: Bachelor’s degree in Computer Science or a related field, or 5+ years of relevant work experience.
- Experience with CI/CD and Infrastructure-as-Code tools, such as GitHub Actions, Terraform, and Jenkins.
- Familiarity with AWS managed services and a track record of successfully designing solutions with them, most importantly ECS on Fargate, EC2, S3, RDS, Lambda, CloudFront, CloudWatch, X-Ray, and EventBridge event buses.
- Familiarity with New Relic or similar APM tools.
- Experience with software monitoring and log aggregation tools.
- Experience training and mentoring junior team members.
- Strong sense of ownership and troubleshooting skills.
- Strong working knowledge of Linux operating systems.
- Strong working knowledge of Docker or Kubernetes.
- Familiarity with microservice and event-driven architectures.
- 5+ years of experience in a software engineering discipline.
- 2+ years of experience writing Python code.
- 2+ years of experience as a Site Reliability Engineer.
- Some experience running daily briefings or other team meetings.
- The ideal candidate is an autonomous self-starter who pairs a passion for learning with a strong sense of responsibility and ownership.
- Experience developing “cloud-native” applications.
- Experience “containerizing” legacy applications.
- Strong documentation skills.
- Strong troubleshooting skills and an ability to come up with creative “outside the box” solutions in a cost-effective manner.
- Demonstrable track record of dealing well with ambiguity, prioritizing needs, and delivering measurable results in an agile environment.
- Familiarity with Agile practices and with working in both sprint-based and Kanban workflows.