Senior Site Reliability Engineers are responsible for leading infrastructure solutions in a cloud-based environment. They are expected to identify, define, and develop new technologies and processes in a multi-cloud environment and across the various solutions, with the goal of improving the quality, reliability, and efficiency of services. They work across product lines to standardize tools and methodologies. Senior Site Reliability Engineers solve complex problems with autonomy and authority, leading a team of engineers to create a reliable and stable platform.
- Provides technical support to the Site Reliability Manager and/or Architect, leveraging technical understanding of product functionality to define customer requirements and design optimal solutions
- Collaborates with team members across R&D to continuously improve technology, methodology, and relationships
- Designs systems to enable rapid development, high availability, and observability
- Forecasts the level of team effort, in terms of quality and timelines, by accurately defining and estimating tasks
- Manages dependencies between development teams and the SRE team
- Participates in on-call rotations which include 24×7 support of multiple environments
- Maintains and improves the reliability and operability of all infrastructure and infrastructure management services
- Writes tools and leverages open source technology to automate tasks with an emphasis on safety and repeatability
- Leads team members to explore new approaches that will provide optimal solutions
- Collaborates with architects and product managers to design complete software products that can be leveraged to meet a broad range of customer needs and requirements
What You’ll Need
- Undergraduate degree or equivalent combination of education and experience in a related field
- Excellent verbal, written, and interpersonal communication skills
- Self-motivated with a strong propensity for action, results, and continuous improvement
- Ability to work successfully in a high-energy, fast-paced, rapidly changing environment
- Exceptional organizational skills with the ability to multi-task and manage multiple processes, programs, and procedures simultaneously while working under pressure to meet deadlines
- 2 years of experience in site reliability, software engineering, or another relevant field
- Experience with Go and Python, or another programming language
- Experience with Docker and operating systems (Linux, Windows, etc.)
- Familiarity with Kubernetes, CloudFormation, and Terraform
- Experience with Amazon Web Services, Google App Engine, or Google Compute Engine
- Familiarity with writing code that works across platforms and browsers preferred
- Experience with systems performance tuning and load testing preferred
- Experience with defining SLIs for internal services preferred
- Experience with Gremlin or another chaos testing solution preferred
- Minimal travel
Working Conditions & Physical Requirements
- Reliable internet access for any period of time spent working remotely rather than in a Workiva office
- Available at all times during on-call rotations
How You’ll Be Rewarded
- Base Pay Range in Colorado: $121,000 – $163,000
- A discretionary bonus typically paid annually
- Restricted Stock Units granted at time of hire
The base pay range represents the low and high end of the hiring range for this job. Actual pay will vary and may be above or below the range based on various factors including but not limited to relevant skills, experience, and capabilities.
Apply through the following link: https://workiva.wd1.myworkdayjobs.com/en-US/careers/job/Bozeman/Senior-Site-Reliability-Engineer_R1543-1