Staff Site Reliability Engineer

Staff Site Reliability Engineers are responsible for ensuring Workiva’s cloud-based systems support the “customer first” philosophy day in and day out. They develop ways to build, deploy, and support our solutions across Google Cloud Platform and Amazon Web Services (AWS) infrastructure. Staff Site Reliability Engineers are the technical leaders and innovators of one or more discrete components of Workiva infrastructure, with a broad and complete understanding of all aspects of those components. They have a proven track record of delivering high-quality solutions and generally display a greater level of experience, depth, and responsibility than a Senior Site Reliability Engineer.

What You’ll Do
Leadership & Influence

  • Leads team members to explore new approaches that will provide optimal, innovative solutions to identified issues.
  • Collaborates with architects and product managers to design complete software products that can be leveraged to meet a broad range of customer needs and requirements.
  • Collaborates with team members across R&D to continuously improve technology, methodology, and relationships.
  • Serves as the Tech Lead on the team.

Communication and Collaboration

  • Collaborates with others and applies professional concepts to resolve critical issues and assist in design decisions.
  • Clearly communicates concise technical visions and directions.
  • Estimates level of effort and breaks down tasks and subtasks.
  • Manages dependencies between teams in forecasting and planning.
  • Knows the team’s capabilities and the scope and level of effort needed to produce intended results.
  • Works with development teams to provide assistance, guidance, and solutions to help achieve company goals.

Technical Skill

  • Streamlines the processes to move code from development teams to a highly scalable and highly available runtime environment.
  • Works with Cloud vendors and external technical support on upgrades, problem resolution, and design issues.
  • Monitors and tunes appropriate systems to ensure optimum levels of performance.
  • Writes tools and leverages open source software to automate tasks with an emphasis on safety and repeatability.
  • Participates in on-call rotations that include 24×7 support of multiple complex environments.


  • Designs complete innovative applications or solutions to meet customer needs and requirements.
  • Designs systems to enable rapid development, high availability, and clear observability.

What You’ll Need

  • Undergraduate Degree or equivalent combination of education and experience in a related field.


  • Excellent verbal, written, and interpersonal communication skills
  • Self-motivated, with a strong propensity for action, results, and continuous improvement
  • Ability to work successfully in a high-energy, fast-paced, rapidly changing environment
  • Exceptional organizational skills, with the ability to multitask and manage multiple processes, programs, and procedures simultaneously while working under pressure to meet deadlines


  • 5 years of experience in site reliability, software engineering, or another relevant field
  • Experience with GitHub or other distributed VCS
  • Experience with Go, Python, and Docker
  • Experience with Amazon Web Services, Google App Engine, or Google Compute Engine
  • Familiarity with Apache, Nginx, MySQL, PostgreSQL, Tomcat, and RabbitMQ is a plus
  • Knowledge of Git, Chef, Docker, PyPI, and npm is a plus
  • Experience writing code that works across platforms and browsers is a plus
  • Experience with the latest HTML5 technologies (JavaScript/Dart/React) is a plus
  • Experience running Apache Kafka and Apache Cassandra is a plus
  • Experience with systems performance tuning and load testing is a plus


Posted on

May 5, 2022