Limit Break

Senior DevOps/Site Reliability Engineer

$160,000 – $218,000 USD
Remote
senior
about 1 month ago
crypto, tech, web3, Kubernetes, AWS, Terraform, Ansible, CI/CD, GitHub Actions, Jenkins, GitLab CI, Aurora

Description

What you'll do

  • Identify performance and scalability bottlenecks across our multi-cluster EKS environment on AWS, then propose and execute improvements.
  • Measure system health, scalability and performance metrics, and identify areas for improvement.
  • Deploy services and troubleshoot production issues day-to-day, using code to solve broad operational challenges within the Limit Break Infrastructure and Platform.
  • Work with the wider engineering team to identify how we can provide the most production-like environment for running both manual and automated testing.
  • Define SLOs, SLIs, monitoring, alerting and incident response practices — and continuously improve our observability stack (Grafana, Thanos, Loki) to be ready for worldwide scale.

Requirements

  • 5+ years of experience in SRE, DevOps or systems engineering.
  • Strong background in Kubernetes, including operating multiple EKS clusters in production.
  • Extensive experience in Terraform and Ansible.
  • CI/CD and automation experience with tools such as GitHub Actions, Jenkins, or GitLab CI.
  • Solid background in AWS, including experience with Aurora, RDS (MySQL/SQL), and networking.
  • Ability to participate in an on-call rotation.
  • Effective communication skills to clearly explain your reasoning and thought process.
  • Excellent collaboration skills to work closely with product engineers and product owners.
  • Experience implementing in-house monitoring and observability infrastructure (e.g. Grafana, Thanos, Loki, or equivalents).
  • Experience implementing an Elasticsearch stack or an equivalent solution for capturing logs from all environments.
  • Experience with Cloudflare, CDN technologies, and edge/perimeter networking.
  • Exposure to cloud security and perimeter tooling such as Wiz (or equivalent CSPM/vulnerability detection), AWS GuardDuty, Cloudflare Zero Trust, and secrets management platforms.
  • Experience addressing vulnerabilities — comfortable finding issues, digging deep to root cause, and driving remediation.
  • Experience implementing tools to monitor and protect the environment in real time.