Senior Site Reliability Engineer
8.0/10
Block
$170,100 – $170,100 USD
Remote
senior
about 1 month ago
May be outdated
Tech: Kotlin, Modern Java (11+), HTTP, JSON, gRPC, Protocol Buffers, MySQL, Vitess, DynamoDB, Event driven architectures, DataDog
AI Summary
The vacancy is well-structured with clear responsibilities, compensation, and requirements, though company details could be more comprehensive.
Description
What you'll do
- Build and extend platforms to improve system reliability
- Work on team goals that encompass reliability for the entire company
- Standardize reliability tools across multiple platforms and organizations
- Triage, coordinate, and lead stabilization of sev 0–1 incidents
- Serve as primary oncall, maintaining structured escalation paths and exercising leadership escalation
- Drive platform-wide reliability improvements, shared operational tooling, and deploy-safety patterns
- Use AI-driven systems to improve signal detection, reduce noise, and accelerate root cause analysis
- Design and implement safe deployment patterns (progressive delivery, automated rollback, guardrails)
Conditions
- This program shifts Block from reactive incident handling to repeatable, system-wide reliability gains: fewer customer-visible incidents, faster response, higher product velocity, and lower burnout across the organization.
- Block takes a market-based approach to pay, and pay may vary depending on your location. U.S. locations are categorized into one of four zones based on a cost of labor index for that geographic area. The successful candidate's starting pay will be determined based on job-related skills, experience, qualifications, work location, and market conditions. These ranges may be modified in the future.
- Zone A: USD $189,000 - USD $283,600
- Zone B: USD $179,600 - USD $269,400
- Zone C: USD $170,100 - USD $255,100
- Zone D: USD $160,700 - USD $241,100
Requirements
- Drive to root-cause problems in systems with many moving parts and take the necessary steps to fix them
- Demonstrated technical initiative and leadership on previous projects, especially those with a backend/platform focus
- Familiarity with AI-driven tooling for observability, incident analysis, or automation
- A mindset that naturally reaches for AI to accelerate problem-solving and reduce toil
- Experience running production oncall for high-availability systems
- Strong incident management skills: structured triage, mitigation under pressure, blameless postmortems
- Fluency with CI/CD pipelines, progressive rollout strategies, and rollback automation
- Monitoring & observability expertise: building/tuning alerts for uptime, error rates, latency regression, and resource exhaustion
- Ability to create and maintain evidence-based maturity assessments using trailing 90-day data windows
- Comfort with vendor/dependency management: maintaining validated escalation contacts reachable within 5 minutes
- Boundless curiosity, autonomy, and a strong sense of accountability
- A strong desire to perform and grow as an engineer
- 5+ years of software development experience