CloudLinux

Senior IaaS / Kubernetes Platform Engineer

$115,000 – $195,500 USD
Remote
senior
about 1 month ago
devtech, Kubernetes, IaaS, Ceph, Terraform, Ansible, GitOps, Networking

Description

What You Will Do

  • Kubernetes Platform Engineering (primary focus, 40%)
      • Design, build, and operate a multi-tenant Kubernetes platform using Cluster API (CAPI) with bare-metal providers (Metal3/Sidero).
      • Implement hard multi-tenancy using vCluster (Loft Labs) or similar technology, providing isolated Kubernetes API servers per tenant.
      • Deploy and manage KubeVirt for VM orchestration within Kubernetes, including CPU pinning, NUMA awareness, and HugePages configuration.
      • Implement GitOps-driven infrastructure using ArgoCD or Flux as the single source of truth for all cluster configurations.
      • Deploy and manage Policy-as-Code using Kyverno or OPA Gatekeeper for admission control, resource quotas, and security policies.
      • Build self-service capabilities using Crossplane or similar Kubernetes-native infrastructure provisioning tools.
  • Storage Engineering (20%)
      • Operate and optimize Ceph distributed storage clusters (currently 1 PiB raw, 149 OSDs, Quincy 17.2.5).
      • Manage Rook-Ceph operator deployments at scale on modern Kubernetes (v1.28+).
      • Implement storage tiering: Ceph for bulk storage, local NVMe for high-IOPS workloads, LINSTOR/DRBD or TopoLVM for ultra-fast replicated storage.
      • Design and implement per-VM / per-tenant I/O isolation on shared Ceph clusters.
      • Manage CDI (Containerized Data Importer) for VM image lifecycle in KubeVirt environments.
  • Networking (15%)
      • Deploy and manage overlay networks for pod networking, micro-segmentation, and WireGuard/IPsec encryption.
      • Implement Cluster Mesh for multi-datacenter pod-to-pod connectivity.
      • Configure Multus CNI and SR-IOV for multi-NIC VM support in KubeVirt.
      • Work with physical network infrastructure: Juniper switches (JunOS), BGP (eBGP/iBGP), EVPN/VXLAN, VLANs.
      • Maintain IPsec site-to-site connectivity between datacenters.
  • Reliability and Operations (15%)
      • Practice SRE discipline: define and maintain SLOs with error budgets, implement proactive capacity management with 6-12 month forecasting.
      • Design and execute chaos engineering experiments to validate system resilience.
      • Participate in on-call rotation for IaaS infrastructure (OpenNebula, Ceph, networking).
      • Write and maintain runbooks, DRP documentation, and postmortem analyses.
      • Drive proactive improvement: identify reliability risks, performance bottlenecks, and toil, then propose and implement solutions without waiting for incidents.
  • Infrastructure as Code and Automation (10%)
      • Develop and maintain Terraform/OpenTofu modules for multi-cloud infrastructure provisioning.
      • Write Ansible playbooks for bare-metal server configuration and fleet management.
      • Automate infrastructure lifecycle: PXE

What We Offer

  • Competitive salary ranging from $115,000 to $195,500 USD.
  • Fully remote work environment.
  • Supportive team culture focused on collaboration and success.
  • Opportunities for professional growth and development.

Requirements

  • Proven experience in Kubernetes platform engineering and IaaS.
  • Strong understanding of cloud infrastructure, networking, and storage solutions.
  • Experience with GitOps practices and tools (ArgoCD, Flux).
  • Familiarity with Ceph and distributed storage management.
  • Proficiency in Terraform and Ansible for automation and infrastructure as code.
  • Ability to work independently and collaboratively in a remote team environment.