Senior Site Reliability Engineer / Kubernetes (Remote)
Location: Fully remote, EU timezone (CET ±2h)
Start date: ASAP
Languages: Fluent English is mandatory
Industry: Cloud Computing
We are hiring at Pragmatike to expand our team and drive the growth of our internal projects.
Our focus is on developing cutting-edge solutions in Cloud Computing while fostering a culture of collaboration and innovation. Joining us means being part of a passionate team where your ideas and skills directly contribute to shaping tomorrow's technologies.
If you're excited about working on ambitious projects in a dynamic and flexible environment, we'd love to hear from you!
Responsibilities
- Operate and maintain Linux-based infrastructure (Debian/Ubuntu).
- Deploy, manage, and scale Kubernetes clusters across bare-metal, virtualized, and on-prem environments.
- Oversee full cluster lifecycle: upgrades, node pools, networking, storage, and security hardening.
- Implement automation for provisioning and operations using Ansible, Bash/Python, and GitOps workflows.
- Design and maintain networking architecture including VLANs, L2/L3 routing, VPNs, and multi-site connectivity.
- Build automated deployment workflows (PXE boot, Preseed, cloud-init).
- Deploy and maintain observability stacks (Prometheus/Grafana, Loki, ELK, Graylog).
- Lead incident response and escalation activities across the platform.
- Improve system availability and reduce latency at all levels.
- Define and implement SLOs/SLIs at multiple infrastructure levels (physical network/hardware, platform virtualization, software services).
- Optimize alerting and monitoring pipelines to provide actionable insights.
- Establish and maintain on-call schedules to ensure coverage across timezones.
- Develop Standard Operating Procedures (SOPs) for repeatable operations and maintenance tasks.
- Coordinate physical maintenance for Policlouds (periodic maintenance, hardware issues, DC-Ops).
- Manage virtualization and orchestration layers (OpenStack, Proxmox, VMware).
- Help develop and maintain overall architecture across all products.
- Plan resources for future initiatives, accounting for demand and growth projections.
- Work with development teams to improve overall quality and optimize resource utilization.
- Collaborate with cross-functional stakeholders (Hivenet, Policloud, Customer Success teams).