CASE STUDIES

How did we minimize the risk of outages during Kubernetes upgrades in a system with over 100 microservices?

Implementation

GCP & AWS

Technology

GKE / EKS

Tooling

Golang, Terraform, eksctl

Team

4 Engineers

Scale

6000 production containers

BENEFITS

Maintenance cost reduction

Summary:

To minimize the risk of outages during Kubernetes upgrades or maintenance, it's best to run multiple Kubernetes clusters in production. Even with a sizable number of nodes, relying on a single cluster is risky, because a failed upgrade or misconfiguration affects every service in it at once. To ensure high availability, we adopted the immutable infrastructure paradigm and built a fleet of independent Kubernetes clusters: instead of upgrading a cluster in place, we replace it.

Challenges:

  • The production system contained over 100 microservices
  • During peak hours, there are over 6000 containers in the cluster
  • Standard service auto-discovery is too slow to keep up with 3000 changes

Solution:

  • Streamlined cluster creation with an internal CLI (cli eks create [role])
  • Automated deployment using a GitOps model to get new clusters up and running
  • Autoscaling so that each cluster can handle traffic independently
  • Redesigned service discovery across the infrastructure to avoid 502 errors while sunsetting clusters
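The redesigned service discovery is not described in detail. One common way to avoid 502s while sunsetting a backend is connection draining: stop sending new requests to it, let in-flight requests finish, and only then delete it. The Go sketch below models that technique with a hypothetical in-process registry; it illustrates the idea and is not the project's actual design.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Registry tracks which clusters may receive new requests. A cluster being
// sunset is marked draining first: it gets no new requests but finishes the
// ones already in flight, so callers never reach a torn-down backend.
type Registry struct {
	mu       sync.Mutex
	order    []string        // clusters in registration order
	draining map[string]bool // clusters excluded from new traffic
	inFlight map[string]int  // requests currently being served per cluster
	next     int             // round-robin cursor
}

func NewRegistry(clusters ...string) *Registry {
	return &Registry{order: clusters, draining: map[string]bool{}, inFlight: map[string]int{}}
}

// Acquire picks the next non-draining cluster round-robin and counts the request.
func (r *Registry) Acquire() (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for i := 0; i < len(r.order); i++ {
		c := r.order[(r.next+i)%len(r.order)]
		if !r.draining[c] {
			r.next = (r.next + i + 1) % len(r.order)
			r.inFlight[c]++
			return c, nil
		}
	}
	return "", errors.New("no serving clusters")
}

// Release marks one request on cluster c as finished.
func (r *Registry) Release(c string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.inFlight[c]--
}

// Drain stops new traffic to c and reports whether it is idle, i.e. safe to
// delete. It is idempotent, so a caller can poll it until it returns true.
func (r *Registry) Drain(c string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.draining[c] = true
	return r.inFlight[c] == 0
}

func main() {
	reg := NewRegistry("prod-a", "prod-b")
	c, _ := reg.Acquire()            // prod-a now has one request in flight
	fmt.Println(reg.Drain("prod-a")) // false: still serving that request
	next, _ := reg.Acquire()
	fmt.Println(next) // prod-b: the draining cluster gets no new traffic
	reg.Release(c)
	fmt.Println(reg.Drain("prod-a")) // true: safe to delete
}
```

In a real fleet the draining state would be propagated through whatever discovery mechanism clients actually use rather than an in-memory map.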


2023 - Let’s Go DevOps - All rights reserved
