Kubernetes overcommitment best practices

Overview

Greetings, knowledge seekers and Kubernetes explorers. In this post we will look at some Kubernetes overcommitment best practices and at how to manage pod resource requests and limits.

Pod resource management rarely gets the attention it needs and is easily overlooked (guilty as charged). Yet understanding and managing it correctly is crucial if you want to keep a healthy and efficient cluster and avoid a plethora of problems in the long run.

Understanding overcommitment (what is it?)

Overcommitment in Kubernetes occurs when the pods running on a worker node are collectively allowed to use more resources than the node actually has. The scheduler places pods based on their requests, so the sum of requests on a node stays within its allocatable capacity, but the sum of limits can exceed it. This can happen when limits are set much higher than requests, when requests don’t accurately reflect the real resource consumption of the pods, when requests and limits are left undefined, or, in some setups, when the Vertical Pod Autoscaler adjusts them.

While Kubernetes allows for some overcommitment to optimize resource utilization, excessive overcommitment can lead to significant issues. Correctly managing resource requests and limits in your Kubernetes clusters ensures that your infrastructure is resilient, efficient, and capable of supporting scaling workloads without unwanted surprises.
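
To make the idea concrete, a node can be treated as overcommitted for a resource when the limits of the pods on it add up to more than the node can allocate. The following is a minimal sketch of that arithmetic; the function name `overcommit_ratio` and the numbers are hypothetical and only serve as an illustration.

```python
def overcommit_ratio(pod_limits, allocatable):
    """Return sum(limits) / allocatable for one resource on one node."""
    if allocatable <= 0:
        raise ValueError("allocatable must be positive")
    return sum(pod_limits) / allocatable


# Hypothetical node with 4 CPU cores allocatable and three pods
# whose CPU limits are 2, 2 and 1 cores.
cpu_ratio = overcommit_ratio([2.0, 2.0, 1.0], allocatable=4.0)
print(f"CPU limits are {cpu_ratio:.0%} of allocatable")  # 125%
```

A ratio above 100% is not a problem by itself; it only means that the pods cannot all reach their limits at the same time.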

Why is it important?

When setting up your Kubernetes pods, defining resource requests tells the scheduler how much CPU and memory a pod needs to run effectively. Resource limits act as boundaries that prevent a pod from consuming more than it is allowed. These specifications aren’t just best practices; they’re essential to prevent overcommitment of resources on your worker nodes.

What can go wrong without proper configuration?

Overcommitting resources can lead to a number of issues on the worker nodes:

Node Instability: Pods consuming unbounded resources can overwhelm a node, causing system-level disruptions.
Evictions and OOM Kills: When a node runs out of memory, the kubelet may start evicting pods or the kernel’s OOM killer may kill processes, disrupting workloads.
Throttling: A container that reaches its CPU limit is throttled, slowing down application response times and impacting user experience.
Scheduling Challenges: Inaccurate requests also affect the scheduler’s ability to place new pods. If existing pods’ requests make nodes appear fully utilized, the scheduler may struggle to find space for new pods, leading to prolonged Pending states.
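
The Pending case can be sketched as a simple fit check: a pod is placeable on a node only if its request is no larger than what the already scheduled requests leave free. This is a simplified illustration for a single resource; the real scheduler weighs many more factors, and the node names and numbers below are hypothetical.

```python
def fits(node_allocatable, scheduled_requests, new_request):
    """True if the new pod's request fits in what is left of the node."""
    free = node_allocatable - sum(scheduled_requests)
    return new_request <= free


def schedulable_nodes(nodes, new_request):
    """Names of nodes where the pod would fit, for a single resource."""
    return [name for name, (allocatable, requests) in nodes.items()
            if fits(allocatable, requests, new_request)]


# Hypothetical cluster, CPU in millicores: (allocatable, requests of running pods).
nodes = {
    "node-a": (4000, [1500, 1500, 500]),  # 500m left
    "node-b": (4000, [2000, 1900]),       # 100m left
}
print(schedulable_nodes(nodes, 250))   # ['node-a']
print(schedulable_nodes(nodes, 1000))  # [] -> the pod would stay Pending
```

Note that the check uses requests, not actual usage: a node full of idle pods with oversized requests still looks full to the scheduler.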

Best Practices for Resource Management

Set Realistic Requests and Limits: Work out, or at least know, the requirements of the pods, applications and workloads that you’re running. Monitoring tools like Prometheus and Grafana can give you more insight into actual resource utilization.
Use Resource Quotas: For namespaces with multiple teams or applications, quotas can prevent any single workload from monopolizing resources.
Horizontal Pod Autoscaling (HPA): Integrate HPA to dynamically scale pods based on metrics, ensuring efficient use of resources.
LimitRange Policies: Apply LimitRange resources to enforce default, minimum and maximum requests and limits across a namespace.

Examples

Let’s see some of the examples we mentioned and how to apply them on Kubernetes.

Manifest for pod limits and requests:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example-container
    image: nginx:latest
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "1"

This is an everyday pod manifest. The requests tell the scheduler how much CPU and memory to reserve on a node for the container, and the limits set the maximum it can use.
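
The quantity strings in the manifest use Kubernetes units: "500m" means 500 millicores (half a core) and "256Mi" means 256 × 2^20 bytes. As a rough sketch of how those values translate into plain numbers, the helpers below handle only the most common suffixes; `parse_cpu` and `parse_memory` are hypothetical names, not part of any Kubernetes client library.

```python
from decimal import Decimal

# Common suffixes only; this is not the full Kubernetes quantity grammar.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> Decimal:
    """CPU quantity -> cores, e.g. '500m' -> 0.5 and '1' -> 1."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Memory quantity -> bytes, e.g. '256Mi' -> 268435456."""
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[:-len(suffix)]) * factor)
    return int(Decimal(quantity))


# Values from the pod manifest: how far can the container burst above its request?
cpu_burst = parse_cpu("1") / parse_cpu("500m")
mem_burst = parse_memory("512Mi") / parse_memory("256Mi")
print(cpu_burst, mem_burst)  # 2 2.0
```

In this manifest both limits are twice the requests, so the container can use up to double what the scheduler reserved for it.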

Example HPA manifest for autoscaling a deployment:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75

HPA manifest breakdown - this example tells Kubernetes how to handle horizontal scaling of the pods. At least 2 replicas are present at all times, and the deployment can scale up to a maximum of 10 replicas. Scaling is triggered by a resource metric: the HPA watches CPU usage and, when average utilization across the pods (measured relative to their CPU requests) rises above 75%, it adds replicas.
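
The replica count the HPA aims for follows the documented formula desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), bounded by minReplicas and maxReplicas. Below is a minimal sketch of that calculation using the values from the manifest. It deliberately leaves out the controller's tolerance band and stabilization behavior, and the function name `desired_replicas` is only illustrative.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Core HPA formula, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# target 75%, minReplicas 2, maxReplicas 10 (as in the manifest)
print(desired_replicas(2, 90, 75, 2, 10))   # 3  (2 * 90 / 75 = 2.4, rounded up)
print(desired_replicas(4, 30, 75, 2, 10))   # 2  (scale down, never below the minimum)
print(desired_replicas(8, 150, 75, 2, 10))  # 10 (16 wanted, capped at the maximum)
```

Because utilization is measured against the CPU request, a request that is set too low makes the pods look busier than they are and triggers scaling earlier.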

LimitRange example for a namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: example-limitrange
  namespace: my-namespace
spec:
  limits:
  - default:
      cpu: "750m"
      memory: "512Mi"
    defaultRequest:
      cpu: "250m"
      memory: "256Mi"
    max:
      cpu: "1"
    min:
      cpu: "100m"
    type: Container

A LimitRange applies to the pods in the specified namespace. With type: Container, as here, it works per container: containers that don’t define their own requests and limits get the default values, and any values they do define must fall between the minimum and maximum. This is very useful if you have or plan to have multiple apps in the same namespace.
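
The defaulting and validation can be sketched roughly as follows. This is a simplified model rather than the actual admission code: values are plain integers (CPU in millicores, memory in MiB), and `apply_limit_range` is a hypothetical helper. It assumes the behavior described in the Kubernetes documentation, where a container that sets a limit but no request gets a request equal to that limit.

```python
# Values mirror the LimitRange manifest; CPU in millicores, memory in MiB.
LIMIT_RANGE = {
    "default": {"cpu": 750, "memory": 512},
    "defaultRequest": {"cpu": 250, "memory": 256},
    "max": {"cpu": 1000},
    "min": {"cpu": 100},
}


def apply_limit_range(container, lr=LIMIT_RANGE):
    """Fill in missing requests/limits, then check them against min and max."""
    given_limits = container.get("limits", {})
    requests = dict(container.get("requests", {}))
    limits = dict(given_limits)

    for res, value in lr["default"].items():
        limits.setdefault(res, value)
    for res, value in lr["defaultRequest"].items():
        if res not in requests:
            # A limit set without a request: the request follows that limit.
            requests[res] = given_limits.get(res, value)

    for res, floor in lr["min"].items():
        if requests[res] < floor:
            raise ValueError(f"{res} request {requests[res]} is below min {floor}")
    for res, ceiling in lr["max"].items():
        if limits[res] > ceiling:
            raise ValueError(f"{res} limit {limits[res]} is above max {ceiling}")
    for res, value in requests.items():
        if res in limits and value > limits[res]:
            raise ValueError(f"{res} request {value} exceeds limit {limits[res]}")
    return {"requests": requests, "limits": limits}


print(apply_limit_range({}))
# {'requests': {'cpu': 250, 'memory': 256}, 'limits': {'cpu': 750, 'memory': 512}}
```

A container asking for a CPU limit of 2000 millicores would raise an error here, which corresponds to the pod being rejected in that namespace.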

ResourceQuota example for a namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: example-quota
  namespace: my-namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "10"
    limits.memory: "16Gi"
    pods: "20"

Be careful with ResourceQuota, because this mechanism is powerful and also strict about the resources it defines: once the namespace total would exceed a hard value, new pods are rejected, and when a quota covers CPU or memory requests or limits, every new pod must specify those values. Its use case is to prevent any application from monopolizing resources within a shared namespace, ensuring fair usage among multiple applications. Like LimitRange, it is applied at the namespace level.
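
How strict the quota is can be illustrated with a small admission check: for every key under hard, current usage plus the new pod must stay within the cap. This is only a sketch under simplifying assumptions (CPU in millicores, memory in MiB, a hypothetical `quota_violations` helper, made-up usage numbers); the real check happens inside the API server.

```python
# Hard caps from the ResourceQuota manifest; CPU in millicores, memory in MiB.
HARD = {
    "requests.cpu": 4000,
    "requests.memory": 8192,
    "limits.cpu": 10000,
    "limits.memory": 16384,
    "pods": 20,
}


def quota_violations(used, new_pod, hard=HARD):
    """Return the quota keys that admitting new_pod would violate."""
    violated = []
    for key, cap in hard.items():
        delta = 1 if key == "pods" else new_pod.get(key)
        if delta is None:
            violated.append(key)  # the quota tracks this value, the pod omits it
        elif used.get(key, 0) + delta > cap:
            violated.append(key)
    return violated


# Hypothetical current usage in the namespace.
used = {"requests.cpu": 3800, "requests.memory": 4096,
        "limits.cpu": 6000, "limits.memory": 8192, "pods": 12}
pod = {"requests.cpu": 500, "requests.memory": 256,
       "limits.cpu": 1000, "limits.memory": 512}

print(quota_violations(used, pod))  # ['requests.cpu']
```

In this example the namespace still has plenty of memory and pod slots left, but the pod is refused because the CPU requests alone would go from 3800 to 4300 millicores, above the 4000 cap.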

Summary

Let’s briefly recap - we reviewed some Kubernetes overcommitment best practices and how to manage pod requests and limits. It may not seem like much once you understand the topic, but neglecting it can produce some serious issues down the line.

Even though we have some mechanisms to tackle these potential problems, unfortunately this is not one of those “set it and forget it” things. In my experience it is one of those “never-ending battles” in the IT world (like monitoring and observability, for example). It calls for endless fine-tuning, since not all pods and applications behave the same, especially not across different worker node types (in terms of node resources such as CPU and memory).

So, anyone who works or wants to work with Kubernetes needs to be on the lookout.

I hope you found the article helpful, and I thank you for your time.
