
Operational Playbook

This playbook covers the full lifecycle of chant-produced Kubernetes manifests — from build through production debugging. The same content is available to AI agents via the /chant-k8s skill.

| Step | Command | What it catches |
| --- | --- | --- |
| Lint source | `chant lint src/` | Hardcoded namespaces (WK8001) |
| Build manifests | `chant build src/ --output manifests.yaml` | Post-synth: secrets in env (WK8005), latest tags (WK8006), API keys (WK8041), missing probes (WK8301), no resource limits (WK8201), privileged containers (WK8202), and more |
| Server dry-run | `kubectl apply -f manifests.yaml --dry-run=server` | K8s API validation: schema errors, admission webhooks |

Run lint on every edit. Run build + dry-run before every apply.
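The three steps can be chained so that a failure at any stage stops the sequence. The following is a minimal sketch, assuming `chant` and `kubectl` are on `PATH`; the function name `preflight` and its argument defaults are illustrative and not part of chant.

```shell
#!/usr/bin/env bash
# preflight: lint, build, then server-side dry-run; stops at the first failure.
# The function name and argument defaults are hypothetical.
preflight() {
  local src="${1:-src/}"
  local out="${2:-manifests.yaml}"

  chant lint "$src" || { echo "lint failed" >&2; return 1; }
  chant build "$src" --output "$out" || { echo "build failed" >&2; return 1; }
  kubectl apply -f "$out" --dry-run=server || { echo "dry-run failed" >&2; return 1; }
  echo "preflight passed: $out"
}
```

A typical call would be `preflight src/ manifests.yaml && kubectl apply -f manifests.yaml`, so the real apply only runs once all three checks succeed.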

```shell
# Build
chant build src/ --output manifests.yaml
# Diff before applying
kubectl diff -f manifests.yaml
# Dry run (validates with admission webhooks)
kubectl apply -f manifests.yaml --dry-run=server
# Apply
kubectl apply -f manifests.yaml
```

```shell
# Watch rollout progress
kubectl rollout status deployment/my-app --timeout=300s
# Check rollout history
kubectl rollout history deployment/my-app
# Undo last rollout
kubectl rollout undo deployment/my-app
# Roll back to a specific revision
kubectl rollout undo deployment/my-app --to-revision=2
```
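
The status and undo commands combine naturally into an automatic revert, because `kubectl rollout status` exits non-zero when the timeout expires. The following is a sketch only; `deploy_or_rollback` is a hypothetical helper name and the 300s default simply mirrors the command above.

```shell
# deploy_or_rollback: wait for a rollout and undo it if it does not complete.
# Hypothetical helper; the name and the 300s default are illustrative.
deploy_or_rollback() {
  local deployment="$1"
  local timeout="${2:-300s}"

  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    echo "rollout of $deployment succeeded"
  else
    echo "rollout of $deployment failed; reverting" >&2
    kubectl rollout undo "deployment/$deployment"
    return 1
  fi
}
```

Because the undo leaves the cluster on the previous revision while the source still describes the new one, the next `kubectl diff` will show the failed change again until the source is fixed.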

```shell
# Overview
kubectl get pods -l app.kubernetes.io/name=my-app
kubectl get events --sort-by=.lastTimestamp -n <namespace>
# Deep dive into a specific pod
kubectl describe pod <pod-name>
# Logs (current and previous crash)
kubectl logs <pod-name>
kubectl logs <pod-name> --previous
kubectl logs <pod-name> -c <container-name>  # specific container
kubectl logs deployment/my-app --all-containers
# Debug containers (K8s 1.25+)
kubectl debug <pod-name> -it --image=busybox --target=<container>
# Port-forwarding for local testing
kubectl port-forward svc/my-app 8080:80
kubectl port-forward pod/<pod-name> 8080:8080
```
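
When several pods are misbehaving at once, the describe and logs steps can be looped over a label selector. The following is a rough sketch; `collect_pod_diagnostics` is a hypothetical name, and it operates on whatever namespace the current kubectl context selects.

```shell
# collect_pod_diagnostics: dump describe output and logs for every pod
# matching a label selector. Hypothetical helper for illustration.
collect_pod_diagnostics() {
  local selector="$1"
  local pod

  # "-o name" prints one "pod/<name>" entry per line.
  for pod in $(kubectl get pods -l "$selector" -o name); do
    echo "=== $pod ==="
    kubectl describe "$pod"
    # --previous fails when no container has restarted; fall back to current logs.
    kubectl logs "$pod" --all-containers --previous 2>/dev/null \
      || kubectl logs "$pod" --all-containers
  done
}
```

For example, `collect_pod_diagnostics app.kubernetes.io/name=my-app > diagnostics.txt` captures everything in one file for later review.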

```shell
# Get all resources in namespace
kubectl get all -n <namespace>
# YAML output for debugging
kubectl get deployment/my-app -o yaml
# Check resource usage
kubectl top pods -l app.kubernetes.io/name=my-app
kubectl top nodes
```

| Status | Meaning | Diagnostic command | Typical fix |
| --- | --- | --- | --- |
| Pending | Not scheduled | `kubectl describe pod` → Events | Check resource requests, node selectors, taints, PVC binding |
| CrashLoopBackOff | App crashing on start | `kubectl logs --previous` | Fix app startup, check probe config, increase initialDelaySeconds |
| ImagePullBackOff | Image not found | `kubectl describe pod` → Events | Verify image name/tag, check imagePullSecrets, registry auth |
| OOMKilled | Out of memory | `kubectl describe pod` → Last State | Increase memory limit, profile app memory usage |
| Evicted | Node disk/memory pressure | `kubectl describe node` | Increase limits, add node capacity, check for log/tmp bloat |
| CreateContainerError | Container config issue | `kubectl describe pod` → Events | Check volume mounts, configmap/secret refs, security context |
| Init:CrashLoopBackOff | Init container failing | `kubectl logs -c <init-container>` | Fix init container command, check dependencies |
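
The table above is essentially a lookup from pod status to first diagnostic command, which can be written down as a small shell function. The following is illustrative only; `next_step` is a hypothetical name, and unknown statuses fall back to `kubectl describe pod` as an assumed default.

```shell
# next_step: print the diagnostic command the table suggests for a pod status.
# Hypothetical helper; statuses and commands mirror the table above.
next_step() {
  local status="$1" pod="$2"
  case "$status" in
    Pending|ImagePullBackOff|CreateContainerError|OOMKilled)
      echo "kubectl describe pod $pod" ;;
    CrashLoopBackOff)
      echo "kubectl logs $pod --previous" ;;
    Evicted)
      echo "kubectl describe node" ;;
    Init:*)
      echo "kubectl logs $pod -c <init-container>" ;;
    *)
      # Assumed default for statuses the table does not list.
      echo "kubectl describe pod $pod" ;;
  esac
}
```

The status argument is the value shown in the STATUS column of `kubectl get pods`, for example `next_step CrashLoopBackOff <pod-name>`.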
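  • RollingUpdate (default): Gradually replaces pods. Set maxSurge and maxUnavailable to control how many extra or missing pods are tolerated during the update.
  • Recreate: All pods are terminated before new ones are created. Use for stateful apps that cannot run multiple versions at once.
  • Canary: Run a second Deployment with 1 replica whose pods carry the labels the Service selects on, while each Deployment keeps its own distinct selector. Traffic then splits roughly by replica count; for an exact percentage, route via Ingress annotations or a service mesh.
  • Blue/Green: Run two full Deployments (blue and green) and switch the Service selector between them.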
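
For the Blue/Green strategy, the cut-over amounts to changing one label in the Service selector. The following is a sketch under stated assumptions: the label key `version` and the helper name `switch_color` are hypothetical, and patching the live Service is shown only to illustrate the mechanism. The durable change belongs in the source that produces the manifests.

```shell
# switch_color: point a Service at the blue or green Deployment by patching
# its selector. The label key "version" is an assumption, not a chant default.
switch_color() {
  local service="$1" color="$2"
  case "$color" in
    blue|green) ;;
    *) echo "color must be blue or green" >&2; return 2 ;;
  esac
  # Strategic merge patch: only the "version" key of the selector changes.
  kubectl patch service "$service" \
    -p "{\"spec\":{\"selector\":{\"version\":\"$color\"}}}"
}
```

Calling `switch_color my-app green` moves traffic to the green pods; calling it again with `blue` reverts the switch.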

```shell
# Always diff before applying
kubectl diff -f manifests.yaml
# Server-side dry run (validates with admission webhooks)
kubectl apply -f manifests.yaml --dry-run=server
# Client-side dry run (fast, but no webhook validation)
kubectl apply -f manifests.yaml --dry-run=client
```

Use server-side dry-run before production applies — it catches schema errors and runs admission webhooks. Client-side dry-run is faster but only validates locally.
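
In scripts, the diff step is easiest to use through its exit code: `kubectl diff` exits 0 when nothing would change, 1 when differences exist, and greater than 1 on error. The following is a minimal sketch, with `has_changes` as a hypothetical helper name.

```shell
# has_changes: succeed when the cluster differs from the manifests.
# kubectl diff exits 0 for no differences, 1 for differences, >1 on error.
has_changes() {
  local file="${1:-manifests.yaml}"
  local rc=0
  kubectl diff -f "$file" >/dev/null || rc=$?
  case "$rc" in
    0) return 1 ;;  # nothing to apply
    1) return 0 ;;  # differences found
    *) echo "kubectl diff failed (exit $rc)" >&2; return 2 ;;
  esac
}
```

A caller might write `if has_changes manifests.yaml; then kubectl apply -f manifests.yaml --dry-run=server; fi`. Note that a diff error also takes the false branch, so the message on stderr is the only signal that something went wrong rather than nothing having changed.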

| Symptom | Likely cause | Resolution |
| --- | --- | --- |
| Pod stuck in Pending | Insufficient CPU/memory on nodes | Scale up cluster or reduce resource requests |
| Pod stuck in Pending | PVC not bound | Check StorageClass exists, PV available |
| Pod stuck in Pending | Node selector/affinity mismatch | Verify node labels match selectors |
| Pod stuck in ContainerCreating | ConfigMap/Secret not found | Ensure referenced ConfigMaps/Secrets exist |
| Pod stuck in ContainerCreating | Volume mount failure | Check PVC status, CSI driver health |
| Service returns 503 | No ready endpoints | Check pod readiness probes, selector match |
| Service returns 503 | Wrong port configuration | Verify targetPort matches containerPort |
| Ingress returns 404 | Backend service not found | Check Ingress rules, service name/port |
| Ingress returns 404 | Wrong path matching | Check pathType (Prefix vs Exact) |
| HPA not scaling | Metrics server not installed | Install metrics-server |
| HPA not scaling | Resource requests not set | Add CPU/memory requests to containers |
| CronJob not running | Invalid cron expression | Validate cron syntax (5-field format) |
| NetworkPolicy blocking | Default deny applied | Add explicit allow rules for required traffic |
| RBAC permission denied | Missing Role/RoleBinding | Check ServiceAccount bindings and verb permissions |
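
Two of the rows above can be checked directly from the command line: whether a Service has any ready endpoints, and whether a ServiceAccount holds a given permission. The following is a sketch only; the helper names `ready_endpoints` and `sa_can` are hypothetical, and newer clusters may print a deprecation warning for the Endpoints API on stderr.

```shell
# ready_endpoints: count the ready pod IPs behind a Service.
# Zero usually means a selector mismatch or failing readiness probes.
ready_endpoints() {
  local service="$1" namespace="${2:-default}"
  kubectl get endpoints "$service" -n "$namespace" \
    -o jsonpath='{.subsets[*].addresses[*].ip}' | wc -w | tr -d ' '
}

# sa_can: ask the API server whether a ServiceAccount may perform a verb
# on a resource, by impersonating it with --as.
sa_can() {
  local namespace="$1" sa="$2" verb="$3" resource="$4"
  kubectl auth can-i "$verb" "$resource" -n "$namespace" \
    --as="system:serviceaccount:$namespace:$sa"
}
```

For example, `ready_endpoints my-app <namespace>` printing `0` points at the "No ready endpoints" row, and `sa_can <namespace> <service-account> list pods` answers yes or no for the RBAC row. Impersonation itself requires permission, so `sa_can` is best run by a cluster admin.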