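kubectl: the Swiss Army knife of K8s

One-sentence definition

kubectl is the command-line client for the K8s apiserver. Everything you do to a cluster (viewing, creating, modifying, deleting, and debugging resources) goes through it. At its core it is an HTTP client that translates commands into REST calls against the apiserver.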

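Typical scenarios

kubectl appears 288 times in the bootcamp docs, which makes it the most-used command by a wide margin. Common actions fall into four categories:

Category	Examples	Purpose
View	`kubectl get pods`, `kubectl describe pod xxx`	Inspect cluster state
Change	`kubectl apply -f file.yaml`, `kubectl edit deploy nginx`	Create/modify resources
Debug	`kubectl logs xxx`, `kubectl exec -it xxx -- bash`	Read logs and troubleshoot inside containers
Operate	`kubectl scale`, `kubectl rollout`, `kubectl drain`	Cluster operations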

kubectl contexts (the most important concept)

Every kubectl call runs against a context:

context = cluster + user + namespace
             ↓       ↓         ↓
           where    who    which ns

Contexts are stored in ~/.kube/config, a YAML file that contains:

clusters:                    # clusters you can reach (apiserver URL + CA)
  - name: k8s-bootcamp
    cluster:
      server: https://10.0.24.28:6443
      certificate-authority-data: ...

users:                       # your credentials (client cert / token / exec plugin)
  - name: admin@k8s-bootcamp
    user:
      client-certificate-data: ...
      client-key-data: ...

contexts:                    # a combination: which cluster + which user + default ns
  - name: bootcamp-admin
    context:
      cluster: k8s-bootcamp
      user: admin@k8s-bootcamp
      namespace: default

current-context: bootcamp-admin    # the context currently in effect

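Switching contexts (essential when one machine manages several clusters)

kubectl config get-contexts          # list all contexts
kubectl config current-context       # show the current one
kubectl config use-context dev       # switch to dev
kubectl config use-context prod      # switch to prod

Changing the default namespace

kubectl config set-context --current --namespace=kube-system
# Persisted in kubeconfig: every later command defaults to kube-system until you change it back

kubectl get pods                     # equivalent to kubectl -n kube-system get pods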
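
Because the default namespace is stored in kubeconfig, it is easy to forget which one is active. A minimal sketch of a helper that prints it; the function name `current_ns` is hypothetical (not part of kubectl), and the fallback to `default` assumes the context sets no namespace:

```shell
# Hypothetical helper: print the namespace of the current context.
current_ns() {
  # --minify limits the view to the current context only
  ns=$(kubectl config view --minify -o jsonpath='{..namespace}')
  # kubeconfig may omit the namespace, in which case kubectl uses "default"
  echo "${ns:-default}"
}
```

Calling `current_ns` before a destructive command is a cheap sanity check.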

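Recommended tools: kubectx + kubens

apt install -y kubectx               # or brew install kubectx

kubectx                              # list contexts
kubectx prod                         # switch to prod
kubectx -                            # switch back to the previous one

kubens kube-system                   # switch namespace
kubens -                             # switch back to the previous one

Switching contexts and namespaces is a high-frequency operation, and doing it without these two tools gets tedious fast.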

Viewing resources: `get` / `describe` / `explain`

`get`: list resources

kubectl get pods                              # pods in the current ns
kubectl get pods -A                            # all namespaces (--all-namespaces)
kubectl get pods -n kube-system                # a specific ns
kubectl get pods -o wide                       # adds node and IP columns
kubectl get pods --show-labels                 # with labels
kubectl get pods -l app=nginx                  # filter by label
kubectl get pods --field-selector status.phase=Running   # filter by field
kubectl get pods --watch                       # keep watching for changes
kubectl get pods -w --output-watch-events       # include the ADDED/MODIFIED/DELETED event type

Resource short names (worth memorizing)

Full name	Short name
`pods`	`po`
`services`	`svc`
`deployments`	`deploy`
`replicasets`	`rs`
`daemonsets`	`ds`
`statefulsets`	`sts`
`configmaps`	`cm`
`secrets`	`secret` (no short name)
`persistentvolumeclaims`	`pvc`
`persistentvolumes`	`pv`
`namespaces`	`ns`
`nodes`	`no`
`ingresses`	`ing`
`customresourcedefinitions`	`crd`
`events`	`ev`

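Memory aid: most short names drop the vowels or keep the first few letters. kubectl get po is only 2 characters shorter than kubectl get pods, but that adds up when you type it a hundred times a day.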

`describe`: details

kubectl describe pod nginx-abc                       # details of one pod
kubectl describe node m1                             # node details
kubectl describe pvc data-mysql-0                    # PVC details

The most valuable part of describe is the Events section at the bottom of the output:

Events:
  Type     Reason          Age   From               Message
  ----     ------          ----  ----               -------
  Normal   Scheduled       30s   default-scheduler  Successfully assigned ...
  Normal   Pulled          25s   kubelet            Container image "..." already present
  Normal   Created         25s   kubelet            Created container
  Warning  Unhealthy       10s   kubelet            Readiness probe failed: ...   ← key signal

When a pod won't start, always run describe first: most failure causes show up in Events.

`explain`: built-in documentation

kubectl explain pod                                  # structure of the pod resource
kubectl explain pod.spec                             # the spec field
kubectl explain pod.spec.containers                  # the containers field
kubectl explain pod.spec.containers.resources --recursive    # all sub-fields, recursively

When you are writing YAML and can't remember a field name, explain is faster than a web search.

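Changing resources: `apply` / `create` / `edit` / `patch`

`apply`: declarative (preferred)

kubectl apply -f deployment.yaml                # a single file
kubectl apply -f manifests/                      # a whole directory
kubectl apply -k overlays/prod                   # kustomize
kubectl apply -f https://....yaml                # a URL (common for installing cilium, ingress)

kubectl apply --dry-run=client -f file.yaml      # local preview (changes nothing)
kubectl apply --dry-run=server -f file.yaml      # simulated by the apiserver (includes validation webhooks)
kubectl apply -f file.yaml --record              # record the command in an annotation for rollout history (deprecated; set kubernetes.io/change-cause instead)

apply is idempotent: applying the same YAML repeatedly does not create duplicates. It compares the last-applied configuration, the live object, and the new manifest (a "three-way merge") and sends only the difference.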
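
To see what apply would change before sending it, `kubectl diff` can be run first. The sketch below wraps the two in a hypothetical helper named `apply_if_changed`; it relies on the exit codes of `kubectl diff` (0 for no differences, 1 for differences, greater than 1 for errors):

```shell
# Hypothetical helper: apply a manifest only when it differs from the live state.
apply_if_changed() {
  file=$1
  rc=0
  # kubectl diff exits 0 when nothing differs, 1 when there are differences,
  # and greater than 1 on errors
  kubectl diff -f "$file" >/dev/null 2>&1 || rc=$?
  case $rc in
    0) echo "no changes: $file" ;;
    1) kubectl apply -f "$file" ;;
    *) echo "kubectl diff failed for $file" >&2; return 2 ;;
  esac
}
```

Usage: `apply_if_changed deployment.yaml`. Skipping the apply is only an optimization here, since apply itself is already idempotent.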

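`create`: imperative

kubectl create namespace foo
kubectl create deployment nginx --image=nginx:1.21 --replicas=3
kubectl create secret generic mysecret --from-literal=password=xxx
kubectl create configmap app-cfg --from-file=config.json

create is not idempotent: if the resource already exists it fails with AlreadyExists. Prefer apply in scripts; create suits one-off operations.

`create --dry-run=client -o yaml`: generate a YAML template

kubectl create deployment nginx --image=nginx:1.21 --dry-run=client -o yaml

The output is standard YAML that you can save and edit into a manifest template. This is a must-have trick for the CKA exam, where writing YAML by hand is too slow.

`edit`: interactive editing

kubectl edit deploy nginx
# Opens the live deployment YAML in your editor ($KUBE_EDITOR or $EDITOR, vi by default); saving submits the change to the apiserver

Good for changing a field or two while debugging. Production changes should go through apply -f so that the change is tracked in git.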

`patch`: targeted field updates

# change replicas
kubectl patch deploy nginx -p '{"spec":{"replicas":5}}'

# add a label
kubectl patch deploy nginx -p '{"metadata":{"labels":{"env":"prod"}}}'

# strategic merge (default) vs JSON merge vs JSON patch; this one is a JSON patch
kubectl patch deploy nginx --type=json \
  -p='[{"op":"replace","path":"/spec/replicas","value":5}]'

Three patch types:

strategic: K8s-aware merge (lists are merged by key); the default
merge: RFC 7396 JSON Merge Patch; lists are replaced wholesale
json: RFC 6902 JSON Patch; the most precise

For scripted bulk field changes use patch; it suits automation better than edit.
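
As an illustration of patch in a script, the sketch below sets the same replica count on several deployments. The function name `patch_replicas` and the deployment names in the usage line are hypothetical:

```shell
# Hypothetical helper: set the same replica count on several deployments.
# Usage: patch_replicas <count> <deployment>...
patch_replicas() {
  count=$1
  shift
  for name in "$@"; do
    # Strategic merge patch (the default type), same form as the replicas example above
    kubectl patch deploy "$name" -p "{\"spec\":{\"replicas\":$count}}" || return 1
  done
}
```

Usage: `patch_replicas 5 web api`. The loop stops at the first failed patch so that a typo in one name does not go unnoticed.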

`delete`

kubectl delete pod nginx-abc                     # delete one pod
kubectl delete deploy nginx                       # delete the deployment (its ReplicaSets and pods go with it)
kubectl delete -f file.yaml                       # delete every resource described in the YAML
kubectl delete pod --all -n test                  # delete all pods in the test ns
kubectl delete pod nginx --force --grace-period=0 # force immediate deletion (for pods stuck in Terminating)

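The debugging trio: `logs` / `exec` / `port-forward`

`logs`: read container logs

kubectl logs nginx-abc                            # a single pod
kubectl logs nginx-abc -c sidecar                 # pick a container in a multi-container pod
kubectl logs nginx-abc -f                         # follow
kubectl logs nginx-abc --since=10m                # the last 10 minutes
kubectl logs nginx-abc --tail=100                 # the last 100 lines
kubectl logs nginx-abc --previous                 # logs of the previous, crashed container instance (a must for crash loops)
kubectl logs -l app=nginx --max-log-requests=10   # several pods at once, selected by label
kubectl logs -l app=nginx --prefix                # prefix each line with the pod name

--previous is the key tool for CrashLoopBackOff: the container dies right after starting and the current instance is already a fresh restart, so read the logs of the instance that just died.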
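
One wrinkle is that `--previous` returns an error when the container has never restarted. A minimal sketch of a wrapper that tries the previous instance first and falls back to the current one; the name `crash_logs` is hypothetical:

```shell
# Hypothetical helper: show the crashed instance's logs, or the current ones.
# Usage: crash_logs <pod> [extra kubectl logs flags, e.g. -n my-ns]
crash_logs() {
  pod=$1
  shift
  # --previous fails when the container has never restarted, so fall back
  kubectl logs "$pod" --previous "$@" 2>/dev/null || kubectl logs "$pod" "$@"
}
```

Usage: `crash_logs nginx-abc -n my-ns --tail=100`.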

`exec`: run commands inside a container

kubectl exec nginx-abc -- ls /                    # a one-off command
kubectl exec -it nginx-abc -- bash                # an interactive shell
kubectl exec -it nginx-abc -c sidecar -- sh       # a specific container
kubectl exec nginx-abc -- env                     # show environment variables

-it = -i (stdin) + -t (tty). Required for an interactive shell.

If the container has no shell (distroless images):

kubectl debug nginx-abc -it --image=busybox        # attach an ephemeral debug container (K8s 1.23+)

`port-forward`: bring a remote port to your machine

kubectl port-forward pod/nginx-abc 8080:80         # local:8080 → pod:80
kubectl port-forward svc/nginx 8080:80             # forward to a service
kubectl port-forward deploy/nginx 8080:80          # forward to one pod of a deployment

kubectl port-forward --address 0.0.0.0 svc/nginx 8080:80   # listen on all interfaces (default is localhost only)

The Swiss Army knife of local K8s debugging: internal services such as dashboard, grafana, or kafka can be used directly once they are port-forwarded.

Output formats: `-o`

kubectl get pod nginx -o yaml                     # YAML
kubectl get pod nginx -o json                     # JSON
kubectl get pod -o wide                            # table + IP/node
kubectl get pod -o name                            # only "pod/xxx"

# JSONPath (**important**)
kubectl get pods -o jsonpath='{.items[*].metadata.name}'
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'

# custom-columns (more readable)
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName

# go-template (more powerful)
kubectl get pods -o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}'

JSONPath cheat sheet

.items[*].metadata.name                                  # name of every item
.items[?(@.status.phase=="Running")].metadata.name        # filter
.status.containerStatuses[0].ready                        # whether the first container is ready

JSONPath is weaker than jq, but it is built into kubectl and needs no external tool. For complex queries use kubectl get ... -o json | jq.
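
The `range` expression above emits one tab-separated line per pod, which makes it easy to post-process with standard tools. A small sketch, assuming a hypothetical helper name `not_running` and using awk rather than jq:

```shell
# Hypothetical helper: print the names of pods whose phase is not Running.
# Extra arguments (for example -n my-ns or -A) are passed to kubectl get.
not_running() {
  kubectl get pods "$@" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    awk -F'\t' '$2 != "Running" { print $1 }'
}
```

Usage: `not_running -n kube-system`. Note that pods in the Succeeded phase (finished jobs) are also listed, since the filter only excludes Running.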

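Operating resources: `scale` / `rollout` / `drain`

`scale`: change the replica count

kubectl scale deploy nginx --replicas=10
kubectl scale --replicas=0 deploy/nginx          # scale to 0 (pause the service)

`rollout`: manage rolling updates

kubectl rollout status deploy/nginx               # progress
kubectl rollout history deploy/nginx              # revision history
kubectl rollout history deploy/nginx --revision=3 # details of one revision
kubectl rollout undo deploy/nginx                 # roll back to the previous revision
kubectl rollout undo deploy/nginx --to-revision=2 # roll back to a specific revision
kubectl rollout restart deploy/nginx              # restart all pods (rolls even without a change)
kubectl rollout pause deploy/nginx                # pause the rollout
kubectl rollout resume deploy/nginx

rollout restart is a daily debugging workhorse: changing a ConfigMap or Secret does not restart pods automatically, so trigger a restart by hand.

`drain` / `cordon`: node maintenance

kubectl cordon m4                # mark m4 unschedulable (no new pods land on it)
kubectl drain m4 --ignore-daemonsets --delete-emptydir-data
                                  # evict the existing pods gracefully (drain also cordons the node)
                                  # required before a node reboot / upgrade
kubectl uncordon m4              # make the node schedulable again after maintenance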
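
The maintenance sequence can be wrapped so that a drain that hangs does not block a script forever. A sketch under assumptions: the helper name `drain_node` and the 300s default are hypothetical, while `--timeout` is a standard `kubectl drain` flag:

```shell
# Hypothetical helper: drain a node for maintenance with a bounded wait.
# Usage: drain_node <node> [timeout, default 300s]
drain_node() {
  node=$1
  timeout=${2:-300s}
  kubectl cordon "$node" || return 1
  if ! kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data \
      --timeout="$timeout"; then
    # Leave the node cordoned so nothing new lands on it while you investigate
    echo "drain of $node did not finish; node is still cordoned" >&2
    return 1
  fi
  echo "$node drained; run 'kubectl uncordon $node' after maintenance"
}
```

Usage: `drain_node m4 10m`. The uncordon step is left manual on purpose, since only the operator knows when maintenance is finished.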

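Cluster events: `events`

kubectl get events -A --sort-by='.lastTimestamp'        # sorted by time
kubectl get events -A --field-selector type=Warning     # warnings only
kubectl get events --watch                                # live
kubectl events                                            # new command in K8s 1.27+

This is the cluster-level record of what happened and when: scheduling failures, ImagePullBackOff, and evictions all show up here. When the cluster misbehaves, check events first.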

The standard troubleshooting routine: what to check when a pod won't start

# 1. Check the pod status
kubectl get pod my-pod -n my-ns
# STATUS: Pending / ImagePullBackOff / CrashLoopBackOff / Error

# 2. Read the Events section of describe
kubectl describe pod my-pod -n my-ns

# 3. Pick a direction based on the status:
# Pending → scheduling problem: node resources / taints and tolerations / PVC not bound
kubectl get nodes
kubectl describe nodes | grep -A 5 Taints
kubectl get pvc

# ImagePullBackOff → image problem
kubectl describe pod ... | grep -i "image\|pull"

# CrashLoopBackOff → the container dies right after starting
kubectl logs my-pod -n my-ns                    # logs of the current container
kubectl logs my-pod -n my-ns --previous          # logs of the previous, crashed instance (key!)

# Running but not ready → probe failing
kubectl describe pod ... | grep -i probe

# 4. If none of that helps, look at the node and the container runtime
kubectl get nodes
ssh m4 'crictl ps -a | grep my-pod'              # see the actual containers on the node
ssh m4 'journalctl -u kubelet --since "5 min ago" | grep my-pod'
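
Step 3 of the routine is essentially a lookup from STATUS to the next command, which can be written down as a small function. This is only a sketch: the name `next_step` is hypothetical, and it prints a suggestion instead of running anything:

```shell
# Hypothetical helper: map a pod STATUS to the next command from the routine above.
# Usage: next_step <status> <pod> <namespace>
next_step() {
  state=$1
  pod=$2
  ns=$3
  case "$state" in
    Pending)          echo "kubectl describe nodes | grep -A 5 Taints" ;;
    ImagePullBackOff) echo "kubectl describe pod $pod -n $ns | grep -iE 'image|pull'" ;;
    CrashLoopBackOff) echo "kubectl logs $pod -n $ns --previous" ;;
    Running)          echo "kubectl describe pod $pod -n $ns | grep -i probe" ;;
    *)                echo "kubectl describe pod $pod -n $ns" ;;
  esac
}
```

Usage: `next_step CrashLoopBackOff my-pod my-ns` prints the `--previous` logs command. Any status not listed falls back to describe, which matches the advice to read Events first.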

`top`: resource usage

kubectl top nodes                # node CPU/memory (requires metrics-server)
kubectl top pods                 # pod CPU/memory
kubectl top pods -n kube-system --sort-by=cpu
kubectl top pods --containers    # per container

Requires metrics-server in the cluster (it is not installed by default; on kubeadm clusters you install it yourself).

`auth can-i`: check your own permissions

kubectl auth can-i create pods                   # can I create pods? yes/no
kubectl auth can-i delete nodes                  # can I delete nodes?
kubectl auth can-i '*' '*'                       # am I cluster-admin?
kubectl auth can-i list pods --as=system:serviceaccount:default:default
                                                  # check on behalf of a ServiceAccount

Essential for RBAC troubleshooting: after writing a Role / RoleBinding, verify it with can-i.
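
In scripts, can-i is most useful as a precondition check. A minimal sketch, assuming a hypothetical helper name `require_permission`; it relies on the `--quiet` flag, which suppresses the yes/no output and leaves only the exit code:

```shell
# Hypothetical helper: abort early when the current user lacks a permission.
# Usage: require_permission <verb> <resource> [extra flags, e.g. -n my-ns]
require_permission() {
  # --quiet suppresses the yes/no output; the exit code carries the answer
  if kubectl auth can-i "$@" --quiet; then
    return 0
  fi
  echo "missing permission: $*" >&2
  return 1
}
```

Usage: `require_permission create pods -n my-ns || exit 1` at the top of a deployment script.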

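Productivity tips

1. Shell completion

# bash
source <(kubectl completion bash)
echo 'source <(kubectl completion bash)' >> ~/.bashrc
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc

# zsh is similar: source <(kubectl completion zsh)

After that, k get pod <Tab> completes pod names.

2. `--watch` to follow changes

kubectl get pods -w                              # print changes as they happen

When rolling out a new version, run apply in one window and kubectl get pods -w in another to watch the pods roll.

3. `--all-namespaces` / `-A`

When you don't know which ns a pod is in: kubectl get pods -A | grep my-pod.

4. Use `--server-side` to avoid the last-applied-configuration annotation

kubectl apply --server-side -f file.yaml

Recommended on K8s 1.22+, where server-side apply is GA. Client-side apply stores a full copy of the manifest in the kubectl.kubernetes.io/last-applied-configuration annotation, and large manifests (big CRDs, for example) can exceed the 262144-byte annotation limit.

5. `wait` to block until a condition holds

kubectl wait --for=condition=ready pod/my-pod --timeout=60s
kubectl wait --for=delete pod/my-pod --timeout=60s
kubectl wait --for=condition=available deploy/my-app --timeout=5m

In CI scripts, wait for readiness after deploying before moving on.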
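
A CI step along those lines might look like the sketch below. The function name `deploy_and_wait`, the manifest name, and the 5m default are hypothetical:

```shell
# Hypothetical CI helper: apply a manifest, then block until the deployment is available.
# Usage: deploy_and_wait <manifest> <deployment> [timeout, default 5m]
deploy_and_wait() {
  manifest=$1
  deploy=$2
  timeout=${3:-5m}
  kubectl apply -f "$manifest" || return 1
  if ! kubectl wait --for=condition=available "deploy/$deploy" --timeout="$timeout"; then
    echo "deploy/$deploy not available after $timeout" >&2
    return 1
  fi
}
```

Usage: `deploy_and_wait app.yaml my-app 10m`. One caveat: a deployment that was already available can satisfy the condition before the new pods are up, so `kubectl rollout status deploy/my-app` may be a better fit when the step must follow the rollout itself.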

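Common pitfalls

Pitfall 1: connected to the wrong cluster

kubectl delete deploy nginx                       # did that just delete production's nginx??

Defense: show the current context in your shell prompt (with the kube-ps1 tool):

[user@m1 ~] (⎈ |bootcamp-admin:default) $

Or run kubectl config current-context before typing any delete command.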
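
The manual check can also be automated with a thin wrapper around delete. This is a sketch, not a complete safeguard: the name `safe_delete` is hypothetical, and the `*prod*` pattern assumes production contexts have "prod" in their name:

```shell
# Hypothetical wrapper: refuse to delete when the current context looks like production.
# The "*prod*" pattern is an assumption; adapt it to your own context names.
safe_delete() {
  ctx=$(kubectl config current-context) || return 1
  case "$ctx" in
    *prod*)
      echo "refusing to delete: current context is '$ctx'" >&2
      return 1
      ;;
  esac
  kubectl delete "$@"
}
```

Usage: `safe_delete deploy nginx`. Deleting in production then requires calling kubectl delete directly, which is a deliberate extra step.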

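Pitfall 2: forgetting to specify the ns

kubectl get pods                                  # only shows the context's default ns
# Can't find the pod, ask around: oh, it's in kube-system

Make -n <ns> or -A a habit. kubens switches the default ns for you.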

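Pitfall 3: edit saved but nothing changed

kubectl edit deploy nginx
# Saved the change, but the pods did not restart

Possible causes:

You changed a field that does not trigger a rollout (only changes under spec.template do; deployment-level labels / annotations do not)
You changed an immutable field (a service's clusterIP, for example); the apiserver rejects the change with an error rather than applying it

Fix for the first case: kubectl rollout restart deploy nginx.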

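Pitfall 4: apply bloats metadata

metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      { ... the whole yaml ... }                        ← a full copy of the last applied manifest

Client-side apply writes the full text of the last applied YAML into this annotation and uses it for the three-way diff. It holds only the latest copy, not a history, but it grows with the manifest and can hit the 262144-byte annotation limit. Use --server-side to avoid it.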

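Pitfall 5: getting the "previous" logs for CrashLoopBackOff

kubectl logs my-pod                              # the current container; it may be restarting, so the log is short
kubectl logs my-pod --previous                   # logs of the previous, crashed container (**key**)

Without --previous you only see the first few lines from the freshly restarted container, which do not explain the crash.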

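Pitfall 6: a pod stuck in Terminating

kubectl delete pod my-pod
# Terminating ... and it stays there

kubectl delete pod my-pod --force --grace-period=0
# force immediate deletion (removed from the apiserver; the container may linger on the node)

The usual root causes are a stuck finalizer, an unreachable node, or a slow pre-stop hook. After deleting, confirm the object is gone from the API:

kubectl get pod my-pod                           # should return NotFound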

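Pitfall 7: apply -f does not delete resources that are no longer in the YAML

# v1.yaml contains deploy A and deploy B
kubectl apply -f v1.yaml

# v2.yaml contains only deploy A (B was removed)
kubectl apply -f v2.yaml
# deploy B is still there! apply does not delete something just because it left the YAML

The right way:

kubectl apply --prune -f v2.yaml -l app=myapp
# --prune (still an alpha feature) deletes resources that carry the label, were created by apply, and are no longer in the YAML

Or use a GitOps tool (Argo CD / Flux), which handles this natively.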

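Pitfall 8: `-o yaml` output is full of managedFields noise

kubectl get pod my-pod -o yaml | head -50
# half the screen is managedFields you don't care about

Newer kubectl (1.21+) hides managedFields by default; on older clients, strip them yourself:

kubectl get pod my-pod -o yaml --show-managed-fields=false   # 1.21+; false is already the default
# on older clients, filter with yq:
kubectl get pod my-pod -o yaml | yq 'del(.metadata.managedFields)'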

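Pitfall 9: the implicit default ns makes you look in the wrong place

kubectl get pod my-pod                           # not found
# because my-pod is in kube-system and your default ns is default

Search the whole cluster first with kubectl get pods -A | grep my-pod.

Pitfall 10: piping imperative output into `apply -f`

kubectl create deploy nginx --image=nginx --dry-run=client -o yaml | kubectl apply -f -

Note the apply -f - (the trailing - means stdin). This pattern is common for "generate first, then apply".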

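Related commands

kubeadm: installs the cluster; after that you manage it with kubectl
crictl: for the container layer that kubectl cannot see
etcdctl: reads etcd directly when kubectl cannot reach the data
helm: application packaging; conceptually it renders YAML and applies it
jq: kubectl get -o json | jq is a golden combination
kubectx / kubens: context / ns switchers
k9s: an interactive K8s dashboard in the terminal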