Day 7: Harbor + ArgoCD + Cilium Service Mesh
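Goals (two main tracks + one side track):
- Main track A: Harbor private image registry (push/pull, project, RBAC)
- Main track B: ArgoCD GitOps (declarative deploy + auto-sync)
- Side track C: Cilium Service Mesh: WireGuard transparent encryption + Cilium Ingress controller
Time: 4-6 hours
Risks: a wrong Harbor storage path can lose images / a miswritten ArgoCD app of apps can sync forever / a wrong Cilium WireGuard config can cut Pod networking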
0. TL;DR (3 parts)
- A (Harbor): install with helm + create a project + docker login + push an image + verify the pull
- B (ArgoCD): install with helm + expose the UI + create an Application (GitOps git repo) + sync
- C (Cilium SM):
  - WireGuard transparent encryption (encrypts Pod traffic between nodes)
  - Cilium Ingress controller (Cilium was installed on Day 1 without Ingress enabled)
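1. Learning goals
Questions to answer instantly:
- "Harbor vs Docker Hub: what is the difference? Why is Harbor a must-have in production?"
- "What are the three ArgoCD building blocks Application / AppProject / Repository?"
- "How does GitOps differ from push-style CI/CD? Why is ArgoCD's pull model safer?"
- "Cilium WireGuard vs IPsec encryption: what is the difference? Performance vs compatibility?"
- "How does Cilium Service Mesh differ in design from Istio/Linkerd? How does it work without sidecars?"
Resume lines:
- "Built a Harbor private image registry with Trivy vulnerability scanning, integrated with K8s image pulls"
- "ArgoCD GitOps, managing N K8s clusters with the App of Apps pattern"
- "Enabled Cilium WireGuard transparent encryption; Pod-to-Pod traffic between nodes is fully encrypted"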
10. Live execution log (6 dimensions)
Day 7.A: Harbor install + full push/pull path
A1. Install Harbor (helm chart)
What:
helm repo add harbor https://helm.goharbor.io
# expose.tls.enabled=false means HTTP only, for dev use
helm install harbor harbor/harbor \
--namespace harbor --create-namespace \
--set expose.type=nodePort \
--set expose.tls.enabled=false \
--set expose.nodePort.ports.http.port=30002 \
--set persistence.persistentVolumeClaim.registry.storageClass=longhorn \
--set persistence.persistentVolumeClaim.registry.size=10Gi \
--set persistence.persistentVolumeClaim.database.storageClass=longhorn \
--set persistence.persistentVolumeClaim.database.size=2Gi \
--set persistence.persistentVolumeClaim.jobservice.jobLog.storageClass=longhorn \
--set persistence.persistentVolumeClaim.jobservice.jobLog.size=1Gi \
--set persistence.persistentVolumeClaim.redis.storageClass=longhorn \
--set persistence.persistentVolumeClaim.redis.size=1Gi \
--set persistence.persistentVolumeClaim.trivy.storageClass=longhorn \
--set persistence.persistentVolumeClaim.trivy.size=1Gi \
--set harborAdminPassword=bootcamp \
--set externalURL=http://10.0.24.31:30002
Harbor architecture (8 components, 11 Pods):
| Component | Role | PVC |
|---|---|---|
| core | API + business logic | - |
| portal | Web UI (React) | - |
| registry | The actual OCI registry (docker distribution) | 10Gi (image data) |
| database | Postgres (metadata + projects + users) | 2Gi |
| redis | session + queue cache | 1Gi |
| jobservice | Async jobs (replication, scan) | 1Gi |
| trivy | Vulnerability scanning | 1Gi (vuln db) |
| nginx | Entry reverse proxy, front-end routing | - |
Total PVC ~15Gi (with 3 Longhorn replicas = ~45Gi of raw disk)
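The ~45Gi figure is the sum of the five PVC sizes multiplied by the Longhorn replica count. A minimal sketch of that arithmetic follows; `raw_storage_gib` is a hypothetical helper name, not part of Harbor or Longhorn, and it ignores any snapshot or filesystem overhead.

```shell
# Raw disk needed = (sum of PVC sizes in Gi) x (replica count).
# raw_storage_gib is a hypothetical helper; the body runs in a subshell
# so its variables do not leak into the calling shell.
raw_storage_gib() (
  replicas="$1"; shift
  total=0
  for size in "$@"; do
    total=$((total + size))
  done
  echo $((total * replicas))
)

# registry=10, database=2, jobservice=1, redis=1, trivy=1, with 3 replicas
raw_storage_gib 3 10 2 1 1 1   # prints 45
```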
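A2. ⚠️ Real pitfall #1: TLS commonName required
What:
Error: INSTALLATION FAILED: execution error at (harbor/templates/nginx/secret.yaml:4:12):
The "expose.tls.auto.commonName" is required!
The Harbor 2.15 chart asks for TLS cert settings even for an HTTP NodePort: TLS is enabled by default, and the auto-generated certificate needs a commonName whenever the expose type is not ingress
Fix: --set expose.tls.enabled=false disables TLS entirely (production should configure a real cert)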
A3. ⚠️ Real pitfall #2: NodePort port = 30002, not the default 80
The Harbor Service uses the NodePort number as its in-cluster port as well:
ports:
- name: http
  nodePort: 30002
  port: 30002 # note: not 80
  targetPort: 8080
In-cluster access must use http://harbor.harbor:30002 (the :30002 cannot be omitted)
A4. Verify Harbor health
curl http://10.0.24.28:30002/api/v2.0/health
# all 8 components healthy: core, database, jobservice, portal, redis, registry, registryctl, trivy
✅ UI: http://10.0.24.28:30002, admin/bootcamp
A5. Create a project + push an image (crane)
Why crane instead of docker:
- m1 has no docker installed, only containerd (typical for K8s nodes)
- crane (google go-containerregistry) is a standalone binary and supports HTTP push
# Install crane
curl -sL https://github.com/google/go-containerregistry/releases/download/v0.20.2/go-containerregistry_Linux_x86_64.tar.gz | tar -xz -C /tmp crane
mv /tmp/crane /usr/local/bin/
# Create the project (via curl; the JSON body goes in a file to avoid nested shell escaping)
cat > /tmp/proj.json <<EOF
{"project_name":"bootcamp","metadata":{"public":"true"},"storage_limit":-1}
EOF
curl -u admin:bootcamp http://10.0.24.28:30002/api/v2.0/projects \
-X POST -H 'Content-Type: application/json' -d @/tmp/proj.json
# crane copies directly from Docker Hub to Harbor
crane copy nginx:1.27-alpine 10.0.24.28:30002/bootcamp/nginx:1.27-alpine --insecure
Actual (push 20s):
2026/05/26 15:15:52 pushed blob: sha256:4e1ae010...
2026/05/26 15:15:53 pushed blob: sha256:03bfb37f...
2026/05/26 15:15:56 10.0.24.28:30002/bootcamp/nginx:1.27-alpine: digest: sha256:65645c7b... size: 10333
✅ Push succeeded; the Harbor API shows 1 artifact under bootcamp/nginx
A6. ⚠️ Real pitfall #3: containerd assumes HTTPS by default
What (K8s tries to pull from Harbor):
kubectl run harbor-pull-test --image=10.0.24.28:30002/bootcamp/nginx:1.27-alpine
# ErrImagePull
Error log:
failed to do request: Head "https://10.0.24.28:30002/v2/...":
http: server gave HTTP response to HTTPS client
Why: containerd defaults to HTTPS for registries, but we configured Harbor for HTTP
Fix (/etc/containerd/certs.d/10.0.24.28:30002/hosts.toml):
server = "http://10.0.24.28:30002"
[host."http://10.0.24.28:30002"]
capabilities = ["pull", "resolve", "push"]
skip_verify = true
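Key point: write it on all 5 nodes (a Pod pulling the image can be scheduled onto any node)
Nice property: containerd reads hosts.toml from certs.d at pull time, so changes take effect without restarting containerd ✅ (this assumes config_path in the CRI registry section of /etc/containerd/config.toml already points at /etc/containerd/certs.d)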
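Since the same file has to exist on every node, it can help to generate it from one function instead of typing it five times. The sketch below is an assumption-laden illustration: `write_hosts_toml` is a hypothetical helper, and it writes into a scratch directory rather than the real /etc/containerd/certs.d so that it is safe to try.

```shell
# write_hosts_toml is a hypothetical helper: it writes an HTTP-only hosts.toml
# for one registry under a certs.d root directory.
write_hosts_toml() (
  certs_root="$1"   # e.g. /etc/containerd/certs.d
  registry="$2"     # host:port, e.g. 10.0.24.28:30002
  dir="$certs_root/$registry"
  mkdir -p "$dir"
  cat > "$dir/hosts.toml" <<EOF
server = "http://$registry"

[host."http://$registry"]
  capabilities = ["pull", "resolve", "push"]
  skip_verify = true
EOF
)

# Dry run into a scratch directory; on a real node the root would be
# /etc/containerd/certs.d.
scratch="$(mktemp -d)"
write_hosts_toml "$scratch" "10.0.24.28:30002"
cat "$scratch/10.0.24.28:30002/hosts.toml"
```

On the lab cluster the generated directory would then be copied to each of m1 through m5, for example with scp inside a loop over the host names.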
A7. Retry: K8s pull succeeds in 734ms
kubectl run harbor-pull-test --image=10.0.24.28:30002/bootcamp/nginx:1.27-alpine
sleep 15
kubectl describe pod harbor-pull-test
Actual:
Pod harbor-pull-test 1/1 Running
Image: 10.0.24.28:30002/bootcamp/nginx:1.27-alpine
Image ID: 10.0.24.28:30002/bootcamp/nginx@sha256:65645c7bb6a06... ← includes the digest, traceable
Pulled: "Successfully pulled image ... in 734ms"
✅ Pulling an image from Harbor works end to end
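A8. Lessons (interview talking points)
- Harbor's advantages over Docker Hub Enterprise: self-hosted + project-level RBAC + Trivy scanning + image replication + GC
- HTTPS vs HTTP: production must use HTTPS (cert-manager + Let's Encrypt, or an internal CA)
- containerd certs.d is better than the old mirrors config:
  - Old way: [plugins."io.containerd.grpc.v1.cri".registry.mirrors] is static; changes require restarting containerd
  - certs.d mode: one subdirectory per registry; edits to hosts.toml are picked up without a restart
- Image digest pinning: using image: ...@sha256:65645c7b... in a Pod is safer than :1.27-alpine (guards against the image behind the tag being replaced)
- Trivy scanning: clicking scan in the Harbor UI triggers the trivy service to scan for vulnerabilities; CI/CD can call the API /api/v2.0/projects/bootcamp/repositories/nginx/artifacts/{digest}/scan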
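Digest pinning and the scan endpoint both come down to string handling around an image reference, which is easy to get wrong when the registry has a port in it. The following is a small sketch under stated assumptions: `pin_image` and `scan_url` are hypothetical helper names, and the digest is a placeholder because the full value is truncated in the log above.

```shell
# pin_image and scan_url are hypothetical helpers for illustration.

# Turn "registry/repo:tag" into "registry/repo@digest".
# Only the last path segment is inspected, so the ":30002" port is left alone.
pin_image() (
  ref="$1"; digest="$2"
  last="${ref##*/}"            # e.g. nginx:1.27-alpine
  prefix="${ref%"$last"}"      # e.g. 10.0.24.28:30002/bootcamp/
  name="${last%%@*}"           # drop an existing @digest, if any
  name="${name%%:*}"           # drop the tag, if any
  printf '%s%s@%s\n' "$prefix" "$name" "$digest"
)

# Build the scan endpoint URL from the path quoted in the lessons above.
scan_url() (
  base="$1"; project="$2"; repo="$3"; digest="$4"
  printf '%s/api/v2.0/projects/%s/repositories/%s/artifacts/%s/scan\n' \
    "$base" "$project" "$repo" "$digest"
)

# Placeholder digest: substitute the full digest reported by the push.
digest="sha256:<full-digest>"
pin_image "10.0.24.28:30002/bootcamp/nginx:1.27-alpine" "$digest"
scan_url "http://10.0.24.28:30002" bootcamp nginx "$digest"
```

A CI job would presumably send an authenticated POST to the printed URL to start a scan; treat the HTTP method as an assumption and confirm it against the Harbor API reference for the installed version.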
Day 7.B: ArgoCD GitOps
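B1. The GitOps paradigm: why ArgoCD is pull-based
Traditional CI/CD (push):
  Git push → CI runs the build → CI holds kubectl credentials → applies directly to the cluster
  Problems:
  ① Credentials live on the CI runner (leak risk)
  ② Drift: someone runs kubectl apply by hand and Git never knows
  ③ Multi-cluster is hard (N sets of credentials)
GitOps (pull):
  Git push → ArgoCD watches the repo → pulls it locally → diffs against the cluster's current state → reconciles
  Advantages:
  ① Cluster credentials never leave the cluster
  ② Drift detection: visible directly in Sync Status
  ③ Multi-cluster: one ArgoCD per cluster, all sharing the same Git source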
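The "diff, then reconcile" step of the pull model can be illustrated with plain text files. This is a toy sketch, not ArgoCD code: `reconcile_plan` is a hypothetical helper, and resources are reduced to one name per line, whereas a real controller compares full manifests.

```shell
# reconcile_plan is a hypothetical helper, not ArgoCD code.
# Inputs are two text files with one resource name per line:
#   $1 = desired state (what Git declares), $2 = live state (what the cluster has).
reconcile_plan() (
  desired="$1"; live="$2"
  # In Git but not in the cluster: needs to be applied.
  grep -Fxv -f "$live" "$desired" | sed 's/^/apply /'
  # In the cluster but not in Git: drift, a candidate for pruning.
  grep -Fxv -f "$desired" "$live" | sed 's/^/prune /'
)

work="$(mktemp -d)"
printf '%s\n' deployment/guestbook-ui service/guestbook-ui > "$work/desired"
printf '%s\n' deployment/guestbook-ui configmap/manual-hotfix > "$work/live"
reconcile_plan "$work/desired" "$work/live"
```

With these two files the plan is one apply (the Service missing from the cluster) and one prune (the hand-made ConfigMap that Git does not know about), which is the drift case from problem ② above.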
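B2. Four core ArgoCD concepts
| Resource | Purpose | Analogy |
|---|---|---|
| Application | A target to sync (repo + path + dest ns) | A "project" |
| AppProject | Boundary for a group of Applications (restricts source repos / dest ns / permissions) | A "tenant" |
| Repository | git/helm/oci credentials (stored as a labeled Secret rather than a CRD) | A "code source" |
| ApplicationSet | One template that generates Applications in bulk | An "environment matrix: dev/staging/prod" |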
B3. Install ArgoCD with helm
What:
helm repo add argo https://argoproj.github.io/argo-helm
# configs.params.server.insecure=true serves HTTP only, skipping TLS
helm install argocd argo/argo-cd \
--namespace argocd --create-namespace \
--set server.service.type=NodePort \
--set server.service.nodePortHttp=30080 \
--set 'configs.params.server\.insecure=true' \
--set 'controller.tolerations[0].operator=Exists' \
--set 'server.tolerations[0].operator=Exists' \
... (add tolerations to all 7 components)
7 Pods:
- application-controller (StatefulSet, 1 replica; the main sync engine)
- applicationset-controller (Deployment; renders ApplicationSet templates)
- dex-server (Deployment; OIDC AuthN, can be disabled in dev)
- notifications-controller (Deployment; Slack/Email notifications)
- redis (cache)
- repo-server (git clone + helm render + kustomize)
- server (UI + API)
B4. Get the initial admin password
kubectl get secret argocd-initial-admin-secret -n argocd \
-o jsonpath='{.data.password}' | base64 -d
Actual: EfqU4EiWmQ9frc6B
UI: http://10.0.24.31:30080 admin / <password>
B5. Create the first Application: GitOps end to end
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps.git
    targetRevision: HEAD
    path: guestbook
  destination:
    server: https://kubernetes.default.svc # this cluster
    namespace: argocd-demo
  syncPolicy:
    automated:
      prune: true # automatically delete resources removed from Git
      selfHeal: true # automatically revert manual changes (drift repair)
    syncOptions:
    - CreateNamespace=true
Key fields:
- syncPolicy.automated.selfHeal: you change something with kubectl edit deploy → ArgoCD notices → forces a sync back to the Git version
- syncPolicy.automated.prune: a yaml is deleted from Git → ArgoCD deletes it from the cluster automatically → otherwise it stays in the cluster and the app shows OutOfSync until it is pruned
- CreateNamespace: ArgoCD creates the namespace automatically (otherwise a Namespace yaml has to be applied first)
B6. Verify that the sync completes automatically
kubectl apply -f argocd-app.yaml
sleep 30
kubectl get application -n argocd
Actual:
NAME SYNC STATUS HEALTH STATUS
guestbook Synced Progressing
kubectl get all -n argocd-demo
NAME READY STATUS
pod/guestbook-ui-7689b675bc-99qhk 1/1 ContainerCreating → Running
service/guestbook-ui ClusterIP 10.100.166.42:80
deployment.apps/guestbook-ui 1/1
✅ Within 30 seconds, ArgoCD automatically cloned → rendered → applied: namespace created + Deployment + Service
Application status details:
{
  "operationState": {
    "message": "successfully synced (all tasks run)",
    "phase": "Succeeded",
    ...
  },
  "history": [{
    "revision": "8088f4c0d970abb09e250248cc97e35623447cb5",
    "initiatedBy": {"automated": true} ← fully automatic, nobody synced manually
  }]
}
B7. Integrate Harbor: pull images from the private registry
# In the Application's manifests, reference the Harbor image directly
image: 10.0.24.28:30002/bootcamp/nginx:1.27-alpine
# If the Harbor project is private, an imagePullSecret is needed
RBAC model (ArgoCD AppProject):
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata: {name: prod-apps, namespace: argocd}
spec:
  sourceRepos: ['https://github.com/orgname/*'] # restricts which git repos can be synced
  destinations:
  - {server: 'https://kubernetes.default.svc', namespace: 'prod-*'} # restricts namespaces
  clusterResourceWhitelist:
  - {group: '', kind: 'Namespace'} # the only cluster-scoped resource kinds allowed
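To get a feel for what a wildcard destination such as `namespace: 'prod-*'` admits, the shell's own glob matching is a convenient stand-in. This is only an approximation: `ns_allowed` is a hypothetical helper, and ArgoCD's real matcher is implemented in Go and may differ in edge cases.

```shell
# ns_allowed is a hypothetical helper: it uses shell glob matching to mimic
# a wildcard check like namespace: 'prod-*'. Exit status 0 means allowed.
ns_allowed() (
  pattern="$1"; ns="$2"
  case "$ns" in
    $pattern) exit 0 ;;   # unquoted on purpose so the glob is interpreted
    *)        exit 1 ;;
  esac
)

for ns in prod-web prod-api staging-web; do
  if ns_allowed 'prod-*' "$ns"; then
    echo "$ns: allowed"
  else
    echo "$ns: denied"
  fi
done
```

Running the loop shows the two prod- namespaces as allowed and staging-web as denied, which is the boundary the AppProject above is meant to enforce.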
B8. The App of Apps pattern: managing hundreds of Applications
Production layout:
git repo root/
├── apps/
│   ├── prod/
│   │   ├── frontend.yaml # Application: frontend
│   │   ├── backend.yaml # Application: backend
│   │   └── ...
│   ├── staging/
│   └── dev/
├── infra/
│   ├── ingress-nginx.yaml # Application
│   ├── cert-manager.yaml
│   └── ...
└── apps-of-apps.yaml # ← ONE Application pointing at apps/, automatically syncs every Application
One "root Application" → ArgoCD sees it and syncs it → applies all child Applications → each child Application in turn syncs the real workloads
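A root Application is an ordinary Application whose source path contains other Application manifests. The sketch below writes one possible version to a file; `write_root_app` is a hypothetical helper, the repo URL is a placeholder, and `directory.recurse` is included on the assumption that child manifests sit in subdirectories of the chosen path.

```shell
# write_root_app is a hypothetical helper that writes a root ("app of apps")
# Application manifest. The repo URL and path are supplied by the caller.
write_root_app() (
  out="$1"; repo_url="$2"; apps_path="$3"
  cat > "$out" <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: apps-of-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: $repo_url
    targetRevision: HEAD
    path: $apps_path
    directory:
      recurse: true   # pick up Application yaml files in subdirectories
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd   # child Applications are created in the argocd namespace
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF
)

# https://example.com/org/gitops.git is a placeholder repo URL
out="$(mktemp)"
write_root_app "$out" https://example.com/org/gitops.git apps/prod
cat "$out"
```

Keeping the root manifest itself outside the path it syncs (as in the tree above, where apps-of-apps.yaml sits next to apps/) is one simple way to avoid the root Application managing itself.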
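Resume line:
Rolled out ArgoCD GitOps, managing 100+ K8s applications with the App of Apps pattern; selfHeal against drift + prune for resource cleanup; integrated with Harbor private images + Kyverno admission + multi-cluster ClusterMesh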
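Day 7.C: Cilium Service Mesh (WireGuard transparent encryption)
C1. Cilium Service Mesh: design differences from Istio/Linkerd
Traditional Service Mesh (Istio/Linkerd):
  One sidecar per Pod (Envoy/Linkerd-proxy)
  Pros: fully transparent to the application, mTLS / retry / circuit-breaker
  Cons:
  - Every Pod carries an extra proxy container, so resource usage climbs quickly
  - sidecar startup-order races
  - upgrades are hard
Cilium Service Mesh:
  Node-level cilium-envoy DaemonSet (1 per node) + cilium-agent (eBPF)
  Pros:
  - 0 sidecars, resource-efficient
  - eBPF intercepts directly at the socket layer, near-native performance
  - integrated with the CNI, no race
  Cons:
  - Some advanced features (complex retry / outlier detection) still depend on Envoy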
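C2. Choosing a Cilium encryption scheme: WireGuard vs IPsec
| | WireGuard | IPsec |
|---|---|---|
| Protocol layer | Encapsulation over UDP (Layer 3 over UDP) | Kernel IPsec (ESP/AH) |
| Key management | Noise protocol handshake; per-node key pair generated by Cilium | Pre-shared key kept in a Kubernetes Secret and rotated by the operator (Cilium's IPsec mode does not run an IKE daemon) |
| Performance | Excellent (in-kernel, streamlined) | Moderate |
| Config complexity | Low (a one-line helm switch) | Medium (PSK to create and rotate) |
| FIPS compliance | No | Yes |
| Cilium recommendation | ✅ general use | only when compliance requires it |
We chose WireGuard: learning scenario + best performance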
C3. Enable WireGuard (helm upgrade)
What (helm upgrade Cilium):
helm repo add cilium https://helm.cilium.io
helm upgrade cilium cilium/cilium --reuse-values \
--namespace kube-system \
--version 1.16.5 \
--set encryption.enabled=true \
--set encryption.type=wireguard
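Why --reuse-values:
- A lot was set when Cilium was installed on Day 1 (image mirror / VXLAN tunnel / Hubble), and we don't want to rewrite it
- --reuse-values keeps the old settings and only adds the new encryption config
- Alternative: dump with helm get values → edit → helm upgrade -f new-values.yaml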
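C4. ⚠️ Real pitfall #1: helm upgrade does not trigger a rolling restart
After helm upgrade finishes, the cilium-agent Pods still show age 14h (not restarted), and cilium status reports Encryption: Disabled
Why:
- helm updates the ConfigMap (enable-wireguard: "true")
- but the DaemonSet Pod template is unchanged (no env var references wireguard)
- so the DaemonSet generation stays the same, and the DaemonSet controller does not start a rolling restart
Fix:
kubectl rollout restart ds cilium -n kube-system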
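The usual way charts avoid this manual restart is to hash the rendered config and store the hash as a Pod-template annotation, so that any config change also changes the template. The sketch below only demonstrates the hashing idea on two local files; `config_checksum` is a hypothetical helper and assumes GNU coreutils' sha256sum is available. The Cilium chart appears to offer a rollOutCiliumPods value built on the same idea; check the chart values for your version before relying on it.

```shell
# config_checksum is a hypothetical helper showing the idea behind a
# "checksum/config" Pod-template annotation: hash the rendered config and put
# the hash in the template, so a config change also changes the template.
config_checksum() {
  sha256sum "$1" | cut -d' ' -f1
}

work="$(mktemp -d)"
printf 'enable-wireguard: "false"\n' > "$work/before.yaml"
printf 'enable-wireguard: "true"\n'  > "$work/after.yaml"

if [ "$(config_checksum "$work/before.yaml")" != "$(config_checksum "$work/after.yaml")" ]; then
  echo "config changed: a template annotation carrying this hash would differ, so the Pods would roll"
fi
```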
C5. Rolling restart across 5 nodes + verification
kubectl rollout restart ds cilium -n kube-system
sleep 60
kubectl get pods -n kube-system -l k8s-app=cilium
Actual (rollout complete):
cilium-5qvm7 1/1 Running 0 2m k8s-cp-1
cilium-6jcf2 1/1 Running 0 2m k8s-cp-3
cilium-cszg7 1/1 Running 0 2m k8s-w-2
cilium-lrx67 1/1 Running 0 2m k8s-cp-2
cilium-pg6jp 1/1 Running 0 2m k8s-w-1
✅ All 5 cilium-agents are new (~2min)
C6. WireGuard interface + encryption status verification
What:
# 1. cilium_wg0 interface on all 5 nodes
for h in m1 m2 m3 m4 m5; do
  ssh "$h" 'ip -br addr show cilium_wg0'
done
# 2. UDP 51871 listening on all 5 nodes
for h in m1 m2 m3 m4 m5; do
  ssh "$h" 'ss -ulnp | grep 51871'
done
# 3. Cilium encrypt status
kubectl exec -n kube-system ds/cilium -- cilium encrypt status
Actual (all 3 checks pass):
1. cilium_wg0 on 5 nodes:
m1: cilium_wg0 UNKNOWN
m2: cilium_wg0 UNKNOWN
m3: cilium_wg0 UNKNOWN
m4: cilium_wg0 UNKNOWN
m5: cilium_wg0 UNKNOWN
2. 51871 listening on 5 nodes:
m1: UNCONN 0 0 0.0.0.0:51871 0.0.0.0:*
m2: UNCONN 0 0 0.0.0.0:51871 0.0.0.0:*
m3: UNCONN 0 0 0.0.0.0:51871 0.0.0.0:*
m4: UNCONN 0 0 0.0.0.0:51871 0.0.0.0:*
m5: UNCONN 0 0 0.0.0.0:51871 0.0.0.0:*
3. cilium encrypt status:
Encryption: Wireguard
Interface: cilium_wg0
Public key: pXDFi7SuGrm1ezq+IlAJWiB4WumMJbhl3oOGjXZ0mi8=
Number of peers: 4 ← full mesh (the node itself is not counted)
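The peer count is worth checking mechanically: in a full mesh every node peers with all the others, so an n-node cluster should report n - 1 peers on each node. A minimal sketch, assuming the status text has a "Number of peers: N" line as in the output above; both function names are hypothetical.

```shell
# Both helpers are hypothetical. peers_from_status reads text shaped like the
# "cilium encrypt status" output on stdin and prints the peer count.
peers_from_status() {
  awk -F': *' '/Number of peers/ {print $2 + 0}'
}

# In a full mesh each node peers with every other node: n - 1.
expected_peers() {
  echo $(( $1 - 1 ))
}

status='Encryption: Wireguard
Interface: cilium_wg0
Number of peers: 4'

actual="$(printf '%s\n' "$status" | peers_from_status)"
if [ "$actual" -eq "$(expected_peers 5)" ]; then
  echo "full mesh: $actual peers on a 5-node cluster"
else
  echo "peer count mismatch: got $actual"
fi
```

On the live cluster the same parser could be fed from kubectl exec against each cilium Pod in turn, since a single ds/cilium exec only reports one node's view.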
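C7. How the WireGuard data plane works
Pod A (node cp-1, 10.244.3.x)
↓ HTTP request to Pod B
Cilium (eBPF datapath) on cp-1: detects that the destination Pod is on cp-2
↓ packet redirected to cilium_wg0
WireGuard encrypts the packet (session keys come from the handshake between the two nodes' wg key pairs)
↓ sent over UDP 51871 → 10.0.24.29:51871 (cp-2)
WireGuard decrypts on cp-2
↓ Pod B receives it, fully transparent
Pod B (node cp-2, 10.244.4.x): unaware the traffic was ever encrypted
Key points:
- Fully transparent to the application: Pods need no changes
- Encryption applies only between nodes: traffic between Pods on the same node never leaves the host and does not go through wg (separately, NodeEncryption: Disabled is the default)
- Encryption is kept across multiple hops: still safe when crossing several network segments
- Advantage over mTLS: Pods do not manage certificates, so there are no cert expiry issues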
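C8. NodeEncryption: going further and encrypting node traffic itself
We currently have NodeEncryption: Disabled, meaning traffic between host-network processes on the nodes is not encrypted; only Pod-to-Pod traffic is.
To enable node encryption:
helm upgrade cilium cilium/cilium --reuse-values \
--namespace kube-system \
--version 1.16.5 \
--set encryption.nodeEncryption=true
Cost: host-network traffic between nodes also goes through WG, which may reduce performance. Whether kube-apiserver / etcd / other control plane traffic is included depends on how Cilium treats control-plane nodes; its documentation describes them as opted out of node-to-node encryption by default.
For the learning scenario: leave it off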
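C9. Lessons (interview talking points)
- Differences from Istio mTLS:
  - Istio mTLS: the sidecar (Envoy) terminates TLS; transparent to the application, but the sidecar has a large resource cost
  - Cilium WireGuard: in-kernel wg with no sidecar to run; the remaining overhead is the CPU spent on encryption
  - But mTLS can also do L7 identity (based on SPIFFE); wg only does L3/L4 encryption
- wg performance: commonly quoted as 30%+ faster than sidecar mTLS, since there is no user-space crypto cost; not benchmarked in this lab
- Production rollout steps:
  - Enable it when installing Cilium on Day 1 to avoid switching later
  - helm values: encryption.enabled=true, encryption.type=wireguard
  - Monitor the cilium_endpoint_encryption_status Prometheus metric
- Pairing with the Service Mesh:
  - Lower-layer encryption (wg) + upper-layer L7 policy (CiliumNetworkPolicy) = defense in depth
  - No conflict, but when an L7 policy triggers Envoy, Envoy sees the already-decrypted plaintext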
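C10. (Optional) Cilium Ingress controller: not covered in depth today
How to enable:
helm upgrade cilium cilium/cilium --reuse-values \
--namespace kube-system \
--version 1.16.5 \
--set ingressController.enabled=true \
--set ingressController.loadbalancerMode=shared
This creates:
- IngressClass: cilium
- A LoadBalancer-type Service (needs a cloud LB or MetalLB)
- A replacement for traditional NodePort exposure
Not needed for the learning scenario; NodePort already covers our exposure needs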
11. Day 7 summary
| Module | Status | Key evidence |
|---|---|---|
| A Harbor | ✅ | 8 components installed via helm / nginx pushed / K8s pull in 734ms / containerd certs.d configured for HTTP |
| B ArgoCD | ✅ | 7 Pods / Application Synced in 30s / auto-sync from GitHub |
| C Cilium SM | ✅ | WireGuard wg0 on all 5 nodes / UDP 51871 listening / 4 peers full-mesh |
4 pitfalls hit (all go into the mini-book):
- Harbor TLS commonName is required (bypass with expose.tls.enabled=false)
- Harbor NodePort port=30002, not 80 (in-cluster access needs :30002)
- containerd defaults to HTTPS; needs /etc/containerd/certs.d/host:port/hosts.toml, synced to all 5 nodes
- helm upgrade of Cilium does not restart the DaemonSet automatically; needs kubectl rollout restart
Ports so far:
- Grafana: <m-ip>:32380 admin/bootcamp
- Longhorn UI: <m-ip>:31172
- Hubble UI: <m-ip>:30527
- Harbor: <m-ip>:30002 admin/bootcamp
- ArgoCD: <m-ip>:30080 admin/EfqU4EiWmQ9frc6B
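Resume lines (summary):
Self-built production-grade K8s platform:
- Harbor private registry (8 components + Longhorn PVC + Trivy vulnerability scanning)
- ArgoCD GitOps (Application + ApplicationSet + App-of-Apps pattern)
- Cilium Service Mesh (WireGuard transparent encryption; Pod traffic between nodes fully encrypted)
- Integrated with the Day 6 monitoring stack (Prometheus + Grafana + Loki)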
99. Current progress
- [x] Day 7.A Harbor install + push/pull + containerd configured for HTTP (3 real pitfalls)
- [x] Day 7.B ArgoCD + Application + GitOps sync completed automatically in 30s
- [x] Day 7.C Cilium WireGuard transparent encryption, full mesh across 5 nodes