This thing took me three days to sort out and wore me out. Multi-control-plane clusters will have to wait; I'll look into them when I have time.
In a Kubernetes cluster, a node's IP address is a key part of its identity. When that address changes, the following core problems appear:

- Certificate invalidation: kubeadm embeds node IP addresses in the certificates it generates, so every certificate-based connection (apiserver, etcd) starts failing.
- Broken kubelet configuration: the kubelet on a worker node connects to the apiserver through its kubeconfig; after the change the worker can no longer find the apiserver.
- Broken control-plane component configuration: the control-plane node carries configuration bound to the old IP, such as the apiserver's --advertise-address and etcd's --listen-peer-urls, --listen-client-urls, --initial-advertise-peer-urls and --advertise-client-urls.
- Inconsistent Node object state: the node's status (e.g. InternalIP) still reports the old IP, which breaks cluster networking.
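Before changing anything, it helps to see how many places the old address is actually baked into. A minimal sketch, using the addresses from this walkthrough and the default kubeadm paths:

```bash
# Every kubeconfig and static Pod manifest under /etc/kubernetes that still
# references the old control-plane IP
grep -rn "192.168.10.151" /etc/kubernetes/ 2>/dev/null
```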
Only the control-plane node's IP changes

Simulating the failure

```bash
nmcli connection modify ens18 ipv4.method manual \
  ipv4.addresses 192.168.10.152/24 ipv4.gateway 192.168.10.1 \
  ipv4.dns 8.8.8.8
nmcli connection up ens18

kubectl get nodes
Unable to connect to the server: dial tcp 192.168.10.151:6443: connect: no route to host
```
Re-edit the kubeadm configuration file: change 192.168.10.151 to 192.168.10.152, then use kubeadm to regenerate the certificates and configuration files phase by phase.
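If you would rather not edit the file by hand, a one-line sed does the same job. A sketch, assuming the config shown below is saved as kubeadm.yaml in the current directory:

```bash
cp kubeadm.yaml kubeadm.yaml.bak                 # keep a copy of the old config
sed -i 's/192\.168\.10\.151/192.168.10.152/g' kubeadm.yaml
grep -n '192.168.10.152' kubeadm.yaml            # advertiseAddress should now carry the new IP
```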
kubeadm init phase
kubeadm init phase runs the cluster initialization flow in stages. Its core purpose is to let you execute a single, independent step (phase) of the initialization process on demand instead of running the whole flow at once.
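To see which phases exist before running any of them, kubeadm can list them for you; a quick sketch:

```bash
# List the sub-commands of `kubeadm init phase` (certs, kubeconfig, control-plane, etcd, ...)
kubeadm init phase --help

# General pattern used throughout the rest of this post:
#   kubeadm init phase <phase> [sub-phase] --config kubeadm.yaml
```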
The edited kubeadm.yaml:

```yaml
# cat kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.10.152
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: test
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.30.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```
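Before feeding the edited file back into kubeadm, it can be validated statically. Recent kubeadm releases (v1.26 and later, which covers the v1.30.0 used here) ship a config validation subcommand; a sketch:

```bash
# Static validation of the edited config; no changes are made to the cluster
kubeadm config validate --config kubeadm.yaml
```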
Regenerate the apiserver certificate. Do not touch the CA certificates, and do not touch the etcd certificates either.
```bash
cp -r /etc/kubernetes /etc/kubernetes.bak
cd /etc/kubernetes/pki
mv apiserver.key apiserver.key.bak
mv apiserver.crt apiserver.crt.bak

kubeadm init phase certs all --config kubeadm.yaml
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local test] and IPs [10.96.0.1 192.168.10.151]
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key

ls -l /etc/kubernetes/pki/
-rw-r--r-- 1 root root 1277 Jun 18 01:54 apiserver.crt
...
```

As the output shows, the apiserver certificate has been regenerated, while the existing CA, etcd and other certificates are reused.
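To double-check which addresses ended up in the regenerated serving certificate, inspect its Subject Alternative Names. A sketch, assuming openssl is installed on the node:

```bash
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
  | grep -A1 'Subject Alternative Name'
```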
Update the control-plane static Pod manifests

```bash
cp -a /etc/kubernetes/manifests /etc/kubernetes/manifests.backup

kubeadm init phase control-plane all --config kubeadm.yaml
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"

cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep 152
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.10.152:6443
    - --advertise-address=192.168.10.152
        host: 192.168.10.152
        host: 192.168.10.152
        host: 192.168.10.152
```

The control-plane manifests now reference the new address.
Update the etcd configuration

```bash
kubeadm init phase etcd local --config kubeadm.yaml

cat /etc/kubernetes/manifests/etcd.yaml | grep -E 'advertise-client-urls|listen-client-urls'
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.10.152:2379
    - --advertise-client-urls=https://192.168.10.152:2379
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.10.152:2379

systemctl restart kubelet
```
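Optionally, confirm that etcd is serving on the new address. A sketch, assuming etcdctl is available on the host (or run the same command via kubectl exec inside the etcd Pod), using the client certificates kubeadm keeps under /etc/kubernetes/pki/etcd:

```bash
ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.10.152:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health
```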
Update the kubelet configuration

```bash
# kubelet.conf holds the certificate, key and certificate paths the kubelet uses to reach the apiserver
rm -f /etc/kubernetes/kubelet.conf
kubeadm init phase kubeconfig kubelet --config kubeadm.yaml
[kubeconfig] Writing "kubelet.conf" kubeconfig file

rm -f /var/lib/kubelet/config.yaml
kubeadm init phase kubelet-start --config kubeadm.yaml
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet

kubeadm init phase kubelet-finalize all --config kubeadm.yaml

systemctl daemon-reload
systemctl restart kubelet
```
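A quick way to confirm the kubelet is now pointed at the new apiserver endpoint (sketch):

```bash
grep 'server:' /etc/kubernetes/kubelet.conf      # should show https://192.168.10.152:6443
systemctl status kubelet --no-pager
journalctl -u kubelet -n 20 --no-pager           # recent log lines if the node stays NotReady
```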
Update the kubeconfig files

```bash
rm -f /etc/kubernetes/{admin.conf,kubelet.conf,controller-manager.conf,scheduler.conf}
# or simply: rm -f /etc/kubernetes/*.conf

kubeadm init phase kubeconfig all --config kubeadm.yaml
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file

cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
systemctl restart kubelet

kubectl get nodes
NAME    STATUS   ROLES           AGE   VERSION
slave   Ready    <none>          14h   v1.30.0
test    Ready    control-plane   14h   v1.30.0
```
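The refreshed admin kubeconfig should now target the new endpoint, which can be confirmed without opening the file (sketch):

```bash
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'; echo
kubectl cluster-info
```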
Update the ConfigMaps

```bash
kubectl get cm -n kube-system
NAME                                                   DATA   AGE
coredns                                                1      35m
extension-apiserver-authentication                     6      35m
kube-apiserver-legacy-service-account-token-tracking   1      35m
kube-proxy                                             2      35m
kube-root-ca.crt                                       1      35m
kubeadm-config                                         1      35m
kubelet-config                                         1      35m

kubeadm init phase upload-config kubeadm --config kubeadm.yaml

kubectl -n kube-system edit cm kube-proxy   # replace the old IP with the new one
```
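If you prefer not to use an interactive kubectl edit, the same change can be scripted (a sketch using the addresses from this post); restarting the kube-proxy DaemonSet afterwards makes the Pods pick up the new ConfigMap:

```bash
kubectl -n kube-system get cm kube-proxy -o yaml \
  | sed 's/192\.168\.10\.151/192.168.10.152/g' \
  | kubectl apply -f -
kubectl -n kube-system rollout restart daemonset kube-proxy
```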
Adjust the kubelet configuration on the worker node

```bash
vim /etc/kubernetes/kubelet.conf
# or
sed -i 's/192.168.10.151/192.168.10.152/g' /etc/kubernetes/kubelet.conf

systemctl restart kubelet
```
Reinstall the network plugin (optional); simply restarting it also works, as shown in the sketch after the commands below.
```bash
# I am using Calico
kubectl delete pods --all --namespace=calico-apiserver --force --grace-period=0
kubectl delete pods --all --namespace=calico-system --force --grace-period=0
kubectl delete pods --all --namespace=tigera-operator --force --grace-period=0

kubectl apply --server-side -f tigera-operator.yaml
kubectl apply -f custom-resources.yaml

systemctl reboot
```
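Instead of force-deleting Pods and re-applying the manifests, a gentler alternative is to restart the Calico workloads in place (a sketch based on the workload names visible in the verification output below):

```bash
kubectl -n calico-system rollout restart daemonset calico-node
kubectl -n calico-system rollout restart deployment calico-kube-controllers calico-typha
kubectl -n calico-apiserver rollout restart deployment calico-apiserver
kubectl -n tigera-operator rollout restart deployment tigera-operator
```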
Verification

```bash
kubectl get pods -A -owide
NAMESPACE          NAME                                       READY   STATUS    RESTARTS        AGE     IP               NODE    NOMINATED NODE   READINESS GATES
calico-apiserver   calico-apiserver-7dbb565dd-7fbx2           1/1     Running   1 (6m40s ago)   7m12s   10.244.25.8      slave   <none>           <none>
calico-apiserver   calico-apiserver-7dbb565dd-tq8rt           1/1     Running   0               7m12s   10.244.27.199    test    <none>           <none>
calico-system      calico-kube-controllers-64c85b8c9f-vf274   1/1     Running   0               7m12s   10.244.25.7      slave   <none>           <none>
calico-system      calico-node-4lj9h                          1/1     Running   0               7m12s   192.168.10.231   slave   <none>           <none>
calico-system      calico-node-lzxfq                          1/1     Running   0               7m12s   192.168.10.152   test    <none>           <none>
calico-system      calico-typha-84b5fbbc4f-skkpj              1/1     Running   0               7m12s   192.168.10.231   slave   <none>           <none>
calico-system      csi-node-driver-92mhf                      2/2     Running   0               7m12s   10.244.25.9      slave   <none>           <none>
calico-system      csi-node-driver-lptz7                      2/2     Running   0               7m12s   10.244.27.200    test    <none>           <none>
kube-system        coredns-6d58d46f65-2rw4q                   1/1     Running   1 (7m46s ago)   75m     10.244.27.197    test    <none>           <none>
kube-system        coredns-6d58d46f65-tqxvb                   1/1     Running   1 (7m46s ago)   75m     10.244.27.198    test    <none>           <none>
kube-system        etcd-test                                  1/1     Running   1 (7m46s ago)   60m     192.168.10.152   test    <none>           <none>
kube-system        kube-apiserver-test                        1/1     Running   1 (7m46s ago)   60m     192.168.10.152   test    <none>           <none>
kube-system        kube-controller-manager-test               1/1     Running   13 (7m46s ago)  75m     192.168.10.152   test    <none>           <none>
kube-system        kube-proxy-5pg9g                           1/1     Running   1 (7m49s ago)   73m     192.168.10.231   slave   <none>           <none>
kube-system        kube-proxy-dkvk4                           1/1     Running   1 (7m46s ago)   75m     192.168.10.152   test    <none>           <none>
kube-system        kube-scheduler-test                        1/1     Running   13 (7m46s ago)  75m     192.168.10.152   test    <none>           <none>
tigera-operator    tigera-operator-767c6b76db-kjd65           1/1     Running   0               7m12s   192.168.10.231   slave   <none>           <none>
```
Quick script

The steps above condensed into a script (here switching the control-plane node back from 192.168.10.152 to 192.168.10.151):

```bash
#!/bin/bash
export oldip1=192.168.10.152
export newip1=192.168.10.151

# kubeadm.yaml must already contain the new advertiseAddress
find /etc/kubernetes -type f | xargs sed -i "s/${oldip1}/${newip1}/"
find /root/.kube/config -type f | xargs sed -i "s/${oldip1}/${newip1}/"

cd /root/.kube/cache/discovery
mv ${oldip1}_6443 ${newip1}_6443

cd /etc/kubernetes/pki
mv -f apiserver.key apiserver.key.bak
mv -f apiserver.crt apiserver.crt.bak
cd ~
kubeadm init phase certs all --config kubeadm.yaml

systemctl restart kubelet

kubectl -n kube-system edit cm kube-proxy   # change the old IP to the new one

systemctl reboot
```
Both the control-plane and the worker node addresses change. Perform the following on top of the control-plane steps above.
```bash
nmcli connection modify ens18 ipv4.method manual \
  ipv4.addresses 192.168.10.232/24 ipv4.gateway 192.168.10.1 \
  ipv4.dns 8.8.8.8
nmcli connection up ens18

kubectl apply -f test-deploy.yaml
kubectl get pods
NAME                          READY   STATUS    RESTARTS      AGE
nginx-test-8458968cc8-4z78w   0/1     Running   2 (17s ago)   5m58s
nginx-test-8458968cc8-vjlps   0/1     Running   1 (37s ago)   5m58s
nginx-test-8458968cc8-xgcr4   0/1     Running   2 (17s ago)   5m58s
# The images are present but the Pods never become ready: a networking problem

kubectl get nodes -o name
node/slave
node/test

kubectl delete node slave
node "slave" deleted

kubectl set env daemonset/calico-node -n calico-system IP_AUTODETECTION_METHOD=can-reach=192.168.10.151

# on the worker node being rejoined
kubeadm reset -f
iptables -F && iptables -t nat -F
rm -rf /etc/cni/net.d
kubeadm join 192.168.10.151:6443 --token w9o87t.mmpxgomqej2h1ifz \
  --discovery-token-ca-cert-hash \
  sha256:52502b8be55539e174c2a3ebdafaecc4b94ee6e976cf0c72aaa13c26aff6023c

kubectl get pods -A -w -owide
NAMESPACE          NAME                                       READY   STATUS              RESTARTS       AGE     IP               NODE    NOMINATED NODE   READINESS GATES
calico-apiserver   calico-apiserver-7dbb565dd-chvhh           1/1     Running             0              7m6s    10.244.27.206    test    <none>           <none>
calico-apiserver   calico-apiserver-7dbb565dd-tq8rt           1/1     Running             1 (135m ago)   147m    10.244.27.201    test    <none>           <none>
calico-system      calico-kube-controllers-64c85b8c9f-tqttr   1/1     Running             0              7m6s    10.244.27.205    test    <none>           <none>
calico-system      calico-node-k68rj                          0/1     Running             0              7s      192.168.10.151   test    <none>           <none>
calico-system      calico-node-qjk68                          1/1     Running             0              6m20s   192.168.10.232   slave   <none>           <none>
calico-system      calico-typha-84b5fbbc4f-4xsl7              1/1     Running             0              7m6s    192.168.10.151   test    <none>           <none>
calico-system      csi-node-driver-cx5ds                      0/2     ContainerCreating   0              6m20s   <none>           slave   <none>           <none>
calico-system      csi-node-driver-lptz7                      2/2     Running             3 (135m ago)   147m    10.244.27.202    test    <none>           <none>

# After a short while Calico and nginx are both running normally and both nodes are Ready

kubectl get pods -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS        AGE
calico-apiserver   calico-apiserver-7dbb565dd-chvhh           1/1     Running   0               10m
calico-apiserver   calico-apiserver-7dbb565dd-tq8rt           1/1     Running   1 (138m ago)    151m
calico-system      calico-kube-controllers-64c85b8c9f-tqttr   1/1     Running   0               10m
calico-system      calico-node-k68rj                          1/1     Running   0               3m22s
calico-system      calico-node-qjk68                          1/1     Running   1 (2m9s ago)    9m35s
calico-system      calico-typha-84b5fbbc4f-4xsl7              1/1     Running   0               10m
calico-system      csi-node-driver-cx5ds                      2/2     Running   0               9m35s
calico-system      csi-node-driver-lptz7                      2/2     Running   3 (138m ago)    151m
default            nginx-test-8458968cc8-8qp65                1/1     Running   0               10m
default            nginx-test-8458968cc8-gmwsw                1/1     Running   0               10m
default            nginx-test-8458968cc8-m7j7f                1/1     Running   0               10m
kube-system        coredns-6d58d46f65-2rw4q                   1/1     Running   2 (138m ago)    3h39m
kube-system        coredns-6d58d46f65-tqxvb                   1/1     Running   2 (138m ago)    3h39m
kube-system        etcd-test                                  1/1     Running   1 (138m ago)    141m
kube-system        kube-apiserver-test                        1/1     Running   1 (138m ago)    141m
kube-system        kube-controller-manager-test               1/1     Running   15 (138m ago)   3h39m
kube-system        kube-proxy-6dgxp                           1/1     Running   1 (2m9s ago)    9m35s
kube-system        kube-proxy-dkvk4                           1/1     Running   2 (138m ago)    3h39m
kube-system        kube-scheduler-test                        1/1     Running   15 (138m ago)   3h39m
tigera-operator    tigera-operator-767c6b76db-q4j2q           1/1     Running   0               10m

kubectl get nodes
NAME    STATUS   ROLES           AGE     VERSION
slave   Ready    <none>          8m21s   v1.30.0
test    Ready    control-plane   3h38m   v1.30.0
```
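If the bootstrap token from the original installation has expired by the time you rejoin the worker, generate a fresh join command on the control-plane node first (sketch):

```bash
# Prints a full `kubeadm join <endpoint> --token ... --discovery-token-ca-cert-hash sha256:...` line
kubeadm token create --print-join-command
```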