Early versions of minikube (Kubernetes) used Heapster to monitor the cluster. Starting with version 1.8, resource monitoring gradually moved to metrics-server. Installing metrics-server is simple, but network problems often cause an ImagePullBackOff error.
Symptoms
To enable metrics-server in minikube, you only need to run:

    minikube addons enable metrics-server

minikube responds with:

    * The 'metrics-server' addon is enabled

Then run the list command:

    minikube addons list

The output also shows that metrics-server is enabled:

    |-----------------------------|----------|--------------|
    | ADDON NAME                  | PROFILE  | STATUS       |
    |-----------------------------|----------|--------------|
    | dashboard                   | minikube | enabled ✅   |
    | default-storageclass        | minikube | enabled ✅   |
    | efk                         | minikube | disabled     |
    | freshpod                    | minikube | disabled     |
    | gvisor                      | minikube | disabled     |
    | helm-tiller                 | minikube | disabled     |
    | ingress                     | minikube | disabled     |
    | ingress-dns                 | minikube | disabled     |
    | istio                       | minikube | disabled     |
    | istio-provisioner           | minikube | disabled     |
    | logviewer                   | minikube | disabled     |
    | metrics-server              | minikube | enabled ✅   |
    | nvidia-driver-installer     | minikube | disabled     |
    | nvidia-gpu-device-plugin    | minikube | disabled     |
    | registry                    | minikube | disabled     |
    | registry-creds              | minikube | disabled     |
    | storage-provisioner         | minikube | enabled ✅   |
    | storage-provisioner-gluster | minikube | disabled     |
    |-----------------------------|----------|--------------|

But if we now run the top command:

    kubectl top node

we get the following error instead:

    Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

So metrics-server is not actually working. Why?
Let's look for the cause with the get pod command:

    kubectl get pod -n kube-system

This returns:

    NAME                                   READY   STATUS             RESTARTS   AGE
    pod/coredns-6955765f44-5wsjv           1/1     Running            0          21m
    pod/coredns-6955765f44-lzkq2           1/1     Running            0          21m
    pod/etcd-minikube                      1/1     Running            0          21m
    pod/kube-apiserver-minikube            1/1     Running            0          21m
    pod/kube-controller-manager-minikube   1/1     Running            0          21m
    pod/kube-proxy-dfd7m                   1/1     Running            0          21m
    pod/kube-scheduler-minikube            1/1     Running            0          21m
    pod/metrics-server-6754dbc9df-lhp9p    0/1     ImagePullBackOff   0          19m
    pod/storage-provisioner                1/1     Running            2          3d5h

The pod for metrics-server failed to start and is currently in the ImagePullBackOff state.
Next, run the describe command to inspect its events:

    kubectl describe pod metrics-server-6754dbc9df-lhp9p -n kube-system

This returns:

    Type     Reason     Age                From               Message
    ----     ------     ----               ----               -------
    Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned kube-system/metrics-server-6754dbc9df-f5pl8 to minikube
    Normal   BackOff    38s                kubelet, minikube  Back-off pulling image "k8s.gcr.io/metrics-server-amd64:v0.2.1"
    Warning  Failed     38s                kubelet, minikube  Error: ImagePullBackOff
    Normal   Pulling    24s (x2 over 54s)  kubelet, minikube  Pulling image "k8s.gcr.io/metrics-server-amd64:v0.2.1"
    Warning  Failed     9s (x2 over 38s)   kubelet, minikube  Failed to pull image "k8s.gcr.io/metrics-server-amd64:v0.2.1": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Warning  Failed     9s (x2 over 38s)   kubelet, minikube  Error: ErrImagePull

Now the cause is clear: k8s.gcr.io is unreachable, so the image metrics-server-amd64:v0.2.1 cannot be pulled and the required pod cannot start.
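The manual scan of the pod listing can be sketched as a small filter. This is only an illustration: `unhealthy_pods` is a hypothetical helper name, and it assumes the default `kubectl get pod` table layout in which STATUS is the third column. The demonstration feeds it two rows copied from the listing above so it runs without a cluster.

```shell
# Hypothetical helper: print NAME and STATUS for every pod whose STATUS
# column is neither "Running" nor "Completed".
# Reads `kubectl get pod` table output on stdin and skips the header row.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Against a live cluster (requires kubectl):
#   kubectl get pod -n kube-system | unhealthy_pods

# Offline demonstration with rows taken from the listing above.
printf '%s\n' \
  'NAME READY STATUS RESTARTS AGE' \
  'pod/kube-proxy-dfd7m 1/1 Running 0 21m' \
  'pod/metrics-server-6754dbc9df-lhp9p 0/1 ImagePullBackOff 0 19m' \
  | unhealthy_pods
```

Only the metrics-server row survives the filter, which is the pod to pass to the describe command.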
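Pulling the image manually
To fix this, we first need to get the image. Since k8s.gcr.io is unreachable, we download it from a mirror inside China. Run the following steps in order:
- Log in to the minikube VM (minikube ssh)
- Pull the image from the Aliyun registry

    docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.2.1

- Tag the image with the name the deployment expects

    docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.2.1 k8s.gcr.io/metrics-server-amd64:v0.2.1
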
The minikube VM now has the correct metrics-server image.
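The pull-and-tag steps follow a fixed pattern, so they can be sketched as a pair of helpers. `mirror_ref` and `pull_via_mirror_cmds` are hypothetical names, and the sketch assumes the mirror publishes an image under the same name and tag as k8s.gcr.io, which holds for the image above but is not guaranteed for others. The commands are printed rather than executed so they can be reviewed first.

```shell
# Mirror prefix taken from the docker pull command above.
MIRROR="registry.cn-hangzhou.aliyuncs.com/google_containers"

# Hypothetical helper: map a k8s.gcr.io image reference to its mirror path
# by stripping the leading "k8s.gcr.io/" and prepending the mirror prefix.
mirror_ref() {
  printf '%s/%s\n' "$MIRROR" "${1#k8s.gcr.io/}"
}

# Hypothetical helper: print (not run) the pull and tag commands for one image.
pull_via_mirror_cmds() {
  src=$(mirror_ref "$1")
  printf 'docker pull %s\n' "$src"
  printf 'docker tag %s %s\n' "$src" "$1"
}

pull_via_mirror_cmds k8s.gcr.io/metrics-server-amd64:v0.2.1
```

Inside the minikube VM, the printed lines can be pasted into the shell or piped to `sh` once they look right.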
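Editing the metrics-server deployment
Even with the image available locally, you will find that minikube still tries to pull it from k8s.gcr.io. At this point the metrics-server deployment has to be modified. Run:

    kubectl -n kube-system edit deployment metrics-server
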
This opens the metrics-server deployment in your system's default editor, as follows:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "1"
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","k8s-app":"metrics-server","kubernetes.io/minikube-addons":"metrics-server"},"name":"metrics-server","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"metrics-server"}},"template":{"metadata":{"labels":{"k8s-app":"metrics-server"},"name":"metrics-server"},"spec":{"containers":[{"command":["/metrics-server","--source=kubernetes.summary_api:https://kubernetes.default?kubeletHttps=true\u0026kubeletPort=10250\u0026insecure=true"],"image":"k8s.gcr.io/metrics-server-amd64:v0.2.1","imagePullPolicy":"Always","name":"metrics-server"}]}}}}
      creationTimestamp: "2020-03-24T04:05:19Z"
      generation: 1
      labels:
        addonmanager.kubernetes.io/mode: Reconcile
        k8s-app: metrics-server
        kubernetes.io/minikube-addons: metrics-server
      name: metrics-server
      namespace: kube-system
      resourceVersion: "196126"
      selfLink: /apis/apps/v1/namespaces/kube-system/deployments/metrics-server
      uid: db2ef8a7-df5a-4787-9910-08eb87b85bb6
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          k8s-app: metrics-server
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            k8s-app: metrics-server
          name: metrics-server
        spec:
          containers:
          - command:
            - /metrics-server
            - --source=kubernetes.summary_api:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true
            image: k8s.gcr.io/metrics-server-amd64:v0.2.1
            imagePullPolicy: Always
            name: metrics-server
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
    status:
      conditions:
      - lastTransitionTime: "2020-03-24T04:05:19Z"
        lastUpdateTime: "2020-03-24T04:05:19Z"
        message: Deployment does not have minimum availability.
        reason: MinimumReplicasUnavailable
        status: "False"
        type: Available
      - lastTransitionTime: "2020-03-24T04:15:20Z"
        lastUpdateTime: "2020-03-24T04:15:20Z"
        message: ReplicaSet "metrics-server-6754dbc9df" has timed out progressing.
        reason: ProgressDeadlineExceeded
        status: "False"
        type: Progressing
      observedGeneration: 1
      replicas: 1
      unavailableReplicas: 1
      updatedReplicas: 1

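In the container spec (around line 47 of the file as opened in the editor), the imagePullPolicy field is currently set to Always, which means the image is pulled from k8s.gcr.io whether or not a local copy exists. Change the policy to IfNotPresent so that the local image is used first. After you save the file, the system updates the pod automatically; there is no need to enable the addon again.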
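The same change can also be made without an interactive editor. The following is a minimal sketch, assuming metrics-server is the first (index 0) container in the pod template, as in the listing above. `set_pull_policy` is a hypothetical wrapper name; `kubectl patch --type=json` and `kubectl rollout status` are standard kubectl subcommands.

```shell
# JSON Patch that replaces imagePullPolicy on the first container
# of the deployment's pod template.
PATCH='[{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"IfNotPresent"}]'

# Hypothetical wrapper: apply the patch, then wait for the new pod to roll out.
set_pull_policy() {
  kubectl -n kube-system patch deployment metrics-server --type=json -p "$PATCH" &&
    kubectl -n kube-system rollout status deployment/metrics-server
}

# Usage (requires kubectl pointed at the minikube cluster):
#   set_pull_policy
```

A patch is easier to repeat than a manual edit, which helps if the setting ever has to be applied again.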
After completing the steps above, the top command works:

    kubectl top pod

In my environment it shows:

    NAME                           CPU(cores)   MEMORY(bytes)
    redis-ha-1584698965-server-0   2m           4Mi

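Of course, the point of enabling metrics-server is not just to run this simple top command, but to obtain more monitoring information through the metrics API and to configure applications to scale automatically based on conditions.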
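As a rough sketch of those two uses: the first helper queries the metrics API directly, and the second creates a HorizontalPodAutoscaler. Both function names are hypothetical, `my-app` is a placeholder deployment name, the thresholds are arbitrary example values, and the API path assumes the metrics are served as metrics.k8s.io/v1beta1, which may differ on other versions.

```shell
# Raw query against the resource metrics API (the API group named in the
# earlier "get nodes.metrics.k8s.io" error). Assumes version v1beta1.
node_metrics() {
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
}

# Create a HorizontalPodAutoscaler for the deployment named in $1.
# The CPU target and replica bounds are example values, not recommendations.
autoscale_on_cpu() {
  kubectl autoscale deployment "$1" --cpu-percent=50 --min=1 --max=5
}

# Usage (requires kubectl and a working metrics-server):
#   node_metrics
#   autoscale_on_cpu my-app
```

CPU-based autoscaling compares usage against the CPU requests declared on the target pods, so the deployment needs those requests set for the autoscaler to act.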