Fixing the ImagePullBackOff Error When Installing metrics-server

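Early versions of minikube (Kubernetes) used heapster to monitor the cluster. Starting with version 1.8, resource monitoring gradually moved to metrics-server. Installing metrics-server is simple, but network problems frequently cause an ImagePullBackOff error.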

Symptoms

In minikube, enabling metrics-server takes a single command:

minikube addons enable metrics-server

The system responds:

* The 'metrics-server' addon is enabled

Then run the list command:

minikube addons list

The listing also shows that metrics-server has been enabled:

|-----------------------------|----------|--------------|
| ADDON NAME | PROFILE | STATUS |
|-----------------------------|----------|--------------|
| dashboard | minikube | enabled ✅ |
| default-storageclass | minikube | enabled ✅ |
| efk | minikube | disabled |
| freshpod | minikube | disabled |
| gvisor | minikube | disabled |
| helm-tiller | minikube | disabled |
| ingress | minikube | disabled |
| ingress-dns | minikube | disabled |
| istio | minikube | disabled |
| istio-provisioner | minikube | disabled |
| logviewer | minikube | disabled |
| metrics-server | minikube | enabled ✅ |
| nvidia-driver-installer | minikube | disabled |
| nvidia-gpu-device-plugin | minikube | disabled |
| registry | minikube | disabled |
| registry-creds | minikube | disabled |
| storage-provisioner | minikube | enabled ✅ |
| storage-provisioner-gluster | minikube | disabled |
|-----------------------------|----------|--------------|

However, if we now run the top command:

kubectl top pod

we get the following error instead:

Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

This means metrics-server was not actually installed correctly. Why?

Let's look for the cause with the get pod command:

kubectl get pod -n kube-system

This produces the following result:

NAME                                   READY   STATUS             RESTARTS   AGE
pod/coredns-6955765f44-5wsjv           1/1     Running            0          21m
pod/coredns-6955765f44-lzkq2           1/1     Running            0          21m
pod/etcd-minikube                      1/1     Running            0          21m
pod/kube-apiserver-minikube            1/1     Running            0          21m
pod/kube-controller-manager-minikube   1/1     Running            0          21m
pod/kube-proxy-dfd7m                   1/1     Running            0          21m
pod/kube-scheduler-minikube            1/1     Running            0          21m
pod/metrics-server-6754dbc9df-lhp9p    0/1     ImagePullBackOff   0          19m
pod/storage-provisioner                1/1     Running            2          3d5h
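
When a namespace holds many pods, it can help to narrow the listing down to the ones that are not healthy. The following is a minimal sketch that is not part of the original article: `unhealthy_pods` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pod` (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper (not part of kubectl): reads `kubectl get pod` output
# on stdin and prints the name and status of every pod whose STATUS column
# is neither "Running" nor "Completed". The header row is skipped.
# Assumes the default column layout, where STATUS is the third column.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage against a live cluster:
#   kubectl get pod -n kube-system | unhealthy_pods
```

Fed the listing above, this would print only the metrics-server pod together with its ImagePullBackOff status.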

As you can see, the pod for metrics-server failed to start and is now in the ImagePullBackOff state.

Next, run the describe command to inspect the events:

kubectl describe pod metrics-server-6754dbc9df-lhp9p -n kube-system

The output includes:

  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned kube-system/metrics-server-6754dbc9df-f5pl8 to minikube
  Normal   BackOff    38s                kubelet, minikube  Back-off pulling image "k8s.gcr.io/metrics-server-amd64:v0.2.1"
  Warning  Failed     38s                kubelet, minikube  Error: ImagePullBackOff
  Normal   Pulling    24s (x2 over 54s)  kubelet, minikube  Pulling image "k8s.gcr.io/metrics-server-amd64:v0.2.1"
  Warning  Failed     9s (x2 over 38s)   kubelet, minikube  Failed to pull image "k8s.gcr.io/metrics-server-amd64:v0.2.1": rpc error: code = Unknown desc = Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Failed     9s (x2 over 38s)   kubelet, minikube  Error: ErrImagePull

Now the cause is clear: k8s.gcr.io is unreachable, so the image metrics-server-amd64:v0.2.1 cannot be pulled, and the required pod cannot start.
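
The describe output is long, so it can be handy to pull out just the image references that failed. The sketch below is an illustration rather than part of the original procedure: `failed_images` is a hypothetical helper, and it assumes the kubelet's event wording `Failed to pull image "<ref>"` seen in the events above.

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and
# prints each distinct image reference that appears in a
# 'Failed to pull image "<ref>"' event message.
failed_images() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}

# Usage against a live cluster:
#   kubectl describe pod metrics-server-6754dbc9df-lhp9p -n kube-system | failed_images
```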

Pulling the Image Manually

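To solve this, we first need to get the image. Since k8s.gcr.io is unreachable, we download it from a mirror inside China instead. Run the following commands in order: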

  1. Log in to the minikube virtual machine
minikube ssh
  2. Pull the image from the Alibaba Cloud registry
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.2.1
  3. Tag the image with its original k8s.gcr.io name
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64:v0.2.1 k8s.gcr.io/metrics-server-amd64:v0.2.1

Now the correct metrics-server image is available inside minikube.
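
The pull and tag steps can be wrapped in a small helper so that the long mirror path is typed only once. This is a sketch under stated assumptions, not a general solution: `mirror_ref` and `pull_via_mirror` are hypothetical names, and the code assumes the mirror publishes the image under the same name and tag as k8s.gcr.io, which holds for the image used in this article but is not guaranteed for others.

```shell
# Mirror registry used in this article.
MIRROR="registry.cn-hangzhou.aliyuncs.com/google_containers"

# Hypothetical helper: map k8s.gcr.io/<name>:<tag> to the mirror reference.
# Assumes the mirror keeps the same image name and tag.
mirror_ref() {
  printf '%s/%s\n' "$MIRROR" "${1#k8s.gcr.io/}"
}

# Hypothetical helper: pull from the mirror, then tag the result with the
# k8s.gcr.io name that the Deployment expects.
# Run this inside the minikube VM (after `minikube ssh`).
pull_via_mirror() {
  docker pull "$(mirror_ref "$1")" &&
    docker tag "$(mirror_ref "$1")" "$1"
}

# Usage:
#   pull_via_mirror k8s.gcr.io/metrics-server-amd64:v0.2.1
```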

Modifying the metrics-server Deployment

Even with the image available locally, you will find that minikube still tries to pull it from k8s.gcr.io, so the metrics-server Deployment has to be modified. Run:

kubectl -n kube-system edit deployment metrics-server

This opens the metrics-server Deployment in your system's default editor, as shown below:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","k8s-app":"metrics-server","kubernetes.io/minikube-addons":"metrics-server"},"name":"metrics-server","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"metrics-server"}},"template":{"metadata":{"labels":{"k8s-app":"metrics-server"},"name":"metrics-server"},"spec":{"containers":[{"command":["/metrics-server","--source=kubernetes.summary_api:https://kubernetes.default?kubeletHttps=true\u0026kubeletPort=10250\u0026insecure=true"],"image":"k8s.gcr.io/metrics-server-amd64:v0.2.1","imagePullPolicy":"Always","name":"metrics-server"}]}}}}
  creationTimestamp: "2020-03-24T04:05:19Z"
  generation: 1
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: metrics-server
    kubernetes.io/minikube-addons: metrics-server
  name: metrics-server
  namespace: kube-system
  resourceVersion: "196126"
  selfLink: /apis/apps/v1/namespaces/kube-system/deployments/metrics-server
  uid: db2ef8a7-df5a-4787-9910-08eb87b85bb6
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: metrics-server
      name: metrics-server
    spec:
      containers:
      - command:
        - /metrics-server
        - --source=kubernetes.summary_api:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true
        image: k8s.gcr.io/metrics-server-amd64:v0.2.1
        imagePullPolicy: Always
        name: metrics-server
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: "2020-03-24T04:05:19Z"
    lastUpdateTime: "2020-03-24T04:05:19Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-03-24T04:15:20Z"
    lastUpdateTime: "2020-03-24T04:15:20Z"
    message: ReplicaSet "metrics-server-6754dbc9df" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

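On line 47 of the file, the container's pull policy (imagePullPolicy) is currently Always, which means the image is always pulled from k8s.gcr.io, whether or not it already exists locally. Change the policy to IfNotPresent so that the locally available image is used first. After you save the file, the system updates the pod automatically; there is no need to re-enable the addon.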
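
The same change can also be made without opening an editor, which is convenient in scripts. The sketch below uses `kubectl patch` with a JSON Patch document; it is an alternative to the interactive edit described above, not the method the article itself uses. The function name `set_pull_policy_if_not_present` is hypothetical, and the patch path assumes the Deployment has a single container at index 0, as in the file shown earlier.

```shell
# JSON Patch that replaces imagePullPolicy on the first container.
# Assumption: metrics-server is the only container (index 0) in the pod spec.
PATCH='[{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"IfNotPresent"}]'

# Hypothetical wrapper around `kubectl patch` for the metrics-server
# Deployment in the kube-system namespace.
set_pull_policy_if_not_present() {
  kubectl -n kube-system patch deployment metrics-server --type=json -p "$PATCH"
}

# Usage:
#   set_pull_policy_if_not_present
```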

After completing the steps above, the top command works:

kubectl top pod

In my environment, it shows:

NAME                           CPU(cores)   MEMORY(bytes)
redis-ha-1584698965-server-0   2m           4Mi

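Of course, metrics-server is not enabled just to run this simple top command. Its purpose is to expose more monitoring information through the Metrics API, which can also be used to scale applications automatically based on configured conditions.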
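
As a small illustration of reading the Metrics API directly, the sketch below queries pod metrics through `kubectl get --raw`. It is an assumption-laden example rather than part of the original walkthrough: the helper names `pod_metrics_path` and `raw_pod_metrics` are hypothetical, and the path assumes the API is served as `metrics.k8s.io/v1beta1`.

```shell
# Hypothetical helper: build the Metrics API path that lists pod metrics
# for a namespace (defaults to "default").
# Assumption: the Metrics API is served as metrics.k8s.io/v1beta1.
pod_metrics_path() {
  printf '/apis/metrics.k8s.io/v1beta1/namespaces/%s/pods\n' "${1:-default}"
}

# Hypothetical helper: fetch the raw JSON response for that path.
raw_pod_metrics() {
  kubectl get --raw "$(pod_metrics_path "$1")"
}

# Usage:
#   raw_pod_metrics kube-system
```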

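Title: Fixing the ImagePullBackOff Error When Installing metrics-server

Author: Morning Star

Published: 2020-03-24 16:03

Last updated: 2022-01-07 15:01

Original link: https://www.mls-tech.info/microservice/k8s/minikube-use-metrics-server/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.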