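When working with a Kubernetes (k8s) cluster, beginners may run into a strange problem. A service is deployed from a private image registry such as Harbor, and the pods start only on a few nodes. On every other node the deployment fails because the image cannot be pulled: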

# kubectl get pods -A -owide | grep jenkins-demo
devlopment   jenkins-demo-67d4f9d666-2fh8k   1/1   Running            0   27m     10.244.2.40   local-k8s-nd02
devlopment   jenkins-demo-dbc9f5b6b-h78tx    0/1   ImagePullBackOff   0   6m4s    10.244.6.93   local-k8s-nd03
production   jenkins-demo-dbc9f5b6b-tnkfs    1/1   Running            0   5m47s   10.244.2.44   local-k8s-nd02
qatest       jenkins-demo-67d4f9d666-hb22t   1/1   Running            0   27m     10.244.2.41   local-k8s-nd02
qatest       jenkins-demo-dbc9f5b6b-d6txr    0/1   ImagePullBackOff   0   6m      10.244.6.94   local-k8s-nd03
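With many pods, it can be easier to list the nodes that have pods stuck on image-pull errors than to scan the output by eye. Below is a minimal sketch, assuming bash and a kubectl that supports `-o jsonpath`. The function names `pod_pull_status` and `nodes_with_pull_errors` are hypothetical helpers, not part of kubectl.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print "<nodeName><TAB><waiting reasons>" for every pod
# in all namespaces. Containers that are not in a waiting state add no reason.
pod_pull_status() {
  kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.nodeName}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
}

# Hypothetical helper: read "<node><TAB><reasons>" lines on stdin and print
# each node (once) that has a container waiting on an image-pull error.
nodes_with_pull_errors() {
  awk -F'\t' '$2 ~ /ErrImagePull|ImagePullBackOff/ { print $1 }' | sort -u
}

# Usage (requires access to the cluster):
#   pod_pull_status | nodes_with_pull_errors
```

On the cluster above, this would be expected to report only local-k8s-nd03. The jsonpath query is used instead of parsing `-o wide` columns because the column layout of the table output can vary between kubectl versions.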

Inspect the failure details:

# kubectl describe pods -n qatest jenkins-demo-6cbfb64844-79n8l
..........
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  78s                default-scheduler  Successfully assigned qatest/jenkins-demo-6cbfb64844-79n8l to local-k8s-nd03
  Normal   Pulling    37s (x3 over 78s)  kubelet            Pulling image "dev-hub.jiatuiyun.net/zeng/my-demo:429d9c1"
  Warning  Failed     36s (x3 over 77s)  kubelet            Failed to pull image "dev-hub.jiatuiyun.net/zeng/my-demo:429d9c1": rpc error: code = Unknown desc = Error response from daemon: pull access denied for dev-hub.jiatuiyun.net/zeng/my-demo, repository does not exist or may require docker login: denied: requested access to the resource is denied
  Warning  Failed     36s (x3 over 77s)  kubelet            Error: ErrImagePull
  Normal   BackOff    6s (x5 over 77s)   kubelet            Back-off pulling image "dev-hub.jiatuiyun.net/zeng/my-demo:429d9c1"
  Warning  Failed     6s (x5 over 77s)   kubelet            Error: ImagePullBackOff

Next, we log in to a node where the pull failed and pull the image directly with docker. Surprisingly, it works:

# docker pull dev-hub.jiatuiyun.net/zeng/my-demo:eb7ec1d
eb7ec1d: Pulling from zeng/my-demo
4fe2ade4980c: Already exists
2e793f0ebe8a: Already exists
77995fba1918: Already exists
4495499e856d: Already exists
0ff8f8e34aa6: Already exists
6c24ea7b9085: Pull complete
c07b8e5ec47b: Pull complete
Digest: sha256:95077089b59358820c4c763ae8bc390e470c62ac3d212abfe38292ff6389c7bb
Status: Downloaded newer image for dev-hub.jiatuiyun.net/zeng/my-demo:eb7ec1d
dev-hub.jiatuiyun.net/zeng/my-demo:eb7ec1d
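`docker pull` on the node uses the Docker client's own login, typically the `~/.docker/config.json` of the user running it. Its success therefore does not prove that the kubelet can find the same credentials. The Kubernetes documentation lists the places where the kubelet looks for a Docker `config.json`, for example `/var/lib/kubelet/config.json` and `${HOME}/.docker/config.json`.

Below is a rough check, assuming bash. `find_registry_auth` is a hypothetical helper, and its default path list is an assumption that does not cover every location.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report which Docker config files mention a registry.
# Usage: find_registry_auth REGISTRY [FILE...]
# With no FILE arguments, a few common locations are checked (assumed list,
# not the kubelet's full search order).
find_registry_auth() {
  local registry="$1"
  shift
  local files=("$@")
  if [ "${#files[@]}" -eq 0 ]; then
    files=(/var/lib/kubelet/config.json /root/.docker/config.json /.docker/config.json)
  fi
  local f
  for f in "${files[@]}"; do
    # -F: match the registry name as a literal string, not a regex.
    if [ -r "$f" ] && grep -Fq "$registry" "$f"; then
      echo "found: $f"
    else
      echo "missing: $f"
    fi
  done
}

# Example: find_registry_auth dev-hub.jiatuiyun.net
```

The result is only a textual hint. A file can mention the registry while the actual credentials live in a credential helper.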

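It is the same cluster, the same registry and the same image; only the node differs. A few nodes pull the image fine while the rest fail. Why? The root cause is that the service's deployment manifest does not configure a secret for pulling the image.

Without that secret, each node's kubelet can only use whatever registry credentials or cached copies of the image happen to exist on that node. This is likely why a few nodes succeed. The field is called imagePullSecrets in the manifest, as shown below: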

.....
spec:
  imagePullSecrets:
  - name: registry-pull-secret
  containers:
  - image: dev-hub.xxxxx.net/zeng/my-demo:
    imagePullPolicy: IfNotPresent
    name: jenkins-demo
.....
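For a Deployment that is already running, the same field can be added without editing the file by hand. One way is `kubectl patch` with a strategic merge patch on `spec.template.spec.imagePullSecrets`. Below is a sketch, assuming bash. `add_pull_secret` is a hypothetical wrapper, and the example arguments are the names used in this post. Changing the pod template triggers a rolling update.

```shell
#!/usr/bin/env bash

# Hypothetical helper: add an imagePullSecrets entry to a Deployment's pod
# template. Usage: add_pull_secret DEPLOYMENT NAMESPACE SECRET
add_pull_secret() {
  local deploy="$1" ns="$2" secret="$3"
  local patch
  patch="$(printf '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"%s"}]}}}}' "$secret")"
  # kubectl uses a strategic merge patch for built-in types by default;
  # imagePullSecrets entries are merged by their "name" key.
  kubectl patch deployment "$deploy" -n "$ns" -p "$patch"
}

# Example: add_pull_secret jenkins-demo qatest registry-pull-secret
```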

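Note: if the service is deployed in several namespaces, the secret must be created in every one of them, because a Pod can only reference secrets in its own namespace. How to create the secret is not covered in detail here; plenty of references are available online.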
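As a minimal starting point, a loop over the namespaces keeps the secret consistent across all of them. Below is a sketch using `kubectl create secret docker-registry`, with these assumptions:

- the shell is bash;
- the registry credentials come from the hypothetical environment variables `REGISTRY_USER` and `REGISTRY_PASS`;
- `create_pull_secrets` is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: create the same docker-registry secret in each namespace.
# Usage: create_pull_secrets SECRET REGISTRY NAMESPACE...
# Credentials are read from the (assumed) REGISTRY_USER / REGISTRY_PASS variables.
create_pull_secrets() {
  local secret="$1" registry="$2"
  shift 2
  local ns
  for ns in "$@"; do
    kubectl create secret docker-registry "$secret" \
      --docker-server="$registry" \
      --docker-username="${REGISTRY_USER:?REGISTRY_USER is not set}" \
      --docker-password="${REGISTRY_PASS:?REGISTRY_PASS is not set}" \
      --namespace="$ns" || return 1
  done
}

# Example (namespaces taken from the pod list above):
#   create_pull_secrets registry-pull-secret dev-hub.jiatuiyun.net devlopment qatest production
```

Running it again fails for namespaces where the secret already exists. A common workaround is to render the secret with `--dry-run=client -o yaml` and pipe it to `kubectl apply -f -`. Passing the password on the command line also exposes it in the process list, so treat this as a convenience for test clusters.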