A Deep Dive into Kubernetes Startup, Liveness, and Readiness Probes
新钛云服 2021/06/04

Original article by 祝祥 of 新钛云服, cloud management service specialists

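While Kubernetes improves the scalability, portability, and observability of cloud deployments, it also introduces new ways for things to fail. It brings an ecosystem with powerful features and many options, and it simplifies complex application deployments, but it comes with challenges of its own. One important capability Kubernetes gives us is high availability.

Kubernetes offers many high-availability options. In this article we discuss the ones that apply to the application or microservice itself. Pods are the smallest deployable units in Kubernetes, and they are scheduled once a declarative configuration is applied. kube-scheduler computes the placement; once a Pod has been bound to a node, it runs in a controlled environment and, depending on its Pod conditions, is considered ready or not ready to serve. With startup, readiness, and liveness probes we can control when a Pod should be considered started, ready, or alive. We will look at these conditions and what triggers them.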

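Pod and Container States

Pods have phases and conditions; containers have states. These status fields can and will change according to probe results, so let's look at them.

Pod Phases

The Pod status object includes a phase field, which tells Kubernetes and us where the Pod is in its lifecycle.
- Pending: The cluster has accepted the Pod, but its containers have not been set up yet.
- Running: At least one container is running, starting, or restarting.
- Succeeded: All containers have exited with status code zero; the Pod will not be restarted.
- Failed: All containers have terminated, and at least one exited with a non-zero status code.
- Unknown: The state of the Pod could not be determined.

Pod Conditions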

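In addition to the phase, a Pod has conditions, which also provide information about the state the Pod is in.
- PodScheduled: A node has been selected for the Pod and scheduling has completed.
- ContainersReady: All containers in the Pod are ready.
- Initialized: All init containers have completed successfully.
- Ready: The Pod is able to serve requests, so it should be included in Services and load balancers.

We can view Pod conditions with the kubectl describe pods <POD_NAME> command.

Sample output: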

…
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
…

Container States

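Containers have three simple states.
- Waiting: The container is still completing the operations it needs in order to start.
- Running: The container is executing normally.
- Terminated: The container started executing and then either ran to completion or failed.

Exploring the Pod Status Object

With the kubectl get pods <POD_NAME> -o yaml command, we can see the Pod conditions and container states in the Pod object.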

…
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-02-08T11:11:53Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-02-08T11:14:20Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-02-08T11:14:20Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-02-08T11:11:52Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://7fc67a850ba439f64ecb51a129a2d7dcbc4a3402b253daa3a6827787f7c80e40
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa
    lastState:
      terminated:
        containerID: containerd://c4416e69b7348a7e7be3f7046dc9745dfb38ba537e5b8c06da5020c67b12b3d8
        exitCode: 137
        finishedAt: "2021-02-08T11:14:52Z"
        reason: Error
        startedAt: "2021-02-08T11:14:05Z"
    name: nginx
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2021-02-08T11:16:28Z"
  hostIP: x.x.x.x
  phase: Running
  podIP: 10.1.239.205
  podIPs:
  - ip: 10.1.239.205
  qosClass: BestEffort
  startTime: "2021-02-08T11:11:53Z"

If you prefer JSON output, you can use kubectl get pods <POD_NAME> -o jsonpath='{.status}' | jq

{
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-02-08T11:11:53Z",
      "status": "True",
      "type": "Initialized"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-02-08T11:14:20Z",
      "status": "True",
      "type": "Ready"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-02-08T11:14:20Z",
      "status": "True",
      "type": "ContainersReady"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2021-02-08T11:11:52Z",
      "status": "True",
      "type": "PodScheduled"
    }
  ],
  "containerStatuses": [
    {
      "containerID": "containerd://7fc67a850ba439f64ecb51a129a2d7dcbc4a3402b253daa3a6827787f7c80e40",
      "image": "docker.io/library/nginx:latest",
      "imageID": "docker.io/library/nginx@sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa",
      "lastState": {
        "terminated": {
          "containerID": "containerd://c4416e69b7348a7e7be3f7046dc9745dfb38ba537e5b8c06da5020c67b12b3d8",
          "exitCode": 137,
          "finishedAt": "2021-02-08T11:14:52Z",
          "reason": "Error",
          "startedAt": "2021-02-08T11:14:05Z"
        }
      },
      "name": "nginx",
      "ready": true,
      "restartCount": 1,
      "started": true,
      "state": {
        "running": {
          "startedAt": "2021-02-08T11:16:28Z"
        }
      }
    }
  ],
  "hostIP": "x.x.x.x",
  "phase": "Running",
  "podIP": "10.1.239.205",
  "podIPs": [
    {
      "ip": "10.1.239.205"
    }
  ],
  "qosClass": "BestEffort",
  "startTime": "2021-02-08T11:11:53Z"
}

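Probes in Kubernetes

Kubernetes provides probes (health checks) to monitor and act on the state of Pods, so that only healthy Pods serve traffic. The kubelet is the main component that runs these health checks and keeps the API server updated.

Probe Handlers

There are three handlers available, and together they cover almost every case.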

Exec Action

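ExecAction executes a command inside the container. It is also something of a gateway feature that can handle almost anything, because we can run any executable: a script that makes a curl request to determine status, for example, or an executable that calls an external endpoint. The probe succeeds if the command exits with status code 0. Make sure the executable does not leave zombie processes behind.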
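
As a minimal sketch of an exec-based probe, the Pod below runs a command inside the container and treats exit code 0 as healthy. The Pod name, image tag, and the /tmp/healthy marker file are assumptions chosen for illustration and do not come from the article.

```yaml
# Illustrative Pod: name, image tag, and marker file are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: exec-probe-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    # Create the marker file, then keep the container alive.
    args: ["/bin/sh", "-c", "touch /tmp/healthy && sleep 3600"]
    livenessProbe:
      exec:
        # Exit code 0 means healthy; any other exit code counts as a failure.
        command: ["cat", "/tmp/healthy"]
      periodSeconds: 5
```

Deleting /tmp/healthy inside the container would make the command fail, and after enough consecutive failures the container would be restarted.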

TCP Socket Action

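TCPSocketAction connects to a defined port to check whether it is open. It is mostly used for endpoints that do not speak HTTP.

HTTP Get Action

HTTPGetAction sends an HTTP GET request to a defined path as the probe. The HTTP response code determines whether the probe succeeded: any code from 200 up to, but not including, 400 counts as success.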
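
The two network-based handlers can be sketched side by side on one container. This example assumes an nginx container listening on port 80; the Pod name and the choice of which probe uses which handler are purely illustrative.

```yaml
# Illustrative Pod: name and probe assignment are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: network-probe-demo
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
    readinessProbe:
      # Succeeds as soon as a TCP connection to port 80 can be opened.
      tcpSocket:
        port: 80
    livenessProbe:
      # Succeeds when GET / returns a status code >= 200 and < 400.
      httpGet:
        path: /
        port: 80
```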

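Common Probe Parameters

Every probe type shares the same configurable fields:
- initialDelaySeconds: Seconds to wait after the container starts before probing begins. (default: 0)
- periodSeconds: How often the probe runs, in seconds. (default: 10)
- timeoutSeconds: Seconds to wait for a response before the probe times out. (default: 1)
- successThreshold: Number of consecutive successes needed to move from the failed state to the healthy state. (default: 1)
- failureThreshold: Number of consecutive failures needed to move from the healthy state to the failed state. (default: 3)

As you can see, probes can be configured in detail. To configure them well, we need to analyze the requirements and dependencies of our application or microservice.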
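
To make the five fields concrete, here is a sketch of a readiness probe that sets all of them. It is a fragment of a container spec, not a complete manifest; the /ready path, port 8080, and every numeric value are assumptions for illustration.

```yaml
# Fragment of a container spec; endpoint, port, and values are hypothetical.
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5   # wait 5 s after container start before the first probe
  periodSeconds: 10        # probe every 10 s
  timeoutSeconds: 2        # a probe with no response within 2 s counts as failed
  successThreshold: 2      # 2 consecutive successes mark the container ready again
  failureThreshold: 3      # 3 consecutive failures mark the container not ready
```

A readiness probe is used here because successThreshold must be 1 for liveness and startup probes. With these values the container would be marked not ready after roughly 30 seconds of continuous failure.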

Startup Probes

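If your process needs time to get ready (reading files, parsing a large configuration, preparing data, and so on), you should use startup probes. If the probe keeps failing past the failure threshold, the container is restarted so that it can begin again. You need to tune initialDelaySeconds, periodSeconds, and failureThreshold so that the process has enough time to finish; roughly, it gets initialDelaySeconds + failureThreshold × periodSeconds to start. Otherwise you will find the Pod stuck in a restart loop.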
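
A startup probe for a slow-starting process might look like the sketch below. It is a fragment of a container spec; the /healthz path, port 8080, and the 300-second budget are assumptions, not values from the article.

```yaml
# Fragment of a container spec; endpoint, port, and values are hypothetical.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # 30 x 10 s = up to 300 s for the process to start
```

Liveness and readiness probes on the same container do not run until the startup probe has succeeded, so a generous startup budget does not slow down failure detection once the process is up.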

Readiness Probes

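If you want to control the traffic sent to a Pod, you should use readiness probes. A readiness probe changes the Pod's Ready condition, which determines whether the Pod should be included in Services and load balancers. Once the probe has succeeded enough times (the success threshold), the Pod can receive traffic and is included in Services and load balancing. If your process is able to take itself out of service, for example for maintenance or to load a large amount of data it needs for serving, you should likewise use readiness probes. That way the Pod can signal to the kubelet, through the readiness probe, that it wants to be taken out of service temporarily.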
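
One way a process could take itself out of service is sketched below: an exec readiness probe that fails while a marker file exists. This is a fragment of a container spec; the /tmp/maintenance file name is hypothetical, and the sketch assumes the image provides /bin/sh.

```yaml
# Fragment of a container spec; the marker file name is hypothetical.
readinessProbe:
  exec:
    # Exits 0 (ready) only while the maintenance marker is absent.
    command: ["/bin/sh", "-c", "test ! -f /tmp/maintenance"]
  periodSeconds: 5
  failureThreshold: 1    # leave the Service endpoints after a single failed probe
```

Under these assumptions the application creates /tmp/maintenance to stop receiving traffic and removes it to rejoin the Service; the container is not restarted in either case.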

Liveness Probes

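Use liveness probes if your container cannot crash by itself when it hits an unexpected error. Liveness probes can work around some of the defects your process may have. Once a liveness probe has failed past its failure threshold, the kubelet restarts the container. If your process can handle these errors by exiting, you do not need liveness probes; even so, they are useful until the underlying bugs have been found and fixed.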
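
Putting the three probe types together, a single container could be configured roughly as follows. This is a sketch assuming an nginx container serving on port 80; the Pod name and all thresholds are illustrative and would need tuning for a real workload.

```yaml
# Illustrative Pod: name and all timing values are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: probes-demo
spec:
  restartPolicy: Always
  containers:
  - name: web
    image: nginx:latest
    ports:
    - containerPort: 80
    startupProbe:          # gates the other two probes until the first success
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
      failureThreshold: 12
    livenessProbe:         # the container is restarted after 3 consecutive failures
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:        # failure only removes the Pod from Service endpoints
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
```

In practice the readiness and liveness probes often point at different endpoints, so that a temporarily overloaded process is taken out of rotation without being restarted.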

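Example: Kubernetes API

The Kubernetes API server also exposes health check endpoints: healthz (deprecated), readyz, and livez. Let's look at the readyz endpoint, which is intended to be used with a readiness probe. Running kubectl get --raw='/readyz?verbose' shows the combined status of the individual checks.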

[+]ping ok
[+]log ok
[+]etcd ok
[+]informer-sync ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]shutdown ok
healthz check passed

Now let's look at the livez endpoint. Running kubectl get --raw='/livez?verbose' shows the combined status of the individual checks.

[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check passed
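
These two endpoints map naturally onto probe definitions. The fragment below is a sketch of how a kube-apiserver container might wire them up; it assumes the API server serves HTTPS on port 6443 and that unauthenticated requests to the health endpoints are allowed. All timing values are illustrative.

```yaml
# Fragment of a kube-apiserver container spec; port and values are assumptions.
livenessProbe:
  httpGet:
    path: /livez
    port: 6443
    scheme: HTTPS
  periodSeconds: 10
  failureThreshold: 8
readinessProbe:
  httpGet:
    path: /readyz
    port: 6443
    scheme: HTTPS
  periodSeconds: 5
  failureThreshold: 3
```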

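Conclusion

We have looked at Kubernetes probes, which are an important part of making applications highly available. It is equally clear that a wrong configuration can hurt the availability of the application or microservice. What matters most is to configure the probes appropriately and test different scenarios to find the best values; we need to consider how stable our external dependencies are and whether to include checks on them in the endpoint that answers the probe. We have seen that readiness probes remove unhealthy Pods from Services and load balancing, while liveness probes restart the container on failure. In the "Further reading" section you will find links to earlier articles that cover readiness, liveness, and startup probes in detail.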

Further reading:
- Kubernetes Startup Probes: Examples and Common Pitfalls (https://loft.sh/blog/kubernetes-startup-probes-examples-common-pitfalls/)
- Kubernetes Liveness Probes: Examples and Common Pitfalls (https://loft.sh/blog/kubernetes-liveness-probes-examples-common-pitfalls/index-1/)
- Kubernetes Readiness Probes: Examples and Common Pitfalls (https://loft.sh/blog/kubernetes-readiness-probes-examples-common-pitfalls/)
- Kubernetes core Probe API reference (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#probe-v1-core)
- Configure Liveness, Readiness and Startup Probes (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
- Kubernetes container probes documentation (https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes)
- Container Lifecycle Hooks documentation (https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/)

Original article:

https://loft.sh/blog/kubernetes-probes-startup-liveness-readiness/
