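The Ultimate Guide to Alertmanager Configuration: From the "Hack" to the Proper Way
When the Prometheus Operator's AlertmanagerConfig simply would not take effect, I went straight for the encoded configuration...
Background
After I deployed the Prometheus Operator, my carefully written AlertmanagerConfig resources simply would not take effect. After countless fruitless debugging sessions, I decided to bypass the Operator and edit the generated default configuration directly. That configuration is stored in a Secret, base64-encoded and gzip-compressed (encoded rather than truly encrypted). This is an unorthodox path, but the results are immediate!
The Hack: Going Straight for the Config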
- Get the encoded configuration
    kubectl get secret alertmanager-rancher-monitoring-alertmanager-generated \
      -n cattle-monitoring-system -o yaml > secret.yaml
- Decode the core configuration
    # Install the yq tool
    wget https://github.com/mikefarah/yq/releases/download/v4.25.1/yq_linux_amd64 -O /usr/local/bin/yq
    chmod +x /usr/local/bin/yq

    # Decode the Alertmanager configuration (base64, then gzip)
    echo "$(yq eval '.data."alertmanager.yaml.gz"' secret.yaml)" | base64 -d | gzip -d > alertmanager.yaml

    # Decode the template file (base64 only)
    echo "$(yq eval '.data."rancher_defaults.tmpl"' secret.yaml)" | base64 -d > rancher_defaults.tmpl
- Modify the configuration (QQ Mail example)
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_from: 'xxxx@qq.com'
      smtp_auth_username: 'xxxx@qq.com'
      smtp_auth_password: 'xxxxxxx'
      smtp_require_tls: false
    route:
      receiver: "k8s-alarm"
      group_by: [alertname]
      routes:
        # Send the always-firing Watchdog alert to the empty "null" receiver
        - receiver: "null"
          matchers:
            - alertname = "Watchdog"
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
    receivers:
      - name: "k8s-alarm"
        email_configs:
          - to: 'test@gmail.cn'
            send_resolved: true
      - name: "null"
    templates:
      - /etc/alertmanager/config/*.tmpl
- Re-encode and deploy
    # Compress the configuration
    gzip -c alertmanager.yaml > alertmanager.yaml.gz

    # Base64-encode
    ALERTMANAGER_CONFIG=$(base64 -w0 alertmanager.yaml.gz)
    TEMPLATE_CONFIG=$(base64 -w0 rancher_defaults.tmpl)

    # Generate the new Secret
    yq eval ".data.\"alertmanager.yaml.gz\" = \"$ALERTMANAGER_CONFIG\" |
      .data.\"rancher_defaults.tmpl\" = \"$TEMPLATE_CONFIG\"" secret.yaml > updated-secret.yaml

    # Rename the Secret
    sed -i 's/name: alertmanager-.*/name: alertmanager-main/' updated-secret.yaml

    # Apply the configuration
    kubectl apply -f updated-secret.yaml -n cattle-monitoring-system
- Modify the Alertmanager workload
    # Modify the volumes configuration
    volumes:
      - name: config-volume
        secret:
          secretName: alertmanager-main  # replaces the default value
Verifying the Result
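One way to check that the new Secret really carries the edited configuration is to decode it again and compare it with the local file. The sketch below assumes the Secret name, namespace and file name used in the steps above; `decode_config` and `verify_secret` are hypothetical helper names introduced only for illustration, and `kubectl`, `base64`, `gzip` and `diff` are assumed to be on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the Secret payload decodes back to the edited file.
# decode_config and verify_secret are illustrative names, not part of any tool.

# Decode a base64-encoded, gzip-compressed payload read from stdin.
decode_config() {
  base64 -d | gzip -d
}

# Compare the config stored in a Secret with a local file.
# Arguments (all optional): secret name, namespace, local file.
# The defaults are the values used in this post.
verify_secret() {
  local secret="${1:-alertmanager-main}"
  local ns="${2:-cattle-monitoring-system}"
  local file="${3:-alertmanager.yaml}"
  # Dots inside the key name must be escaped in kubectl's JSONPath.
  kubectl get secret "$secret" -n "$ns" \
    -o jsonpath='{.data.alertmanager\.yaml\.gz}' \
    | decode_config | diff - "$file"
}
```

Running `verify_secret && echo "Secret matches alertmanager.yaml"` prints the message only when `diff` finds no difference; any output from `diff` points at the lines that did not survive the round trip.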



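Warning: this approach is fast but risky. An Operator upgrade, or any other reconcile of the Alertmanager workload, may overwrite the configuration!
The Proper Way: The Elegant Path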
- Configure the alert receiver and route
    # k8s-alarm.yaml
    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    metadata:
      name: k8s-alarm
      namespace: test
    spec:
      receivers:
        - name: tialert
          webhookConfigs:
            - url: https://your-webhook-url
              sendResolved: true
      route:
        groupBy: [alertname]
        groupInterval: 5m
        groupWait: 30s
        matchers:
          # Regex match on severity: warning or critical
          - name: severity
            value: "warning|critical"
            regex: true
        receiver: tialert
        repeatInterval: 4h
- Configure a silencing route
    # null.yaml
    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    metadata:
      name: silence-watchdog
      namespace: cattle-monitoring-system
    spec:
      receivers:
        # A receiver with no integrations: matching alerts are dropped
        - name: null-receiver
      route:
        matchers:
          - name: alertname
            value: "Watchdog"
        receiver: null-receiver
- Define custom alerting rules
    # app-alert.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: app-backend-alerts
      namespace: test
      labels:
        prometheus: rancher-monitoring
        role: alert-rules
    spec:
      groups:
        - name: app-backend
          rules:
            - alert: HighRequestRate
              # Fires when a service sustains more than 100 requests/s for 10 minutes
              expr: |
                sum(rate(http_requests_total{job="app-backend"}[5m])) by (service) > 100
              for: 10m
              labels:
                severity: critical
              annotations:
                summary: "High request rate on {{ $labels.service }}"
                description: "Request rate is {{ $value }} per second"
Summary Comparison

Recommendation: use the hack for quick verification while debugging, but always use the proper approach in production!
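To confirm that the proper approach actually took effect, one option is to look for the receiver in the configuration the Operator generates. This is a minimal sketch, not a definitive check: it assumes the generated Secret name and namespace from the hack section, and it assumes the Operator merges each AlertmanagerConfig receiver into that configuration under a name that still contains the original receiver name (the exact prefix may differ between Operator versions). `has_receiver` and `check_generated` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: check whether a receiver shows up in the Operator-generated config.
# has_receiver and check_generated are illustrative names only.

# Succeeds if the Alertmanager config on stdin has a "name:" line that
# contains the given receiver name (any prefix is accepted).
has_receiver() {
  local receiver="$1"
  grep -q "name: .*${receiver}"
}

# Fetch and decode the generated config, then search it for the receiver.
# The Secret name and namespace below are the ones used in this post.
check_generated() {
  local receiver="$1"
  kubectl get secret alertmanager-rancher-monitoring-alertmanager-generated \
    -n cattle-monitoring-system \
    -o jsonpath='{.data.alertmanager\.yaml\.gz}' \
    | base64 -d | gzip -d | has_receiver "$receiver"
}
```

For example, `check_generated tialert && echo "receiver found"` prints the message only if the generated configuration mentions the `tialert` receiver. If it does not, the AlertmanagerConfig was probably not picked up by the Operator at all.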