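The Ultimate Alertmanager Configuration Guide: From the "Dark Path" to the Regular Army

When Prometheus Operator's AlertmanagerConfig just would not take effect, I went straight after the encoded config itself…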

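Background

After I deployed Prometheus Operator, the AlertmanagerConfig resources I had carefully written refused to take effect. Countless rounds of debugging got me nowhere, so I decided to bypass the Operator and edit the default configuration directly in its Secret. There it is stored base64-encoded and gzipped, which means encoded, not actually encrypted. This is an unorthodox path, but the results are immediate!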

The Hack: Strike Straight at the Core

1. Fetch the encoded config

```shell
kubectl get secret alertmanager-rancher-monitoring-alertmanager-generated \
  -n cattle-monitoring-system -o yaml > secret.yaml
```
2. Decode the core config

```shell
# Install the yq tool (v4)
wget https://github.com/mikefarah/yq/releases/download/v4.25.1/yq_linux_amd64 -O /usr/local/bin/yq
chmod +x /usr/local/bin/yq

# Decode the Alertmanager config (stored as base64 of a gzip file)
yq eval '.data."alertmanager.yaml.gz"' secret.yaml | base64 -d | gzip -d > alertmanager.yaml

# Decode the template file (base64 only, not gzipped)
yq eval '.data."rancher_defaults.tmpl"' secret.yaml | base64 -d > rancher_defaults.tmpl
```
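
Secret values are base64-encoded, not encrypted. The Operator also gzips `alertmanager.yaml.gz`, which is why only that key needs `gzip -d`. The self-contained sketch below reproduces that layering on a made-up sample config and checks that the decode pipeline above reverses it. It assumes GNU coreutils for `base64 -w0`.

```shell
#!/usr/bin/env bash
# Sketch: reproduce how the generated Secret stores alertmanager.yaml.gz
# (gzip first, then base64) and check that "base64 -d | gzip -d" undoes it.
# The sample config below is made up for illustration.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A tiny sample standing in for the real alertmanager.yaml
cat > "$workdir/original.yaml" <<'EOF'
route:
  receiver: "null"
receivers:
  - name: "null"
EOF

# Encode the way the Secret value is stored: gzip, then single-line base64
encoded=$(gzip -c "$workdir/original.yaml" | base64 -w0)

# Decode with the same pipeline used on the real Secret
printf '%s' "$encoded" | base64 -d | gzip -d > "$workdir/decoded.yaml"

if cmp -s "$workdir/original.yaml" "$workdir/decoded.yaml"; then
  echo "round-trip OK"
else
  echo "round-trip FAILED"
  exit 1
fi
```
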

3. Customize the config (QQ Mail example)

```yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: 'xxxx@qq.com'
  smtp_auth_username: 'xxxx@qq.com'
  smtp_auth_password: 'xxxxxxx'   # QQ Mail SMTP authorization code, not the login password
  smtp_require_tls: false         # port 465 uses implicit TLS; this flag only governs STARTTLS

route:
  receiver: "k8s-alarm"
  group_by: [alertname]
  routes:
    - receiver: "null"
      matchers:
        - alertname = "Watchdog"
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h

receivers:
  - name: "k8s-alarm"
    email_configs:
      - to: 'test@gmail.cn'
        send_resolved: true
  - name: "null"

templates:
  - /etc/alertmanager/config/*.tmpl
```
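
The minimal shell sketch below makes this config's routing behaviour concrete. `pick_receiver` is a hypothetical helper that models only this one route tree. Alertmanager tries child routes in order and stops at the first match, because `continue` defaults to `false`. Anything unmatched goes to the root receiver. To check the real file instead, `amtool config routes test --config.file=alertmanager.yaml alertname=Watchdog` does this against actual label matchers.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not an Alertmanager tool): mimic how the
# route tree above picks a receiver for an alert, given only its alertname.
# Real Alertmanager evaluates full label matchers; this models one child route.
set -euo pipefail

pick_receiver() {
  local alertname=$1
  # Child routes are tried in order; the first match wins ("continue" is false)
  case "$alertname" in
    Watchdog) echo "null" ;;       # routes: - receiver: "null"
    *)        echo "k8s-alarm" ;;  # no child matched: fall back to the root receiver
  esac
}

for name in Watchdog KubePodCrashLooping; do
  echo "$name -> $(pick_receiver "$name")"
done
```

With this config, the always-firing Watchdog alert is swallowed by the empty `null` receiver, and every other alert is emailed via `k8s-alarm`.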

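4. Re-encode and deploy

```shell
# Compress the config
gzip -c alertmanager.yaml > alertmanager.yaml.gz

# Base64-encode both files as single lines
ALERTMANAGER_CONFIG=$(base64 -w0 alertmanager.yaml.gz)
TEMPLATE_CONFIG=$(base64 -w0 rancher_defaults.tmpl)
export ALERTMANAGER_CONFIG TEMPLATE_CONFIG

# Generate the new Secret; strenv() reads the exported variables, avoiding shell-quoting problems
yq eval '.data."alertmanager.yaml.gz" = strenv(ALERTMANAGER_CONFIG) |
  .data."rancher_defaults.tmpl" = strenv(TEMPLATE_CONFIG)' secret.yaml > updated-secret.yaml

# Rename the Secret and drop server-set fields so it can be created as a new object
yq eval -i '.metadata.name = "alertmanager-main" |
  del(.metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp)' updated-secret.yaml

# Apply the config
kubectl apply -f updated-secret.yaml -n cattle-monitoring-system
```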

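5. Modify the Alertmanager workload

Edit the Alertmanager StatefulSet so its config volume points at the new Secret:

```yaml
# Under spec.template.spec of the Alertmanager StatefulSet
volumes:
  - name: config-volume
    secret:
      secretName: alertmanager-main  # replaces the default (Operator-generated) Secret name
```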
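
A Secret mounted as a volume exposes each of its data keys as a file, so the template must be carried over into `alertmanager-main`. Without it, the `templates` glob from step 3 would match nothing. The sketch below uses a temp directory in place of the mount point. It assumes that mount point is `/etc/alertmanager/config`, matching the templates path in the config. It shows that only the `.tmpl` key is picked up by a simple `*.tmpl` pattern.

```shell
#!/usr/bin/env bash
# Sketch: simulate the mounted Secret directory (assumed to be
# /etc/alertmanager/config) and see which files "*.tmpl" matches.
set -euo pipefail

mount_dir=$(mktemp -d)
trap 'rm -rf "$mount_dir"' EXIT

# One file per data key in the alertmanager-main Secret
touch "$mount_dir/alertmanager.yaml.gz" "$mount_dir/rancher_defaults.tmpl"

shopt -s nullglob               # an unmatched glob expands to nothing
templates=("$mount_dir"/*.tmpl)

echo "templates matched: ${#templates[@]}"
for t in "${templates[@]}"; do
  basename "$t"
done
```
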

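Verification

Warning: this approach is fast but risky. An Operator upgrade, or any reconcile that regenerates the StatefulSet, can overwrite these changes!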

The Regular Army: The Elegant Way

1. Configure the alert receiver and route

```yaml
# k8s-alarm.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: k8s-alarm
  namespace: test
spec:
  receivers:
    - name: tialert
      webhookConfigs:
        - url: https://your-webhook-url
          sendResolved: true
  route:
    groupBy: [alertname]
    groupInterval: 5m
    groupWait: 30s
    matchers:
      - name: severity
        value: "warning|critical"
        regex: true
    receiver: tialert
    repeatInterval: 4h
```

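Alertmanager anchors regex matchers at both ends. As a result, `value: "warning|critical"` with `regex: true` must match the whole `severity` label value, not just part of it. The sketch below imitates that with `grep -Ex`, which requires a whole-line match. `matches_severity` is a hypothetical helper, and `warning-high` is a made-up label value used only to show the anchoring.

```shell
#!/usr/bin/env bash
# Sketch: an anchored regex match of the severity label against
# "warning|critical", imitated with grep -x (the whole line must match).
set -euo pipefail

matches_severity() {
  printf '%s\n' "$1" | grep -Eqx 'warning|critical'
}

for sev in warning critical info warning-high; do
  if matches_severity "$sev"; then
    echo "$sev: routed to tialert"
  else
    echo "$sev: not matched by this route"
  fi
done
```

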
2. Configure a silencing route

```yaml
# null.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: silence-watchdog
  namespace: cattle-monitoring-system
spec:
  receivers:
    - name: null-receiver
  route:
    matchers:
      - name: alertname
        value: "Watchdog"
    receiver: null-receiver
```

3. Custom alert rules

```yaml
# app-alert.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-backend-alerts
  namespace: test
  labels:
    prometheus: rancher-monitoring
    role: alert-rules
spec:
  groups:
    - name: app-backend
      rules:
        - alert: HighRequestRate
          expr: |
            sum(rate(http_requests_total{job="app-backend"}[5m])) by (service) > 100
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "High request rate on {{ $labels.service }}"
            description: "Request rate is {{ $value }} per second"
```
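
The arithmetic below is a worked example of what the `HighRequestRate` threshold means. `rate(...[5m])` is a per-second average over a 5-minute window, so `> 100` corresponds to more than 100 × 300 = 30000 requests per service in that window. This ignores the small extrapolation Prometheus applies at the window edges. The observed increase of 36000 is a made-up number.

```shell
#!/usr/bin/env bash
# Worked arithmetic for the HighRequestRate rule (illustrative numbers only).
set -euo pipefail

threshold_per_second=100
window_seconds=$((5 * 60))                                   # the [5m] range
requests_in_window=$((threshold_per_second * window_seconds))
echo "threshold: more than $requests_in_window requests per ${window_seconds}s window"

# Hypothetical counter increase observed for one service over the window
observed_increase=36000
# Compare totals rather than integer-divided rates to avoid truncation near the boundary
if (( observed_increase > requests_in_window )); then
  echo "increase $observed_increase > $requests_in_window (~$((observed_increase / window_seconds)) req/s): expression is true"
  echo "alert stays pending until the expression has held for 10m (for: 10m), then fires"
fi
```
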

Summary


Recommendation: the hack is fine for quickly validating a config while debugging, but always use the proper approach in production!
Serving every client with professionalism, so enterprise IT pays only for results and security
