10 Essential Ceph Commands For Managing Any Cluster, At Any Scale

    Translated by 侯明明 of 新钛云服, specialists in cloud and security managed services

    We can use CRUSH rules to enforce that behaviour, no matter how many nodes we might have, potentially, on each side.

    crush rule dump is a quick way to get a list of the CRUSH rules in the cluster and see how they are defined. If we want to make changes, there is a whole set of CRUSH commands we can use to modify them; alternatively, we can download the CRUSH map, decompile it, make our changes by hand, recompile it, and push it back into the cluster.

    $ ceph osd crush rule dump
    [
      {
           "rule_id": 0,
           "rule_name": "replicated_rule",
           "ruleset": 0,
           "type": 1,
           "min_size": 1,
           "max_size": 10,
           "steps": [
              {
                   "op": "take",
                   "item": -1,
                   "item_name": "default"
              },
              {
                   "op": "chooseleaf_firstn",
                   "num": 0,
                   "type": "host"
              },
              {
                   "op": "emit"
              }
          ]
      }
    ]
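
    As a rough sketch of that manual workflow (the file names here are just placeholders), we can export the CRUSH map, decompile it with crushtool, edit the text, recompile it and load it back into the cluster:

    $ ceph osd getcrushmap -o crushmap.bin        # export the binary CRUSH map
    $ crushtool -d crushmap.bin -o crushmap.txt   # decompile to editable text
    $ vi crushmap.txt                             # hand-edit rules, buckets, etc.
    $ crushtool -c crushmap.txt -o crushmap.new   # recompile the edited map
    $ ceph osd setcrushmap -i crushmap.new        # push it back to the cluster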

    6. versions

    When running a distributed cluster in production, upgrading everything at once and praying nothing breaks is clearly not a good idea. For that reason, every cluster-wide daemon in Ceph carries its own version and can be upgraded independently, so the cluster can be kept up to date with little or no interruption to service.

    As long as we keep the versions reasonably close together, daemons on different versions will interoperate happily, but that can mean juggling hundreds of different daemons and their respective versions during an upgrade. Enter ceph versions - a simple way to see, at a glance, exactly which versions of each daemon type are running.

    $ ceph versions
    {
       "mon": {
           "ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 1
      },
       "mgr": {
           "ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 1
      },
       "osd": {
           "ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 36
      },
       "mds": {},
       "rgw": {
           "ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 1
      },
       "overall": {
           "ceph version 14.2.15-2-g7407245e7b (7407245e7b329ac9d475f61e2cbf9f8c616505d6) nautilus (stable)": 39
      }
    }
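
    If we need to check one particular daemon rather than the cluster-wide summary, ceph tell can query it directly (osd.0 below is just an example ID; a wildcard such as osd.* queries a whole daemon type):

    $ ceph tell osd.0 version
    $ ceph tell osd.* version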

    7. auth print-key

    If there are lots of different clients using the cluster, we need to get their keys from the cluster so they can authenticate. ceph auth print-key is a handy way to look a key up - a bit nicer than fishing it out of a configuration file. Another useful, related command is ceph auth list, which gives the full list of every authentication key in the cluster, for clients and daemons alike, together with their respective capabilities.

    $ ceph auth print-key client.admin
    AQDgrLhg3qY1ChAAzzZPHCw2tYz/o+2RkpaSIg==
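
    As a small usage sketch (the destination path is just illustrative), the key can be dropped straight into a file for a client to consume, and ceph auth list gives the wider view:

    $ ceph auth print-key client.admin > /etc/ceph/ceph.client.admin.key
    $ ceph auth list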

    8. crash ls

    Daemons crash? There are plenty of reasons why that might happen, but ceph crash ls is the first port of call for clues about what went wrong, so we can diagnose further. Often these are minor warnings that just make a fault easier to pin down, but a crash can also flag a serious problem. Another useful command is ceph crash info <id>, which gives more detail on the crash ID in question, and if a warning turns out to be nothing to worry about, we can archive a single crash, or all the crashes we have already dealt with.

    $ ceph crash ls
    1 daemons have recently crashed
    osd.9 crashed on host danny-1 at 2021-03-06 07:28:12.665310Z
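
    To dig into a specific crash, or to clear entries we have already dealt with so the warning goes away, the related commands look like this (<crash-id> is whatever ceph crash ls reports):

    $ ceph crash info <crash-id>      # full details for one crash
    $ ceph crash archive <crash-id>   # archive a single crash
    $ ceph crash archive-all          # archive everything already reviewed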

    9. osd flags

    There are a number of OSD flags that are incredibly useful; the full list can be found under OSDMAP_FLAGS in the Ceph documentation. Some of the most common are listed below:

    • pauserd, pausewr – no longer respond to read and write requests
    • noout – if a daemon fails for some reason, Ceph will not mark the OSD out of the cluster
    • nobackfill, norecover, norebalance – recovery and rebalancing are switched off

    The demo below shows how we can set these flags with ceph osd set, and how that changes our health messaging. Another useful, related trick is the ability to take a number of OSDs out at once with a simple bash expansion.

    $ ceph osd out {7..11}
    marked out osd.7. marked out osd.8. marked out osd.9. marked out osd.10. marked out osd.11.
    $ ceph osd set noout
    noout is set
    $ ceph osd set nobackfill
    nobackfill is set
    $ ceph osd set norecover
    norecover is set
    $ ceph osd set norebalance
    norebalance is set
    $ ceph osd set nodown
    nodown is set
    $ ceph osd set pause
    pauserd,pausewr is set
    $ ceph health detail
    HEALTH_WARN pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
    OSDMAP_FLAGS pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
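
    Once the maintenance is done, the same flags can be cleared with ceph osd unset, and the OSDs we took out earlier can be brought back with the same bash expansion (a sketch of the reverse of the demo above):

    $ ceph osd unset pause
    $ ceph osd unset nodown
    $ ceph osd unset noout
    $ ceph osd unset nobackfill
    $ ceph osd unset norecover
    $ ceph osd unset norebalance
    $ ceph osd in {7..11}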

    10. pg dump

    All data destined for Ceph is placed into placement groups. They provide a layer of abstraction - a kind of bucket of bits of data (not an S3 bucket) - on top of our storage, which lets the cluster decide easily how to distribute data and how to react to failures in the best possible way. We can take a detailed look at how our placement groups map onto our OSDs, or the reverse, and pg dump does both. While many placement group commands can be extremely verbose and hard to read, ceph pg dump osds distils it nicely into a single pane.

    $ ceph pg dump osds
    dumped osds
    OSD_STAT USED    AVAIL   USED_RAW TOTAL   HB_PEERS                                                                          PG_SUM PRIMARY_PG_SUM
    31        70 GiB 9.1 TiB   71 GiB 9.2 TiB                     [0,1,2,3,4,5,6,8,9,12,13,14,15,16,17,18,19,20,21,22,23,30,32]    175             72
    13        70 GiB 9.1 TiB   71 GiB 9.2 TiB             [0,1,2,3,4,5,6,7,8,9,10,11,12,14,24,25,26,27,28,29,30,31,32,33,34,35]    185             66
    25        77 GiB 9.1 TiB   78 GiB 9.2 TiB                         [0,1,2,3,4,5,6,12,13,14,15,16,17,18,19,20,21,22,23,24,26]    180             64
    32        83 GiB 9.1 TiB   84 GiB 9.2 TiB                       [0,1,2,3,4,5,6,7,12,13,14,15,16,17,18,19,20,21,22,23,31,33]    181             73
    23       102 GiB 9.1 TiB  103 GiB 9.2 TiB                [0,1,2,3,4,5,6,7,8,9,10,11,22,24,25,26,27,28,29,30,31,32,33,34,35]    191             69
    18        77 GiB 9.1 TiB   78 GiB 9.2 TiB             [0,1,2,3,4,5,6,7,8,9,10,11,17,19,24,25,26,27,28,29,30,31,32,33,34,35]    188             67
    11        64 GiB 9.1 TiB   65 GiB 9.2 TiB                                                   [10,12,21,28,29,31,32,33,34,35]      0              0
    8         90 GiB 9.1 TiB   91 GiB 9.2 TiB                                                       [1,2,7,9,14,15,21,27,30,33]      2              0
    14        70 GiB 9.1 TiB   71 GiB 9.2 TiB             [0,1,2,3,4,5,6,7,8,9,10,11,13,15,24,25,26,27,28,29,30,31,32,33,34,35]    177             64
    33        77 GiB 9.1 TiB   78 GiB 9.2 TiB                         [0,1,2,3,4,5,6,12,13,14,15,16,17,18,19,20,21,22,23,32,34]    187             80
    3         89 GiB 9.1 TiB   90 GiB 9.2 TiB   [2,4,8,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]    303             74
    30        77 GiB 9.1 TiB   78 GiB 9.2 TiB                       [0,1,2,3,4,5,6,9,12,13,14,15,16,17,18,19,20,21,22,23,29,31]    179             76
    15        71 GiB 9.1 TiB   72 GiB 9.2 TiB               [0,1,2,3,4,5,6,7,8,10,11,14,16,24,25,26,27,28,29,30,31,32,33,34,35]    178             72
    7         70 GiB 9.1 TiB   71 GiB 9.2 TiB                                                     [6,8,15,17,30,31,32,33,34,35]      0              0
    28        90 GiB 9.1 TiB   91 GiB 9.2 TiB                     [0,1,2,3,4,5,6,7,9,12,13,14,15,16,17,18,19,20,21,22,23,27,29]    188             73
    16        77 GiB 9.1 TiB   78 GiB 9.2 TiB             [0,1,2,3,4,5,6,7,8,9,10,11,15,17,24,25,26,27,28,29,30,31,32,33,34,35]    183             66
    1         77 GiB 9.1 TiB   78 GiB 9.2 TiB [0,2,8,9,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]    324             70
    26        77 GiB 9.1 TiB   78 GiB 9.2 TiB                         [0,1,2,3,4,5,6,12,13,14,15,16,17,18,19,20,21,22,23,25,27]    186             61
    22        89 GiB 9.1 TiB   90 GiB 9.2 TiB                [0,1,2,3,4,5,6,7,8,9,11,21,23,24,25,26,27,28,29,30,31,32,33,34,35]    178             80
    0        103 GiB 9.1 TiB  104 GiB 9.2 TiB       [1,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]    308             83
    5         70 GiB 9.1 TiB   71 GiB 9.2 TiB     [4,6,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]    312             69
    21        77 GiB 9.1 TiB   78 GiB 9.2 TiB             [0,1,2,3,4,5,6,7,8,9,10,11,20,22,24,25,26,27,28,29,30,31,32,33,34,35]    187             63
    4         96 GiB 9.1 TiB   97 GiB 9.2 TiB  [3,5,10,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]    305             77
    34        96 GiB 9.1 TiB   97 GiB 9.2 TiB                     [0,1,2,3,4,5,6,8,9,12,13,14,15,16,17,18,19,20,21,22,23,33,35]    189             73
    17        96 GiB 9.1 TiB   97 GiB 9.2 TiB             [0,1,2,3,4,5,6,7,8,9,10,11,16,18,24,25,26,27,28,29,30,31,32,33,34,35]    185             72
    24        77 GiB 9.1 TiB   78 GiB 9.2 TiB                         [0,1,2,3,4,5,6,10,12,13,14,15,16,17,18,19,20,21,22,23,25]    186             73
    10        76 GiB 9.1 TiB   77 GiB 9.2 TiB                                                     [4,9,11,15,17,18,25,29,34,35]      1              0
    27        89 GiB 9.1 TiB   90 GiB 9.2 TiB                      [0,1,2,3,4,5,6,10,12,13,14,15,16,17,18,19,20,21,22,23,26,28]    185             75
    2         77 GiB 9.1 TiB   78 GiB 9.2 TiB   [1,3,8,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]    310             62
    19        77 GiB 9.1 TiB   78 GiB 9.2 TiB             [0,1,2,3,4,5,6,7,8,9,10,11,18,20,24,25,26,27,28,29,30,31,32,33,34,35]    184             77
    20        77 GiB 9.1 TiB   78 GiB 9.2 TiB             [0,1,2,3,4,5,6,7,8,9,10,11,19,21,24,25,26,27,28,29,30,31,32,33,34,35]    183             69
    35        96 GiB 9.1 TiB   97 GiB 9.2 TiB                            [0,1,2,3,4,5,6,12,13,14,15,16,17,18,19,20,21,22,23,34]    187             78
    9         77 GiB 9.1 TiB   78 GiB 9.2 TiB                                                     [1,8,10,12,13,16,21,23,32,35]      1              0
    6         83 GiB 9.1 TiB   84 GiB 9.2 TiB     [5,7,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]    323             58
    12        89 GiB 9.1 TiB   90 GiB 9.2 TiB                  [0,1,2,3,4,5,6,8,9,10,11,13,24,25,26,27,28,29,30,31,32,33,34,35]    189             78
    29        64 GiB 9.1 TiB   65 GiB 9.2 TiB                       [0,1,2,3,4,5,6,9,12,13,14,15,16,17,18,19,20,21,22,23,28,30]    185             74
    sum      2.8 TiB 327 TiB  2.9 TiB 330 TiB
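
    A couple of related commands are worth keeping to hand: ceph pg dump pgs_brief lists every placement group with its up/acting OSD sets, and ceph pg map shows the mapping for a single placement group (the PG ID below is only an example):

    $ ceph pg dump pgs_brief
    $ ceph pg map 1.7f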

    With these basic commands under your belt, you're well equipped to handle day-to-day Ceph cluster administration. With HyperDrive Storage Manager, it gets even easier.

    Just as a child learns to add, subtract, divide and multiply on paper before being handed a calculator, any Ceph administrator should understand these key Ceph commands. But once you have them under control, why not make cluster management even simpler with HyperDrive Storage Manager, and/or delegate the simpler administration tasks to less specialised members of the team?

    HyperDrive Storage Manager is a powerful, unified and intuitive system that radically simplifies the management of all Ceph software and storage hardware, whether it is HyperDrive or generic storage.

    Original article: https://softiron.com/blog/10-essential-ceph-commands-for-managing-any-cluster-at-any-scale/
