Fixing the "pgs: 100.000% pgs not active" problem

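After a pool is created in the Ceph cluster, its PGs stay in the not active state indefinitely. The cause is the placement rule in the CRUSH map: the rule needs more distinct hosts than the cluster has in order to place every replica of a PG.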
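Before changing anything, you can confirm this diagnosis by comparing the pool's replica count (`size`) with the number of host buckets in the CRUSH tree. The sketch below simply wraps standard `ceph` subcommands in a hypothetical helper, `diagnose_inactive_pgs`; the pool name is a placeholder for whatever pool you created.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather the facts needed to explain inactive PGs.
# Usage: diagnose_inactive_pgs <pool-name>
diagnose_inactive_pgs() {
  local pool="$1"
  # Which health checks are firing (e.g. inactive or undersized PGs)?
  ceph health detail
  # How many host buckets exist, and which OSDs sit under each one?
  ceph osd tree
  # Replica count requested by the pool ...
  ceph osd pool get "$pool" size
  # ... and the CRUSH rule the pool uses.
  ceph osd pool get "$pool" crush_rule
  # The rule steps; a chooseleaf step with type "host" needs one host per replica.
  ceph osd crush rule dump
}
```

For example, `diagnose_inactive_pgs mypool` (`mypool` is a placeholder). In the situation described here, the pool has the default size of 3 while `ceph osd tree` lists a single host, ceph-01.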

Solution: modify the CRUSH map.

Export the CRUSH map by running:

ceph osd getcrushmap -o crushmap

Decompile it into text:

crushtool -d crushmap crushmap.txt

The decompiled map looks like this:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph-01 {
    id -3               # do not change unnecessarily
    id -4 class hdd     # do not change unnecessarily
    # weight 0.029
    alg straw2
    hash 0              # rjenkins1
    item osd.0 weight 0.010
    item osd.1 weight 0.010
    item osd.2 weight 0.010
}
root default {
    id -1               # do not change unnecessarily
    id -2 class hdd     # do not change unnecessarily
    # weight 0.029
    alg straw2
    hash 0              # rjenkins1
    item ceph-01 weight 0.029
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
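The mismatch can also be checked directly in the decompiled text. A minimal sketch, assuming the file layout shown above: it counts `host <name> {` bucket definitions and compares them with a pool size you pass in. The function names are hypothetical, and the check is deliberately simplified (it ignores device classes and hosts that live outside the rule's root).

```shell
#!/usr/bin/env bash
# Count "host <name> {" bucket definitions in a decompiled CRUSH map.
# Lines such as "type 1 host" or "... type host" are not matched.
count_host_buckets() {
  grep -cE '^[[:space:]]*host[[:space:]]+[^[:space:]]+[[:space:]]*\{' "$1"
}

# Compare the host count with the replica count a pool asks for.
# Usage: check_host_failure_domain <crushmap.txt> <pool-size>
check_host_failure_domain() {
  local map="$1" size="$2" hosts
  hosts=$(count_host_buckets "$map")
  if [ "$hosts" -lt "$size" ]; then
    echo "only $hosts host bucket(s) for size $size: a 'type host' rule cannot place every replica"
    return 1
  fi
  echo "$hosts host bucket(s) for size $size: OK"
}
```

Run against the map above with `check_host_failure_domain crushmap.txt 3`, it reports a single host bucket (ceph-01) for three replicas.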

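The map contains one rule, replicated_rule, with the step "step chooseleaf firstn 0 type host". This step tells CRUSH to put each replica on a different host. The cluster has only one host, ceph-01, so a pool with more than one replica can never be fully placed.

Change host to osd, so that the step reads:

step chooseleaf firstn 0 type osd

Replicas are then spread across distinct OSDs instead of distinct hosts, which the three OSDs on ceph-01 can satisfy.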
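The edit can be scripted instead of done by hand. A minimal sketch, assuming GNU sed (BSD sed needs `-i ''`) and the hypothetical function name `use_osd_failure_domain`: it rewrites only `step chooseleaf ... type host` lines and keeps a `.bak` copy of the text file.

```shell
#!/usr/bin/env bash
# Switch every "step chooseleaf ... type host" line to "type osd".
# Bucket definitions ("host ceph-01 {") and the "type 1 host" declaration
# are left untouched because they do not start with "step chooseleaf".
# Usage: use_osd_failure_domain crushmap.txt
use_osd_failure_domain() {
  local map="$1"
  cp -- "$map" "$map.bak"   # keep the unedited text next to the original
  sed -E -i 's/^([[:space:]]*step[[:space:]]+chooseleaf[[:space:]].*type[[:space:]]+)host([[:space:]]*)$/\1osd\2/' "$map"
}
```

Afterwards, `grep chooseleaf crushmap.txt` should show the `type osd` step.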

Compile the edited map:

crushtool -c crushmap.txt -o crushmap-new
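Before loading the compiled map, crushtool can simulate placements offline. A sketch using crushtool's test mode; the wrapper name `verify_crushmap`, the default of 3 replicas, and the input range 0 to 1023 are assumptions chosen for illustration. Rule 0 is replicated_rule from the map above.

```shell
#!/usr/bin/env bash
# Simulate CRUSH placements with rule 0 and print every input that was
# mapped to fewer OSDs than requested ("bad mappings").
# Usage: verify_crushmap <compiled-map> [num-rep]
verify_crushmap() {
  local map="$1" reps="${2:-3}"
  crushtool -i "$map" --test --rule 0 --num-rep "$reps" \
    --min-x 0 --max-x 1023 --show-bad-mappings
}
```

Running `verify_crushmap crushmap` on the original binary map would be expected to list bad mappings on this single-host cluster, while `verify_crushmap crushmap-new` should print none. The exported binary `crushmap` also serves as a backup: `ceph osd setcrushmap -i crushmap` restores it.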

Load the new CRUSH map into the cluster:

ceph osd setcrushmap -i crushmap-new

Check the cluster status again (ceph -s); the PGs are now active:

  cluster:
    id:     6c0d548f-5da2-46a5-b289-f1cd16a31aa2
    health: HEALTH_WARN
            1 pool(s) have non-power-of-two pg_num

  services:
    mon: 1 daemons, quorum ceph-01 (age 52m)
    mgr: ceph-01(active, since 45m)
    osd: 3 osds: 3 up (since 50m), 3 in (since 50m)

  data:
    pools:   1 pools, 20 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 27 GiB / 30 GiB avail
    pgs:     20 active+clean
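PGs may take a moment to peer after the new map is loaded. A minimal polling sketch, assuming only that `ceph -s` prints a "pgs not active" line (the message in this article's title) while any PG is inactive; PGs in other states, such as unknown, are reported with different wording and are not caught by this check. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeeds when the `ceph -s` text on stdin has no "pgs not active" line.
pgs_all_active() {
  ! grep -q 'pgs not active'
}

# Poll `ceph -s` every 5 seconds until the check passes or the timeout hits.
# Usage: wait_for_active_pgs [timeout-seconds]
wait_for_active_pgs() {
  local timeout="${1:-300}" waited=0
  until ceph -s | pgs_all_active; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "PGs still not active after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "all PGs active"
}
```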

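Problem solved. The remaining HEALTH_WARN only reports that the pool's pg_num (20) is not a power of two; it is unrelated to the inactive PGs.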
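Instead of hand-editing the whole map, the same failure-domain change can be made by creating a separate replicated rule and pointing the pool at it. A sketch of that alternative route: `replicated_osd` is a hypothetical rule name, `default` is the root bucket from the decompiled map above, and the pool name is a placeholder.

```shell
#!/usr/bin/env bash
# Alternative: leave replicated_rule as it is and create a rule whose
# failure domain is "osd" under the "default" root, then switch the pool.
# Usage: switch_pool_to_osd_rule <pool-name>
switch_pool_to_osd_rule() {
  local pool="$1"
  # ceph osd crush rule create-replicated <name> <root> <failure-domain-type>
  ceph osd crush rule create-replicated replicated_osd default osd
  ceph osd pool set "$pool" crush_rule replicated_osd
}
```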

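Title: Fixing the "pgs: 100.000% pgs not active" problem

Author: Morning Star

Published: 2022-12-08 06:12

Last updated: 2022-12-08 07:12

Original link: https://www.mls-tech.info/ceph/ceph-pgs-100-precent-not-active/

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please keep the original link and author when reposting.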