You need to fix this first.


    pgs:     0.056% pgs unknown
             0.553% pgs not active



The backfilling will cause slow I/O, but having PGs unknown and not active 
will cause I/O blocking, which is what you're seeing when the VMs boot.



It seems you have 4 OSDs down; if you get them back online, you should be able 
to get all the PGs active again.
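
If you are not sure which ones are down, something like the following should 
show them and let you restart them on the affected host (the OSD id is a 
placeholder, adjust it to whatever the down list reports):

    ceph osd tree down                 # list only the OSDs that are down
    ceph health detail | grep -i down  # cross-check which hosts they are on

    # on the host that owns the down OSD:
    systemctl status ceph-osd@<id>
    systemctl restart ceph-osd@<id>

    # then watch the PGs peer and go active again
    ceph -w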



---- On Fri, 20 Sep 2019 14:14:01 +0800 Thomas <74cmo...@gmail.com> wrote ----


Hi, 
 
here I describe one of the two major issues I'm currently facing in my 8-node 
Ceph cluster (2x MDS, 6x OSD nodes). 
 
The issue is that I cannot start any virtual machine (KVM) or container 
(LXC); the boot process just hangs after a few seconds. 
All these KVMs and LXCs have in common that their virtual disks reside 
in the same pool: hdd. 
 
This pool hdd is relatively small compared to the largest pool: hdb_backup 
root@ld3955:~# rados df 
POOL_NAME                USED  OBJECTS  CLONES     COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED     RD_OPS        RD     WR_OPS       WR  USED COMPR  UNDER COMPR
backup                    0 B        0       0          0                   0        0         0          0       0 B          0      0 B         0 B          0 B
hdb_backup            589 TiB 51262212       0  153786636                   0        0    124895   12266095   4.3 TiB  247132863  463 TiB         0 B          0 B
hdd                   3.2 TiB   281884    6568     845652                   0        0      1658  275277357    16 TiB  208213922   10 TiB         0 B          0 B
pve_cephfs_data       955 GiB    91832       0     275496                   0        0      3038       2103  1021 MiB     102170  318 GiB         0 B          0 B
pve_cephfs_metadata   486 MiB       62       0        186                   0        0         7        860   1.4 GiB      12393  166 MiB         0 B          0 B
 
total_objects    51635990 
total_used       597 TiB 
total_avail      522 TiB 
total_space      1.1 PiB 
 
This is the current health status of the ceph cluster: 
  cluster: 
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae 
    health: HEALTH_ERR 
            1 filesystem is degraded 
            1 MDSs report slow metadata IOs 
            1 backfillfull osd(s) 
            87 nearfull osd(s) 
            1 pool(s) backfillfull 
            Reduced data availability: 54 pgs inactive, 47 pgs peering, 1 pg stale 
            Degraded data redundancy: 129598/154907946 objects degraded (0.084%), 33 pgs degraded, 33 pgs undersized 
            Degraded data redundancy (low space): 322 pgs backfill_toofull 
            1 subtrees have overcommitted pool target_size_bytes 
            1 subtrees have overcommitted pool target_size_ratio 
            1 pools have too many placement groups 
            21 slow requests are blocked > 32 sec 
 
  services: 
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 14h) 
    mgr: ld5507(active, since 16h), standbys: ld5506, ld5505 
    mds: pve_cephfs:1/1 {0=ld3955=up:replay} 1 up:standby 
    osd: 360 osds: 356 up, 356 in; 382 remapped pgs 
 
  data: 
    pools:   5 pools, 8868 pgs 
    objects: 51.64M objects, 197 TiB 
    usage:   597 TiB used, 522 TiB / 1.1 PiB avail 
    pgs:     0.056% pgs unknown 
             0.553% pgs not active 
             129598/154907946 objects degraded (0.084%) 
             2211119/154907946 objects misplaced (1.427%) 
             8458 active+clean 
             298  active+remapped+backfill_toofull 
             29   remapped+peering 
             24   active+undersized+degraded+remapped+backfill_toofull 
             22   active+remapped+backfill_wait 
             17   peering 
             5    unknown 
             5    active+recovery_wait+undersized+degraded+remapped 
             3    active+undersized+degraded+remapped+backfill_wait 
             2    activating+remapped 
             1    active+clean+remapped 
             1    stale+peering 
             1    active+remapped+backfilling 
             1    active+recovering+undersized+remapped 
             1    active+recovery_wait+degraded 
 
  io: 
    client:   9.2 KiB/s wr, 0 op/s rd, 1 op/s wr 
 
I believe the cluster is busy rebalancing pool hdb_backup. 
I set the balancer mode to upmap recently, after the 589 TiB of data had been written. 
root@ld3955:~# ceph balancer status 
{ 
    "active": true, 
    "plans": [], 
    "mode": "upmap" 
} 
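
For reference, the balancer can be paused while recovery is still running and 
re-enabled afterwards; whether that would actually help here is an open question:

    ceph balancer off      # pause automatic balancing while recovery runs
    ceph balancer status   # "active" should now report false
    ceph balancer on       # re-enable once the cluster is healthy again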
 
 
In order to resolve the issue with pool hdd I started some investigation. 
The first step was to install the drivers for the NICs provided by Mellanox. 
Then I configured some kernel parameters recommended by Mellanox 
<https://community.mellanox.com/s/article/linux-sysctl-tuning>. 
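
(The parameters from that article are applied via sysctl; the values below are 
only placeholders to illustrate the mechanism, the actual recommendations are 
in the linked article:)

    # /etc/sysctl.d/99-mellanox-tuning.conf  (example file name)
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216

    # load the file without rebooting
    sysctl -p /etc/sysctl.d/99-mellanox-tuning.conf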
 
However, this didn't fix the issue. 
In my opinion I must get rid of all the "slow requests are blocked" warnings. 
 
When I check the output of ceph health detail, every OSD listed under 
REQUEST_SLOW belongs to pool hdd. 
This means none of the disks belonging to pool hdb_backup shows a 
comparable behaviour. 
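
(For reference, the blocked requests on a given OSD can be inspected via its 
admin socket on the OSD host; osd.37 below is only an example id taken from 
the process list further down:)

    ceph health detail | grep -i slow
    # on the host running the slow OSD:
    ceph daemon osd.37 dump_ops_in_flight
    ceph daemon osd.37 dump_historic_ops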
 
Then I checked the running processes on the different OSD nodes; I use the 
tool "glances" for this. 
There I can see individual ceph-osd processes that have been running for hours 
and are consuming a lot of CPU, e.g. 
66.8   0.2   2.13G 1.17G 1192756 ceph        17h8:33 58    0 S  41M 2K   /usr/bin/ceph-osd -f --cluster ceph --id 37 --setuser ceph --setgroup ceph 
34.2   0.2   4.31G 1.20G  971267 ceph       15h38:46 58    0 S  14M 3K   /usr/bin/ceph-osd -f --cluster ceph --id 73 --setuser ceph --setgroup ceph 
 
Similar processes are running on 4 OSD nodes. 
All processes have in common that the relevant OSD belongs to pool hdd. 
 
Furthermore, glances gives me this alert: 
CRITICAL on CPU_IOWAIT (Min:1.9 Mean:2.3 Max:2.6): ceph-osd, ceph-osd, 
ceph-osd 
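
(The same picture can be cross-checked from the cluster side and on the OSD 
hosts; both commands below are standard tools, iostat requires the sysstat 
package:)

    ceph osd perf     # per-OSD commit/apply latency
    iostat -x 5       # on the OSD host: device utilisation and await times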
 
What can / should I do now? 
Kill the long-running processes? 
Stop the relevant OSDs? 
 
Please advise. 
 
THX 
Thomas 
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
