root@testk8s1:~# ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 12 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 11 flags hashpspool stripe_width 0
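All three pools are replicated size 3 with min_size 2, so with one of the three hosts down each PG should still hold two replicas and keep serving I/O (degraded, but active). For completeness, a minimal sketch of how those settings can be re-checked from the CLI (only the pool names from the listing above and the standard ceph osd pool get call are assumed):

# re-check replica settings per pool (pool names taken from the listing above)
for p in rbd cephfs_data cephfs_metadata; do
    ceph osd pool get "$p" size       # replicated size, expected: 3
    ceph osd pool get "$p" min_size   # minimum replicas needed to accept I/O, expected: 2
done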
I haven't changed any crush rule. Here's the dump:

root@testk8s1:~# ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_ruleset",
        "ruleset": 0,
        "type": 1,
        "min_size": 1,
        "max_size": 10,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

Kind regards,
Grigori

________________________________
From: Paul Emmerich <paul.emmer...@croit.io>
Sent: 7 June 2018 18:26
To: Grigori Frolov
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] I/O hangs when one of three nodes is down

Can you post your pool configuration? ceph osd pool ls detail, and the crush rule if you modified it.

Paul

2018-06-07 14:52 GMT+02:00 Фролов Григорий <gfro...@naumen.ru>:

Hello. Could you please help me troubleshoot an issue? I have 3 nodes in a cluster:

ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.02637 root default
-2 0.00879     host testk8s3
 0 0.00879         osd.0          up  1.00000          1.00000
-3 0.00879     host testk8s1
 1 0.00879         osd.1        down        0          1.00000
-4 0.00879     host testk8s2
 2 0.00879         osd.2          up  1.00000          1.00000

Each node runs ceph-osd, ceph-mon and ceph-mds. While all nodes are up, everything is fine. When any of the 3 nodes goes down, no matter whether it shuts down gracefully or is killed hard, the remaining nodes cannot read from or write to the directory where the Ceph storage is mounted, and they cannot unmount the volume either. Every process that touches the directory hangs forever in uninterruptible sleep; when I try to strace such a process, strace hangs too. Once the failed node comes back up, every hung process finishes successfully. What could be causing this?

root@testk8s2:~# ps -eo pid,stat,cmd | grep ls
 3700 D    ls --color=auto /mnt/db
 3997 S+   grep --color=auto ls
root@testk8s2:~# strace -p 3700&
[1] 4020
root@testk8s2:~# strace: Process 3700 attached
root@testk8s2:~# ps -eo pid,stat,cmd | grep strace
 4020 S    strace -p 3700
root@testk8s2:~# umount /mnt&
[2] 4084
root@testk8s2:~# ps -eo pid,state,cmd | grep umount
 4084 D    umount /mnt
root@testk8s2:~# ceph -v
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
root@testk8s2:~# ceph -s
    cluster 0bcc00ec-731a-4734-8d76-599f70f06209
     health HEALTH_ERR
            80 pgs degraded
            80 pgs stuck degraded
            80 pgs stuck unclean
            80 pgs stuck undersized
            80 pgs undersized
            recovery 1075/3225 objects degraded (33.333%)
            mds rank 2 has failed
            mds cluster is degraded
            1 mons down, quorum 1,2 testk8s2,testk8s3
     monmap e1: 3 mons at {testk8s1=10.105.6.116:6789/0,testk8s2=10.105.6.117:6789/0,testk8s3=10.105.6.118:6789/0}
            election epoch 120, quorum 1,2 testk8s2,testk8s3
      fsmap e14084: 2/3/3 up {0=testk8s2=up:active,1=testk8s3=up:active}, 1 failed
     osdmap e9939: 3 osds: 2 up, 2 in; 80 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v17491: 80 pgs, 3 pools, 194 MB data, 1075 objects
            1530 MB used, 16878 MB / 18408 MB avail
            1075/3225 objects degraded (33.333%)
                  80 active+undersized+degraded

Thanks.

Kind regards,
Grigori

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
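One note on the ceph -s output quoted above: the 80 PGs are all active+undersized+degraded, which should still allow reads and writes, whereas the failed MDS rank and the monitor that is out of quorum are more likely what leaves the CephFS clients hanging in D state. A minimal sketch of how that side can be checked on a surviving node, using only standard Ceph CLI calls (the testk8s2 prompt is simply taken from the output above):

root@testk8s2:~# ceph health detail     # spells out which mon is down and which MDS rank has failed
root@testk8s2:~# ceph quorum_status     # confirms the remaining monitors still form a quorum
root@testk8s2:~# ceph mds stat          # shows which MDS ranks are up:active and which are failed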
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com