[ceph-users] ceph is stuck after increasing pg_nums
Hi,

We have a Pacific cluster (16.2.4) with 30 servers and 30 osds. We started increasing the pg_num for the data bucket pool more than a month ago; I usually added 64 PGs in each step and never had any issue. The cluster was healthy before increasing the PGs. Today I added 128 PGs and the cluster is stuck with some unknown PGs and some others in the peering state. I've restarted a few OSDs with slow ops and even a few hosts, but it didn't change anything. We don't have any networking issues. Do you have any suggestions? Our service is completely down ...

  cluster:
    id:     322ef292-d129-11eb-96b2-a1b38fd61d55
    health: HEALTH_WARN
            Slow OSD heartbeats on back (longest 1517.814ms)
            Slow OSD heartbeats on front (longest 1551.680ms)
            Reduced data availability: 42 pgs inactive, 33 pgs peering
            1 pool(s) have non-power-of-two pg_num
            2888 slow ops, oldest one blocked for 6028 sec, daemons [osd.103,osd.115,osd.126,osd.129,osd.130,osd.138,osd.155,osd.174,osd.179,osd.181]... have slow ops.

  services:
    mon: 5 daemons, quorum osd-new-01,osd04,osd05,osd09,osd22 (age 11m)
    mgr: osd-new-01.babahi(active, since 11m), standbys: osd02.wqcizg
    osd: 311 osds: 311 up (since 3m), 311 in (since 3m); 29 remapped pgs
    rgw: 26 daemons active (26 hosts, 1 zones)

  data:
    pools:   8 pools, 2649 pgs
    objects: 590.57M objects, 1.5 PiB
    usage:   2.2 PiB used, 1.2 PiB / 3.4 PiB avail
    pgs:     0.340% pgs unknown
             1.246% pgs not active
             4056622/3539747751 objects misplaced (0.115%)
             2529 active+clean
             33   peering
             31   active+clean+laggy
             26   active+remapped+backfilling
             18   active+clean+scrubbing+deep
             9    unknown
             3    active+remapped+backfill_wait

  io:
    client:   38 KiB/s rd, 0 B/s wr, 37 op/s rd, 25 op/s wr
    recovery: 426 MiB/s, 158 objects/s
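For reference, the incremental approach described here comes down to something like the following sketch (pool name and numbers are taken from the health output later in this thread; in Pacific the mgr ramps pgp_num to follow pg_num on its own):

```
# one small step at a time, e.g. 2480 -> 2544, then wait for peering to settle
ceph osd pool set us-east-1.rgw.buckets.data pg_num 2544
ceph status   # proceed only once no PGs remain in peering/unknown
```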
[ceph-users] Re: ceph is stuck after increasing pg_nums
ceph health detail

HEALTH_WARN Reduced data availability: 42 pgs inactive, 33 pgs peering; 1 pool(s) have non-power-of-two pg_num; 2371 slow ops, oldest one blocked for 6218 sec, daemons [osd.103,osd.115,osd.126,osd.129,osd.130,osd.138,osd.155,osd.174,osd.179,osd.181]... have slow ops.
[WRN] PG_AVAILABILITY: Reduced data availability: 42 pgs inactive, 33 pgs peering
    pg 6.eb is stuck peering for 54m, current state peering, last acting [79,279,68,179,264,240]
    pg 6.10f is stuck peering for 36m, current state peering, last acting [288,161,37,63,178,240]
    pg 6.115 is stuck inactive for 14m, current state unknown, last acting []
    pg 6.139 is stuck inactive for 14m, current state unknown, last acting []
    pg 6.17e is stuck peering for 103m, current state peering, last acting [126,190,252,282,113,240]
    pg 6.1a5 is stuck peering for 103m, current state peering, last acting [41,158,240,177,66,228]
    pg 6.1ae is stuck peering for 103m, current state peering, last acting [186,240,162,221,289,219]
    pg 6.1eb is stuck peering for 36m, current state peering, last acting [220,240,184,226,205,254]
    pg 6.21b is stuck peering for 58m, current state peering, last acting [179,301,168,292,240,121]
    pg 6.26d is stuck peering for 36m, current state peering, last acting [68,305,240,47,137,184]
    pg 6.348 is stuck peering for 77m, current state peering, last acting [138,307,221,125,240,285]
    pg 6.369 is stuck peering for 54m, current state peering, last acting [35,66,240,254,58,179]
    pg 6.39f is stuck peering for 28m, current state peering, last acting [264,46,240,154,101,194]
    pg 6.3ca is stuck peering for 58m, current state peering, last acting [202,213,174,296,240,45]
    pg 6.3cb is stuck inactive for 14m, current state unknown, last acting []
    pg 6.3e1 is stuck peering for 77m, current state peering, last acting [115,168,240,85,56,26]
    pg 6.3f3 is stuck inactive for 14m, current state unknown, last acting []
    pg 6.473 is stuck peering for 36m, current state peering, last acting [265,53,77,240,182,92]
    pg 6.576 is stuck inactive for 14m, current state unknown, last acting []
    pg 6.5a6 is stuck peering for 103m, current state peering, last acting [257,37,240,54,263,68]
    pg 6.5eb is stuck inactive for 14m, current state unknown, last acting []
    pg 6.63f is stuck peering for 85m, current state peering, last acting [252,53,240,131,25,278]
    pg 6.655 is stuck peering for 103m, current state peering, last acting [103,267,222,308,240,277]
    pg 6.6d5 is stuck peering for 36m, current state peering, last acting [197,171,276,177,210,240]
    pg 6.6f2 is stuck peering for 85m, current state peering, last acting [174,122,81,129,304,240]
    pg 6.721 is stuck peering for 51m, current state peering, last acting [181,76,294,249,299,240]
    pg 6.757 is stuck peering for 23m, current state peering, last acting [288,194,213,240,37,22]
    pg 6.785 is stuck inactive for 14m, current state unknown, last acting []
    pg 6.793 is stuck peering for 77m, current state peering, last acting [155,301,240,294,214,265]
    pg 6.798 is stuck peering for 51m, current state peering, last acting [186,278,196,211,260,240]
    pg 6.79b is stuck peering for 54m, current state peering, last acting [186,25,108,240,300,39]
    pg 6.7b7 is stuck inactive for 14m, current state unknown, last acting []
    pg 6.7c5 is stuck peering for 103m, current state peering, last acting [130,179,266,240,162,294]
    pg 6.7df is stuck peering for 36m, current state peering, last acting [188,240,182,282,265,199]
    pg 6.83c is stuck peering for 77m, current state peering, last acting [155,81,228,65,207,240]
    pg 6.85f is stuck peering for 103m, current state peering, last acting [129,263,307,28,240,63]
    pg 6.917 is stuck peering for 54m, current state peering, last acting [84,179,240,295,92,269]
    pg 6.939 is stuck inactive for 14m, current state unknown, last acting []
    pg 6.97b is stuck peering for 103m, current state peering, last acting [34,96,293,129,147,240]
    pg 6.97e is stuck peering for 103m, current state peering, last acting [126,190,252,282,113,240]
    pg 6.9a5 is stuck peering for 103m, current state peering, last acting [41,158,240,186,66,228]
    pg 6.9ae is stuck peering for 103m, current state peering, last acting [186,240,162,221,289,219]
[WRN] POOL_PG_NUM_NOT_POWER_OF_TWO: 1 pool(s) have non-power-of-two pg_num
    pool 'us-east-1.rgw.buckets.data' pg_num 2480 is not a power of two
[WRN] SLOW_OPS: 2371 slow ops, oldest one blocked for 6218 sec, daemons [osd.103,osd.115,osd.126,osd.129,osd.130,osd.138,osd.155,osd.174,osd.179,osd.181]... have slow ops.

On 11/4/2022 10:45 AM, Adrian Nicolae wrote:
Hi, We have a Pacific cluster (16.2.4) with 30 servers and 30 osds. *snip*
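When PGs sit in peering/unknown like this, querying one of them usually points at the OSD holding things up. A minimal sketch (the pg id is just one of the stuck ones from the output above):

```
ceph pg dump_stuck inactive     # list all stuck PGs
ceph pg 6.eb query | less       # recovery_state / peering_blocked_by shows which OSD it is waiting on
```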
[ceph-users] What is the reason of the rgw_user_quota_bucket_sync_interval and rgw_bucket_quota_ttl values?
Hi,

One of my users told me that they can upload bigger files to the bucket than the limit allows. My question is mainly to the developers: what's the reason for setting rgw_bucket_quota_ttl=600 and rgw_user_quota_bucket_sync_interval=180? I don't want to set them to 0 before I know the reason 😃 With these settings, if the user has pretty high bandwidth, they can upload terabytes of data before the 10-minute limit is reached.

I set the following values on a specific bucket:

  "bucket_quota": {
      "enabled": true,
      "check_on_raw": false,
      "max_size": 524288000,
      "max_size_kb": 512000,
      "max_objects": 1126400

But they can still upload 600MB files. This article came to my attention: https://bugzilla.redhat.com/show_bug.cgi?id=1417775

It seems like if these values are set to 0:

  "name": "rgw_bucket_quota_ttl",
  "type": "int",
  "level": "advanced",
  "desc": "Bucket quota stats cache TTL",
  "long_desc": "Length of time for bucket stats to be cached within RGW instance.",
  "default": 600,

and

  "name": "rgw_user_quota_bucket_sync_interval",
  "type": "int",
  "level": "advanced",
  "desc": "User quota bucket sync interval",
  "long_desc": "Time period for accumulating modified buckets before syncing these stats.",
  "default": 180,

then uploads will be stopped right at the bucket limit.

Thank you
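For completeness, lowering the two options quoted above is just a config change. Whether 0 is safe for your load is exactly the open question, so treat this as a sketch rather than a recommendation:

```
# apply cluster-wide via the mon config store (an RGW restart may be needed to pick them up)
ceph config set global rgw_bucket_quota_ttl 0
ceph config set global rgw_user_quota_bucket_sync_interval 0
```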
[ceph-users] Re: How to remove remaining bucket index shard objects
Hi,

Mysteriously, the large omap objects alert recurred recently. The values for omap_used_mbytes and omap_used_keys are slightly different from the previous investigation, but very close. Our team is going to keep this cluster for investigation and create another cluster to work on. Therefore, my reply may be slow.

Previous values: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/TNQM2W4EDG3J33W7CML2JLCDNFDA6Q3W/

```
$ kubectl exec -n ceph-poc deploy/rook-ceph-tools -- ceph -s
  cluster:
    id:     49bd471e-84e6-412e-8ed0-75d7bc176657
    health: HEALTH_WARN
            25 large omap objects

  services:
    mon: 3 daemons, quorum b,d,f (age 36h)
    mgr: b(active, since 38h), standbys: a
    osd: 96 osds: 96 up (since 31h), 96 in (since 31h)
    rgw: 6 daemons active (6 hosts, 2 zones)

  data:
    pools:   16 pools, 4432 pgs
    objects: 10.74k objects, 34 GiB
    usage:   158 GiB used, 787 TiB / 787 TiB avail
    pgs:     4432 active+clean

  io:
    client: 2.2 KiB/s rd, 169 B/s wr, 2 op/s rd, 0 op/s wr
```

```
$ (header="id used_mbytes used_objects omap_used_mbytes omap_used_keys"
>  echo "${header}"
>  echo "${header}" | tr '[[:alpha:]_' '-'
>  kubectl exec -n ceph-poc deploy/rook-ceph-tools -- ceph pg ls-by-pool "${OSD_POOL}" --format=json | jq -r '.pg_stats | sort_by(.stat_sum.num_bytes) | .[] | (.pgid, .stat_sum.num_bytes/1024/1024, .stat_sum.num_objects, .stat_sum.num_omap_bytes/1024/1024, .stat_sum.num_omap_keys)' | paste - - - - -) | column -t
id    used_mbytes  used_objects  omap_used_mbytes    omap_used_keys
--    -----------  ------------  ----------------    --------------
6.0   0            0             0                   0
6.1   0            0             0                   0
6.2   0            0             86.14682674407959   298586
6.3   0            0             93.08089542388916   323902
6.4   0            1             0                   0
6.5   0            1             0                   0
6.6   0            0             0                   0
6.7   0            0             0                   0
6.8   0            0             0                   0
6.9   0            0             439.5090618133545   1524746
6.a   0            0             0                   0
6.b   0            0             3.4069366455078125  12416
6.c   0            0             0                   0
6.d   0            0             0                   0
6.e   0            0             0                   0
6.f   0            1             0                   0
6.10  0            1             0                   0
6.11  0            0             0                   0
6.12  0            0             7.727175712585449   28160
6.13  0            0             114.01904964447021  394996
6.14  0            0             0                   0
6.15  0            0             0                   0
6.16  0            0             0                   0
6.17  0            0             7.6217451095581055  27776
6.18  0            0             0                   0
6.19  0            1             0                   0
6.1a  0            1             0                   0
6.1b  0            0             0                   0
6.1c  0            0             88.36568355560303   306677
6.1d  0            0             0                   0
6.1e  0            1             0                   0
6.1f  0            0             0                   0
6.20  0            1             0                   0
6.21  0            0             0                   0
6.22  0            0             5.883256912231445   21440
6.23  0            0             0                   0
6.24  0            0             7.938144683837891   28928
6.25  0            0             0                   0
6.26  0            0             4.267669677734375   15552
6.27  0            1             0                   0
6.28  0            0             0                   0
6.29  0            0             2.1601409912109375  7872
6.2a  0            1             0                   0
6.2b  0            0             0                   0
6.2c  0            0             5.479369163513184   19968
6.2d  0            0             0                   0
6.2e  0            0             0                   0
6.2f  0            0             0                   0
6.30  0            0             0                   0
6.31  0            1             0                   0
6.32  0            1             0                   0
6.33  0            0             5.812973976135254   21184
6.34  0            0             0                   0
6.35  0            0             0                   0
6.36  0            0             5.865510940551758   21376
6.37  0            0             0                   0
6.38  0            0             93.97305393218994   327089
6.39  0            0             15.493829727172852  71787
6
```
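In case it helps the investigation, the objects behind the large-omap warning can also be ranked directly. A slow-but-simple sketch, reusing the same tools pod and the "${OSD_POOL}" variable from the script above:

```
# count omap keys per object in the pool and show the largest ones
kubectl exec -n ceph-poc deploy/rook-ceph-tools -- rados -p "${OSD_POOL}" ls |
  while read -r obj; do
    n=$(kubectl exec -n ceph-poc deploy/rook-ceph-tools -- rados -p "${OSD_POOL}" listomapkeys "$obj" | wc -l)
    echo "$n $obj"
  done | sort -rn | head
```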
[ceph-users] Re: [PHISHING VERDACHT] ceph is stuck after increasing pg_nums
Hi,

On 11/4/22 09:45, Adrian Nicolae wrote:
> Hi, We have a Pacific cluster (16.2.4) with 30 servers and 30 osds. We started increasing the pg_num for the data bucket pool more than a month ago; I usually added 64 PGs in each step and never had any issue. The cluster was healthy before increasing the PGs. Today I added 128 PGs and the cluster is stuck with some unknown PGs and some others in the peering state. I've restarted a few OSDs with slow ops and even a few hosts, but it didn't change anything. We don't have any networking issues. Do you have any suggestions? Our service is completely down ...

*snipsnap*

Do some of the OSDs exceed the PGs-per-OSD limit? If this is the case, the affected OSDs will not allow peering, and I/O to those OSDs will be stuck. You can check the number of PGs in the 'ceph osd df tree' output.

To solve this problem you can increase the limit, e.g. by setting 'mon_max_pg_per_osd' via 'ceph config'. The default limit is 200 AFAIK.

Regards,
Burkhard
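A rough way to check and, if needed, raise that limit (the option name is as above; 400 is just an example value):

```
ceph osd df tree                                # check the PGS column per OSD
ceph config get mon mon_max_pg_per_osd          # current limit
ceph config set global mon_max_pg_per_osd 400   # raise it if OSDs are over the limit
```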
[ceph-users] Re: PG Ratio for EC overwrites Pool
Thank you very much. I've increased it to 2*#OSD rounded up to the next power of 2.

Best
Ken

On 03.11.22 15:30, Anthony D'Atri wrote:
> PG count isn't just about storage size, it also affects performance, parallelism, and recovery. You want pgp_num for the RBD metadata pool to be at the VERY least the number of OSDs it lives on, rounded up to the next power of 2. I'd probably go for at least (2 x #OSD) rounded up. If you have too few, your metadata operations will contend with each other.
>
> On Nov 3, 2022, at 10:24, mailing-lists wrote:
>> Dear Ceph'ers,
>> I am wondering how to choose the number of PGs for an RBD EC pool. To be able to use RBD images on an EC pool, you need a regular replicated RBD pool as well as an EC pool with EC overwrites enabled. But how many PGs would you need for the replicated RBD pool? It doesn't seem to eat a lot of storage, so if I'm not mistaken it could actually be a quite low number of PGs, but is this recommended? Is there a best practice for this?
>> Best
>> Ken
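As a worked example of that rule of thumb (the OSD count and pool name here are hypothetical, not from this thread): with 48 OSDs backing the metadata pool, 2 x 48 = 96, and the next power of two is 128, so:

```
# replicated metadata pool for RBD-on-EC; 128 = next power of two above 2 * 48 OSDs
ceph osd pool set rbd-meta pg_num 128
```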
[ceph-users] Re: ceph is stuck after increasing pg_nums
The problem was a single OSD daemon (not reported in health detail) which slowed down the entire peering process; after restarting it, the cluster got back to normal.

On 11/4/2022 10:49 AM, Adrian Nicolae wrote:
> ceph health detail
> HEALTH_WARN Reduced data availability: 42 pgs inactive, 33 pgs peering; 1 pool(s) have non-power-of-two pg_num; 2371 slow ops, oldest one blocked for 6218 sec, daemons [osd.103,osd.115,osd.126,osd.129,osd.130,osd.138,osd.155,osd.174,osd.179,osd.181]... have slow ops.
> *snip*
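For anyone hitting the same situation later, a sketch of the kind of checks that can narrow down a misbehaving OSD that never shows up under SLOW_OPS (osd.240 is just an example taken from the acting sets above):

```
ceph pg 6.eb query | grep -A5 peering_blocked_by   # which OSD a stuck PG is waiting on
ceph daemon osd.240 dump_ops_in_flight             # run on that OSD's host; shows where its ops are stuck
ceph orch daemon restart osd.240                   # restart the suspect OSD (cephadm-managed clusters)
```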
[ceph-users] Re: What is the reason of the rgw_user_quota_bucket_sync_interval and rgw_bucket_quota_ttl values?
On Fri, 4 Nov 2022 at 10:48, Szabo, Istvan (Agoda) wrote:
> Hi,
> One of my users told me that they can upload bigger files to the bucket than the limit allows. My question is mainly to the developers: what's the reason for setting rgw_bucket_quota_ttl=600 and rgw_user_quota_bucket_sync_interval=180? I don't want to set them to 0 before I know the reason 😃
> With these settings, if the user has pretty high bandwidth, they can upload terabytes of data before the 10-minute limit is reached.

The reason is probably that reading/updating the stats and syncing them between rgws is a costly operation in time, so if you do it for every 4k object someone uploads, the overhead will be very noticeable. For us mortals whose systems do not allow for many TBs in <600s, the default timeout is mostly fine.

Even if you check when the file is about to be created, it could grow quite large while uploading, so you would still pass the limit after it has been finalized and closed. It's never 100% foolproof but more like a 'limit within reason'.

-- 
May the most significant bit of your life be positive.
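Related: the cached stats can also be refreshed on demand instead of waiting for the sync interval, which is sometimes enough to confirm that the quota itself works (uid and bucket are placeholders):

```
# recalculate and sync the user's bucket stats immediately
radosgw-admin user stats --uid=<uid> --sync-stats
radosgw-admin bucket stats --bucket=<bucket>   # check current size/object counts
```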
[ceph-users] Re: Question about quorum
Hi Tyler, thanks for clarifying, it makes total sense now.

Hypothetically, if there are failures and most of the mons stop, how can I re-initialize the cluster in its current state, or what can be done in that kind of case?

On Thu, 3 Nov 2022 at 17:00, Tyler Brekke wrote:
> Hi Murilo,
>
> Since we need a majority to maintain quorum, when you lost 2 mons you only had 50% available and lost quorum. This is why all recommendations specify having an odd number of mons, as you do not get any added availability with 4 instead of 3. If you had 5 mons, you could lose two without losing availability.
>
> On Thu, Nov 3, 2022, 2:55 PM Murilo Morais wrote:
>> Good afternoon everyone!
>>
>> I have a lab with 4 mons. I was testing the behavior when a certain number of hosts go offline; as soon as the second one went offline, everything stopped. It would be interesting to have a fifth node to ensure that if two fall everything keeps working. But why did everything stop with only 2 nodes down, when a 3-node cluster with one node down would still be working? Is there no way to get this behavior with 4 nodes?
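On the hypothetical: if a majority of mons is permanently gone, the documented escape hatch is to rebuild the monmap on a surviving mon so that the survivors form a majority again. A rough sketch (mon names and the systemd unit are placeholders; only do this when the dead mons are really not coming back):

```
# on a surviving monitor host, with its ceph-mon stopped
systemctl stop ceph-mon@mon-a
ceph-mon -i mon-a --extract-monmap /tmp/monmap
monmaptool /tmp/monmap --rm mon-c --rm mon-d   # drop the dead monitors
ceph-mon -i mon-a --inject-monmap /tmp/monmap
systemctl start ceph-mon@mon-a
```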
[ceph-users] ceph filesystem stuck in read only
Hi,

I'm looking for some help/ideas/advice in order to solve the problem that occurs on my metadata server after the server reboot. "ceph status" warns about my MDS being "read only", but the filesystem and the data seem healthy. It is still possible to access the content of my cephfs volumes since it's read only, but I don't know how to make my filesystem writable again.

The log keeps showing the same error when I restart the MDS server:

2022-11-04T11:50:14.506+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map state change up:reconnect --> up:rejoin
2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872 rejoin_start
2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872 rejoin_joint_start
2022-11-04T11:50:14.702+0100 7fbbf83c2700  1 mds.0.6872 rejoin_done
2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.node3-5 Updating MDS map to version 6881 from mon.3
2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map i am now mds.0.6872
2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map state change up:rejoin --> up:active
2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 recovery_done -- successful recovery!
2022-11-04T11:50:15.550+0100 7fbbf83c2700  1 mds.0.6872 active_start
2022-11-04T11:50:15.558+0100 7fbbf83c2700  1 mds.0.6872 cluster recovered.
2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging: rank=1 was never sent ping request.
2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache.dir(0x106cf14) commit error -22 v 1933183
2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x106cf14 object, errno -22
2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 mds.0.6872 unhandled write error (22) Invalid argument, force readonly...
2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache force file system read-only
2022-11-04T11:50:18.554+0100 7fbbf23b6700  0 log_channel(cluster) log [WRN] : force file system read-only

More info:

  cluster:
    id:     f36b996f-221d-4bcb-834b-19fc20bcad6b
    health: HEALTH_WARN
            1 MDSs are read only
            1 MDSs behind on trimming

  services:
    mon: 5 daemons, quorum node2-4,node2-5,node3-4,node3-5,node1-1 (age 22h)
    mgr: node2-4(active, since 28h), standbys: node2-5, node3-4, node3-5, node1-1
    mds: 3/3 daemons up, 3 standby
    osd: 112 osds: 112 up (since 22h), 112 in (since 2w)

  data:
    volumes: 2/2 healthy
    pools:   12 pools, 529 pgs
    objects: 8.54M objects, 1.9 TiB
    usage:   7.8 TiB used, 38 TiB / 46 TiB avail
    pgs:     491 active+clean
             29  active+clean+snaptrim
             9   active+clean+snaptrim_wait

All MDSs, MONs and OSDs are on version 16.2.9.
[ceph-users] Re: Question about quorum
On Fri, 4 Nov 2022 at 13:37, Murilo Morais wrote:
> Hi Tyler, thanks for clarifying, it makes total sense now.
> Hypothetically, if there are failures and most of the mons stop, how can I re-initialize the cluster in its current state, or what can be done in that kind of case?

Just add one more mon so you have 5?

-- 
May the most significant bit of your life be positive.
[ceph-users] Re: Upgrade/migrate host operating system for ceph nodes (CentOS/Rocky)
I have upgraded the majority of the nodes in a cluster that I manage from CentOS 8.6 to AlmaLinux 9. We have done the upgrade by emptying one node at a time, then reinstalling it and bringing it back into the cluster. With AlmaLinux 9 I install the default "Server without GUI" packages and run with default SELinux and firewall settings with good results.

Before starting the upgrade we first upgraded to Ceph 17.2.3. On a new empty node I first add cephadm and then run "cephadm add-repo --release quincy", then install "ceph-common", "cephadm" and the dependencies with dnf.

I have been doing the upgrade very slowly on purpose over several weeks and it has not been an issue for the users. The upgrade is not finished yet; 3 storage nodes are still in progress and my rados gateways will be last.

/Jimmy

On Thu, Nov 3, 2022 at 4:02 PM Prof. Dr. Christian Dietrich wrote:
>
> Hi all,
>
> we're running a ceph cluster with v15.2.17 and cephadm on various CentOS
> hosts. Since CentOS 8.x is EOL, we'd like to upgrade/migrate/reinstall
> the OS, possibly migrating to Rocky or CentOS stream:
>
> host | CentOS   | Podman
> -----|----------|---------
> osd* | 7.9.2009 | 1.6.4 x5
> osd* | 8.4.2105 | 3.0.1 x2
> mon0 | 8.4.2105 | 3.2.3
> mon1 | 8.4.2105 | 3.0.1
> mon2 | 8.4.2105 | 3.0.1
> mds* | 7.9.2009 | 1.6.4 x2
>
> We have a few specific questions:
> 1) Does anyone have experience using Rocky Linux 8 or 9 or CentOS stream
> with ceph? Rocky is not mentioned specifically in the cephadm docs [2].
>
> 2) Is the Podman compatibility list [1] still up to date? CentOS Stream
> 8 as of 2022-10-19 appears to have Podman version 4.x, IIRC. 4.x does
> not appear in the compatibility table. Anyone using Podman 4.x
> successfully (with which ceph version)?
>
> Thanks in advance,
>
> Chris
>
> [1]: https://docs.ceph.com/en/quincy/cephadm/compatibility/#compatibility-with-podman-versions
>
> [2]: https://docs.ceph.com/en/quincy/cephadm/install/#cephadm-install-distros
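The per-node cycle described above boils down to roughly this sketch (host name is a placeholder; the drain/add steps assume a cephadm-managed cluster on Quincy):

```
ceph orch host drain node01            # evacuate daemons/OSDs from the node
# ...reinstall the OS (AlmaLinux 9 "Server without GUI"), copy the cephadm binary over, then:
./cephadm add-repo --release quincy
dnf install -y cephadm ceph-common
ceph orch host add node01              # bring the node back into the cluster
```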
[ceph-users] Re: OSDs are not utilized evenly
Hi Denis, can you share the following data points?

ceph osd df tree (to see how the osd's are distributed)
ceph osd crush rule dump (to see what your ec rule looks like)
ceph osd pool ls detail (to see the pools, the pool-to-crush-rule mapping and the pg nums)

Also, regarding:

"optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect"

is the autoscaler currently adjusting your pg counts?

-Joseph

On Wed, Nov 2, 2022 at 5:01 PM Denis Polom wrote:
> Hi Joseph,
>
> thank you for the answer. But if I'm looking correctly at the 'ceph osd df' output I posted, I see there are about 195 PGs per OSD.
>
> There are 608 OSDs in the pool, which is the only data pool. From what I have calculated, the PG calc says that the PG number is fine.
>
> On 11/1/22 14:03, Joseph Mundackal wrote:
>> If the GB per pg is high, the balancer module won't be able to help.
>> Your pg count per osd also looks low (30's), so increasing pgs per pool would help with both problems.
>> You can use the pg calculator to determine which pools need what.
>>
>> On Tue, Nov 1, 2022, 08:46 Denis Polom wrote:
>>> Hi
>>>
>>> I observed on my Ceph cluster running latest Pacific that same-size OSDs are utilized differently even though the balancer is running and reports its status as perfectly balanced.
>>>
>>> {
>>>     "active": true,
>>>     "last_optimize_duration": "0:00:00.622467",
>>>     "last_optimize_started": "Tue Nov 1 12:49:36 2022",
>>>     "mode": "upmap",
>>>     "optimize_result": "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect",
>>>     "plans": []
>>> }
>>>
>>> balancer settings for upmap are:
>>>
>>>     mgr  advanced  mgr/balancer/mode                     upmap
>>>     mgr  advanced  mgr/balancer/upmap_max_deviation      1
>>>     mgr  advanced  mgr/balancer/upmap_max_optimizations  20
>>>
>>> It's obvious that utilization is not the same (the difference is about 1 TB) from the `ceph osd df` command. The following is just a partial output:
>>>
>>> ID   CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE  DATA    OMAP     META    AVAIL    %USE   VAR   PGS  STATUS
>>>   0  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  3.0 MiB  37 GiB  3.6 TiB  78.09  1.05  196  up
>>> 124  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  32 GiB  4.7 TiB  71.20  0.96  195  up
>>> 157  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  5.3 MiB  35 GiB  3.7 TiB  77.67  1.05  195  up
>>>   1  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  2.0 MiB  35 GiB  3.7 TiB  77.69  1.05  195  up
>>> 243  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195  up
>>> 244  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  31 GiB  4.7 TiB  71.19  0.96  195  up
>>> 245  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  32 GiB  4.7 TiB  71.55  0.96  196  up
>>> 246  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  31 GiB  4.7 TiB  71.17  0.96  195  up
>>> 249  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  30 GiB  4.7 TiB  71.18  0.96  195  up
>>> 500  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  30 GiB  4.7 TiB  71.19  0.96  195  up
>>> 501  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  31 GiB  4.7 TiB  71.57  0.96  196  up
>>> 502  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  31 GiB  4.7 TiB  71.18  0.96  195  up
>>> 532  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB      0 B  31 GiB  4.7 TiB  71.16  0.96  195  up
>>> 549  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  576 KiB  36 GiB  3.7 TiB  77.70  1.05  195  up
>>> 550  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  3.8 MiB  36 GiB  3.7 TiB  77.67  1.05  195  up
>>> 551  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  2.4 MiB  35 GiB  3.7 TiB  77.68  1.05  195  up
>>> 552  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  5.5 MiB  35 GiB  3.7 TiB  77.69  1.05  195  up
>>> 553  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  5.1 MiB  37 GiB  3.6 TiB  77.71  1.05  195  up
>>> 554  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  967 KiB  36 GiB  3.6 TiB  77.71  1.05  195  up
>>> 555  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  1.3 MiB  36 GiB  3.6 TiB  78.08  1.05  196  up
>>> 556  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  4.7 MiB  36 GiB  3.6 TiB  78.10  1.05  196  up
>>> 557  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  2.4 MiB  36 GiB  3.7 TiB  77.69  1.05  195  up
>>> 558  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  4.5 MiB  36 GiB  3.6 TiB  77.72  1.05  195  up
>>> 559  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  1.5 MiB  35 GiB  3.6 TiB  78.09  1.05  196
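A compact way to show the spread being discussed, instead of pasting the whole table (a sketch; field names follow the JSON output of `ceph osd df`):

```
# min/max utilization (%) and PG count across all OSDs
ceph osd df --format=json | jq '[.nodes[].utilization] | min, max'
ceph osd df --format=json | jq '[.nodes[].pgs] | min, max'
```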
[ceph-users] Re: ceph filesystem stuck in read only
On Fri, Nov 4, 2022 at 9:36 AM Galzin Rémi wrote:
>
> Hi,
> I'm looking for some help/ideas/advice in order to solve the problem
> that occurs on my metadata server after the server reboot.

You rebooted an MDS's host and your file system became read-only? Was the Ceph cluster healthy before the reboot? Any issues with the MDSs, OSDs? Did this happen after an upgrade?

> "ceph status" warns about my MDS being "read only", but the filesystem and
> the data seem healthy. It is still possible to access the content of my
> cephfs volumes since it's read only, but I don't know how to make my
> filesystem writable again.
>
> The log keeps showing the same error when I restart the MDS server:
>
> *snip*
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache.dir(0x106cf14) commit error -22 v 1933183
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 log_channel(cluster) log [ERR] : failed to commit dir 0x106cf14 object, errno -22
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 mds.0.6872 unhandled write error (22) Invalid argument, force readonly...
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache force file system read-only

The MDS is unable to write a metadata object to the OSD. Set debug_mds=20 and debug_objecter=20 for the MDS, and capture the MDS logs when this happens for more details. e.g.,

$ ceph config set mds.<name> debug_mds 20

Also, check the OSD logs when you're hitting this issue. You can then reset the MDS log level. You can share the relevant MDS and OSD logs using https://docs.ceph.com/en/pacific/man/8/ceph-post-file/

> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  0 log_channel(cluster) log [WRN] : force file system read-only
>
> More info:
>
> *snip*
>
> All MDSs, MONs and OSDs are on version 16.2.9.

What are the outputs of `ceph fs status` and `ceph fs dump`?

-Ramana
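For reference, the full capture-and-share cycle suggested above looks roughly like this (the MDS name is a placeholder and the log path assumes a default, non-containerized layout):

```
ceph config set mds.<name> debug_mds 20
ceph config set mds.<name> debug_objecter 20
# ...reproduce the failed commit, then upload the log for the list/devs:
ceph-post-file /var/log/ceph/ceph-mds.<name>.log
# reset the debug levels afterwards
ceph config rm mds.<name> debug_mds
ceph config rm mds.<name> debug_objecter
```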