[ceph-users] Ceph monitor won't start after Ubuntu update
Hello Ceph-users,

I've upgraded my Ubuntu server from 18.04.5 LTS to Ubuntu 20.04.2 LTS via 'do-release-upgrade'. During that process the ceph packages were upgraded from Luminous to Octopus, and now the ceph-mon daemon (I have only one) won't start. The log error is:

    "2021-06-15T20:23:41.843+ 7fbb55e9b540 -1 mon.target@-1(probing) e2 current monmap has recorded min_mon_release 12 (luminous) is >2 releases older than installed 15 (octopus); you can only upgrade 2 releases at a time you should first upgrade to 13 (mimic) or 14 (nautilus) stopping."

Is there any way to get the cluster running, or at least get the data from the OSDs? I will appreciate any help.

Thank you

--
Best regards,
Petr
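In case it matters, I believe the recorded value can be read back from the mon's own store with the Octopus tools, roughly like this (a sketch; "target" is my mon id from the log line above, paths are the defaults):

    # The mon is already down; extract its current monmap from the mon store
    ceph-mon -i target --extract-monmap /tmp/monmap
    # Print it - the output should include the recorded min_mon_release
    monmaptool --print /tmp/monmap

This only confirms what the error already says (min_mon_release 12), it does not change it.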
[ceph-users] Re: Ceph monitor won't start after Ubuntu update
Hello Konstantin,

Wednesday, June 16, 2021, 1:50:55 PM, you wrote:

> Hi,
>
>> On 16 Jun 2021, at 01:33, Petr wrote:
>>
>> I've upgraded my Ubuntu server from 18.04.5 LTS to Ubuntu 20.04.2 LTS via 'do-release-upgrade',
>> during that process ceph packages were upgraded from Luminous to Octopus and
>> now ceph-mon daemon (I have only one) won't start, log error is:
>> "2021-06-15T20:23:41.843+ 7fbb55e9b540 -1 mon.target@-1(probing) e2
>> current monmap has recorded min_mon_release 12 (luminous) is >2 releases
>> older than installed 15 (octopus);
>> you can only upgrade 2 releases at a time you should first upgrade to 13
>> (mimic) or 14 (nautilus) stopping."
>>
>> Is there any way to get cluster running or at least get data from OSDs?

> Ceph does not support a +3 release upgrade, only +1 or +2.

Yep, already got that.

> I suggest to install Nautilus packages and start the cluster again

I would like to, but there are no Nautilus packages for Ubuntu 20 (focal).

> k

--
Best regards,
Petr
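For anyone hitting the same wall, the quickest check of what is actually installable on the focal host seems to be something like this (a sketch; the curl line assumes directory listings are enabled on download.ceph.com):

    # Which ceph-mon builds do the configured APT repos offer on focal?
    apt-cache madison ceph-mon
    # Does the upstream Nautilus repo publish a focal build at all?
    curl -s https://download.ceph.com/debian-nautilus/dists/ | grep -i focal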
[ceph-users] Large amount of empty objects in unused cephfs data pool
I created a cephfs using the mgr dashboard, which created two pools: cephfs.fs.meta and cephfs.fs.data.

We are using custom provisioning for user-defined volumes (users provide yaml manifests with a definition of what they want), which creates dedicated data pools for them, so cephfs.fs.data is never used for anything; it's literally empty despite the dashboard reporting about 120kb in there (probably metadata files from the new subvolume API that we are using).

I wanted to decrease the number of PGs in that unused cephfs.fs.data pool, and to my surprise I received a warning about uneven object distribution (too many objects per PG). So I dived deeper and figured out that there are in fact many objects in the seemingly empty cephfs.fs.data:

    PROD [root@ceph-drc-mgmt ~]# rados -p cephfs.fs.data ls | head
    12a3b4a.
    11a1876.
    11265b5.
    12af216.
    14e07ec.
    153a31f.
    1455214.
    13e4c36.
    149e91a.
    15d0bc7.

When I tried dumping any of those objects, they are empty - 0 bytes each. But there are over 7 million of them:

    PROD [root@ceph-drc-mgmt ~]# rados -p cephfs.fs.data ls | wc -l
    7260394

Why does that unused pool contain 7 million empty objects? Is that some kind of bug in the MDS? It's 18.2.2.

Thanks
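My guess - and it is only a guess - is that these are CephFS backtrace objects, which the MDS keeps in the first data pool of a filesystem even when the file data itself is directed to other pools via layouts. A way to check, by picking one object and looking at its xattrs (the object name below is hypothetical, since the listing above is truncated):

    OBJ=12a3b4a.00000000   # hypothetical full object name
    rados -p cephfs.fs.data listxattr "$OBJ"
    # If there is a "parent" xattr, it can be decoded into the file's path:
    rados -p cephfs.fs.data getxattr "$OBJ" parent > /tmp/parent.bin
    ceph-dencoder type inode_backtrace_t import /tmp/parent.bin decode dump_json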
[ceph-users] MDS not becoming active after migrating to cephadm
Hi,

We've recently upgraded from Nautilus to Pacific, and tried moving our services to cephadm/ceph orch. For some reason, MDS nodes deployed through orch never become active (or at least standby-replay). Non-dockerized MDS nodes can still be deployed and work fine. The non-dockerized mds version is 16.2.6, the docker image version is 16.2.5-387-g7282d81d (came as a default).

In the MDS log, the only related message is the monitors assigning the MDS as standby. Increasing the log level does not help much, it only adds beacon messages. The monitor log also contains no differences compared to a non-dockerized MDS startup. The mds metadata command output is identical to that of a non-dockerized MDS.

The only difference I can see in the log is the value in curly braces after the node name, e.g. mds.storage{0:1234ff}. For the dockerized MDS, the first value is , for the non-dockerized one it's zero. Compat flags are identical.

Could someone please advise me why the dockerized MDS is stuck as a standby? Maybe some config values are missing or something?

Best regards,
Petr
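For reference, this is roughly what I've been comparing between the dockerized and non-dockerized setups (a sketch, nothing here is a fix):

    ceph fs dump                      # ranks, which daemon holds rank 0, fs compat set
    ceph mds stat
    ceph versions                     # the versions the mons actually see
    ceph orch ps --daemon-type mds    # image/version of each cephadm-managed MDS
    ceph health detail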
[ceph-users] Re: MDS not becoming active after migrating to cephadm
Hi Weiwen,

Yes, we did that during the upgrade. In fact, we did that multiple times even after the upgrade to see if it would resolve the issue (disabling hot standby, scaling everything down to a single MDS, swapping it with the new one, scaling back up).

The upgrade itself went fine; problems started during the migration to cephadm (which was done after migrating everything to Pacific). It only occurs when using a dockerized MDS. Non-dockerized MDS nodes, also Pacific, run fine.

Petr

> On 4 Oct 2021, at 12:43, 胡 玮文 wrote:
>
> Hi Petr,
>
> Please read https://docs.ceph.com/en/latest/cephfs/upgrading/ for the MDS upgrade procedure.
>
> In short, when upgrading to 16.2.6, you need to disable standby-replay and
> reduce the number of ranks to 1.
>
> Weiwen Hu
>
> Sent from Mail for Windows
>
> From: Petr Belyaev <p.bely...@alohi.com>
> Sent: 4 October 2021, 18:00
> To: ceph-users@ceph.io
> Subject: [ceph-users] MDS not becoming active after migrating to cephadm
>
> Hi,
>
> We've recently upgraded from Nautilus to Pacific, and tried moving our
> services to cephadm/ceph orch.
> For some reason, MDS nodes deployed through orch never become active (or at
> least standby-replay). Non-dockerized MDS nodes can still be deployed and
> work fine. Non-dockerized mds version is 16.2.6, docker image version is
> 16.2.5-387-g7282d81d (came as a default).
>
> In the MDS log, the only related message is monitors assigning MDS as
> standby. Increasing the log level does not help much, it only adds beacon
> messages.
> Monitor log also contains no differences compared to a non-dockerized MDS
> startup.
> Mds metadata command output is identical to that of a non-dockerized MDS.
>
> The only difference I can see in the log is the value in curly braces after
> the node name, e.g. mds.storage{0:1234ff}. For dockerized MDS, the first
> value is , for non-dockerized it's zero. Compat flags are identical.
>
> Could someone please advise me why the dockerized MDS is being stuck as a
> standby? Maybe some config values missing or smth?
>
> Best regards,
> Petr
[ceph-users] Re: MDS not becoming active after migrating to cephadm
Just tried it: stopped all mds nodes and created one using orch. Result: 0/1 daemons up (1 failed), 1 standby. Same as before, and the logs don't show any errors either.

I'll probably try upgrading the orch-based setup to 16.2.6 over the weekend to match the exact non-dockerized MDS version, maybe it will work.

> On 4 Oct 2021, at 13:41, 胡 玮文 wrote:
>
> By saying upgrade, I mean upgrade from the non-dockerized 16.2.5 to cephadm
> version 16.2.6. So I think you need to disable standby-replay and reduce the
> number of ranks to 1, then stop all the non-dockerized mds, deploy new mds
> with cephadm. Only scale back up after you finish the migration. Did you
> also try that?
>
> In fact, a similar issue has been reported several times on this list when
> upgrading mds to 16.2.6, e.g. [1]. I have faced that too. So I'm pretty
> confident that you are facing the same issue.
>
> [1]: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KQ5A5OWRIUEOJBC7VILBGDIKPQGJQIWN/
>
>> On 4 Oct 2021, at 19:00, Petr Belyaev wrote:
>>
>> Hi Weiwen,
>>
>> Yes, we did that during the upgrade. In fact, we did that multiple times
>> even after the upgrade to see if it will resolve the issue (disabling hot
>> standby, scaling everything down to a single MDS, swapping it with the new
>> one, scaling back up).
>>
>> The upgrade itself went fine, problems started during the migration to
>> cephadm (which was done after migrating everything to Pacific).
>> It only occurs when using dockerized MDS. Non-dockerized MDS nodes, also
>> Pacific, everything runs fine.
>>
>> Petr
>>
>>> On 4 Oct 2021, at 12:43, 胡 玮文 <huw...@outlook.com> wrote:
>>>
>>> Hi Petr,
>>>
>>> Please read https://docs.ceph.com/en/latest/cephfs/upgrading/ for the MDS
>>> upgrade procedure.
>>>
>>> In short, when upgrading to 16.2.6, you need to disable standby-replay and
>>> reduce the number of ranks to 1.
>>>
>>> Weiwen Hu
>>>
>>> Sent from Mail for Windows
>>>
>>> From: Petr Belyaev <p.bely...@alohi.com>
>>> Sent: 4 October 2021, 18:00
>>> To: ceph-users@ceph.io
>>> Subject: [ceph-users] MDS not becoming active after migrating to cephadm
>>>
>>> Hi,
>>>
>>> We've recently upgraded from Nautilus to Pacific, and tried moving our
>>> services to cephadm/ceph orch.
>>> For some reason, MDS nodes deployed through orch never become active (or at
>>> least standby-replay). Non-dockerized MDS nodes can still be deployed and
>>> work fine. Non-dockerized mds version is 16.2.6, docker image version is
>>> 16.2.5-387-g7282d81d (came as a default).
>>>
>>> In the MDS log, the only related message is monitors assigning MDS as
>>> standby. Increasing the log level does not help much, it only adds beacon
>>> messages.
>>> Monitor log also contains no differences compared to a non-dockerized MDS
>>> startup.
>>> Mds metadata command output is identical to that of a non-dockerized MDS.
>>>
>>> The only difference I can see in the log is the value in curly braces after
>>> the node name, e.g. mds.storage{0:1234ff}. For dockerized MDS, the first
>>> value is , for non-dockerized it's zero. Compat flags are identical.
>>>
>>> Could someone please advise me why the dockerized MDS is being stuck as a
>>> standby? Maybe some config values missing or smth?
>>>
>>> Best regards,
>>> Petr
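For the record, the upgrade attempt itself will be roughly this (a sketch, assuming the default quay.io image naming):

    ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6
    # or equivalently
    ceph orch upgrade start --ceph-version 16.2.6
    # and to watch progress
    ceph orch upgrade status
    ceph -W cephadm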
[ceph-users] Testing CEPH scrubbing / self-healing capabilities
Hello,

I wanted to try out (in a lab ceph setup) what exactly is going to happen when part of the data on an OSD disk gets corrupted. I created a simple test where I was going through the block device data until I found something that resembled user data (using dd and hexdump); /dev/sdd is the block device used by the OSD:

    INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | hexdump -C
    6e 20 69 64 3d 30 20 65 78 65 3d 22 2f 75 73 72 |n id=0 exe="/usr|
    0010 2f 73 62 69 6e 2f 73 73 68 64 22 20 68 6f 73 74 |/sbin/sshd" host|

Then I deliberately overwrote those 32 bytes with random data:

    INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/urandom of=/dev/sdd bs=32 count=1 seek=33920
    INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | hexdump -C
    25 75 af 3e 87 b0 3b 04 78 ba 79 e3 64 fc 76 d2 |%u.>..;.x.y.d.v.|
    0010 9e 94 00 c2 45 a5 e1 d2 a8 86 f1 25 fc 18 07 5a |E..%...Z|

At this point I would expect some sort of data corruption. I restarted the OSD daemon on this host to make sure it flushes any potentially buffered data. It restarted OK without noticing anything, which was expected. Then I ran:

    ceph osd scrub 5
    ceph osd deep-scrub 5

and waited for all scheduled scrub operations for all PGs to finish. No inconsistency was found. No errors were reported, the scrubs just finished OK, and the data is still visibly corrupt via hexdump.

Did I just hit some block of data that WAS used by the OSD but was marked deleted and therefore no longer used, or am I missing something? I would expect Ceph to detect the disk corruption and automatically replace the invalid data with a valid copy. I use only replicated pools in this lab setup, for RBD and CephFS.

Thanks
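In case it helps make this more deterministic, I understand scrubbing can also be aimed at a single PG rather than a whole OSD, and the findings inspected afterwards (a sketch; pool, object and PG names are placeholders):

    # Which PG and OSDs hold a given object?
    ceph osd map <pool> <object-name>
    # Deep-scrub just that PG
    ceph pg deep-scrub <pgid>
    # Once a scrub error is reported, show the inconsistent objects and repair
    rados list-inconsistent-obj <pgid> --format=json-pretty
    ceph pg repair <pgid>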
[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities
Hello,

No, I don't have osd_scrub_auto_repair enabled. Interestingly, about a week later, after I had forgotten about this, an error manifested:

    [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
    [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
        pg 4.1d is active+clean+inconsistent, acting [4,2]

which could be repaired as expected, since I damaged only 1 OSD. It's interesting that it took a whole week to find it.

For some reason it seems that running a deep-scrub on an entire OSD only runs it for PGs where that OSD is considered "primary", so maybe that's why it wasn't detected when I ran it manually?
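A quick way to see the difference between those two sets of PGs (a sketch, using osd.5 from the earlier scrub commands as the example):

    # PGs for which osd.5 is primary - apparently what "ceph osd deep-scrub 5" covers
    ceph pg ls-by-primary osd.5
    # All PGs that have any replica on osd.5
    ceph pg ls-by-osd osd.5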
[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities
Most likely it wasn't. The ceph help / documentation is not very clear about this:

    osd deep-scrub <who>    initiate deep scrub on osd <who>, or use <all|any> to deep scrub all

It doesn't say anything like "initiate deep scrub of primary PGs on osd <who>". I assumed it just runs a scrub of everything on the given OSD.
[ceph-users] Documentation for meaning of "tag cephfs" in OSD caps
Hello,

In https://docs.ceph.com/en/latest/cephfs/client-auth/ we can find that

    ceph fs authorize cephfs_a client.foo / r /bar rw

results in

    client.foo
      key: *key*
      caps: [mds] allow r, allow rw path=/bar
      caps: [mon] allow r
      caps: [osd] allow rw tag cephfs data=cephfs_a

What is this "tag cephfs" thing? It seems like some undocumented black magic to me, since I can't find anything that documents it. Can someone explain how it works under the hood? What does it expand to? What does it limit, and how?
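The closest I've got to an answer so far (unverified, so please correct me): it doesn't seem to expand to a pool list at all - "allow rw tag cephfs data=cephfs_a" apparently matches any pool whose application metadata for the "cephfs" application contains the key/value pair data=cephfs_a, and CephFS tags its pools like that automatically. A sketch of what I mean (the pool name is hypothetical):

    # Show the application metadata on a CephFS data pool
    ceph osd pool application get cephfs_a_data cephfs
    # This is the kind of tagging "ceph fs new" appears to do behind the scenes
    ceph osd pool application set <pool> cephfs data cephfs_a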
[ceph-users] Multisite RGW setup not working when following the docs step by step
Hello,

My goal is to set up multisite RGW with 2 separate CEPH clusters in separate datacenters, where the RGW data are being replicated. I created a lab for this purpose in both locations (with the latest reef ceph installed using cephadm) and tried to follow this guide: https://docs.ceph.com/en/reef/radosgw/multisite/

Unfortunately, even after multiple attempts it always failed when creating the secondary zone. I could successfully pull the realm from the master, but that was pretty much the last truly successful step. I can notice that immediately after pulling the realm to the secondary, radosgw-admin user list returns an empty list (which IMHO should contain the replicated user list from the master). Continuing by setting the default realm and zonegroup and creating the secondary zone in the secondary cluster, I end up having 2 zones in each cluster, both seemingly in the same zonegroup, but with replication failing - this is what I see in sync status:

    (master) [ceph: root@ceph-lab-brn-01 /]# radosgw-admin sync status
              realm d2c4ebf9-e156-4c4e-9d56-3fff6a652e75 (ceph)
          zonegroup abc3c0ae-a84d-48d4-8e78-da251eb78781 (cz)
               zone 97fb5842-713a-4995-8966-5afe1384f17f (cz-brn)
       current time 2023-08-30T12:58:12Z
    zonegroup features enabled: resharding
                       disabled: compress-encrypted
      metadata sync no sync (zone is master)
    2023-08-30T12:58:13.991+ 7f583a52c780 0 ERROR: failed to fetch datalog info
          data sync source: 13a8c663-b241-4d8a-a424-8785fc539ec5 (cz-hol)
                            failed to retrieve sync info: (13) Permission denied

    (secondary) [ceph: root@ceph-lab-hol-01 /]# radosgw-admin sync status
              realm d2c4ebf9-e156-4c4e-9d56-3fff6a652e75 (ceph)
          zonegroup abc3c0ae-a84d-48d4-8e78-da251eb78781 (cz)
               zone 13a8c663-b241-4d8a-a424-8785fc539ec5 (cz-hol)
       current time 2023-08-30T12:58:54Z
    zonegroup features enabled: resharding
                       disabled: compress-encrypted
      metadata sync failed to read sync status: (2) No such file or directory
    2023-08-30T12:58:55.617+ 7ff37c9db780 0 ERROR: failed to fetch datalog info
          data sync source: 97fb5842-713a-4995-8966-5afe1384f17f (cz-brn)
                            failed to retrieve sync info: (13) Permission denied

On the master there is one user created during the process (synchronization-user); on the secondary there are no users, and when I try to re-create this synchronization user it complains that I shouldn't even try and should instead execute the command on the master. I can see the same realm and zonegroup IDs on both sides; the zone list is different though:

    (master) [ceph: root@ceph-lab-brn-01 /]# radosgw-admin zone list
    {
        "default_info": "97fb5842-713a-4995-8966-5afe1384f17f",
        "zones": [
            "cz-brn",
            "default"
        ]
    }

    (secondary) [ceph: root@ceph-lab-hol-01 /]# radosgw-admin zone list
    {
        "default_info": "13a8c663-b241-4d8a-a424-8785fc539ec5",
        "zones": [
            "cz-hol",
            "default"
        ]
    }

The permission denied error is puzzling me - could it be because the realm pull didn't sync the users? I tried this multiple times with a clean ceph install on both sides and always ended up the same. I even tried force-creating the same user with the same secrets on the other side, but it didn't help. How can I debug what kind of secret the secondary is trying to use when communicating with the master?

Could it be that this multisite RGW setup is not yet truly supported in reef? I noticed that the documentation itself seems written for older ceph versions, as there are no mentions of the orchestrator (for example in steps where configuration files of RGW need to be edited, which is done differently when using cephadm). I think the documentation is simply wrong at this time. Either it's missing some crucial steps, or it's outdated or otherwise unclear - simply by following all the steps as outlined there, you are likely to end up the same.

Thanks for help!
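For context, this is the sequence I'm following on the secondary, essentially straight from the docs (a sketch; endpoints and keys are placeholders, realm/zonegroup/zone names are the ones from the output above):

    radosgw-admin realm pull --url=http://<master-rgw>:8080 \
        --access-key=<SYSTEM_ACCESS_KEY> --secret=<SYSTEM_SECRET_KEY>
    radosgw-admin realm default --rgw-realm=ceph
    radosgw-admin zone create --rgw-zonegroup=cz --rgw-zone=cz-hol \
        --endpoints=http://<secondary-rgw>:8080 \
        --access-key=<SYSTEM_ACCESS_KEY> --secret=<SYSTEM_SECRET_KEY>
    radosgw-admin period update --commit
    # cephadm deployment, so instead of editing ceph.conf I point the RGW service at the zone:
    ceph orch apply rgw cz-hol --realm=ceph --zone=cz-hol --placement="ceph-lab-hol-01"

My understanding is that the keys passed to realm pull and zone create are supposed to be the ones from the synchronization-user created on the master.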
[ceph-users] CephFS clients waiting for lock when one of them goes slow
Hi all,

I have a very small Ceph setup - 3 OSDs, 3 MDS, 3 MON, CephFS with ~20 ceph-fuse clients connected. All run version 14.2.9 (both servers and clients). Some clients work with a relatively large CephFS directory, a few tens of thousands of files, with lots of files being added and deleted all the time.

Almost every day I see exactly the same pattern when the system goes under load:

- One of the clients starts running slow (e.g. high CPU or out-of-memory, sometimes with OOM killer intervention)
- A number of requests from other clients gets blocked for ~15 seconds waiting for the (r/w/x)lock on other files in the same directory
- Objecter requests are empty
- No requests visibly running in the dump_ops_in_flight output, only those waiting for the lock

MDS cache size is 1.5 GB, total capacity is around 1 TB, the usual load is ~300 [rw]ops and 3-4 MB/s, and most of the config is the default one.

Has somebody seen similar issues before?

Best regards,
Petr Belyaev
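For completeness, these are the MDS admin socket commands I'm using to observe this (a sketch; the mds name is a placeholder):

    ceph daemon mds.<name> dump_ops_in_flight   # the blocked requests and the lock they wait on
    ceph daemon mds.<name> dump_historic_ops    # recently completed slow requests
    ceph daemon mds.<name> session ls           # per-client sessions, num_caps, liveness
    ceph daemon mds.<name> objecter_requests    # confirm nothing is stuck on the OSD side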
[ceph-users] postgresql vs ceph, fsync
We are evaluating the pros and cons of running PostgreSQL backed by ceph. We know that running pg on dedicated physical hardware is highly recommended, but we've got our reasons. So to the question:

What could happen if we switch fsync to off on Postgres backed by ceph? The increase in performance is huge, which is the reason we are considering it. The Ceph pool is running on osds connected to controllers with batteries, so we could mitigate power losses. What happens when an osd runs out of space? Are there other considerations?
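For completeness, the alternative we are also weighing is synchronous_commit=off, which as far as we understand only risks losing the last few seconds of committed transactions, whereas fsync=off can leave the database corrupt beyond recovery after any OS crash or power loss (battery-backed controllers don't help against a kernel panic or hard reset). A minimal sketch (assuming superuser access via psql):

    # Commonly suggested compromise: acknowledge commits before the WAL flush
    psql -U postgres -c "ALTER SYSTEM SET synchronous_commit = off;"
    psql -U postgres -c "SELECT pg_reload_conf();"
    # The setting we are asking about would instead be:
    #   ALTER SYSTEM SET fsync = off;   -- not crash-safe, corruption risk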