[ceph-users] Re: Remapping OSDs under a PG
Create a crush rule that only chooses non-ssd drives, then

  ceph osd pool set <pool> crush_rule YourNewRuleName

and it will move over to the non-ssd OSDs.

On Fri, 28 May 2021 at 02:18, Jeremy Hansen wrote:
>
> I’m very new to Ceph, so if this question makes no sense, I apologize.
> I’m continuing to study, but I thought an answer to this question would
> help me understand Ceph a bit more.
>
> Using cephadm, I set up a cluster. Cephadm automatically creates a pool
> for Ceph metrics. It looks like one of my SSD OSDs was allocated for the
> PG. I’d like to understand how to remap this PG so it’s not using the
> SSD OSDs.
>
> ceph pg map 1.0
> osdmap e205 pg 1.0 (1.0) -> up [28,33,10] acting [28,33,10]
>
> OSD 28 is the SSD.
>
> Is this possible? Does this make any sense? I’d like to reserve the SSDs
> for their own pool.
>
> Thank you!
> -jeremy
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

--
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
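For illustration, the two steps might look like this, assuming device classes are in use, the CRUSH root is 'default', the failure domain is 'host', and the pool in question is the auto-created device_health_metrics pool (adjust names to your cluster):

  # rule restricted to the 'hdd' device class
  ceph osd crush rule create-replicated only-hdd default host hdd

  # point the pool at the new rule; its PGs then backfill onto hdd OSDs only
  ceph osd pool set device_health_metrics crush_rule only-hdd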
[ceph-users] Re: Remapping OSDs under a PG
Thank you both for your response. So this leads me to the next question:

  ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>

What is <root> and <failure-domain> in this case?

It also looks like this is responsible for things like “rack awareness” type
attributes, which is something I’d like to utilize:

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

This is something I will eventually take advantage of as well.

Thank you!
-jeremy

> On May 28, 2021, at 12:03 AM, Janne Johansson wrote:
>
> Create a crush rule that only chooses non-ssd drives, then
>   ceph osd pool set <pool> crush_rule YourNewRuleName
> and it will move over to the non-ssd OSDs.
>
> On Fri, 28 May 2021 at 02:18, Jeremy Hansen wrote:
>>
>> I’m very new to Ceph, so if this question makes no sense, I apologize.
>> Continuing to study, but I thought an answer to this question would help
>> me understand Ceph a bit more.
>>
>> Using cephadm, I set up a cluster. Cephadm automatically creates a pool
>> for Ceph metrics. It looks like one of my SSD OSDs was allocated for the
>> PG. I’d like to understand how to remap this PG so it’s not using the
>> SSD OSDs.
>>
>> ceph pg map 1.0
>> osdmap e205 pg 1.0 (1.0) -> up [28,33,10] acting [28,33,10]
>>
>> OSD 28 is the SSD.
>>
>> Is this possible? Does this make any sense? I’d like to reserve the SSDs
>> for their own pool.
>>
>> Thank you!
>> -jeremy
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
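To spell the syntax out (a sketch; the example names below are made up, not from the thread):

  ceph osd crush rule create-replicated <name> <root> <failure-domain> [<class>]

  # <name>           - the new rule's name
  # <root>           - CRUSH bucket to start from, usually 'default'
  # <failure-domain> - bucket type replicas are spread across (host, rack, ...)
  # <class>          - optional device class filter (hdd, ssd, nvme)

  # e.g. replicas spread across racks, on hdd-class OSDs only:
  ceph osd crush rule create-replicated hdd-by-rack default rack hdd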
[ceph-users] Re: XFS on RBD on EC painfully slow
On Thu, May 27, 2021 at 02:54:00PM -0500, Reed Dier wrote:
> Hoping someone may be able to help point out where my bottleneck(s) may be.
>
> I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on top
> of that.
> This was not an ideal scenario, rather it was a rescue mission to dump a
> large, aging raid array before it was too late, so I'm working with the
> hand I was dealt.
>
> To further conflate the issues, the main directory structure consists of
> lots and lots of small file sizes, and deep directories.
>
> My goal is to try and rsync (or otherwise) data from the RBD to cephfs,
> but its just unbearably slow and will take ~150 days to transfer ~35TB,
> which is far from ideal.

(Disclaimer: no experience with cephfs)

I found rsync a wonderful tool for long distances and large files, less so
for local networks and small files, even with local disks.

Usually I do something like

  ( cd src/ && tar --acls --xattrs --numeric-owner --sparse -cf - . ) |
    pv -pterab |
  ( cd dst/ && tar --acls --xattrs --numeric-owner --sparse -xf - )

If src and dst are not mounted on the same machine you can use netcat/socat
to stream the tar from one system to the other, or pipe it through ssh if
you need encrypted transport.

This does not have the resume capability of rsync, but for small files it
is much faster. After that you can still throw in a final rsync for changes
accumulated while the initial transfer was running.

Matthias
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
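If the source and destination are on different machines, the netcat variant mentioned above could look roughly like this (a sketch; netcat flags differ between implementations, and the port number is arbitrary):

  # on the destination host: listen and unpack
  cd dst/ && nc -l 12345 | tar --acls --xattrs --numeric-owner --sparse -xf -

  # on the source host: stream the tree across the wire
  cd src/ && tar --acls --xattrs --numeric-owner --sparse -cf - . | nc desthost 12345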
[ceph-users] Re: Messed up placement of MDS
By experimenting I managed to work out how to get rid of that wrongly created
MDS service, so for those who are looking for that information too, I used
the following command:

  ceph orch rm mds.label:mds

‐‐‐ Original Message ‐‐‐
On Thursday, May 27, 2021 9:16 PM, mabi wrote:

> Hello,
>
> I am trying to place the two MDS daemons for CephFS on dedicated nodes.
> For that purpose I tried out a few different "cephadm orch apply ..."
> commands with a label, but at the end it looks like I messed up with the
> placement, as I now have two mds service_types as you can see below:
>
> ceph orch ls --service-type mds --export
>
> =
>
> service_type: mds
> service_id: ceph1fs
> service_name: mds.ceph1fs
> placement:
>   count: 2
>   hosts:
>   - ceph1g
>   - ceph1a
>
> service_type: mds
> service_id: label:mds
> service_name: mds.label:mds
> placement:
>   count: 2
>
> This second entry at the bottom seems totally wrong and I would like to
> remove it, but I haven't found how to remove it totally. Any ideas?
>
> Ideally I just want to place two MDS daemons on node ceph1a and ceph1g.
>
> Regards,
> Mabi
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
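For the remaining goal (pinning the two MDS daemons to ceph1a and ceph1g), a hedged sketch of how that placement is commonly expressed, not taken from the thread, is an explicit host list in the service spec:

  service_type: mds
  service_id: ceph1fs
  placement:
    count: 2
    hosts:
      - ceph1a
      - ceph1g

applied with something like "ceph orch apply -i mds-ceph1fs.yaml", or the equivalent one-liner ceph orch apply mds ceph1fs --placement="2 ceph1a ceph1g".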
[ceph-users] Re: cephfs auditing
On 5/27/21 10:47 PM, Michael Thomas wrote:
> Is there a way to log or track which cephfs files are being accessed?
> This would help us in planning where to place certain datasets based on
> popularity, e.g. on an EC HDD pool or a replicated SSD pool.
>
> I know I can run inotify on the ceph clients, but I was hoping that the
> MDS would have a way to log this information centrally.

Progress has been made on performance metrics of CephFS [1]. However, AFAIK
it does not keep track of /what/ files are being accessed.

Gr. Stefan

[1]: https://docs.ceph.com/en/latest/cephfs/cephfs-top/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: XFS on RBD on EC painfully slow
Hi Reed,

Have you tried just starting multiple rsync processes simultaneously to
transfer different directories? Distributed systems like Ceph often benefit
from more parallelism.

Weiwen Hu

> On May 28, 2021, at 03:54, Reed Dier wrote:
>
> Hoping someone may be able to help point out where my bottleneck(s) may be.
>
> I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on top
> of that.
> This was not an ideal scenario, rather it was a rescue mission to dump a
> large, aging raid array before it was too late, so I'm working with the
> hand I was dealt.
>
> To further conflate the issues, the main directory structure consists of
> lots and lots of small file sizes, and deep directories.
>
> My goal is to try and rsync (or otherwise) data from the RBD to cephfs,
> but it's just unbearably slow and will take ~150 days to transfer ~35TB,
> which is far from ideal.
>
>> 15.41G  79%  4.36MB/s  0:56:09 (xfr#23165, ir-chk=4061/27259)
>
>> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>>            0.17   0.00     1.34    13.23    0.00  85.26
>>
>> Device  r/s  rMB/s  rrqm/s  %rrqm  r_await  rareq-sz  w/s  wMB/s  wrqm/s  %wrqm  w_await  wareq-sz  d/s  dMB/s  drqm/s  %drqm  d_await  dareq-sz  aqu-sz  %util
>> rbd0  124.00  0.66  0.00  0.00  17.30  5.48  50.00  0.17  0.00  0.00  31.70  3.49  0.00  0.00  0.00  0.00  0.00  0.00  3.39  96.40
>
> Rsync progress and iostat (during the rsync) from the rbd to a local ssd,
> to remove any bottlenecks doubling back to cephfs.
> About 16G in 1h, not exactly blazing, this being 5 of the 7000 directories
> I'm looking to offload to cephfs.
>
> Currently running 15.2.11, and the host is Ubuntu 20.04 (5.4.0-72-generic)
> with a single E5-2620, 64GB of memory, and 4x10GbT bond talking to ceph,
> iperf proves it out.
> EC8:2, across about 16 hosts, 240 OSDs, with 24 of those being 8TB 7.2k
> SAS, and the other 216 being 2TB 7.2k SATA. So there are quite a few
> spindles in play here.
> Only 128 PGs in this pool, but it's the only RBD image in this pool.
> Autoscaler recommends going to 512, but I was hoping to avoid the
> performance overhead of the PG splits if possible, given perf is bad
> enough as is.
>
> Examining the main directory structure, it looks like there are 7000 files
> per directory, about 60% of which are <1MiB, in all totaling nearly 5GiB
> per directory.
>
> My fstab for this is:
>> <device>  <mountpoint>  xfs  _netdev,noatime  0 0
>
> I tried to increase the read_ahead_kb to 4M from 128K at
> /sys/block/rbd0/queue/read_ahead_kb to match the object/stripe size of the
> EC pool, but that doesn't appear to have had much of an impact.
>
> The only thing I can think of that I could possibly try as a change would
> be to increase the queue depth in the rbdmap up from 128, so that's my
> next bullet to fire.
>
> Attaching xfs_info in case there are any useful nuggets:
>> meta-data=/dev/rbd0        isize=256    agcount=81, agsize=268435455 blks
>>          =                 sectsz=512   attr=2, projid32bit=0
>>          =                 crc=0        finobt=0, sparse=0, rmapbt=0
>>          =                 reflink=0
>> data     =                 bsize=4096   blocks=21483470848, imaxpct=5
>>          =                 sunit=0      swidth=0 blks
>> naming   =version 2        bsize=4096   ascii-ci=0, ftype=0
>> log      =internal log     bsize=4096   blocks=32768, version=2
>>          =                 sectsz=512   sunit=0 blks, lazy-count=0
>> realtime =none             extsz=4096   blocks=0, rtextents=0
>
> And rbd info:
>> rbd image 'rbd-image-name':
>>    size 85 TiB in 22282240 objects
>>    order 22 (4 MiB objects)
>>    snapshot_count: 0
>>    id: a09cac2b772af5
>>    data_pool: rbd-ec82-pool
>>    block_name_prefix: rbd_data.29.a09cac2b772af5
>>    format: 2
>>    features: layering, exclusive-lock, object-map, fast-diff,
>> deep-flatten, data-pool
>>    op_features:
>>    flags:
>>    create_timestamp: Mon Apr 12 18:44:38 2021
>>    access_timestamp: Mon Apr 12 18:44:38 2021
>>    modify_timestamp: Mon Apr 12 18:44:38 2021
>
> Any other ideas or hints are greatly appreciated.
>
> Thanks,
> Reed
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
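On the last point above (raising the krbd queue depth), a hedged sketch of what that change could look like; the pool/image names are placeholders, the filesystem must be unmounted first, and queue_depth support depends on the kernel version:

  # unmount, unmap, then remap with a deeper request queue (default 128)
  umount <mountpoint>
  rbd unmap /dev/rbd0
  rbd map <pool>/<image> -o queue_depth=1024
  mount /dev/rbd0 <mountpoint>

  # read-ahead is still tuned via sysfs after mapping
  echo 4096 > /sys/block/rbd0/queue/read_ahead_kb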
[ceph-users] Re: Remapping OSDs under a PG
I’m continuing to read and it’s becoming more clear. The CRUSH map seems
pretty amazing!

-jeremy

> On May 28, 2021, at 1:10 AM, Jeremy Hansen wrote:
>
> Thank you both for your response. So this leads me to the next question:
>
>   ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>
>
> What is <root> and <failure-domain> in this case?
>
> It also looks like this is responsible for things like “rack awareness”
> type attributes, which is something I’d like to utilize:
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 zone
> type 10 region
> type 11 root
>
> This is something I will eventually take advantage of as well.
>
> Thank you!
> -jeremy
>
>> On May 28, 2021, at 12:03 AM, Janne Johansson wrote:
>>
>> Create a crush rule that only chooses non-ssd drives, then
>>   ceph osd pool set <pool> crush_rule YourNewRuleName
>> and it will move over to the non-ssd OSDs.
>>
>> On Fri, 28 May 2021 at 02:18, Jeremy Hansen wrote:
>>>
>>> I’m very new to Ceph, so if this question makes no sense, I apologize.
>>> Continuing to study, but I thought an answer to this question would
>>> help me understand Ceph a bit more.
>>>
>>> Using cephadm, I set up a cluster. Cephadm automatically creates a pool
>>> for Ceph metrics. It looks like one of my SSD OSDs was allocated for
>>> the PG. I’d like to understand how to remap this PG so it’s not using
>>> the SSD OSDs.
>>>
>>> ceph pg map 1.0
>>> osdmap e205 pg 1.0 (1.0) -> up [28,33,10] acting [28,33,10]
>>>
>>> OSD 28 is the SSD.
>>>
>>> Is this possible? Does this make any sense? I’d like to reserve the
>>> SSDs for their own pool.
>>>
>>> Thank you!
>>> -jeremy
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>> --
>> May the most significant bit of your life be positive.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: XFS on RBD on EC painfully slow
I guess I should probably have been more clear: this is one pool of many, so
the other OSDs aren't idle. So I don't necessarily think that the PG bump
would be the worst thing to try, but it's definitely not as bad as I may
have made it sound.

Thanks,
Reed

> On May 27, 2021, at 11:37 PM, Anthony D'Atri wrote:
>
> That gives you a PG ratio of …. 5.3 ???
>
> Run `ceph osd df`; I wouldn’t be surprised if some of your drives have 0
> PGs on them, for sure I would suspect that they aren’t even at all.
>
> There are bottlenecks in the PG code, and in the OSD code — one reason why
> with NVMe clusters it’s common to split each drive into at least 2 OSDs.
> With spinners you don’t want to do that, but you get the idea.
>
> The pg autoscaler is usually out of its Vulcan mind. 512 would give you a
> ratio of just 21.
>
> Prior to 12.2.1, conventional wisdom was a PG ratio of 100-200 on spinners.
>
> 2048 PGs would give you a ratio of 85, which current (retconned) guidance
> would call good. I’d probably go to 4096, but 2048 would be way better
> than 128.
>
> I strongly suspect that PG splitting would still get you done faster than
> the way it is, esp. if you’re running BlueStore OSDs.
>
> Try bumping pg_num up to say 262 and see how bad it is, and whether, when
> pgp_num catches up, your ingest rate isn’t a bit higher than it was before.
>
>> EC8:2, across about 16 hosts, 240 OSDs, with 24 of those being 8TB 7.2k
>> SAS, and the other 216 being 2TB 7.2k SATA. So there are quite a few
>> spindles in play here.
>> Only 128 PGs in this pool, but it's the only RBD image in this pool.
>> Autoscaler recommends going to 512, but I was hoping to avoid the
>> performance overhead of the PG splits if possible, given perf is bad
>> enough as is.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
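For readers following the arithmetic: the ratios quoted above appear to be pg_num multiplied by the pool's effective size (k+m = 10 for EC 8:2), divided by the 240 OSDs:

   128 PGs x 10 / 240 OSDs ≈   5.3   (current)
   512 PGs x 10 / 240 OSDs ≈  21.3   (autoscaler suggestion)
  2048 PGs x 10 / 240 OSDs ≈  85.3
  4096 PGs x 10 / 240 OSDs ≈ 170.7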
[ceph-users] Re: XFS on RBD on EC painfully slow
I had it on my list of things to possibly try, a tar in | tar out copy, to
see if it yielded different results.

On its face, it seems like cp -a is getting ever so slightly better speed,
but not a clear night and day difference.

I will definitely look into this and report back any findings, positive or
negative.

Thanks for the suggestion,

Reed

> On May 28, 2021, at 3:24 AM, Matthias Ferdinand wrote:
>
> On Thu, May 27, 2021 at 02:54:00PM -0500, Reed Dier wrote:
>> Hoping someone may be able to help point out where my bottleneck(s) may be.
>>
>> I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on top
>> of that.
>> This was not an ideal scenario, rather it was a rescue mission to dump a
>> large, aging raid array before it was too late, so I'm working with the
>> hand I was dealt.
>>
>> To further conflate the issues, the main directory structure consists of
>> lots and lots of small file sizes, and deep directories.
>>
>> My goal is to try and rsync (or otherwise) data from the RBD to cephfs,
>> but its just unbearably slow and will take ~150 days to transfer ~35TB,
>> which is far from ideal.
>
> (Disclaimer: no experience with cephfs)
>
> I found rsync a wonderful tool for long distances and large files, less
> so for local networks and small files, even with local disks.
>
> Usually I do something like
>
>   ( cd src/ && tar --acls --xattrs --numeric-owner --sparse -cf - . ) |
>     pv -pterab |
>   ( cd dst/ && tar --acls --xattrs --numeric-owner --sparse -xf - )
>
> If src and dst are not mounted on the same machine you can use
> netcat/socat to stream the tar from one system to the other, or pipe it
> through ssh if you need encrypted transport.
>
> This does not have the resume capability of rsync, but for small files
> it is much faster. After that you can still throw in a final rsync for
> changes accumulated while the initial transfer was running.
>
> Matthias
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: XFS on RBD on EC painfully slow
There is also a longstanding belief that using cpio saves you context
switches and data through a pipe. YMMV.

> On May 28, 2021, at 7:26 AM, Reed Dier wrote:
>
> I had it on my list of things to possibly try, a tar in | tar out copy,
> to see if it yielded different results.
>
> On its face, it seems like cp -a is getting ever so slightly better speed,
> but not a clear night and day difference.
>
> I will definitely look into this and report back any findings, positive
> or negative.
>
> Thanks for the suggestion,
>
> Reed
>
>> On May 28, 2021, at 3:24 AM, Matthias Ferdinand wrote:
>>
>> On Thu, May 27, 2021 at 02:54:00PM -0500, Reed Dier wrote:
>>> Hoping someone may be able to help point out where my bottleneck(s) may be.
>>>
>>> I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on
>>> top of that.
>>> This was not an ideal scenario, rather it was a rescue mission to dump a
>>> large, aging raid array before it was too late, so I'm working with the
>>> hand I was dealt.
>>>
>>> To further conflate the issues, the main directory structure consists of
>>> lots and lots of small file sizes, and deep directories.
>>>
>>> My goal is to try and rsync (or otherwise) data from the RBD to cephfs,
>>> but its just unbearably slow and will take ~150 days to transfer ~35TB,
>>> which is far from ideal.
>>
>> (Disclaimer: no experience with cephfs)
>>
>> I found rsync a wonderful tool for long distances and large files, less
>> so for local networks and small files, even with local disks.
>>
>> Usually I do something like
>>
>>   ( cd src/ && tar --acls --xattrs --numeric-owner --sparse -cf - . ) |
>>     pv -pterab |
>>   ( cd dst/ && tar --acls --xattrs --numeric-owner --sparse -xf - )
>>
>> If src and dst are not mounted on the same machine you can use
>> netcat/socat to stream the tar from one system to the other, or pipe it
>> through ssh if you need encrypted transport.
>>
>> This does not have the resume capability of rsync, but for small files
>> it is much faster. After that you can still throw in a final rsync for
>> changes accumulated while the initial transfer was running.
>>
>> Matthias
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
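For reference, a hedged sketch of the cpio pass-through approach mentioned above (paths are placeholders; like tar, this copies but does not delete, so a final rsync pass would still be needed for a live source):

  # copy the tree at /mnt/rbd-src into /mnt/cephfs/dst, preserving directories and mtimes
  cd /mnt/rbd-src &&
    find . -depth -print0 | cpio --null --sparse -pdmu /mnt/cephfs/dst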
[ceph-users] Re: XFS on RBD on EC painfully slow
Hi Reed,

To add to this command by Weiwen:

On 28.05.21 13:03, 胡 玮文 wrote:
> Have you tried just starting multiple rsync processes simultaneously to
> transfer different directories? Distributed systems like Ceph often
> benefit from more parallelism.

When I migrated from XFS on iSCSI (legacy system, no Ceph) to CephFS a few
months ago, I used msrsync [1] and was quite happy with the speed. For your
use case, I would start with -p 12 but might experiment with up to -p 24
(as you only have 6C/12T in your CPU). With many small files, you also might
want to increase -s from the default 1000.

Note that msrsync does not work with the --delete rsync flag. As I was
syncing a live system, I ended up with this workflow:

- Initial sync with msrsync (something like ./msrsync -p 12 --progress
  --stats --rsync "-aS --numeric-ids" ...)
- Second sync with msrsync (to sync changes during the first sync)
- Take old storage off-line for users / read-only
- Final rsync with --delete (i.e. rsync -aS --numeric-ids --delete ...)
- Mount cephfs at location of old storage, adjust /etc/exports with fsid
  entries where necessary, turn system back on-line / read-write

Cheers
Sebastian

[1] https://github.com/jbd/msrsync
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] mons assigned via orch label 'committing suicide' upon reboot.
FYI, I'm getting monitors assigned via '... apply label:mon', with current
and valid 'mon' tags, 'committing suicide' after surprise reboots in the
'Pacific' 16.2.4 release. The tag indicating a monitor should be assigned to
that host is present and never changed.

Deleting the mon tag, waiting a minute, then re-adding the 'mon' tag to the
host causes the monitor to redeploy and run properly.

I have 5 monitors assigned via the orchestrator's 'label:mon', all in docker
containers. Upon reboot that goes to 4 monitors deployed.

On the offending host in the logs I see this:

May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.771+ 7f7a029bf700  0 using public_addr v2:[fc00:1002:c7::44]:0/0 -> [v2:[fc00:1002:c7::44]:3300/0,v1:[fc00:1002:c7::44]:6789/0]
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.771+ 7f7a029bf700  0 starting mon.noc4 rank -1 at public addrs [v2:[fc00:1002:c7::44]:3300/0,v1:[fc00:1002:c7::44]:6789/0] at bind addrs [v2:[fc00:1002:c7::44]:3300/0,v1:[fc00:1002:c7::44]:6789/0] mon_data /var/lib/ceph/mon/ceph-noc4 fsid 4067126d-01cb-40af-824a-881c130140f8
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+ 7f7a029bf700  1 mon.noc4@-1(???) e40 preinit fsid 4067126d-01cb-40af-824a-x
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+ 7f7a029bf700 -1 mon.noc4@-1(???) e40 not in monmap and have been in a quorum before; must have been removed
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+ 7f7a029bf700 -1 mon.noc4@-1(???) e40 commit suicide!
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+ 7f7a029bf700 -1 failed to initialize

Seems odd. And, you know, as debug comments go, 'commit suicide!' appears to
have an 'extra coffee that day' aspect.

HC
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
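The workaround described above corresponds roughly to the following orchestrator commands (the host name is taken from the log excerpt; treat this as an example, not a fix for the underlying bug):

  # remove the 'mon' label so cephadm tears the daemon down, wait a minute,
  # then re-add it so the monitor is redeployed
  ceph orch host label rm noc4 mon
  ceph orch host label add noc4 mon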
[ceph-users] Fwd: Re: Ceph osd will not start.
Peter,

We're seeing the same issues as you are. We have 2 new hosts (Intel(R)
Xeon(R) Gold 6248R CPU @ 3.00GHz w/ 48 cores, 384GB RAM, and 60x 10TB SED
drives) and we have tried both 15.2.13 and 16.2.4. Cephadm does NOT properly
deploy and activate OSDs on Ubuntu 20.04.2 with Docker.

Seems to be a bug in Cephadm and a product regression, as we have 4 near
identical nodes on CentOS running Nautilus (240 x 10TB SED drives) and had
no problems. FWIW, we had no luck yet with one-by-one OSD daemon additions
through ceph orch either. We also reproduced the issue easily in a virtual
lab using small virtual disks on a single ceph VM with 1 mon.

We are now looking into whether we can get past this with a manual buildout.
If you, or anyone, has hit the same stumbling block and gotten past it, I
would really appreciate some guidance.

Thanks,
Marco

On Thu, May 27, 2021 at 2:23 PM Peter Childs wrote:

> In the end it looks like I might be able to get the node up to about 30
> OSDs before it stops creating any more.
>
> Or, more precisely, it formats the disks but freezes up starting the
> daemons.
>
> I suspect I'm missing something I can tune to get it working better.
>
> If I could see any error messages that might help, but I'm yet to spot
> anything.
>
> Peter.
>
> On Wed, 26 May 2021, 10:57 Eugen Block, wrote:
>
> > > If I add the osd daemons one at a time with
> > >
> > > ceph orch daemon add osd drywood12:/dev/sda
> > >
> > > It does actually work,
> >
> > Great!
> >
> > > I suspect what's happening is when my rule for creating osds runs and
> > > creates them all-at-once it overloads cephadm and it can't cope.
> >
> > It's possible, I guess.
> >
> > > I suspect what I might need to do at least to work around the issue is
> > > set "limit:" and bring it up until it stops working.
> >
> > It's worth a try, yes, although the docs state you should try to avoid
> > it, it's possible that it doesn't work properly, in that case create a
> > bug report. ;-)
> >
> > > I did work out how to get ceph-volume to nearly work manually.
> > >
> > > cephadm shell
> > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
> > > ceph-volume lvm create --data /dev/sda --dmcrypt
> > >
> > > but given I've now got "add osd" to work, I suspect I just need to fine
> > > tune my osd creation rules, so it does not try and create too many osds
> > > on the same node at the same time.
> >
> > I agree, no need to do it manually if there is an automated way,
> > especially if you're trying to bring up dozens of OSDs.
> >
> >
> > Zitat von Peter Childs :
> >
> > > After a bit of messing around, I managed to get it somewhat working.
> > >
> > > If I add the osd daemons one at a time with
> > >
> > > ceph orch daemon add osd drywood12:/dev/sda
> > >
> > > It does actually work,
> > >
> > > I suspect what's happening is when my rule for creating osds runs and
> > > creates them all-at-once it overloads cephadm and it can't cope.
> > >
> > > service_type: osd
> > > service_name: osd.drywood-disks
> > > placement:
> > >   host_pattern: 'drywood*'
> > > spec:
> > >   data_devices:
> > >     size: "7TB:"
> > >   objectstore: bluestore
> > >
> > > I suspect what I might need to do at least to work around the issue is
> > > set "limit:" and bring it up until it stops working.
> > >
> > > I did work out how to get ceph-volume to nearly work manually.
> > >
> > > cephadm shell
> > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
> > > ceph-volume lvm create --data /dev/sda --dmcrypt
> > >
> > > but given I've now got "add osd" to work, I suspect I just need to fine
> > > tune my osd creation rules, so it does not try and create too many osds
> > > on the same node at the same time.
> > >
> > >
> > > On Wed, 26 May 2021 at 08:25, Eugen Block wrote:
> > >
> > >> Hi,
> > >>
> > >> I believe your current issue is due to a missing keyring for
> > >> client.bootstrap-osd on the OSD node. But even after fixing that
> > >> you'll probably still won't be able to deploy an OSD manually with
> > >> ceph-volume because 'ceph-volume activate' is not supported with
> > >> cephadm [1]. I just tried that in a virtual environment, it fails when
> > >> activating the systemd-unit:
> > >>
> > >> ---snip---
> > >> [2021-05-26 06:47:16,677][ceph_volume.process][INFO ] Running
> > >> command: /usr/bin/systemctl enable
> > >> ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
> > >> [2021-05-26 06:47:16,692][ceph_volume.process][INFO ] stderr Failed
> > >> to connect to bus: No such file or directory
> > >> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm
> > >> activate was unable to complete, while creating the OSD
> > >> Traceback (most recent call last):
> > >>   File
> > >> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py",
> > >> line 32, in create
> > >>     Activate([]).activate(args)
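A hedged sketch of the "limit:" idea discussed above, based on the drive group spec quoted in the thread; the limit value here is arbitrary, and per the docs the filter is best treated as a last resort:

  service_type: osd
  service_id: drywood-disks
  placement:
    host_pattern: 'drywood*'
  spec:
    data_devices:
      size: '7TB:'
      limit: 10        # use at most 10 matching devices at a time
    objectstore: bluestore

applied with something like "ceph orch apply -i osd-drywood.yaml", then raising the limit once the first batch of OSDs is up.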
[ceph-users] Re: Remapping OSDs under a PG
So I did this:

  ceph osd crush rule create-replicated hdd-rule default rack hdd

[ceph: root@cn01 ceph]# ceph osd crush rule ls
replicated_rule
hdd-rule
ssd-rule

[ceph: root@cn01 ceph]# ceph osd crush rule dump hdd-rule
{
    "rule_id": 1,
    "rule_name": "hdd-rule",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "rack"
        },
        {
            "op": "emit"
        }
    ]
}

Then this:

  ceph osd pool set device_health_metrics crush_rule hdd-rule

How do I prove that my device_health_metrics pool is no longer using any SSDs?

ceph pg ls
PG    OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE         SINCE  VERSION  REPORTED  UP             ACTING         SCRUB_STAMP                  DEEP_SCRUB_STAMP
1.0   41       0         0          0        0      0            0           71   active+clean  22h    205'71   253:484   [28,33,10]p28  [28,33,10]p28  2021-05-27T14:44:37.466384+  2021-05-26T04:23:11.758060+
2.0   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:56    [9,5,26]p9     [9,5,26]p9     2021-05-28T00:46:34.470208+  2021-05-28T00:46:15.122042+
2.1   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [34,0,13]p34   [34,0,13]p34   2021-05-28T00:46:41.578301+  2021-05-28T00:46:15.122042+
2.2   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [30,25,5]p30   [30,25,5]p30   2021-05-28T00:46:41.394685+  2021-05-28T00:46:15.122042+
2.3   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [14,35,32]p14  [14,35,32]p14  2021-05-28T00:46:40.545088+  2021-05-28T00:46:15.122042+
2.4   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [27,28,7]p27   [27,28,7]p27   2021-05-28T00:46:41.208159+  2021-05-28T00:46:15.122042+
2.5   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [8,4,35]p8     [8,4,35]p8     2021-05-28T00:46:39.845197+  2021-05-28T00:46:15.122042+
2.6   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [31,26,6]p31   [31,26,6]p31   2021-05-28T00:46:45.808430+  2021-05-28T00:46:15.122042+
2.7   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [12,7,19]p12   [12,7,19]p12   2021-05-28T00:46:39.313525+  2021-05-28T00:46:15.122042+
2.8   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [20,21,11]p20  [20,21,11]p20  2021-05-28T00:46:38.840636+  2021-05-28T00:46:15.122042+
2.9   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [31,14,10]p31  [31,14,10]p31  2021-05-28T00:46:46.791644+  2021-05-28T00:46:15.122042+
2.a   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [16,27,35]p16  [16,27,35]p16  2021-05-28T00:46:39.025320+  2021-05-28T00:46:15.122042+
2.b   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [20,15,11]p20  [20,15,11]p20  2021-05-28T00:46:42.841924+  2021-05-28T00:46:15.122042+
2.c   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [32,11,0]p32   [32,11,0]p32   2021-05-28T00:46:38.403701+  2021-05-28T00:46:15.122042+
2.d   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:56    [5,19,3]p5     [5,19,3]p5     2021-05-28T00:46:39.808986+  2021-05-28T00:46:15.122042+
2.e   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [27,13,17]p27  [27,13,17]p27  2021-05-28T00:46:42.253293+  2021-05-28T00:46:15.122042+
2.f   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [11,22,18]p11  [11,22,18]p11  2021-05-28T00:46:38.721405+  2021-05-28T00:46:15.122042+
2.10  0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [10,17,7]p10   [10,17,7]p10   2021-05-28T00:46:38.770867+  2021-05-28T00:46:15.122042+
2.11  0        0         0          0        0
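Not from the thread, but a few commands that could answer the "how do I prove it" question, assuming device classes are assigned to the OSDs:

  # show which OSDs each PG of the pool maps to
  ceph pg ls-by-pool device_health_metrics

  # check the device class of a particular OSD (e.g. the former SSD member)
  ceph osd crush get-device-class osd.28

  # or list all OSDs carrying the 'ssd' class and compare against the PG mappings
  ceph osd crush class ls-osd ssd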
[ceph-users] HEALTH_WARN Reduced data availability: 33 pgs inactive
I’m trying to understand this situation:

ceph health detail
HEALTH_WARN Reduced data availability: 33 pgs inactive
[WRN] PG_AVAILABILITY: Reduced data availability: 33 pgs inactive
    pg 1.0 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.0 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.2 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.3 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.4 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.5 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.6 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.7 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.8 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.9 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.a is stuck inactive for 20h, current state unknown, last acting []
    pg 2.b is stuck inactive for 20h, current state unknown, last acting []
    pg 2.c is stuck inactive for 20h, current state unknown, last acting []
    pg 2.d is stuck inactive for 20h, current state unknown, last acting []
    pg 2.e is stuck inactive for 20h, current state unknown, last acting []
    pg 2.f is stuck inactive for 20h, current state unknown, last acting []
    pg 2.10 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.11 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.12 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.13 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.14 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.15 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.16 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.17 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.18 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.19 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1a is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1b is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1c is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1d is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1e is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1f is stuck inactive for 20h, current state unknown, last acting []

[ceph: root@cn01 /]# date
Sat May 29 01:28:37 UTC 2021

[ceph: root@cn01 /]# ceph pg dump_stuck inactive
PG_STAT  STATE    UP  UP_PRIMARY  ACTING  ACTING_PRIMARY
2.1f     unknown  []          -1      []              -1
2.1e     unknown  []          -1      []              -1
2.1d     unknown  []          -1      []              -1
2.1c     unknown  []          -1      []              -1
2.1b     unknown  []          -1      []              -1
2.1a     unknown  []          -1      []              -1
2.19     unknown  []          -1      []              -1
2.18     unknown  []          -1      []              -1
2.17     unknown  []          -1      []              -1
2.16     unknown  []          -1      []              -1
2.15     unknown  []          -1      []              -1
2.14     unknown  []          -1      []              -1
2.13     unknown  []          -1      []              -1
2.12     unknown  []          -1      []              -1
2.11     unknown  []          -1      []              -1
2.10     unknown  []          -1      []              -1
2.f      unknown  []          -1      []              -1
2.9      unknown  []          -1      []              -1
2.b      unknown  []          -1      []              -1
2.c      unknown  []          -1      []              -1
2.e      unknown  []          -1      []              -1
2.a      unknown  []          -1      []              -1
2.d      unknown  []          -1      []              -1
2.8      unknown  []          -1      []              -1
2.7      unknown  []          -1      []              -1
2.6      unknown  []          -1      []              -1
2.5      unknown  []          -1      []              -1
2.0      unknown  []          -1      []              -1
1.0      unknown  []          -1      []              -1
2.3      unknown  []          -1      []              -1
2.1      unknown  []          -1      []              -1
2.2      unknown  []          -1      []              -1
2.4      unknown  []          -1      []              -1
ok

[ceph: root@cn01 /]# ceph pg 2.4 query
Couldn't parse JSON : Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1310, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1230, in main
    si
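The archived message is cut off here. For completeness, a hedged set of follow-up checks one might run when PGs sit in 'unknown' with an empty acting set after a CRUSH rule change; one common cause is a rule whose failure domain (here 'rack') has too few buckets in the CRUSH tree to satisfy the pool size, which a test mapping will reveal:

  # confirm which rule the pool uses and inspect the bucket hierarchy
  ceph osd pool get device_health_metrics crush_rule
  ceph osd tree

  # test whether rule 1 (hdd-rule) can actually map 3 OSDs per PG
  ceph osd getcrushmap -o crush.bin
  crushtool -i crush.bin --test --rule 1 --num-rep 3 --show-mappings | head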