[ceph-users] Re: Remapping OSDs under a PG
Create a crush rule that only chooses non-ssd drives, then

  ceph osd pool set <pool> crush_rule YourNewRuleName

and it will move over to the non-ssd OSDs.

On Fri, 28 May 2021 at 02:18, Jeremy Hansen wrote:
>
> I’m very new to Ceph, so if this question makes no sense, I apologize.
> I’m continuing to study, but I thought an answer to this question would
> help me understand Ceph a bit more.
>
> Using cephadm, I set up a cluster. Cephadm automatically creates a pool
> for Ceph metrics. It looks like one of my SSD OSDs was allocated for the
> PG. I’d like to understand how to remap this PG so it’s not using the
> SSD OSDs.
>
> ceph pg map 1.0
> osdmap e205 pg 1.0 (1.0) -> up [28,33,10] acting [28,33,10]
>
> OSD 28 is the SSD.
>
> Is this possible? Does this make any sense? I’d like to reserve the SSDs
> for their own pool.
>
> Thank you!
> -jeremy
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

--
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
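For illustration, the two steps might look like this, assuming device classes are in use, the CRUSH root is 'default', the failure domain is 'host', and the pool in question is the auto-created device_health_metrics pool (adjust names to your cluster):

  # rule restricted to the 'hdd' device class
  ceph osd crush rule create-replicated only-hdd default host hdd

  # point the pool at the new rule; its PGs then backfill onto hdd OSDs only
  ceph osd pool set device_health_metrics crush_rule only-hdd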
[ceph-users] Re: Remapping OSDs under a PG
Thank you both for your response. So this leads me to the next question:

  ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>

What is <root> and <failure-domain> in this case?

It also looks like this is responsible for things like “rack awareness” type
attributes, which is something I’d like to utilize:

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

This is something I will eventually take advantage of as well.

Thank you!
-jeremy

> On May 28, 2021, at 12:03 AM, Janne Johansson wrote:
>
> Create a crush rule that only chooses non-ssd drives, then
>   ceph osd pool set <pool> crush_rule YourNewRuleName
> and it will move over to the non-ssd OSDs.
>
> On Fri, 28 May 2021 at 02:18, Jeremy Hansen wrote:
>>
>> I’m very new to Ceph, so if this question makes no sense, I apologize.
>> Continuing to study, but I thought an answer to this question would help
>> me understand Ceph a bit more.
>>
>> Using cephadm, I set up a cluster. Cephadm automatically creates a pool
>> for Ceph metrics. It looks like one of my SSD OSDs was allocated for the
>> PG. I’d like to understand how to remap this PG so it’s not using the
>> SSD OSDs.
>>
>> ceph pg map 1.0
>> osdmap e205 pg 1.0 (1.0) -> up [28,33,10] acting [28,33,10]
>>
>> OSD 28 is the SSD.
>>
>> Is this possible? Does this make any sense? I’d like to reserve the SSDs
>> for their own pool.
>>
>> Thank you!
>> -jeremy
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

signature.asc
Description: Message signed with OpenPGP
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
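To spell the syntax out (a sketch; the example names below are made up, not from the thread):

  ceph osd crush rule create-replicated <name> <root> <failure-domain> [<class>]

  # <name>           - the new rule's name
  # <root>           - CRUSH bucket to start from, usually 'default'
  # <failure-domain> - bucket type replicas are spread across (host, rack, ...)
  # <class>          - optional device class filter (hdd, ssd, nvme)

  # e.g. replicas spread across racks, on hdd-class OSDs only:
  ceph osd crush rule create-replicated hdd-by-rack default rack hdd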
[ceph-users] Re: XFS on RBD on EC painfully slow
On Thu, May 27, 2021 at 02:54:00PM -0500, Reed Dier wrote:
> Hoping someone may be able to help point out where my bottleneck(s) may be.
>
> I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on top
> of that.
> This was not an ideal scenario, rather it was a rescue mission to dump a
> large, aging raid array before it was too late, so I'm working with the
> hand I was dealt.
>
> To further conflate the issues, the main directory structure consists of
> lots and lots of small file sizes, and deep directories.
>
> My goal is to try and rsync (or otherwise) data from the RBD to cephfs,
> but its just unbearably slow and will take ~150 days to transfer ~35TB,
> which is far from ideal.

(Disclaimer: no experience with cephfs)

I found rsync a wonderful tool for long distances and large files, less so
for local networks and small files, even with local disks.

Usually I do something like

  ( cd src/ && tar --acls --xattrs --numeric-owner --sparse -cf - . ) |
    pv -pterab |
  ( cd dst/ && tar --acls --xattrs --numeric-owner --sparse -xf - )

If src and dst are not mounted on the same machine you can use netcat/socat
to stream the tar from one system to the other, or pipe it through ssh if
you need encrypted transport.

This does not have the resume capability of rsync, but for small files it
is much faster. After that you can still throw in a final rsync for changes
accumulated while the initial transfer was running.

Matthias
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
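If the source and destination are on different machines, the netcat variant mentioned above could look roughly like this (a sketch; netcat flags differ between implementations, and the port number is arbitrary):

  # on the destination host: listen and unpack
  cd dst/ && nc -l 12345 | tar --acls --xattrs --numeric-owner --sparse -xf -

  # on the source host: stream the tree across the wire
  cd src/ && tar --acls --xattrs --numeric-owner --sparse -cf - . | nc desthost 12345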
[ceph-users] Re: Messed up placement of MDS
By experimenting I managed to work out how to get rid of that wrongly created
MDS service, so for those who are looking for that information too, I used
the following command:

  ceph orch rm mds.label:mds

‐‐‐ Original Message ‐‐‐
On Thursday, May 27, 2021 9:16 PM, mabi wrote:

> Hello,
>
> I am trying to place the two MDS daemons for CephFS on dedicated nodes.
> For that purpose I tried out a few different "cephadm orch apply ..."
> commands with a label, but at the end it looks like I messed up with the
> placement, as I now have two mds service_types as you can see below:
>
> ceph orch ls --service-type mds --export
>
> =
>
> service_type: mds
> service_id: ceph1fs
> service_name: mds.ceph1fs
> placement:
>   count: 2
>   hosts:
>   - ceph1g
>   - ceph1a
>
> service_type: mds
> service_id: label:mds
> service_name: mds.label:mds
> placement:
>   count: 2
>
> This second entry at the bottom seems totally wrong and I would like to
> remove it, but I haven't found how to remove it totally. Any ideas?
>
> Ideally I just want to place two MDS daemons on node ceph1a and ceph1g.
>
> Regards,
> Mabi
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
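For the remaining goal (pinning the two MDS daemons to ceph1a and ceph1g), a hedged sketch of how that placement is commonly expressed, not taken from the thread, is an explicit host list in the service spec:

  service_type: mds
  service_id: ceph1fs
  placement:
    count: 2
    hosts:
      - ceph1a
      - ceph1g

applied with something like "ceph orch apply -i mds-ceph1fs.yaml", or the equivalent one-liner ceph orch apply mds ceph1fs --placement="2 ceph1a ceph1g".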
[ceph-users] Re: cephfs auditing
On 5/27/21 10:47 PM, Michael Thomas wrote:
> Is there a way to log or track which cephfs files are being accessed?
> This would help us in planning where to place certain datasets based on
> popularity, e.g. on an EC HDD pool or a replicated SSD pool.
>
> I know I can run inotify on the ceph clients, but I was hoping that the
> MDS would have a way to log this information centrally.

Progress has been made on performance metrics of CephFS [1]. However, AFAIK
it does not keep track of /what/ files are being accessed.

Gr. Stefan

[1]: https://docs.ceph.com/en/latest/cephfs/cephfs-top/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: XFS on RBD on EC painfully slow
Hi Reed,

Have you tried just starting multiple rsync processes simultaneously to
transfer different directories? Distributed systems like Ceph often benefit
from more parallelism.

Weiwen Hu

> On May 28, 2021, at 03:54, Reed Dier wrote:
>
> Hoping someone may be able to help point out where my bottleneck(s) may be.
>
> I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on top
> of that.
> This was not an ideal scenario, rather it was a rescue mission to dump a
> large, aging raid array before it was too late, so I'm working with the
> hand I was dealt.
>
> To further conflate the issues, the main directory structure consists of
> lots and lots of small file sizes, and deep directories.
>
> My goal is to try and rsync (or otherwise) data from the RBD to cephfs,
> but it's just unbearably slow and will take ~150 days to transfer ~35TB,
> which is far from ideal.
>
>> 15.41G  79%  4.36MB/s  0:56:09 (xfr#23165, ir-chk=4061/27259)
>
>> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>>            0.17   0.00     1.34    13.23    0.00  85.26
>>
>> Device  r/s  rMB/s  rrqm/s  %rrqm  r_await  rareq-sz  w/s  wMB/s  wrqm/s  %wrqm  w_await  wareq-sz  d/s  dMB/s  drqm/s  %drqm  d_await  dareq-sz  aqu-sz  %util
>> rbd0  124.00  0.66  0.00  0.00  17.30  5.48  50.00  0.17  0.00  0.00  31.70  3.49  0.00  0.00  0.00  0.00  0.00  0.00  3.39  96.40
>
> Rsync progress and iostat (during the rsync) from the rbd to a local ssd,
> to remove any bottlenecks doubling back to cephfs.
> About 16G in 1h, not exactly blazing, this being 5 of the 7000 directories
> I'm looking to offload to cephfs.
>
> Currently running 15.2.11, and the host is Ubuntu 20.04 (5.4.0-72-generic)
> with a single E5-2620, 64GB of memory, and 4x10GbT bond talking to ceph,
> iperf proves it out.
> EC8:2, across about 16 hosts, 240 OSDs, with 24 of those being 8TB 7.2k
> SAS, and the other 216 being 2TB 7.2k SATA. So there are quite a few
> spindles in play here.
> Only 128 PGs in this pool, but it's the only RBD image in this pool.
> Autoscaler recommends going to 512, but I was hoping to avoid the
> performance overhead of the PG splits if possible, given perf is bad
> enough as is.
>
> Examining the main directory structure, it looks like there are 7000 files
> per directory, about 60% of which are <1MiB, in all totaling nearly 5GiB
> per directory.
>
> My fstab for this is:
>> <device>  <mountpoint>  xfs  _netdev,noatime  0 0
>
> I tried to increase the read_ahead_kb to 4M from 128K at
> /sys/block/rbd0/queue/read_ahead_kb to match the object/stripe size of the
> EC pool, but that doesn't appear to have had much of an impact.
>
> The only thing I can think of that I could possibly try as a change would
> be to increase the queue depth in the rbdmap up from 128, so that's my
> next bullet to fire.
>
> Attaching xfs_info in case there are any useful nuggets:
>> meta-data=/dev/rbd0        isize=256    agcount=81, agsize=268435455 blks
>>          =                 sectsz=512   attr=2, projid32bit=0
>>          =                 crc=0        finobt=0, sparse=0, rmapbt=0
>>          =                 reflink=0
>> data     =                 bsize=4096   blocks=21483470848, imaxpct=5
>>          =                 sunit=0      swidth=0 blks
>> naming   =version 2        bsize=4096   ascii-ci=0, ftype=0
>> log      =internal log     bsize=4096   blocks=32768, version=2
>>          =                 sectsz=512   sunit=0 blks, lazy-count=0
>> realtime =none             extsz=4096   blocks=0, rtextents=0
>
> And rbd info:
>> rbd image 'rbd-image-name':
>>    size 85 TiB in 22282240 objects
>>    order 22 (4 MiB objects)
>>    snapshot_count: 0
>>    id: a09cac2b772af5
>>    data_pool: rbd-ec82-pool
>>    block_name_prefix: rbd_data.29.a09cac2b772af5
>>    format: 2
>>    features: layering, exclusive-lock, object-map, fast-diff,
>> deep-flatten, data-pool
>>    op_features:
>>    flags:
>>    create_timestamp: Mon Apr 12 18:44:38 2021
>>    access_timestamp: Mon Apr 12 18:44:38 2021
>>    modify_timestamp: Mon Apr 12 18:44:38 2021
>
> Any other ideas or hints are greatly appreciated.
>
> Thanks,
> Reed
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
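On the last point above (raising the krbd queue depth), a hedged sketch of what that change could look like; the pool/image names are placeholders, the filesystem must be unmounted first, and queue_depth support depends on the kernel version:

  # unmount, unmap, then remap with a deeper request queue (default 128)
  umount <mountpoint>
  rbd unmap /dev/rbd0
  rbd map <pool>/<image> -o queue_depth=1024
  mount /dev/rbd0 <mountpoint>

  # read-ahead is still tuned via sysfs after mapping
  echo 4096 > /sys/block/rbd0/queue/read_ahead_kb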
[ceph-users] Re: Remapping OSDs under a PG
I’m continuing to read and it’s becoming more clear. The CRUSH map seems
pretty amazing!

-jeremy

> On May 28, 2021, at 1:10 AM, Jeremy Hansen wrote:
>
> Thank you both for your response. So this leads me to the next question:
>
>   ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>
>
> What is <root> and <failure-domain> in this case?
>
> It also looks like this is responsible for things like “rack awareness”
> type attributes, which is something I’d like to utilize:
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 zone
> type 10 region
> type 11 root
>
> This is something I will eventually take advantage of as well.
>
> Thank you!
> -jeremy
>
>> On May 28, 2021, at 12:03 AM, Janne Johansson wrote:
>>
>> Create a crush rule that only chooses non-ssd drives, then
>>   ceph osd pool set <pool> crush_rule YourNewRuleName
>> and it will move over to the non-ssd OSDs.
>>
>> On Fri, 28 May 2021 at 02:18, Jeremy Hansen wrote:
>>>
>>> I’m very new to Ceph, so if this question makes no sense, I apologize.
>>> Continuing to study, but I thought an answer to this question would
>>> help me understand Ceph a bit more.
>>>
>>> Using cephadm, I set up a cluster. Cephadm automatically creates a pool
>>> for Ceph metrics. It looks like one of my SSD OSDs was allocated for
>>> the PG. I’d like to understand how to remap this PG so it’s not using
>>> the SSD OSDs.
>>>
>>> ceph pg map 1.0
>>> osdmap e205 pg 1.0 (1.0) -> up [28,33,10] acting [28,33,10]
>>>
>>> OSD 28 is the SSD.
>>>
>>> Is this possible? Does this make any sense? I’d like to reserve the
>>> SSDs for their own pool.
>>>
>>> Thank you!
>>> -jeremy
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>> --
>> May the most significant bit of your life be positive.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: XFS on RBD on EC painfully slow
I guess I should probably have been more clear: this is one pool of many, so
the other OSDs aren't idle. So I don't necessarily think that the PG bump
would be the worst thing to try, but it's definitely not as bad as I may
have made it sound.

Thanks,
Reed

> On May 27, 2021, at 11:37 PM, Anthony D'Atri wrote:
>
> That gives you a PG ratio of …. 5.3 ???
>
> Run `ceph osd df`; I wouldn’t be surprised if some of your drives have 0
> PGs on them, for sure I would suspect that they aren’t even at all.
>
> There are bottlenecks in the PG code, and in the OSD code — one reason why
> with NVMe clusters it’s common to split each drive into at least 2 OSDs.
> With spinners you don’t want to do that, but you get the idea.
>
> The pg autoscaler is usually out of its Vulcan mind. 512 would give you a
> ratio of just 21.
>
> Prior to 12.2.1, conventional wisdom was a PG ratio of 100-200 on spinners.
>
> 2048 PGs would give you a ratio of 85, which current (retconned) guidance
> would call good. I’d probably go to 4096, but 2048 would be way better
> than 128.
>
> I strongly suspect that PG splitting would still get you done faster than
> the way it is, esp. if you’re running BlueStore OSDs.
>
> Try bumping pg_num up to say 262 and see how bad it is, and whether, when
> pgp_num catches up, your ingest rate isn’t a bit higher than it was before.
>
>> EC8:2, across about 16 hosts, 240 OSDs, with 24 of those being 8TB 7.2k
>> SAS, and the other 216 being 2TB 7.2k SATA. So there are quite a few
>> spindles in play here.
>> Only 128 PGs in this pool, but it's the only RBD image in this pool.
>> Autoscaler recommends going to 512, but I was hoping to avoid the
>> performance overhead of the PG splits if possible, given perf is bad
>> enough as is.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
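For readers following the arithmetic: the ratios quoted above appear to be pg_num multiplied by the pool's effective size (k+m = 10 for EC 8:2), divided by the 240 OSDs:

   128 PGs x 10 / 240 OSDs ≈   5.3   (current)
   512 PGs x 10 / 240 OSDs ≈  21.3   (autoscaler suggestion)
  2048 PGs x 10 / 240 OSDs ≈  85.3
  4096 PGs x 10 / 240 OSDs ≈ 170.7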
[ceph-users] Re: XFS on RBD on EC painfully slow
I had it on my list of things to possibly try, a tar in | tar out copy, to
see if it yielded different results.

On its face, it seems like cp -a is getting ever so slightly better speed,
but not a clear night and day difference.

I will definitely look into this and report back any findings, positive or
negative.

Thanks for the suggestion,

Reed

> On May 28, 2021, at 3:24 AM, Matthias Ferdinand wrote:
>
> On Thu, May 27, 2021 at 02:54:00PM -0500, Reed Dier wrote:
>> Hoping someone may be able to help point out where my bottleneck(s) may be.
>>
>> I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on top
>> of that.
>> This was not an ideal scenario, rather it was a rescue mission to dump a
>> large, aging raid array before it was too late, so I'm working with the
>> hand I was dealt.
>>
>> To further conflate the issues, the main directory structure consists of
>> lots and lots of small file sizes, and deep directories.
>>
>> My goal is to try and rsync (or otherwise) data from the RBD to cephfs,
>> but its just unbearably slow and will take ~150 days to transfer ~35TB,
>> which is far from ideal.
>
> (Disclaimer: no experience with cephfs)
>
> I found rsync a wonderful tool for long distances and large files, less
> so for local networks and small files, even with local disks.
>
> Usually I do something like
>
>   ( cd src/ && tar --acls --xattrs --numeric-owner --sparse -cf - . ) |
>     pv -pterab |
>   ( cd dst/ && tar --acls --xattrs --numeric-owner --sparse -xf - )
>
> If src and dst are not mounted on the same machine you can use
> netcat/socat to stream the tar from one system to the other, or pipe it
> through ssh if you need encrypted transport.
>
> This does not have the resume capability of rsync, but for small files
> it is much faster. After that you can still throw in a final rsync for
> changes accumulated while the initial transfer was running.
>
> Matthias
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: XFS on RBD on EC painfully slow
There is also a longstanding belief that using cpio saves you context
switches and data through a pipe. YMMV.

> On May 28, 2021, at 7:26 AM, Reed Dier wrote:
>
> I had it on my list of things to possibly try, a tar in | tar out copy,
> to see if it yielded different results.
>
> On its face, it seems like cp -a is getting ever so slightly better speed,
> but not a clear night and day difference.
>
> I will definitely look into this and report back any findings, positive
> or negative.
>
> Thanks for the suggestion,
>
> Reed
>
>> On May 28, 2021, at 3:24 AM, Matthias Ferdinand wrote:
>>
>> On Thu, May 27, 2021 at 02:54:00PM -0500, Reed Dier wrote:
>>> Hoping someone may be able to help point out where my bottleneck(s) may be.
>>>
>>> I have an 80TB kRBD image on an EC8:2 pool, with an XFS filesystem on
>>> top of that.
>>> This was not an ideal scenario, rather it was a rescue mission to dump a
>>> large, aging raid array before it was too late, so I'm working with the
>>> hand I was dealt.
>>>
>>> To further conflate the issues, the main directory structure consists of
>>> lots and lots of small file sizes, and deep directories.
>>>
>>> My goal is to try and rsync (or otherwise) data from the RBD to cephfs,
>>> but its just unbearably slow and will take ~150 days to transfer ~35TB,
>>> which is far from ideal.
>>
>> (Disclaimer: no experience with cephfs)
>>
>> I found rsync a wonderful tool for long distances and large files, less
>> so for local networks and small files, even with local disks.
>>
>> Usually I do something like
>>
>>   ( cd src/ && tar --acls --xattrs --numeric-owner --sparse -cf - . ) |
>>     pv -pterab |
>>   ( cd dst/ && tar --acls --xattrs --numeric-owner --sparse -xf - )
>>
>> If src and dst are not mounted on the same machine you can use
>> netcat/socat to stream the tar from one system to the other, or pipe it
>> through ssh if you need encrypted transport.
>>
>> This does not have the resume capability of rsync, but for small files
>> it is much faster. After that you can still throw in a final rsync for
>> changes accumulated while the initial transfer was running.
>>
>> Matthias
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
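For reference, a hedged sketch of the cpio pass-through approach mentioned above (paths are placeholders; like tar, this copies but does not delete, so a final rsync pass would still be needed for a live source):

  # copy the tree at /mnt/rbd-src into /mnt/cephfs/dst, preserving directories and mtimes
  cd /mnt/rbd-src &&
    find . -depth -print0 | cpio --null --sparse -pdmu /mnt/cephfs/dst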
[ceph-users] Re: XFS on RBD on EC painfully slow
Hi Reed,

To add to this command by Weiwen:

On 28.05.21 13:03, 胡 玮文 wrote:
> Have you tried just starting multiple rsync processes simultaneously to
> transfer different directories? Distributed systems like Ceph often
> benefit from more parallelism.

When I migrated from XFS on iSCSI (legacy system, no Ceph) to CephFS a few
months ago, I used msrsync [1] and was quite happy with the speed. For your
use case, I would start with -p 12 but might experiment with up to -p 24
(as you only have 6C/12T in your CPU). With many small files, you also might
want to increase -s from the default 1000.

Note that msrsync does not work with the --delete rsync flag. As I was
syncing a live system, I ended up with this workflow:

- Initial sync with msrsync (something like ./msrsync -p 12 --progress
  --stats --rsync "-aS --numeric-ids" ...)
- Second sync with msrsync (to sync changes during the first sync)
- Take old storage off-line for users / read-only
- Final rsync with --delete (i.e. rsync -aS --numeric-ids --delete ...)
- Mount cephfs at location of old storage, adjust /etc/exports with fsid
  entries where necessary, turn system back on-line / read-write

Cheers
Sebastian

[1] https://github.com/jbd/msrsync
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] mons assigned via orch label 'committing suicide' upon reboot.
FYI, I'm getting monitors assigned via '... apply label:mon', with current
and valid 'mon' tags, 'committing suicide' after surprise reboots in the
'Pacific' 16.2.4 release. The tag indicating a monitor should be assigned to
that host is present and never changed.

Deleting the mon tag, waiting a minute, then re-adding the 'mon' tag to the
host causes the monitor to redeploy and run properly.

I have 5 monitors assigned via the orchestrator's 'label:mon', all in docker
containers. Upon reboot that goes to 4 monitors deployed.

On the offending host in the logs I see this:

May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.771+ 7f7a029bf700  0 using public_addr v2:[fc00:1002:c7::44]:0/0 -> [v2:[fc00:1002:c7::44]:3300/0,v1:[fc00:1002:c7::44]:6789/0]
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.771+ 7f7a029bf700  0 starting mon.noc4 rank -1 at public addrs [v2:[fc00:1002:c7::44]:3300/0,v1:[fc00:1002:c7::44]:6789/0] at bind addrs [v2:[fc00:1002:c7::44]:3300/0,v1:[fc00:1002:c7::44]:6789/0] mon_data /var/lib/ceph/mon/ceph-noc4 fsid 4067126d-01cb-40af-824a-881c130140f8
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+ 7f7a029bf700  1 mon.noc4@-1(???) e40 preinit fsid 4067126d-01cb-40af-824a-x
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+ 7f7a029bf700 -1 mon.noc4@-1(???) e40 not in monmap and have been in a quorum before; must have been removed
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+ 7f7a029bf700 -1 mon.noc4@-1(???) e40 commit suicide!
May 28 11:06:59 noc4 bash[10563]: debug 2021-05-28T16:06:59.775+ 7f7a029bf700 -1 failed to initialize

Seems odd. And, you know, as debug comments go, 'commit suicide!' appears to
have an 'extra coffee that day' aspect.

HC
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
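The workaround described above corresponds roughly to the following orchestrator commands (the host name is taken from the log excerpt; treat this as an example, not a fix for the underlying bug):

  # remove the 'mon' label so cephadm tears the daemon down, wait a minute,
  # then re-add it so the monitor is redeployed
  ceph orch host label rm noc4 mon
  ceph orch host label add noc4 mon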
[ceph-users] Fwd: Re: Ceph osd will not start.
Peter,

We're seeing the same issues as you are. We have 2 new hosts (Intel(R)
Xeon(R) Gold 6248R CPU @ 3.00GHz w/ 48 cores, 384GB RAM, and 60x 10TB SED
drives) and we have tried both 15.2.13 and 16.2.4. Cephadm does NOT properly
deploy and activate OSDs on Ubuntu 20.04.2 with Docker.

Seems to be a bug in Cephadm and a product regression, as we have 4 near
identical nodes on CentOS running Nautilus (240 x 10TB SED drives) and had
no problems. FWIW, we had no luck yet with one-by-one OSD daemon additions
through ceph orch either. We also reproduced the issue easily in a virtual
lab using small virtual disks on a single ceph VM with 1 mon.

We are now looking into whether we can get past this with a manual buildout.
If you, or anyone, has hit the same stumbling block and gotten past it, I
would really appreciate some guidance.

Thanks,
Marco

On Thu, May 27, 2021 at 2:23 PM Peter Childs wrote:

> In the end it looks like I might be able to get the node up to about 30
> OSDs before it stops creating any more.
>
> Or, more precisely, it formats the disks but freezes up starting the
> daemons.
>
> I suspect I'm missing something I can tune to get it working better.
>
> If I could see any error messages that might help, but I'm yet to spot
> anything.
>
> Peter.
>
> On Wed, 26 May 2021, 10:57 Eugen Block, wrote:
>
> > > If I add the osd daemons one at a time with
> > >
> > > ceph orch daemon add osd drywood12:/dev/sda
> > >
> > > It does actually work,
> >
> > Great!
> >
> > > I suspect what's happening is when my rule for creating osds runs and
> > > creates them all-at-once it overloads cephadm and it can't cope.
> >
> > It's possible, I guess.
> >
> > > I suspect what I might need to do at least to work around the issue is
> > > set "limit:" and bring it up until it stops working.
> >
> > It's worth a try, yes, although the docs state you should try to avoid
> > it, it's possible that it doesn't work properly, in that case create a
> > bug report. ;-)
> >
> > > I did work out how to get ceph-volume to nearly work manually.
> > >
> > > cephadm shell
> > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
> > > ceph-volume lvm create --data /dev/sda --dmcrypt
> > >
> > > but given I've now got "add osd" to work, I suspect I just need to fine
> > > tune my osd creation rules, so it does not try and create too many osds
> > > on the same node at the same time.
> >
> > I agree, no need to do it manually if there is an automated way,
> > especially if you're trying to bring up dozens of OSDs.
> >
> >
> > Zitat von Peter Childs :
> >
> > > After a bit of messing around, I managed to get it somewhat working.
> > >
> > > If I add the osd daemons one at a time with
> > >
> > > ceph orch daemon add osd drywood12:/dev/sda
> > >
> > > It does actually work,
> > >
> > > I suspect what's happening is when my rule for creating osds runs and
> > > creates them all-at-once it overloads cephadm and it can't cope.
> > >
> > > service_type: osd
> > > service_name: osd.drywood-disks
> > > placement:
> > >   host_pattern: 'drywood*'
> > > spec:
> > >   data_devices:
> > >     size: "7TB:"
> > >   objectstore: bluestore
> > >
> > > I suspect what I might need to do at least to work around the issue is
> > > set "limit:" and bring it up until it stops working.
> > >
> > > I did work out how to get ceph-volume to nearly work manually.
> > >
> > > cephadm shell
> > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
> > > ceph-volume lvm create --data /dev/sda --dmcrypt
> > >
> > > but given I've now got "add osd" to work, I suspect I just need to fine
> > > tune my osd creation rules, so it does not try and create too many osds
> > > on the same node at the same time.
> > >
> > >
> > > On Wed, 26 May 2021 at 08:25, Eugen Block wrote:
> > >
> > >> Hi,
> > >>
> > >> I believe your current issue is due to a missing keyring for
> > >> client.bootstrap-osd on the OSD node. But even after fixing that
> > >> you'll probably still won't be able to deploy an OSD manually with
> > >> ceph-volume because 'ceph-volume activate' is not supported with
> > >> cephadm [1]. I just tried that in a virtual environment, it fails when
> > >> activating the systemd-unit:
> > >>
> > >> ---snip---
> > >> [2021-05-26 06:47:16,677][ceph_volume.process][INFO ] Running
> > >> command: /usr/bin/systemctl enable
> > >> ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
> > >> [2021-05-26 06:47:16,692][ceph_volume.process][INFO ] stderr Failed
> > >> to connect to bus: No such file or directory
> > >> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm
> > >> activate was unable to complete, while creating the OSD
> > >> Traceback (most recent call last):
> > >>   File
> > >> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py",
> > >> line 32, in create
> > >>     Activate([]).activate(args)
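A hedged sketch of the "limit:" idea discussed above, based on the drive group spec quoted in the thread; the limit value here is arbitrary, and per the docs the filter is best treated as a last resort:

  service_type: osd
  service_id: drywood-disks
  placement:
    host_pattern: 'drywood*'
  spec:
    data_devices:
      size: '7TB:'
      limit: 10        # use at most 10 matching devices at a time
    objectstore: bluestore

applied with something like "ceph orch apply -i osd-drywood.yaml", then raising the limit once the first batch of OSDs is up.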
[ceph-users] Re: Remapping OSDs under a PG
So I did this:

  ceph osd crush rule create-replicated hdd-rule default rack hdd

[ceph: root@cn01 ceph]# ceph osd crush rule ls
replicated_rule
hdd-rule
ssd-rule

[ceph: root@cn01 ceph]# ceph osd crush rule dump hdd-rule
{
    "rule_id": 1,
    "rule_name": "hdd-rule",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "rack"
        },
        {
            "op": "emit"
        }
    ]
}

Then this:

  ceph osd pool set device_health_metrics crush_rule hdd-rule

How do I prove that my device_health_metrics pool is no longer using any SSDs?

ceph pg ls
PG    OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE         SINCE  VERSION  REPORTED  UP             ACTING         SCRUB_STAMP                  DEEP_SCRUB_STAMP
1.0   41       0         0          0        0      0            0           71   active+clean  22h    205'71   253:484   [28,33,10]p28  [28,33,10]p28  2021-05-27T14:44:37.466384+  2021-05-26T04:23:11.758060+
2.0   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:56    [9,5,26]p9     [9,5,26]p9     2021-05-28T00:46:34.470208+  2021-05-28T00:46:15.122042+
2.1   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [34,0,13]p34   [34,0,13]p34   2021-05-28T00:46:41.578301+  2021-05-28T00:46:15.122042+
2.2   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [30,25,5]p30   [30,25,5]p30   2021-05-28T00:46:41.394685+  2021-05-28T00:46:15.122042+
2.3   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [14,35,32]p14  [14,35,32]p14  2021-05-28T00:46:40.545088+  2021-05-28T00:46:15.122042+
2.4   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [27,28,7]p27   [27,28,7]p27   2021-05-28T00:46:41.208159+  2021-05-28T00:46:15.122042+
2.5   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [8,4,35]p8     [8,4,35]p8     2021-05-28T00:46:39.845197+  2021-05-28T00:46:15.122042+
2.6   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [31,26,6]p31   [31,26,6]p31   2021-05-28T00:46:45.808430+  2021-05-28T00:46:15.122042+
2.7   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [12,7,19]p12   [12,7,19]p12   2021-05-28T00:46:39.313525+  2021-05-28T00:46:15.122042+
2.8   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [20,21,11]p20  [20,21,11]p20  2021-05-28T00:46:38.840636+  2021-05-28T00:46:15.122042+
2.9   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [31,14,10]p31  [31,14,10]p31  2021-05-28T00:46:46.791644+  2021-05-28T00:46:15.122042+
2.a   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [16,27,35]p16  [16,27,35]p16  2021-05-28T00:46:39.025320+  2021-05-28T00:46:15.122042+
2.b   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [20,15,11]p20  [20,15,11]p20  2021-05-28T00:46:42.841924+  2021-05-28T00:46:15.122042+
2.c   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [32,11,0]p32   [32,11,0]p32   2021-05-28T00:46:38.403701+  2021-05-28T00:46:15.122042+
2.d   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:56    [5,19,3]p5     [5,19,3]p5     2021-05-28T00:46:39.808986+  2021-05-28T00:46:15.122042+
2.e   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [27,13,17]p27  [27,13,17]p27  2021-05-28T00:46:42.253293+  2021-05-28T00:46:15.122042+
2.f   0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [11,22,18]p11  [11,22,18]p11  2021-05-28T00:46:38.721405+  2021-05-28T00:46:15.122042+
2.10  0        0         0          0        0      0            0           0    active+clean  21h    0'0      254:42    [10,17,7]p10   [10,17,7]p10   2021-05-28T00:46:38.770867+  2021-05-28T00:46:15.122042+
2.11  0        0         0          0        0
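Not from the thread, but a few commands that could answer the "how do I prove it" question, assuming device classes are assigned to the OSDs:

  # show which OSDs each PG of the pool maps to
  ceph pg ls-by-pool device_health_metrics

  # check the device class of a particular OSD (e.g. the former SSD member)
  ceph osd crush get-device-class osd.28

  # or list all OSDs carrying the 'ssd' class and compare against the PG mappings
  ceph osd crush class ls-osd ssd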
[ceph-users] HEALTH_WARN Reduced data availability: 33 pgs inactive
I’m trying to understand this situation:

ceph health detail
HEALTH_WARN Reduced data availability: 33 pgs inactive
[WRN] PG_AVAILABILITY: Reduced data availability: 33 pgs inactive
    pg 1.0 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.0 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.2 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.3 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.4 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.5 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.6 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.7 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.8 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.9 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.a is stuck inactive for 20h, current state unknown, last acting []
    pg 2.b is stuck inactive for 20h, current state unknown, last acting []
    pg 2.c is stuck inactive for 20h, current state unknown, last acting []
    pg 2.d is stuck inactive for 20h, current state unknown, last acting []
    pg 2.e is stuck inactive for 20h, current state unknown, last acting []
    pg 2.f is stuck inactive for 20h, current state unknown, last acting []
    pg 2.10 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.11 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.12 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.13 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.14 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.15 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.16 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.17 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.18 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.19 is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1a is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1b is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1c is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1d is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1e is stuck inactive for 20h, current state unknown, last acting []
    pg 2.1f is stuck inactive for 20h, current state unknown, last acting []

[ceph: root@cn01 /]# date
Sat May 29 01:28:37 UTC 2021

[ceph: root@cn01 /]# ceph pg dump_stuck inactive
PG_STAT  STATE    UP  UP_PRIMARY  ACTING  ACTING_PRIMARY
2.1f     unknown  []          -1      []              -1
2.1e     unknown  []          -1      []              -1
2.1d     unknown  []          -1      []              -1
2.1c     unknown  []          -1      []              -1
2.1b     unknown  []          -1      []              -1
2.1a     unknown  []          -1      []              -1
2.19     unknown  []          -1      []              -1
2.18     unknown  []          -1      []              -1
2.17     unknown  []          -1      []              -1
2.16     unknown  []          -1      []              -1
2.15     unknown  []          -1      []              -1
2.14     unknown  []          -1      []              -1
2.13     unknown  []          -1      []              -1
2.12     unknown  []          -1      []              -1
2.11     unknown  []          -1      []              -1
2.10     unknown  []          -1      []              -1
2.f      unknown  []          -1      []              -1
2.9      unknown  []          -1      []              -1
2.b      unknown  []          -1      []              -1
2.c      unknown  []          -1      []              -1
2.e      unknown  []          -1      []              -1
2.a      unknown  []          -1      []              -1
2.d      unknown  []          -1      []              -1
2.8      unknown  []          -1      []              -1
2.7      unknown  []          -1      []              -1
2.6      unknown  []          -1      []              -1
2.5      unknown  []          -1      []              -1
2.0      unknown  []          -1      []              -1
1.0      unknown  []          -1      []              -1
2.3      unknown  []          -1      []              -1
2.1      unknown  []          -1      []              -1
2.2      unknown  []          -1      []              -1
2.4      unknown  []          -1      []              -1
ok

[ceph: root@cn01 /]# ceph pg 2.4 query
Couldn't parse JSON : Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1310, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1230, in main
    si
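The archived message is cut off here. For completeness, a hedged set of follow-up checks one might run when PGs sit in 'unknown' with an empty acting set after a CRUSH rule change; one common cause is a rule whose failure domain (here 'rack') has too few buckets in the CRUSH tree to satisfy the pool size, which a test mapping will reveal:

  # confirm which rule the pool uses and inspect the bucket hierarchy
  ceph osd pool get device_health_metrics crush_rule
  ceph osd tree

  # test whether rule 1 (hdd-rule) can actually map 3 OSDs per PG
  ceph osd getcrushmap -o crush.bin
  crushtool -i crush.bin --test --rule 1 --num-rep 3 --show-mappings | head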