[ceph-users] Re: classes crush rules new cluster

2024-11-29 Thread Marc
I know a bit about the work-arounds for manually editing the crush map. I just think 
this is not the best way to get acquainted with a new ceph cluster. I would 
make these hdd, nvme and ssd classes available directly.
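
For reference, the manual workaround goes roughly like this (just a sketch, file 
names are arbitrary):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt: add the dummy device with the new class and the rule
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new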
 
> You could decompile the crushmap, add a dummy OSD (with a non-existing
> ID) with your new device class and add a rule, then compile it and
> inject. Here's an excerpt from a lab cluster with 4 OSDs (0..3),
> adding a fifth non-existing:
> 
> device 4 osd.4 class test
> 
> rule testrule {
>  id 6
>  type erasure
>  step set_chooseleaf_tries 5
>  step set_choose_tries 100
>  step take default class test
>  step chooseleaf indep 0 type host
>  step emit
> }
> 
> Note that testing this rule with crushtool won't work here since the
> fake OSD isn't assigned to a host.
> 
> But what's the point in having a rule without the corresponding
> devices? You won't be able to create a pool with that rule anyway
> until the OSDs are present.
> 
> Zitat von Marc :
> 
> > It looks like it is not possible to create crush rules when you
> > don't have hard drives active in this class.
> >
> > I am testing with the new squid release and did not add SSDs yet, even
> > though I added the class like this.
> >
> > ceph osd crush class create ssd
> >
> > I can't execute this
> > ceph osd crush rule create-replicated replicated_ssd default host ssd
> >
> > Is there any way around this?
> >
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: classes crush rules new cluster

2024-11-29 Thread Andre Tann

Hi yall,

Am 29.11.24 um 08:51 schrieb Eugen Block:


rule testrule {
     id 6
     type erasure
     step set_chooseleaf_tries 5
     step set_choose_tries 100
     step take default class test
     step chooseleaf indep 0 type host
     step emit
}


Does anyone know a good and comprehensive discussion of all the 
options for a crush rule, and what they do?


Of course I know the original documentation, but I find it too short, 
and it leaves me with many questions.


Thanks for any hints.

--
Andre Tann
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: new cluser ceph osd perf = 0

2024-11-29 Thread Janne Johansson
I see the same on a newly deployed 17.2.8 cluster.
All perf values are empty.

Den tors 28 nov. 2024 kl 23:45 skrev Marc :
>
>
>
> My 'ceph osd perf' values are all 0, do I need to enable a module for this? 
> osd_perf_query? Where should I find this in the manuals? Or do I just need to 
> wait?
>
>
> [@ target]# ceph osd perf
> osd  commit_latency(ms)  apply_latency(ms)
>  25   0  0
>  24   0  0
>  23   0  0
>  22   0  0
>  21   0  0
>  20   0  0
>  19   0  0
>  18   0  0
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CephFS] Completely exclude some MDS rank from directory processing

2024-11-29 Thread Eugen Block

Hi,

> I mean that even if I pin all top dirs (of course without repinning
> on next levels) to rank 1 - I see some amount of Reqs on rank 1.


I assume you mean if you pin all top dirs to rank 0, you still see IO on 
rank 1? I still can't reproduce that, I waited for 15 minutes or so with 
rank 1 down, but I still could read/write to the rank 0 pinned dirs. And 
no IO visible on rank 1.


But what I don't fully understand yet is, I have a third directory which 
is unpinned:


ll /mnt/
insgesamt 0
drwxr-xr-x 2 root root 4023 29. Nov 09:41 dir1
drwxr-xr-x 2 root root    0 22. Nov 12:18 dir2
drwxr-xr-x 2 root root   11 29. Nov 09:34 dir3

dir1 and dir2 are pinned to rank 0, dir3 is unpinned:

getfattr -n ceph.dir.pin /mnt/dir3
# file: mnt/dir3
ceph.dir.pin="-1"

Shouldn't rank 0 take over dir3 as well since it's the only active rank 
left? I couldn't read/write into dir3 until I brought another mds daemon 
back up.
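
(For the record, explicitly pinning dir3 to rank 0 would just be 
'setfattr -n ceph.dir.pin -v 0 /mnt/dir3', the same way dir1 and dir2 were 
pinned; in this test it was deliberately left unpinned.)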


Am 26.11.24 um 10:08 schrieb Александр Руденко:

And, Eugen, try to see ceph fs status during write.

I can see next INOS, DNS and Reqs distribution:
RANK  STATE   MDS ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active   c   Reqs:    127 /s  12.6k  12.5k   333  505
 1    active   b   Reqs:    11 /s    21     24     19  1

I mean that even if I pin all top dirs (of course without repinning on 
next levels) to rank 1 - I see some amount of Reqs on rank 1.



вт, 26 нояб. 2024 г. в 12:01, Александр Руденко :

Hm, the same test worked for me with version 16.2.13... I
mean, I only
do a few writes from a single client, so this may be an
invalid test,
but I don't see any interruption.


I tried many times and I'm sure that my test is correct.
Yes, writes can stay active for some time after rank 1 went down,
maybe tens of seconds. And listing files (ls) can work for some time for
dirs which were listed before the rank went down, but only for a few seconds.

Before shutting down rank 1 I run writes in this way:

while true; do dd if=/dev/vda of=/cephfs-mount/dir1/`uuidgen`
count=1 oflag=direct; sleep 0.01; done

Maybe it depends on the RPS...

пт, 22 нояб. 2024 г. в 14:48, Eugen Block :

Hm, the same test worked for me with version 16.2.13... I
mean, I only
do a few writes from a single client, so this may be an
invalid test,
but I don't see any interruption.

Zitat von Eugen Block :

> I just tried to reproduce the behaviour but failed to do so.
I have
> a Reef (18.2.2) cluster with multi-active MDS. Don't mind the
> hostnames, this cluster was deployed with Nautilus.
>
> # mounted the FS
> mount -t ceph nautilus:/ /mnt -o
> name=admin,secret=,mds_namespace=secondfs
>
> # created and pinned directories
> nautilus:~ # mkdir /mnt/dir1
> nautilus:~ # mkdir /mnt/dir2
>
> nautilus:~ # setfattr -n ceph.dir.pin -v 0 /mnt/dir1
> nautilus:~ # setfattr -n ceph.dir.pin -v 0 /mnt/dir2
>
> I stopped all standby daemons while writing into /mnt/dir1,
then I
> also stopped rank 1. But the writes were not interrupted
(until I
> stopped them). You're on Pacific, I'll see if I can
reproduce it
> there.
>
> Zitat von Александр Руденко :
>
>>>
>>> Can you show the entire 'ceph fs status' output? And maybe
>>> also 'ceph fs dump'?
>>
>>
>> Nothing special, just a small test cluster.
>> fs1 - 10 clients
>> ===
>> RANK  STATE   MDS     ACTIVITY     DNS    INOS  DIRS   CAPS
>> 0    active   a   Reqs:    0 /s  18.7k  18.4k  351    513
>> 1    active   b   Reqs:    0 /s    21     24  16      1
>>  POOL      TYPE     USED  AVAIL
>> fs1_meta  metadata   116M  3184G
>> fs1_data    data    23.8G  3184G
>> STANDBY MDS
>>     c
>>
>>
>> fs dump
>>
>> e48
>> enable_multiple, ever_enabled_multiple: 1,1
>> default compat: compat={},rocompat={},incompat={1=base
v0.20,2=client
>> writeable ranges,3=default file layouts on dirs,4=dir inode
in separate
>> object,5=mds uses versioned encoding,6=dirfrag is stored in
omap,8=no
>> anchor table,9=file layout v2,10=snaprealm v2}
>> legacy client fscid: 1
>>
>> Filesystem 'fs1' (1)
>> fs_name fs1
>> epoch 47
>> flags 12
>> created 2024-10-15T18:55:10.905035+0300
>> modified 2024-11-21T10:55:12.688598+0300
>> tableserver 0
>> root 0
>> session_timeout 60
>> session_autoclose 300
>> max_file_size 1099511627776
>> required_client_features {}
>> last_failure 0
>> last_failure_osd_epoch 943
>> compat compat={},rocompat={},incompat={

[ceph-users] Re: new cluser ceph osd perf = 0

2024-11-29 Thread Eugen Block

I tried to get the counters, then I was pointed to enabling the module:

# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
 11   0  0
  8   0  0
  6   0  0
  1   0  0
  0   0  0
  2   0  0
  3   0  0
  4   0  0
  5   0  0

# ceph osd perf counters get 0
Error ENOTSUP: Module 'osd_perf_query' is not enabled/loaded (required  
by command 'osd perf counters get'): use `ceph mgr module enable  
osd_perf_query` to enable it


After enabling it, I get values:

# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
 11 184184
  8   8  8
  6  17 17
  1   4  4
  0   3  3
  2  72 72
  3   9  9
  4  33 33
  5 166166

# ceph osd perf query add all_subkeys

added query all_subkeys with id 0

# ceph osd perf counters get 0
+----------------+----------------+---------+-----------+--------+-------+------------------------------+---------+-----------+----------+
|   CLIENT_ID    | CLIENT_ADDRESS | POOL_ID | NAMESPACE | OSD_ID | PG_ID |         OBJECT_NAME          | SNAP_ID | WRITE_OPS | READ_OPS |
+----------------+----------------+---------+-----------+--------+-------+------------------------------+---------+-----------+----------+
| client.2624154 | IP:0/47925986  |   18    |           |   11   | 18.0  | data_loggenerations_metadata |  head   |     0     |    6     |
| client.2624154 | IP:0/47925986  |   19    |           |    2   | 19.1  | notify.0                     |  head   |     0     |    6     |
+----------------+----------------+---------+-----------+--------+-------+------------------------------+---------+-----------+----------+

...
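
Once you're done inspecting, the query can be removed again; if I recall 
correctly the matching cleanup command is (query id 0 from the 'add' output above):

# ceph osd perf query remove 0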

Zitat von Janne Johansson :


I see the same on a newly deployed 17.2.8 cluster.
all empty perf values.

Den tors 28 nov. 2024 kl 23:45 skrev Marc :




My ceph osd perf are all 0, do I need to enable module for this?  
osd_perf_query? Where should I find this in manuals? Or do I just  
need to wait?



[@ target]# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
 25   0  0
 24   0  0
 23   0  0
 22   0  0
 21   0  0
 20   0  0
 19   0  0
 18   0  0
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




--
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: internal communication network

2024-11-29 Thread Eugen Block

Hi,

just to clarify, the public network (in your case 192.168.1.0/24) is  
basically for all traffic if you don't have a cluster network defined.  
If you do, it will be only used for OSD to OSD communication for  
replication, recovery and heartbeats [0]. The communication to the  
MONs will happen via public network anyway.


[0]  
https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#cluster-network
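
If you ever need to set or change the cluster network after bootstrap, it is 
just a config option (the OSDs need a restart to pick it up), e.g. with the 
subnet from your mail:

ceph config set global cluster_network 10.10.90.0/24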


Zitat von Michel Niyoyita :


Hello team ,

I am creating a new cluster which will be deployed using cephadm. I will use
192.168.1.0/24 as the public network and 10.10.90.0/24 as the internal network for
osd/mon communication. I would like to know if this command is correct, as it is my
first time using cephadm:  sudo cephadm bootstrap --mon-ip 192.168.1.23
--cluster-network 10.10.90.0/24

Kindly help me if there is an alternative.

Best regards

Michel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Snaptriming speed degrade with pg increase

2024-11-29 Thread Frédéric Nass
Hi Istvan,

Did the PG split involve using more OSDs than before? If so, then increasing 
these values (apart from the sleep) should not have a negative impact on 
client I/O compared to before the split and should accelerate the whole 
process.

Did you reshard the buckets as discussed in the other thread?

Regards,
Frédéric.

- Le 29 Nov 24, à 3:30, Istvan Szabo, Agoda istvan.sz...@agoda.com a écrit :

> Hi,
> 
> When we scale the placement group on a pool located in a full nvme cluster, 
> the
> snaptriming speed degrades a lot.
> Currently we are running with these values to not degrade client op and have
> some progress on snaptrimmin, but it is terrible. (octopus 15.2.17 on ubuntu
> 20.04)
> 
> -osd_max_trimming_pgs=2
> --osd_snap_trim_sleep=0.1
> --osd_pg_max_concurrent_snap_trims=2
> 
> We had a big pool which we used to have 128PG and that length of the
> snaptrimming took around 45-60 minutes.
> Due to impossible to do maintenance on the cluster with 600GB pg sizes because
> it can easily max out a cluster (which we did), we increased to 1024 and the
> snaptrimming duration increased to 3.5 hours.
> 
> Is there any good solution that we are missing to fix this?
> 
> On the hardware level I've changed server profile to tune some numa settings 
> but
> seems like didn't help still.
> 
> Thank you
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: internal communication network

2024-11-29 Thread Frédéric Nass
Hi Michel,

This is correct. Don't see anything wrong with that.

Regards,
Frédéric.

- Le 28 Nov 24, à 8:16, Michel Niyoyita mico...@gmail.com a écrit :

> Hello team ,
> 
> I am creating new cluster which will be created using CEPHADM, I will use
> 192.168.1.0/24 as public network and 10.10.90.0/24 as internal network for
> osd , mon communication . would like if this command is helpful as it is my
> first time to use caphadm .  sudo cephadm bootstrap --mon-ip 192.168.1.23
> --cluster-network 10.10.90.0/24  .
> 
> Kindly help me if there is alternative.
> 
> Best regards
> 
> Michel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Snaptriming speed degrade with pg increase

2024-11-29 Thread Szabo, Istvan (Agoda)
Increased from 9 servers to 11 so let's say 20% capacity and performance added.

This is a different cluster purely rbd.

(For the other topic: the bucket can't be resharded because in multisite all the 
data would disappear on the remote site; we need to create a new bucket and migrate 
the data to a higher-sharded bucket first.)

Istvan

From: Frédéric Nass 
Sent: Friday, November 29, 2024 4:58:52 PM
To: Szabo, Istvan (Agoda) 
Cc: Ceph Users 
Subject: Re: [ceph-users] Snaptriming speed degrade with pg increase

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


Hi Istvan,

Did the PG split involved using more OSDs than before? If so then increasing 
these values (apart from the sleep) should not have a negative impact on 
clients I/O compared to before the split and should accelerate the whole 
process.

Did you reshard the buckets as discussed in the other thread?

Regards,
Frédéric.

- Le 29 Nov 24, à 3:30, Istvan Szabo, Agoda istvan.sz...@agoda.com a écrit :

> Hi,
>
> When we scale the placement group on a pool located in a full nvme cluster, 
> the
> snaptriming speed degrades a lot.
> Currently we are running with these values to not degrade client op and have
> some progress on snaptrimmin, but it is terrible. (octopus 15.2.17 on ubuntu
> 20.04)
>
> -osd_max_trimming_pgs=2
> --osd_snap_trim_sleep=0.1
> --osd_pg_max_concurrent_snap_trims=2
>
> We had a big pool which we used to have 128PG and that length of the
> snaptrimming took around 45-60 minutes.
> Due to impossible to do maintenance on the cluster with 600GB pg sizes because
> it can easily max out a cluster (which we did), we increased to 1024 and the
> snaptrimming duration increased to 3.5 hours.
>
> Is there any good solution that we are missing to fix this?
>
> On the hardware level I've changed server profile to tune some numa settings 
> but
> seems like didn't help still.
>
> Thank you
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Snaptriming speed degrade with pg increase

2024-11-29 Thread Frédéric Nass
- Le 29 Nov 24, à 11:11, Istvan Szabo, Agoda  a 
écrit : 

> Increased from 9 servers to 11 so let's say 20% capacity and performance 
> added.

> This is a different cluster purely rbd.

I see, so big objects. You might want to increase osd_max_trimming_pgs and 
possibly also osd_pg_max_concurrent_snap_trims and see how it goes. 
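
For example (values are just a starting point, watch client latency while 
changing them):

ceph config set osd osd_max_trimming_pgs 4
ceph config set osd osd_pg_max_concurrent_snap_trims 4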

> (For the other topic can't be resharded because in multisite it will disappear
> all the data disappear on remote site, need to create new bucket and migrate
> data first to a higher sharded bucket).

Hum... You have fallen significantly behind on Ceph versions, which must be 
hindering you in many operational tasks today. Another option would be to catch 
up and reshard into a recent version in multi-site mode. 

Frédéric. 

> Istvan

> From: Frédéric Nass 
> Sent: Friday, November 29, 2024 4:58:52 PM
> To: Szabo, Istvan (Agoda) 
> Cc: Ceph Users 
> Subject: Re: [ceph-users] Snaptriming speed degrade with pg increase
> Email received from the internet. If in doubt, don't click any link nor open 
> any
> attachment !
> 

> Hi Istvan,

> Did the PG split involved using more OSDs than before? If so then increasing
> these values (apart from the sleep) should not have a negative impact on
> clients I/O compared to before the split and should accelerate the whole
> process.

> Did you reshard the buckets as discussed in the other thread?

> Regards,
> Frédéric.

> - Le 29 Nov 24, à 3:30, Istvan Szabo, Agoda istvan.sz...@agoda.com a 
> écrit :

> > Hi,

> > When we scale the placement group on a pool located in a full nvme cluster, 
> > the
> > snaptriming speed degrades a lot.
> > Currently we are running with these values to not degrade client op and have
> > some progress on snaptrimmin, but it is terrible. (octopus 15.2.17 on ubuntu
> > 20.04)

> > -osd_max_trimming_pgs=2
> > --osd_snap_trim_sleep=0.1
> > --osd_pg_max_concurrent_snap_trims=2

> > We had a big pool which we used to have 128PG and that length of the
> > snaptrimming took around 45-60 minutes.
> > Due to impossible to do maintenance on the cluster with 600GB pg sizes 
> > because
> > it can easily max out a cluster (which we did), we increased to 1024 and the
> > snaptrimming duration increased to 3.5 hours.

> > Is there any good solution that we are missing to fix this?

> > On the hardware level I've changed server profile to tune some numa 
> > settings but
> > seems like didn't help still.

> > Thank you
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Replacing Ceph Monitors for Openstack

2024-11-29 Thread Adrien Georget

Hello,

We are using Ceph as a storage backend for Openstack (Cinder, Nova, 
Glance, Manila) and we are replacing the old hardware hosting the Ceph 
monitors (MON, MGR, MDS) with new ones.
I have already added the new ones in production, monitors successfully 
joined the quorum and new MGR/MDS are standby.


For the monitors, I'm sure that the monmap is already up to date and 
Openstack clients are already aware of the change and it should not be a 
problem when I will next shut down the old monitors.
The ceph.conf will be updated in all Openstack controllers to replace 
"mon host" with the new ones before shutting old mons down.


But I have some doubts about the resilience of the Openstack Manila service 
because the IP addresses of the monitors appear to be hardcoded in the export 
location of the manila share:

The manila show command returns for example :

| export_locations | path = 134.158.208.140:6789,134.158.208.141:6789,134.158.208.142:6789:/volumes/EC_manila/_nogroup/7a6c05d9-2fea-43b1-a6d4-06eec1e384f2 |
|                   | share_instance_id = 7a6c05d9-2fea-43b1-a6d4-06eec1e384f2 |



Has anyone already done this kind of migration in the past and can 
confirm my doubts?

Is there any process to update shares?

Cheers,
Adrien
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: classes crush rules new cluster

2024-11-29 Thread Eugen Block
Which questions do you have? When I first started to deal with crush  
rules I was overwhelmed, but with a bit of practice and trial & error  
you're going to figure it out.


Maybe this helps a bit (inline comments):

id 6 -> self explanatory

type erasure -> self explanatory

step set_chooseleaf_tries 5 -> stick to defaults, usually works  
(number of max attempts to find suitable OSDs)


step set_choose_tries 100 -> stick to defaults, usually works (number  
of max attempts to find suitable buckets, e.g. hosts)


step take default class test -> "default" is the usual default crush  
root (check 'ceph osd tree'), you can specify other roots if you have  
them


step chooseleaf indep 0 type host -> within bucket "root" (from "step  
take default") choose {pool-num-replicas} hosts


step emit -> execute
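
A quick way to sanity-check how a rule maps PGs is crushtool's test mode, roughly 
like this (rule id and replica count are just examples):

ceph osd getcrushmap -o cm.bin
crushtool -i cm.bin --test --rule 6 --num-rep 4 --show-mappings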

There are more details in [0].

@Marc: I guess one could argue about that. On the one hand, those  
three device classes you mention are discovered automatically when  
OSDs are added (depending on controllers, etc.), so an operator  
doesn't have to deal with it. Just create rules using those classes.  
On the other hand, some users like to have full control over  
everything and don't need that kind of automatic discovery.



[0]  
https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/#crush-map-rules


Zitat von Andre Tann :


Hi yall,

Am 29.11.24 um 08:51 schrieb Eugen Block:


rule testrule {
    id 6
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class test
    step chooseleaf indep 0 type host
    step emit
}


Does anyone know a good and comprehensive discussion about all the  
options for a crush rule, and what they do.


Of course I know the original documentation, but I find that too  
short, and leaves me with many questions.


Thanks for any hints.

--
Andre Tann
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: internal communication network

2024-11-29 Thread Michel Niyoyita
Thank you all


On Fri, Nov 29, 2024 at 12:05 PM Frédéric Nass <
frederic.n...@univ-lorraine.fr> wrote:

> Hi Michel,
>
> This is correct. Don't see anything wrong with that.
>
> Regards,
> Frédéric.
>
> - Le 28 Nov 24, à 8:16, Michel Niyoyita mico...@gmail.com a écrit :
>
> > Hello team ,
> >
> > I am creating new cluster which will be created using CEPHADM, I will use
> > 192.168.1.0/24 as public network and 10.10.90.0/24 as internal network
> for
> > osd , mon communication . would like if this command is helpful as it is
> my
> > first time to use caphadm .  sudo cephadm bootstrap --mon-ip 192.168.1.23
> > --cluster-network 10.10.90.0/24  .
> >
> > Kindly help me if there is alternative.
> >
> > Best regards
> >
> > Michel
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Additional rgw pool

2024-11-29 Thread Rok Jaklič
Hi,

we are already running the "default" rgw pool with some users.

Data is stored in pool:
pool 9 'default.rgw.buckets.data' erasure profile ec-32-profile size 5
min_size 4 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512
autoscale_mode on last_change 309346 lfor 0/127784/214408 flags
hashpspool,ec_overwrites,backfillfull stripe_width 12288 application rgw

Is it possible to create another rgw pool with a different EC profile and
associate those pools with specific S3 users?

Kind regards,
Rok
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-29 Thread Igor Fedotov

Hi Frederic,

>My question was more about why bluefs would still fail to allocate 4k 
chunks after being allowed to do so by 
https://tracker.ceph.com/issues/53466 (John's case with v17.2.6 actually)


My hypothesis is that it's facing real "no-space" case not the one I 
explained above. But again  - we haven't performed thorough analysis of 
John's case, so we are just speculating...


>Is BlueFS aware of the remaining space and maybe using some sort of 
reserved blocks/chunks like other filesystems to handle full/near null 
situations ?


What remaining space are you talking here - if the one consisting of 
small (<64K) extents only then BlueFS hopefully uses it since v17.2.6.
Reserving more spare space at Bluefs on mkfs? Previously at Bluestore 
team we had discussions on something like that to permit easier recovery 
from "no-space" cases. No final solution has been taken yet and in fact 
this provides no complete fix for the issue anyway - that spare space 
might end at some point as well...


>If so, then it should never crash, right?
What we have here is not a real crash (although looks like that) - it's 
an expected  assertion. We just don't have good enough [automatic] 
scenario to exit from this state. Hence "crying" about that aloud.


The problem is  that by its design RocksDB has to write out some data 
(either due to the need for internal maintenance or to fulfill client 
requests) on any update access. So at some point we have no space to 
record such a transaction. How one can proceed in that case - refuse to 
execute it and return an error? OK, but what's next? Any followup data 
removal would need some transactions to be recorded in RocksDB as well. 
And DB/FS would need more space for that. Use some reserved spare space? 
But that's conceptually similar to stalling OSD writes at some free 
space threshold we already have at OSD level - as you can see this 
protection doesn't work from time to time.

So IMO it's better to polish the existing protection means then.

Not to mention - I'm pretty sure that someone would abuse additional 
spare space mechanics if any and finally face the same problem at some 
point. Thus triggering another iteration of  it.. ;)


Thanks,
Igor


On 29.11.2024 10:20, Frédéric Nass wrote:

Hi Igor,

Thank you for taking the time to explain the fragmentation issue. I 
had figured out most of it by reading the tracker and the PR, 
but it's always clearer when you explain it.


My question was more about why bluefs would still fail to allocate 4k 
chunks after being allowed to do so by 
https://tracker.ceph.com/issues/53466 (John's case with v17.2.6 actually)


Is BlueFS aware of the remaining space and maybe using some sort of 
reserved blocks/chunks like other filesystems to handle full/near null 
situations ? If so, then it should never crash, right?

Like other filesystems don't crash, drives's firmwares dont crash, etc.

Thanks,
Frédéric.

- Le 28 Nov 24, à 12:52, Igor Fedotov  a 
écrit :


Hi Frederic,

    here is an overview of the case when BlueFS is unable to allocate
    more space at the main/shared device albeit free space is available.
    Below I'm talking about stuff that existed before fixing
    https://tracker.ceph.com/issues/53466.

    First of all - BlueFS's minimal allocation unit for the shared device
    was bluefs_shared_alloc_size (=64K by default). Which means that
    it was unable to use e.g. 2x32K or 16x4K chunks when it needed an
    additional 64K bytes.

Secondly - sometimes RocksDB performs recovery - and some other
maintenance tasks that require space allocation - on startup.
Which evidently triggers allocation of N*64K chunks from shared
device.

    Thirdly - a while ago we switched to 4K chunk allocations for user
    data (please don't confuse this with BlueFS allocation). Which
    potentially could result in a specific free space fragmentation
    pattern where there is a limited (or even empty) set of long (>=64K)
    chunks free, while technically still having enough free space available.
E.g. free extent list could look like (off~len, both in hex):

0x0~1000, 0x2000~1000, 0x4000~2000, 0x1~4000, 0x2000~1000, etc...

    In that case the original BlueFS allocator implementation was unable
    to locate more free space, which in turn was effectively breaking
    both RocksDB and OSD boot up.

One should realize that the above free space fragmentation depends
on a bunch of factors, none of which is absolutely dominating:

1. how user write/remove objects

2. how allocator seeks for free space

3. how much free space is available

So we don't have full control on 1. and 3. and have limited
opportunities in tuning 2.

Small device sizes and high space utilization severely increase
the probability for the issue to happen but theoretically even a
large disk with mediocre utilization could reach "bad" state over
time if used (by both clients and allocator)
"improperly/inefficiently". H

[ceph-users] Single unfound object in cluster with no previous version - is there anyway to recover rather than deleting the object?

2024-11-29 Thread Ivan Clayson

Hello,

We have an Alma8.9 (version 4 kernel) quincy (17.2.7) CephFS cluster 
with spinners for our bulk data and SSDs for the metadata where we have 
a single unfound object in the bulk pool:


   [root@ceph-n30 ~]# ceph -s
  cluster:
    id: fa7cf62b-e261-49cd-b00e-383c36b79ef3
    health: HEALTH_ERR
    1/849660811 objects unfound (0.000%)
    Possible data damage: 1 pg recovery_unfound
    Degraded data redundancy: 9/8468903874 objects degraded
   (0.000%), 1 pg degraded

  services:
    mon: 3 daemons, quorum ceph-s2,ceph-s3,ceph-s1 (age 44h)
    mgr: ceph-s2(active, since 45h), standbys: ceph-s3, ceph-s1
    mds: 1/1 daemons up, 3 standby
    osd: 439 osds: 439 up (since 43h), 439 in (since 43h); 176
   remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   9 pools, 4321 pgs
    objects: 849.66M objects, 2.3 PiB
    usage:   3.0 PiB used, 1.7 PiB / 4.6 PiB avail
    pgs: 9/8468903874 objects degraded (0.000%)
 36630744/8468903874 objects misplaced (0.433%)
 1/849660811 objects unfound (0.000%)
 4122 active+clean
 174  active+remapped+backfill_wait
 22   active+clean+scrubbing+deep
 2    active+remapped+backfilling
 1    active+recovery_unfound+degraded

  io:
    client:   669 MiB/s rd, 87 MiB/s wr, 302 op/s rd, 77 op/s wr
    recovery: 175 MiB/s, 59 objects/s
   [root@ceph-n30 ~]# ceph health detail | grep unfound
   HEALTH_ERR 1/849661114 objects unfound (0.000%); Possible data
   damage: 1 pg recovery_unfound; Degraded data redundancy:
   9/8468906904 objects degraded (0.000%), 1 pg degraded
   [WRN] OBJECT_UNFOUND: 1/849661114 objects unfound (0.000%)
    pg 2.c90 has 1 unfound objects
   [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
    pg 2.c90 is active+recovery_unfound+degraded, acting
   [259,210,390,209,43,66,322,297,25,374], 1 unfound
    pg 2.c90 is active+recovery_unfound+degraded, acting
   [259,210,390,209,43,66,322,297,25,374], 1 unfound

We've tried deep-scrubbing and repairing the PG as well as rebooting the 
entire cluster but unfortunately this has not resolved our issue.
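
(For reference, that was done along the lines of 'ceph pg deep-scrub 2.c90' and 
'ceph pg repair 2.c90'.)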


The primary OSD (259) log reports that our 1009e1df26d.00c9 object 
is missing, and when we run rados commands on the object the command 
just hangs:


   [root@ceph-n30 ~]# grep 2.c90 /var/log/ceph/ceph-osd.259.log
   ...
   2024-11-25T11:38:33.860+ 7fd409870700  1 osd.259 pg_epoch:
   512353 pg[2.c90s0( v 512310'8145216 lc 0'0
   (511405'8142151,512310'8145216] local-lis/les=512348/512349 n=211842
   ec=1175/1168 lis/c=512348/472766 les/c/f=512349/472770/232522
   sis=512353 pruub=11.010143280s)
   [259,210,390,209,43,66,322,297,NONE,374]p259(0) r=0 lpr=512353
   pi=[472766,512353)/11 crt=512310'8145216 mlcod 0'0 unknown pruub
   205.739364624s@ m=1 mbc={}] state: transitioning to Primary
   2024-11-25T11:38:54.926+ 7fd409870700  1 osd.259 pg_epoch:
   512356 pg[2.c90s0( v 512310'8145216 lc 0'0
   (511405'8142151,512310'8145216] local-lis/les=512353/512354 n=211842
   ec=1175/1168 lis/c=512353/472766 les/c/f=512354/472770/232522
   sis=512356 pruub=11.945847511s)
   [259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512356
   pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0 active pruub
   227.741577148s@ m=1
   
mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(0+0)=1},9={(1+0)=1}}]
   start_peering_interval up
   [259,210,390,209,43,66,322,297,2147483647,374] ->
   [259,210,390,209,43,66,322,297,25,374], acting
   [259,210,390,209,43,66,322,297,2147483647,374] ->
   [259,210,390,209,43,66,322,297,25,374], acting_primary 259(0) ->
   259, up_primary 259(0) -> 259, role 0 -> 0, features acting
   4540138320759226367 upacting 4540138320759226367
   2024-11-25T11:38:54.926+ 7fd409870700  1 osd.259 pg_epoch:
   512356 pg[2.c90s0( v 512310'8145216 lc 0'0
   (511405'8142151,512310'8145216] local-lis/les=512353/512354 n=211842
   ec=1175/1168 lis/c=512353/472766 les/c/f=512354/472770/232522
   sis=512356 pruub=11.945847511s)
   [259,210,390,209,43,66,322,297,25,374]p259(0) r=0 lpr=512356
   pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0 unknown pruub
   227.741577148s@ m=1 mbc={}] state: transitioning to Primary
   2024-11-25T11:38:59.910+ 7fd409870700  0 osd.259 pg_epoch:
   512359 pg[2.c90s0( v 512310'8145216 lc 0'0
   (511405'8142151,512310'8145216] local-lis/les=512356/512357 n=211842
   ec=1175/1168 lis/c=512356/472766 les/c/f=512357/472770/232522
   sis=512356) [259,210,390,209,43,66,322,297,25,374]p259(0) r=0
   lpr=512356 pi=[472766,512356)/10 crt=512310'8145216 mlcod 0'0
   active+recovering+degraded rops=1 m=1
   
mbc={0={(0+0)=1},1={(1+0)=1},2={(1+0)=1},3={(1+0)=1},4={(1+0)=1},5={(1+0)=1},6={(1+0)=1},7={(1+0)=1},8={(1+0)=1},9={(1+0)=1}}
   trimq=[13f6e~134]] get_remai

[ceph-users] Re: Additional rgw pool

2024-11-29 Thread Janne Johansson
> we are already running the "default" rgw pool with some users.
>
> Data is stored in pool:
> pool 9 'default.rgw.buckets.data' erasure profile ec-32-profile size 5
> min_size 4 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512
> autoscale_mode on last_change 309346 lfor 0/127784/214408 flags
> hashpspool,ec_overwrites,backfillfull stripe_width 12288 application rgw
>
> Is it possible to create another rgw pool with diferent ec profile and
> associate those pools to specific s3 users?

https://docs.ceph.com/en/latest/radosgw/placement/
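
Roughly, it boils down to adding a second placement target backed by your new EC 
data pool and pointing the relevant users (or buckets) at it, something like this 
(placement id and pool names made up, see the docs above for the full procedure):

radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id ec-alt
radosgw-admin zone placement add --rgw-zone default --placement-id ec-alt \
    --data-pool default.rgw.ec-alt.data \
    --index-pool default.rgw.buckets.index \
    --data-extra-pool default.rgw.buckets.non-ec

and then set default_placement for the user via 'radosgw-admin metadata get/put 
user:<uid>'.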


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Snaptriming speed degrade with pg increase

2024-11-29 Thread Szabo, Istvan (Agoda)
The reshard topic concerns a cluster running quincy 17.2.7; I tested the reshard 
today and the objects were gone.

Istvan

From: Frédéric Nass 
Sent: Friday, November 29, 2024 5:17:27 PM
To: Szabo, Istvan (Agoda) 
Cc: Ceph Users 
Subject: Re: [ceph-users] Snaptriming speed degrade with pg increase

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !

- Le 29 Nov 24, à 11:11, Istvan Szabo, Agoda  a 
écrit :
Increased from 9 servers to 11 so let's say 20% capacity and performance added.

This is a different cluster purely rbd.
I see, so big objects. You might want to increase osd_max_trimming_pgs and 
eventually osd_pg_max_concurrent_snap_trims and see how it goes.

(For the other topic can't be resharded because in multisite it will disappear 
all the data disappear on remote site, need to create new bucket and migrate 
data first to a higher sharded bucket).
Hum... You have fallen significantly behind on Ceph versions, which must be 
hindering you in many operational tasks today. Another option would be to catch 
up and reshard into a recent version in multi-site mode.

Frédéric.

Istvan

From: Frédéric Nass 
Sent: Friday, November 29, 2024 4:58:52 PM
To: Szabo, Istvan (Agoda) 
Cc: Ceph Users 
Subject: Re: [ceph-users] Snaptriming speed degrade with pg increase

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


Hi Istvan,

Did the PG split involved using more OSDs than before? If so then increasing 
these values (apart from the sleep) should not have a negative impact on 
clients I/O compared to before the split and should accelerate the whole 
process.

Did you reshard the buckets as discussed in the other thread?

Regards,
Frédéric.

- Le 29 Nov 24, à 3:30, Istvan Szabo, Agoda istvan.sz...@agoda.com a écrit :

> Hi,
>
> When we scale the placement group on a pool located in a full nvme cluster, 
> the
> snaptriming speed degrades a lot.
> Currently we are running with these values to not degrade client op and have
> some progress on snaptrimmin, but it is terrible. (octopus 15.2.17 on ubuntu
> 20.04)
>
> -osd_max_trimming_pgs=2
> --osd_snap_trim_sleep=0.1
> --osd_pg_max_concurrent_snap_trims=2
>
> We had a big pool which we used to have 128PG and that length of the
> snaptrimming took around 45-60 minutes.
> Due to impossible to do maintenance on the cluster with 600GB pg sizes because
> it can easily max out a cluster (which we did), we increased to 1024 and the
> snaptrimming duration increased to 3.5 hours.
>
> Is there any good solution that we are missing to fix this?
>
> On the hardware level I've changed server profile to tune some numa settings 
> but
> seems like didn't help still.
>
> Thank you
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing Ceph Monitors for Openstack

2024-11-29 Thread Tyler Stachecki
On Fri, Nov 29, 2024, 5:33 AM Adrien Georget 
wrote:

> Hello,
>
> We are using Ceph as a storage backend for Openstack (Cinder, Nova,
> Glance, Manila) and we are replacing old hardware hosting Ceph monitors
> (MON,MGR,MDS) to new ones.
> I have already added the new ones in production, monitors successfully
> joined the quorum and new MGR/MDS are standby.
>
> For the monitors, I'm sure that the monmap is already up to date and
> Openstack clients are already aware of the change and it should not be a
> problem when I will next shut down the old monitors.
> The ceph.conf will be updated in all Openstack controllers to replace
> "mon host" with the new ones before shutting old mons down.
>
> But I have some doubts with the resilience of Openstack Manila service
> because the IP addresses of the monitors look hardcoded in the export
> location of the manila share :
> The manila show command returns for example :
>
> | export_locations | |
> |   | path =
> 134.158.208.140:6789,134.158.208.141:6789,134.158.208.142:6789:/volumes/EC_manila/_nogroup/7a6c05d9-2fea-43b1-a6d4-06eec1e384f2
>
> |
> |   | share_instance_id =
> 7a6c05d9-2fea-43b1-a6d4-06eec1e384f2 |
>
>
> Has anyone already done this kind of migration in the past and can
> confirm my doubts?
> Is there any process to update shares?
>
> Cheers,
> Adrien
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


I can't speak for Manila, but for Cinder/Glance/Nova this is a bit of a
headache. Unfortunately, the mon IPs get hard coded there as well, both in
the database and in the libvirt XML. Go to any nova-compute node with a
Ceph-backed Cinder volume (or Nova image, including config drives) attached
to it and run `virsh dumpxml <instance>` and you'll see it.
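
For example (instance name made up), the relevant part of the domain XML looks 
roughly like this:

  virsh dumpxml instance-00000abc
  ...
    <source protocol='rbd' name='volumes/volume-<uuid>'>
      <host name='134.158.208.140' port='6789'/>
      <host name='134.158.208.141' port='6789'/>
      ...
    </source>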

Unfortunately, changing all of the mon IPs will result in the situation
where you can neither live-migrate your VMs nor will you be able to
start/hard reboot VMs until volumes are detached and attached with the new
monitor IPs.

The only way we found around this with zero downtime was to rebuild _some_
of the ceph-mons with new IPs, and then leverage some custom patches (which
I can share) that rewrite the libvirt and database info during a
live-migration (so, in essence, we had to live-migrate each VM once in
order to pull this off) with the new set of intended mon IPs (not the ones
currently in ceph.conf).

If you don't require live-migration or don't use it, you can probably get
away with just doing some database updates (carefully!). The VMs do observe
monmap changes at runtime like any other RADOS client - it's only when you
try to perform control plane actions against them that it becomes a
problem, because it'll use the mon IPs in the database (which are old) and
not from ceph.conf in that case.

Thanks,
Tyler
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: classes crush rules new cluster

2024-11-29 Thread Andre Tann

Ahoi Eugen,

Am 29.11.24 um 11:31 schrieb Eugen Block:

step set_chooseleaf_tries 5 -> stick to defaults, usually works (number 
of max attempts to find suitable OSDs)


Why do we need more than one attempt to find an OSD? Why is the result 
different if we walk through a rule more than once?



step take default class test -> "default" is the usual default crush 
root (check 'ceph osd tree'), you can specify other roots if you have them


where are these classes defined? Or is "default class test" the name of 
a root? Most probably not.

Could I also say step take default type host?
What are the keywords that are allowed after the root's name?



step chooseleaf indep 0 type host -> within bucket "root" (from "step 
take default") choose {pool-num-replicas} hosts


What if I did exactly this, but have nested fault domains (e.g. racks > 
hosts)? Would the rule then pick {pool-num-replicas} hosts out of 
different racks, even though this rule doesn't mention racks anywhere?


But what if I have size=4, but only two racks, would the picked hosts 
spread evenly across the two racks, or randomly, like 1 host in one 
rack, 3 in the other, or all 4 in one rack?


Assume a pool with size=4, could I say

  step take default
  choose firstn 1 type row
  choose firstn 3 type racks
  chooseleaf firstn 0 type host

Meaning:
- force all chunks of a pg in one row
- force all chunks in exactly three racks inside this row
- out of these three racks, pick 4 hosts

I don't want to say that the latter makes much sense, I just wonder if 
it would work that way.
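
In actual crushmap syntax that would look something like this (rule name and id 
made up, and the bucket type would be "rack", not "racks"); whether it really 
distributes the way I describe is exactly what crushtool --test should tell me:

rule row_rack_host {
    id 7
    type replicated
    step take default
    step choose firstn 1 type row
    step choose firstn 3 type rack
    step chooseleaf firstn 0 type host
    step emit
}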


--
Andre Tann
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing Ceph Monitors for Openstack

2024-11-29 Thread Eugen Block
Confirming Tyler's description, we had to do lots of database 
manipulation in order to get the new IPs into the connection parameters. 
Since you already added the new monitors, there's not much else you can 
do. But I would have suggested reinstalling the MONs rather than 
adding new ones, as Tyler already stated.


Am 29.11.24 um 13:19 schrieb Tyler Stachecki:

On Fri, Nov 29, 2024, 5:33 AM Adrien Georget 
wrote:


Hello,

We are using Ceph as a storage backend for Openstack (Cinder, Nova,
Glance, Manila) and we are replacing old hardware hosting Ceph monitors
(MON,MGR,MDS) to new ones.
I have already added the new ones in production, monitors successfully
joined the quorum and new MGR/MDS are standby.

For the monitors, I'm sure that the monmap is already up to date and
Openstack clients are already aware of the change and it should not be a
problem when I will next shut down the old monitors.
The ceph.conf will be updated in all Openstack controllers to replace
"mon host" with the new ones before shutting old mons down.

But I have some doubts with the resilience of Openstack Manila service
because the IP addresses of the monitors look hardcoded in the export
location of the manila share :
The manila show command returns for example :

| export_locations | |
|   | path =
134.158.208.140:6789,134.158.208.141:6789,134.158.208.142:6789:/volumes/EC_manila/_nogroup/7a6c05d9-2fea-43b1-a6d4-06eec1e384f2

|
|   | share_instance_id =
7a6c05d9-2fea-43b1-a6d4-06eec1e384f2 |


Has anyone already done this kind of migration in the past and can
confirm my doubts?
Is there any process to update shares?

Cheers,
Adrien
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


I can't speak for Manila, but for Cinder/Glance/Nova this is a bit of a
headache. Unfortunately, the mon IPs get hard coded there as well, both in
the database and in the libvirt XML. Go to any nova-compute node with a
Ceph-backed Cinder volume (or Nova image, including config drives) attached
to it and run `virsh dumpxml ` and you'll see it.

Unfortunately, changing all of the mon IPs will result in the situation
where you can neither live-migrate your VMs nor will you be able to
start/hard reboot VMs until volumes are detached and attached with the new
monitor IPs.

The only way we found around this with zero downtime was to rebuild _some_
of the ceph-mons with new IPs, and then leverage some custom patches (which
I can share) that rewrite the libvirt and database info during a
live-migration (so, in essence, we had to live-migrate each VM once in
order to pull this off) with the new set of intended mon IPs (not the ones
currently in ceph.conf).

If you don't require live-migration or don't use it, you can probably get
away with just doing some database updates (carefully!). The VMs do observe
monmap changes at runtime like any other RADOS client - it's only when you
try to perform control plane actions against them that it becomes a
problem, because it'll use the mon IPs in the database (which are old) and
not from ceph.conf in that case.

Thanks,
Tyler
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: nfs-ganesha 5 changes

2024-11-29 Thread P Wagner-Beccard
Syntax errors in the config?
Try to start it manually with -x to be sure.
What does the journal log have to say?
https://github.com/nfs-ganesha/nfs-ganesha/issues/730

Release notes:
https://github.com/nfs-ganesha/nfs-ganesha/wiki/ReleaseNotes_5




On Thu, 28 Nov 2024 at 12:35, Marc  wrote:

> >
> > In my old environment I have simple nfs-ganesha export like this, which
> > is sufficent and mounts.
> >
> > EXPORT {
> > Export_Id = 200;
> > Path = /backup;
> > Pseudo = /backup;
> > FSAL { Name = CEPH; Filesystem = ""; User_Id =
> > "cephfs..bakup"; Secret_Access_Key = "x=="; }
> > Disable_ACL = FALSE;
> > CLIENT { Clients = 192.168.11.200; access_type = "RW"; }
> > CLIENT { Clients = *; Access_Type = NONE; }
> > }
> >
> > In the new ganesha 5 I am getting these errors. Don't really get why it
> > wants to create a pool
> >
> > rados_kv_connect :CLIENT ID :EVENT :Failed to create pool: -34
> > rados_ng_init :CLIENT ID :EVENT :Failed to connect to cluster: -34
> > main :NFS STARTUP :CRIT :Recovery backend initialization failed!
> >
> > cephfs kernel mount with this userid is ok. User only has access to this
> > dir.
> >
> > Anyone an idea what config I need to update?
>
> I missed this, check later what it is.
> #RecoveryBackend = rados_ng;
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: nfs-ganesha 5 changes

2024-11-29 Thread Marc


This is new, and enabled in the default config. I am currently running without it 
just fine. I guess it stores info for when you have other cluster nodes.

RecoveryBackend = rados_ng;
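
If you do want to keep rados_ng, my understanding is that it also needs a RADOS_KV 
block pointing at a pool/namespace the ganesha cephx user is actually allowed to 
use, roughly like this (pool, namespace and user names are just examples, check 
the ganesha docs):

RADOS_KV {
    userid = "nfs.ganesha";
    pool = "nfs-ganesha";
    namespace = "grace";
}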

> 
> 
> Syntax errors on the config?
> Try to start manually with -x to be sure
> what does the journal log has to say?
> https://github.com/nfs-ganesha/nfs-ganesha/issues/730
> 
> Release notes:
> https://github.com/nfs-ganesha/nfs-ganesha/wiki/ReleaseNotes_5
> 
> 
> 
> 
> 
> On Thu, 28 Nov 2024 at 12:35, Marc wrote:
> 
> 
>   >
>   > In my old environment I have simple nfs-ganesha export like this,
> which
>   > is sufficent and mounts.
>   >
>   > EXPORT {
>   > Export_Id = 200;
>   > Path = /backup;
>   > Pseudo = /backup;
>   > FSAL { Name = CEPH; Filesystem = ""; User_Id =
>   > "cephfs..bakup"; Secret_Access_Key = "x=="; }
>   > Disable_ACL = FALSE;
>   > CLIENT { Clients = 192.168.11.200; access_type = "RW"; }
>   > CLIENT { Clients = *; Access_Type = NONE; }
>   > }
>   >
>   > In the new ganesha 5 I am getting these errors. Don't really get
> why it
>   > wants to create a pool
>   >
>   > rados_kv_connect :CLIENT ID :EVENT :Failed to create pool: -34
>   > rados_ng_init :CLIENT ID :EVENT :Failed to connect to cluster: -
> 34
>   > main :NFS STARTUP :CRIT :Recovery backend initialization failed!
>   >
>   > cephfs kernel mount with this userid is ok. User only has access
> to this
>   > dir.
>   >
>   > Anyone an idea what config I need to update?
> 
>   I missed this, check later what it is.
>   #RecoveryBackend = rados_ng;
>   ___
>   ceph-users mailing list -- ceph-users@ceph.io  us...@ceph.io>
>   To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-29 Thread Frédéric Nass
- Le 29 Nov 24, à 12:03, Igor Fedotov  a écrit : 

> Hi Frederic,

>>My question was more about why bluefs would still fail to allocate 4k chunks
>>after being allowed to do so by [ https://tracker.ceph.com/issues/53466 |
> >https://tracker.ceph.com/issues/53466 ] (John's case with v17.2.6 actually)
> My hypothesis is that it's facing real "no-space" case not the one I explained
> above. But again - we haven't performed thorough analysis of John's case, so 
> we
> are just speculating...

Yep. Ok. 

>>Is BlueFS aware of the remaining space and maybe using some sort of reserved
> >blocks/chunks like other filesystems to handle full/near null situations ?

> What remaining space are you talking here - if the one consisting of small
> (<64K) extents only then BlueFS hopefully uses it since v17.2.6.

I meant every unallocated chunk, whether contiguous or not. As far as 
allocating 4k chunks (is "chunk" the right word here? I'm not sure) is concerned, 
one would expect it to be able to allocate 4k chunks multiplied by the 
number of unallocated chunks, I think. 

> Reserving more spare space at Bluefs on mkfs?

Yep. That's what I had in mind talking "reserved blocks/chunks" above. 

> Previously at Bluestore team we had discussions on something like that to 
> permit
> easier recovery from "no-space" cases. No final solution has been taken yet 
> and
> in fact this provides no complete fix for the issue anyway - that spare space
> might end at some point as well...

Yeah. 

> >If so, then it should never crash, right?
> What we have here is not a real crash (although looks like that) - it's an
> expected assertion. We just don't have good enough [automatic] scenario to 
> exit
> from this state. Hence "crying" about that aloud.

> The problem is that by its design RocksDB has to write out some data (either 
> due
> to the need for internal maintenance or to fulfill client requests) on any
> update access. So at some point we have no space to record such a transaction.
> How one can proceed in that case - refuse to execute it and return an error?
> OK, but what's next? Any followup data removal would need some transactions to
> be recorded in RocksDB as well. And DB/FS would need more space for that. Use
> some reserved spare space? But that's conceptually similar to stalling OSD
> writes at some free space threshold we already have at OSD level - as you can
> see this protection doesn't work from time to time.

Maybe the minimal spare space mentioned above (let's say 1-3%) could provide 
enough "time" and "space" for internal sanitization tasks only and let the OSD 
start and respond to tasks that would only free up some data/metadata (if 
possible) and prioritize any task that would free up some space before 
recording the transactions. Like 'I know I'm full like an egg, I receive a new 
request, will this request allow me to free up some space? Yes --> I will 
proceed with the request. No --> I'll refuse the request'. 

Maybe that would allow the admin to 1/ increase the nearfull/full ratios, 2/ 
boot up all OSDs, 3/ remove a few snapshots and get back on track. I don't 
know... I'm just speculating here with no in-depth knowledge of bluestore 
internals. 
You guys certainly thought about this multiple times. :-) 

> So IMO it's better polish existing protection means then.

For sure. 

> Not to mention - I'm pretty sure that someone would abuse additional spare 
> space
> mechanics if any and finally face the same problem at some point. Thus
> triggering another iteration of it.. ;)

I see what you mean. :-) 

That, or you make this part of the code so cryptic that no one can notice 
there's unused space here. :-)) 

Cheers, 
Frédéric. 

> Thanks,
> Igor

> On 29.11.2024 10:20, Frédéric Nass wrote:

>> Hi Igor,

>> Thank you for taking the time to explains the fragmentation issue. I had 
>> figured
>> out the most part of it by reading the tracker and the PR but it's always
>> clearer when you explain it.

>> My question was more about why bluefs would still fail to allocate 4k chunks
>> after being allowed to do so by [ https://tracker.ceph.com/issues/53466 |
>> https://tracker.ceph.com/issues/53466 ] (John's case with v17.2.6 actually)

>> Is BlueFS aware of the remaining space and maybe using some sort of reserved
>> blocks/chunks like other filesystems to handle full/near null situations ? If
>> so, then it should never crash, right?
>> Like other filesystems don't crash, drives's firmwares dont crash, etc.

>> Thanks,
>> Frédéric.

>> - Le 28 Nov 24, à 12:52, Igor Fedotov [ mailto:igor.fedo...@croit.io |
>>  ] a écrit :

>>> Hi Frederic,

>>> here is an overview of the case when BlueFS ıs unable to allocate more 
>>> space at
>>> main/shared device albeıt free space is available. Below I'm talking about
>>> stuff exısted before fıxıng [ https://tracker.ceph.com/issues/53466 |
>>> https://tracker.ceph.com/issues/53466 ] .

>>> First of al - BlueFS's minimal allocation unit for shared device 

[ceph-users] Re: classes crush rules new cluster

2024-11-29 Thread Eugen Block

Andre,

see responses inline.

Zitat von Andre Tann :


Ahoi Eugen,

Am 29.11.24 um 11:31 schrieb Eugen Block:

step set_chooseleaf_tries 5 -> stick to defaults, usually works  
(number of max attempts to find suitable OSDs)


Why do we need more than one attempt to find an OSD? Why is the  
result different if we walk through a rule more than once?


There have been cases with a large number of OSDs where crush "gave up  
too soon". Although I haven't read about that in quite a while, it may  
or may not still be an issue.


step take default class test -> "default" is the usual default  
crush root (check 'ceph osd tree'), you can specify other roots if  
you have them


where are these classes defined? Or is "default class test" the name  
of a root? Most probably not.


You define those classes. By default, Ceph creates a "default" entry  
point into the crush tree of type "root":


ceph osd tree | head -2
ID  CLASS  WEIGHT   TYPE NAMESTATUS  REWEIGHT  PRI-AFF
-1 0.14648  root default

You can create multiple roots with arbitrary names. Those roots can be  
addressed in crush rules. Before there were device classes, users  
split their trees into multiple roots, for example one for HDD, one  
for SSD devices.



Could I also say step take default type host?


I haven't tried that, I would assume that the entry point still has to  
be a bucket of type "root". I encourage you to play around in a lab  
cluster to get familiar with crushmaps and especially the crushtool,  
you'll benefit from it.



What are the keywords that are allowed after the root's name?


Fair question, I'm only aware of "class XYZ", so the device classes. I  
haven't checked in detail though.


step chooseleaf indep 0 type host -> within bucket "root" (from  
"step take default") choose {pool-num-replicas} hosts


What if I did exactly this, but have nested fault domains (e.g.  
racks > hosts)? Would the rule then pick {pool-num-replicas} hosts  
out of different racks, even though this rule doesn't mention racks  
anywhere?


Since I don't have racks in my lab cluster, I don't specify them. You  
need to modify your rule(s) according to your infrastructure, my  
example was just a simple one from one of my lab clusters.


But what if I have size=4, but only two racks, would the picked  
hosts spread evenly across the two racks, or randomly, like 1 host  
in one rack, 3 in the other, or all 4 in one rack?


You can (and most likely will) end up with the random result if you  
don't specifically tell crush what to do.



Assume a pool with size=4, could I say

  step take default
  choose firstn 1 type row
  choose firstn 3 type racks
  chooseleaf firstn 0 type host

Meaning:
- force all chunks of a pg in one row
- force all chunks in exactly three racks inside this row
- out of these three racks, pick 4 hosts

I don't want to say that the latter makes much sense, I just wonder  
if it would work that way.


I think it would, but again, give it a try. You can create "virtual"  
rows and racks, just add the respective buckets to the crushmap (of  
your test cluster):


ceph osd crush add-bucket row1 row root=default
added bucket row1 type row to location {root=default}

ceph osd crush add-bucket rack1 rack row=row1
added bucket rack1 type rack to location {row=row1}

ceph osd crush add-bucket rack2 rack row=row1
added bucket rack2 type rack to location {row=row1}

ceph osd crush add-bucket rack3 rack row=row1
added bucket rack3 type rack to location {row=row1}

Then move some of your hosts into the racks with 'ceph osd crush  
move ...' and test your crush rules.
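
Something along these lines (the host names are just examples):

ceph osd crush move host1 rack=rack1
ceph osd crush move host2 rack=rack2
ceph osd crush move host3 rack=rack3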





--
Andre Tann
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-29 Thread Martin Konold
Hi,

the traditional solution is to deny anything but deletions, and either write 
the transaction log to another device or even filesystem, or add support for 
deletions without a transaction log together with a force switch.

Regards
--martin

On 29.11.2024 12:03 Igor Fedotov wrote:

Hi Frederic,

 >My question was more about why bluefs would still fail to allocate 4k 
chunks after being allowed to do so by 
https://tracker.ceph.com/issues/53466 (John's case with v17.2.6 actually)

My hypothesis is that it's facing a real "no-space" case, not the one I 
explained above. But again - we haven't performed a thorough analysis of 
John's case, so we are just speculating...

 >Is BlueFS aware of the remaining space and maybe using some sort of 
reserved blocks/chunks like other filesystems to handle full/near-full 
situations? If so, then it should never crash, right?

Which remaining space are you talking about here? If it's the one consisting 
of small (<64K) extents only, then BlueFS hopefully uses it since v17.2.6.
Reserving more spare space for BlueFS at mkfs time? We previously had 
discussions in the BlueStore team on something like that to permit easier 
recovery from "no-space" cases. No final decision has been made yet, and in 
fact this provides no complete fix for the issue anyway - that spare space 
might run out at some point as well...
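
(As a side note, the free space state of a running OSD can be inspected via 
the admin socket, which may help judge how fragmented the main device is - 
the exact output differs per release:

ceph daemon osd.0 bluestore allocator score block
ceph daemon osd.0 bluestore allocator dump block
)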

 >If so, then it should never crash, right?
What we have here is not a real crash (although it looks like one) - it's 
an expected assertion. We just don't have a good enough [automatic] 
scenario to exit from this state. Hence "crying" about it aloud.

The problem is that by its design RocksDB has to write out some data 
(either due to the need for internal maintenance or to fulfill client 
requests) on any update access. So at some point we have no space to 
record such a transaction. How can one proceed in that case - refuse to 
execute it and return an error? OK, but what's next? Any follow-up data 
removal would need some transactions to be recorded in RocksDB as well, 
and DB/FS would need more space for that. Use some reserved spare space? 
But that's conceptually similar to stalling OSD writes at the free-space 
threshold we already have at the OSD level - and as you can see, this 
protection doesn't work from time to time.
So IMO it's better to polish the existing protection mechanisms instead.
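
(For reference, the thresholds referred to here are presumably the cluster 
full ratios; they can be checked and adjusted roughly like this - the values 
shown are the current defaults:

ceph osd dump | grep ratio
ceph osd set-nearfull-ratio 0.85
ceph osd set-backfillfull-ratio 0.90
ceph osd set-full-ratio 0.95
)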

Not to mention - I'm pretty sure that someone would abuse an additional 
spare space mechanism, if there were one, and eventually face the same 
problem at some point, thus triggering another iteration of it... ;)

Thanks,
Igor


On 29.11.2024 10:20, Frédéric Nass wrote:
> Hi Igor,
>
> Thank you for taking the time to explain the fragmentation issue. I 
> had figured out most of it by reading the tracker and the PR, 
> but it's always clearer when you explain it.
>
> My question was more about why bluefs would still fail to allocate 4k 
> chunks after being allowed to do so by 
> https://tracker.ceph.com/issues/53466 (John's case with v17.2.6 actually)
>
> Is BlueFS aware of the remaining space and maybe using some sort of 
> reserved blocks/chunks like other filesystems to handle full/near-full 
> situations? If so, then it should never crash, right?
> Like other filesystems don't crash, drive firmwares don't crash, etc.
>
> Thanks,
> Frédéric.
>
> - On 28 Nov 24, at 12:52, Igor Fedotov wrote:
>
> Hi Frederic,
>
> here is an overview of the case when BlueFS is unable to allocate
> more space at main/shared device albeit free space is available.
> Below I'm talking about stuff that existed before fixing
> https://tracker.ceph.com/issues/53466.
>
> First of all - BlueFS's minimal allocation unit for shared device
> was bluefs_shared_alloc_size (=64K by default). Which means that
> it was unable to use e.g. 2x32K or 16x4K chunks when it needed
> additional 64K bytes.
>
> Secondly - sometimes RocksDB performs recovery - and some other
> maintenance tasks that require space allocation - on startup.
> Which evidently triggers allocation of N*64K chunks from shared
> device.
>
> Thirdly - a while ago we switched to 4K chunk allocations for user
> data (please do not confuse this with BlueFS allocation). Which
> potentially could result in a specific free space fragmentation
> pattern where there is a limited (or even empty) set of long (>=64K)
> free chunks, while technically still having enough free space available.
> E.g. free extent list could look like (off~len, both in hex):
>
> 0x0~1000, 0x2000~1000, 0x4000~2000, 0x1~4000, 0x2000~1000, etc...
>
> In that case the original BlueFS allocator implementation was unable
> to locate more free space, which in turn was effectively breaking
> both RocksDB and OSD boot up.
>
> One should realize that the above free space fragmentation depends
> on a bunch of factors, none of which is absolutely dominating:
>
> 1. how users write/remove objects
>
>   

[ceph-users] Dump/Add users yaml/json

2024-11-29 Thread Albert Shih
Hi everyone,

Stupid question: after some tests I was able to dump a user's caps with 

  ceph auth get --format json

but I wasn't able to find the other way around, something like 

  ceph auth add fubar.json 

Is there any way to add a user (with or without giving a key)? 

Regards
-- 
Albert SHIH 🦫 🐸
Observatoire de Paris
France
Heure locale/Local time:
ven. 29 nov. 2024 18:12:35 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Additional rgw pool

2024-11-29 Thread Anthony D'Atri
Absolutely.

You define a placement target and storage class in the zone / zonegroup, commit 
the period, and create/modify the users.  New buckets they create will then go 
to the secondary storage class.  Clients can also specify a storage class in 
their request headers, and you can also force the issue with an ingest Lua 
script.

https://docs.ceph.com/en/latest/radosgw/placement/


https://www.youtube.com/watch?v=m0Ok5X2I5Ps
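
A rough sketch of the commands involved, assuming a single default
zone/zonegroup; the pool name, EC profile and storage class name below are
only examples:

# data pool backed by the second EC profile
ceph osd pool create default.rgw.cold.data 64 64 erasure ec-63-profile
ceph osd pool application enable default.rgw.cold.data rgw

# add a storage class to the existing placement target
radosgw-admin zonegroup placement add --rgw-zonegroup default \
    --placement-id default-placement --storage-class COLD
radosgw-admin zone placement add --rgw-zone default \
    --placement-id default-placement --storage-class COLD \
    --data-pool default.rgw.cold.data
# commit the period (needed when running with a realm / multisite)
radosgw-admin period update --commit

# make it the default storage class for a given user (affects new objects
# only; --storage-class support on 'user modify' may vary by release)
radosgw-admin user modify --uid someuser \
    --placement-id default-placement --storage-class COLD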



> On Nov 29, 2024, at 5:55 AM, Rok Jaklič  wrote:
> 
> Hi,
> 
> we are already running the "default" rgw pool with some users.
> 
> Data is stored in pool:
> pool 9 'default.rgw.buckets.data' erasure profile ec-32-profile size 5
> min_size 4 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512
> autoscale_mode on last_change 309346 lfor 0/127784/214408 flags
> hashpspool,ec_overwrites,backfillfull stripe_width 12288 application rgw
> 
> Is it possible to create another rgw pool with diferent ec profile and
> associate those pools to specific s3 users?
> 
> Kind regards,
> Rok
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: new cluser ceph osd perf = 0

2024-11-29 Thread Linas Vepstas
For me, the perf stats are non-zero only for those OSD's that are
currently writing. The others that are idle/reading show zero.  (I
have a recovery going on, lots of PG's being moved to two new disks.
The two new ones have stats, all the others show zero.)

-- linas

On Fri, Nov 29, 2024 at 3:10 AM Eugen Block  wrote:
>
> I tried to get the counters, then I was pointed to enabling the module:
>
> # ceph osd perf
> osd  commit_latency(ms)  apply_latency(ms)
>   11   0  0
>8   0  0
>6   0  0
>1   0  0
>0   0  0
>2   0  0
>3   0  0
>4   0  0
>5   0  0
>
> # ceph osd perf counters get 0
> Error ENOTSUP: Module 'osd_perf_query' is not enabled/loaded (required
> by command 'osd perf counters get'): use `ceph mgr module enable
> osd_perf_query` to enable it
>
> After enabling it, I get values:
>
> # ceph osd perf
> osd  commit_latency(ms)  apply_latency(ms)
>   11 184184
>8   8  8
>6  17 17
>1   4  4
>0   3  3
>2  72 72
>3   9  9
>4  33 33
>5 166166
>
> # ceph osd perf query add all_subkeys
>
> added query all_subkeys with id 0
>
> # ceph osd perf counters get 0
> | CLIENT_ID      | CLIENT_ADDRESS | POOL_ID | NAMESPACE | OSD_ID | PG_ID | OBJECT_NAME                  | SNAP_ID | WRITE_OPS | READ_OPS |
> | client.2624154 | IP:0/47925986  | 18      |           | 11     | 18.0  | data_loggenerations_metadata | head    | 0         | 6        |
> | client.2624154 | IP:0/47925986  | 19      |           | 2      | 19.1  | notify.0                     | head    | 0         | 6        |
> ...
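>
> (To clean up afterwards, the counterpart commands should be available once
> the module is enabled - verify on your release:
>
> # ceph osd perf query remove 0
> # ceph mgr module disable osd_perf_query
> )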
>
> Zitat von Janne Johansson :
>
> > I see the same on a newly deployed 17.2.8 cluster.
> > all empty perf values.
> >
> > Den tors 28 nov. 2024 kl 23:45 skrev Marc :
> >>
> >>
> >>
> >> My ceph osd perf are all 0, do I need to enable module for this?
> >> osd_perf_query? Where should I find this in manuals? Or do I just
> >> need to wait?
> >>
> >>
> >> [@ target]# ceph osd perf
> >> osd  commit_latency(ms)  apply_latency(ms)
> >>  25   0  0
> >>  24   0  0
> >>  23   0  0
> >>  22   0  0
> >>  21   0  0
> >>  20   0  0
> >>  19   0  0
> >>  18   0  0
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> >
> > --
> > May the most significant bit of your life be positive.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: classes crush rules new cluster

2024-11-29 Thread Anthony D'Atri
Or, just reassign one existing OSD to the new class. 
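
Roughly like this, using osd.0 and class ssd as examples (set the class back
once the real devices are in):

ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class ssd osd.0
ceph osd crush rule create-replicated replicated_ssd default host ssd
# later, revert:
ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class hdd osd.0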

>> Note that testing this rule with crushtool won't work here since the
>> fake OSD isn't assigned to a hosts.

> But what's the point in having a rule without the corresponding
> devices? You won't be able to create a pool with that rule anyway
> until the OSDs are present.

There is that.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: classes crush rules new cluster

2024-11-29 Thread Marc
> 
> Or, just reassign one existing OSD to the new class.
> 
> >> Note that testing this rule with crushtool won't work here since the
> >> fake OSD isn't assigned to a hosts.
> 
> > But what's the point in having a rule without the corresponding
> > devices? You won't be able to create a pool with that rule anyway
> > until the OSDs are present.
> 
> There is that.

Yes indeed, so you can prepare all rules and pools before you have added the 
OSDs. Especially handy if you have a few shell commands you paste.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Squid: deep scrub issues

2024-11-29 Thread Laimis Juzeliūnas
Hi Anthony,
No, we don't have any hours set - scrubbing happens at all times. The only thing 
we changed from the default and kept was increasing osd_max_scrubs to 5 to try 
and catch up. Other than that it was just expanding the window of scrubbing 
intervals as the "pgs not deep-scrubbed in time" alerts kept hitting us.
And yes, there are some PGs taking 20+ days to complete deep scrubs - that's 
visible in the pg dump with entries like "deep scrubbing for 1871733s". They do 
complete eventually though. Most PGs take 2 to 5-7 days for deep scrubbing 
to finish.

I'll try reducing osd_scrub_chunk_max from 25 to 15 as suggested by Frédéric 
and see if that helps solve this.
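
(For reference, roughly like this, with a check on a running OSD afterwards:

ceph config set osd osd_scrub_chunk_max 15
ceph config show osd.0 osd_scrub_chunk_max
)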


Thanks,
Laimis J.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Squid: deep scrub issues

2024-11-29 Thread Laimis Juzeliūnas
Hi all, sveikas,

Thanks everyone for the tips and trying to help out! 
I've eventually opened a tracker issue for the case to get more developers 
involved: https://tracker.ceph.com/issues/69078

We tried decreasing osd_scrub_chunk_max from 25 to 15 as per Frédéric's 
suggestion, but unfortunately did not observe any signs of relief. One Squid 
user in a reddit community thread confirmed the same after decreasing it - no 
results. More users in that thread tried out various cluster configuration 
tunings, including osd_mclock_profile with high_recovery_ops, but 
still no one managed to get any good results.
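
(For anyone wanting to try the same - not a confirmed fix, just one of the
settings mentioned above:

ceph config set osd osd_mclock_profile high_recovery_ops
# revert to the default profile (balanced on recent releases):
ceph config set osd osd_mclock_profile balanced
)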

Our scrub cycle runs 24/7 with no time windows/schedules, therefore there is 
no possibility of queue buildup due to time constraints.
And yes - our longest-running PG is now 23 days into its deep scrub (and still 
counting).


Laimis J.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Squid: deep scrub issues

2024-11-29 Thread Laimis Juzeliūnas
Hi Frédéric,

Thanks for pointing that out! I see we have 25 set for osd_scrub_chunk_max 
(the default). 
I will try reducing it to 15 and see if that helps this case.


Regards,
Laimis J.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Issue creating LVs within cephadm shell

2024-11-29 Thread Ed Krotee
Trying to create the block and block.db devices for BlueStore. Within the 
cephadm shell we are able to run the vgcreate, but we get the following errors 
and cannot see the VG device in /dev, so it doesn't seem to actually create the 
VG. However, vgs within the cephadm shell shows it, while vgs at the OS level 
does not. FYI - SELinux is disabled.

stdout: Physical volume "/dev/sda" successfully created.

Not creating system devices file due to existing VGs.

stdout: Volume group "ceph-b80b7206-0c2e-4770-9895-51077b1d59d4" successfully 
created

Running command: lvcreate --yes -l 5245439 -n 
osd-block-b992b707-c77a-412d-9286-3b3ec1d8b3e9 
ceph-b80b7206-0c2e-4770-9895-51077b1d59d4

stderr: 
/dev/ceph-b80b7206-0c2e-4770-9895-51077b1d59d4/osd-block-b992b707-c77a-412d-9286-3b3ec1d8b3e9:
 not found: device not cleared

Aborting. Failed to wipe start of new LV.

--> Was unable to complete a new OSD, will rollback changes

Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.0 
--yes-i-really-mean-it

stderr: purged osd.0

--> RuntimeError: Unable to find any LV for zapping OSD: 0

[ceph: root@ritcephstrdata09 /]# lvcreate --yes -l 5245439 -n 
osd-block-b992b707-c77a-412d-9286-3b3ec1d8b3e9 
ceph-b80b7206-0c2e-4770-9895-51077b1d59d4

  
/dev/ceph-b80b7206-0c2e-4770-9895-51077b1d59d4/osd-block-b992b707-c77a-412d-9286-3b3ec1d8b3e9:
 not found: device not cleared

Aborting. Failed to wipe start of new LV.



Any thoughts?


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io