[ceph-users] Re: Ceph rbd mirror journal pool

2022-04-04 Thread Eugen Block

Hi samuel,

I haven't used dedicated rbd journal pools, so I can't comment on that.
But there is an alternative to journal-based mirroring: you can also
mirror based on snapshots [1]. Would that be an alternative worth
looking into for you?
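
In case it helps, a minimal sketch of what snapshot-based mirroring looks
like on the CLI (pool/image names are placeholders, and this assumes the
peers and rbd-mirror daemons are already set up as described in [1]):

rbd mirror image enable mypool/myimage snapshot
rbd mirror snapshot schedule add --pool mypool --image myimage 1h
rbd mirror snapshot schedule ls --recursive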


Regards,
Eugen

[1] https://docs.ceph.com/en/latest/rbd/rbd-mirroring/


Zitat von huxia...@horebdata.cn:


Dear Cephers,

Enabling Ceph mirroring means double writes on the same data pool,
thus possibly degrading write performance dramatically. By searching
Google, I found the following statement (which apparently appeared
several years ago):


"The rbd CLI allows you to use the "--journal-pool" argument when  
creating, copying, cloning, or importing and image with journaling  
enabled. You can also specify the journal data pool when dynamically  
enabling the journaling feature using the same argument. Finally,  
there is a Ceph config setting of "rbd journal pool = XYZ" that  
allows you to default new journals to a specific pool."
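
For illustration, a sketch of those three variants (pool and image names
are placeholders; it assumes a fast replicated pool called rbd-journal
already exists, and journaling requires the exclusive-lock feature):

rbd create mypool/myimage --size 100G \
    --image-feature exclusive-lock,journaling --journal-pool rbd-journal
rbd feature enable mypool/otherimage journaling --journal-pool rbd-journal
ceph config set client rbd_journal_pool rbd-journal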


Does anyone have real-world experience using a dedicated SSD/NVMe pool
to offload the journaling workload from the data pool for RBD mirroring
disaster recovery? What would be the best practice for improving the
performance of rbd mirroring (with large amounts of data)?


best regards,

samuel




huxia...@horebdata.cn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io






[ceph-users] Re: can't deploy osd/db on nvme with other db logical volume

2022-04-04 Thread Eugen Block

Hi,

this is handled by ceph-volume, do you find anything helpful in
/var/log/ceph/<fsid>/ceph-volume.log? Also check the cephadm.log
for any hints.
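
A quick way to look, assuming a cephadm deployment (<fsid> is your
cluster's fsid and the grep patterns are just a suggestion):

grep -iE 'error|traceback' /var/log/ceph/<fsid>/ceph-volume.log
tail -n 200 /var/log/ceph/cephadm.log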



Zitat von 彭勇 :


We have a running Ceph cluster, 16.2.7, with SATA OSDs and DBs on NVMe.
We inserted some additional SATA disks into a host, and the new disks show as AVAILABLE.
Then we applied the osd-spec.yml, but it doesn't create the OSDs automatically.

# ceph orch device ls
HOST             PATH          TYPE  DEVICE ID                               SIZE   AVAILABLE  REJECT REASONS
h172-18-100-100  /dev/nvme0n1  ssd   INTEL SSDPF2KX038TZ_PHAC1036009Z3P8AGN  3840G             LVM detected, locked
h172-18-100-100  /dev/sdb      hdd   ST16000NM000G-2K_ZL2CB8ZR               16.0T             Insufficient space (<10 extents) on vgs, LVM detected, locked
h172-18-100-100  /dev/sdc      hdd   ST16000NM000G-2K_ZL2CB0J2               16.0T             Insufficient space (<10 extents) on vgs, LVM detected, locked
h172-18-100-100  /dev/sdd      hdd   ST16000NM000G-2K_ZL2CBFSF               16.0T             Insufficient space (<10 extents) on vgs, LVM detected, locked
h172-18-100-100  /dev/sde      hdd   ST16000NM000G-2K_ZL2CAYQB               16.0T             Insufficient space (<10 extents) on vgs, LVM detected, locked
h172-18-100-100  /dev/sdf      hdd   ST16000NM000G-2K_ZL2CBEMC               16.0T  Yes
h172-18-100-100  /dev/sdg      hdd   ST16000NM000G-2K_ZL2C427J               16.0T  Yes
h172-18-100-100  /dev/sdh      hdd   ST16000NM000G-2K_ZL2CAZCZ               16.0T  Yes
h172-18-100-100  /dev/sdi      hdd   ST16000NM000G-2K_ZL2CBM7M               16.0T  Yes



osd-spec.yml:

service_type: osd
service_id: osd-spec
placement:
  host_pattern: '*'
spec:
  objectstore: bluestore
  block_db_size: 7301032
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

ceph orch apply osd -i osd-spec.yml --dry-run




--
Peng Yong
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io






[ceph-users] Re: PG down, due to 3 OSD failing

2022-04-04 Thread Fulvio Galeazzi

Hi again Dan!
Things are improving, all OSDs are up, but still that one PG is down. 
More info below.


On 4/1/22 19:26, Dan van der Ster wrote:

Here is the output of "pg 85.12 query":
  https://pastebin.ubuntu.com/p/ww3JdwDXVd/
and its status (also showing the other 85.XX, for reference):


This is very weird:

  "up": [
  2147483647,
  2147483647,
  2147483647,
  2147483647,
  2147483647
  ],
  "acting": [
  67,
  91,
  82,
  2147483647,
  112
  ],


Meanwhile, since a random PG still shows an output like the above one, I 
think I found the problem with the crush rule: it says "choose" rather 
than "chooseleaf"!


rule csd-data-pool {
id 5
type erasure
min_size 3
max_size 5
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class big
step choose indep 0 type host<--- HERE!
step emit
}

...relic of a more complicated, two-step rule... sigh!


PGs are active if at least 3 shards are up.
Our immediate goal remains to get 3 shards up for PG 85.25 (I'm
assuming 85.25 remains the one and only PG which is down?)


Yes, 85.25 is still the single 'down' PG.


pool 85 'csd-dataonly-ec-pool' erasure size 5 min_size 3 crush_rule 5
object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
last_change 616460 flags
hashpspool,ec_overwrites,nodelete,selfmanaged_snaps stripe_width 12288
application rbd


Yup okay, we need to fix that later to make this cluster correctly
configured. To be followed up.


At some point, need to update min_size to 4.


If I understand correctly, it should now be safe (but I will wait for
your green light) to repeat the same for:
osd.121 chunk 85.11s0
osd.145 chunk 85.33s0
   so they can also start.


Yes, please go ahead and do the same.
I expect that your PG 85.25 will go active as soon as both those OSDs
start correctly.


Hmmm, unfortunately not. All OSDs are up, but 85.25 is still down.
Its chunks are in:

85.25s0: osd.64
85.25s1: osd.140 osd.159
85.25s2: osd.96
85.25s3: osd.121 osd.176
85.25s4: osd.159 osd.56


BTW, I also noticed in your crush map below that the down osds have
crush weight zero!
So -- this means they are the only active OSDs for a PG, and they are
all set to be drained.
How did this happen? It is also surely part of the root cause here!

I suggest to reset the crush weight of those back to what it was
before, probably 1 ?


At some point I changed those weights to 0, but this was well after the 
beginning of the problem: it helped, at least, to heal a lot of the 
degraded/undersized PGs.



After you have all the PGs active, we need to find out why their "up"
set is completely bogus.
This is evidence that your crush rule is broken.
If a PG doesn't have a complete "up" set, then it can never become
non-degraded -- the PGs don't know where to go.


Do you think the choose-vs-chooseleaf issue mentioned above could be the 
culprit?



I'm curious about that "storage" type you guys invented.


Oh, nothing too fancy... As a foreword, we happen to be using (and are 
currently, finally, replacing) hardware (FibreChannel-SAN based) which 
is not the first choice in the Ceph world: the purchase happened before 
we turned to Ceph as our storage solution. Each OSD server has access to 
2 such distinct storage systems, hence the idea to describe these 
failure domains in the crush rule.



Could you please copy to pastebin and share the crush.txt from

ceph osd getcrushmap -o crush.map
crushtool -d crush.map -o crush.txt


Here it is:
https://pastebin.ubuntu.com/p/THkcT6xNgC/


Sure! Here it is. For historical reasons there are buckets of type
"storage" which however you can safely ignore as they are no longer
present in any crush_rule.


I think they may be relevant, as mentioned earlier.


Please also don't worry about the funny weights, as I am preparing for a
hardware replacement and am freeing up space.


As a general rule, never drain osds (never decrease their crush
weight) when any PG is degraded.
You risk deleting the last copy of a PG!


--
Fulvio Galeazzi
GARR-CSD Department
tel.: +39-334-6533-250
skype: fgaleazzi70
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG down, due to 3 OSD failing

2022-04-04 Thread Dan van der Ster
Hi Fulvio,

Yes -- that choose/chooseleaf thing is definitely a problem.. Good catch!
I suggest to fix it and inject the new crush map and see how it goes.
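
For reference, a sketch of how that fix could be applied with the standard
tooling (the only functional change is choose -> chooseleaf in the rule
quoted above; double-check the decompiled map and the test mappings before
injecting):

ceph osd getcrushmap -o crush.map
crushtool -d crush.map -o crush.txt
# edit crush.txt: in rule csd-data-pool change
#   step choose indep 0 type host
# into
#   step chooseleaf indep 0 type host
crushtool -c crush.txt -o crush.new
crushtool -i crush.new --test --rule 5 --num-rep 5 --show-mappings | head
ceph osd setcrushmap -i crush.new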


Next, in your crush map for the storage type, you have an error:

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
type 11 storage

The *order* of types is very important in crush -- they must be nested
in the order they appear in the tree. "storage" should therefore be
something between host and osd.
If not, and if you use that type, it can break things.
But since you're not actually using "storage" at the moment, it
probably isn't causing any issue.
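
For illustration only, a type list consistent with that ordering would
declare "storage" between osd and host, e.g. (a sketch -- renumbering types
touches every bucket declaration in the map, so this is not something to
apply blindly to a live cluster):

# types
type 0 osd
type 1 storage
type 2 host
type 3 chassis
type 4 rack
type 5 row
type 6 pdu
type 7 pod
type 8 room
type 9 datacenter
type 10 region
type 11 root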

So -- could you go ahead with that chooseleaf fix then let us know how it goes?

Cheers, Dan





On Mon, Apr 4, 2022 at 10:01 AM Fulvio Galeazzi  wrote:
>
> Hi again Dan!
> Things are improving, all OSDs are up, but still that one PG is down.
> More info below.
>
> On 4/1/22 19:26, Dan van der Ster wrote:
>  Here is the output of "pg 85.12 query":
>    https://pastebin.ubuntu.com/p/ww3JdwDXVd/
>  and its status (also showing the other 85.XX, for reference):
> >>>
> >>> This is very weird:
> >>>
> >>>   "up": [
> >>>   2147483647,
> >>>   2147483647,
> >>>   2147483647,
> >>>   2147483647,
> >>>   2147483647
> >>>   ],
> >>>   "acting": [
> >>>   67,
> >>>   91,
> >>>   82,
> >>>   2147483647,
> >>>   112
> >>>   ],
>
> Meanwhile, since a random PG still shows an output like the above one, I
> think I found the problem with the crush rule: it says "choose" rather
> than "chooseleaf"!
>
> rule csd-data-pool {
>  id 5
>  type erasure
>  min_size 3
>  max_size 5
>  step set_chooseleaf_tries 5
>  step set_choose_tries 100
>  step take default class big
>  step choose indep 0 type host<--- HERE!
>  step emit
> }
>
> ...relic of a more complicated, two-step rule... sigh!
>
> > PGs are active if at least 3 shards are up.
> > Our immediate goal remains to get 3 shards up for PG 85.25 (I'm
> > assuming 85.25 remains the one and only PG which is down?)
>
> Yes, 85.25 is still the single 'down' PG.
>
> >> pool 85 'csd-dataonly-ec-pool' erasure size 5 min_size 3 crush_rule 5
> >> object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
> >> last_change 616460 flags
> >> hashpspool,ec_overwrites,nodelete,selfmanaged_snaps stripe_width 12288
> >> application rbd
> >
> > Yup okay, we need to fix that later to make this cluster correctly
> > configured. To be followed up.
>
> At some point, need to update min_size to 4.
>
> >> If I understand correctly, it should now be safe (but I will wait for
> >> your green light) to repeat the same for:
> >> osd.121 chunk 85.11s0
> >> osd.145 chunk 85.33s0
> >>so they can also start.
> >
> > Yes, please go ahead and do the same.
> > I expect that your PG 85.25 will go active as soon as both those OSDs
> > start correctly.
>
> Hmmm, unfortunately not. All OSDs are up, but 85.25 is still down.
> Its chunks are in:
>
> 85.25s0: osd.64
> 85.25s1: osd.140 osd.159
> 85.25s2: osd.96
> 85.25s3: osd.121 osd.176
> 85.25s4: osd.159 osd.56
>
> > BTW, I also noticed in your crush map below that the down osds have
> > crush weight zero!
> > So -- this means they are the only active OSDs for a PG, and they are
> > all set to be drained.
> > How did this happen? It is also surely part of the root cause here!
> >
> > I suggest to reset the crush weight of those back to what it was
> > before, probably 1 ?
>
> At some point I changed those weight to 0., but this was well after the
> beginning of the problem: this helped, at least, healing a lot of
> degraded/undersized.
>
> > After you have all the PGs active, we need to find out why their "up"
> > set is completely bogus.
> > This is evidence that your crush rule is broken.
> > If a PG doesn't have a complete "up" set, then it can never not be
> > degraded -- the PGs don't know where to go.
>
> Do you think the choose-chooseleaf issue mentioned above, could be the
> culprit?
>
> > I'm curious about that "storage" type you guys invented.
>
> Oh, nothing too fancy... foreword, we happen to be using (and are
> currently finally replacing) hardware (based on FiberChannel-SAN) which
> is not the first choice in the Ceph world: but purchase happened before
> we turned to Ceph as our storage solution. Each OSD server has access to
> 2 such distinct storage systems, hence the idea to describe these
> failure domains in the crush rule.
>
> > Could you please copy to pastebin and share the crush.txt from
> >
> > ceph osd getcrushmap -o crush.map
> > crushtool -d crush.map -o crush.txt
>
> Here it is:
> https://pastebin.ubuntu.com/p/THkcT6xNgC/
>
> >> Sure! Here it is. For historical reasons there are buckets of type
> >> "storage" which however you c

[ceph-users] Re: PG down, due to 3 OSD failing

2022-04-04 Thread Dan van der Ster
Could you share the output of `ceph pg 85.25 query`?

Then increase the crush weights of those three osds to 0.1, then check
if the PG goes active.
(It is possible that the OSDs are not registering as active while they
have weight zero).
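
That would be something like the following (the osd ids are placeholders --
substitute the ones that currently have crush weight 0):

ceph osd crush reweight osd.64 0.1
ceph osd crush reweight osd.121 0.1
ceph osd crush reweight osd.145 0.1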

-- dan


On Mon, Apr 4, 2022 at 10:01 AM Fulvio Galeazzi  wrote:
>
> Hi again Dan!
> Things are improving, all OSDs are up, but still that one PG is down.
> More info below.
>
> On 4/1/22 19:26, Dan van der Ster wrote:
>  Here is the output of "pg 85.12 query":
>    https://pastebin.ubuntu.com/p/ww3JdwDXVd/
>  and its status (also showing the other 85.XX, for reference):
> >>>
> >>> This is very weird:
> >>>
> >>>   "up": [
> >>>   2147483647,
> >>>   2147483647,
> >>>   2147483647,
> >>>   2147483647,
> >>>   2147483647
> >>>   ],
> >>>   "acting": [
> >>>   67,
> >>>   91,
> >>>   82,
> >>>   2147483647,
> >>>   112
> >>>   ],
>
> Meanwhile, since a random PG still shows an output like the above one, I
> think I found the problem with the crush rule: it says "choose" rather
> than "chooseleaf"!
>
> rule csd-data-pool {
>  id 5
>  type erasure
>  min_size 3
>  max_size 5
>  step set_chooseleaf_tries 5
>  step set_choose_tries 100
>  step take default class big
>  step choose indep 0 type host<--- HERE!
>  step emit
> }
>
> ...relic of a more complicated, two-step rule... sigh!
>
> > PGs are active if at least 3 shards are up.
> > Our immediate goal remains to get 3 shards up for PG 85.25 (I'm
> > assuming 85.25 remains the one and only PG which is down?)
>
> Yes, 85.25 is still the single 'down' PG.
>
> >> pool 85 'csd-dataonly-ec-pool' erasure size 5 min_size 3 crush_rule 5
> >> object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
> >> last_change 616460 flags
> >> hashpspool,ec_overwrites,nodelete,selfmanaged_snaps stripe_width 12288
> >> application rbd
> >
> > Yup okay, we need to fix that later to make this cluster correctly
> > configured. To be followed up.
>
> At some point, need to update min_size to 4.
>
> >> If I understand correctly, it should now be safe (but I will wait for
> >> your green light) to repeat the same for:
> >> osd.121 chunk 85.11s0
> >> osd.145 chunk 85.33s0
> >>so they can also start.
> >
> > Yes, please go ahead and do the same.
> > I expect that your PG 85.25 will go active as soon as both those OSDs
> > start correctly.
>
> Hmmm, unfortunately not. All OSDs are up, but 85.25 is still down.
> Its chunks are in:
>
> 85.25s0: osd.64
> 85.25s1: osd.140 osd.159
> 85.25s2: osd.96
> 85.25s3: osd.121 osd.176
> 85.25s4: osd.159 osd.56
>
> > BTW, I also noticed in your crush map below that the down osds have
> > crush weight zero!
> > So -- this means they are the only active OSDs for a PG, and they are
> > all set to be drained.
> > How did this happen? It is also surely part of the root cause here!
> >
> > I suggest to reset the crush weight of those back to what it was
> > before, probably 1 ?
>
> At some point I changed those weight to 0., but this was well after the
> beginning of the problem: this helped, at least, healing a lot of
> degraded/undersized.
>
> > After you have all the PGs active, we need to find out why their "up"
> > set is completely bogus.
> > This is evidence that your crush rule is broken.
> > If a PG doesn't have a complete "up" set, then it can never not be
> > degraded -- the PGs don't know where to go.
>
> Do you think the choose-chooseleaf issue mentioned above, could be the
> culprit?
>
> > I'm curious about that "storage" type you guys invented.
>
> Oh, nothing too fancy... foreword, we happen to be using (and are
> currently finally replacing) hardware (based on FiberChannel-SAN) which
> is not the first choice in the Ceph world: but purchase happened before
> we turned to Ceph as our storage solution. Each OSD server has access to
> 2 such distinct storage systems, hence the idea to describe these
> failure domains in the crush rule.
>
> > Could you please copy to pastebin and share the crush.txt from
> >
> > ceph osd getcrushmap -o crush.map
> > crushtool -d crush.map -o crush.txt
>
> Here it is:
> https://pastebin.ubuntu.com/p/THkcT6xNgC/
>
> >> Sure! Here it is. For historical reasons there are buckets of type
> >> "storage" which however you can safely ignore as they are no longer
> >> present in any crush_rule.
> >
> > I think they may be relevant, as mentioned earlier.
> >
> >> Please also don't worry about the funny weights, as I am preparing for
> >> hardware replacement and am freeing up space.
> >
> > As a general rule, never drain osds (never decrease their crush
> > weight) when any PG is degraded.
> > You risk deleting the last copy of a PG!
>
> --
> Fulvio Galeazzi
> GARR-CSD Department
> tel.: +39-334-6533-250
> skype: fgaleazzi70
___
ceph-u

[ceph-users] Re: PG down, due to 3 OSD failing

2022-04-04 Thread Fulvio Galeazzi

Yesss! Fixing the choose/chooseleaf thing did the trick.  :-)

  Thanks a lot for your support Dan. Lots of lessons learned from my 
side, I'm really grateful.


  All PGs are now active, will let Ceph rebalance.

  Ciao ciao

Fulvio

On 4/4/22 10:50, Dan van der Ster wrote:

Hi Fulvio,

Yes -- that choose/chooseleaf thing is definitely a problem.. Good catch!
I suggest to fix it and inject the new crush map and see how it goes.


Next, in your crush map for the storage type, you have an error:

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
type 11 storage

The *order* of types is very important in crush -- they must be nested
in the order they appear in the tree. "storage" should therefore be
something between host and osd.
If not, and if you use that type, it can break things.
But since you're not actually using "storage" at the moment, it
probably isn't causing any issue.

So -- could you go ahead with that chooseleaf fix then let us know how it goes?

Cheers, Dan





On Mon, Apr 4, 2022 at 10:01 AM Fulvio Galeazzi  wrote:


Hi again Dan!
Things are improving, all OSDs are up, but still that one PG is down.
More info below.

On 4/1/22 19:26, Dan van der Ster wrote:

Here is the output of "pg 85.12 query":
   https://pastebin.ubuntu.com/p/ww3JdwDXVd/
 and its status (also showing the other 85.XX, for reference):


This is very weird:

   "up": [
   2147483647,
   2147483647,
   2147483647,
   2147483647,
   2147483647
   ],
   "acting": [
   67,
   91,
   82,
   2147483647,
   112
   ],


Meanwhile, since a random PG still shows an output like the above one, I
think I found the problem with the crush rule: it says "choose" rather
than "chooseleaf"!

rule csd-data-pool {
  id 5
  type erasure
  min_size 3
  max_size 5
  step set_chooseleaf_tries 5
  step set_choose_tries 100
  step take default class big
  step choose indep 0 type host<--- HERE!
  step emit
}

...relic of a more complicated, two-step rule... sigh!


PGs are active if at least 3 shards are up.
Our immediate goal remains to get 3 shards up for PG 85.25 (I'm
assuming 85.25 remains the one and only PG which is down?)


Yes, 85.25 is still the single 'down' PG.


pool 85 'csd-dataonly-ec-pool' erasure size 5 min_size 3 crush_rule 5
object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
last_change 616460 flags
hashpspool,ec_overwrites,nodelete,selfmanaged_snaps stripe_width 12288
application rbd


Yup okay, we need to fix that later to make this cluster correctly
configured. To be followed up.


At some point, need to update min_size to 4.


If I understand correctly, it should now be safe (but I will wait for
your green light) to repeat the same for:
osd.121 chunk 85.11s0
osd.145 chunk 85.33s0
so they can also start.


Yes, please go ahead and do the same.
I expect that your PG 85.25 will go active as soon as both those OSDs
start correctly.


Hmmm, unfortunately not. All OSDs are up, but 85.25 is still down.
Its chunks are in:

85.25s0: osd.64
85.25s1: osd.140 osd.159
85.25s2: osd.96
85.25s3: osd.121 osd.176
85.25s4: osd.159 osd.56


BTW, I also noticed in your crush map below that the down osds have
crush weight zero!
So -- this means they are the only active OSDs for a PG, and they are
all set to be drained.
How did this happen? It is also surely part of the root cause here!

I suggest to reset the crush weight of those back to what it was
before, probably 1 ?


At some point I changed those weight to 0., but this was well after the
beginning of the problem: this helped, at least, healing a lot of
degraded/undersized.


After you have all the PGs active, we need to find out why their "up"
set is completely bogus.
This is evidence that your crush rule is broken.
If a PG doesn't have a complete "up" set, then it can never not be
degraded -- the PGs don't know where to go.


Do you think the choose-chooseleaf issue mentioned above, could be the
culprit?


I'm curious about that "storage" type you guys invented.


Oh, nothing too fancy... foreword, we happen to be using (and are
currently finally replacing) hardware (based on FiberChannel-SAN) which
is not the first choice in the Ceph world: but purchase happened before
we turned to Ceph as our storage solution. Each OSD server has access to
2 such distinct storage systems, hence the idea to describe these
failure domains in the crush rule.


Could you please copy to pastebin and share the crush.txt from

ceph osd getcrushmap -o crush.map
crushtool -d crush.map -o crush.txt


Here it is:
 https://pastebin.ubuntu.com/p/THkcT6xNgC/


Sure! Here it is. For historical reasons there are buckets of type
"storage" which however you can safely ignore as they are no longe

[ceph-users] Re: PG down, due to 3 OSD failing

2022-04-04 Thread Dan van der Ster
Excellent news!
After everything is back to active+clean, don't forget to set min_size to 4 :)
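
For the record, with the pool name from the osd dump earlier in the thread
that would be:

ceph osd pool set csd-dataonly-ec-pool min_size 4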

have a nice day

On Mon, Apr 4, 2022 at 10:59 AM Fulvio Galeazzi  wrote:
>
> Yesss! Fixing the choose/chooseleaf thing did make the magic.  :-)
>
>Thanks a lot for your support Dan. Lots of lessons learned from my
> side, I'm really grateful.
>
>All PGs are now active, will let Ceph rebalance.
>
>Ciao ciao
>
> Fulvio
>
> On 4/4/22 10:50, Dan van der Ster wrote:
> > Hi Fulvio,
> >
> > Yes -- that choose/chooseleaf thing is definitely a problem.. Good catch!
> > I suggest to fix it and inject the new crush map and see how it goes.
> >
> >
> > Next, in your crush map for the storage type, you have an error:
> >
> > # types
> > type 0 osd
> > type 1 host
> > type 2 chassis
> > type 3 rack
> > type 4 row
> > type 5 pdu
> > type 6 pod
> > type 7 room
> > type 8 datacenter
> > type 9 region
> > type 10 root
> > type 11 storage
> >
> > The *order* of types is very important in crush -- they must be nested
> > in the order they appear in the tree. "storage" should therefore be
> > something between host and osd.
> > If not, and if you use that type, it can break things.
> > But since you're not actually using "storage" at the moment, it
> > probably isn't causing any issue.
> >
> > So -- could you go ahead with that chooseleaf fix then let us know how it 
> > goes?
> >
> > Cheers, Dan
> >
> >
> >
> >
> >
> > On Mon, Apr 4, 2022 at 10:01 AM Fulvio Galeazzi  
> > wrote:
> >>
> >> Hi again Dan!
> >> Things are improving, all OSDs are up, but still that one PG is down.
> >> More info below.
> >>
> >> On 4/1/22 19:26, Dan van der Ster wrote:
> >> Here is the output of "pg 85.12 query":
> >>https://pastebin.ubuntu.com/p/ww3JdwDXVd/
> >>  and its status (also showing the other 85.XX, for reference):
> >
> > This is very weird:
> >
> >"up": [
> >2147483647,
> >2147483647,
> >2147483647,
> >2147483647,
> >2147483647
> >],
> >"acting": [
> >67,
> >91,
> >82,
> >2147483647,
> >112
> >],
> >>
> >> Meanwhile, since a random PG still shows an output like the above one, I
> >> think I found the problem with the crush rule: it says "choose" rather
> >> than "chooseleaf"!
> >>
> >> rule csd-data-pool {
> >>   id 5
> >>   type erasure
> >>   min_size 3
> >>   max_size 5
> >>   step set_chooseleaf_tries 5
> >>   step set_choose_tries 100
> >>   step take default class big
> >>   step choose indep 0 type host<--- HERE!
> >>   step emit
> >> }
> >>
> >> ...relic of a more complicated, two-step rule... sigh!
> >>
> >>> PGs are active if at least 3 shards are up.
> >>> Our immediate goal remains to get 3 shards up for PG 85.25 (I'm
> >>> assuming 85.25 remains the one and only PG which is down?)
> >>
> >> Yes, 85.25 is still the single 'down' PG.
> >>
>  pool 85 'csd-dataonly-ec-pool' erasure size 5 min_size 3 crush_rule 5
>  object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
>  last_change 616460 flags
>  hashpspool,ec_overwrites,nodelete,selfmanaged_snaps stripe_width 12288
>  application rbd
> >>>
> >>> Yup okay, we need to fix that later to make this cluster correctly
> >>> configured. To be followed up.
> >>
> >> At some point, need to update min_size to 4.
> >>
>  If I understand correctly, it should now be safe (but I will wait for
>  your green light) to repeat the same for:
>  osd.121 chunk 85.11s0
>  osd.145 chunk 85.33s0
>  so they can also start.
> >>>
> >>> Yes, please go ahead and do the same.
> >>> I expect that your PG 85.25 will go active as soon as both those OSDs
> >>> start correctly.
> >>
> >> Hmmm, unfortunately not. All OSDs are up, but 85.25 is still down.
> >> Its chunks are in:
> >>
> >> 85.25s0: osd.64
> >> 85.25s1: osd.140 osd.159
> >> 85.25s2: osd.96
> >> 85.25s3: osd.121 osd.176
> >> 85.25s4: osd.159 osd.56
> >>
> >>> BTW, I also noticed in your crush map below that the down osds have
> >>> crush weight zero!
> >>> So -- this means they are the only active OSDs for a PG, and they are
> >>> all set to be drained.
> >>> How did this happen? It is also surely part of the root cause here!
> >>>
> >>> I suggest to reset the crush weight of those back to what it was
> >>> before, probably 1 ?
> >>
> >> At some point I changed those weight to 0., but this was well after the
> >> beginning of the problem: this helped, at least, healing a lot of
> >> degraded/undersized.
> >>
> >>> After you have all the PGs active, we need to find out why their "up"
> >>> set is completely bogus.
> >>> This is evidence that your crush rule is broken.
> > >>> If a PG doesn't have a complete "up" set, then it can never not be
> >>> de

[ceph-users] Re: PG down, due to 3 OSD failing

2022-04-04 Thread Dan van der Ster
BTW -- i've created https://tracker.ceph.com/issues/55169 to ask that
we add some input validation. Injecting such a crush map would ideally
not be possible.

-- dan

On Mon, Apr 4, 2022 at 11:02 AM Dan van der Ster  wrote:
>
> Excellent news!
> After everything is back to active+clean, don't forget to set min_size to 4 :)
>
> have a nice day
>
> On Mon, Apr 4, 2022 at 10:59 AM Fulvio Galeazzi  
> wrote:
> >
> > Yesss! Fixing the choose/chooseleaf thing did make the magic.  :-)
> >
> >Thanks a lot for your support Dan. Lots of lessons learned from my
> > side, I'm really grateful.
> >
> >All PGs are now active, will let Ceph rebalance.
> >
> >Ciao ciao
> >
> > Fulvio
> >
> > On 4/4/22 10:50, Dan van der Ster wrote:
> > > Hi Fulvio,
> > >
> > > Yes -- that choose/chooseleaf thing is definitely a problem.. Good catch!
> > > I suggest to fix it and inject the new crush map and see how it goes.
> > >
> > >
> > > Next, in your crush map for the storage type, you have an error:
> > >
> > > # types
> > > type 0 osd
> > > type 1 host
> > > type 2 chassis
> > > type 3 rack
> > > type 4 row
> > > type 5 pdu
> > > type 6 pod
> > > type 7 room
> > > type 8 datacenter
> > > type 9 region
> > > type 10 root
> > > type 11 storage
> > >
> > > The *order* of types is very important in crush -- they must be nested
> > > in the order they appear in the tree. "storage" should therefore be
> > > something between host and osd.
> > > If not, and if you use that type, it can break things.
> > > But since you're not actually using "storage" at the moment, it
> > > probably isn't causing any issue.
> > >
> > > So -- could you go ahead with that chooseleaf fix then let us know how it 
> > > goes?
> > >
> > > Cheers, Dan
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Apr 4, 2022 at 10:01 AM Fulvio Galeazzi  
> > > wrote:
> > >>
> > >> Hi again Dan!
> > >> Things are improving, all OSDs are up, but still that one PG is down.
> > >> More info below.
> > >>
> > >> On 4/1/22 19:26, Dan van der Ster wrote:
> > >> Here is the output of "pg 85.12 query":
> > >>https://pastebin.ubuntu.com/p/ww3JdwDXVd/
> > >>  and its status (also showing the other 85.XX, for reference):
> > >
> > > This is very weird:
> > >
> > >"up": [
> > >2147483647,
> > >2147483647,
> > >2147483647,
> > >2147483647,
> > >2147483647
> > >],
> > >"acting": [
> > >67,
> > >91,
> > >82,
> > >2147483647,
> > >112
> > >],
> > >>
> > >> Meanwhile, since a random PG still shows an output like the above one, I
> > >> think I found the problem with the crush rule: it says "choose" rather
> > >> than "chooseleaf"!
> > >>
> > >> rule csd-data-pool {
> > >>   id 5
> > >>   type erasure
> > >>   min_size 3
> > >>   max_size 5
> > >>   step set_chooseleaf_tries 5
> > >>   step set_choose_tries 100
> > >>   step take default class big
> > >>   step choose indep 0 type host<--- HERE!
> > >>   step emit
> > >> }
> > >>
> > >> ...relic of a more complicated, two-step rule... sigh!
> > >>
> > >>> PGs are active if at least 3 shards are up.
> > >>> Our immediate goal remains to get 3 shards up for PG 85.25 (I'm
> > >>> assuming 85.25 remains the one and only PG which is down?)
> > >>
> > >> Yes, 85.25 is still the single 'down' PG.
> > >>
> >  pool 85 'csd-dataonly-ec-pool' erasure size 5 min_size 3 crush_rule 5
> >  object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn
> >  last_change 616460 flags
> >  hashpspool,ec_overwrites,nodelete,selfmanaged_snaps stripe_width 12288
> >  application rbd
> > >>>
> > >>> Yup okay, we need to fix that later to make this cluster correctly
> > >>> configured. To be followed up.
> > >>
> > >> At some point, need to update min_size to 4.
> > >>
> >  If I understand correctly, it should now be safe (but I will wait for
> >  your green light) to repeat the same for:
> >  osd.121 chunk 85.11s0
> >  osd.145 chunk 85.33s0
> >  so they can also start.
> > >>>
> > >>> Yes, please go ahead and do the same.
> > >>> I expect that your PG 85.25 will go active as soon as both those OSDs
> > >>> start correctly.
> > >>
> > >> Hmmm, unfortunately not. All OSDs are up, but 85.25 is still down.
> > >> Its chunks are in:
> > >>
> > >> 85.25s0: osd.64
> > >> 85.25s1: osd.140 osd.159
> > >> 85.25s2: osd.96
> > >> 85.25s3: osd.121 osd.176
> > >> 85.25s4: osd.159 osd.56
> > >>
> > >>> BTW, I also noticed in your crush map below that the down osds have
> > >>> crush weight zero!
> > >>> So -- this means they are the only active OSDs for a PG, and they are
> > >>> all set to be drained.
> > >>> How did this happen? It is also surely part of the root cause here!
> > >>>
> > >>>

[ceph-users] Re: Recovery or recreation of a monitor rocksdb

2022-04-04 Thread Konstantin Shalygin
Hi,

The fastest way to fix the quorum issue is to redeploy the ceph-mon service.
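
On a Proxmox-managed Nautilus cluster that would look roughly like this (a
sketch, assuming the other two mons still form a quorum; <node> is the host
with the broken mon, and recent pveceph versions provide these subcommands):

pveceph mon destroy <node>    # or: ceph mon remove <mon-id>, then clean up /var/lib/ceph/mon/<cluster>-<mon-id>
pveceph mon create            # run on the affected node; the new mon syncs a fresh store from the quorum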


k
Sent from my iPhone

> On 1 Apr 2022, at 14:43, Victor Rodriguez  wrote:
> 
> Hello,
> 
> I have a 3-node cluster using Proxmox + ceph version 14.2.22 (nautilus). After 
> a power failure one of the monitors does not start. The log states some kind 
> of problem with its rocksdb but I can't really pinpoint the issue. The log 
> is available at https://pastebin.com/TZrFrZ1u.
> 
> How can I check or repair the rocksdb of this monitor?
> 
> Is there any way to force replication from another monitor?
> 
> Should I just remove that monitor from the cluster and re-add it back?
> 
> Should I force something to remove it from the cluster?
> 
> 
> I've had problems with rocksdb only once before. Then it was an OSD: I 
> simply removed and recreated it, and Ceph rebuilt/replaced all PGs, etc.
> 
> Many thanks in advance.
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: can't deploy osd/db on nvme with other db logical volume

2022-04-04 Thread Eugen Block
I know, that’s why I asked if the logs show why ceph-volume didn’t  
create the required logical volumes.


Zitat von 彭勇 :


Thanks, we have done it with the following commands:

ceph-volume lvm prepare  --no-systemd --bluestore --data /dev/sdh
--block.db /dev/nvme0n1 --block.db-size 7301032

We have to ssh to the host and execute the command for each OSD. If we have to
add many OSDs, it takes a lot of time.



On Mon, Apr 4, 2022 at 3:42 PM Eugen Block  wrote:


Hi,

this is handled by ceph-volume, do you find anything helpful in
/var/log/ceph//ceph-volume.log? Also check the cephadm.log
for any hints.


Zitat von 彭勇 :

> we have a running ceph, 16.2.7, with SATA OSD and DB on nvme.
> and we insert some SATA to host, and the status of new host is AVAILABLE.
> then we apply the osd-spec.yml, it can't create the OSD automatically.
>
> # ceph orch device ls
> HOST PATH  TYPE  DEVICE ID
>SIZE  AVAILABLE  REJECT REASONS
> h172-18-100-100  /dev/nvme0n1  ssd   INTEL
SSDPF2KX038TZ_PHAC1036009Z3P8AGN
>  3840G LVM detected, locked
> h172-18-100-100  /dev/sdb  hdd   ST16000NM000G-2K_ZL2CB8ZR
>   16.0T Insufficient space (<10 extents) on vgs, LVM
detected,
> locked
> h172-18-100-100  /dev/sdc  hdd   ST16000NM000G-2K_ZL2CB0J2
>   16.0T Insufficient space (<10 extents) on vgs, LVM
detected,
> locked
> h172-18-100-100  /dev/sdd  hdd   ST16000NM000G-2K_ZL2CBFSF
>   16.0T Insufficient space (<10 extents) on vgs, LVM
detected,
> locked
> h172-18-100-100  /dev/sde  hdd   ST16000NM000G-2K_ZL2CAYQB
>   16.0T Insufficient space (<10 extents) on vgs, LVM
detected,
> locked
> h172-18-100-100  /dev/sdf  hdd   ST16000NM000G-2K_ZL2CBEMC
>   16.0T  Yes
> h172-18-100-100  /dev/sdg  hdd   ST16000NM000G-2K_ZL2C427J
>   16.0T  Yes
> h172-18-100-100  /dev/sdh  hdd   ST16000NM000G-2K_ZL2CAZCZ
>   16.0T  Yes
> h172-18-100-100  /dev/sdi  hdd   ST16000NM000G-2K_ZL2CBM7M
>   16.0T  Yes
>
>
>
> osd-spec.yml:
>
> service_type: osd
> service_id: osd-spec
> placement:
> host_pattern: '*'
> spec:
> objectstore: bluestore
> block_db_size: 7301032
> data_devices:
> rotational: 1
> db_devices:
> rotational: 0
>
> ceph orch apply osd -i osd-spec.yml --dry-run
>
>
>
>
> --
> Peng Yong
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




--
彭勇 (Peng Yong)




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Bluestore tweaks for Bcache

2022-04-04 Thread Richard Bade
Hi Everyone,
I just wanted to share a discovery I made about running bluestore on
top of Bcache in case anyone else is doing this or considering it.
We've run Bcache under Filestore for a long time with good results but
recently rebuilt all the OSDs on bluestore. This caused some
degradation in performance that I couldn't quite put my finger on.
Bluestore OSDs have some smarts where they detect the disk type.
Unfortunately, in the case of Bcache the device is detected as SSD, when
in fact the HDD parameters are better suited.
I changed the following parameters to match the HDD default values and
immediately saw my average OSD latency under normal workload drop
from 6 ms to 2 ms. Peak performance didn't really change, but a test
machine that I have running a constant-IOPS workload was much more
stable, as was the average latency.
Performance has returned to Filestore levels or better.
Here are the parameters.

 ; Make sure that we use values appropriate for HDD not SSD - Bcache
gets detected as SSD
 bluestore_prefer_deferred_size = 32768
 bluestore_compression_max_blob_size = 524288
 bluestore_deferred_batch_ops = 64
 bluestore_max_blob_size = 524288
 bluestore_min_alloc_size = 65536
 bluestore_throttle_cost_per_io = 67

 ; Try to improve responsiveness when some disks are fully utilised
 osd_op_queue = wpq
 osd_op_queue_cut_off = high
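
For anyone wanting to apply these via the centralized config store instead
of ceph.conf, a sketch (most of these need an OSD restart to take effect,
and bluestore_min_alloc_size is only read when an OSD is created, so that
one requires redeploying the OSD):

 ceph config set osd bluestore_prefer_deferred_size 32768
 ceph config set osd bluestore_compression_max_blob_size 524288
 ceph config set osd bluestore_deferred_batch_ops 64
 ceph config set osd bluestore_max_blob_size 524288
 ceph config set osd bluestore_throttle_cost_per_io 67
 ceph config set osd osd_op_queue wpq
 ceph config set osd osd_op_queue_cut_off high

To check what bluestore detected for a given OSD (id 0 as an example):

 ceph osd metadata 0 | grep -i rotational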

Hopefully someone else finds this useful.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.0 QE Validation status

2022-04-04 Thread Josh Durgin
Hi Venky and Ernesto, how are the mount fix and grafana container build
looking?

Josh


On Fri, Apr 1, 2022 at 8:22 AM Venky Shankar  wrote:

> On Thu, Mar 31, 2022 at 8:51 PM Venky Shankar  wrote:
> >
> > Hi Yuri,
> >
> > On Wed, Mar 30, 2022 at 11:24 PM Yuri Weinstein 
> wrote:
> > >
> > > We merged rgw, cephadm and core PRs, but some work is still pending on
> fs and dashboard components.
> > >
> > > Seeking approvals for:
> > >
> > > smoke - Venky
> > > fs - Venky
> >
> > I approved the latest batch for cephfs PRs:
> >
> https://trello.com/c/Iq3WtUK5/1494-wip-yuri-testing-2022-03-29-0741-quincy
> >
> > There is one pending (blocker) PR:
> > https://github.com/ceph/ceph/pull/45689 - I'll let you know when the
> > backport is available.
>
> Smoke test passes with the above PR:
>
> https://pulpito.ceph.com/vshankar-2022-04-01_12:29:01-smoke-wip-vshankar-testing1-20220401-123425-testing-default-smithi/
>
> Requested Yuri to run FS suite w/ master (jobs were not getting
> scheduled in my run). Thanks, Yuri!
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] losing one node from a 3-node cluster

2022-04-04 Thread Felix Joussein

Hi Everyone,

I have been running a 3-node Proxmox + Ceph cluster in my home lab, serving as RBD storage for virtual machines, for 2 years now.

When I installed it, I did some testing to ensure that, if one node failed, the remaining 2 nodes would keep the system up while the 3rd node was being replaced.

Recently I had to reboot a node on that cluster and realized that the redundancy was gone.

 

Each of the 3 nodes has 4x4TB OSDs which makes 16TB per node or 48 in total.

As mentioned, I use proxmox, so I used their interface to set up the OSDs and Pools.

I have 2 Pools. One for my Virtual machines, one for ceph-fs.

Each pool's size/min is 3/2, has 256 PGs and Autoscaler on.

And now here's what I don't understand: I have the impression that, for whatever reason, my cluster is over-provisioned:

 

As the command outputs below show, ceph-iso_data consumes 19TB according to ceph df; however, the mounted ceph-iso filesystem is only 9.2TB big.

The same goes for my ceph-vm storage, which Ceph believes uses 8.3TB but in reality is only 6.3TB (according to the Proxmox GUI).

 

The problem now is obvious: out of my 48TB of raw capacity I should not be using more than 16TB, otherwise I can't afford to lose a node.

Now Ceph tells me that in total I am using 27TB, but compared to the mounted volumes/storages I am not using more than 16TB.

So, where are the 11TB (27-16) gone?

 

What am I not understanding?

 

Thank you for any hint on that.

regards,

Felix


 

 

ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL   USED    RAW USED  %RAW USED
hdd    44 TiB  17 TiB  27 TiB    27 TiB  61.70
TOTAL  44 TiB  17 TiB  27 TiB    27 TiB  61.70
 
--- POOLS ---
POOL   ID  PGS  STORED   OBJECTS  USED %USED  MAX AVAIL
device_health_metrics   1    1  0 B    0  0 B  0    3.0 TiB
ceph-vm 2  256  2.7 TiB  804.41k  8.3 TiB  47.76    3.0 TiB
ceph-iso_data   3  256  6.1 TiB    3.11M   19 TiB  67.23    3.0 TiB
ceph-iso_metadata   4   32  3.1 GiB  132.51k  9.3 GiB   0.10    3.0 TiB

 


rados df
POOL_NAME USED  OBJECTS  CLONES   COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED   RD_OPS  RD   WR_OPS   WR  USED COMPR  UNDER COMPR
ceph-iso_data   19 TiB  3105013   0  9315039   0    0 0    75202  97 GiB    28776  9.2 MiB 0 B  0 B
ceph-iso_metadata  9.3 GiB   132515   0   397545   0    0 0  15856613330  13 TiB  28336539064   93 TiB 0 B  0 B
ceph-vm    8.3 TiB   804409   0  2413227   0    0 0 94160784  40 TiB 62581002  4.4 TiB 0 B  0 B
device_health_metrics  0 B    0   0    0   0    0 0    0 0 B    0  0 B 0 B  0 B

total_objects    4041937
total_used   27 TiB
total_avail  17 TiB
total_space  44 TiB


 

df -h

Size    Used  Avail  Avail% mounted on

9,2T    6,2T  3,1T   67%    /mnt/pve/ceph-iso

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: losing one node from a 3-node cluster

2022-04-04 Thread Wolfpaw - Dale Corse
Hi Felix,

 

Where are your monitors located? Do you have one on each node?

Dale Corse
CEO/CTO
Cell: 780-504-1756
24/7 NOC: 888-965-3729
www.wolfpaw.com

 

From: Felix Joussein [mailto:felix.jouss...@gmx.at] 
Sent: Monday, April 4, 2022 6:54 PM
To: ceph-users@ceph.io
Subject: [ceph-users] losing one node from a 3-node cluster

 

Hi Everyone,

I run a 3-node proxmox+ceph cluster in my home-lab serving as rdb storage for 
virtual machines for 2 years now.

When I installed it, I did some testing to ensure, that when one node would 
fail, the remaining 2 nodes would keep the system up while the 3rd node is 
being replaced.

Recently I had to reboot a node on that cluster and realized, that the 
redundancy was gone.

 

Each of the 3 nodes has 4x4TB OSDs which makes 16TB per node or 48 in total.

As mentioned, I use proxmox, so I used their interface to set up the OSDs and 
Pools.

I have 2 Pools. One for my Virtual machines, one for ceph-fs.

Each pool's size/min is 3/2, has 256 PGs and Autoscaler on.

And now here's what I don't understand: I have the impression, that for what 
reason ever, it seams, as if my cluster would be over provisioned:

 

As the command outputs below show, ceph-iso_metadata consume 19TB accordingly 
to ceph df, how ever, the mounted ceph-iso filesystem is only 9.2TB big.

Same goes with my ceph-vm storage, that ceph belives is 8.3TB but in reality is 
only 6.3TB (accordingly to the proxmox gui).

 

The problem now is obvious: out of my 48TB Rawdata I should not be using more 
then 16TB, else I can't afford to loose a node.

Now Ceph tells me, that in total I am using 27TB, but compared to the mounted 
volumes/storages I am not using more then 16TB.

So, where are the 11TB (27-16) gone?

 

What am I not understanding?

 

Thank you for any hint on that.

regards,

Felix

 

 

ceph df
--- RAW STORAGE ---
CLASS  SIZEAVAIL   USEDRAW USED  %RAW USED
hdd44 TiB  17 TiB  27 TiB27 TiB  61.70
TOTAL  44 TiB  17 TiB  27 TiB27 TiB  61.70
 
--- POOLS ---
POOL   ID  PGS  STORED   OBJECTS  USED %USED  MAX AVAIL
device_health_metrics   11  0 B0  0 B  03.0 TiB
ceph-vm 2  256  2.7 TiB  804.41k  8.3 TiB  47.763.0 TiB
ceph-iso_data   3  256  6.1 TiB3.11M   19 TiB  67.233.0 TiB
ceph-iso_metadata   4   32  3.1 GiB  132.51k  9.3 GiB   0.103.0 TiB

 

rados df
POOL_NAME USED  OBJECTS  CLONES   COPIES  MISSING_ON_PRIMARY  
UNFOUND  DEGRADED   RD_OPS  RD   WR_OPS   WR  USED COMPR  UNDER 
COMPR
ceph-iso_data   19 TiB  3105013   0  9315039   0
0 075202  97 GiB28776  9.2 MiB 0 B  
0 B
ceph-iso_metadata  9.3 GiB   132515   0   397545   0
0 0  15856613330  13 TiB  28336539064   93 TiB 0 B  
0 B
ceph-vm8.3 TiB   804409   0  2413227   0
0 0 94160784  40 TiB 62581002  4.4 TiB 0 B  
0 B
device_health_metrics  0 B0   00   0
0 00 0 B0  0 B 0 B  
0 B

total_objects4041937
total_used   27 TiB
total_avail  17 TiB
total_space  44 TiB

 

df -h

SizeUsed  Avail  Avail% mounted on

9,2T    6,2T  3,1T   67%    /mnt/pve/ceph-iso

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.0 QE Validation status

2022-04-04 Thread Venky Shankar
Hey Josh,

On Tue, Apr 5, 2022 at 4:34 AM Josh Durgin  wrote:
>
> Hi Venky and Ernesto, how are the mount fix and grafana container build 
> looking?

Currently running into various teuthology related issues when testing
out the mount fix.

We'll want a test run without these failures to be really sure that we
aren't missing anything.

>
> Josh
>
>
> On Fri, Apr 1, 2022 at 8:22 AM Venky Shankar  wrote:
>>
>> On Thu, Mar 31, 2022 at 8:51 PM Venky Shankar  wrote:
>> >
>> > Hi Yuri,
>> >
>> > On Wed, Mar 30, 2022 at 11:24 PM Yuri Weinstein  
>> > wrote:
>> > >
>> > > We merged rgw, cephadm and core PRs, but some work is still pending on 
>> > > fs and dashboard components.
>> > >
>> > > Seeking approvals for:
>> > >
>> > > smoke - Venky
>> > > fs - Venky
>> >
>> > I approved the latest batch for cephfs PRs:
>> > https://trello.com/c/Iq3WtUK5/1494-wip-yuri-testing-2022-03-29-0741-quincy
>> >
>> > There is one pending (blocker) PR:
>> > https://github.com/ceph/ceph/pull/45689 - I'll let you know when the
>> > backport is available.
>>
>> Smoke test passes with the above PR:
>> https://pulpito.ceph.com/vshankar-2022-04-01_12:29:01-smoke-wip-vshankar-testing1-20220401-123425-testing-default-smithi/
>>
>> Requested Yuri to run FS suite w/ master (jobs were not getting
>> scheduled in my run). Thanks, Yuri!
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>


-- 
Cheers,
Venky

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: losing one node from a 3-node cluster

2022-04-04 Thread Felix Joussein
Yes, each node has one monitor, manager and mds running.

regards,

Felix

 

 
 

Sent: Tuesday, 5 April 2022 at 03:00
From: "Wolfpaw - Dale Corse"
To: "'Felix Joussein'" , ceph-users@ceph.io
Subject: RE: [ceph-users] losing one node from a 3-node cluster




Hi Felix,

 

Where are your monitors located? Do you have one on each node?

Dale Corse
CEO/CTO
Cell: 780-504-1756
24/7 NOC: 888-965-3729
www.wolfpaw.com

From: Felix Joussein [mailto:felix.jouss...@gmx.at]
Sent: Monday, April 4, 2022 6:54 PM
To: ceph-users@ceph.io
Subject: [ceph-users] losing one node from a 3-node cluster



 



Hi Everyone,

I run a 3-node proxmox+ceph cluster in my home-lab serving as rdb storage for virtual machines for 2 years now.

When I installed it, I did some testing to ensure, that when one node would fail, the remaining 2 nodes would keep the system up while the 3rd node is being replaced.

Recently I had to reboot a node on that cluster and realized, that the redundancy was gone.

 

Each of the 3 nodes has 4x4TB OSDs which makes 16TB per node or 48 in total.

As mentioned, I use proxmox, so I used their interface to set up the OSDs and Pools.

I have 2 Pools. One for my Virtual machines, one for ceph-fs.

Each pool's size/min is 3/2, has 256 PGs and Autoscaler on.

And now here's what I don't understand: I have the impression, that for what reason ever, it seams, as if my cluster would be over provisioned:

 

As the command outputs below show, ceph-iso_metadata consume 19TB accordingly to ceph df, how ever, the mounted ceph-iso filesystem is only 9.2TB big.

Same goes with my ceph-vm storage, that ceph belives is 8.3TB but in reality is only 6.3TB (accordingly to the proxmox gui).

 

The problem now is obvious: out of my 48TB Rawdata I should not be using more then 16TB, else I can't afford to loose a node.

Now Ceph tells me, that in total I am using 27TB, but compared to the mounted volumes/storages I am not using more then 16TB.

So, where are the 11TB (27-16) gone?

 

What am I not understanding?

 


Thank you for any hint on that.



regards,



Felix




 



 



ceph df
--- RAW STORAGE ---
CLASS  SIZE    AVAIL   USED    RAW USED  %RAW USED
hdd    44 TiB  17 TiB  27 TiB    27 TiB  61.70
TOTAL  44 TiB  17 TiB  27 TiB    27 TiB  61.70
 
--- POOLS ---
POOL   ID  PGS  STORED   OBJECTS  USED %USED  MAX AVAIL
device_health_metrics   1    1  0 B    0  0 B  0    3.0 TiB
ceph-vm 2  256  2.7 TiB  804.41k  8.3 TiB  47.76    3.0 TiB
ceph-iso_data   3  256  6.1 TiB    3.11M   19 TiB  67.23    3.0 TiB
ceph-iso_metadata   4   32  3.1 GiB  132.51k  9.3 GiB   0.10    3.0 TiB



 




rados df
POOL_NAME USED  OBJECTS  CLONES   COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED   RD_OPS  RD   WR_OPS   WR  USED COMPR  UNDER COMPR
ceph-iso_data   19 TiB  3105013   0  9315039   0    0 0    75202  97 GiB    28776  9.2 MiB 0 B  0 B
ceph-iso_metadata  9.3 GiB   132515   0   397545   0    0 0  15856613330  13 TiB  28336539064   93 TiB 0 B  0 B
ceph-vm    8.3 TiB   804409   0  2413227   0    0 0 94160784  40 TiB 62581002  4.4 TiB 0 B  0 B
device_health_metrics  0 B    0   0    0   0    0 0    0 0 B    0  0 B 0 B  0 B



total_objects    4041937
total_used   27 TiB
total_avail  17 TiB
total_space  44 TiB




 



df -h



Size    Used  Avail  Avail% mounted on



9,2T    6,2T  3,1T   67%    /mnt/pve/ceph-iso

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io