[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Lars Köppel
I updated from 17.2.6 to 17.2.7 and a few hours later to 18.2.2.
Would it be an option to go back to 17.2.6?


[image: ariadne.ai Logo] Lars Köppel
Developer
Email: lars.koep...@ariadne.ai
Phone: +49 6221 5993580 <+4962215993580>
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai


On Wed, Jun 12, 2024 at 5:30 PM Eugen Block  wrote:

> Which version did you upgrade from to 18.2.2?
> I can’t pin it down to a specific issue, but somewhere in the back of
> my mind is something related to a new omap format or something. But
> I’m really not sure at all.
>
> Zitat von Lars Köppel :
>
> > I am happy to help you with as much information as possible. I probably
> > just don't know where to look for it.
> > Below are the requested information. The cluster is rebuilding the
> > zapped OSD at the moment. This will probably take the next few days.
> >
> >
> > sudo ceph pg ls-by-pool metadata
> > PG OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES OMAP_BYTES*
> >  OMAP_KEYS*  LOG   LOG_DUPS  STATE
> >  SINCE  VERSION  REPORTED UP ACTING
> > SCRUB_STAMP  DEEP_SCRUB_STAMP
> > LAST_SCRUB_DURATION  SCRUB_SCHEDULING
> > 10.0   5217325   4994695  00   4194304   5880891340
> > 9393865  1885  3000
> active+undersized+degraded+remapped+backfill_wait
> > 2h  79875'180849582  79875:391519635  [0,1,2]p0  [1,2]p1
> >  2024-06-11T09:08:09.829362+  2024-05-28T05:52:59.321589+
> >627  periodic scrub scheduled @ 2024-06-17T08:21:31.808348+
> > 10.1   5214785   5193424  00 0   5843682713
> > 9410150  1912  3000
> active+undersized+degraded+remapped+backfill_wait
> > 3h  79875'180914288  79875:342746928  [2,1,0]p2  [2,1]p2
> >  2024-06-01T15:56:28.927288+  2024-05-27T03:31:37.682966+
> >966  queued for scrub
> > 10.2   5218432   5187168  00 0   6402011266
> > 9812513  1874  3000
> active+undersized+degraded+remapped+backfill_wait
> > 3h  79875'180970531  79875:341340204  [0,1,2]p0  [1,2]p1
> >  2024-06-11T13:40:58.994256+  2024-06-11T13:40:58.994256+
> >   1942  periodic scrub scheduled @ 2024-06-17T06:07:15.329675+
> > 10.3   5217413   5217413  00   8388788   5766005023
> > 9271787  1923  3000
> active+undersized+degraded+remapped+backfill_wait
> > 3h  79875'181012233  79875:388295881  [1,0,2]p1  [1,2]p1
> >  2024-06-12T00:35:56.965547+  2024-05-23T19:54:56.121729+
> >492  periodic scrub scheduled @ 2024-06-18T06:39:31.103864+
> > 10.4   5220069   5220069  00  12583466   6027548724
> > 9537290  1959  3000
> active+undersized+degraded+remapped+backfill_wait
> > 3h  79875'181576075  79875:405295868  [1,2,0]p1  [1,2]p1
> >  2024-06-11T17:47:22.923514+  2024-05-31T02:06:55.339574+
> >581  periodic scrub scheduled @ 2024-06-17T00:59:37.214420+
> > 10.5   5216162   5211999  00   4194304   5941347251
> > 9542764  1930  3000
> active+undersized+degraded+remapped+backfill_wait
> > 3h  79875'180455793  79875:338418517  [2,1,0]p2  [2,1]p2
> >  2024-06-11T22:50:16.170708+  2024-05-30T23:49:54.316379+
> >528  periodic scrub scheduled @ 2024-06-17T04:39:25.905185+
> > 10.6   5216100   4980459  00   4521984   6428088514
> > 9850762  1911  3000
> active+undersized+degraded+remapped+backfill_wait
> > 2h  79875'184045876  79875:396809795  [0,2,1]p0  [1,2]p1
> >  2024-06-11T22:24:05.102716+  2024-06-11T22:24:05.102716+
> >   1082  periodic scrub scheduled @ 2024-06-17T07:58:44.289885+
> > 10.7   5218232   5218232  00   4194304   6377065363
> > 9849360  1919  3000
> active+undersized+degraded+remapped+backfill_wait
> > 3h  79875'182672562  79875:342449062  [1,0,2]p1  [1,2]p1
> >  2024-06-11T06:22:15.689422+  2024-06-11T06:22:15.689422+
> >   8225  periodic scrub scheduled @ 2024-06-17T13:05:59.225052+
> > 10.8   5219620   5182816  00 0   6167304290
> > 9691796  1896  3000
> active+undersized+degraded+remapped+backfill_wait
> > 3h  79875'179628377  79875:378022884  [2,1,0]p2  [2,1]p2
> >  2024-06-11T22:06:01.386763+  2024-06-11T22:06:01.386763+
> >   1286  periodic scrub scheduled @ 2024-06-17T07:54:54.133093+
> > 10.9   5219448   5164591  00   8388698   5796048346
> > 9338312  1868  3000
> active+undersized+degraded+remapped+backfill_wait
> > 3h  79875'181739392  79875:387412389  [2,1,0]p2  [2,1]p2
> >  2024-06-12T05:21:00.586747+  2024-05-26T11:10:59.780673+
> >539  periodic scrub scheduled @ 2024-06-18T15:32:59.155092+
> > 10.a   5219861   5163635  00  12582912   5841839055
> > 9387200  1916  3000
> active+undersized+degraded+remapped+backfill_wait
> >   

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-13 Thread Bandelow, Gunnar
Hi Torkil,

Maybe I'm overlooking something, but how about just renaming the
datacenter buckets?

Best regards, 

Gunnar

--- Original Nachricht ---
Betreff: [ceph-users] Re: Safe to move misplaced hosts between
failure domains in the crush tree?
Von: "Torkil Svensgaard" 
An: "Matthias Grandl" 
CC: ceph-users@ceph.io, "Ruben Vestergaard" 
Datum: 12-06-2024 10:33





On 12/06/2024 10:22, Matthias Grandl wrote:
> Correct, this should only result in misplaced objects.
> 
>  > We made a mistake when we moved the servers physically so while the
>  > replica 3 is intact the crush tree is not accurate.
> 
> Can you elaborate on that? Does this mean after the move, multiple hosts
> are inside the same physical datacenter? In that case, once you correct
> the CRUSH layout, you would be running misplaced without a way to
> rebalance pools that are using a datacenter crush rule.

Hi Matthias

Thanks for replying. Two of the three hosts were swapped so I would do:

ceph osd crush move ceph-flash1 datacenter=HX1
ceph osd crush move ceph-flash2 datacenter=714


And end up with 2/3 misplaced:

   -1 4437.29248  root default
  -33 1467.84814  datacenter 714
  -69   69.86389  host ceph-flash2
  -34 1511.25378  datacenter HX1
  -73   69.86389  host ceph-flash1
  -36 1458.19067  datacenter UXH
  -77   69.86389  host ceph-flash3

It would only briefly be invalid between the two commands.
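
As an aside, if one wanted to avoid even that brief window, both moves could be applied in a single step by editing the CRUSH map offline. A rough sketch with placeholder filenames, not what was actually done here:

ceph osd getcrushmap -o cm.bin      # grab the current CRUSH map
crushtool -d cm.bin -o cm.txt       # decompile to editable text
# edit cm.txt: swap the two host entries between the datacenter buckets
crushtool -c cm.txt -o cm-new.bin   # recompile
ceph osd setcrushmap -i cm-new.bin  # both moves take effect in one map change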

Mvh.

Torkil


> Cheers!
> 
> --
> 
> Matthias Grandl
> Head Storage Engineer
> matthias.gra...@croit.io 
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> 
>> On 12. Jun 2024, at 09:13, Torkil Svensgaard  wrote:
>>
>> Hi
>>
>> We have 3 servers for replica 3 with failure domain datacenter:
>>
>>  -1 4437.29248  root default
>> -33 1467.84814  datacenter 714
>> -69   69.86389  host ceph-flash1
>> -34 1511.25378  datacenter HX1
>> -73   69.86389  host ceph-flash2
>> -36 1458.19067  datacenter UXH
>> -77   69.86389  host ceph-flash3
>>
>> We made a mistake when we moved the servers physically so while the
>> replica 3 is intact the crush tree is not accurate.
>>
>> If we just remedy the situation with "ceph osd crush move ceph-flashX
>> datacenter=Y" we will just end up with a lot of misplaced data and
>> some churn, right? Or will the affected pool go degraded/unavailable?
>>
>> Mvh.
>>
>> Torkil
>> -- 
>> Torkil Svensgaard
>> Sysadmin
>> MR-Forskningssektionen, afs. 714
>> DRCMR, Danish Research Centre for Magnetic Resonance
>> Hvidovre Hospital
>> Kettegård Allé 30
>> DK-2650 Hvidovre
>> Denmark
>> Tel: +45 386 22828
>> E-mail: tor...@drcmr.dk
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: tor...@drcmr.dk
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-13 Thread Torkil Svensgaard



On 13/06/2024 12:17, Bandelow, Gunnar wrote:

Hi Torkil,


Hi Gunnar

Maybe I'm overlooking something, but how about just renaming the
datacenter buckets?


Here's the ceph osd tree command header and my pruned tree:

ID    CLASS  WEIGHT      TYPE NAME          STATUS  REWEIGHT  PRI-AFF
  -1 4437.29248  root default
 -33 1467.84814  datacenter 714
 -69   69.86389  host ceph-flash1
 -34 1511.25378  datacenter HX1
 -73   69.86389  host ceph-flash2
 -36 1458.19067  datacenter UXH
 -77   69.86389  host ceph-flash3

The weights reveal that there are other hosts in the datacenter buckets 
so renaming won't help.


Mvh.

Torkil


Best regards,
Gunnar

--- Original Nachricht ---
*Betreff: *[ceph-users] Re: Safe to move misplaced hosts between failure 
domains in the crush tree?

*Von: *"Torkil Svensgaard" mailto:tor...@drcmr.dk>>
*An: *"Matthias Grandl" >
*CC: *ceph-users@ceph.io , "Ruben 
Vestergaard" mailto:r...@drcmr.dk>>

*Datum: *12-06-2024 10:33



On 12/06/2024 10:22, Matthias Grandl wrote:
 > Correct, this should only result in misplaced objects.
 >
 >  > We made a mistake when we moved the servers physically so while the
 >  > replica 3 is intact the crush tree is not accurate.
 >
 > Can you elaborate on that? Does this mean after the move, multiple hosts
 > are inside the same physical datacenter? In that case, once you correct
 > the CRUSH layout, you would be running misplaced without a way to
 > rebalance pools that are using a datacenter crush rule.

Hi Matthias

Thanks for replying. Two of the three hosts were swapped so I would do:

ceph osd crush move ceph-flash1 datacenter=HX1
ceph osd crush move ceph-flash2 datacenter=714


And end up with 2/3 misplaced:

    -1 4437.29248  root default
   -33 1467.84814  datacenter 714
   -69   69.86389  host ceph-flash2
   -34 1511.25378  datacenter HX1
   -73   69.86389  host ceph-flash1
   -36 1458.19067  datacenter UXH
   -77   69.86389  host ceph-flash3

It would only briefly be invalid between the two commands.

Mvh.

Torkil


 > Cheers!
 >
 > --
 >
 > Matthias Grandl
 > Head Storage Engineer
 > matthias.gra...@croit.io
 >
 > Looking for help with your Ceph cluster? Contact us at https://croit.io
 >
 > croit GmbH, Freseniusstr. 31h, 81247 Munich
 > CEO: Martin Verges - VAT-ID: DE310638492
 > Com. register: Amtsgericht Munich HRB 231263
 > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
 >
 >> On 12. Jun 2024, at 09:13, Torkil Svensgaard <tor...@drcmr.dk> wrote:
 >>
 >> Hi
 >>
 >> We have 3 servers for replica 3 with failure domain datacenter:
 >>
 >>  -1 4437.29248  root default
 >> -33 1467.84814  datacenter 714
 >> -69   69.86389  host ceph-flash1
 >> -34 1511.25378  datacenter HX1
 >> -73   69.86389  host ceph-flash2
 >> -36 1458.19067  datacenter UXH
 >> -77   69.86389  host ceph-flash3
 >>
 >> We made a mistake when we moved the servers physically so while the
 >> replica 3 is intact the crush tree is not accurate.
 >>
 >> If we just remedy the situation with "ceph osd crush move ceph-flashX
 >> datacenter=Y" we will just end up with a lot of misplaced data and
 >> some churn, right? Or will the affected pool go degraded/unavailable?
 >>
 >> Mvh.
 >>
 >> Torkil
 >> --
 >> Torkil Svensgaard
 >> Sysadmin
 >> MR-Forskningssektionen, afs. 714
 >> DRCMR, Danish Research Centre for Magnetic Resonance
 >> Hvidovre Hospital
 >> Kettegård Allé 30
 >> DK-2650 Hvidovre
 >> Denmark
 >> Tel: +45 386 22828
 >> E-mail: tor...@drcmr.dk 
 >> ___
 >> ceph-users mailing list -- ceph-users@ceph.io

 >> To unsubscribe send an email to ceph-users-le...@ceph.io

 >

-- 
Torkil Svensgaard

Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: tor...@drcmr.dk 
___
ceph-users mailing list -- ceph-users@ceph.io

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-13 Thread Torkil Svensgaard



On 13/06/2024 08:54, Janne Johansson wrote:

We made a mistake when we moved the servers physically so while the
replica 3 is intact the crush tree is not accurate.

If we just remedy the situation with "ceph osd crush move ceph-flashX
datacenter=Y" we will just end up with a lot of misplaced data and some
churn, right? Or will the affected pool go degraded/unavailable?


I know I am late here, but for the record, if you ask crush to change
in such a way that PGs are asked to move to "impossible" places, they
will just end up being remapped/misplaced and continue to serve data.
They will obviously not backfill anywhere, but they will also not
cause troubles apart from ceph -s telling you the whole pool(s) is
misplaced currently. Then you can revert the crush change and
everything goes back to normal again.

I have made such "mistakes" several times, and ceph kept going even
though I panicked and flailed with my arms a lot until I managed to
revert the bad crush map changes.
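
For the record, a minimal safety net along those lines, with a placeholder filename, so the revert is a one-liner:

ceph osd getcrushmap -o crush.before   # save the current map before experimenting
ceph osd setcrushmap -i crush.before   # restoring it undoes the bad change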


Good to know, thanks =)

Mvh.

Torkil
--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: tor...@drcmr.dk
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Eugen Block
Downgrading isn't supported, I don't think that would be a good idea.  
I also don't see anything obvious standing out in the pg output. Any  
chance you can add more OSDs to the metadata pool to see if it stops  
at some point? Did the cluster usage change in any way? For example  
cephfs snapshots which haven't been used before or something like that?


Zitat von Lars Köppel :


I updated from 17.2.6 to 17.2.7 and a few hours later to 18.2.2.
Would it be an option to go back to 17.2.6?


[image: ariadne.ai Logo] Lars Köppel
Developer
Email: lars.koep...@ariadne.ai
Phone: +49 6221 5993580 <+4962215993580>
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai


On Wed, Jun 12, 2024 at 5:30 PM Eugen Block  wrote:


Which version did you upgrade from to 18.2.2?
I can’t pin it down to a specific issue, but somewhere in the back of
my mind is something related to a new omap format or something. But
I’m really not sure at all.

Zitat von Lars Köppel :

> I am happy to help you with as much information as possible. I probably
> just don't know where to look for it.
> Below are the requested information. The cluster is rebuilding the
> zapped OSD at the moment. This will probably take the next few days.
>
>
> sudo ceph pg ls-by-pool metadata
> PG OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES OMAP_BYTES*
>  OMAP_KEYS*  LOG   LOG_DUPS  STATE
>  SINCE  VERSION  REPORTED UP ACTING
> SCRUB_STAMP  DEEP_SCRUB_STAMP
> LAST_SCRUB_DURATION  SCRUB_SCHEDULING
> 10.0   5217325   4994695  00   4194304   5880891340
> 9393865  1885  3000
active+undersized+degraded+remapped+backfill_wait
> 2h  79875'180849582  79875:391519635  [0,1,2]p0  [1,2]p1
>  2024-06-11T09:08:09.829362+  2024-05-28T05:52:59.321589+
>627  periodic scrub scheduled @ 2024-06-17T08:21:31.808348+
> 10.1   5214785   5193424  00 0   5843682713
> 9410150  1912  3000
active+undersized+degraded+remapped+backfill_wait
> 3h  79875'180914288  79875:342746928  [2,1,0]p2  [2,1]p2
>  2024-06-01T15:56:28.927288+  2024-05-27T03:31:37.682966+
>966  queued for scrub
> 10.2   5218432   5187168  00 0   6402011266
> 9812513  1874  3000
active+undersized+degraded+remapped+backfill_wait
> 3h  79875'180970531  79875:341340204  [0,1,2]p0  [1,2]p1
>  2024-06-11T13:40:58.994256+  2024-06-11T13:40:58.994256+
>   1942  periodic scrub scheduled @ 2024-06-17T06:07:15.329675+
> 10.3   5217413   5217413  00   8388788   5766005023
> 9271787  1923  3000
active+undersized+degraded+remapped+backfill_wait
> 3h  79875'181012233  79875:388295881  [1,0,2]p1  [1,2]p1
>  2024-06-12T00:35:56.965547+  2024-05-23T19:54:56.121729+
>492  periodic scrub scheduled @ 2024-06-18T06:39:31.103864+
> 10.4   5220069   5220069  00  12583466   6027548724
> 9537290  1959  3000
active+undersized+degraded+remapped+backfill_wait
> 3h  79875'181576075  79875:405295868  [1,2,0]p1  [1,2]p1
>  2024-06-11T17:47:22.923514+  2024-05-31T02:06:55.339574+
>581  periodic scrub scheduled @ 2024-06-17T00:59:37.214420+
> 10.5   5216162   5211999  00   4194304   5941347251
> 9542764  1930  3000
active+undersized+degraded+remapped+backfill_wait
> 3h  79875'180455793  79875:338418517  [2,1,0]p2  [2,1]p2
>  2024-06-11T22:50:16.170708+  2024-05-30T23:49:54.316379+
>528  periodic scrub scheduled @ 2024-06-17T04:39:25.905185+
> 10.6   5216100   4980459  00   4521984   6428088514
> 9850762  1911  3000
active+undersized+degraded+remapped+backfill_wait
> 2h  79875'184045876  79875:396809795  [0,2,1]p0  [1,2]p1
>  2024-06-11T22:24:05.102716+  2024-06-11T22:24:05.102716+
>   1082  periodic scrub scheduled @ 2024-06-17T07:58:44.289885+
> 10.7   5218232   5218232  00   4194304   6377065363
> 9849360  1919  3000
active+undersized+degraded+remapped+backfill_wait
> 3h  79875'182672562  79875:342449062  [1,0,2]p1  [1,2]p1
>  2024-06-11T06:22:15.689422+  2024-06-11T06:22:15.689422+
>   8225  periodic scrub scheduled @ 2024-06-17T13:05:59.225052+
> 10.8   5219620   5182816  00 0   6167304290
> 9691796  1896  3000
active+undersized+degraded+remapped+backfill_wait
> 3h  79875'179628377  79875:378022884  [2,1,0]p2  [2,1]p2
>  2024-06-11T22:06:01.386763+  2024-06-11T22:06:01.386763+
>   1286  periodic scrub scheduled @ 2024-06-17T07:54:54.133093+
> 10.9   5219448   5164591  00   8388698   5796048346
> 9338312  1868  3000
active+undersized+degraded+remapped+backfill_wait
> 3h  79875'181739392  79875:387412389  [2,1,0]p2  [2,1]p2
>  2024-06-12T05:21:00.586747+  2024-05-26T11:10:59.780673+
> 

[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Lars Köppel
We have been using snapshots for a long time.
The only change in usage is that we are currently deleting many small files
from the system. Because this is slow (~150 requests/s), it has been running for
the last few weeks. Could such a load result in a problem with the MDS?
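
If it is useful to see whether those deletions are piling up inside the MDS rather than on disk, a hedged way to check is to watch the stray and purge-queue counters on the node running the active MDS (the MDS name is a placeholder):

ceph daemon mds.<name> perf dump | jq '.mds_cache.num_strays'   # deleted entries still held as strays
ceph daemon mds.<name> perf dump | jq '.purge_queue'            # purge queue backlog counters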

I have to ask for permission to order more drives. This could take some
time.

[image: ariadne.ai Logo] Lars Köppel
Developer
Email: lars.koep...@ariadne.ai
Phone: +49 6221 5993580 <+4962215993580>
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai


On Thu, Jun 13, 2024 at 12:55 PM Eugen Block  wrote:

> Downgrading isn't supported, I don't think that would be a good idea.
> I also don't see anything obvious standing out in the pg output. Any
> chance you can add more OSDs to the metadata pool to see if it stops
> at some point? Did the cluster usage change in any way? For example
> cephfs snapshots which haven't been used before or something like that?
>
> Zitat von Lars Köppel :
>
> > I updated from 17.2.6 to 17.2.7 and a few hours later to 18.2.2.
> > Would it be an option to go back to 17.2.6?
> >
> >
> > [image: ariadne.ai Logo] Lars Köppel
> > Developer
> > Email: lars.koep...@ariadne.ai
> > Phone: +49 6221 5993580 <+4962215993580>
> > ariadne.ai (Germany) GmbH
> > Häusserstraße 3, 69115 Heidelberg
> > Amtsgericht Mannheim, HRB 744040
> > Geschäftsführer: Dr. Fabian Svara
> > https://ariadne.ai
> >
> >
> > On Wed, Jun 12, 2024 at 5:30 PM Eugen Block  wrote:
> >
> >> Which version did you upgrade from to 18.2.2?
> >> I can’t pin it down to a specific issue, but somewhere in the back of
> >> my mind is something related to a new omap format or something. But
> >> I’m really not sure at all.
> >>
> >> Zitat von Lars Köppel :
> >>
> >> > I am happy to help you with as much information as possible. I
> probably
> >> > just don't know where to look for it.
> >> > Below are the requested information. The cluster is rebuilding the
> >> > zapped OSD at the moment. This will probably take the next few days.
> >> >
> >> >
> >> > sudo ceph pg ls-by-pool metadata
> >> > PG OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES OMAP_BYTES*
> >> >  OMAP_KEYS*  LOG   LOG_DUPS  STATE
> >> >  SINCE  VERSION  REPORTED UP ACTING
> >> > SCRUB_STAMP  DEEP_SCRUB_STAMP
> >> > LAST_SCRUB_DURATION  SCRUB_SCHEDULING
> >> > 10.0   5217325   4994695  00   4194304   5880891340
> >> > 9393865  1885  3000
> >> active+undersized+degraded+remapped+backfill_wait
> >> > 2h  79875'180849582  79875:391519635  [0,1,2]p0  [1,2]p1
> >> >  2024-06-11T09:08:09.829362+  2024-05-28T05:52:59.321589+
> >> >627  periodic scrub scheduled @ 2024-06-17T08:21:31.808348+
> >> > 10.1   5214785   5193424  00 0   5843682713
> >> > 9410150  1912  3000
> >> active+undersized+degraded+remapped+backfill_wait
> >> > 3h  79875'180914288  79875:342746928  [2,1,0]p2  [2,1]p2
> >> >  2024-06-01T15:56:28.927288+  2024-05-27T03:31:37.682966+
> >> >966  queued for scrub
> >> > 10.2   5218432   5187168  00 0   6402011266
> >> > 9812513  1874  3000
> >> active+undersized+degraded+remapped+backfill_wait
> >> > 3h  79875'180970531  79875:341340204  [0,1,2]p0  [1,2]p1
> >> >  2024-06-11T13:40:58.994256+  2024-06-11T13:40:58.994256+
> >> >   1942  periodic scrub scheduled @ 2024-06-17T06:07:15.329675+
> >> > 10.3   5217413   5217413  00   8388788   5766005023
> >> > 9271787  1923  3000
> >> active+undersized+degraded+remapped+backfill_wait
> >> > 3h  79875'181012233  79875:388295881  [1,0,2]p1  [1,2]p1
> >> >  2024-06-12T00:35:56.965547+  2024-05-23T19:54:56.121729+
> >> >492  periodic scrub scheduled @ 2024-06-18T06:39:31.103864+
> >> > 10.4   5220069   5220069  00  12583466   6027548724
> >> > 9537290  1959  3000
> >> active+undersized+degraded+remapped+backfill_wait
> >> > 3h  79875'181576075  79875:405295868  [1,2,0]p1  [1,2]p1
> >> >  2024-06-11T17:47:22.923514+  2024-05-31T02:06:55.339574+
> >> >581  periodic scrub scheduled @ 2024-06-17T00:59:37.214420+
> >> > 10.5   5216162   5211999  00   4194304   5941347251
> >> > 9542764  1930  3000
> >> active+undersized+degraded+remapped+backfill_wait
> >> > 3h  79875'180455793  79875:338418517  [2,1,0]p2  [2,1]p2
> >> >  2024-06-11T22:50:16.170708+  2024-05-30T23:49:54.316379+
> >> >528  periodic scrub scheduled @ 2024-06-17T04:39:25.905185+
> >> > 10.6   5216100   4980459  00   4521984   6428088514
> >> > 9850762  1911  3000
> >> active+undersized+degraded+remapped+backfill_wait
> >> > 2h  79875'184045876  79875:396809795  [0,2,1]p0  [1,2]p1
> >> >  2024-06-11T22:24:05.102716+  2024-06-11T22:24:05.102716+
> >> >   10

[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Eugen Block
I'm quite sure that this could result in the impact you're seeing. To  
confirm that suspicion you could stop deleting and wait a couple of  
days to see if the usage stabilizes. And if it does, maybe delete fewer
files at once or so to see how far you can tweak it. That would be my
approach.
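
A hedged sketch for tracking that over the waiting period (read-only; the interval is arbitrary):

while true ; do date ; ceph df detail | grep -E 'POOL|metadata' ; sleep 3600 ; done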


Zitat von Lars Köppel :


We have been using snapshots for a long time.
The only change in usage is that we are currently deleting many small files
from the system. Because this is slow (~150 requests/s) this is running for
the last few weeks. Could such a load result in a problem with the MDS?

I have to ask for permission to order more drives. This could take some
time.

[image: ariadne.ai Logo] Lars Köppel
Developer
Email: lars.koep...@ariadne.ai
Phone: +49 6221 5993580 <+4962215993580>
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai


On Thu, Jun 13, 2024 at 12:55 PM Eugen Block  wrote:


Downgrading isn't supported, I don't think that would be a good idea.
I also don't see anything obvious standing out in the pg output. Any
chance you can add more OSDs to the metadata pool to see if it stops
at some point? Did the cluster usage change in any way? For example
cephfs snapshots which haven't been used before or something like that?

Zitat von Lars Köppel :

> I updated from 17.2.6 to 17.2.7 and a few hours later to 18.2.2.
> Would it be an option to go back to 17.2.6?
>
>
> [image: ariadne.ai Logo] Lars Köppel
> Developer
> Email: lars.koep...@ariadne.ai
> Phone: +49 6221 5993580 <+4962215993580>
> ariadne.ai (Germany) GmbH
> Häusserstraße 3, 69115 Heidelberg
> Amtsgericht Mannheim, HRB 744040
> Geschäftsführer: Dr. Fabian Svara
> https://ariadne.ai
>
>
> On Wed, Jun 12, 2024 at 5:30 PM Eugen Block  wrote:
>
>> Which version did you upgrade from to 18.2.2?
>> I can’t pin it down to a specific issue, but somewhere in the back of
>> my mind is something related to a new omap format or something. But
>> I’m really not sure at all.
>>
>> Zitat von Lars Köppel :
>>
>> > I am happy to help you with as much information as possible. I
probably
>> > just don't know where to look for it.
>> > Below are the requested information. The cluster is rebuilding the
>> > zapped OSD at the moment. This will probably take the next few days.
>> >
>> >
>> > sudo ceph pg ls-by-pool metadata
>> > PG OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES OMAP_BYTES*
>> >  OMAP_KEYS*  LOG   LOG_DUPS  STATE
>> >  SINCE  VERSION  REPORTED UP ACTING
>> > SCRUB_STAMP  DEEP_SCRUB_STAMP
>> > LAST_SCRUB_DURATION  SCRUB_SCHEDULING
>> > 10.0   5217325   4994695  00   4194304   5880891340
>> > 9393865  1885  3000
>> active+undersized+degraded+remapped+backfill_wait
>> > 2h  79875'180849582  79875:391519635  [0,1,2]p0  [1,2]p1
>> >  2024-06-11T09:08:09.829362+  2024-05-28T05:52:59.321589+
>> >627  periodic scrub scheduled @ 2024-06-17T08:21:31.808348+
>> > 10.1   5214785   5193424  00 0   5843682713
>> > 9410150  1912  3000
>> active+undersized+degraded+remapped+backfill_wait
>> > 3h  79875'180914288  79875:342746928  [2,1,0]p2  [2,1]p2
>> >  2024-06-01T15:56:28.927288+  2024-05-27T03:31:37.682966+
>> >966  queued for scrub
>> > 10.2   5218432   5187168  00 0   6402011266
>> > 9812513  1874  3000
>> active+undersized+degraded+remapped+backfill_wait
>> > 3h  79875'180970531  79875:341340204  [0,1,2]p0  [1,2]p1
>> >  2024-06-11T13:40:58.994256+  2024-06-11T13:40:58.994256+
>> >   1942  periodic scrub scheduled @ 2024-06-17T06:07:15.329675+
>> > 10.3   5217413   5217413  00   8388788   5766005023
>> > 9271787  1923  3000
>> active+undersized+degraded+remapped+backfill_wait
>> > 3h  79875'181012233  79875:388295881  [1,0,2]p1  [1,2]p1
>> >  2024-06-12T00:35:56.965547+  2024-05-23T19:54:56.121729+
>> >492  periodic scrub scheduled @ 2024-06-18T06:39:31.103864+
>> > 10.4   5220069   5220069  00  12583466   6027548724
>> > 9537290  1959  3000
>> active+undersized+degraded+remapped+backfill_wait
>> > 3h  79875'181576075  79875:405295868  [1,2,0]p1  [1,2]p1
>> >  2024-06-11T17:47:22.923514+  2024-05-31T02:06:55.339574+
>> >581  periodic scrub scheduled @ 2024-06-17T00:59:37.214420+
>> > 10.5   5216162   5211999  00   4194304   5941347251
>> > 9542764  1930  3000
>> active+undersized+degraded+remapped+backfill_wait
>> > 3h  79875'180455793  79875:338418517  [2,1,0]p2  [2,1]p2
>> >  2024-06-11T22:50:16.170708+  2024-05-30T23:49:54.316379+
>> >528  periodic scrub scheduled @ 2024-06-17T04:39:25.905185+
>> > 10.6   5216100   4980459  00   4521984   6428088514
>> > 9850762  1911  3000
>> active+undersized+degraded+remapped+backfill_w

[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities

2024-06-13 Thread Frédéric Nass
Hello,

'ceph osd deep-scrub 5' deep-scrubs all PGs for which osd.5 is primary (and 
only those).

You can check that from ceph-osd.5.log by running:
for pg in $(grep 'deep-scrub starts' /var/log/ceph/*/ceph-osd.5.log | awk '{print $8}') ; do \
  echo "pg: $pg, primary osd is osd.$(ceph pg $pg query -f json | jq '.info.stats.acting_primary')" ; \
done

while

'ceph osd deep-scrub all' instructs all OSDs to start deep-scrubbing all PGs 
they're primary for, so in the end, all cluster's PGs.

So if the data you overwrote on osd.5 with 'dd' was part of a PG for which 
osd.5 was not the primary OSD then it wasn't deep-scrubbed.

man ceph 8 could rather say:

   Subcommand deep-scrub initiates deep scrub on all PGs osd <id> is
primary for.

   Usage:

  ceph osd deep-scrub <id>
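
For completeness, a hedged one-liner (not from the man page) to deep-scrub every PG that has osd.5 anywhere in its acting set, assuming the JSON layout of 'ceph pg ls-by-osd' on recent releases:

for pg in $(ceph pg ls-by-osd 5 -f json | jq -r '.pg_stats[].pgid') ; do ceph pg deep-scrub $pg ; done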

Regards,
Frédéric.

- Le 10 Juin 24, à 16:51, Petr Bena petr@bena.rocks a écrit :

> Most likely it wasn't, the ceph help or documentation is not very clear about
> this:
> 
> osd deep-scrub <who>   initiate deep scrub on osd <who>, or use <all|any> to
> deep scrub all
> 
> It doesn't say anything like "initiate deep scrub of primary PGs on osd"
> 
> I assumed it just runs a scrub of everything on given OSD.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS metadata pool size

2024-06-13 Thread Lars Köppel
ok. Thank you for your help.
We will try this and report back in a few days.


[image: ariadne.ai Logo] Lars Köppel
Developer
Email: lars.koep...@ariadne.ai
Phone: +49 6221 5993580 <+4962215993580>
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai


On Thu, Jun 13, 2024 at 3:01 PM Eugen Block  wrote:

> I'm quite sure that this could result in the impact you're seeing. To
> confirm that suspicion you could stop deleting and wait a couple of
> days to see if the usage stabilizes. And if it does, maybe delete less
> files at once or so to see how far you can tweak it. That would be my
> approach.
>
> Zitat von Lars Köppel :
>
> > We have been using snapshots for a long time.
> > The only change in usage is that we are currently deleting many small
> files
> > from the system. Because this is slow (~150 requests/s) this is running
> for
> > the last few weeks. Could such a load result in a problem with the MDS?
> >
> > I have to ask for permission to order more drives. This could take some
> > time.
> >
> > [image: ariadne.ai Logo] Lars Köppel
> > Developer
> > Email: lars.koep...@ariadne.ai
> > Phone: +49 6221 5993580 <+4962215993580>
> > ariadne.ai (Germany) GmbH
> > Häusserstraße 3, 69115 Heidelberg
> > Amtsgericht Mannheim, HRB 744040
> > Geschäftsführer: Dr. Fabian Svara
> > https://ariadne.ai
> >
> >
> > On Thu, Jun 13, 2024 at 12:55 PM Eugen Block  wrote:
> >
> >> Downgrading isn't supported, I don't think that would be a good idea.
> >> I also don't see anything obvious standing out in the pg output. Any
> >> chance you can add more OSDs to the metadata pool to see if it stops
> >> at some point? Did the cluster usage change in any way? For example
> >> cephfs snapshots which haven't been used before or something like that?
> >>
> >> Zitat von Lars Köppel :
> >>
> >> > I updated from 17.2.6 to 17.2.7 and a few hours later to 18.2.2.
> >> > Would it be an option to go back to 17.2.6?
> >> >
> >> >
> >> > [image: ariadne.ai Logo] Lars Köppel
> >> > Developer
> >> > Email: lars.koep...@ariadne.ai
> >> > Phone: +49 6221 5993580 <+4962215993580>
> >> > ariadne.ai (Germany) GmbH
> >> > Häusserstraße 3, 69115 Heidelberg
> >> > Amtsgericht Mannheim, HRB 744040
> >> > Geschäftsführer: Dr. Fabian Svara
> >> > https://ariadne.ai
> >> >
> >> >
> >> > On Wed, Jun 12, 2024 at 5:30 PM Eugen Block  wrote:
> >> >
> >> >> Which version did you upgrade from to 18.2.2?
> >> >> I can’t pin it down to a specific issue, but somewhere in the back of
> >> >> my mind is something related to a new omap format or something. But
> >> >> I’m really not sure at all.
> >> >>
> >> >> Zitat von Lars Köppel :
> >> >>
> >> >> > I am happy to help you with as much information as possible. I
> >> probably
> >> >> > just don't know where to look for it.
> >> >> > Below are the requested information. The cluster is rebuilding the
> >> >> > zapped OSD at the moment. This will probably take the next few
> days.
> >> >> >
> >> >> >
> >> >> > sudo ceph pg ls-by-pool metadata
> >> >> > PG OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES OMAP_BYTES*
> >> >> >  OMAP_KEYS*  LOG   LOG_DUPS  STATE
> >> >> >  SINCE  VERSION  REPORTED UP ACTING
> >> >> > SCRUB_STAMP  DEEP_SCRUB_STAMP
> >> >> > LAST_SCRUB_DURATION  SCRUB_SCHEDULING
> >> >> > 10.0   5217325   4994695  00   4194304   5880891340
> >> >> > 9393865  1885  3000
> >> >> active+undersized+degraded+remapped+backfill_wait
> >> >> > 2h  79875'180849582  79875:391519635  [0,1,2]p0  [1,2]p1
> >> >> >  2024-06-11T09:08:09.829362+  2024-05-28T05:52:59.321589+
> >> >> >627  periodic scrub scheduled @
> 2024-06-17T08:21:31.808348+
> >> >> > 10.1   5214785   5193424  00 0   5843682713
> >> >> > 9410150  1912  3000
> >> >> active+undersized+degraded+remapped+backfill_wait
> >> >> > 3h  79875'180914288  79875:342746928  [2,1,0]p2  [2,1]p2
> >> >> >  2024-06-01T15:56:28.927288+  2024-05-27T03:31:37.682966+
> >> >> >966  queued for scrub
> >> >> > 10.2   5218432   5187168  00 0   6402011266
> >> >> > 9812513  1874  3000
> >> >> active+undersized+degraded+remapped+backfill_wait
> >> >> > 3h  79875'180970531  79875:341340204  [0,1,2]p0  [1,2]p1
> >> >> >  2024-06-11T13:40:58.994256+  2024-06-11T13:40:58.994256+
> >> >> >   1942  periodic scrub scheduled @
> 2024-06-17T06:07:15.329675+
> >> >> > 10.3   5217413   5217413  00   8388788   5766005023
> >> >> > 9271787  1923  3000
> >> >> active+undersized+degraded+remapped+backfill_wait
> >> >> > 3h  79875'181012233  79875:388295881  [1,0,2]p1  [1,2]p1
> >> >> >  2024-06-12T00:35:56.965547+  2024-05-23T19:54:56.121729+
> >> >> >492  periodic scrub scheduled @
> 2024-06-18T06:39:31.103864+
> >> >> > 10.4   5220069   5220069  0  

[ceph-users] deep scrub and scrub do not get the job done

2024-06-13 Thread Manuel Oetiker
Hi

Our cluster has been in a warning state for more than two weeks. We had to move some pools
from SSD to HDD and it looked good ... but somehow the PG scrubbing does not get its jobs done:

 * PG_NOT_DEEP_SCRUBBED : 171 pgs not deep-scrubbed in time
 * PG_NOT_SCRUBBED : 132 pgs not scrubbed in time

Until the move the cluster was happy without any warnings...

There is no heavy load on the cluster, so I don't see why the cluster does not get this done...
Is there a way to find out why?

Thanks for any hint
Manuel




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph crash :-(

2024-06-13 Thread Ranjan Ghosh

Hi all,

I just upgraded the first node of our cluster to Ubuntu 24.04 (Noble) 
from 23.10 (Mantic). Unfortunately Ceph doesn't work anymore on the new node:


===


 ceph version 19.2.0~git20240301.4c76c50 
(4c76c50a73f63ba48ccdf0adccce03b00d1d80c7) squid (dev)

 1: /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x7c7da3c45320]
 2: pthread_kill()
 3: gsignal()
 4: abort()
 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5ffe) [0x7c7da40a5ffe]
 6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbae9c) [0x7c7da40bae9c]
 7: (std::unexpected()+0) [0x7c7da40a5a49]
 8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb128) [0x7c7da40bb128]
 9: (ceph::buffer::v15_2_0::list::iterator_impl::copy(unsigned 
int, char*)+0xc4) [0x7c7da4a6e414]
 10: 
(MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl&)+0xc74) 
[0x7c7da4c98fc4]
 11: (MDSDaemon::handle_mds_map(boost::intrusive_ptr 
const&)+0x2c6) [0x59124a7cd766]
 12: (MDSDaemon::handle_core_message(boost::intrusive_ptrconst> const&)+0x2c9) [0x59124a7d0d79]
 13: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr 
const&)+0xe1) [0x59124a7d1341]
 14: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr 
const&)+0x3b8) [0x7c7da4930dd8]

 15: (DispatchQueue::entry()+0x771) [0x7c7da492ecb1]
 16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7c7da49f3f81]
 17: /lib/x86_64-linux-gnu/libc.so.6(+0x9ca94) [0x7c7da3c9ca94]
 18: /lib/x86_64-linux-gnu/libc.so.6(+0x129c3c) [0x7c7da3d29c3c]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


===

What's more, APT says I now have Ceph version 
(19.2.0~git20240301.4c76c50-0ubuntu6), which doesn't even have any 
official release notes:


https://docs.ceph.com/en/latest/releases/

How did I get here? What can I do to get back a working Ceph version? 
Until this is resolved, I obviously don't want to upgrade the other nodes.


Thank you / BR

Ranjan



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph crash :-(

2024-06-13 Thread Robert Sander

On 13.06.24 18:18, Ranjan Ghosh wrote:

What's more APT says I now got a Ceph Version 
(19.2.0~git20240301.4c76c50-0ubuntu6) which doesn't even have any 
official release notes:


Ubuntu 24.04 ships with that version from a git snapshot.

You have to ask Canonical why they did this.

I would not use Ceph packages shipped from a distribution but always the 
ones from download.ceph.com or even better the container images that 
come with the orchestrator.


What version do your other Ceph nodes run on?

Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [SPAM] Re: Ceph crash :-(

2024-06-13 Thread Sebastian
If this is one node of many it’s not a problem, because you can reinstall the 
system and Ceph and rebalance the cluster.
BTW, read the release notes first :)  I’m also not reading them for my 
personal desktop, but on servers where I keep data I do. 
But what Canonical did in this case is… this is an LTS version :/ 


BR,
Sebastian


> On 13 Jun 2024, at 19:47, David C.  wrote:
> 
> In addition to Robert's recommendations,
> 
> Remember to respect the update order (mgr->mon->(crash->)osd->mds->...)
> 
> Before everything was containerized, it was not recommended to have
> different services on the same machine.
> 
> 
> 
> Le jeu. 13 juin 2024 à 19:37, Robert Sander 
> a écrit :
> 
>> On 13.06.24 18:18, Ranjan Ghosh wrote:
>> 
>>> What's more APT says I now got a Ceph Version
>>> (19.2.0~git20240301.4c76c50-0ubuntu6) which doesn't even have any
>>> official release notes:
>> 
>> Ubuntu 24.04 ships with that version from a git snapshot.
>> 
>> You have to ask Canonical why they did this.
>> 
>> I would not use Ceph packages shipped from a distribution but always the
>> ones from download.ceph.com or even better the container images that
>> come with the orchestrator.
>> 
>> What version do your other Ceph nodes run on?
>> 
>> Regards
>> --
>> Robert Sander
>> Heinlein Support GmbH
>> Linux: Akademie - Support - Hosting
>> http://www.heinlein-support.de
>> 
>> Tel: 030-405051-43
>> Fax: 030-405051-19
>> 
>> Zwangsangaben lt. §35a GmbHG:
>> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
>> Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread sinan

I have done some further testing.

My RGW pool is placed on spinning disks.
I created a 2nd RGW data pool, placed on flash disks.

Benchmarking on HDD pool:
Client 1 -> 1 RGW Node: 150 obj/s
Client 1-5 -> 1 RGW Node: 150 ob/s (30 obj/s each client)
Client 1 -> HAProxy -> 3 RGW Nodes: 150 obj/s
Client 1-5 -> HAProxy -> 3 RGW Nodes: 150 obj/s (30 obj/s each client)

I did the same tests towards the RGW pool on flash disks: same results

So, it doesn't matter if my pool is hosted on HDD or SSD.
It doesn't matter if I am using 1 RGW or 3 RGW nodes.
It doesn't matter if I am using 1 client or 5 clients.

I am constantly limited at around 140-160 objects/s.

I see some TCP Retransmissions on the RGW Node, but maybe that's 
'normal'.


Any ideas/suggestions?

On 2024-06-11 22:08, Anthony D'Atri wrote:


I am not sure adding more RGW's will increase the performance.


That was a tangent.


To be clear, that means whatever.rgw.buckets.index ?

No, sorry my bad. .index is 32 and .data is 256.
Oh, yeah. Does `ceph osd df` show you at the far right like 4-5 PG 
replicas on each OSD?  You want (IMHO) to end up with 100-200, 
keeping each pool's pg_num to a power of 2 ideally.


No, my RBD pool is larger. My average PG per OSD is around 60-70.


Ah.  Aim for 100-200 with spinners.



Assuming all your pools span all OSDs, I suggest at a minimum 256 for 
.index and 8192 for .data, assuming you have only RGW pools.  And 
would be inclined to try 512 / 8192.  Assuming your other minor 
pools are at 32, I'd bump .log and .non-ec to 128 or 256 as well.

If you have RBD or other pools colocated, those numbers would change.
^ above assume disabling the autoscaler


I bumped my .data pool from 256 to 1024 and .index from 32 to 128.


Your index pool still only benefits from half of your OSDs with a value 
of 128.



Also doubled the .non-ec and .log pools. Performance-wise I don't see 
any improvement. If I would see 10-20% improvement, I definitely would 
increase it to 512 / 8192.
With 0.5MB object size I am still limited at about 150 up to 250 
objects/s.


The disks aren't saturated. The wr await is mostly around 1ms and does 
not get higher when benchmarking with S3.


Trust iostat about as far as you can throw it.




Other suggestions, or does anyone else has suggestions?


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread Anthony D'Atri
How large are the objects you tested with?  

> On Jun 13, 2024, at 14:46, si...@turka.nl wrote:
> 
> I have done some further testing.
> 
> My RGW pool is placed on spinning disks.
> I created a 2nd RGW data pool, placed on flash disks.
> 
> Benchmarking on HDD pool:
> Client 1 -> 1 RGW Node: 150 obj/s
> Client 1-5 -> 1 RGW Node: 150 ob/s (30 obj/s each client)
> Client 1 -> HAProxy -> 3 RGW Nodes: 150 obj/s
> Client 1-5 -> HAProxy -> 3 RGW Nodes: 150 obj/s (30 obj/s each client)
> 
> I did the same tests towards the RGW pool on flash disks: same results
> 
> So, it doesn't matter if my pool is hosted on HDD or SSD.
> It doesn't matter if I am using 1 RGW or 3 RGW nodes.
> It doesn't matter if I am using 1 client or 5 clients.
> 
> I am constantly limited at around 140-160 objects/s.
> 
> I see some TCP Retransmissions on the RGW Node, but maybe that's 'normal'.
> 
> Any ideas/suggestions?
> 
> On 2024-06-11 22:08, Anthony D'Atri wrote:
>>> I am not sure adding more RGW's will increase the performance.
>> That was a tangent.
>>> To be clear, that means whatever.rgw.buckets.index ?
> No, sorry my bad. .index is 32 and .data is 256.
 Oh, yeah. Does `ceph osd df` show you at the far right like 4-5 PG 
 replicas on each OSD?  You want (IMHO) to end up with 100-200, keeping 
 each pool's pg_num to a power of 2 ideally.
>>> No, my RBD pool is larger. My average PG per OSD is around 60-70.
>> Ah.  Aim for 100-200 with spinners.
 Assuming all your pools span all OSDs, I suggest at a minimum 256 for 
 .index and 8192 for .data, assuming you have only RGW pools.  And would be 
 included to try 512 / 8192.  Assuming your  other minor pools are at 32, 
 I'd bump .log and .non-ec to 128 or 256 as well.
 If you have RBD or other pools colocated, those numbers would change.
 ^ above assume disabling the autoscaler
>>> I bumped my .data pool from 256 to 1024 and .index from 32 to 128.
>> Your index pool still only benefits from half of your OSDs with a value of 
>> 128.
>>> Also doubled the .non-ec and .log pools. Performance-wise I don't see any 
>>> improvement. If I would see 10-20% improvement, I definitely would increase 
>>> it to 512 / 8192.
>>> With 0.5MB object size I am still limited at about 150 up to 250 objects/s.
>>> The disks aren't saturated. The wr await is mostly around 1ms and does not 
>>> get higher when benchmarking with S3.
>> Trust iostat about as far as you can throw it.
>>> Other suggestions, or does anyone else has suggestions?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph crash :-(

2024-06-13 Thread Robert Sander

Hi,

On 13.06.24 20:29, Ranjan Ghosh wrote:

Other Ceph nodes run on 18.2 which came with the previous Ubuntu version.

I wonder if I could easily switch to Ceph packages or whether that would 
cause even more problems.


Perhaps it's more advisable to wait until Ubuntu releases proper packages.


Read the Ceph documentation about upgrading a Ceph cluster.
You cannot just upgrade packages on one node and reboot it. There is a 
certain order to follow.


This is why it's bad to use the packages shipped by the distribution: 
When upgrading the distribution on one node you also upgrade the Ceph 
packages.
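
One hedged way to reduce that risk is to put the Ceph packages on hold so a routine 'apt upgrade' leaves them alone; package names vary per installation, and whether the release upgrader honours holds should be verified:

apt-mark hold ceph ceph-base ceph-common ceph-mon ceph-osd ceph-mds ceph-mgr radosgw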


download.ceph.com has packages for Ubuntu 22.04 and nothing for 24.04.
Therefore I would assume Ubuntu 24.04 is not a supported platform for 
Ceph (unless you use the cephadm orchestrator and containers).


BTW: Please keep the discussion on the mailing list.

Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [SPAM] Re: Ceph crash :-(

2024-06-13 Thread David C.
Debian unstable

The situation is absolutely not dramatic, but if it is a large production
environment, you would benefit from product support.
Based on the geographical area of your email domain, perhaps ask Robert for
local service?


Le jeu. 13 juin 2024 à 20:35, Ranjan Ghosh  a écrit :

> I'm still in doubt whether any reinstall will fix this issue, because
> the packages seem to be buggy and there are no better packages right now
> for Ubuntu 24.04 it seems.
>
> Canonical is really crazy if you ask me. Even for a non-LTS version but
> especially for an LTS version. What were they thinking? Just get a
> preliminary obviously buggy GIT version and shove it out with a release
> to unsuspecting users.
> Just imagine they did sth. like this with Apache etc.
>
> Thanks for all tips y'all. Still need to figure out what the *best*
> option right now would be to fix this. Sigh.
>
>
>
> Am 13.06.24 um 20:00 schrieb Sebastian:
> > If this is one node from many it’s not a problem because you can
> reinstall system and ceph and rebalance cluster.
> > BTW. Read release notes before :)  I’m also not reading it in case of my
> personal desktop, but on servers where I keep data I’m doing it.
> > but what canonical did in this case is… this is LTS version :/
> >
> >
> > BR,
> > Sebastian
> >
> >
> >> On 13 Jun 2024, at 19:47, David C.  wrote:
> >>
> >> In addition to Robert's recommendations,
> >>
> >> Remember to respect the update order (mgr->mon->(crash->)osd->mds->...)
> >>
> >> Before everything was containerized, it was not recommended to have
> >> different services on the same machine.
> >>
> >>
> >>
> >> Le jeu. 13 juin 2024 à 19:37, Robert Sander <
> r.san...@heinlein-support.de>
> >> a écrit :
> >>
> >>> On 13.06.24 18:18, Ranjan Ghosh wrote:
> >>>
>  What's more APT says I now got a Ceph Version
>  (19.2.0~git20240301.4c76c50-0ubuntu6) which doesn't even have any
>  official release notes:
> >>> Ubuntu 24.04 ships with that version from a git snapshot.
> >>>
> >>> You have to ask Canonical why they did this.
> >>>
> >>> I would not use Ceph packages shipped from a distribution but always
> the
> >>> ones from download.ceph.com or even better the container images that
> >>> come with the orchestrator.
> >>>
> >>> What version do your other Ceph nodes run on?
> >>>
> >>> Regards
> >>> --
> >>> Robert Sander
> >>> Heinlein Support GmbH
> >>> Linux: Akademie - Support - Hosting
> >>> http://www.heinlein-support.de
> >>>
> >>> Tel: 030-405051-43
> >>> Fax: 030-405051-19
> >>>
> >>> Zwangsangaben lt. §35a GmbHG:
> >>> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> >>> Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [SPAM] Re: Ceph crash :-(

2024-06-13 Thread Ranjan Ghosh
I'm still in doubt whether any reinstall will fix this issue, because 
the packages seem to be buggy and there are no better packages right now 
for Ubuntu 24.04, it seems.


Canonical is really crazy if you ask me. Even for a non-LTS version but 
especially for an LTS version. What were they thinking? Just get a 
preliminary obviously buggy GIT version and shove it out with a release 
to unsuspecting users.

Just imagine they did something like this with Apache etc.

Thanks for all tips y'all. Still need to figure out what the *best* 
option right now would be to fix this. Sigh.




Am 13.06.24 um 20:00 schrieb Sebastian:

If this is one node from many it’s not a problem because you can reinstall 
system and ceph and rebalance cluster.
BTW. Read release notes before :)  I’m also not reading it in case of my 
personal desktop, but on servers where I keep data I’m doing it.
but what canonical did in this case is… this is LTS version :/


BR,
Sebastian



On 13 Jun 2024, at 19:47, David C.  wrote:

In addition to Robert's recommendations,

Remember to respect the update order (mgr->mon->(crash->)osd->mds->...)

Before everything was containerized, it was not recommended to have
different services on the same machine.



Le jeu. 13 juin 2024 à 19:37, Robert Sander 
a écrit :


On 13.06.24 18:18, Ranjan Ghosh wrote:


What's more APT says I now got a Ceph Version
(19.2.0~git20240301.4c76c50-0ubuntu6) which doesn't even have any
official release notes:

Ubuntu 24.04 ships with that version from a git snapshot.

You have to ask Canonical why they did this.

I would not use Ceph packages shipped from a distribution but always the
ones from download.ceph.com or even better the container images that
come with the orchestrator.

What version do your other Ceph nodes run on?

Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph crash :-(

2024-06-13 Thread David C.
In addition to Robert's recommendations,

Remember to respect the update order (mgr->mon->(crash->)osd->mds->...)

Before everything was containerized, it was not recommended to have
different services on the same machine.



Le jeu. 13 juin 2024 à 19:37, Robert Sander 
a écrit :

> On 13.06.24 18:18, Ranjan Ghosh wrote:
>
> > What's more APT says I now got a Ceph Version
> > (19.2.0~git20240301.4c76c50-0ubuntu6) which doesn't even have any
> > official release notes:
>
> Ubuntu 24.04 ships with that version from a git snapshot.
>
> You have to ask Canonical why they did this.
>
> I would not use Ceph packages shipped from a distribution but always the
> ones from download.ceph.com or even better the container images that
> come with the orchestrator.
>
> What version do your other Ceph nodes run on?
>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Linux: Akademie - Support - Hosting
> http://www.heinlein-support.de
>
> Tel: 030-405051-43
> Fax: 030-405051-19
>
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Patching Ceph cluster

2024-06-13 Thread Sake Ceph
Yeah we fully automated this with Ansible. In short we do the following. 

1. Check if the cluster is healthy before continuing (via REST API); only HEALTH_OK is good
2. Disable scrub and deep-scrub
3. Update all applications on all the hosts in the cluster
4. For every host, one by one, do the following:
4a. Check if applications got updated
4b. Check via reboot-hint if a reboot is necessary
4c. If applications got updated or a reboot is necessary, do the following:
4c1. Put the host in maintenance
4c2. Reboot the host if necessary
4c3. Check and wait via 'ceph orch host ls' until the status of the host is maintenance and nothing else
4c4. Take the host out of maintenance
4d. Check if the cluster is healthy before continuing (via REST API); only warnings about scrub and deep-scrub are allowed, and no PGs should be degraded
5. Enable scrub and deep-scrub when all hosts are done
6. Check if the cluster is healthy (via REST API); only HEALTH_OK is good
7. Done
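
For illustration, roughly what steps 2, 4c and 5 look like as plain CLI calls outside of Ansible, assuming a cephadm-managed cluster; <host> is a placeholder:

ceph osd set noscrub && ceph osd set nodeep-scrub       # step 2
ceph orch host maintenance enter <host> --force         # step 4c1
# reboot the host if needed, then wait for it to come back
ceph orch host ls                                       # step 4c3: the host should report maintenance
ceph orch host maintenance exit <host>                  # step 4c4
ceph osd unset noscrub && ceph osd unset nodeep-scrub   # step 5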

For upgrading the OS we have something similar, but exiting maintenance mode is 
broken (with 17.2.7) :(
I need to check the tracker for similar issues and if I can't find anything, I 
will create a ticket. 

Kind regards, 
Sake 

> Op 12-06-2024 19:02 CEST schreef Daniel Brown :
> 
>  
> I have two ansible roles, one for enter, one for exit. There’s likely better 
> ways to do this — and I’ll not be surprised if someone here lets me know. 
> They’re using orch commands via the cephadm shell. I’m using Ansible for 
> other configuration management in my environment, as well, including setting 
> up clients of the ceph cluster. 
> 
> 
> Below excerpts from main.yml in the “tasks” for the enter/exit roles. The 
> host I’m running ansible from is one of my CEPH servers - I’ve limited which 
> process run there though so it’s in the cluster but not equal to the others. 
> 
> 
> —
> Enter
> —
> 
> - name: Ceph Maintenance Mode Enter
>   shell:
>     cmd: 'cephadm shell ceph orch host maintenance enter {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }} --force --yes-i-really-mean-it'
>   become: True
> 
> 
> 
> —
> Exit
> — 
> 
> 
> - name: Ceph Maintenance Mode Exit
>   shell:
>     cmd: 'cephadm shell ceph orch host maintenance exit {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
>   become: True
>   connection: local
> 
> 
> - name: Wait for Ceph to be available
>   ansible.builtin.wait_for:
>     delay: 60
>     host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
>     port: 9100
>   connection: local
> 
> 
> 
> 
> 
> 
> > On Jun 12, 2024, at 11:28 AM, Michael Worsham  
> > wrote:
> > 
> > Interesting. How do you set this "maintenance mode"? If you have a series 
> > of documented steps that you have to do and could provide as an example, 
> > that would be beneficial for my efforts.
> > 
> > We are in the process of standing up both a dev-test environment consisting 
> > of 3 Ceph servers (strictly for testing purposes) and a new production 
> > environment consisting of 20+ Ceph servers.
> > 
> > We are using Ubuntu 22.04.
> > 
> > -- Michael
> > From: Daniel Brown 
> > Sent: Wednesday, June 12, 2024 9:18 AM
> > To: Anthony D'Atri 
> > Cc: Michael Worsham ; ceph-users@ceph.io 
> > 
> > Subject: Re: [ceph-users] Patching Ceph cluster
> > 
> > 
> > There’s also a Maintenance mode that you can set for each server, as you’re 
> > doing updates, so that the cluster doesn’t try to move data from affected 
> > OSD’s, while the server being updated is offline or down. I’ve worked some 
> > on automating this with Ansible, but have found my process (and/or my 
> > cluster) still requires some manual intervention while it’s running to get 
> > things done cleanly.
> > 
> > 
> > 
> > > On Jun 12, 2024, at 8:49 AM, Anthony D'Atri  
> > > wrote:
> > >
> > > Do you mean patching the OS?
> > >
> > > If so, easy -- one node at a time, then after it comes back up, wait 
> > > until all PGs are active+clean and the mon quorum is complete before 
> > > proceeding.
> > >
> > >
> > >
> > >> On Jun 12, 2024, at 07:56, Michael Worsham  
> > >> wrote:
> > >>
> > >> What is the proper way to patch a Ceph cluster and reboot the servers in 
> > >> said cluster if a reboot is necessary for said updates? And is it 
> > >> possible to automate it via Ansible?

[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread sinan

Disabling Nagle didn't have any effect.
I created a new RGW pool (data, index), both on flash disks. No effect.
I set the size=2, no effect.

Btw, cluster is running on Octopus (15.2).

When using 3 MB objects, I am still getting 150 objects/s, just a
higher throughput (150 x 3 MB = 450 MB/s). But the objects/s doesn't
increase. It's as if a Ceph configuration setting is limiting it or something.
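
A few RGW-side settings that could cap parallelism and that I still want to check
(just a sketch; option names from memory, and some may differ or not exist on Octopus):

ceph config get client.rgw rgw_thread_pool_size
ceph config get client.rgw objecter_inflight_ops
ceph config get client.rgw objecter_inflight_op_bytes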


On 2024-06-13 21:37, Anthony D'Atri wrote:

There you go.

Tiny objects are the hardest thing for any object storage service:
you can have space amplification and metadata operations become a very
high portion of the overall workload.

With 500KB objects, you may waste a significant fraction of underlying
space -- especially if you have large-IU QLC OSDs, or OSDs made with
an older Ceph release where the min_alloc_size was 64KB vs the current
4KB.  This is exacerbated by EC if you're using it, as many do for
buckets pools.

Bluestore Space Amplification Cheat Sheet [1]

Things to do:  Disable Nagle
https://docs.ceph.com/en/quincy/radosgw/frontends/

Putting your index pool on as many SSDs as you can would also help; I
don't recall if it's on HDD now. The index doesn't use all that much
data, but it benefits from a generous pg_num and multiple OSDs so that it
isn't bottlenecked.


On Jun 13, 2024, at 15:13, Sinan Polat  wrote:

500K object size

On 13 Jun 2024 at 21:11, Anthony D'Atri wrote:

How large are the objects you tested with?

On Jun 13, 2024, at 14:46, si...@turka.nl wrote:

I have been doing some further testing.

My RGW pool is placed on spinning disks.
I created a 2nd RGW data pool, placed on flash disks.

Benchmarking on HDD pool:
Client 1 -> 1 RGW Node: 150 obj/s
Client 1-5 -> 1 RGW Node: 150 obj/s (30 obj/s each client)
Client 1 -> HAProxy -> 3 RGW Nodes: 150 obj/s
Client 1-5 -> HAProxy -> 3 RGW Nodes: 150 obj/s (30 obj/s each
client)

I did the same tests towards the RGW pool on flash disks: same
results

So, it doesn't matter if my pool is hosted on HDD or SSD.
It doesn't matter if I am using 1 RGW or 3 RGW nodes.
It doesn't matter if I am using 1 client or 5 clients.

I am constantly limited at around 140-160 objects/s.

I see some TCP Retransmissions on the RGW Node, but maybe that's
'normal'.

Any ideas/suggestions?

On 2024-06-11 22:08, Anthony D'Atri wrote:
I am not sure adding more RGW's will increase the performance.
That was a tangent.
To be clear, that means whatever.rgw.buckets.index ?
No, sorry my bad. .index is 32 and .data is 256.
Oh, yeah. Does `ceph osd df` show you at the far right like 4-5 PG
replicas on each OSD?  You want (IMHO) to end up with 100-200,
keeping each pool's pg_num to a power of 2 ideally.

 No, my RBD pool is larger. My average PG per OSD is around 60-70.
Ah.  Aim for 100-200 with spinners.


Assuming all your pools span all OSDs, I suggest at a minimum 256
for .index and 8192 for .data, assuming you have only RGW pools.
And I would be inclined to try 512 / 8192. Assuming your other
minor pools are at 32, I'd bump .log and .non-ec to 128 or 256 as
well.
If you have RBD or other pools colocated, those numbers would
change.
^ above assume disabling the autoscaler

I bumped my .data pool from 256 to 1024 and .index from 32 to 128.

 Your index pool still only benefits from half of your OSDs with a
value of 128.


Also doubled the .non-ec and .log pools. Performance-wise I don't see
any improvement. If I would see 10-20% improvement, I definitely
would increase it to 512 / 8192.
With 0.5MB object size I am still limited at about 150 up to 250
objects/s.
The disks aren't saturated. The wr await is mostly around 1ms and
does not get higher when benchmarking with S3.

 Trust iostat about as far as you can throw it.


 Other suggestions, or does anyone else have suggestions?




Links:
--
[1] 
https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit?gid=358760253#gid=358760253



[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread Anthony D'Atri
There you go.

Tiny objects are the hardest thing for any object storage service:  you can 
have space amplification and metadata operations become a very high portion of 
the overall workload.

With 500KB objects, you may waste a significant fraction of underlying space -- 
especially if you have large-IU QLC OSDs, or OSDs made with an older Ceph 
release where the min_alloc_size was 64KB vs the current 4KB.  This is 
exacerbated by EC if you're using it, as many do for buckets pools.
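
As a rough illustration (ignoring BlueStore metadata and RGW striping details, so take
the numbers as a sketch): with EC 4+2 and a 64KB min_alloc_size, a 500KB object becomes
four ~125KB data chunks, each rounded up to 128KB, so 6 x 128KB = 768KB lands on disk
versus the ~750KB the 1.5x EC overhead alone would predict -- tolerable. A 4KB object
under the same layout still allocates 64KB on each of the six shards, i.e. 384KB for
4KB of data, which is where tiny objects really bite.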

https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit?gid=358760253#gid=358760253
Bluestore Space Amplification Cheat Sheet


Things to do:  Disable Nagle  https://docs.ceph.com/en/quincy/radosgw/frontends/
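
With the Beast frontend that boils down to something like the below (the port and the
service name are placeholders; adjust the restart to however your RGWs are deployed):

ceph config set client.rgw rgw_frontends "beast port=8080 tcp_nodelay=1"
ceph orch restart rgw.myrgw                    # cephadm-managed; 'myrgw' is a placeholder
# or: systemctl restart ceph-radosgw@rgw.<id>  # package-based installs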

Putting your index pool on as many SSDs as you can would also help; I don't
recall if it's on HDD now. The index doesn't use all that much data, but it benefits
from a generous pg_num and multiple OSDs so that it isn't bottlenecked.
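
A quick way to check and adjust that (pool name assumed to be default.rgw.buckets.index;
256 is only an example -- pick a value that fits your OSD count, and mind the autoscaler):

ceph osd pool get default.rgw.buckets.index crush_rule
ceph osd pool get default.rgw.buckets.index pg_num
ceph osd pool set default.rgw.buckets.index pg_num 256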


> On Jun 13, 2024, at 15:13, Sinan Polat  wrote:
> 
> 500K object size
> 
>> On 13 Jun 2024 at 21:11, Anthony D'Atri wrote:
>> 
>> How large are the objects you tested with?  
>> 
>>> On Jun 13, 2024, at 14:46, si...@turka.nl wrote:
>>> 
>>> I have been doing some further testing.
>>> 
>>> My RGW pool is placed on spinning disks.
>>> I created a 2nd RGW data pool, placed on flash disks.
>>> 
>>> Benchmarking on HDD pool:
>>> Client 1 -> 1 RGW Node: 150 obj/s
>>> Client 1-5 -> 1 RGW Node: 150 obj/s (30 obj/s each client)
>>> Client 1 -> HAProxy -> 3 RGW Nodes: 150 obj/s
>>> Client 1-5 -> HAProxy -> 3 RGW Nodes: 150 obj/s (30 obj/s each client)
>>> 
>>> I did the same tests towards the RGW pool on flash disks: same results
>>> 
>>> So, it doesn't matter if my pool is hosted on HDD or SSD.
>>> It doesn't matter if I am using 1 RGW or 3 RGW nodes.
>>> It doesn't matter if I am using 1 client or 5 clients.
>>> 
>>> I am constantly limited at around 140-160 objects/s.
>>> 
>>> I see some TCP Retransmissions on the RGW Node, but maybe that's 'normal'.
>>> 
>>> Any ideas/suggestions?
>>> 
>>> On 2024-06-11 22:08, Anthony D'Atri wrote:
> I am not sure adding more RGW's will increase the performance.
 That was a tangent.
> To be clear, that means whatever.rgw.buckets.index ?
>>> No, sorry my bad. .index is 32 and .data is 256.
>> Oh, yeah. Does `ceph osd df` show you at the far right like 4-5 PG 
>> replicas on each OSD?  You want (IMHO) to end up with 100-200, keeping 
>> each pool's pg_num to a power of 2 ideally.
> No, my RBD pool is larger. My average PG per OSD is around 60-70.
 Ah.  Aim for 100-200 with spinners.
>> Assuming all your pools span all OSDs, I suggest at a minimum 256 for 
>> .index and 8192 for .data, assuming you have only RGW pools. And I would
>> be inclined to try 512 / 8192. Assuming your other minor pools are at
>> 32, I'd bump .log and .non-ec to 128 or 256 as well.
>> If you have RBD or other pools colocated, those numbers would change.
>> ^ above assume disabling the autoscaler
> I bumped my .data pool from 256 to 1024 and .index from 32 to 128.
 Your index pool still only benefits from half of your OSDs with a value of 
 128.
> Also doubled the .non-ec and .log pools. Performance-wise I don't see any
> improvement. If I would see 10-20% improvement, I definitely would 
> increase it to 512 / 8192.
> With 0.5MB object size I am still limited at about 150 up to 250 
> objects/s.
> The disks aren't saturated. The wr await is mostly around 1ms and does 
> not get higher when benchmarking with S3.
 Trust iostat about as far as you can throw it.
> Other suggestions, or does anyone else have suggestions?
>> 
> 
> 



[ceph-users] Can't comment on my own tracker item any more

2024-06-13 Thread Frank Schilder
Hi all,

I just received a notification about a bug I reported 4 years ago 
(https://tracker.ceph.com/issues/45253):

> Issue #45253 has been updated by Victoria Mackie.

I would like to leave a comment, but the comment function no longer seems to be
available, even though I'm logged in and I'm listed as the author.

I can still edit the item itself, but I'm not able to leave comments.

Can someone please look into that?

Thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


[ceph-users] Re: Can't comment on my own tracker item any more

2024-06-13 Thread Frank Schilder
OK, I can click on the little "quote" symbol, and then a huge dialog opens that
says "edit" but means "comment". Would it be possible to add the simple comment
action again? Also, the fact that the quote action removes nested text makes it a
little less useful than it could be; I had to copy the code example back by hand.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Thursday, June 13, 2024 11:40 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Can't comment on my own tracker item any more

Hi all,

I just received a notification about a bug I reported 4 years ago 
(https://tracker.ceph.com/issues/45253):

> Issue #45253 has been updated by Victoria Mackie.

I would like to leave a comment, but the comment function no longer seems to be
available, even though I'm logged in and I'm listed as the author.

I can still edit the item itself, but I'm not able to leave comments.

Can someone please look into that?

Thanks!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


[ceph-users] Re: Performance issues RGW (S3)

2024-06-13 Thread Sinan Polat
500K object size

> On 13 Jun 2024 at 21:11, Anthony D'Atri wrote:
> 
> How large are the objects you tested with?  
> 
>> On Jun 13, 2024, at 14:46, si...@turka.nl wrote:
>> 
>> I have been doing some further testing.
>> 
>> My RGW pool is placed on spinning disks.
>> I created a 2nd RGW data pool, placed on flash disks.
>> 
>> Benchmarking on HDD pool:
>> Client 1 -> 1 RGW Node: 150 obj/s
>> Client 1-5 -> 1 RGW Node: 150 obj/s (30 obj/s each client)
>> Client 1 -> HAProxy -> 3 RGW Nodes: 150 obj/s
>> Client 1-5 -> HAProxy -> 3 RGW Nodes: 150 obj/s (30 obj/s each client)
>> 
>> I did the same tests towards the RGW pool on flash disks: same results
>> 
>> So, it doesn't matter if my pool is hosted on HDD or SSD.
>> It doesn't matter if I am using 1 RGW or 3 RGW nodes.
>> It doesn't matter if I am using 1 client or 5 clients.
>> 
>> I am constantly limited at around 140-160 objects/s.
>> 
>> I see some TCP Retransmissions on the RGW Node, but maybe that's 'normal'.
>> 
>> Any ideas/suggestions?
>> 
>> On 2024-06-11 22:08, Anthony D'Atri wrote:
 I am not sure adding more RGW's will increase the performance.
>>> That was a tangent.
 To be clear, that means whatever.rgw.buckets.index ?
>> No, sorry my bad. .index is 32 and .data is 256.
> Oh, yeah. Does `ceph osd df` show you at the far right like 4-5 PG 
> replicas on each OSD?  You want (IMHO) to end up with 100-200, keeping 
> each pool's pg_num to a power of 2 ideally.
 No, my RBD pool is larger. My average PG per OSD is around 60-70.
>>> Ah.  Aim for 100-200 with spinners.
> Assuming all your pools span all OSDs, I suggest at a minimum 256 for 
> .index and 8192 for .data, assuming you have only RGW pools. And I would
> be inclined to try 512 / 8192. Assuming your other minor pools are at
> 32, I'd bump .log and .non-ec to 128 or 256 as well.
> If you have RBD or other pools colocated, those numbers would change.
> ^ above assume disabling the autoscaler
 I bumped my .data pool from 256 to 1024 and .index from 32 to 128.
>>> Your index pool still only benefits from half of your OSDs with a value of 
>>> 128.
 Also doubled the .non-ec and .log pools. Performance-wise I don't see any
 improvement. If I would see 10-20% improvement, I definitely would 
 increase it to 512 / 8192.
 With 0.5MB object size I am still limited at about 150 up to 250 objects/s.
 The disks aren't saturated. The wr await is mostly around 1ms and does not 
 get higher when benchmarking with S3.
>>> Trust iostat about as far as you can throw it.
 Other suggestions, or does anyone else have suggestions?
> 



[ceph-users] Re: deep scrubb and scrubb does get the job done

2024-06-13 Thread Frank Schilder
Yes, there is: 
https://github.com/frans42/ceph-goodies/blob/main/doc/TuningScrub.md

This is work in progress and a few details are missing, but it should help you 
find the right parameters. Note that this is tested on octopus with WPQ.
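
To see what is actually overdue and where the knobs currently sit, something along
these lines works (the set value is only an example; see the document above for the
actual reasoning before changing anything):

ceph health detail | grep -E 'not (deep-)?scrubbed'    # which PGs are overdue
ceph config get osd osd_max_scrubs
ceph config get osd osd_scrub_load_threshold
ceph config get osd osd_deep_scrub_interval
ceph config set osd osd_max_scrubs 2                   # example only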

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Manuel Oetiker 
Sent: Thursday, June 13, 2024 4:37 PM
To: ceph-users@ceph.io
Subject: [ceph-users] deep scrubb and scrubb does get the job done

Hi

our cluster has been in a warning state for more than two weeks. We had to move some
pools from SSD to HDD, and it looked good ... but somehow the PG scrubs do not get
their jobs done:

* PG_NOT_DEEP_SCRUBBED : 171 pgs not deep-scrubbed in time
* PG_NOT_SCRUBBED : 132 pgs not scrubbed in time

Until the move, the cluster was happy without any warnings...

There is no heavy load on the cluster, and I don't see why it doesn't get this
done...
Is there a way to find out why?

Thanks for any hint
Manuel




[ceph-users] Are ceph commands backward compatible?

2024-06-13 Thread Satoru Takeuchi
Hi,

I'm developing some tools that execute ceph commands like rbd. During
development,
I have come to wonder about compatibility of ceph commands.

I'd like to use ceph commands whose version is >= the version used by
the ceph daemons.
This results in executing newer ceph commands against clusters running
older versions.
So I'd like to know whether my tools are expected to work fine. Could someone tell me
whether ceph commands are backward compatible?

The official documentation didn't have the information I needed.
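
For context, this is how I currently compare what my tools run against what the
cluster runs (nothing special, just the obvious commands):

ceph --version     # version of the local CLI / librados build
rbd --version      # version of the local rbd tool
ceph versions      # versions of the running mons, mgrs, OSDs, ...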

Thanks,
Satoru


[ceph-users] Re: Patching Ceph cluster

2024-06-13 Thread Michael Worsham
I'd love to see what your playbook(s) looks like for doing this.

-- Michael

From: Sake Ceph 
Sent: Thursday, June 13, 2024 4:05 PM
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: Patching Ceph cluster



Yeah, we fully automated this with Ansible. In short, we do the following.

1. Check if the cluster is healthy before continuing (via REST API); only HEALTH_OK
is good
2. Disable scrub and deep-scrub
3. Update all applications on all the hosts in the cluster
4. For every host, one by one, do the following:
4a. Check if applications got updated
4b. Check via the reboot-hint if a reboot is necessary
4c. If applications got updated or a reboot is necessary, do the following:
4c1. Put the host in maintenance
4c2. Reboot the host if necessary
4c3. Check and wait via 'ceph orch host ls' until the status of the host is
'maintenance' and nothing else
4c4. Take the host out of maintenance
4d. Check if the cluster is healthy before continuing (via REST API); only the
warning about scrub and deep-scrub is allowed, and no PGs should be degraded
5. Enable scrub and deep-scrub when all hosts are done
6. Check if the cluster is healthy (via REST API); only HEALTH_OK is good
7. Done

For upgrading the OS we have something similar, but exiting maintenance mode is
broken (with 17.2.7) :(
I need to check the tracker for similar issues, and if I can't find anything, I
will create a ticket.

Kind regards,
Sake

> On 12-06-2024 19:02 CEST, Daniel Brown wrote:
>
>
> I have two ansible roles, one for enter, one for exit. There are likely better
> ways to do this - and I’ll not be surprised if someone here lets me know.
> They’re using orch commands via the cephadm shell. I’m using Ansible for
> other configuration management in my environment as well, including setting
> up clients of the ceph cluster.
>
>
> Below are excerpts from main.yml in the “tasks” for the enter/exit roles. The
> host I’m running Ansible from is one of my Ceph servers - I’ve limited which
> processes run there, though, so it’s in the cluster but not equal to the others.
>
>
> —
> Enter
> —
>
> - name: Ceph Maintenance Mode Enter
>   shell:
>     cmd: 'cephadm shell ceph orch host maintenance enter {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }} --force --yes-i-really-mean-it'
>   become: True
>
>
>
> —
> Exit
> —
>
>
> - name: Ceph Maintenance Mode Exit
>   shell:
>     cmd: 'cephadm shell ceph orch host maintenance exit {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
>   become: True
>   connection: local
>
>
> - name: Wait for Ceph to be available
>   ansible.builtin.wait_for:
>     delay: 60
>     host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
>     port: 9100
>   connection: local
>
>
>
>
>
>
> > On Jun 12, 2024, at 11:28 AM, Michael Worsham  
> > wrote:
> >
> > Interesting. How do you set this "maintenance mode"? If you have a series 
> > of documented steps that you have to do and could provide as an example, 
> > that would be beneficial for my efforts.
> >
> > We are in the process of standing up both a dev-test environment consisting 
> > of 3 Ceph servers (strictly for testing purposes) and a new production 
> > environment consisting of 20+ Ceph servers.
> >
> > We are using Ubuntu 22.04.
> >
> > -- Michael
> > From: Daniel Brown 
> > Sent: Wednesday, June 12, 2024 9:18 AM
> > To: Anthony D'Atri 
> > Cc: Michael Worsham ; ceph-users@ceph.io 
> > 
> > Subject: Re: [ceph-users] Patching Ceph cluster
> >
> >
> > There’s also a Maintenance mode that you can set for each server, as you’re 
> > doing updates, so that the cluster doesn’t try to move data from affected 
> > OSD’s, while the server being updated is offline or down. I’ve worked some 
> > on automating this with Ansible, but have found my process (and/or my 
> > cluster) still requires some manual intervention while it’s running to get 
> > things done cleanly.
> >
> >
> >
> > > On Jun 12, 2024, at 8:49 AM, Anthony D'Atri  
> > > wrote:
> > >
> > > Do you mean patching the OS?
> > >
> > > If so, easy -- one node at a time, then after it comes back up, wait 
> > > until all PGs are active+clean and the mon quorum is complete before 
> > > proceeding.
> > >
> > >
> > >
> > >> On Jun 12, 2024, at 07:56, Michael Worsham  
> > >> wrote:
> > >>
> > >> What is the proper way to patch a Ceph cluster and reboot the servers in 
> > >> said cluster if a reboot is necessary for said updates? And is it 
> > >> possible to automate it via Ansible?

[ceph-users] Separated multisite sync and user traffic, doable?

2024-06-13 Thread Szabo, Istvan (Agoda)
Hi,

Could it cause any issue if the endpoints defined in the zonegroups are not
in the endpoint list behind HAProxy?
The question is mainly about the role of the endpoint servers in the zonegroup
list. Is their role sync only, or something else as well?

This would be the scenario; could it work?

  *   I have 3 mon/mgr servers and 15 OSD nodes
  *   RGWs on the mon/mgr nodes would be in the zonegroup definition like this:

  "zones": [
{
  "id": "61c9sdf40-fdsd-4sdd-9rty9-ed56jda41817",
  "name": "dc",
  "endpoints": [
"http://mon1:8080";,
"http://mon2:8080";,
"http://mon3:8080";
  ],


  *   However, for user traffic I'd use an HAProxy endpoint with the 15 OSD-node
RGWs (one RGW per OSD node).
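
For reference, this is roughly how I would point the zonegroup/zone endpoints at the
mon/mgr RGWs (the zonegroup name 'default' is a placeholder for my actual one):

radosgw-admin zonegroup modify --rgw-zonegroup=default \
    --endpoints=http://mon1:8080,http://mon2:8080,http://mon3:8080
radosgw-admin zone modify --rgw-zone=dc \
    --endpoints=http://mon1:8080,http://mon2:8080,http://mon3:8080
radosgw-admin period update --commit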

Ty

