[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-28 Thread Sven Kieske
On Do, 2022-07-28 at 07:50 +1000, Brad Hubbard wrote:
> The primary cause of the issues with ceph-ansible (ca) is that octopus was pinned
> to its stable_6.0 branch; according to
> https://docs.ceph.com/projects/ceph-ansible/en/latest/#releases, octopus should be using stable_5.0.
> 
> I don't believe this should hold up the release.

Could you maybe shed some more light on what the exact issue is here?

Is there a tracker for that?

I'm currently testing octopus deployment and octopus upgrade
via ceph-ansible, and I'm hitting some bugs, but I haven't found any documented
upstream issues so far.

Are you saying I need to use the stable-6.0 branch to deploy octopus?
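
For reference, a minimal sketch of pinning a ceph-ansible checkout to one of the
stable branches listed in the releases table linked above (branch choice follows
Brad's note; local paths are placeholders):

    git clone -b stable-5.0 --single-branch https://github.com/ceph/ceph-ansible.git
    cd ceph-ansible
    git branch --show-current    # confirm the checkout before running the playbooks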

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske
Systementwickler / systems engineer
 
 
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 4-6
32339 Espelkamp
 
Tel.: 05772 / 293-900
Fax: 05772 / 293-333
 
https://www.mittwald.de
 
Geschäftsführer: Robert Meyer, Florian Jürgens
 
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Information on data processing in the course of our business activities
pursuant to Art. 13-14 GDPR is available at www.mittwald.de/ds.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG does not become active

2022-07-28 Thread Jesper Lykkegaard Karlsen
Hi Frank, 

I think you need at least 6 OSD hosts to make EC 4+2 with failure domain host.

I do not know how it was possible for you to create that configuration in the
first place.
Could it be that you have multiple names for the OSD hosts?
That would at least explain one OSD down being shown as two OSDs down.

Also, I believe that min_size should never be smaller than the number of data
shards (k), which is 4 in this case.

You can either make a new test setup with your three test OSD hosts using EC
2+1, or make e.g. 4+2 but with the failure domain set to OSD.
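
For reference, a minimal sketch of that second option (profile and pool names
here are placeholders, not taken from your setup):

    ceph osd erasure-code-profile set ec42-osd k=4 m=2 crush-failure-domain=osd
    ceph osd pool create test-ec 32 32 erasure ec42-osd
    ceph osd pool get test-ec min_size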

Best, 
Jesper
  
--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 27 Jul 2022, at 17.32, Frank Schilder  wrote:
> 
> Update: the inactive PG got recovered and active after a long wait. The 
> middle question is now answered. However, these two questions are still of 
> great worry:
> 
> - How can 2 OSDs be missing if only 1 OSD is down?
> - If the PG should recover, why is it not prioritised considering its severe 
> degradation
>  compared with all other PGs?
> 
> I don't understand how a PG can lose 2 shards if 1 OSD goes down. That looks 
> really, really bad to me (did ceph lose track of data??).
> 
> The second is of no less importance. The inactive PG was holding back client 
> IO, leading to further warnings about slow OPS/requests/... Why are such 
> critically degraded PGs not scheduled for recovery first? There is a service 
> outage but only a health warning?
> 
> Thanks and best regards.
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: Frank Schilder 
> Sent: 27 July 2022 17:19:05
> To: ceph-users@ceph.io
> Subject: [ceph-users] PG does not become active
> 
> I'm testing octopus 15.2.16 and ran into a problem right away. I'm filling up 
> a small test cluster with 3 hosts (3x3 OSDs) and killed one OSD to see how 
> recovery works. I have one 4+2 EC pool with failure domain host, and on 1 PG 
> of this pool 2 (!!!) shards are missing. This most degraded PG is not 
> becoming active; it's stuck inactive but peered.
> 
> Questions:
> 
> - How can 2 OSDs be missing if only 1 OSD is down?
> - Wasn't there an important code change to allow recovery for an EC PG with at
>  least k shards present even if min_size>k? Do I have to set something?
> - If the PG should recover, why is it not prioritised considering its severe 
> degradation
>  compared with all other PGs?
> 
> I have already increased these crush tunables and executed a pg repeer to no 
> avail:
> 
> tunable choose_total_tries 250 <-- default 100
> rule fs-data {
>id 1
>type erasure
>min_size 3
>max_size 6
>step set_chooseleaf_tries 50 <-- default 5
>step set_choose_tries 200 <-- default 100
>step take default
>step choose indep 0 type osd
>step emit
> }
> 
> Ceph health detail says to that:
> 
> [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
>pg 4.32 is stuck inactive for 37m, current state 
> recovery_wait+undersized+degraded+remapped+peered, last acting 
> [1,2147483647,2147483647,4,5,2]
> 
> I don't want to cheat and set min_size=k on this pool. It should work by 
> itself.
> 
> Thanks for any pointers!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG does not become active

2022-07-28 Thread Jesper Lykkegaard Karlsen
Ah I see, I should have looked at the “raw” data instead ;-)

Then I agree, this is very weird.

Best, 
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 28 Jul 2022, at 12.45, Frank Schilder  wrote:
> 
> Hi Jesper,
> 
> thanks for looking at this. The failure domain is OSD and not host. I typed 
> it wrong in the text, the copy of the crush rule shows it right: step choose 
> indep 0 type osd.
> 
> I'm trying to reproduce the observation to file a tracker item, but it is 
> more difficult than expected. It might be a race condition, so far I didn't 
> see it again. I hope I can figure out when and why this is happening.
> 
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: Jesper Lykkegaard Karlsen 
> Sent: 28 July 2022 12:02:51
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] PG does not become active
> 
> Hi Frank,
> 
> I think you need at least 6 OSD hosts to make EC 4+2 with failure domain 
> host.
> 
> I do not know how it was possible for you to create that configuration in the 
> first place.
> Could it be that you have multiple names for the OSD hosts?
> That would at least explain one OSD down being shown as two OSDs down.
> 
> Also, I believe that min_size should never be smaller than the number of data 
> shards (k), which is 4 in this case.
> 
> You can either make a new test setup with your three test OSD hosts using EC 
> 2+1 or make e.g. 4+2, but with failure domain set to OSD.
> 
> Best,
> Jesper
> 
> --
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Universitetsbyen 81
> 8000 Aarhus C
> 
> E-mail: je...@mbg.au.dk
> Tlf:+45 50906203
> 
>> On 27 Jul 2022, at 17.32, Frank Schilder  wrote:
>> 
>> Update: the inactive PG got recovered and active after a long wait. The 
>> middle question is now answered. However, these two questions are still of 
>> great worry:
>> 
>> - How can 2 OSDs be missing if only 1 OSD is down?
>> - If the PG should recover, why is it not prioritised considering its severe 
>> degradation
>> compared with all other PGs?
>> 
>> I don't understand how a PG can lose 2 shards if 1 OSD goes down. That 
>> looks really, really bad to me (did ceph lose track of data??).
>> 
>> The second is of no less importance. The inactive PG was holding back client 
>> IO, leading to further warnings about slow OPS/requests/... Why are such 
>> critically degraded PGs not scheduled for recovery first? There is a service 
>> outage but only a health warning?
>> 
>> Thanks and best regards.
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> 
>> 
>> From: Frank Schilder 
>> Sent: 27 July 2022 17:19:05
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] PG does not become active
>> 
>> I'm testing octopus 15.2.16 and ran into a problem right away. I'm filling 
>> up a small test cluster with 3 hosts (3x3 OSDs) and killed one OSD to see how 
>> recovery works. I have one 4+2 EC pool with failure domain host, and on 1 PG 
>> of this pool 2 (!!!) shards are missing. This most degraded PG is not 
>> becoming active; it's stuck inactive but peered.
>> 
>> Questions:
>> 
>> - How can 2 OSDs be missing if only 1 OSD is down?
>> - Wasn't there an important code change to allow recovery for an EC PG with 
>> at
>> least k shards present even if min_size>k? Do I have to set something?
>> - If the PG should recover, why is it not prioritised considering its severe 
>> degradation
>> compared with all other PGs?
>> 
>> I have already increased these crush tunables and executed a pg repeer to no 
>> avail:
>> 
>> tunable choose_total_tries 250 <-- default 100
>> rule fs-data {
>>   id 1
>>   type erasure
>>   min_size 3
>>   max_size 6
>>   step set_chooseleaf_tries 50 <-- default 5
>>   step set_choose_tries 200 <-- default 100
>>   step take default
>>   step choose indep 0 type osd
>>   step emit
>> }
>> 
>> Ceph health detail says to that:
>> 
>> [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
>>   pg 4.32 is stuck inactive for 37m, current state 
>> recovery_wait+undersized+degraded+remapped+peered, last acting 
>> [1,2147483647,2147483647,4,5,2]
>> 
>> I don't want to cheat and set min_size=k on this pool. It should work by 
>> itself.
>> 
>> Thanks for any pointers!
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Cluster running without monitors

2022-07-28 Thread Johannes Liebl
Hi Ceph Users,


I am currently evaluating different cluster layouts, and as a test I stopped two 
of my three monitors while client traffic was running on the nodes.


Only when I restarted an OSD did all PGs which were related to that OSD go down, 
but the rest were still active and serving requests.


A second try ran for 5:30 hours without a hitch, after which I aborted the test 
since nothing was happening.


Now I want to know: is this behavior by design?

It strikes me as odd that this more or less undefined state is still 
operational.


Thanks


Johannes Liebl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph pool size and OSD data distribution

2022-07-28 Thread Roland Giesler
I have a 7 node cluster which is complaining that:
root@s1:~# ceph -s

  cluster:
id: a6092407-216f-41ff-bccb-9bed78587ac3
health: HEALTH_WARN
1 nearfull osd(s)
4 pool(s) nearfull

  services:
mon: 3 daemons, quorum sm1,2,s5
mgr: s1(active), standbys: s5, sm1
mds: cephfs-1/1/1 up  {0=s1=up:active}, 2 up:standby
osd: 23 osds: 23 up, 23 in

  data:
pools:   4 pools, 1312 pgs
objects: 1.26M objects, 4.64TiB
usage:   11.4TiB used, 8.48TiB / 19.8TiB avail
pgs: 1312 active+clean

  io:
client:   542KiB/s wr, 0op/s rd, 32op/s wr


I see that the distribution of data over the OSDs is very uneven.
Particularly on host s1 there are 6 identical 300GB SAS drives, yet one is
more than 89% in use, while another is at just over 40% in use. What causes
this?

root@s1:~# ceph osd df tree

ID  CLASS WEIGHT   REWEIGHT SIZE    USE     DATA    OMAP    META    AVAIL   %USE  VAR  PGS TYPE NAME
 -1       19.82628        - 19.8TiB 11.4TiB 11.3TiB 2.18GiB 32.9GiB 8.48TiB 57.25 1.00   - root default
 -2        6.36676        - 6.37TiB 3.04TiB 3.03TiB  653MiB 11.3GiB 3.33TiB 47.76 0.83   -     host hp1
  3   hdd  0.90959  1.0     931GiB  422GiB  420GiB  84.6MiB 2.21GiB  509GiB 45.35 0.79 143         osd.3
  4   hdd  0.68210  1.0     699GiB  265GiB  264GiB  66.7MiB  957MiB  433GiB 37.95 0.66  94         osd.4
  6   hdd  0.68210  1.0     699GiB  308GiB  307GiB  64.7MiB  988MiB  390GiB 44.15 0.77  99         osd.6
  7   hdd  0.68210  1.0     699GiB  346GiB  345GiB  74.4MiB  988MiB  353GiB 49.51 0.86 109         osd.7
 16   hdd  0.90959  1.0     931GiB  461GiB  460GiB   103MiB 1.13GiB  470GiB 49.51 0.86 145         osd.16
 19   hdd  0.90959  1.0     931GiB  516GiB  514GiB  96.2MiB 2.06GiB  415GiB 55.40 0.97 140         osd.19
 22   hdd  0.68210  1.0     699GiB  290GiB  288GiB  68.9MiB 1.91GiB  408GiB 41.55 0.73  98         osd.22
 24   hdd  0.90959  1.0     931GiB  505GiB  504GiB  94.8MiB 1.17GiB  426GiB 54.21 0.95 150         osd.24
 -3        1.63440        - 1.63TiB 1.07TiB 1.06TiB  236MiB 5.77GiB  582GiB 65.22 1.14   -     host s1
 10   hdd  0.27240  1.0     279GiB  152GiB  151GiB  19.9MiB 1004MiB  127GiB 54.35 0.95  44         osd.10
 11   hdd  0.27240  1.0     279GiB  114GiB  113GiB  43.3MiB  981MiB  165GiB 40.91 0.71  63         osd.11
 12   hdd  0.27240  1.0     279GiB  180GiB  179GiB  41.4MiB  983MiB 98.6GiB 64.66 1.13  58         osd.12
 13   hdd  0.27240  1.0     279GiB  190GiB  189GiB  33.8MiB  990MiB 89.4GiB 67.96 1.19  52         osd.13
 14   hdd  0.27240  1.0     279GiB  249GiB  248GiB  48.6MiB  975MiB 30.0GiB 89.26 1.56  67         osd.14
 15   hdd  0.27240  1.0     279GiB  207GiB  206GiB  49.2MiB  975MiB 72.0GiB 74.17 1.30  60         osd.15
 -4        2.72888        - 2.73TiB 1.71TiB 1.70TiB  279MiB 4.47GiB 1.02TiB 62.64 1.09   -     host s2
  9   hdd  1.81929  1.0     1.82TiB 1.15TiB 1.15TiB  196MiB 2.35GiB  685GiB 63.21 1.10 390         osd.9
 17   hdd  0.90959  1.0     931GiB  573GiB  571GiB  83.3MiB 2.12GiB  359GiB 61.50 1.07 181         osd.17
 -6        1.81929        - 1.82TiB 1.24TiB 1.24TiB  203MiB 2.34GiB  594GiB 68.12 1.19   -     host s4
 18   hdd  1.81929  1.0     1.82TiB 1.24TiB 1.24TiB  203MiB 2.34GiB  594GiB 68.12 1.19 407         osd.18
 -7        2.72888        - 2.73TiB 1.73TiB 1.72TiB  341MiB 3.48GiB 1.00TiB 63.25 1.10   -     host s5
  2   hdd  1.81929  1.0     1.82TiB 1.09TiB 1.09TiB  203MiB 2.06GiB  747GiB 59.89 1.05 368         osd.2
 20   hdd  0.90959  1.0     931GiB  652GiB  650GiB   138MiB 1.42GiB  280GiB 69.96 1.22 215         osd.20
-15        2.72888        - 2.73TiB 1.41TiB 1.41TiB  307MiB 2.98GiB 1.32TiB 51.76 0.90   -     host s6
  0   hdd  1.81929  1.0     1.82TiB  923GiB  921GiB  182MiB 1.81GiB  940GiB 49.56 0.87 358         osd.0
  1   hdd  0.90959  1.0     931GiB  523GiB  522GiB   125MiB 1.18GiB  408GiB 56.18 0.98 187         osd.1
 -5        1.81918        - 1.82TiB 1.16TiB 1.15TiB  211MiB 2.56GiB  679GiB 63.56 1.11   -     host sm1
  5   hdd  0.90959  1.0     931GiB  558GiB  557GiB   116MiB 1.23GiB  373GiB 59.94 1.05 182         osd.5
  8   hdd  0.90959  1.0     931GiB  626GiB  624GiB  95.5MiB 1.33GiB  306GiB 67.18 1.17 198         osd.8
                     TOTAL  19.8TiB 11.4TiB 11.3TiB 2.18GiB 32.9GiB 8.48TiB 57.25
MIN/MAX VAR: 0.66/1.56  STDDEV: 12.03


How does this work?
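
For what it's worth, the usual knob for evening out PG placement is the mgr
balancer in upmap mode — a minimal sketch, assuming all clients are recent
enough to allow upmap (require-min-compat-client luminous or newer):

    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status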

thanks

Roland
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm automatic sizing of WAL/DB on SSD

2022-07-28 Thread Calhoun, Patrick
Hi,

I'd like to understand if the following behaviour is a bug.
I'm running ceph 16.2.9.

In a new OSD node with 24 hdd (16 TB each) and 2 ssd (1.44 TB each), I'd like 
to have "ceph orch" allocate WAL and DB on the ssd devices.

I use the following service spec:
spec:
  data_devices:
rotational: 1
size: '14T:'
  db_devices:
rotational: 0
size: '1T:'
  db_slots: 12

This results in each OSD having a 60GB volume for WAL/DB, which equates to 50% 
total usage in the VG on each ssd, and 50% free.
I honestly don't know what size to expect, but exactly 50% of capacity makes me 
suspect this is due to a bug:
https://tracker.ceph.com/issues/54541
(In fact, I had run into this bug when specifying block_db_size rather than 
db_slots)

Questions:
  Am I being bitten by that bug?
  Is there a better approach, in general, to my situation?
  Are DB sizes still governed by the rocksdb tiering? (I thought that this was 
mostly resolved by https://github.com/ceph/ceph/pull/29687 )
  If I provision a DB/WAL logical volume size to 61GB, is that effectively a 
30GB database, and 30GB of extra room for compaction?
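
For comparison, a hedged sketch of the same spec using an explicit per-OSD DB
size instead of db_slots (the 120G value is purely illustrative; block_db_size
is the field I had used when I ran into the linked bug):

    spec:
      data_devices:
        rotational: 1
        size: '14T:'
      db_devices:
        rotational: 0
        size: '1T:'
      block_db_size: '120G'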

Thanks,
Patrick
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster running without monitors

2022-07-28 Thread Gregory Farnum
On Thu, Jul 28, 2022 at 5:32 AM Johannes Liebl  wrote:
>
> Hi Ceph Users,
>
>
> I am currently evaluating different cluster layouts, and as a test I stopped 
> two of my three monitors while client traffic was running on the nodes.
>
>
> Only when I restarted an OSD did all PGs which were related to that OSD go 
> down, but the rest were still active and serving requests.
>
>
> A second try ran for 5:30 hours without a hitch, after which I aborted the 
> test since nothing was happening.
>
>
> Now I want to know: is this behavior by design?
>
> It strikes me as odd that this more or less undefined state is still 
> operational.

Yep, it's on purpose! I would not count on this behavior because a lot
of routine operations can disturb it[1], but Ceph does its best to
continue operating as it can by not relying on the other daemons
whenever possible.

Monitors are required for updates to the cluster maps, but as long as
the cluster is stable and no new maps need to be generated, things
will keep operating until something requires an update and that gets
blocked. As you saw, when an OSD got restarted, that changed the
cluster state and required updates which couldn't get processed, so
the affected PGs couldn't go active.
-Greg
[1]: RBD snapshots go through the monitors; MDSes send beacons to the
monitors and will shut down if those don't get acknowledged so I don't
think CephFS will keep running in this case; CephX does key rotations
which will eventually block access to the OSDs as keys time out; any
kind of PG peering or recovery needs the monitors to update values;
etc.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Stretch Cluster - df pool size (Max Avail)

2022-07-28 Thread Nicolas FONTAINE

Hello,

We have exactly the same problem. Did you find an answer or should we 
open a bug report?


Sincerely,

Nicolas.

On 23/06/2022 at 11:42, Kilian Ries wrote:

Hi Joachim,


Yes, I assigned the stretch rule to the pool (4x replica / 2x min). The rule 
says that two replicas should be in every datacenter.


$ ceph osd tree
ID   CLASS  WEIGHTTYPE NAME   STATUS  REWEIGHT  PRI-AFF
  -1 62.87799  root default
-17 31.43900  datacenter site1
-15 31.43900  rack b7
  -3 10.48000  host host01
   0ssd   1.74699  osd.0   up   1.0  1.0
   1ssd   1.74699  osd.1   up   1.0  1.0
   2ssd   1.74699  osd.2   up   1.0  1.0
   3ssd   1.74699  osd.3   up   1.0  1.0
   4ssd   1.74699  osd.4   up   1.0  1.0
   5ssd   1.74699  osd.5   up   1.0  1.0
  -5 10.48000  host host02
   6ssd   1.74699  osd.6   up   1.0  1.0
   7ssd   1.74699  osd.7   up   1.0  1.0
   8ssd   1.74699  osd.8   up   1.0  1.0
   9ssd   1.74699  osd.9   up   1.0  1.0
  10ssd   1.74699  osd.10  up   1.0  1.0
  11ssd   1.74699  osd.11  up   1.0  1.0
  -7 10.48000  host host03
  12ssd   1.74699  osd.12  up   1.0  1.0
  13ssd   1.74699  osd.13  up   1.0  1.0
  14ssd   1.74699  osd.14  up   1.0  1.0
  15ssd   1.74699  osd.15  up   1.0  1.0
  16ssd   1.74699  osd.16  up   1.0  1.0
  17ssd   1.74699  osd.17  up   1.0  1.0
-18 31.43900  datacenter site2
-16 31.43900  rack h2
  -9 10.48000  host host04
  18ssd   1.74699  osd.18  up   1.0  1.0
  19ssd   1.74699  osd.19  up   1.0  1.0
  20ssd   1.74699  osd.20  up   1.0  1.0
  21ssd   1.74699  osd.21  up   1.0  1.0
  22ssd   1.74699  osd.22  up   1.0  1.0
  23ssd   1.74699  osd.23  up   1.0  1.0
-11 10.48000  host host05
  24ssd   1.74699  osd.24  up   1.0  1.0
  25ssd   1.74699  osd.25  up   1.0  1.0
  26ssd   1.74699  osd.26  up   1.0  1.0
  27ssd   1.74699  osd.27  up   1.0  1.0
  28ssd   1.74699  osd.28  up   1.0  1.0
  29ssd   1.74699  osd.29  up   1.0  1.0
-13 10.48000  host host06
  30ssd   1.74699  osd.30  up   1.0  1.0
  31ssd   1.74699  osd.31  up   1.0  1.0
  32ssd   1.74699  osd.32  up   1.0  1.0
  33ssd   1.74699  osd.33  up   1.0  1.0
  34ssd   1.74699  osd.34  up   1.0  1.0
  35ssd   1.74699  osd.35  up   1.0  1.0


So regarding my calculation it should be


(6x Nodes * 6x SSD * 1,8TB) / 4 = 16 TB


Is this maybe a bug in stretch mode, that I only get half the available size 
displayed?


Regards,

Kilian



From: Clyso GmbH - Ceph Foundation Member 
Sent: Wednesday, 22 June 2022 18:20:59
To: Kilian Ries; ceph-users(a)ceph.io
Subject: Re: [ceph-users] Ceph Stretch Cluster - df pool size (Max Avail)

Hi Kilian,

We do not currently use this mode of Ceph clustering, but normally you
need to assign the crush rule to the pool as well, otherwise it will
take rule 0 as the default.

the following calculation for rule 0 would also work approximately:

(3 Nodes *6 x SSD *1,8TB)/4 = 8,1 TB

hope it helps, Joachim


___
Clyso GmbH - Ceph Foundation Member

Am 22.06.22 um 18:09 schrieb Kilian Ries:

Hi,


i'm running a ceph stretch cluster with two datacenters. Each of the 
datacenters has 3x OSD nodes (in total 6x) and 2x monitors. A third monitor is 
deployed as arbiter node in a third datacenter.


Each OSD node has 6x SSDs with 1,8 TB storage - that gives me a total of about 
63 TB storage (6x nodes * 6x SSD * 1,8TB = 63TB).


In stretch mode my pool is configured with replication 4x - and as far as I understand 
this should give me a max pool storage size of ~15TB (63TB

[ceph-users] Re: cannot set quota on ceph fs root

2022-07-28 Thread Gregory Farnum
On Thu, Jul 28, 2022 at 1:01 AM Frank Schilder  wrote:
>
> Hi all,
>
> I'm trying to set a quota on the ceph fs file system root, but it fails with 
> "setfattr: /mnt/adm/cephfs: Invalid argument". I can set quotas on any 
> sub-directory. Is this intentional? The documentation 
> (https://docs.ceph.com/en/octopus/cephfs/quota/#quotas) says
>
> > CephFS allows quotas to be set on any directory in the system.
>
> Any includes the fs root. Is the documentation incorrect or is this a bug?

I'm not immediately seeing why we can't set quota on the root, but the
root inode is special in a lot of ways so this doesn't surprise me.
I'd probably regard it as a docs bug.

That said, there's also a good chance that the setfattr is getting
intercepted before Ceph ever sees it, since by setting it on the root
you're necessarily interacting with a mount point in Linux and those
can also be finicky...You could see if it works by using cephfs-shell.
-Greg
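
(For reference, the documented attribute syntax that does work on a
sub-directory — path and size below are placeholders:)

    setfattr -n ceph.quota.max_bytes -v 100000000000 /mnt/cephfs/some-dir
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/some-dir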


>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Stretch Cluster - df pool size (Max Avail)

2022-07-28 Thread Gregory Farnum
https://tracker.ceph.com/issues/56650

There's a PR in progress to resolve this issue now. (Thanks, Prashant!)
-Greg

On Thu, Jul 28, 2022 at 7:52 AM Nicolas FONTAINE  wrote:
>
> Hello,
>
> We have exactly the same problem. Did you find an answer or should we
> open a bug report?
>
> Sincerely,
>
> Nicolas.
>
> Le 23/06/2022 à 11:42, Kilian Ries a écrit :
> > Hi Joachim,
> >
> >
> > yes i assigned the stretch rule to the pool (4x replica / 2x min). The rule 
> > says that two replicas should be in every datacenter.
> >
> >
> > $ ceph osd tree
> > ID   CLASS  WEIGHTTYPE NAME   STATUS  REWEIGHT  PRI-AFF
> >   -1 62.87799  root default
> > -17 31.43900  datacenter site1
> > -15 31.43900  rack b7
> >   -3 10.48000  host host01
> >0ssd   1.74699  osd.0   up   1.0  1.0
> >1ssd   1.74699  osd.1   up   1.0  1.0
> >2ssd   1.74699  osd.2   up   1.0  1.0
> >3ssd   1.74699  osd.3   up   1.0  1.0
> >4ssd   1.74699  osd.4   up   1.0  1.0
> >5ssd   1.74699  osd.5   up   1.0  1.0
> >   -5 10.48000  host host02
> >6ssd   1.74699  osd.6   up   1.0  1.0
> >7ssd   1.74699  osd.7   up   1.0  1.0
> >8ssd   1.74699  osd.8   up   1.0  1.0
> >9ssd   1.74699  osd.9   up   1.0  1.0
> >   10ssd   1.74699  osd.10  up   1.0  1.0
> >   11ssd   1.74699  osd.11  up   1.0  1.0
> >   -7 10.48000  host host03
> >   12ssd   1.74699  osd.12  up   1.0  1.0
> >   13ssd   1.74699  osd.13  up   1.0  1.0
> >   14ssd   1.74699  osd.14  up   1.0  1.0
> >   15ssd   1.74699  osd.15  up   1.0  1.0
> >   16ssd   1.74699  osd.16  up   1.0  1.0
> >   17ssd   1.74699  osd.17  up   1.0  1.0
> > -18 31.43900  datacenter site2
> > -16 31.43900  rack h2
> >   -9 10.48000  host host04
> >   18ssd   1.74699  osd.18  up   1.0  1.0
> >   19ssd   1.74699  osd.19  up   1.0  1.0
> >   20ssd   1.74699  osd.20  up   1.0  1.0
> >   21ssd   1.74699  osd.21  up   1.0  1.0
> >   22ssd   1.74699  osd.22  up   1.0  1.0
> >   23ssd   1.74699  osd.23  up   1.0  1.0
> > -11 10.48000  host host05
> >   24ssd   1.74699  osd.24  up   1.0  1.0
> >   25ssd   1.74699  osd.25  up   1.0  1.0
> >   26ssd   1.74699  osd.26  up   1.0  1.0
> >   27ssd   1.74699  osd.27  up   1.0  1.0
> >   28ssd   1.74699  osd.28  up   1.0  1.0
> >   29ssd   1.74699  osd.29  up   1.0  1.0
> > -13 10.48000  host host06
> >   30ssd   1.74699  osd.30  up   1.0  1.0
> >   31ssd   1.74699  osd.31  up   1.0  1.0
> >   32ssd   1.74699  osd.32  up   1.0  1.0
> >   33ssd   1.74699  osd.33  up   1.0  1.0
> >   34ssd   1.74699  osd.34  up   1.0  1.0
> >   35ssd   1.74699  osd.35  up   1.0  1.0
> >
> >
> > So regarding my calculation it should be
> >
> >
> > (6x Nodes * 6x SSD * 1,8TB) / 4 = 16 TB
> >
> >
> > Is this maybe a bug in the stretch mode that i only get displayed half the 
> > size available?
> >
> >
> > Regards,
> >
> > Kilian
> >
> >
> > 
> > From: Clyso GmbH - Ceph Foundation Member 
> > Sent: Wednesday, 22 June 2022 18:20:59
> > To: Kilian Ries; ceph-users(a)ceph.io
> > Subject: Re: [ceph-users] Ceph Stretch Cluster - df pool size (Max Avail)
> >
> > Hi Kilian,
> >
> > we do not currently use this mode of ceph clustering. but normally you
> > need to assign the crush rule to the pool as well, otherwise it will
> > take rule 0 as default.
> >
> > the following calculation for rule 0 would also work approximately:
> >
> > (3 Nodes *6 x SSD *1,8TB)/4 = 8,1 TB
> >
> > hope it helps, Joachim
> >
> >
> > ___
> > Clyso GmbH - Ceph Foun

[ceph-users] Re: Upgrade from Octopus to Pacific cannot get monitor to join

2022-07-28 Thread Gregory Farnum
On Wed, Jul 27, 2022 at 4:54 PM  wrote:
>
> Currently, all of the nodes are running in docker. The only way to upgrade is 
> to redeploy with docker (ceph orch daemon redeploy), which is essentially 
> making a new monitor. Am I missing something?

Apparently. I don't have any experience with Docker, and unfortunately
very little with containers in general, so I'm not sure what process
you need to follow, though. cephadm certainly manages to do it
properly — you want to maintain the existing disk store.

How do you do it for OSDs? Surely you don't throw away an old
OSD, create a new one, and wait for migration to complete before doing
the next...
-Greg
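
(For reference, a minimal sketch of the cephadm-managed path, which upgrades the
existing daemons in place rather than recreating them — the target version is
illustrative:)

    ceph orch upgrade start --ceph-version 16.2.9
    ceph orch upgrade status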

>
> Is there some prep work I could/should be doing?
>
> I want to do a staggered upgrade as noted here 
> (https://docs.ceph.com/en/pacific/cephadm/upgrade/). That says for a 
> staggered upgrade the order is mgr -> mon, etc. But that was not working for 
> me because it said the --daemon-types was not supported.
>
> Basically I'm confused on what is the 'proper' way to upgrade then. There 
> isn't any way that I see to upgrade the 'code' they are running because it's 
> all in docker containers. But maybe I'm missing something obvious
>
> Thanks
>
>
>
>
> July 27, 2022 4:34 PM, "Gregory Farnum"  wrote:
>
> On Wed, Jul 27, 2022 at 10:24 AM  wrote:
>
> Currently running Octopus 15.2.16, trying to upgrade to Pacific using cephadm.
>
> 3 mon nodes running 15.2.16
> 2 mgr nodes running 16.2.9
> 15 OSD's running 15.2.16
>
> The mon/mgr nodes are running in lxc containers on Ubuntu running docker from 
> the docker repo (not the Ubuntu repo). Using cephadm to remove one of the 
> monitor nodes, and then re-add it back with a 16.2.9 version. The monitor 
> node runs but never joins the cluster. Also, this causes the other 2 mon 
> nodes to start flapping. Also tried adding 2 mon nodes (for a total of 5 
> mons) on bare metal running Ubuntu (with docker running from the docker repo) 
> and the mon's won't join and won't even show up in 'ceph status'
>
> The way you’re phrasing this it sounds like you’re removing existing monitors 
> and adding newly-created ones. That won’t work across major version 
> boundaries like this (at least, without a bit of prep work you aren’t doing) 
> because of how monitors bootstrap themselves and their cluster membership. 
> You need to upgrade the code running on the existing monitors instead, which 
> is the documented upgrade process AFAIK.
> -Greg
>
>
>
> Can't find anything in the logs regarding why it's failing. The docker 
> container starts and seems to try to join the cluster but just sits and 
> doesn't join. The other two start flapping and then eventually I have to stop 
> the new mon. I can add the monitor back by changing the container_image to 
> 15.2.16 and it will re-join the cluster as expected.
>
> The cluster was previously running nautilus installed using ceph-deploy
>
> Tried setting 'mon_mds_skip_sanity true' from reading another post but it 
> doesn't appear to help.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 17.2.2: all MGRs crashing in fresh cephadm install

2022-07-28 Thread Adam King
I've just taken another look at the orch ps output you posted and noticed
that the REFRESHED column is reporting "62m ago". That makes it seem like
the issue is that cephadm isn't actually running its normal operations (it
should refresh daemons every 10 minutes by default). I guess maybe we
should see if it's logged anything that might tell us where it's stuck
"ceph log last 200 cephadm" . To try and get things unstuck, the typical
solution is to just run "ceph mgr fail" which will start the other mgr as
active and put the current active to standby effectively "rebooting"
cephadm. If it was a transient issue that was causing cephadm to get stuck
that would resolve it. I think (but I'm not certain) that the dashboard
might be getting some of its daemon info from cephadm so it being in error
there as well might not actually mean much.

On Thu, Jul 28, 2022 at 10:44 AM Carlos Mogas da Silva 
wrote:

> Yes, cephadm and ceph01 both have mgrs running (the ones with the fix). The
> "error" is that the ceph01 one is actually running, but from "ceph orch"'s
> perspective it looks like it's not. Even on the dashboard the daemon shows as
> errored, but it's running (confirmed via podman and systemctl).
> My take is that something is not communicating some information with
> "cephadm", but I don't know what. Ceph itself knows the mgr is running since
> it clearly says it's on standby.
>
>
> On Wed, 2022-07-27 at 21:09 -0400, Adam King wrote:
> > What actual hosts are meant to have a mgr here? The naming makes it look
> as if it thinks there's a
> > host "ceph01" and a host "cephadm" and both have 1 mgr. Is that actually
> correct or is that aspect
> > also messed up?
> >
> > Beyond that, you could try manually placing a copy of the cephadm script
> on each host and running
> > "cephadm ls" and see what it gives you. That's how the "ceph orch ps"
> info is gathered so if the
> > output of that looked strange it might tell us something useful.
> >
> > On Wed, Jul 27, 2022 at 8:58 PM Carlos Mogas da Silva 
> wrote:
> > > I just build a Ceph cluster and was, unfortunately hit by this :(
> > >
> > > I managed to restart the mgrs (2 of them) by manually editing the
> > > /var/run/ceph//mgr./unit.run.
> > >
> > > But now I have a problem that I really don't understand:
> > > - both managers are running, and appear on "ceph -s" as "mgr:
> cephadm.mxrhsp(active, since 62m),
> > > standbys: ceph01.fwtity"
> > > - looks like the orchestrator is a little "confused":
> > > # ceph orch ps --daemon-type mgr
> > > NAMEHOST PORTSSTATUS REFRESHED
> AGE  MEM USE  MEM LIM
> > > VERSION
> > > IMAGE ID  CONTAINER ID
> > > mgr.ceph01.fwtity   ceph01   *:8443,9283  error62m ago
>  2h--
> > > 
> > >  
> > > mgr.cephadm.mxrhsp  cephadm  *:9283   running (63m)62m ago
>  2h 437M-
> > > 17.2.2-1-
> > > gf516549e  5081f5a97849  0f0bc2c6791f
> > >
> > > because of this I can't run a "ceph orch upgrade" because it always
> complains about having only
> > > one.
> > > Is there something else that needs to be changed to get the cluster to
> a normal state?
> > >
> > > Thanks!
> > >
> > > On Wed, 2022-07-27 at 12:23 -0400, Adam King wrote:
> > > > yeah, that works if there is a working mgr to send the command to. I
> was
> > > > assuming here all the mgr daemons were down since it was a fresh
> cluster so
> > > > all the mgrs would have this bugged image.
> > > >
> > > > On Wed, Jul 27, 2022 at 12:07 PM Vikhyat Umrao 
> wrote:
> > > >
> > > > > Adam - or we could simply redeploy the daemon with the new image?
> at least
> > > > > this is something I did in our testing here[1].
> > > > >
> > > > > ceph orch daemon redeploy mgr. quay.ceph.io/ceph-
> > > > > ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531
> > > > >
> > > > > [1]
> https://github.com/ceph/ceph/pull/47270#issuecomment-1196062363
> > > > >
> > > > > On Wed, Jul 27, 2022 at 8:55 AM Adam King 
> wrote:
> > > > >
> > > > > > the unit.image file is just there for cephadm to look at as part
> of
> > > > > > gathering metadata I think. What you'd want to edit is the
> unit.run file
> > > > > > (in the same directory as the unit.image). It should have a
> really long
> > > > > > line specifying a podman/docker run command and somewhere in
> there will be
> > > > > > "CONTAINER_IMAGE=". You'd need to change that to
> say
> > > > > > "CONTAINER_IMAGE=
> > > > > >
> quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531" then
> > > > > > restart the service.
> > > > > >
> > > > > > On Wed, Jul 27, 2022 at 11:46 AM Daniel Schreiber <
> > > > > > daniel.schrei...@hrz.tu-chemnitz.de> wrote:
> > > > > >
> > > > > > > Hi Neha,
> > > > > > >
> > > > > > > thanks for the quick response. Sorry for that stupid question:
> to use
> > > > > > > that image I pull the image on the machine and then change
> > > > > > > /var/lib/ceph/${clusterid}/mgr.${unit}/unit.image and start
> the service?
> > > > > > >

[ceph-users] Cache configuration for each storage class

2022-07-28 Thread Alejandro T:
Hello,

I have an octopus cluster with 3 OSD hosts. Each of them has 13 daemons 
belonging to different storage classes. 

I'd like to have multiple osd_memory_target settings for each class.
In the documentation there's some mention of setting different bluestore cache 
sizes for HDDs and SSDs, but I want to change the sizes for different HDD 
types based on their storage class.

I thought of writing a script that applies the configuration to all daemons 
one by one, but I was wondering if there's another way to do it, or some reason 
this option isn't available.
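
For what it's worth, "ceph config" accepts a CRUSH device-class mask in the
"who" field, which may avoid the per-daemon scripting — a minimal sketch, with
class names and sizes purely illustrative:

    ceph config set osd/class:hdd  osd_memory_target 4294967296
    ceph config set osd/class:nvme osd_memory_target 8589934592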


Thank you,
Alejandro


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cannot set quota on ceph fs root

2022-07-28 Thread Jesper Lykkegaard Karlsen
Hi Frank, 

I guess there is always the possibility to set a quota at the pool level with 
"target_max_objects" and "target_max_bytes".
The CephFS quotas set through attributes are only for sub-directories, as far 
as I recall.
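
A related sketch: pool-level quotas can also be set with the generic pool quota
command (pool name and size below are placeholders):

    ceph osd pool set-quota cephfs_data max_bytes 10995116277760
    ceph osd pool get-quota cephfs_data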

Best, 
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 28 Jul 2022, at 17.22, Frank Schilder  wrote:
> 
> Hi Gregory,
> 
> thanks for your reply. It should be possible to set a quota on the root, 
> other vattribs can be set as well despite it being a mount point. There must 
> be something on the ceph side (or another bug in the kclient) preventing it.
> 
> By the way, I can't seem to find cephfs-tools like cephfs-shell. I'm using 
> the image quay.io/ceph/ceph:v15.2.16 and its not installed in the image. A 
> "yum provides cephfs-shell" returns no candidate and I can't find 
> installation instructions. Could you help me out here?
> 
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: Gregory Farnum 
> Sent: 28 July 2022 16:59:50
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] cannot set quota on ceph fs root
> 
> On Thu, Jul 28, 2022 at 1:01 AM Frank Schilder  wrote:
>> 
>> Hi all,
>> 
>> I'm trying to set a quota on the ceph fs file system root, but it fails with 
>> "setfattr: /mnt/adm/cephfs: Invalid argument". I can set quotas on any 
>> sub-directory. Is this intentional? The documentation 
>> (https://docs.ceph.com/en/octopus/cephfs/quota/#quotas) says
>> 
>>> CephFS allows quotas to be set on any directory in the system.
>> 
>> Any includes the fs root. Is the documentation incorrect or is this a bug?
> 
> I'm not immediately seeing why we can't set quota on the root, but the
> root inode is special in a lot of ways so this doesn't surprise me.
> I'd probably regard it as a docs bug.
> 
> That said, there's also a good chance that the setfattr is getting
> intercepted before Ceph ever sees it, since by setting it on the root
> you're necessarily interacting with a mount point in Linux and those
> can also be finicky...You could see if it works by using cephfs-shell.
> -Greg
> 
> 
>> 
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: LibCephFS Python Mount Failure

2022-07-28 Thread 胡 玮文
Hi Adam,

Have you tried ‘cephfs.LibCephFS(auth_id="monitoring")’?

Weiwen Hu
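
A minimal sketch of what that could look like, assuming the keyring path and
file system name from the earlier messages (untested, just to illustrate passing
auth_id explicitly instead of relying on the default client.admin):

    import cephfs

    fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf', auth_id='monitoring')
    # point the client at the monitoring keyring rather than the admin one
    fs.conf_set('keyring', '/etc/ceph/ceph.client.monitoring.keyring')
    fs.mount(filesystem_name=b'cephfs')
    print(fs.conf_get('keyring'))
    fs.shutdown()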

> On 27 Jul 2022, at 20:41, Adam Carrgilson (NBI)  wrote:
> 
> I’m still persevering with this, if anyone can assist, I would truly 
> appreciate it.
> 
> As I said previously, I’ve been able to identify that the error is 
> specifically a Permission Denied error. I’m dropping the custom keyring in 
> via the CEPH_ARGS environment variable, I’ve also tried using the environment 
> to include the id/name/user to tie it to my monitoring account. I’ve also 
> tried coding those id / name / user into the ceph.conf file with no effect 
> either.
> 
> I’ve been trying to dump the state of the LibCephFS object to identify what 
> exactly is going on; although I can't see a way to dump everything (that 
> would be far too easy), I can call the conf_get command with a key to return 
> the state of individual settings.
> 
> I can do this with ‘keyring’ and it returns the location I specified in the 
> environment, similarly, I can request mon_host or fsid, and those return 
> their values from the configuration file, however, I cannot return id, name, 
> user, or client.id, client.name, or client.user, although I don't know for 
> sure if those are the correct setting names I should be requesting?
> 
> So, it seems that whatever mechanism I use to try to populate the user, I 
> cannot yet get the library to honour it.
> 
> Does anyone have any further clues that might help me resolve this?
> 
> Many Thanks,
> Adam.
> 
> 
> -Original Message-
> From: Adam Carrgilson (NBI)  
> Sent: 27 July 2022 09:51
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: LibCephFS Python Mount Failure
> 
> It feels like that, but I have specified --id client.monitoring inside of the 
> environment variable together with the --keyring definition.
> 
> Is there anyway to query the library so that I can see all the variables that 
> it thinks are active and debug from there?
> 
> Many Thanks,
> Adam.
> 
> From: Gregory Farnum 
> Sent: 26 July 2022 16:41
> To: Adam Carrgilson (NBI) 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: LibCephFS Python Mount Failure
> 
> It looks like you’re setting environment variables that force your new 
> keyring,  it you aren’t telling the library to use your new CephX user. So it 
> opens your new keyring and looks for the default (client.admin) user and 
> doesn’t get anything.
> -Greg
> 
> On Tue, Jul 26, 2022 at 7:54 AM Adam Carrgilson (NBI) 
> mailto:adam.carrgil...@nbi.ac.uk>> wrote:
> I've disabled the part of the script that catches the Python exception and 
> allowed it to print everything out and it looks like the OSError with the 
> code 13, is a permissions error:
> 
> Traceback (most recent call last):
>  File "./get-ceph-quota-statistics.py", line 274, in 
>main(args)
>  File "./get-ceph-quota-statistics.py", line 30, in main
>cephfs = login() # holds CephFS bindings
>  File "./get-ceph-quota-statistics.py", line 94, in login
>cephfs.mount(filesystem_name=b'cephfs')
>  File "cephfs.pyx", line 684, in cephfs.LibCephFS.mount
>  File "cephfs.pyx", line 676, in cephfs.LibCephFS.init
> cephfs.OSError: error calling ceph_init: Permission denied [Errno 13]
> 
> Now I've tested a FUSE mount with the same keyfile and that functions as 
> expected, so I'm having to assume that somehow the Python script either 
> doesn't have all of the properties I've supplied (which I doubt, because I 
> can point it at files with admin credentials and it works fine), something 
> within the Python CephFS library might be hardcoded to particular values 
> which I'm having problems with, or maybe something else?
> 
> Is there a way to interrogate the Python object before I do the cephfs.mount, 
> just to confirm the options are as I expect?
> 
> Alternatively, python-cephfs wraps around the CephFS library, right?
> Does the CephFS FUSE component utilise the same CephFS library?
> If not, is there a way to call something else on the command line directly to 
> rule out problems there?
> 
> Many Thanks,
> Adam.
> 
> -Original Message-
> From: Adam Carrgilson (NBI) 
> mailto:adam.carrgil...@nbi.ac.uk>>
> Sent: 25 July 2022 16:24
> To: ceph-users@ceph.io
> Cc: Bogdan Adrian Velica mailto:vbog...@gmail.com>>
> Subject: [ceph-users] Re: LibCephFS Python Mount Failure
> 
> Thanks Bogdan,
> 
> I’m running this script at the moment as my development system’s root user 
> account, I don’t have a particular ceph user on this standalone system, and I 
> don’t think I’ll be able to control the user account of the monitoring hosts 
> either (I think they might run under a user account dedicated to the 
> monitoring) but I’m interested to what you think I should test here?
> 
> I can definitely run the code as the root user, it can read my custom 
> configuration and key files, when I specify those providing the admin user 
> credentials, it works as expected, but when I specify the monitoring 
> crede

[ceph-users] RGW Multisite Sync Policy - Bucket Specific - Core Dump

2022-07-28 Thread Mark Selby
We use Ceph RBD/FS extensively and are starting down our RGW journey. We have 3 
sites and want to replicate buckets from a single “primary” to multiple 
“backup” sites. Each site has a Ceph cluster and they are all configured as part 
of a Multisite setup.

I am using the instructions at 
https://docs.ceph.com/en/quincy/radosgw/multisite-sync-policy/#example-3-mirror-a-specific-bucket
 to try and configure a single bucket to replicate from one zone to two other 
zones in a directional manner (not symmetric)

When I follow the example I get a core dump on the final radosgw-admin sync 
group pipe create command.

It would be great if someone with experience with Multisite Sync Policy could 
take a look at my commands and see if there is anything glaringly wrong with what 
I am trying to do.

BTW: The Multisite Sync Policy docs are, IMHO, the most opaque/confusing 
section on the doc site overall.

Setup:
  Version: 16.2.10
  Clusters: 3
  Zonegroup: us
  Zones: us-dev-1, us-dev-2, us-dev-3
  Tenant: elvis
  Bucket: artifact

radosgw-admin sync group create \
--group-id=us \
--status=allowed

radosgw-admin sync group flow create \
--group-id=us \
--flow-id=dev1-to-dev2 \
--flow-type=directional \
--source-zone=us-dev-1 \
--dest-zone=us-dev-2

radosgw-admin sync group flow create \
--group-id=us \
--flow-id=dev1-to-dev3 \
--flow-type=directional \
--source-zone=us-dev-1 \
--dest-zone=us-rose-3-dev

radosgw-admin sync group pipe create \
--group-id=us \
--pipe-id=us-all \
--source-zones='*' \
--source-bucket='*' \
--dest-zones='*' \
--dest-bucket='*'

radosgw-admin sync group create \
--bucket=elvis/artifact \
--group-id=elvis-artifact \
--status=enabled

radosgw-admin sync group pipe create \
--bucket=elvis/artifact \
--group-id=elvis-artifact \
--pipe-id=pipe1 \
--source-zones='us-dev-1'\
--dest-zones='us-dev-2,us-rose-3-dev'

/usr/include/c++/8/optional:714: constexpr _Tp& std::_Optional_base<_Tp, 
,  >::_M_get() [with _Tp = rgw_bucket; bool  = 
false; bool  = false]: Assertion 'this->_M_is_engaged()' failed.
*** Caught signal (Aborted) **
 in thread 7f0092e41380 thread_name:radosgw-admin
 ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f0086ffece0]
 2: gsignal()
 3: abort()
 4: radosgw-admin(+0x35fff8) [0x563d0c5f5ff8]
 5: 
(rgw_sync_bucket_entities::set_bucket(std::optional, std::allocator > >, 
std::optional, 
std::allocator > >, std::optional, std::allocator > >)+0x67) [0x563d0c879c07]
 6: main()
 7: __libc_start_main()
 8: _start()
2022-07-28T09:24:39.445-0700 7f0092e41380 -1 *** Caught signal (Aborted) **
 in thread 7f0092e41380 thread_name:radosgw-admin

 ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f0086ffece0]
 2: gsignal()
 3: abort()
 4: radosgw-admin(+0x35fff8) [0x563d0c5f5ff8]
 5: 
(rgw_sync_bucket_entities::set_bucket(std::optional, std::allocator > >, 
std::optional, 
std::allocator > >, std::optional, std::allocator > >)+0x67) [0x563d0c879c07]
 6: main()
 7: __libc_start_main()
 8: _start()
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

--- begin dump of recent events ---
  -494> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command assert hook 0x563d0e75fbd0
  -493> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command abort hook 0x563d0e75fbd0
  -492> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command leak_some_memory hook 0x563d0e75fbd0
  -491> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command perfcounters_dump hook 0x563d0e75fbd0
  -490> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command 1 hook 0x563d0e75fbd0
  -489> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command perf dump hook 0x563d0e75fbd0
  -488> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command perfcounters_schema hook 0x563d0e75fbd0
  -487> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command perf histogram dump hook 0x563d0e75fbd0
  -486> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command 2 hook 0x563d0e75fbd0
  -485> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command perf schema hook 0x563d0e75fbd0
  -484> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command perf histogram schema hook 0x563d0e75fbd0
  -483> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command perf reset hook 0x563d0e75fbd0
  -482> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 asok(0x563d0e6ece90) 
register_command config show hook 0x563d0e75fbd0
  -481> 2022-07-28T09:24:39.316-0700 7f0092e41380  5 

[ceph-users] RGW Multisite Sync Policy - Flow and Pipe Linkage

2022-07-28 Thread Mark Selby
We use Ceph RBD/FS extensively and are starting down our RGW journey. We
have 3 sites and want to replicate buckets from a single "primary" to
multiple "backup" sites. Each site has a Ceph cluster and they are all
configured as part of a Multisite setup.

I am using the examples at
https://docs.ceph.com/en/quincy/radosgw/multisite-sync-policy and have
gotten a directional sync to work using the commands below.

The question I have is about the scalability of what I am doing. How
groups/flows/pipes all fit together is not very clear in the docs.

My group 'us' has two flows, 'dev1-to-dev2' and 'dev1-to-dev3'.

The pipe that I create specifies the group 'us' but does not specify the
flows to use. I have to assume in this case it uses all of the flows at
the zonegroup level. This is fine as long as these are the only flows
that I will ever need.

What I am having difficulty understanding is the linkage between
pipes and flows. If anyone can explain this in more detail than the docs do, it
would be greatly appreciated.

BTW: The Multisite Sync Policy docs are, IMHO, the most opaque/confusing
section on the doc site overall.

Setup:
  Version: 16.2.10
  Clusters: 3
  Zonegroup: us
  Zones: us-dev-1, us-dev-2, us-dev-3
  Tenant: elvis
  Bucket: artifact

radosgw-admin sync group create \
    --group-id=us \
    --status=allowed

radosgw-admin sync group flow create \
    --group-id=us \
    --flow-id=dev1-to-dev2 \
    --flow-type=directional \
    --source-zone=us-dev-1 \
    --dest-zone=us-dev-2

radosgw-admin sync group flow create \
    --group-id=us \
    --flow-id=dev1-to-dev3 \
    --flow-type=directional \
    --source-zone=us-dev-1 \
    --dest-zone=us-dev-3

radosgw-admin sync group pipe create \
    --group-id=us \
    --pipe-id=systems-artifact \
    --source-zones='us-dev-1' \
    --source-bucket='elvis/artifact' \
    --dest-zones='us-dev-2,us-dev-3' \
    --dest-bucket='elvis/artifact'

radosgw-admin sync group modify \
    --group-id=us \
    --status=enabled

radosgw-admin period update --commit

-- 
Mark Selby
Sr Linux Administrator, The Voleon Group
mse...@voleon.com

This email is subject to important conditions and disclosures that are listed
on this web page: https://voleon.com/disclaimer/.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW Multisite Sync Policy - Bucket Specific - Core Dump

2022-07-28 Thread Soumya Koduri

On 7/28/22 22:19, Mark Selby wrote:

/usr/include/c++/8/optional:714: constexpr _Tp& std::_Optional_base<_Tp, ,  
>::_M_get() [with _Tp = rgw_bucket; bool  = false; bool  = false]: Assertion 
'this->_M_is_engaged()' failed.
*** Caught signal (Aborted) **
  in thread 7f0092e41380 thread_name:radosgw-admin
  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific 
(stable)
  1: /lib64/libpthread.so.0(+0x12ce0) [0x7f0086ffece0]
  2: gsignal()
  3: abort()
  4: radosgw-admin(+0x35fff8) [0x563d0c5f5ff8]
  5: (rgw_sync_bucket_entities::set_bucket(std::optional, 
std::allocator > >, std::optional, std::allocator 
> >, std::optional, std::allocator > >)+0x67) 
[0x563d0c879c07]
  6: main()
  7: __libc_start_main()
  8: _start()
2022-07-28T09:24:39.445-0700 7f0092e41380 -1 *** Caught signal (Aborted) **
  in thread 7f0092e41380 thread_name:radosgw-admin



ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)



This issue is fixed [1] and is yet to be backported to quincy & pacific [2].


Regards,

Soumya


[1] https://tracker.ceph.com/issues/52044

[2] https://tracker.ceph.com/issues/55918

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGW Multisite Sync Policy - Flow and Pipe Linkage

2022-07-28 Thread Soumya Koduri

On 7/28/22 22:41, Mark Selby wrote:

We use Ceph RBD/FS extensively and are starting down our RGW journey. We

have 3 sites and want to replicate buckets from a single "primary" to

multiple "backup" sites. Each site has a Ceph cluster and they are all

configured as part of a Multisite setup.

  


I am using the examples at

https://docs.ceph.com/en/quincy/radosgw/multisite-sync-policy and have

gotten a directional sync to work using the below commands.

  


The question I have is about the scalability of what I am doing. How

groups/flows/pipe all fit together is not very clear in the docs.

  


My group 'us' has two flows, 'dev1-to-dev2' and 'dev1-to-dev3'.

  


The pipe that I create specifies the group 'us' but does not specify the

flows to us. I have to assume in this case it uses all of the flows at

the zonegroup level. This is fine as long as these are the only flows

that I will ever need.



Yes, in your setup it uses the flows created for the group "us" 
(zonegroup level).





  


What I am try having difficulty understanding is the linkage between

pipes and flows. If any one can explain this more than the docs do, it

would be greatly appreciated.

  



A data-flow defines the flow of data between the different zones (either 
symmetrical or directional), whereas a pipe defines the actual buckets that 
can use these data flows [1].


FWIU, pipes use the data-flows associated with their group policy, or 
inherit the ones configured at the zonegroup level.


i.e.,

* If the group policy is created at the zonegroup level (like 'us' in your 
setup), pipes of that group use the flows associated with that group ('us'). 
If no flows are configured, the sync is not allowed.


* In case the group (say, 'us-bucket') is created at the bucket level, 
the data-flows (if any) created for that group ('us-bucket') should be a 
subset of what the zonegroup-level policy ('us') allows. And that group 
policy's ('us-bucket') pipes use the flows in the following order:


(a) the data-flows created for that bucket-level group policy ('us-bucket')

(b) if no flows are created, it then inherits the data flow allowed at 
zonegroup level ('us').



Hope this helps!


-Soumya

[1] https://docs.ceph.com/en/quincy/radosgw/multisite-sync-policy/



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: colocation of MDS (count-per-host) not working in Quincy?

2022-07-28 Thread John Mulligan
On Thursday, July 28, 2022 3:45:30 PM EDT Vladimir Brik wrote:
> count_per_host worked!
> 
> I created a ticket.

Good to hear. Thanks!

> 
>  > Regardless of whether this is a good idea or not the
> 
> option is a
> 
>  > generic one and should be handled gracefully. :-)
> 
> Do you mean running multiple MDSes on a single host may not
> be a good idea?
> 

Right, I'm not an expert in this area, but the issue is that running multiple 
MDSes on one host for the same file system won't give any redundancy if that 
node fails. If you run two MDS services and they're on different hosts, the 
standby MDS on the other node can take over. 

If you use this option there's also the chance that the services may want to 
bind to the same port. I don't know if that's the case for the MDS.
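
For reference, the placement spec from the quoted message with the working
underscore spelling — a sketch:

    service_type: mds
    service_id: default
    placement:
      count_per_host: 2
      label: mds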

> 
> Vlad
> 
> On 7/28/22 14:11, John Mulligan wrote:
> > On Thursday, July 28, 2022 2:14:57 PM EDT Vladimir Brik wrote:
> >> Hello
> >> 
> >> I tried to run multiple MDSes per host using this yaml (per
> >> https://docs.ceph.com/en/quincy/cephadm/services/index.html#co-location-o
> >> f-d aemons):
> >> 
> >> service_type: mds
> >> service_id: default
> >> service_name: mds.default
> >> 
> >> placement:
> >> count-per-host: 2
> >> label: mds
> >> 
> >> But got: Error EINVAL: PlacementSpec: __init__() got an
> >> unexpected keyword argument 'count-per-host'
> > 
> > Please try 'count_per_host' if you haven't already.
> > 
> >> Is this just not supported yet or is there another way to
> >> get multiple MDSes on the same host?
> > 
> > I was curious about this and saw parts of the codebase and the docs refer
> > to this field as "count-per-host" but I couldn't find any example yaml
> > that uses this only 'count_per_host'.
> > 
> > Even if this suggestion works, please also file a tracker issue on ceph
> > orch for this. We should either fix the docs or have a better error
> > message or both. Regardless of whether this is a good idea or not the
> > option is a generic one and should be handled gracefully. :-)
> > 
> >> Vlad
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: replacing OSD nodes

2022-07-28 Thread Jesper Lykkegaard Karlsen
Thank you for your suggestions Josh, it is really appreciated. 

Pgremapper looks interesting and definitely something I will look into.
 
I know the balancer will reach a well-balanced PG landscape eventually, but I 
am not sure that it will prioritise backfill to the “most available location” 
first. 
Then I might end up in the same situation, where some of the old (but not 
retired) OSDs start getting full. 

Then there is the “undo-upmaps” script left, or maybe even the script that I 
propose in combination with “cancel-backfill”, as it just moves what Ceph was 
planning to move anyway, just in a prioritised manner. 

Have you tried the pgremapper yourself, Josh? 
Is it safe to use? 
And do the Ceph developers vouch for this method? 

Status now is that ~1,600,000,000 objects have been moved, which is about half 
of all of the planned backfills. 
I have been reweighting OSDs down as they get too close to maximum usage, which 
works to some extent. 
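
(Concretely, that manual downweighting is just along these lines — the OSD id
and value are placeholders:)

    ceph osd reweight 42 0.90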

The monitors, on the other hand, are now complaining about using a lot of disk 
space, due to the long-running backfill. 
There is still plenty of disk space on the mons, but I feel that the backfill 
is getting slower and slower, although the same number of PGs are still 
backfilling. 

Can large disk usage on mons slow down backfill and other operations? 
Is it dangerous? 

Best, 
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 28 Jul 2022, at 22.26, Josh Baergen  wrote:
> 
> I don't have many comments on your proposed approach, but just wanted
> to note that how I would have approached this, assuming that you have
> the same number of old hosts, would be to:
> 1. Swap-bucket the hosts.
> 2. Downweight the OSDs on the old hosts to 0.001. (Marking them out
> (i.e. weight 0) prevents maps from being applied.)
> 3. Add the old hosts back to the CRUSH map in their old racks or whatever.
> 4. Use https://github.com/digitalocean/pgremapper#cancel-backfill.
> 5. Then run https://github.com/digitalocean/pgremapper#undo-upmaps in
> a loop to drain the old OSDs.
> 
> This gives you the maximum concurrency and efficiency of movement, but
> doesn't necessarily solve your balance issue if it's the new OSDs that
> are getting full (that wasn't clear to me). It's still possible to
> apply steps 2, 4, and 5 if the new hosts are in place. If you're not
> in a rush could actually use the balancer instead of undo-upmaps in
> step 5 to perform the rest of the data migration and then you wouldn't
> have full OSDs.
> 
> Josh
> 
> On Fri, Jul 22, 2022 at 1:57 AM Jesper Lykkegaard Karlsen
>  wrote:
>> 
>> It seems like a low hanging fruit to fix?
>> There must be a reason why the developers have not made a prioritized order 
>> of backfilling PGs.
>> Or maybe the prioritization is something else than available space?
>> 
>> The answer remains unanswered, as well as if my suggested approach/script 
>> would work or not?
>> 
>> Summer vacation?
>> 
>> Best,
>> Jesper
>> 
>> --
>> Jesper Lykkegaard Karlsen
>> Scientific Computing
>> Centre for Structural Biology
>> Department of Molecular Biology and Genetics
>> Aarhus University
>> Universitetsbyen 81
>> 8000 Aarhus C
>> 
>> E-mail: je...@mbg.au.dk
>> Tlf:+45 50906203
>> 
>> 
>> Fra: Janne Johansson 
>> Sendt: 20. juli 2022 19:39
>> Til: Jesper Lykkegaard Karlsen 
>> Cc: ceph-users@ceph.io 
>> Emne: Re: [ceph-users] replacing OSD nodes
>> 
>> Den ons 20 juli 2022 kl 11:22 skrev Jesper Lykkegaard Karlsen 
>> :
>>> Thanks for you answer Janne.
>>> Yes, I am also running "ceph osd reweight" on the "nearfull" osds, once 
>>> they get too close for comfort.
>>> 
>>> But I just though a continuous prioritization of rebalancing PGs, could 
>>> make this process more smooth, with less/no need for handheld operations.
>> 
>> You are absolutely right there, just wanted to chip in with my
>> experiences of "it nags at me but it will still work out" so other
>> people finding these mails later on can feel a bit relieved at knowing
>> that a few toofull warnings aren't a major disaster and that it
>> sometimes happens, because ceph looks for all possible moves, even
>> those who will run late in the rebalancing.
>> 
>> --
>> May the most significant bit of your life be positive.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: replacing OSD nodes

2022-07-28 Thread Jesper Lykkegaard Karlsen
Cool thanks a lot! 
I will definitely put it in my toolbox. 

Best, 
Jesper

--
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:+45 50906203

> On 29 Jul 2022, at 00.35, Josh Baergen  wrote:
> 
>> I know the balancer will reach a well balanced PG landscape eventually, but 
>> I am not sure that it will prioritise backfill after “most available 
>> location” first.
> 
> Correct, I don't believe it prioritizes in this way.
> 
>> Have you tried the pgremapper youself Josh?
> 
> My team wrote and maintains pgremapper and we've used it extensively,
> but I'd always recommend trying it in test environments first. Its
> effect on the system isn't much different than what you're proposing
> (it simply manipulates the upmap exception table).
> 
> Josh

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] mds optimization

2022-07-28 Thread David Yang
Dear all
I have a CephFS file system cluster running Pacific, mounted on
a Linux server using the kernel client.
The mounted directory is then shared to Windows clients by deploying
the Samba service.

Sometimes some workloads from Windows generate a lot of
metadata operations.
When this happens, the CPU usage of the mds process is higher than usual;
at the same time, the Samba process on the kernel client host goes into the
D state and the load increases.

Using Process Monitor on Windows to view the application's I/O, you can
see that the application has been issuing QueryDirectory operations during
this time.

Personally, I think the application load involves a large amount of
metadata operations, which leads to increased utilization of the mds
service and slow requests.

Can someone suggest some optimizations? Thanks a lot.

Best regards,
David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io