[ceph-users] Re: Ceph Logging Configuration and "Large omap objects found"

2024-08-12 Thread Janek Bevendorff

Thanks all.


ceph log last 10 warn cluster


That outputs nothing for me. Any docs about this?

I don't have much to comment about logging, I feel you though. I just 
wanted to point out that the details about the large omap object 
should be in the (primary) OSD log, not in the MON log:


The message says cluster log. But even if it were the OSD logs, am I 
supposed to grep every single OSD log for it?



If you’re getting much volume in the mon logs, maybe you aren’t setting the level in
a way that’s taking effect. It should mostly be quorum status and compaction
results.
I have set log levels, but it's largely ignored for the cluster log. 
There was an issue about this a few years back, I just can't find it 
right now.




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Logging Configuration and "Large omap objects found"

2024-08-12 Thread Eugen Block

Hi,


ceph log last 10 warn cluster


That outputs nothing for me. Any docs about this?


not any good docs, I'm afraid. At some point I stumbled across 'ceph  
log last cephadm' and played around a bit to see what else you can get  
from that. The help command shows some useful information:


log last [<num>] [<level>] [<channel>]   print last few lines of the
cluster log
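
For illustration, a few invocations along those lines (a sketch only; the level
and channel names I'd expect on recent releases are debug/info/warn/error and
cluster/audit/cephadm):

  # last 50 cluster-log entries at warn level and above
  ceph log last 50 warn cluster
  # more lines at a lower level, useful when warn shows nothing
  ceph log last 200 info cluster
  # the cephadm channel I mentioned above
  ceph log last 20 debug cephadm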


But I agree, there should be a section in the docs for that. I'm  
adding Zac in CC, maybe there is already some work going on in that  
regard.


The message says cluster log. But even if it were the OSD logs, am I  
supposed to grep every single OSD log for it?


That's where the 'ceph log last' commands should help you out, but I  
don't know why you don't see it, maybe increase the number of lines to  
display or something?


BTW, which ceph version are we talking about here?


Quoting Janek Bevendorff:


Thanks all.


ceph log last 10 warn cluster


That outputs nothing for me. Any docs about this?

I don't have much to comment about logging, I feel you though. I  
just wanted to point out that the details about the large omap  
object should be in the (primary) OSD log, not in the MON log:


The message says cluster log. But even if it were the OSD logs, am I  
supposed to grep every single OSD log for it?


If you’re getting much volume to mon logs maybe you aren’t setting  
the level in a way that’s taking effect.   Should mostly be quorum  
status and compaction results.
I have set log levels, but it's largely ignored for the cluster log.  
There was an issue about this a few years back, I just can't find it  
right now.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Logging Configuration and "Large omap objects found"

2024-08-12 Thread Janek Bevendorff


That's where the 'ceph log last' commands should help you out, but I 
don't know why you don't see it, maybe increase the number of lines to 
display or something?


BTW, which ceph version are we talking about here?


reef.

I tried ceph log last 100 debug cluster and that gives me the usual DBG 
spam that I otherwise see in the MON logs. But there are no messages 
above that level.




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD Journaling seemingly getting stuck for some VMs after upgrade to Octopus

2024-08-12 Thread Oliver Freyermuth

Dear Cephalopodians,

we've successfully operated a "good old" Mimic cluster with primary RBD images, 
replicated via journaling to a "backup cluster" with Octopus, for the past years (i.e. 
one-way replication).
We've now finally gotten around upgrading the cluster with the primary images 
to Octopus (and plan to upgrade further in the near future).

After the upgrade, all MON+MGR-OSD+rbd_mirror daemons are running 15.2.17.

We run three rbd-mirror daemons which all share the following client with auth in the 
"backup" cluster, to which they write:

  client.rbd_mirror_backup
caps: [mon] profile rbd-mirror
caps: [osd] profile rbd

and the following shared client with auth in the "primary cluster" from which 
they are reading:

  client.rbd_mirror
caps: [mon] profile rbd
caps: [osd] profile rbd

i.e. the same auth as described in the docs[0].

Checking on the primary cluster, we get:

# rbd mirror pool status
  health: UNKNOWN
  daemon health: UNKNOWN
  image health: OK
  images: 288 total
  288 replaying

For some reason, some values are "unknown" here. But mirroring seems to work, 
as checking on the backup cluster reveals, see for example:

  # rbd mirror image status zabbix-test.example.com-disk2
zabbix-test.example.com-disk2:
global_id:   1bdcb981-c1c5-4172-9583-be6a6cd996ec
state:   up+replaying
description: replaying, 
{"bytes_per_second":8540.27,"entries_behind_primary":0,"entries_per_second":1.8,"non_primary_position":{"entry_tid":869176,"object_number":504,"tag_tid":1},"primary_position":{"entry_tid":11143,"object_number":7,"tag_tid":1}}
service: rbd_mirror_backup on rbd-mirror002.example.com
last_update: 2024-08-12 09:53:17

However, we do in some seemingly random cases see that journals are never 
advanced on the primary cluster — staying with the example above, on the 
primary cluster I find the following:

  # rbd journal status --image zabbix-test.physik.uni-bonn.de-disk2
  minimum_set: 1
  active_set: 126
registered clients:
  [id=, commit_position=[positions=[[object_number=7, tag_tid=1, 
entry_tid=11143], [object_number=6, tag_tid=1, entry_tid=11142], 
[object_number=5, tag_tid=1, entry_tid=11141], [object_number=4, tag_tid=1, 
entry_tid=11140]]], state=connected]
  [id=52b80bb0-a090-4f7d-9950-c8691ed8fee9, 
commit_position=[positions=[[object_number=505, tag_tid=1, entry_tid=869181], 
[object_number=504, tag_tid=1, entry_tid=869180], [object_number=507, 
tag_tid=1, entry_tid=869179], [object_number=506, tag_tid=1, 
entry_tid=869178]]], state=connected]

As you can see, the minimum_set was not advanced. As can be seen in "mirror image 
status", it shows the strange output that non_primary_position seems much more advanced than 
primary_position. This seems to happen "at random" for only a few volumes...
There are no other active clients apart from the actual VM (libvirt+qemu).

As a quick fix, to purge journals piling up over and over, we've only found the 
"solution" to temporarily disable and then re-enable journaling for affected VM 
disks, which can be identified by:
 for A in $(rbd ls); do echo -n "$A: "; rbd --format=json journal status 
--image $A | jq '.active_set - .minimum_set'; done
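
The toggle itself is just the image-feature dance, roughly (pool/image are
placeholders; note that disabling journaling discards the outstanding journal
for that image):

  rbd feature disable <pool>/<image> journaling
  rbd feature enable <pool>/<image> journaling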


Any idea what is going wrong here?
This did not happen with the primary cluster running Mimic and the backup 
cluster running Octopus before, and also did not happen when both were running 
Mimic.

We plan to switch to snapshot-based mirroring in the future anyways, but it 
would be good to understand this strange issue in any case.

Cheers,
Oliver

[0] https://docs.ceph.com/en/octopus/rbd/rbd-mirroring/

--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Please guide us in identifying the cause of the data miss in EC pool

2024-08-12 Thread Best Regards
Hi Frédéric,


Thanks for your advice and suggestions.
The failure to identify the root cause of the data loss has a certain impact on 
our subsequent improvement measures. min_size does indeed need to be changed to 
K+1. We will also reevaluate the disaster recovery situation to better handle 
extreme scenarios. 



Thank you again.


Best regards.



Best Regards
wu_chu...@qq.com







   
Original Email

From: "Frédéric Nass" <frederic.n...@univ-lorraine.fr>

Sent: 2024/8/9 16:33

To: "Best Regards" <wu_chu...@qq.com>

Cc: "ceph-users" <ceph-users@ceph.io>

Subject: Re: Re: Re: Re: [ceph-users] Re: Please guide us in identifying the 
cause of the data miss in EC pool


Hi Chulin,


When it comes to data consistency, it's generally admitted that Ceph is an 
undefeated master.
 


Considering the very few (~100) rados objects that were completely lost (data 
and metadata) and the fact that you're using colocated HDD OSDs with volatile 
disk buffers caching RocksDB metadata and BlueStore data and metadata, I have 
little doubt that volatile disk buffers were involved in the data loss, whatever 
the logs say or don't say about which 6 of the 9 OSDs were in the acting set at 
the moment of the power outage.


Unless you're ok with facing data loss again, I'd advise you to fix the initial 
design flaws if you can: stop using non-persistent caches/buffers along 
the IO path, raise min_size to k+1, and reconsider data placement with regard to 
the risks of network partitioning, power outage and fire. Also, considering the 
ceph status, make sure you don't run out of disk space.
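
For the min_size part, that would be something along the lines of (a sketch,
assuming the 6+3 EC profile discussed earlier, hence k+1 = 7; the pool name is
a placeholder):

  ceph osd pool set <ec_pool_name> min_size 7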


Best regards,
Frédéric.





From: Best Regards
[...] https://tracker.ceph.com/issues/66942, it includes the 
original logs needed for troubleshooting. However, four days have passed 
without any response. In desperation, we are sending this email, hoping that 
someone from the Ceph team can guide us as soon as possible. 


 We are currently in a difficult situation and hope you can provide guidance. 
Thank you. 



 Best regards. 





 wu_chu...@qq.com 
 wu_chu...@qq.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD Journaling seemingly getting stuck for some VMs after upgrade to Octopus

2024-08-12 Thread Ilya Dryomov
On Mon, Aug 12, 2024 at 10:20 AM Oliver Freyermuth
 wrote:
>
> Dear Cephalopodians,
>
> we've successfully operated a "good old" Mimic cluster with primary RBD 
> images, replicated via journaling to a "backup cluster" with Octopus, for the 
> past years (i.e. one-way replication).
> We've now finally gotten around upgrading the cluster with the primary images 
> to Octopus (and plan to upgrade further in the near future).
>
> After the upgrade, all MON+MGR-OSD+rbd_mirror daemons are running 15.2.17.
>
> We run three rbd-mirror daemons which all share the following client with 
> auth in the "backup" cluster, to which they write:
>
>client.rbd_mirror_backup
>  caps: [mon] profile rbd-mirror
>  caps: [osd] profile rbd
>
> and the following shared client with auth in the "primary cluster" from which 
> they are reading:
>
>client.rbd_mirror
>  caps: [mon] profile rbd
>  caps: [osd] profile rbd
>
> i.e. the same auth as described in the docs[0].
>
> Checking on the primary cluster, we get:
>
> # rbd mirror pool status
>health: UNKNOWN
>daemon health: UNKNOWN
>image health: OK
>images: 288 total
>288 replaying
>
> For some reason, some values are "unknown" here. But mirroring seems to work, 
> as checking on the backup cluster reveals, see for example:
>
># rbd mirror image status zabbix-test.example.com-disk2
>  zabbix-test.example.com-disk2:
>  global_id:   1bdcb981-c1c5-4172-9583-be6a6cd996ec
>  state:   up+replaying
>  description: replaying, 
> {"bytes_per_second":8540.27,"entries_behind_primary":0,"entries_per_second":1.8,"non_primary_position":{"entry_tid":869176,"object_number":504,"tag_tid":1},"primary_position":{"entry_tid":11143,"object_number":7,"tag_tid":1}}
>  service: rbd_mirror_backup on rbd-mirror002.example.com
>  last_update: 2024-08-12 09:53:17
>
> However, we do in some seemingly random cases see that journals are never 
> advanced on the primary cluster — staying with the example above, on the 
> primary cluster I find the following:
>
># rbd journal status --image zabbix-test.physik.uni-bonn.de-disk2
>minimum_set: 1
>active_set: 126
>  registered clients:
>[id=, commit_position=[positions=[[object_number=7, tag_tid=1, 
> entry_tid=11143], [object_number=6, tag_tid=1, entry_tid=11142], 
> [object_number=5, tag_tid=1, entry_tid=11141], [object_number=4, tag_tid=1, 
> entry_tid=11140]]], state=connected]
>[id=52b80bb0-a090-4f7d-9950-c8691ed8fee9, 
> commit_position=[positions=[[object_number=505, tag_tid=1, entry_tid=869181], 
> [object_number=504, tag_tid=1, entry_tid=869180], [object_number=507, 
> tag_tid=1, entry_tid=869179], [object_number=506, tag_tid=1, 
> entry_tid=869178]]], state=connected]
>
> As you can see, the minimum_set was not advanced. As can be seen in "mirror 
> image status", it shows the strange output that non_primary_position seems 
> much more advanced than primary_position. This seems to happen "at random" 
> for only a few volumes...
> There are no other active clients apart from the actual VM (libvirt+qemu).

Hi Oliver,

Were the VM clients (i.e. librbd on the hypervisor nodes) upgraded as well?

>
> As a quick fix, to purge journals piling up over and over, we've only found 
> the "solution" to temporarily disable and then re-enable journaling for 
> affected VM disks, which can be identified by:
>   for A in $(rbd ls); do echo -n "$A: "; rbd --format=json journal status 
> --image $A | jq '.active_set - .minimum_set'; done
>
>
> Any idea what is going wrong here?
> This did not happen with the primary cluster running Mimic and the backup 
> cluster running Octopus before, and also did not happen when both were 
> running Mimic.

You might be hitting https://tracker.ceph.com/issues/57396.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD Journaling seemingly getting stuck for some VMs after upgrade to Octopus

2024-08-12 Thread Oliver Freyermuth

On 12.08.24 at 11:09, Ilya Dryomov wrote:

On Mon, Aug 12, 2024 at 10:20 AM Oliver Freyermuth
 wrote:


Dear Cephalopodians,

we've successfully operated a "good old" Mimic cluster with primary RBD images, 
replicated via journaling to a "backup cluster" with Octopus, for the past years (i.e. 
one-way replication).
We've now finally gotten around upgrading the cluster with the primary images 
to Octopus (and plan to upgrade further in the near future).

After the upgrade, all MON+MGR-OSD+rbd_mirror daemons are running 15.2.17.

We run three rbd-mirror daemons which all share the following client with auth in the 
"backup" cluster, to which they write:

client.rbd_mirror_backup
  caps: [mon] profile rbd-mirror
  caps: [osd] profile rbd

and the following shared client with auth in the "primary cluster" from which 
they are reading:

client.rbd_mirror
  caps: [mon] profile rbd
  caps: [osd] profile rbd

i.e. the same auth as described in the docs[0].

Checking on the primary cluster, we get:

# rbd mirror pool status
health: UNKNOWN
daemon health: UNKNOWN
image health: OK
images: 288 total
288 replaying

For some reason, some values are "unknown" here. But mirroring seems to work, 
as checking on the backup cluster reveals, see for example:

# rbd mirror image status zabbix-test.example.com-disk2
  zabbix-test.example.com-disk2:
  global_id:   1bdcb981-c1c5-4172-9583-be6a6cd996ec
  state:   up+replaying
  description: replaying, 
{"bytes_per_second":8540.27,"entries_behind_primary":0,"entries_per_second":1.8,"non_primary_position":{"entry_tid":869176,"object_number":504,"tag_tid":1},"primary_position":{"entry_tid":11143,"object_number":7,"tag_tid":1}}
  service: rbd_mirror_backup on rbd-mirror002.example.com
  last_update: 2024-08-12 09:53:17

However, we do in some seemingly random cases see that journals are never 
advanced on the primary cluster — staying with the example above, on the 
primary cluster I find the following:

# rbd journal status --image zabbix-test.physik.uni-bonn.de-disk2
minimum_set: 1
active_set: 126
  registered clients:
[id=, commit_position=[positions=[[object_number=7, tag_tid=1, 
entry_tid=11143], [object_number=6, tag_tid=1, entry_tid=11142], 
[object_number=5, tag_tid=1, entry_tid=11141], [object_number=4, tag_tid=1, 
entry_tid=11140]]], state=connected]
[id=52b80bb0-a090-4f7d-9950-c8691ed8fee9, 
commit_position=[positions=[[object_number=505, tag_tid=1, entry_tid=869181], 
[object_number=504, tag_tid=1, entry_tid=869180], [object_number=507, 
tag_tid=1, entry_tid=869179], [object_number=506, tag_tid=1, 
entry_tid=869178]]], state=connected]

As you can see, the minimum_set was not advanced. As can be seen in "mirror image 
status", it shows the strange output that non_primary_position seems much more advanced than 
primary_position. This seems to happen "at random" for only a few volumes...
There are no other active clients apart from the actual VM (libvirt+qemu).


Hi Oliver,

Were the VM clients (i.e. librbd on the hypervisor nodes) upgraded as well?


Hi Ilya,

"some of them" — as a matter of fact, we wanted to stress-test VM restarting 
and live migration first, and in some cases saw VMs stuck for a long time, which is now 
understandable...
 


As a quick fix, to purge journals piling up over and over, we've only found the 
"solution" to temporarily disable and then re-enable journaling for affected VM 
disks, which can be identified by:
   for A in $(rbd ls); do echo -n "$A: "; rbd --format=json journal status 
--image $A | jq '.active_set - .minimum_set'; done


Any idea what is going wrong here?
This did not happen with the primary cluster running Mimic and the backup 
cluster running Octopus before, and also did not happen when both were running 
Mimic.


You might be hitting https://tracker.ceph.com/issues/57396.


Indeed, it looks exactly like that, as we do fsfreeze+fstrim every night 
(before snapshotting) inside all VMs (via qemu-guest-agent).
Correlating affected VMs with upgraded hypervisors reveals that only those VMs 
running on hypervisors with Octopus clients seem affected,
and the issue easily explains why we saw problems with VM shutdown / restart or 
live migration (extreme slowness / VMs almost getting stuck). I can also 
confirm these problems seem to vanish when disabling journaling.

So many thanks, this does indeed explain a lot :-). It also means the bug is 
still present in Octopus, but fixed in Pacific and later.

We'll likely switch to snapshot-based mirroring in the next weeks (now that we 
know that this will avoid the problem), then finish the upgrade of all 
hypervisors to Octopus, and only then attack Pacific and later.

Cheers and many thanks,
Oliver


--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 78

[ceph-users] Re: RBD Journaling seemingly getting stuck for some VMs after upgrade to Octopus

2024-08-12 Thread Ilya Dryomov
On Mon, Aug 12, 2024 at 11:28 AM Oliver Freyermuth
 wrote:
>
> On 12.08.24 at 11:09, Ilya Dryomov wrote:
> > On Mon, Aug 12, 2024 at 10:20 AM Oliver Freyermuth
> >  wrote:
> >>
> >> Dear Cephalopodians,
> >>
> >> we've successfully operated a "good old" Mimic cluster with primary RBD 
> >> images, replicated via journaling to a "backup cluster" with Octopus, for 
> >> the past years (i.e. one-way replication).
> >> We've now finally gotten around upgrading the cluster with the primary 
> >> images to Octopus (and plan to upgrade further in the near future).
> >>
> >> After the upgrade, all MON+MGR-OSD+rbd_mirror daemons are running 15.2.17.
> >>
> >> We run three rbd-mirror daemons which all share the following client with 
> >> auth in the "backup" cluster, to which they write:
> >>
> >> client.rbd_mirror_backup
> >>   caps: [mon] profile rbd-mirror
> >>   caps: [osd] profile rbd
> >>
> >> and the following shared client with auth in the "primary cluster" from 
> >> which they are reading:
> >>
> >> client.rbd_mirror
> >>   caps: [mon] profile rbd
> >>   caps: [osd] profile rbd
> >>
> >> i.e. the same auth as described in the docs[0].
> >>
> >> Checking on the primary cluster, we get:
> >>
> >> # rbd mirror pool status
> >> health: UNKNOWN
> >> daemon health: UNKNOWN
> >> image health: OK
> >> images: 288 total
> >> 288 replaying
> >>
> >> For some reason, some values are "unknown" here. But mirroring seems to 
> >> work, as checking on the backup cluster reveals, see for example:
> >>
> >> # rbd mirror image status zabbix-test.example.com-disk2
> >>   zabbix-test.example.com-disk2:
> >>   global_id:   1bdcb981-c1c5-4172-9583-be6a6cd996ec
> >>   state:   up+replaying
> >>   description: replaying, 
> >> {"bytes_per_second":8540.27,"entries_behind_primary":0,"entries_per_second":1.8,"non_primary_position":{"entry_tid":869176,"object_number":504,"tag_tid":1},"primary_position":{"entry_tid":11143,"object_number":7,"tag_tid":1}}
> >>   service: rbd_mirror_backup on rbd-mirror002.example.com
> >>   last_update: 2024-08-12 09:53:17
> >>
> >> However, we do in some seemingly random cases see that journals are never 
> >> advanced on the primary cluster — staying with the example above, on the 
> >> primary cluster I find the following:
> >>
> >> # rbd journal status --image zabbix-test.physik.uni-bonn.de-disk2
> >> minimum_set: 1
> >> active_set: 126
> >>   registered clients:
> >> [id=, commit_position=[positions=[[object_number=7, tag_tid=1, 
> >> entry_tid=11143], [object_number=6, tag_tid=1, entry_tid=11142], 
> >> [object_number=5, tag_tid=1, entry_tid=11141], [object_number=4, 
> >> tag_tid=1, entry_tid=11140]]], state=connected]
> >> [id=52b80bb0-a090-4f7d-9950-c8691ed8fee9, 
> >> commit_position=[positions=[[object_number=505, tag_tid=1, 
> >> entry_tid=869181], [object_number=504, tag_tid=1, entry_tid=869180], 
> >> [object_number=507, tag_tid=1, entry_tid=869179], [object_number=506, 
> >> tag_tid=1, entry_tid=869178]]], state=connected]
> >>
> >> As you can see, the minimum_set was not advanced. As can be seen in 
> >> "mirror image status", it shows the strange output that 
> >> non_primary_position seems much more advanced than primary_position. This 
> >> seems to happen "at random" for only a few volumes...
> >> There are no other active clients apart from the actual VM (libvirt+qemu).
> >
> > Hi Oliver,
> >
> > Were the VM clients (i.e. librbd on the hypervisor nodes) upgraded as well?
>
> Hi Ilya,
>
> "some of them" — as a matter of fact, we wanted to stress-test VM restarting 
> and live migration first, and in some cases saw VMs stuck for a long time, 
> which is now understandable...
>
> >>
> >> As a quick fix, to purge journals piling up over and over, we've only 
> >> found the "solution" to temporarily disable and then re-enable journaling 
> >> for affected VM disks, which can be identified by:
> >>for A in $(rbd ls); do echo -n "$A: "; rbd --format=json journal status 
> >> --image $A | jq '.active_set - .minimum_set'; done
> >>
> >>
> >> Any idea what is going wrong here?
> >> This did not happen with the primary cluster running Mimic and the backup 
> >> cluster running Octopus before, and also did not happen when both were 
> >> running Mimic.
> >
> > You might be hitting https://tracker.ceph.com/issues/57396.
>
> Indeed, it looks exactly like that, as we do fsfreeze+fstrim every night 
> (before snapshotting) inside all VMs (via qemu-guest-agent).
> Correlating affected VMs with upgraded hypervisors reveals that only those 
> VMs running on hypervisors with Octopus clients seem affected,
> and the issue easily explains why we saw problems with VM shutdown / restart 
> or live migration (extreme slowness / VMs almost getting stuck). I can also 
> confirm these problems seem to vanish when disabling journaling.
>
> So many thanks, this does 

[ceph-users] Re: RBD Journaling seemingly getting stuck for some VMs after upgrade to Octopus

2024-08-12 Thread Oliver Freyermuth

On 12.08.24 at 12:16, Ilya Dryomov wrote:

On Mon, Aug 12, 2024 at 11:28 AM Oliver Freyermuth
 wrote:


On 12.08.24 at 11:09, Ilya Dryomov wrote:

On Mon, Aug 12, 2024 at 10:20 AM Oliver Freyermuth
 wrote:


Dear Cephalopodians,

we've successfully operated a "good old" Mimic cluster with primary RBD images, 
replicated via journaling to a "backup cluster" with Octopus, for the past years (i.e. 
one-way replication).
We've now finally gotten around upgrading the cluster with the primary images 
to Octopus (and plan to upgrade further in the near future).

After the upgrade, all MON+MGR-OSD+rbd_mirror daemons are running 15.2.17.

We run three rbd-mirror daemons which all share the following client with auth in the 
"backup" cluster, to which they write:

 client.rbd_mirror_backup
   caps: [mon] profile rbd-mirror
   caps: [osd] profile rbd

and the following shared client with auth in the "primary cluster" from which 
they are reading:

 client.rbd_mirror
   caps: [mon] profile rbd
   caps: [osd] profile rbd

i.e. the same auth as described in the docs[0].

Checking on the primary cluster, we get:

# rbd mirror pool status
 health: UNKNOWN
 daemon health: UNKNOWN
 image health: OK
 images: 288 total
 288 replaying

For some reason, some values are "unknown" here. But mirroring seems to work, 
as checking on the backup cluster reveals, see for example:

 # rbd mirror image status zabbix-test.example.com-disk2
   zabbix-test.example.com-disk2:
   global_id:   1bdcb981-c1c5-4172-9583-be6a6cd996ec
   state:   up+replaying
   description: replaying, 
{"bytes_per_second":8540.27,"entries_behind_primary":0,"entries_per_second":1.8,"non_primary_position":{"entry_tid":869176,"object_number":504,"tag_tid":1},"primary_position":{"entry_tid":11143,"object_number":7,"tag_tid":1}}
   service: rbd_mirror_backup on rbd-mirror002.example.com
   last_update: 2024-08-12 09:53:17

However, we do in some seemingly random cases see that journals are never 
advanced on the primary cluster — staying with the example above, on the 
primary cluster I find the following:

 # rbd journal status --image zabbix-test.physik.uni-bonn.de-disk2
 minimum_set: 1
 active_set: 126
   registered clients:
 [id=, commit_position=[positions=[[object_number=7, tag_tid=1, 
entry_tid=11143], [object_number=6, tag_tid=1, entry_tid=11142], 
[object_number=5, tag_tid=1, entry_tid=11141], [object_number=4, tag_tid=1, 
entry_tid=11140]]], state=connected]
 [id=52b80bb0-a090-4f7d-9950-c8691ed8fee9, 
commit_position=[positions=[[object_number=505, tag_tid=1, entry_tid=869181], 
[object_number=504, tag_tid=1, entry_tid=869180], [object_number=507, 
tag_tid=1, entry_tid=869179], [object_number=506, tag_tid=1, 
entry_tid=869178]]], state=connected]

As you can see, the minimum_set was not advanced. As can be seen in "mirror image 
status", it shows the strange output that non_primary_position seems much more advanced than 
primary_position. This seems to happen "at random" for only a few volumes...
There are no other active clients apart from the actual VM (libvirt+qemu).


Hi Oliver,

Were the VM clients (i.e. librbd on the hypervisor nodes) upgraded as well?


Hi Ilya,

"some of them" — as a matter of fact, we wanted to stress-test VM restarting 
and live migration first, and in some cases saw VMs stuck for a long time, which is now 
understandable...



As a quick fix, to purge journals piling up over and over, we've only found the 
"solution" to temporarily disable and then re-enable journaling for affected VM 
disks, which can be identified by:
for A in $(rbd ls); do echo -n "$A: "; rbd --format=json journal status 
--image $A | jq '.active_set - .minimum_set'; done


Any idea what is going wrong here?
This did not happen with the primary cluster running Mimic and the backup 
cluster running Octopus before, and also did not happen when both were running 
Mimic.


You might be hitting https://tracker.ceph.com/issues/57396.


Indeed, it looks exactly like that, as we do fsfreeze+fstrim every night 
(before snapshotting) inside all VMs (via qemu-guest-agent).
Correlating affected VMs with upgraded hypervisors reveals that only those VMs 
running on hypervisors with Octopus clients seem affected,
and the issue easily explains why we saw problems with VM shutdown / restart or 
live migration (extreme slowness / VMs almost getting stuck). I can also 
confirm these problems seem to vanish when disabling journaling.

So many thanks, this does indeed explain a lot :-). It also means the bug is 
still present in Octopus, but fixed in Pacific and later.

We'll likely switch to snapshot-based mirroring in the next weeks (now that we 
know that this will avoid the problem), then finish the upgrade of all 
hypervisors to Octopus, and only then attack Pacific and later.


Are any of your VM images clones (in "rbd snap creat

[ceph-users] Re: Ceph Logging Configuration and "Large omap objects found"

2024-08-12 Thread Eugen Block
I just played a bit more with the 'ceph log last' command. It doesn't  
have a large retention time; the messages get cleared out quickly, I  
suppose because they haven't changed. I'll take a closer look at whether  
and how that can be handled properly.
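
In the meantime, the cluster log is also persisted as a plain file on the MON
hosts, so grepping that is an option (a sketch, assuming cluster-log-to-file is
enabled; with cephadm the path usually contains the fsid):

  # on a MON host
  grep -i "large omap" /var/log/ceph/ceph.log*
  # cephadm deployments typically nest it under the cluster fsid
  grep -i "large omap" /var/log/ceph/<fsid>/ceph.log*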



Quoting Janek Bevendorff:



That's where the 'ceph log last' commands should help you out, but  
I don't know why you don't see it, maybe increase the number of  
lines to display or something?


BTW, which ceph version are we talking about here?


reef.

I tried ceph log last 100 debug cluster and that gives me the usual  
DBG spam that I otherwise see in the MON logs. But there are no  
messages above that level.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Search for a professional service to audit a CephFS infrastructure

2024-08-12 Thread Fabien Sirjean

Hello,

In a professional context, I'm looking for someone with strong CephFS 
expertise to help us audit our infrastructure.
We prefer an on-site audit, but are open to working remotely, and can 
provide any documentation or information required.


Please note that we are not currently in a blocking situation (the 
storage is functional and meets requirements), but we would like to 
anticipate an increase in workload and identify bottlenecks and areas 
for improvement.


Our infrastructure, located in Grenoble (France), consists of 12 servers 
in three server rooms, each server containing 60 HDD of 18.2 TiB, for a 
total of ~12 PiB raw and 4 PiB net (replicated size 3).
We currently store 1.72 PiB (net) of data, in a single CephFS filesystem 
with ~300 clients (linux kernel). A specific pool is dedicated to MDS, 
backed with NVMe OSDs.
Network is 2 x 25Gbps per server and 2 x 100Gbps between the three 
server rooms, with jumbo frames for the replication network.


This CephFS storage houses the scientific data produced by some fifty 
fundamental research instruments, as well as the analysis of these data.


Our questions mainly concern the configuration of the MDS (currently 6 
active MDS + 6 standby-replay), such as :

 - Best number of MDS for our workload
 - Pinning MDS to specific parts of the filesystem? (see the sketch below)
 - MDS cache size
 - CephFS client performance
 - ...
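
For context on the pinning question, the mechanism we have in mind is roughly 
the standard directory pinning (a sketch only; paths, ranks and the filesystem 
name are placeholders, not our actual layout):

  # pin a subtree to a given MDS rank, from a client mount
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/instruments
  setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/analysis
  # adjust the number of active MDS ranks
  ceph fs set <fs_name> max_mds 6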

If you'd like to keep in touch (especially commercially), don't hesitate 
to contact me at my work address: sirj...@ill.fr


Cheers,

--
Fabien Sirjean

Head of IT Infrastructures (DPT/SI/INFRA)
Institut Laue Langevin (ILL)
+33 (0)4 76 20 76 46 / +33 (0)6 62 47 52 80


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Identify laggy PGs

2024-08-12 Thread Boris
hmm.. will try that.

Thanks
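
For the record, a rough way to map currently laggy PGs to their acting primary
OSDs, plus the offline compaction mentioned below (a sketch; it assumes a
release that reports the "laggy" PG state, jq on the admin node, and that the
OSD is stopped before the offline compact):

  # count laggy PGs per acting-primary OSD
  ceph pg ls laggy -f json | jq -r '.pg_stats[].acting_primary' | sort | uniq -c | sort -rn
  # offline-compact a suspect OSD's RocksDB (stop the OSD first)
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact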

On Sat, 10 Aug 2024 at 13:33, Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com>:

> We either facing that.
> Have a look in the logs reported failed osd. I count the occurrence and
> offline compact those, it can help for a while. Normally for us compacting
> blocking the operation on it.
>
> Istvan
> --
> *From:* Boris 
> *Sent:* Saturday, August 10, 2024 5:30:54 PM
> *To:* ceph-users@ceph.io 
> *Subject:* [ceph-users] Identify laggy PGs
>
> Email received from the internet. If in doubt, don't click any link nor
> open any attachment !
> 
>
> Hi,
>
> currently we encouter laggy PGs and I would like to find out what is
> causing it.
> I suspect it might be one or more failing OSDs. We had flapping OSDs and I
> synced one out, which helped with the flapping, but it doesn't help with
> the laggy ones.
>
> Any tooling to identify or count PG performance and map that to OSDs?
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm and the "--data-dir" Argument

2024-08-12 Thread Alex Hussein-Kershaw (HE/HIM)
Hi Folks,

I'm trying to use the --data-dir argument of cephadm when bootstrapping a 
Storage Cluster. It looks like exactly what I need, where my use case is that I 
want to put data files onto a persistent disk, such that I can blow away my VMs 
while retaining the files.

Everything looks good and the bootstrap command completes. For reference I am 
running this command:

"sudo cephadm --image "ceph/squid:v19.1.0" --docker --data-dir 
/cephconfig/var/lib/ceph bootstrap --mon-ip 10.235.22.23 --ssh-user qs-admin 
--ssh-private-key /home/qs-admin/.ssh/id_rsa --ssh-public-key 
/home/qs-admin/.ssh/id_rsa.pub --output-dir /cephconfig/etc/ceph 
--skip-dashboard --skip-monitoring-stack  --skip-pull --config my.conf"

However, when I then try to continue with the deployment of my Storage Cluster, 
I find that I can't authenticate with the monitors. I run the suggested command 
to drop into a cephadm shell which then can't speak to the Storage Cluster. For 
example:

$ ceph -s
2024-08-12T10:47:07.862+ 7f998e59c640 -1 monclient(hunting): 
handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
[errno 13] RADOS permission denied (error connecting to the cluster)

In the MON logs at the same time I can see:
"cephx server client.admin: unexpected key: req.key=2c62e1471f111d12 
expected_key=d18ce06d18f116b4"

In the systemd unit files created I see:

...
ExecStart=/bin/bash 
/var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.run
ExecStop=-/bin/bash -c 'bash 
/var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.stop'
ExecStopPost=-/bin/bash 
/var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.poststop
...

Which does not contain my data directory. Looking at the source template it 
appears that it should:
ceph/src/cephadm/cephadmlib/templates/ceph.service.j2 at 
616fbc1b181ce15e49281553b35ca215d2aa1053 · ceph/ceph 
(github.com)

Manually modifying the unit file, reloading systemd and restarting the mon 
makes the authentication issue go away, although cephadm seems to be 
periodically rewriting my file and undoing the changes. Is there a templating 
bug in here? I note that there are no other variables being templated from the 
ctx in this jinja2 template so it seems likely it is broken.
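
For completeness, my manual patch-up looks roughly like this (a sketch only,
since cephadm keeps rewriting the unit file; the fsid is my cluster's and the
sed is deliberately naive):

  sudo sed -i 's|/var/lib/ceph/|/cephconfig/var/lib/ceph/|g' \
      /etc/systemd/system/ceph-64415fba-58b0-11ef-9d27-005056014e4f@.service
  sudo systemctl daemon-reload
  sudo systemctl restart 'ceph-64415fba-58b0-11ef-9d27-005056014e4f@mon.*'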

Many thanks,
Alex



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm and the "--data-dir" Argument

2024-08-12 Thread Adam King
Looking through the code it doesn't seem like this will work currently. I
found that the --data-dir arg to the cephadm binary was from the initial
implementation of the cephadm binary (so early that it was actually called
"ceph-daemon" at the time rather than "cephadm") but it doesn't look like
that worked included anything to connect it to the cephadm mgr module. So,
after bootstrapping the cluster, whenever the cephadm mgr module calls out
to the binary to deploy any daemon, it sets the data dir back to the
default, hence why you're seeing the unit files being overwritten. This
seems to be the case for all of the `--*-dir` parameters (unit, log,
sysctl, logrotate, and data). We are doing the planning session for the
next release tomorrow. I might add this as a topic to look into. But for
now, unfortunately, it simply won't work without a large amount of manual
effort.

On Mon, Aug 12, 2024 at 10:06 AM Alex Hussein-Kershaw (HE/HIM) <
alex...@microsoft.com> wrote:

> Hi Folks,
>
> I'm trying to use the --data-dir argument of cephadm when bootstrapping a
> Storage Cluster. It looks like exactly what I need, where my use case is
> that I want to put data files onto a persistent disk, such that I can blow
> away my VMs while retaining the files.
>
> Everything looks good and the bootstrap command completes. For reference I
> am running this command:
>
> "sudo cephadm --image "ceph/squid:v19.1.0" --docker --data-dir
> /cephconfig/var/lib/ceph bootstrap --mon-ip 10.235.22.23 --ssh-user
> qs-admin --ssh-private-key /home/qs-admin/.ssh/id_rsa --ssh-public-key
> /home/qs-admin/.ssh/id_rsa.pub --output-dir /cephconfig/etc/ceph
> --skip-dashboard --skip-monitoring-stack  --skip-pull --config my.conf"
>
> However, when I then try to continue with the deployment of my Storage
> Cluster, I find that I can't authenticate with the monitors. I run the
> suggested command to drop into a cephadm shell which then can't speak to
> the Storage Cluster. For example:
>
> $ ceph -s
> 2024-08-12T10:47:07.862+ 7f998e59c640 -1 monclient(hunting):
> handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
> [errno 13] RADOS permission denied (error connecting to the cluster)
>
> In the MON logs at the same time I can see:
> "cephx server client.admin: unexpected key: req.key=2c62e1471f111d12
> expected_key=d18ce06d18f116b4"
>
> In the systemd unit files created I see:
>
> ...
> ExecStart=/bin/bash
> /var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.run
> ExecStop=-/bin/bash -c 'bash
> /var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.stop'
> ExecStopPost=-/bin/bash
> /var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.poststop
> ...
>
> Which does not contain my data directory. Looking at the source template
> it appears that it should:
> ceph/src/cephadm/cephadmlib/templates/ceph.service.j2 at
> 616fbc1b181ce15e49281553b35ca215d2aa1053 · ceph/ceph (github.com)<
> https://github.com/ceph/ceph/blob/616fbc1b181ce15e49281553b35ca215d2aa1053/src/cephadm/cephadmlib/templates/ceph.service.j2#L22
> >
>
> Manually modifying the unit file, reloading systemd and restarting the mon
> makes the authentication issue go away, although cephadm seems to be
> periodically rewriting my file and undoing the changes. Is there a
> templating bug in here? I note that there are no other variables being
> templated from the ctx in this jinja2 template so it seems likely it is
> broken.
>
> Many thanks,
> Alex
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: Cephadm and the "--data-dir" Argument

2024-08-12 Thread Alex Hussein-Kershaw (HE/HIM)
Thanks Adam - noted, I expect we can make something else work to meet our needs 
here.

I don't know just how many monsters may be under the bed here - but if it's a 
fix that's appropriate for someone who doesn't know the Ceph codebase  (me) I'd 
be happy to have a look at implementing a fix.

Best Wishes,
Alex


From: Adam King 
Sent: Monday, August 12, 2024 4:05 PM
To: Alex Hussein-Kershaw (HE/HIM) 
Cc: ceph-users ; Joseph Silva 
Subject: [EXTERNAL] Re: [ceph-users] Cephadm and the "--data-dir" Argument

Looking through the code it doesn't seem like this will work currently. I found 
that the --data-dir arg to the cephadm binary was from the initial 
implementation of the cephadm binary (so early that it was actually called 
"ceph-daemon" at the time rather than "cephadm") but it doesn't look like that 
worked included anything to connect it to the cephadm mgr module. So, after 
bootstrapping the cluster, whenever the cephadm mgr module calls out to the 
binary to deploy any daemon, it sets the data dir back to the default, hence 
why you're seeing the unit files being overwritten. This seems to be the case 
for all of the `--*-dir` parameters (unit, log, sysctl, logrotate, and 
data). We are doing the planning session for the next release tomorrow. I might 
add this as a topic to look into. But for now, unfortunately, it simply won't 
work without a large amount of manual effort.

On Mon, Aug 12, 2024 at 10:06 AM Alex Hussein-Kershaw (HE/HIM) 
mailto:alex...@microsoft.com>> wrote:
Hi Folks,

I'm trying to use the --data-dir argument of cephadm when bootstrapping a 
Storage Cluster. It looks like exactly what I need, where my use case is that I 
want to put data files onto a persistent disk, such that I can blow away my VMs 
while retaining the files.

Everything looks good and the bootstrap command completes. For reference I am 
running this command:

"sudo cephadm --image "ceph/squid:v19.1.0" --docker --data-dir 
/cephconfig/var/lib/ceph bootstrap --mon-ip 10.235.22.23 --ssh-user qs-admin 
--ssh-private-key /home/qs-admin/.ssh/id_rsa --ssh-public-key 
/home/qs-admin/.ssh/id_rsa.pub --output-dir /cephconfig/etc/ceph 
--skip-dashboard --skip-monitoring-stack  --skip-pull --config my.conf"

However, when I then try to continue with the deployment of my Storage Cluster, 
I find that I can't authenticate with the monitors. I run the suggested command 
to drop into a cephadm shell which then can't speak to the Storage Cluster. For 
example:

$ ceph -s
2024-08-12T10:47:07.862+ 7f998e59c640 -1 monclient(hunting): 
handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
[errno 13] RADOS permission denied (error connecting to the cluster)

In the MON logs at the same time I can see:
"cephx server client.admin: unexpected key: req.key=2c62e1471f111d12 
expected_key=d18ce06d18f116b4"

In the systemd unit files created I see:

...
ExecStart=/bin/bash 
/var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.run
ExecStop=-/bin/bash -c 'bash 
/var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.stop'
ExecStopPost=-/bin/bash 
/var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.poststop
...

Which does not contain my data directory. Looking at the source template it 
appears that it should:
ceph/src/cephadm/cephadmlib/templates/ceph.service.j2 at 
616fbc1b181ce15e49281553b35ca215d2aa1053 · ceph/ceph 
(github.com)

Manually modifying the unit file, reloading systemd and restarting the mon 
makes the authentication issue go away, although cephadm seems to be 
periodically rewriting my file and undoing the changes. Is there a templating 
bug in here? I note that there are no other variables being templated from the 
ctx in this jinja2 template so it seems likely it is broken.

Many thanks,
Alex



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: Cephadm and the "--data-dir" Argument

2024-08-12 Thread Adam King
I think if it was locked in from bootstrap time it might not be that
complicated. We'd just have to store the directory paths in some
persistent location the module can access and make the cephadm mgr module
use them when calling out to the binary for any further actions. This does
have the slight issue that technically all the places cephadm uses for
storing persistent settings can be modified by the user (config options and
config-key store entries) although cephadm already has a number of other
config-key store entries it doesn't expect users to modify so that might be
fine. It gets more complicated if you allow users to change it post
bootstrap as we'd have to implement some kind of migration of the existing
dir on the hosts to the new location and make sure we don't make any calls
using the new location until the migration is completed. That would take
much more investigation.
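
For anyone curious, the kind of persistent state I mean can be inspected with
something like the following (illustrative only; the exact key names are
internal and vary by release):

  ceph config-key ls | grep mgr/cephadm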

On Mon, Aug 12, 2024 at 11:13 AM Alex Hussein-Kershaw (HE/HIM) <
alex...@microsoft.com> wrote:

> Thanks Adam - noted, I expect we can make something else work to meet our
> needs here.
>
> I don't know just how many monsters may be under the bed here - but if
> it's a fix that's appropriate for someone who doesn't know the Ceph
> codebase  (me) I'd be happy to have a look at implementing a fix.
>
> Best Wishes,
> Alex
>
> --
> *From:* Adam King 
> *Sent:* Monday, August 12, 2024 4:05 PM
> *To:* Alex Hussein-Kershaw (HE/HIM) 
> *Cc:* ceph-users ; Joseph Silva <
> t-josi...@microsoft.com>
> *Subject:* [EXTERNAL] Re: [ceph-users] Cephadm and the "--data-dir"
> Argument
>
> Looking through the code it doesn't seem like this will work currently. I
> found that the --data-dir arg to the cephadm binary was from the initial
> implementation of the cephadm binary (so early that it was actually called
> "ceph-daemon" at the time rather than "cephadm") but it doesn't look like
> that worked included anything to connect it to the cephadm mgr module. So,
> after bootstrapping the cluster, whenever the cephadm mgr module calls out
> to the binary to deploy any daemon, it sets the data dir back to the
> default, hence why you're seeing the unit files being overwritten. This
> seems to be the case for all of the `--*-dir` parameters (unit, log,
> sysctl, logrotate, and data). We are doing the planning session for the
> next release tomorrow. I might add this as a topic to look into. But for
> now, unfortunately, it simply won't work without a large amount of manual
> effort.
>
> On Mon, Aug 12, 2024 at 10:06 AM Alex Hussein-Kershaw (HE/HIM) <
> alex...@microsoft.com> wrote:
>
> Hi Folks,
>
> I'm trying to use the --data-dir argument of cephadm when bootstrapping a
> Storage Cluster. It looks like exactly what I need, where my use case is
> that I want to put data files onto a persistent disk, such that I can blow
> away my VMs while retaining the files.
>
> Everything looks good and the bootstrap command completes. For reference I
> am running this command:
>
> "sudo cephadm --image "ceph/squid:v19.1.0" --docker --data-dir
> /cephconfig/var/lib/ceph bootstrap --mon-ip 10.235.22.23 --ssh-user
> qs-admin --ssh-private-key /home/qs-admin/.ssh/id_rsa --ssh-public-key
> /home/qs-admin/.ssh/id_rsa.pub --output-dir /cephconfig/etc/ceph
> --skip-dashboard --skip-monitoring-stack  --skip-pull --config my.conf"
>
> However, when I then try to continue with the deployment of my Storage
> Cluster, I find that I can't authenticate with the monitors. I run the
> suggested command to drop into a cephadm shell which then can't speak to
> the Storage Cluster. For example:
>
> $ ceph -s
> 2024-08-12T10:47:07.862+ 7f998e59c640 -1 monclient(hunting):
> handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
> [errno 13] RADOS permission denied (error connecting to the cluster)
>
> In the MON logs at the same time I can see:
> "cephx server client.admin: unexpected key: req.key=2c62e1471f111d12
> expected_key=d18ce06d18f116b4"
>
> In the systemd unit files created I see:
>
> ...
> ExecStart=/bin/bash
> /var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.run
> ExecStop=-/bin/bash -c 'bash
> /var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.stop'
> ExecStopPost=-/bin/bash
> /var/lib/ceph/64415fba-58b0-11ef-9d27-005056014e4f/%i/unit.poststop
> ...
>
> Which does not contain my data directory. Looking at the source template
> it appears that it should:
> ceph/src/cephadm/cephadmlib/templates/ceph.service.j2 at
> 616fbc1b181ce15e49281553b35ca215d2aa1053 · ceph/ceph (github.com)<
> https://github.com/ceph/ceph/blob/616fbc1b181ce15e49281553b35ca215d2aa1053/src/cephadm/cephadmlib/templates/ceph.service.j2#L22
> >
>
> Manually modifying the unit file, reloading systemd and restarting the mon
> makes the authentication issue go away, although cephadm seems to be
> periodically rewriting my file and undoing the changes. Is there a
> templating bug in here? I note that there are no other variables being
> templat

[ceph-users] Important Community Updates [Ceph Developer Summit, Cephalocon]

2024-08-12 Thread Noah Lehman
Hi Ceph community,

I have two important updates to share with you about the Ceph Developer
Summit and Cephalocon 2024.

*Ceph Developer Summit*
The Ceph Developer Summit has been extended until August 20th. Find
everything you need to know about the event, including the program and
other important details, here.

*Cephalocon 2024*
The CFP deadline to submit a speaking proposal has been extended until
August 18th. If you're interested in participating as a speaker at the
event, you can send in a proposal here.

Thanks everyone!

Noah
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Stable and fastest ceph version for RBD cluster.

2024-08-12 Thread Özkan Göksu
Hello folks!

I built a cluster in 2020 and it has been working great with
Nautilus 14.2.16 for the past 4 years.
I have 1000++ RBD drives for VM's running on Samsung MZ7LH3T8HMLT drives.

Now I want to upgrade the ceph version with a fresh installation and I want
to take your opinion on which version will be the best choice for me. I
want to upgrade it once and I won't touch it again minimum of 2 years.

Does anyone have RBD performance comparison from nautilus, octopus, pacific
and quincy?
I just want to learn the important changes and benefits of this upgrade.

Best regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stable and fastest ceph version for RBD cluster.

2024-08-12 Thread Mark Nelson

Hi Özkan,

I've written a couple of articles that might be helpful:

https://ceph.io/en/news/blog/2023/reef-osds-per-nvme/
https://ceph.io/en/news/blog/2023/reef-freeze-rbd-performance/
https://ceph.io/en/news/blog/2023/reef-freeze-rgw-performance/
https://ceph.io/en/news/blog/2024/ceph-a-journey-to-1tibps/

There have been a lot of improvements since Nautilus, but some of the 
biggest revolve around bluestore cache handling, memory management, 
better RocksDB tuning, RocksDB column families, allocation metadata, and 
threading improvements (both OSD and client side).  There has also been 
a significant (though unavoidable) regression due to a fix in RocksDB 
for a potential data corruption issue.


At Clyso we're still deploying Quincy (with some of the new tunings from 
Reef and Squid).  Reef hasn't gotten a ton of updates so I suspect we'll 
probably jump straight from Quincy to Squid once it's gotten a 
point release or two.  Hope that helps!


Mark

On 8/12/24 12:18, Özkan Göksu wrote:

Hello folks!

I built a cluster in 2020 and it has been working great with
Nautilus 14.2.16 for the past 4 years.
I have 1000++ RBD drives for VM's running on Samsung MZ7LH3T8HMLT drives.

Now I want to upgrade the ceph version with a fresh installation and I want
to take your opinion on which version will be the best choice for me. I
want to upgrade it once and I won't touch it again minimum of 2 years.

Does anyone have RBD performance comparison from nautilus, octopus, pacific
and quincy?
I just want to learn the important changes and benefits of this upgrade.

Best regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: squid 19.1.1 RC QE validation status

2024-08-12 Thread Laura Flores
Hey @Adam King , can you take a look at this tracker?
https://tracker.ceph.com/issues/66883#note-26 I summarized the full issue
in the last note. I believe it is an orch problem blocking the
upgrade tests, and I would like to hear your thoughts.

On Fri, Aug 9, 2024 at 9:14 AM Adam King  wrote:

> orch approved
>
> On Mon, Aug 5, 2024 at 4:33 PM Yuri Weinstein  wrote:
>
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/67340#note-1
> >
> > Release Notes - N/A
> > LRC upgrade - N/A
> > Gibba upgrade -TBD
> >
> > Seeking approvals/reviews for:
> >
> > rados - Radek, Laura (https://github.com/ceph/ceph/pull/59020 is being
> > tested and will be cherry-picked when ready)
> >
> > rgw - Eric, Adam E
> > fs - Venky
> > orch - Adam King
> > rbd, krbd - Ilya
> >
> > quincy-x, reef-x - Laura, Neha
> >
> > powercycle - Brad
> > crimson-rados - Matan, Samuel
> >
> > ceph-volume - Guillaume
> >
> > Pls let me know if any tests were missed from this list.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io