Hi Eugen,

just to add another odd observation from long ago:
https://www.spinics.net/lists/ceph-users/msg74655.html. I didn't see any
reweights in your trees, so it's something else. However, there seem to be
multiple issues with EC pools and peering.

I also want to clarify:

> If this is the case, it is possible that this is partly intentional and 
> partly buggy.

"Partly intentional" here means the code behaviour changes when you add OSDs to 
the root outside the rooms and this change is not considered a bug. It is 
clearly *not* expected as it means you cannot do maintenance on a pool living 
on a tree A without affecting pools on the same device class living on an 
unmodified subtree of A.
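
To illustrate what I mean by "OSDs in the root outside the rooms", the tree
looks schematically like this after adding (IDs, names and weights invented):

  ID  CLASS  WEIGHT  TYPE NAME
  -1          48.0   root default
  -2          20.0       room room1
  -3          20.0       room room2
  -9           8.0       host node13   <- new host, outside the rooms
  24  hdd      4.0           osd.24
  25  hdd      4.0           osd.25

The rules that descend into the rooms never select osd.24/25, yet according
to what you observe their mere presence under the root already changes the
choice order.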

From a ceph user's point of view, everything you observe looks buggy. I
would really like to see a good explanation of why the mappings in the
subtree *should* change when adding OSDs above that subtree, as in your
case, when the expectation, for good reasons, is that they don't. This would
help devise clean procedures for adding hosts when you (and I) want to add
OSDs first, without any peering, and then move the OSDs into place, so that
the data movement happens separately from the adding and not as a total mess
with everything in parallel.
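
For concreteness, the kind of procedure I have in mind would look roughly
like this (an untested sketch, bucket and host names invented):

  # create a disjoint staging root; rules starting at root=default
  # never see anything placed under it
  ceph osd crush add-bucket staging root

  # deploy the new host's OSDs with a crush location under that root,
  # e.g. "crush location = root=staging host=node13" in its ceph.conf,
  # so they come up and in without touching the production tree

  # once all OSDs are up, move the host into its final position in one
  # step, so peering happens as a single, separate event
  ceph osd crush move node13 root=default room=room1

The point of the staging root is that the production rules cannot reach it,
so the first two steps should not trigger any peering at all.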

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <fr...@dtu.dk>
Sent: Thursday, May 23, 2024 6:32 PM
To: Eugen Block
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: unknown PGs after adding hosts in different subtree

Hi Eugen,

I'm at home now. Could you please check that none of the remapped PGs have
shards on the new OSDs, i.e. that this is just shuffling mappings around
within the same set of OSDs under the rooms?
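
Something along these lines should do for the check (a sketch; it assumes
the new OSDs are osd.100 to osd.111, that jq is available, and that your
ceph version puts the output of "ceph pg ls" under "pg_stats"):

  # print every remapped PG whose up or acting set contains one of
  # the new OSDs
  ceph pg ls remapped -f json | jq -r '
    .pg_stats[]
    | select((.up + .acting) | map(. >= 100 and . <= 111) | any)
    | .pgid'

If this prints nothing, the remapped PGs are indeed just shuffled within
the old set of OSDs.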

If this is the case, it is possible that this is partly intentional and
partly buggy. The remapping is then probably intentional, and the method I
use with a disjoint tree for new hosts prevents such remappings initially
(the crush code sees the new OSDs in the root and doesn't use them, but
their presence does change choice orders, resulting in remapped PGs).
However, the unknown PGs should clearly not occur.

I'm afraid the peering code has quite a few bugs; I reported something at
least similarly weird a long time ago: https://tracker.ceph.com/issues/56995
and https://tracker.ceph.com/issues/46847. They might even be related. It
looks like peering can lose track of PG members in certain situations
(specifically, after adding OSDs and until rebalancing has completed). In my
case, I get degraded objects even though everything is obviously still
around. Flipping between the crush maps from before and after the change
re-discovers everything again.
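
By flipping I mean something like this (a sketch; setting a crush map
triggers peering, so use with care):

  # dump the map before the change, then again after it
  ceph osd getcrushmap -o crush.before   # run before adding the OSDs
  ceph osd getcrushmap -o crush.after    # run after adding the OSDs

  # going back and forth makes peering re-discover the "degraded"
  # objects that are obviously still there
  ceph osd setcrushmap -i crush.before
  ceph osd setcrushmap -i crush.after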

Issue 46847 is long-standing and still unresolved. In case you need to file
a tracker, please consider referring to the two issues above as "might be
related" if you deem them relevant.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io