[389-users] Re: Determining max CSN of running server

Thierry Bordaz Fri, 01 Mar 2024 02:12:47 -0800


On 2/29/24 21:31, William Faulk wrote:

Thanks, Pierre and Thierry.


After quite some time of poring over these debug logs, I've found some 
anomalies and they seem like they're matching up with the idea that the 
affected replica isn't updating its own RUV correctly.

The logs show a change being made, and it lists the CSN of the change. The 
first anomalies are here, but they probably aren't terribly significant. The 
CSN includes a timestamp, and the timestamp on this CSN is 11 hours into the 
future from when the change was made and logged. Also, the next part of the CSN 
is supposed to be a serial number for when there are changes made during the 
same second of the timestamp. In the case I was looking at, that serial was 
0xb231. I'm certain that this replica didn't record another 45000 changes in 
that second.


Hi William,

Are you running DS on a VM, in a container, or on hardware?

The fact that the CSN timestamp is some time in the future is not frequent
but can happen. Generated CSNs should always be increasing, so CSN generation
adjusts its timestamp based on the CSNs it receives. What looks weird is the
value of the serial number. Do you have a full error log sample where we can
see the sequence number moving to such a high number (0xb231)?
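To check the timestamp and serial number of a logged CSN by hand, a small
decoder can help. This is only a sketch, assuming the usual 20-hex-digit CSN
string layout (8 digits of timestamp in seconds since the epoch, 4 digits of
sequence number, 4 digits of replica ID, 4 digits of sub-sequence number).
The names parse_csn and csn_time and the sample CSN are made up for
illustration; only the 0xb231 sequence number comes from this thread.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int   # seconds since the Unix epoch
    seqnum: int      # serial number for changes within the same second
    rid: int         # replica ID that generated the change
    subseqnum: int   # sub-sequence number


def parse_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields (assumed layout)."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}: {text!r}")
    return CSN(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        rid=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


def csn_time(csn: CSN) -> datetime:
    """Return the CSN timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(csn.timestamp, tz=timezone.utc)


if __name__ == "__main__":
    # Hypothetical CSN; the sequence number field is the 0xb231 discussed above.
    sample = parse_csn("65e0a1b0b23100030000")
    print(csn_time(sample).isoformat())
    print(sample.seqnum)  # 45617
    print(sample.rid)     # 3
```

Comparing csn_time() with the time the change was logged shows how far ahead
the CSN generator is running.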


Then it shows the server committing the change to the changelog. It shows it 
"processing data" for over 16000 other CSNs, and it takes about 25 seconds to 
complete.

It then starts a replication session with the peer and prints out the peer's 
(consumer's) RUV and then its own (supplier's) RUV. The RUV it prints out for 
itself shows the maxCSN for itself with a timestamp from almost 4 months ago. 
It is greater than the maxCSN for itself in the consumer's RUV, though, by a 
little. (The replicagenerations are equal, though.)

IIUC the consumer is currently catching up. Is the RUV on the consumer
evolving?
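One way to see whether the consumer's RUV is evolving is to capture its
nsds50ruv values at two points in time, or the supplier's and the consumer's
together, and compare the maxCSN per replica ID. The sketch below assumes RUV
elements of the form "{replica <rid> <url>} <minCSN> <maxCSN>" and the
20-hex-digit CSN layout; the function names, hostnames and CSN values are
hypothetical.

```python
import re

# "{replica <rid> <url>} <minCSN> <maxCSN>"; the URL and the CSN pair are optional.
RUV_ELEMENT = re.compile(
    r"^\{replica\s+(\d+)(?:\s+(\S+))?\}"
    r"(?:\s+([0-9a-fA-F]{20})\s+([0-9a-fA-F]{20}))?"
)


def max_csns(ruv_values):
    """Map replica ID -> maxCSN string for every element that carries CSNs."""
    result = {}
    for value in ruv_values:
        match = RUV_ELEMENT.match(value.strip())
        if match is None or match.group(4) is None:
            continue  # {replicageneration} line, or a replica with no CSNs yet
        result[int(match.group(1))] = match.group(4).lower()
    return result


def csn_key(csn):
    """Order CSNs by timestamp, sequence number, replica ID, sub-sequence."""
    return tuple(int(csn[i:i + 4 if i else 8], 16) for i in (0, 8, 12, 16))


def lagging_replicas(supplier_ruv, consumer_ruv):
    """Return {rid: (consumer_max, supplier_max)} where the consumer is behind."""
    supplier = max_csns(supplier_ruv)
    consumer = max_csns(consumer_ruv)
    behind = {}
    for rid, s_csn in supplier.items():
        c_csn = consumer.get(rid)
        if c_csn is None or csn_key(s_csn) > csn_key(c_csn):
            behind[rid] = (c_csn, s_csn)
    return behind


if __name__ == "__main__":
    supplier = [
        "{replicageneration} 65000000000000000000",
        "{replica 1 ldap://a.example.com:389} 65e0a1b0000000010000 65e0a1b0000500010000",
        "{replica 2 ldap://b.example.com:389} 65e0a1b0000000020000 65e0a1c0000000020000",
    ]
    consumer = [
        "{replicageneration} 65000000000000000000",
        "{replica 1 ldap://a.example.com:389} 65e0a1b0000000010000 65e0a1b0000300010000",
        "{replica 2 ldap://b.example.com:389} 65e0a1b0000000020000 65e0a1c0000000020000",
    ]
    print(lagging_replicas(supplier, consumer))
```

If the same replica ID stays in the result across several captures while
changes are being made, the consumer's RUV is not advancing for that replica.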


It then claims to send 7 changes, all of which are skipped because "empty". It then 
claims that there are "No more updates to send" and releases the consumer and eventually 
closes the connection.

Do you have fractional replication? (Some attributes are skipped from
replication.)


I like the idea that there's a list of pending operations that's blocking RUV updates. Is 
there any way for me to examine this list? That said, I do think it updated its own 
maxCSN in its own RUV by a few hours. The peer I'm looking at does seem to reflect the 
increased maxCSN for the bad replica in the RUV I can see in the "mapping 
tree". I've tried to reproduce this small update, but haven't been able to yet.

Difficult to say. The pending list likely has a different meaning in my
understanding.


I also have another replica that seems to be experiencing the same problem, and 
I've restarted it with no improvement in symptoms. It might be different, 
though. It doesn't look like it discarded its changelog.

I definitely don't relish reinitializing from this bad replica, though. I'd 
have to perform a rolling reinitialization throughout our whole environment, 
and it takes ages and a lot of effort.

--
_______________________________________________
389-users mailing list -- 389-users@lists.fedoraproject.org
To unsubscribe send an email to 389-users-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-users@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
