Re: Issue with Both Diskful Nodes Being Outdated in DRBD9

2025-01-15 Thread Joel Colledge
Hi Rui, This should be fixed by the following commit from Phil: https://github.com/LINBIT/drbd/commit/d8214d47d13a4c9b5c8f8cae7989de7996983688 Please test it to ensure that it fixes your precise scenario. Best regards, Joel On Tue, 7 Jan 2025 at 17:12, Joel Colledge wrote: > Hi Rui, >
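For anyone who wants to try the fix before it lands in a tagged release, a rough sketch of building the out-of-tree DRBD module from that commit (the steps below are the usual ones from the repository README and are only an assumption about your environment; kernel headers for the running kernel must be installed):

    git clone --recursive https://github.com/LINBIT/drbd.git
    cd drbd
    git checkout d8214d47d13a4c9b5c8f8cae7989de7996983688
    make                # builds the module against the running kernel (may need KDIR=... on some setups)
    sudo make install   # then unload and reload the drbd module to pick up the new build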

Re: Issue with Both Diskful Nodes Being Outdated in DRBD9

2025-01-07 Thread Joel Colledge
Hi Rui, > I installed DRBD 9.2.12 and retested, but the issue persists. > > I think the logic of this problem is quite clear. First, an Inconsistent > replication serving as a sync target can be promoted to the primary when it > is connected to an uptodate replication. Next, if the connection wi

Re: Issue with Both Diskful Nodes Being Outdated in DRBD9

2024-12-10 Thread Joel Colledge
Hello Rui, Thank you for the clear report. > I am using DRBD 9.2.8 Please test again with DRBD 9.2.12. There have been some improvements in this area since DRBD 9.2.8 such as: 44cb5fa46478 drbd: Avoid wrong outdates in far_away_change() It is possible that the problem you have discovered is fix
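A quick, hedged sketch for confirming which DRBD module is actually loaded before and after the upgrade (these are standard commands, independent of any resource configuration):

    cat /proc/drbd      # first line shows the loaded kernel module version, e.g. 9.2.x
    drbdadm --version   # shows both the userspace (drbd-utils) and kernel module versions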

Re: Kernel Panic with 9.2.8

2024-04-12 Thread Joel Colledge
Hallo Aleksandr, Thanks for the report. We are looking into this issue and have an idea of what may be causing it. The issue appears to be related to Kubernetes. In particular, to some unusual actions taken by the CSI driver. There is a workaround available in the CSI driver that avoids these unu

Re: Unsynced blocks if replication is interrupted during initial sync

2024-03-20 Thread Joel Colledge
> We are still seeing the issue as described but perhaps I am not putting the > invalidate > at the right spot > > Note - I've added it at step 6 below, but I'm wondering if it should be after > the additional node is configured and adjusted (in which case I would need to > unmount as apparently y
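For reference, the command being discussed is just the following (a sketch; <resource> is a placeholder). It discards the local data and forces a full resync from an UpToDate peer, so it must be run on the node whose data you intend to throw away:

    drbdadm invalidate <resource>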

Re: Unsynced blocks if replication is interrupted during initial sync

2024-03-19 Thread Joel Colledge
Hi Tim, Thanks for the report and your previous one with the subject line "Blocks fail to sync if connection lost during initial sync". This does look like a bug. I can reproduce the issue. It appears to be a regression introduced in drbd-9.1.16 / drbd-9.2.5. Specifically, with the following comm

Re: [DRBD-user] Resources outdated but Current UUIDs match and Bitmap UUIDs are clean

2023-06-20 Thread Joel Colledge
Hi Andrei, Which DRBD version is running here? Which version was running previously, where this issue was not observed? Best regards, Joel

Re: [DRBD-user] drbd9.2 resync stuck with drbd_set_in_sync: sector=<...>s size=<...> nonsense!

2022-10-25 Thread Joel Colledge
Dear Nils, > The third resource however did sync about 65% of the outdated data and > then stalled (no more sync traffic, no progress in drbdmon) > > The kernel message that seems to be relevant here is this: > > drbd vm-101-disk-1/0 drbd1001: drbd_set_in_sync: sector=73703424s > size=134479872 no

Re: [DRBD-user] corrupted resource can't be fixed by rolling back to old snapshot

2022-08-02 Thread Joel Colledge
Hi Michael, Are you using the most recent version of drbd-utils? There have been a few fixes over the years which might be related. Perhaps the hardware problems affected the metadata long ago and now the corrupted metadata is present in all the snapshots. If that is not the case, this looks to
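A hedged sketch for checking the installed drbd-utils version and inspecting a resource's on-disk metadata (<resource> is a placeholder; dump-md generally needs the resource to be down or detached so the metadata can be read):

    drbdadm --version           # userspace (drbd-utils) and kernel module versions
    drbdadm dump-md <resource>  # prints the on-disk metadata for inspection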

Re: [DRBD-user] Adding a second volume to a resource and making it UpToDate

2021-09-01 Thread Joel Colledge
> Thank you for your reply, Joel! I'm a little confused. If I understand > you correctly, what I should have done is: > > pcs property set maintenance-mode=true > pcs cluster standby storage1 > > How would the combination of putting Pacemaker in maintenance mode and > then trying to standby a clu

Re: [DRBD-user] Adding a second volume to a resource and making it UpToDate

2021-08-31 Thread Joel Colledge
Hi Bryan, > In the very last post in this thread there is this: > > "DRBD requires some intervention to enable the volume. The > simplest method to get the new volume working would be to demote > the resource to the "Secondary" role and then promote it to the > "Primary" role again, using drbdadm
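For reference, the demote/promote cycle described there is just the following (a sketch; "r0" is a placeholder resource name, and anything using the existing volume, such as a mounted filesystem, has to be stopped first so the demote can succeed):

    drbdadm secondary r0   # demote; fails while the device is still in use
    drbdadm primary r0     # promote again; the new volume should now be usable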

Re: [DRBD-user] drbd 9.1.1 whole cluster blocked

2021-05-27 Thread Joel Colledge
> No ko-count set, so apparently something different... ko-count is enabled by default (with value "7"). Have you explicitly disabled it? Your description does sound very similar to the issue that has been fixed as Rene mentioned. Regards, Joel ___ Star
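A quick way to check what ko-count is actually in effect for a resource (a sketch; "r0" is a placeholder, and --show-defaults makes drbdsetup print options that are still at their default values):

    drbdsetup show --show-defaults r0 | grep ko-count

Disabling the mechanism would mean explicitly setting ko-count to 0 in the resource's net section.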

Re: [DRBD-user] 300% latency difference between protocol A and B with NVMe

2020-11-24 Thread Joel Colledge
Hi Wido, These results are not too surprising. Consider the steps involved in a protocol C write. Note that tcp_lat is one way latency, so we get: Send data to peer: 13.3 us (perhaps more, if qperf was testing with a size less than 4K) Write on peer: 1s / 32200 == 31.1 us Confirmation of write fr
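As a back-of-the-envelope check of the per-write figure quoted above (only the numbers given in the message are used here):

    # 1 second divided by the 32200 writes/s measured on the peer:
    awk 'BEGIN { printf "%.1f us\n", 1e6 / 32200 }'   # prints 31.1 us

If the confirmation from the peer costs another one-way network trip, the quoted steps alone already add up to roughly 13.3 + 31.1 + 13.3, i.e. about 58 us per write, before anything DRBD-internal is counted.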

Re: [DRBD-user] Upgrade DRBD 9.0.19-1 to 9.0.20-1 on Debian 9

2019-10-31 Thread Joel Colledge
Hi Anthony, On Thu, Oct 31, 2019 at 7:18 AM Anthony Frnog wrote: > When I upgrade DRBD from 9.0.19-1 to 9.0.20-1 on Debian 9, the DRBD cluster > seems "break" Yes, we are aware of this issue and have prepared a solution to it internally. It only affects kernels in the 4.8 and 4.9 series. Unfort

[DRBD-user] drbd-10.0.0a1

2019-08-05 Thread Joel Colledge
Hi, The first drbd-10.0 alpha release is out. We are working on some major new features in DRBD, so we have created a drbd-9.0 stable branch and upcoming releases from master will belong to the drbd-10.0 series. The main changes for drbd-10.0 so far are: * Reduced lock contention. The wide-reachi

Re: [DRBD-user] drbd-9.0.18-1 : BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0

2019-06-12 Thread Joel Colledge
s: dax > > Regards, > Rob > > > On 6/12/19 9:50 AM, Joel Colledge wrote: > > Hi Rob, > > This is strange, since the filesystem DAX access uses essentially the same > checks as DRBD does. You can get more detail about the failure by doing the > mount test again aft

Re: [DRBD-user] drbd-9.0.18-1 : BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0

2019-06-12 Thread Joel Colledge
g > off DAX. > [518270.835599] EXT4-fs (dm-16): mounted filesystem with ordered data > mode. Opts: dax > > Regards, > Rob > > > On 6/11/19 3:57 PM, Joel Colledge wrote: > > Hi Rob, > > This is strange. It seems that your LV is reporting that it supports > PMEM/

Re: [DRBD-user] drbd-9.0.18-1 : BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0

2019-06-11 Thread Joel Colledge
Hi Rob, This is strange. It seems that your LV is reporting that it supports PMEM/DAX. I suggest that you check that this issue also occurs without DRBD. For example, create a filesystem and try to mount it with DAX enabled: mkfs.ext4 -b 4K /dev/ mount -o dax /dev/ Then check the syslog to see wh
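Spelling that test out with a hypothetical device name and mount point (both placeholders for your actual LV; the block size is given in bytes here):

    mkfs.ext4 -b 4096 /dev/pmem0
    mkdir -p /mnt/daxtest
    mount -o dax /dev/pmem0 /mnt/daxtest
    dmesg | tail    # check whether the kernel accepted or rejected DAX for this mount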

Re: [DRBD-user] drbd-9.0.18-1 : BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0

2019-06-11 Thread Joel Colledge
m? What backing device are you using for DRBD? In case you have external metadata - what backing device are you using for the DRBD metadata? Is this a PMEM device? Just before the crash, you should see "meta-data IO uses: ..." in your kernel log. Please provide this log line. Joel Colle
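To pull that line out of the kernel log (a sketch; journalctl works as an alternative if the dmesg buffer has already wrapped):

    dmesg | grep 'meta-data IO uses'
    journalctl -k | grep 'meta-data IO uses'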

Re: [DRBD-user] LINSTOR snapshots problem

2018-09-21 Thread Joel Colledge
ore' just creates volume definitions matching those at the time of the snapshot. Best regards, -- Joel Colledge