Hi,

I still have no solution for this. I tried different versions of DRBD: 9.2.2, 9.2.1, 9.1.12. I also tried the third node as a diskful node, with no success; two primaries with a split brain is the result.
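For completeness, the quorum-related options in play look like this (a minimal sketch following the documented pattern, not a verbatim copy of my attached store1.res):

```
resource store1 {
    options {
        quorum majority;        # a partition needs a majority of nodes for I/O
        on-no-quorum io-error;  # alternatively: suspend-io
    }
    # ... node, connection and volume definitions omitted ...
}
```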
Nevertheless the documentation clearly describes how it SHOULD work:
https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-configuring-quorum

4.21.3. Using a Diskless Node as a Tiebreaker
"The connection between the primary and secondary has failed, and the
application is continuing to run on the primary, when the primary suddenly
loses its connection to the diskless node. In this case, no node can be
promoted to primary and the cluster cannot continue to operate."

But I never achieved that behavior; after the above situation I ended up with
two primaries. I'm still wondering if I missed an important configuration
option.

I watched the status with "drbdsetup events2" before and after the secondary
became primary; it states "may_promote:no" (comments from me start with #):

exists resource name:store1 role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10102
exists connection name:store1 peer-node-id:2 conn-name:perf3 connection:Connected role:Secondary
exists connection name:store1 peer-node-id:0 conn-name:perf1 connection:Connected role:Primary
exists device name:store1 volume:0 minor:3 backing_dev:/dev/nvme/store1 disk:UpToDate client:no quorum:yes
exists peer-device name:store1 peer-node-id:2 conn-name:perf3 volume:0 replication:Established peer-disk:Diskless peer-client:yes resync-suspended:no
exists path name:store1 peer-node-id:2 conn-name:perf3 local:ipv4:100.80.3.2:7792 peer:ipv4:100.80.2.3:7792 established:yes
exists peer-device name:store1 peer-node-id:0 conn-name:perf1 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
exists path name:store1 peer-node-id:0 conn-name:perf1 local:ipv4:100.80.3.2:7792 peer:ipv4:100.80.3.1:7792 established:yes
exists -
# starting point, store1 is synced, no primary
# mount /dev/drbd3 on perf1:
# writing data to mountpoint
# perf1 loses connection to perf3:
change resource name:store1 may_promote:no promotion_score:0
change connection name:store1 peer-node-id:0 conn-name:perf1 connection:NetworkFailure role:Unknown
change device name:store1 volume:0 minor:3 backing_dev:/dev/nvme/store1 disk:Consistent client:no quorum:yes
change peer-device name:store1 peer-node-id:0 conn-name:perf1 volume:0 replication:Off peer-disk:DUnknown peer-client:no
- skipped 3
change path name:store1 peer-node-id:0 conn-name:perf1 local:ipv4:100.80.3.2:7792 peer:ipv4:100.80.3.1:7792 established:no
call helper name:store1 peer-node-id:0 conn-name:perf1 helper:disconnected
response helper name:store1 peer-node-id:0 conn-name:perf1 helper:disconnected status:0
change device name:store1 volume:0 minor:3 backing_dev:/dev/nvme/store1 disk:Outdated client:no quorum:yes
change connection name:store1 peer-node-id:0 conn-name:perf1 connection:Unconnected
change connection name:store1 peer-node-id:0 conn-name:perf1 connection:Connecting
# mount /dev/drbd3 on perf2 fails
# perf1 is still writing and loses connection to perf3:
# mount /dev/drbd3 on perf2 succeeds:
- skipped 4
change resource name:store1 role:Primary may_promote:no promotion_score:10101
change device name:store1 volume:0 minor:3 backing_dev:/dev/nvme/store1 disk:UpToDate client:no quorum:yes

The next step I could try is a monitoring script that watches the DRBD state
on the quorum node; if it detects a secondary with outdated data, the quorum
node disconnects from that node. This would prevent the secondary from
becoming primary. But surely the feature wasn't meant to need scripting
around it....

Any ideas?

Many thanks in advance,
--
Best regards,
Markus Hochholdinger

On Friday, 17 March 2023, 14:21:30 CET, Markus Hochholdinger wrote:
> Hi,
>
> I'm using DRBD 9.2.2 with the quorum feature together with the diskless
> node feature.
> I have three nodes: A (perf1) and B (perf2) with a disk, C (perf3)
> without a disk.
>
> Most of the time, the setup behaves as expected. But in the following
> scenario, I get two primaries:
> 1. A is primary, B secondary, C is the diskless quorum.
> 2.
> A loses connection to B; I/O stops for around 10 seconds until A
> recognizes it can continue as primary, then I/O continues. B can't be
> promoted to primary. C sees both nodes. B gets its disk state set to
> Outdated. All fine to this point.
> 3. A, now primary with I/O continuing, loses connection to C.
> Nevertheless, A stays primary (with the knowledge that the outdated disk
> of B can't become primary).
> 4. B tries to become primary (mounting the drbd device) and becomes
> primary as well! B sees C. Why doesn't C prevent this?
> => Now I have two primaries!
>
> In my opinion, the Outdated secondary B shouldn't become primary; the
> quorum node C knows that A was primary and has changed data relative to
> the outdated secondary B.
>
> Attached is store1.res, the config of my resource store1.
>
> Where is my error? Is this expected with a diskless quorum? Why can an
> outdated secondary become primary? Any ideas?
>
> Many thanks in advance,

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user
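PS: the watchdog idea mentioned above could be sketched roughly like this. It is a hypothetical script, not something I run; the field parsing is an assumption about the events2 output layout shown in my log, and the actual drbdadm call is left commented out:

```shell
#!/bin/sh
# Hypothetical watchdog for the diskless quorum node (perf3): sketch only.
# Assumption: once a peer-device event reports peer-disk:Outdated, that
# peer must not be allowed to promote, so we would cut its connection.

RES="store1"   # resource name from this thread

# Print the conn-name of the peer if the given events2 line reports an
# Outdated peer disk; print nothing otherwise.
outdated_peer() {
    line="$1"
    case "$line" in
        *" peer-device "*"peer-disk:Outdated"*)
            # fields are space-separated key:value pairs
            echo "$line" | tr ' ' '\n' | sed -n 's/^conn-name://p'
            ;;
    esac
}

# Main loop (would run on the quorum node; disabled in this sketch):
# drbdsetup events2 "$RES" | while read -r ev; do
#     peer=$(outdated_peer "$ev")
#     [ -n "$peer" ] && drbdadm disconnect "$RES:$peer"
# done
```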