Hello, everyone. I am seeing a behaviour that I cannot quite understand when DRBD is managed by Pacemaker and a "prefers" location constraint is in place.
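That preference was added with a pcs command along these lines (just a sketch; the names and score match my cluster, but the exact invocation may have differed slightly):

# pcs constraint location DRBDData-clone prefers pcs01=INFINITY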
These are my resources:

Full List of Resources:
  * Clone Set: DRBDData-clone [DRBDData] (promotable):
    * Promoted: [ pcs01 ]
    * Unpromoted: [ pcs02 pcs03 ]
  * Resource Group: nfs:
    * portblock_on_nfs   (ocf:heartbeat:portblock):  Started pcs01
    * vip_nfs            (ocf:heartbeat:IPaddr2):    Started pcs01
    * drbd_fs            (ocf:heartbeat:Filesystem): Started pcs01
    * nfsd               (ocf:heartbeat:nfsserver):  Started pcs01
    * exportnfs          (ocf:heartbeat:exportfs):   Started pcs01
    * portblock_off_nfs  (ocf:heartbeat:portblock):  Started pcs01

And I have a location preference for pcs01 on the DRBD clone:

  resource 'DRBDData-clone' prefers node 'pcs01' with score INFINITY

# drbdadm status
exports role:Primary
  disk:UpToDate
  pcs02.lan role:Secondary
    peer-disk:UpToDate
  pcs03.lan role:Secondary
    peer-disk:UpToDate

Now, while pcs01 is providing the resources, I mount the NFS export on a client and start copying a 15 GB random file. After about 5 GB have been copied, I pull the plug on the pcs01 node. After a few seconds pcs02 is promoted and the copy resumes.

Output of drbdadm status:

exports role:Primary
  disk:UpToDate
  pcs01.lan connection:Connecting
  pcs03.lan role:Secondary congested:yes ap-in-flight:1032 rs-in-flight:0
    peer-disk:UpToDate

Output of pcs status:

Node List:
  * Online: [ pcs02 pcs03 ]
  * OFFLINE: [ pcs01 ]

Full List of Resources:
  * Clone Set: DRBDData-clone [DRBDData] (promotable):
    * Promoted: [ pcs02 ]
    * Unpromoted: [ pcs03 ]
    * Stopped: [ pcs01 ]
  * Resource Group: nfs:
    * portblock_on_nfs   (ocf:heartbeat:portblock):  Started pcs02
    * vip_nfs            (ocf:heartbeat:IPaddr2):    Started pcs02
    * drbd_fs            (ocf:heartbeat:Filesystem): Started pcs02
    * nfsd               (ocf:heartbeat:nfsserver):  Started pcs02
    * exportnfs          (ocf:heartbeat:exportfs):   Started pcs02
    * portblock_off_nfs  (ocf:heartbeat:portblock):  Started pcs02

Now, when roughly 14 GB of the 15 GB file have been copied (so at least 9 GB would need to be resynced once pcs01 is back online), I start pcs01 again. Since it is the preferred node in Pacemaker, the services move back there as soon as Pacemaker detects it. The question is: how can an "inconsistent/degraded" replica become Primary before the resync has completed?

# drbdadm status
exports role:Primary
  disk:Inconsistent
  pcs02.lan role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:79.16
  pcs03.lan role:Secondary
    replication:PausedSyncT peer-disk:UpToDate done:78.24 resync-suspended:dependency

The service moved back to pcs01:

Node List:
  * Online: [ pcs01 pcs02 pcs03 ]

Full List of Resources:
  * Clone Set: DRBDData-clone [DRBDData] (promotable):
    * Promoted: [ pcs01 ]
    * Unpromoted: [ pcs02 pcs03 ]
  * Resource Group: nfs:
    * portblock_on_nfs   (ocf:heartbeat:portblock):  Started pcs01
    * vip_nfs            (ocf:heartbeat:IPaddr2):    Started pcs01
    * drbd_fs            (ocf:heartbeat:Filesystem): Started pcs01
    * nfsd               (ocf:heartbeat:nfsserver):  Started pcs01
    * exportnfs          (ocf:heartbeat:exportfs):   Started pcs01
    * portblock_off_nfs  (ocf:heartbeat:portblock):  Started pcs01

# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ fd0904f7bf256ecd380e1c19ec73c712f3855d40\ build\ by\ mockbuild@42fe748df8a24339966f712147eb3bfd\,\ 2023-11-01\ 01:47:26
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090111
DRBD_KERNEL_VERSION=9.1.17
DRBDADM_VERSION_CODE=0x091a00
DRBDADM_VERSION=9.26.0

# cat /etc/redhat-release
AlmaLinux release 9.3 (Shamrock Pampas Cat)

Is this a bug? Shouldn't this corrupt the filesystem?

Kind regards,
Salatiel
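P.S. For reference, the client-side test was essentially the following (the mount point, export path and file name are illustrative placeholders, not necessarily the exact ones I used):

# mount -t nfs <vip_nfs address>:/export /mnt/nfs            # mount the export via the floating IP
# dd if=/dev/urandom of=/root/random.img bs=1M count=15360   # create the ~15 GB random file
# cp /root/random.img /mnt/nfs/                              # start the copy, then pull the plug on pcs01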