Hi folks,

I have introduced sync checks of my drbd volumes run by cron once
per month. Result: Most of my volumes are out of sync.

AFAIU the fix is to disconnect the secondary volume and connect it
again. drbd will sync the lost chunks of data automagically. The
output of cat /proc/drbd shows that it does.

Problem is, a "drbdadm verify data2" run immediately after the sync
shows that the volume is still (or again?) out of sync.

# cat /proc/drbd
version: 8.4.11 (api:1/proto:86-101)
srcversion: A2B7326333D8D6BCA07314C

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1796033864 nr:0 dw:1795344536 dr:687121205 al:573908 bm:0 lo:0 pe:0 ua:0 
ap:0 ep:1 wo:f oos:0
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:72537008 dw:1479579756 dr:649638416 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 
ep:1 wo:f oos:20

Please note the "oos:20". Below you can find the log file of the
first and second verify session, run on the secondary.

This is not what I had expected. I had assumed that the primary
keeps track about the sectors it has to sync to the secondary, if
the secondary is offline, no matter what. When the secondary is
connected again, *all* out-of-sync sectors are copied.

How comes? Is my kernel too old (5.10.179-1, as provided by
Debian 11)? How can I improve reliability?


Regards

Harri
--------------------------------------------------------------------
[Sun Aug 13 07:37:12 2023] block drbd2: Out of sync: start=1040747880, size=8 
(sectors)
[Sun Aug 13 07:37:18 2023] block drbd2: Out of sync: start=1041586840, size=8 
(sectors)
[Sun Aug 13 07:37:21 2023] block drbd2: Out of sync: start=1041920856, size=8 
(sectors)
[Sun Aug 13 07:50:54 2023] block drbd2: Out of sync: start=1146912168, size=8 
(sectors)
[Sun Aug 13 07:53:29 2023] block drbd2: Out of sync: start=1166579112, size=8 
(sectors)
[Sun Aug 13 07:53:30 2023] block drbd2: Out of sync: start=1166581448, size=8 
(sectors)
:
:
[Mon Aug 14 14:52:34 2023] block drbd2: Out of sync: start=15145086480, size=8 
(sectors)
[Mon Aug 14 14:52:54 2023] block drbd2: Out of sync: start=15147457080, size=8 
(sectors)
[Mon Aug 14 15:12:00 2023] block drbd2: Out of sync: start=15288564184, size=8 
(sectors)
[Mon Aug 14 15:34:04 2023] block drbd2: Out of sync: start=15452534120, size=8 
(sectors)
[Mon Aug 14 15:47:24 2023] block drbd2: Out of sync: start=15554437512, size=8 
(sectors)
[Mon Aug 14 16:04:50 2023] block drbd2: Online verify done (total 122673 sec; 
paused 0 sec; 62616 K/sec)
[Mon Aug 14 16:04:50 2023] block drbd2: Online verify found 190 4k block out of 
sync!
[Mon Aug 14 16:04:50 2023] block drbd2: conn( VerifyS -> Connected )
[Mon Aug 14 16:04:50 2023] block drbd2: helper command: /sbin/drbdadm 
out-of-sync minor-2
[Mon Aug 14 16:04:50 2023] block drbd2: helper command: /sbin/drbdadm 
out-of-sync minor-2 exit code 0 (0x0)
[Mon Aug 14 22:54:48 2023] drbd data2: Requested state change failed by peer: 
Refusing to be Primary while peer is not outdated (-7)
[Mon Aug 14 22:54:48 2023] drbd data2: peer( Primary -> Unknown ) conn( Connected -> 
Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
[Mon Aug 14 22:54:48 2023] drbd data2: ack_receiver terminated
[Mon Aug 14 22:54:48 2023] drbd data2: Terminating drbd_a_data2
[Mon Aug 14 22:54:48 2023] drbd data2: Connection closed
[Mon Aug 14 22:54:48 2023] drbd data2: conn( Disconnecting -> StandAlone )
[Mon Aug 14 22:54:48 2023] drbd data2: receiver terminated
[Mon Aug 14 22:54:48 2023] drbd data2: Terminating drbd_r_data2
[Mon Aug 14 22:54:53 2023] drbd data2: conn( StandAlone -> Unconnected )
[Mon Aug 14 22:54:53 2023] drbd data2: Starting receiver thread (from 
drbd_w_data2 [1472])
[Mon Aug 14 22:54:53 2023] drbd data2: receiver (re)started
[Mon Aug 14 22:54:53 2023] drbd data2: conn( Unconnected -> WFConnection )
[Mon Aug 14 22:54:53 2023] drbd data2: Handshake successful: Agreed network 
protocol version 101
[Mon Aug 14 22:54:53 2023] drbd data2: Feature flags enabled on protocol level: 
0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[Mon Aug 14 22:54:53 2023] drbd data2: Peer authenticated using 32 bytes HMAC
[Mon Aug 14 22:54:53 2023] drbd data2: conn( WFConnection -> WFReportParams )
[Mon Aug 14 22:54:53 2023] drbd data2: Starting ack_recv thread (from 
drbd_r_data2 [2092105])
[Mon Aug 14 22:54:53 2023] block drbd2: drbd_sync_handshake:
[Mon Aug 14 22:54:53 2023] block drbd2: self 
691455693E5AB756:0000000000000000:5ECB208B898522BC:5ECA208B898522BD bits:190 
flags:0
[Mon Aug 14 22:54:53 2023] block drbd2: peer 
1D95BBC8D3EE5A27:691455693E5AB757:5ECB208B898522BD:5ECA208B898522BD bits:360 
flags:0
[Mon Aug 14 22:54:53 2023] block drbd2: uuid_compare()=-1 by rule 50
[Mon Aug 14 22:54:53 2023] block drbd2: peer( Unknown -> Primary ) conn( 
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
[Mon Aug 14 22:54:53 2023] block drbd2: receive bitmap stats [Bytes(packets)]: 
plain 0(0), RLE 1023(1), total 1023; compression: 100.0%
[Mon Aug 14 22:54:53 2023] block drbd2: send bitmap stats [Bytes(packets)]: 
plain 0(0), RLE 1023(1), total 1023; compression: 100.0%
[Mon Aug 14 22:54:53 2023] block drbd2: conn( WFBitMapT -> WFSyncUUID )
[Mon Aug 14 22:54:53 2023] block drbd2: updated sync uuid 
691555693E5AB756:0000000000000000:5ECB208B898522BC:5ECA208B898522BD
[Mon Aug 14 22:54:53 2023] block drbd2: helper command: /sbin/drbdadm 
before-resync-target minor-2
[Mon Aug 14 22:54:53 2023] block drbd2: helper command: /sbin/drbdadm 
before-resync-target minor-2 exit code 0 (0x0)
[Mon Aug 14 22:54:53 2023] block drbd2: conn( WFSyncUUID -> SyncTarget ) disk( 
Outdated -> Inconsistent )
[Mon Aug 14 22:54:53 2023] block drbd2: Began resync as SyncTarget (will sync 
1440 KB [360 bits set]).
[Mon Aug 14 22:54:54 2023] block drbd2: Resync done (total 1 sec; paused 0 sec; 
1440 K/sec)
[Mon Aug 14 22:54:54 2023] block drbd2: updated UUIDs 
1D95BBC8D3EE5A26:0000000000000000:691555693E5AB756:691455693E5AB757
[Mon Aug 14 22:54:54 2023] block drbd2: conn( SyncTarget -> Connected ) disk( 
Inconsistent -> UpToDate )
[Mon Aug 14 22:54:54 2023] block drbd2: helper command: /sbin/drbdadm 
after-resync-target minor-2
[Mon Aug 14 22:54:54 2023] block drbd2: helper command: /sbin/drbdadm 
after-resync-target minor-2 exit code 0 (0x0)
[Mon Aug 14 22:55:33 2023] block drbd2: conn( Connected -> VerifyS )
[Mon Aug 14 22:55:33 2023] block drbd2: Starting Online Verify from sector 0
[Tue Aug 15 09:11:58 2023] block drbd2: Out of sync: start=4723202472, size=8 
(sectors)
[Tue Aug 15 11:02:05 2023] block drbd2: Out of sync: start=5507952664, size=8 
(sectors)
[Tue Aug 15 11:02:23 2023] block drbd2: Out of sync: start=5510085128, size=8 
(sectors)
[Tue Aug 15 13:19:38 2023] block drbd2: Out of sync: start=6543445712, size=8 
(sectors)
[Tue Aug 15 15:43:44 2023] block drbd2: Out of sync: start=7428513912, size=8 
(sectors)
[Tue Aug 15 15:43:44 2023] block drbd2: Out of sync: start=7428513976, size=8 
(sectors)
[Tue Aug 15 19:33:12 2023] block drbd2: Out of sync: start=9106431560, size=8 
(sectors)
[Wed Aug 16 09:33:27 2023] block drbd2: Out of sync: start=15573735816, size=8 
(sectors)
[Wed Aug 16 09:47:57 2023] block drbd2: Online verify done (total 125543 sec; 
paused 0 sec; 62456 K/sec)
[Wed Aug 16 09:47:57 2023] block drbd2: Online verify found 5 4k block out of 
sync!
[Wed Aug 16 09:47:57 2023] block drbd2: conn( VerifyS -> Connected )
[Wed Aug 16 09:47:57 2023] block drbd2: helper command: /sbin/drbdadm 
out-of-sync minor-2
[Wed Aug 16 09:47:57 2023] block drbd2: helper command: /sbin/drbdadm 
out-of-sync minor-2 exit code 0 (0x0)
_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user

Reply via email to