I've tried updating the kernel and DRBD 9 to the latest available versions, but
nothing helped with the PingAck issue. So I had to downgrade to DRBD 8.4, which
started replicating fine, except for the following messages in the log:
[139267.930516] block drbd0: BAD! enr=34406392 rs_left=-4 rs_failed=0 count=4 cstate=SyncTarget
[139267.930529] block drbd0: start offset (-2092958208) too large in drbd_bm_e_weight
[139267.933231] block drbd0: BAD! enr=34406392 rs_left=-4 rs_failed=0 count=4 cstate=SyncTarget
[139267.933241] block drbd0: start offset (-2092958208) too large in drbd_bm_e_weight
[139267.934064] block drbd0: BAD! enr=34406392 rs_left=-4 rs_failed=0 count=4 cstate=SyncTarget
[139267.934075] block drbd0: start offset (-2092958208) too large in drbd_bm_e_weight
[139267.934942] block drbd0: BAD! enr=34406392 rs_left=-4 rs_failed=0 count=4 cstate=SyncTarget
[139267.934950] block drbd0: start offset (-2092958208) too large in drbd_bm_e_weight
[139267.936012] block drbd0: BAD! enr=34406392 rs_left=-4 rs_failed=0 count=4 cstate=SyncTarget
[139267.936019] block drbd0: start offset (-2092958208) too large in drbd_bm_e_weight
[139267.936818] block drbd0: BAD! enr=34406392 rs_left=-4 rs_failed=0 count=4 cstate=SyncTarget
[139267.936825] block drbd0: start offset (-2092958208) too large in drbd_bm_e_weight
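
One observation on the numbers in these messages, offered as a rough sketch and not a confirmed diagnosis: the negative "start offset" looks like a 32-bit signed wrap of the bitmap word offset for extent enr=34406392. The check below assumes that one resync extent covers 64 bitmap words (16 MiB extent, 4 KiB per bit, 64 bits per word) and that the offset is held in a signed 32-bit integer; both are my assumptions about the 8.4 module, and the helper name wrap_int32 is made up for this illustration.

```python
# Values copied from the kernel log above.
ENR = 34406392
LOGGED_OFFSET = -2092958208

# Assumption: 16 MiB extent / 4 KiB per bit / 64 bits per word = 64 words.
WORDS_PER_EXTENT = 64


def wrap_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# Word offset as it would be with unlimited integer width.
word_offset = ENR * WORDS_PER_EXTENT

print("unwrapped word offset:", word_offset)
print("wrapped to int32:     ", wrap_int32(word_offset))
print("matches logged value: ", wrap_int32(word_offset) == LOGGED_OFFSET)
```

If the wrapped value matches the logged one, that would suggest the device is simply too large for a 32-bit word offset in this code path, which is something the module would have to fix and not something a configuration change could address.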
Any ideas how to resolve this problem?
-----Original Message-----
From: Oleksiy Evin <[email protected]>
To: [email protected]
Subject: PingAck did not arrive in time.
Date: Sun, 16 Jun 2019 22:37:08 +0800
Hi All,
Can anyone help me with the repeated "PingAck did not arrive in time." error
during the initial bitmap synchronization? It first happened after I updated
our cluster with the latest CentOS and DRBD updates. I'm using a basic DRBD
configuration on a 527TB LVM volume, replicated across 2 nodes over a cross-over
100Gbps connection. The same connection is used for Pacemaker without any
problems. I don't see any network adapter errors in the logs, and no reconnects
or packet drops when the error happens for DRBD. I've also tried another adapter
with a 10Gbps direct cable connection and got the same error.
# rpm -q centos-release
centos-release-7-6.1810.2.el7.centos.x86_64
# yum list installed | grep drbd
drbd90-utils.x86_64 9.6.0-1.el7.elrepo @elrepo
kmod-drbd90.x86_64 9.0.16-1.el7_6.elrepo @elrepo
# ifconfig
ens2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.1.1  netmask 255.255.255.255  broadcast 172.16.1.1
        ether b8:83:03:67:3f:d4  txqueuelen 1000  (Ethernet)
        RX packets 63547  bytes 11147564 (10.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 265307  bytes 33045583 (31.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eno8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.20.1.1  netmask 255.255.0.0  broadcast 172.20.255.255
        ether 20:67:7c:1c:42:c6  txqueuelen 1000  (Ethernet)
        RX packets 484  bytes 49086 (47.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 504  bytes 56974 (55.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 116  memory 0xe3000000-e37fffff
# drbdadm dump all
# /etc/drbd.conf
global {
    usage-count no;
}

common {
    options {
        auto-promote yes;
    }
    net {
        protocol C;
    }
}

# resource r0 on sgpplhan01: not ignored, not stacked
# defined at /etc/drbd.d/r0.res:1
resource r0 {
    volume 0 {
        device /dev/drbd0 minor 0;
        disk /dev/storage/data;
        meta-disk internal;
    }
    on sgpplhan01 {
        node-id 0;
        address ipv4 172.16.1.1:7788;
    }
    on sgpplhan02 {
        node-id 1;
        address ipv4 172.16.2.1:7788;
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
    }
}
# dmesg | grep drbd
[37259.335235] drbd r0/0 drbd0 sgpplhan02: drbd_sync_handshake:
[37259.335245] drbd r0/0 drbd0 sgpplhan02: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:141608532581 flags:24
[37259.335254] drbd r0/0 drbd0 sgpplhan02: peer B7DA5A657F09CD92:45B54292B9CBC0CF:0000000000000000:0000000000000000 bits:141608532581 flags:20
[37259.335260] drbd r0/0 drbd0 sgpplhan02: uuid_compare()=-3 by rule 20
[37259.335265] drbd r0/0 drbd0 sgpplhan02: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[37265.754528] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37265.754546] drbd r0/0 drbd0: Resumed AL updates
[37279.780140] drbd r0 sgpplhan02: PingAck did not arrive in time.
[37279.781303] drbd r0 sgpplhan02: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
[37279.781313] drbd r0/0 drbd0 sgpplhan02: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[37279.781371] drbd r0 sgpplhan02: ack_receiver terminated
[37279.781376] drbd r0 sgpplhan02: Terminating ack_recv thread
[37279.833051] drbd r0 sgpplhan02: Connection closed
[37279.833069] drbd r0 sgpplhan02: conn( NetworkFailure -> Unconnected )
[37279.833086] drbd r0 sgpplhan02: Restarting receiver thread
[37279.833098] drbd r0 sgpplhan02: conn( Unconnected -> Connecting )
[37308.171618] drbd r0 sgpplhan02: Handshake to peer 1 successful: Agreed network protocol version 114
[37308.171628] drbd r0 sgpplhan02: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[37308.171666] drbd r0 sgpplhan02: Starting ack_recv thread (from drbd_r_r0 [28699])
[37308.217846] drbd r0: Preparing cluster-wide state change 686534516 (0->1 499/146)
[37308.218242] drbd r0: State change 686534516: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[37308.218253] drbd r0: Committing cluster-wide state change 686534516 (0ms)
[37308.218296] drbd r0 sgpplhan02: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[37308.222753] drbd r0/0 drbd0 sgpplhan02: drbd_sync_handshake:
[37308.222763] drbd r0/0 drbd0 sgpplhan02: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:141608532581 flags:124
[37308.222771] drbd r0/0 drbd0 sgpplhan02: peer B7DA5A657F09CD92:45B54292B9CBC0CF:0000000000000000:0000000000000000 bits:141608532581 flags:120
[37308.222777] drbd r0/0 drbd0 sgpplhan02: uuid_compare()=-3 by rule 20
[37308.222782] drbd r0/0 drbd0 sgpplhan02: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[37314.890717] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37328.669598] drbd r0 sgpplhan02: PingAck did not arrive in time.
[37328.670759] drbd r0 sgpplhan02: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
[37328.670770] drbd r0/0 drbd0 sgpplhan02: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[37328.670823] drbd r0 sgpplhan02: ack_receiver terminated
[37328.670828] drbd r0 sgpplhan02: Terminating ack_recv thread
[37328.718096] drbd r0 sgpplhan02: Connection closed
[37328.718112] drbd r0 sgpplhan02: conn( NetworkFailure -> Unconnected )
[37328.718127] drbd r0 sgpplhan02: Restarting receiver thread
[37328.718138] drbd r0 sgpplhan02: conn( Unconnected -> Connecting )
[37351.755553] drbd r0 sgpplhan02: conn( Connecting -> Disconnecting )
[37351.794081] drbd r0 sgpplhan02: Connection closed
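
For scale, here is a back-of-the-envelope calculation based on the "bits:141608532581" value that drbd_sync_handshake reports above. It is only a sketch: it assumes DRBD tracks one bitmap bit per 4 KiB of storage, and the constant names are mine. Under that assumption the bit count corresponds to roughly 527.5 TiB, which agrees with the volume size, and the bitmap itself comes to roughly 16.5 GiB. Whether handling a bitmap of that size during WFBitMapT is what delays the PingAck is only a guess on my part.

```python
# Value copied from the drbd_sync_handshake lines in the dmesg output.
BITS = 141608532581

# Assumption: one bitmap bit per 4 KiB block of the backing device.
BYTES_PER_BIT = 4096

TIB = 2**40
GIB = 2**30

# Storage covered by the bitmap, and the size of the bitmap itself.
volume_tib = BITS * BYTES_PER_BIT / TIB
bitmap_gib = BITS / 8 / GIB

print(f"volume covered by bitmap: {volume_tib:.1f} TiB")
print(f"bitmap size:              {bitmap_gib:.1f} GiB")
```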