Hi All,
Can anyone help me with the repeated "PingAck did not arrive in time." error 
during the initial bitmap synchronization? It first happened after I updated 
our cluster with the latest CentOS and DRBD updates. I'm using a basic DRBD 
configuration on a 527TB LVM volume, replicated between 2 nodes over a 
cross-over 100Gbps connection. The same connection is used for Pacemaker 
without any problems. I don't see any network adapter errors in the logs, and 
there are no reconnects or dropped packets when DRBD reports the error. I've 
also tried another adapter with a 10Gbps direct cable connection and got the 
same error.
# rpm -q centos-release
centos-release-7-6.1810.2.el7.centos.x86_64
# yum list installed | grep drbd
drbd90-utils.x86_64                           9.6.0-1.el7.elrepo       @elrepo  
kmod-drbd90.x86_64                            9.0.16-1.el7_6.elrepo    @elrepo  
# ifconfig
ens2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.1.1  netmask 255.255.255.255  broadcast 172.16.1.1
        ether b8:83:03:67:3f:d4  txqueuelen 1000  (Ethernet)
        RX packets 63547  bytes 11147564 (10.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 265307  bytes 33045583 (31.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eno8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.20.1.1  netmask 255.255.0.0  broadcast 172.20.255.255
        ether 20:67:7c:1c:42:c6  txqueuelen 1000  (Ethernet)
        RX packets 484  bytes 49086 (47.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 504  bytes 56974 (55.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 116  memory 0xe3000000-e37fffff
 
# drbdadm dump all
# /etc/drbd.conf
global {
    usage-count no;
}

common {
    options {
        auto-promote     yes;
    }
    net {
        protocol         C;
    }
}

# resource r0 on sgpplhan01: not ignored, not stacked
# defined at /etc/drbd.d/r0.res:1
resource r0 {
    volume 0 {
        device           /dev/drbd0 minor 0;
        disk             /dev/storage/data;
        meta-disk        internal;
    }
    on sgpplhan01 {
        node-id 0;
        address          ipv4 172.16.1.1:7788;
    }
    on sgpplhan02 {
        node-id 1;
        address          ipv4 172.16.2.1:7788;
    }
    net {
        after-sb-0pri    discard-zero-changes;
        after-sb-1pri    consensus;
        after-sb-2pri    disconnect;
    }
}
# dmesg | grep drbd
[37259.335235] drbd r0/0 drbd0 sgpplhan02: drbd_sync_handshake:
[37259.335245] drbd r0/0 drbd0 sgpplhan02: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:141608532581 flags:24
[37259.335254] drbd r0/0 drbd0 sgpplhan02: peer B7DA5A657F09CD92:45B54292B9CBC0CF:0000000000000000:0000000000000000 bits:141608532581 flags:20
[37259.335260] drbd r0/0 drbd0 sgpplhan02: uuid_compare()=-3 by rule 20
[37259.335265] drbd r0/0 drbd0 sgpplhan02: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[37265.754528] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37265.754546] drbd r0/0 drbd0: Resumed AL updates
[37279.780140] drbd r0 sgpplhan02: PingAck did not arrive in time.
[37279.781303] drbd r0 sgpplhan02: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
[37279.781313] drbd r0/0 drbd0 sgpplhan02: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[37279.781371] drbd r0 sgpplhan02: ack_receiver terminated
[37279.781376] drbd r0 sgpplhan02: Terminating ack_recv thread
[37279.833051] drbd r0 sgpplhan02: Connection closed
[37279.833069] drbd r0 sgpplhan02: conn( NetworkFailure -> Unconnected )
[37279.833086] drbd r0 sgpplhan02: Restarting receiver thread
[37279.833098] drbd r0 sgpplhan02: conn( Unconnected -> Connecting )
[37308.171618] drbd r0 sgpplhan02: Handshake to peer 1 successful: Agreed network protocol version 114
[37308.171628] drbd r0 sgpplhan02: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[37308.171666] drbd r0 sgpplhan02: Starting ack_recv thread (from drbd_r_r0 [28699])
[37308.217846] drbd r0: Preparing cluster-wide state change 686534516 (0->1 499/146)
[37308.218242] drbd r0: State change 686534516: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[37308.218253] drbd r0: Committing cluster-wide state change 686534516 (0ms)
[37308.218296] drbd r0 sgpplhan02: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[37308.222753] drbd r0/0 drbd0 sgpplhan02: drbd_sync_handshake:
[37308.222763] drbd r0/0 drbd0 sgpplhan02: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:141608532581 flags:124
[37308.222771] drbd r0/0 drbd0 sgpplhan02: peer B7DA5A657F09CD92:45B54292B9CBC0CF:0000000000000000:0000000000000000 bits:141608532581 flags:120
[37308.222777] drbd r0/0 drbd0 sgpplhan02: uuid_compare()=-3 by rule 20
[37308.222782] drbd r0/0 drbd0 sgpplhan02: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[37314.890717] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37328.669598] drbd r0 sgpplhan02: PingAck did not arrive in time.
[37328.670759] drbd r0 sgpplhan02: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
[37328.670770] drbd r0/0 drbd0 sgpplhan02: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[37328.670823] drbd r0 sgpplhan02: ack_receiver terminated
[37328.670828] drbd r0 sgpplhan02: Terminating ack_recv thread
[37328.718096] drbd r0 sgpplhan02: Connection closed
[37328.718112] drbd r0 sgpplhan02: conn( NetworkFailure -> Unconnected )
[37328.718127] drbd r0 sgpplhan02: Restarting receiver thread
[37328.718138] drbd r0 sgpplhan02: conn( Unconnected -> Connecting )
[37351.755553] drbd r0 sgpplhan02: conn( Connecting -> Disconnecting )
[37351.794081] drbd r0 sgpplhan02: Connection closed
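
To put numbers on the dmesg output above, here is a minimal Python sketch (not part of DRBD). It measures how long each attempt stays in WFBitMapT before the PingAck timeout, and converts the reported "bits" value into a device size. The helper names bitmap_to_failure_gaps and bits_to_tib are made up for illustration, and the size conversion assumes that one bitmap bit covers 4 KiB of storage.

```python
import re

# The four relevant lines from the dmesg output above
# (timestamps are seconds since boot).
LOG = """\
[37265.754528] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37279.780140] drbd r0 sgpplhan02: PingAck did not arrive in time.
[37314.890717] drbd r0/0 drbd0 sgpplhan02: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[37328.669598] drbd r0 sgpplhan02: PingAck did not arrive in time.
"""

# Matches "[  123.456789] message" and captures timestamp and message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def bitmap_to_failure_gaps(log_text):
    """Seconds between each 'Off -> WFBitMapT' and the next PingAck timeout."""
    gaps = []
    started = None  # timestamp of the most recent WFBitMapT entry
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if "repl( Off -> WFBitMapT )" in message:
            started = timestamp
        elif "PingAck did not arrive in time" in message and started is not None:
            gaps.append(round(timestamp - started, 3))
            started = None
    return gaps


def bits_to_tib(bits, bytes_per_bit=4096):
    """Device size in TiB, ASSUMING one bitmap bit per 4 KiB block."""
    return bits * bytes_per_bit / 2**40


if __name__ == "__main__":
    print("seconds in WFBitMapT before timeout:", bitmap_to_failure_gaps(LOG))
    print("device size (TiB):", round(bits_to_tib(141608532581), 1))
```

Both attempts fail roughly 14 seconds after entering WFBitMapT, and under the 4 KiB-per-bit assumption the bits value works out to about 527.5 TiB, which matches the size of the LVM volume.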



