Is there any chance this could be an MTU mismatch between the two nodes?
If you use ping with varying packet sizes from one node to the other, do
they stop working above a specific size? Does ifconfig report the same
MTU size for the interface on both nodes?
Examples:
ifconfig | grep MTU
ping -s 500 <other_ip>
ping -s 1400 <other_ip>
ping -s 1472 <other_ip>
ping -s 2000 <other_ip>
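A minimal sketch of the same test as a script, assuming Linux iputils ping. A plain "ping -s" may let oversized packets be fragmented and still get replies, so the sketch adds "-M do" (don't fragment) to make a too-large size actually fail. The payload sizes come from MTU minus 28 bytes (20 bytes IPv4 header plus 8 bytes ICMP header), which is where 1472 comes from for an MTU of 1500. The script name and the list of MTUs tried are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: probe which ICMP payload sizes reach the peer unfragmented.
# Assumes Linux iputils ping (-M do sets the Don't Fragment bit).

# Largest ICMP payload that fits in one frame for a given MTU:
# MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send 3 unfragmentable pings of the given payload size and report the result.
probe_size() {
    peer=$1
    size=$2
    if ping -M do -c 3 -W 2 -s "$size" "$peer" >/dev/null 2>&1; then
        echo "payload $size: ok"
    else
        echo "payload $size: FAILED"
    fi
}

# Run only when a peer address is given, e.g.: sh mtu_probe.sh 172.16.11.21
# (mtu_probe.sh is a hypothetical file name.)
if [ "$#" -ge 1 ]; then
    for mtu in 576 1500 9000; do
        probe_size "$1" "$(payload_for_mtu "$mtu")"
    done
fi
```

If the 1500 case passes on both nodes but larger sizes fail, that is expected without jumbo frames. If the 1500 case fails in one direction only, an MTU mismatch is worth a closer look.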
Thanks,
- Nelson Hicks
On 05/23/2018 02:07 PM, Dirk Bonenkamp - ProActive wrote:
Hi,
Thank you for your reply.
I am / was under the impression that DRBD9 is the new and improved
DRBD, so I decided to use this version. Is this not the case?
Could somebody enlighten me a bit?
I have already disabled all bonding and other fancy network stuff, so
I'm currently using 1 NIC. Unfortunately, this doesn't solve anything.
Kind regards,
Dirk
On 23-05-18 14:20, Yannis Milios wrote:
Two things:
- I would use drbd8 instead of drbd9 for a 2 node setup.
- I would first test with 1 nic instead of 2.
On Wed, May 23, 2018 at 11:01 AM, Dirk Bonenkamp - ProActive
<[email protected]> wrote:
Hi List,
I'm struggling with a new DRBD9 setup. It's a simple Master/Slave setup.
I'm running Ubuntu 16.04 LTS with the DRBD9 packages from the Launchpad
PPA.
I've been running some DRBD8 systems in production for quite some years,
so I have some experience. This setup is very similar; the only major
difference is that this is DRBD9 and I use LUKS encrypted partitions as
the backend.
I keep running into this 'PingAck did not arrive in time.' error, which
points to network issues if I am correct (see complete log snippet
below). This error occurs when I try to reattach the secondary node
after a reboot. Initial sync works fine.
The servers are interconnected with 2 10Gb NICs. I had bonding & jumbo
frames configured, but deactivated all this, to no avail. I've also
stripped the DRBD configuration to the bare minimum (see below).
I've tested the connection with iperf and some other tools and it seems
just fine.
Could somebody point me in the right direction?
Thank you in advance, regards,
Dirk Bonenkamp
syslog messages:
May 23 11:31:56 data2 kernel: [ 704.111755] drbd: loading out-of-tree module taints kernel.
May 23 11:31:56 data2 kernel: [ 704.112290] drbd: module verification failed: signature and/or required key missing - tainting kernel
May 23 11:31:56 data2 kernel: [ 704.127677] drbd: initialized. Version: 9.0.14-1 (api:2/proto:86-113)
May 23 11:31:56 data2 kernel: [ 704.127680] drbd: GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by root@data2, 2018-05-23 09:19:54
May 23 11:31:56 data2 kernel: [ 704.127683] drbd: registered as block device major 147
May 23 11:31:56 data2 kernel: [ 704.153565] drbd r0: Starting worker thread (from drbdsetup [4495])
May 23 11:31:56 data2 kernel: [ 704.183031] drbd r0/0 drbd0: disk( Diskless -> Attaching )
May 23 11:31:56 data2 kernel: [ 704.183066] drbd r0/0 drbd0: Maximum number of peer devices = 1
May 23 11:31:56 data2 kernel: [ 704.183293] drbd r0: Method to ensure write ordering: flush
May 23 11:31:56 data2 kernel: [ 704.183308] drbd r0/0 drbd0: drbd_bm_resize called with capacity == 273437203064
May 23 11:31:58 data2 kernel: [ 706.508228] drbd r0/0 drbd0: resync bitmap: bits=34179650383 words=534057038 pages=1043081
May 23 11:31:58 data2 kernel: [ 706.508234] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
May 23 11:31:58 data2 kernel: [ 706.508236] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
May 23 11:32:10 data2 kernel: [ 717.890420] drbd r0/0 drbd0: recounting of set bits took additional 1256ms
May 23 11:32:10 data2 kernel: [ 717.890435] drbd r0/0 drbd0: disk( Attaching -> Outdated )
May 23 11:32:10 data2 kernel: [ 717.890439] drbd r0/0 drbd0: attached to current UUID: 244DD61D2781DF44
May 23 11:32:10 data2 kernel: [ 717.918473] drbd r0 data1: Starting sender thread (from drbdsetup [4544])
May 23 11:32:10 data2 kernel: [ 717.922534] drbd r0 data1: conn( StandAlone -> Unconnected )
May 23 11:32:10 data2 kernel: [ 717.922820] drbd r0 data1: Starting receiver thread (from drbd_w_r0 [4498])
May 23 11:32:10 data2 kernel: [ 717.922973] drbd r0 data1: conn( Unconnected -> Connecting )
May 23 11:32:10 data2 kernel: [ 718.421219] drbd r0 data1: Handshake to peer 1 successful: Agreed network protocol version 113
May 23 11:32:10 data2 kernel: [ 718.421229] drbd r0 data1: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
May 23 11:32:10 data2 kernel: [ 718.421259] drbd r0 data1: Starting ack_recv thread (from drbd_r_r0 [4550])
May 23 11:32:10 data2 kernel: [ 718.424095] drbd r0: Preparing cluster-wide state change 1205605755 (0->1 499/146)
May 23 11:32:10 data2 kernel: [ 718.437172] drbd r0: State change 1205605755: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
May 23 11:32:10 data2 kernel: [ 718.437185] drbd r0: Aborting cluster-wide state change 1205605755 (12ms) rv = -22
May 23 11:32:12 data2 kernel: [ 719.896223] drbd r0: Preparing cluster-wide state change 445952355 (0->1 499/146)
May 23 11:32:12 data2 kernel: [ 719.896498] drbd r0: State change 445952355: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
May 23 11:32:12 data2 kernel: [ 719.896508] drbd r0: Committing cluster-wide state change 445952355 (0ms)
May 23 11:32:12 data2 kernel: [ 719.896541] drbd r0 data1: conn( Connecting -> Connected ) peer( Unknown -> Primary )
May 23 11:32:12 data2 kernel: [ 719.912186] drbd r0/0 drbd0 data1: drbd_sync_handshake:
May 23 11:32:12 data2 kernel: [ 719.912198] drbd r0/0 drbd0 data1: self 244DD61D2781DF44:0000000000000000:0000000000000000:0000000000000000 bits:52035 flags:20
May 23 11:32:12 data2 kernel: [ 719.912207] drbd r0/0 drbd0 data1: peer E38BE51FE782EAE0:244DD61D2781DF44:934CAB8662DF0410:E555BDC58E528356 bits:53162 flags:20
May 23 11:32:12 data2 kernel: [ 719.912214] drbd r0/0 drbd0 data1: uuid_compare()=-2 by rule 50
May 23 11:32:12 data2 kernel: [ 719.912248] drbd r0/0 drbd0 data1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
May 23 11:32:32 data2 kernel: [ 740.397026] drbd r0 data1: PingAck did not arrive in time.
May 23 11:32:32 data2 kernel: [ 740.397121] drbd r0 data1: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
May 23 11:32:32 data2 kernel: [ 740.397131] drbd r0/0 drbd0 data1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
May 23 11:32:32 data2 kernel: [ 740.397176] drbd r0 data1: ack_receiver terminated
May 23 11:32:32 data2 kernel: [ 740.397182] drbd r0 data1: Terminating ack_recv thread
May 23 11:32:32 data2 kernel: [ 740.458608] drbd r0 data1: Connection closed
May 23 11:32:32 data2 kernel: [ 740.458650] drbd r0 data1: conn( NetworkFailure -> Unconnected )
May 23 11:32:32 data2 kernel: [ 740.458688] drbd r0 data1: Restarting receiver thread
May 23 11:32:32 data2 kernel: [ 740.458723] drbd r0 data1: conn( Unconnected -> Connecting )
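When reading a log like the one above, it can help to filter it down to the connection and replication state changes plus the PingAck timeout. In this log that leaves the step into WFBitMapT at 11:32:12 followed by the timeout at 11:32:32. A rough sketch, assuming the messages are in a syslog-format file; the function name and script name are made up for illustration.

```shell
#!/bin/sh
# Sketch: keep only DRBD connection/replication state changes and
# PingAck timeouts from syslog-format text read on stdin.
drbd_conn_events() {
    grep -E 'drbd .*(conn\(|repl\(|PingAck)'
}

# Run only when a log file is given, e.g.: sh drbd_events.sh /var/log/syslog
# (drbd_events.sh is a hypothetical file name.)
if [ "$#" -ge 1 ]; then
    drbd_conn_events < "$1"
fi
```

Comparing the filtered output from both nodes for the same time window shows which side declares the network failure first.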
resources:
resource r0 {
    on data1 {
        device /dev/drbd0;
        disk /dev/mapper/mapper_secure;
        address 172.16.11.21:7789;
        meta-disk internal;
    }
    on data2 {
        device /dev/drbd0;
        disk /dev/mapper/mapper_secure;
        address 172.16.11.22:7789;
        meta-disk internal;
    }
}
drbd configuration:
global {
    usage-count yes;
}
common {
    #handlers {
    #    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
    #    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    #}
    #disk {
    #    on-io-error detach;
    #    disk-barrier no;
    #    disk-flushes no;
    #    al-extents 3833;
    #    c-plan-ahead 7;
    #    c-fill-target 2M;
    #    c-min-rate 80M;
    #    c-max-rate 720M;
    #}
    net {
        protocol C;
        #fencing resource-only;
        #cram-hmac-alg sha1;
        #verify-alg sha1;
        #shared-secret 1e69dc721fd2e65368ae3ba1e5929979;
        #after-sb-0pri disconnect;
        #after-sb-1pri disconnect;
        #after-sb-2pri disconnect;
        #max-buffers 8000;
        #max-epoch-size 8000;
        #sndbuf-size 0;
        #rcvbuf-size 2048k;
    }
}
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
--
ProActive Software
Dirk Bonenkamp
CTO
Phone: +31 (0)23 54 222 99
Mobile: +31 (0)6 250 787 93
Richard Holkade 9
2033 PZ Haarlem
www.proactive.nl