Hello,

Thank you for your suggestion. The MTU is 1500 on both nodes. I had it at 9000, but reverted everything to 'normal' to debug this problem. Pinging as in your example works fine.
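A minimal sketch of a stricter variant of that size check, assuming IPv4 and the Linux iputils ping (its -M do option sets "don't fragment", so an oversized packet fails instead of being silently fragmented). The helper names are made up for illustration, the peer address is data2's from the resource config, and the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
max_icmp_payload() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: print the ping command for a given MTU and peer.
# -M do : set Don't Fragment (Linux iputils ping)
# -c 3  : send three probes
# -s N  : ICMP payload size in bytes
mtu_probe_cmd() {
    printf 'ping -M do -c 3 -s %s %s\n' "$(max_icmp_payload "$1")" "$2"
}

mtu_probe_cmd 1500 172.16.11.22   # prints: ping -M do -c 3 -s 1472 172.16.11.22
mtu_probe_cmd 9000 172.16.11.22   # prints: ping -M do -c 3 -s 8972 172.16.11.22
```

With a 1500 MTU on the whole path, the first command should get replies, and the same command with -s 1473 should fail with a "message too long" style error.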
Cheers,

Dirk

On 23-05-18 21:22, Nelson Hicks wrote:
> Is there any chance this could be an MTU mismatch between the two
> nodes? If you use ping with varying packet sizes from one node to the
> other, do they stop working above a specific size? Does ifconfig
> report the same MTU size for the interface on both nodes?
>
> Examples:
>
> ifconfig | grep MTU
>
> ping -s 500 <other_ip>
>
> ping -s 1400 <other_ip>
>
> ping -s 1472 <other_ip>
>
> ping -s 2000 <other_ip>
>
> Thanks,
>
> - Nelson Hicks
>
>
> On 05/23/2018 02:07 PM, Dirk Bonenkamp - ProActive wrote:
>> Hi,
>>
>> Thank you for your reply.
>>
>> I am / was under the impression that DRBD9 is the new and improved
>> DRBD, so I figured to use this version. But this is not the case?
>> Could somebody enlighten me a bit?
>>
>> I already have disabled all bonding and other fancy network stuff, so
>> I'm using 1 nic currently. This doesn't solve anything unfortunately.
>>
>> Kind regards,
>>
>> Dirk
>>
>> On 23-05-18 14:20, Yannis Milios wrote:
>>> Two things:
>>>
>>> - I would use drbd8 instead of drbd9 for a 2 node setup.
>>> - I would first test with 1 nic instead of 2.
>>>
>>> On Wed, May 23, 2018 at 11:01 AM, Dirk Bonenkamp - ProActive
>>> <[email protected]> wrote:
>>>
>>> Hi List,
>>>
>>> I'm struggling with a new DRBD9 setup. It's a simple Master/Slave
>>> setup. I'm running Ubuntu 16.04 LTS with the DRBD9 packages from the
>>> Launchpad PPA.
>>>
>>> I'm running some DRBD8 systems in production for quite some years, so
>>> I have some experience. This setup is very similar, the only major
>>> difference is that this is DRBD9 and I use LUKS encrypted partitions
>>> as backend.
>>>
>>> I keep running into this 'PingAck did not arrive in time.' error,
>>> which points to network issues if I am correct (see complete log
>>> snippet below). This error occurs when I try to reattach the
>>> secondary node after a reboot. Initial sync works fine.
>>>
>>> The servers are interconnected with 2 10Gb NICs. I had bonding &
>>> jumbo frames configured, but deactivated all this, to no avail. I've
>>> also stripped the DRBD configuration to the bare minimum (see below).
>>>
>>> I've tested the connection with iperf and some other tools and it
>>> seems just fine.
>>>
>>> Could somebody point me in the right direction?
>>>
>>> Thank you in advance, regards,
>>>
>>> Dirk Bonenkamp
>>>
>>> syslog messages:
>>>
>>> May 23 11:31:56 data2 kernel: [ 704.111755] drbd: loading out-of-tree module taints kernel.
>>> May 23 11:31:56 data2 kernel: [ 704.112290] drbd: module verification failed: signature and/or required key missing - tainting kernel
>>> May 23 11:31:56 data2 kernel: [ 704.127677] drbd: initialized. Version: 9.0.14-1 (api:2/proto:86-113)
>>> May 23 11:31:56 data2 kernel: [ 704.127680] drbd: GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by root@data2, 2018-05-23 09:19:54
>>> May 23 11:31:56 data2 kernel: [ 704.127683] drbd: registered as block device major 147
>>> May 23 11:31:56 data2 kernel: [ 704.153565] drbd r0: Starting worker thread (from drbdsetup [4495])
>>> May 23 11:31:56 data2 kernel: [ 704.183031] drbd r0/0 drbd0: disk( Diskless -> Attaching )
>>> May 23 11:31:56 data2 kernel: [ 704.183066] drbd r0/0 drbd0: Maximum number of peer devices = 1
>>> May 23 11:31:56 data2 kernel: [ 704.183293] drbd r0: Method to ensure write ordering: flush
>>> May 23 11:31:56 data2 kernel: [ 704.183308] drbd r0/0 drbd0: drbd_bm_resize called with capacity == 273437203064
>>> May 23 11:31:58 data2 kernel: [ 706.508228] drbd r0/0 drbd0: resync bitmap: bits=34179650383 words=534057038 pages=1043081
>>> May 23 11:31:58 data2 kernel: [ 706.508234] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
>>> May 23 11:31:58 data2 kernel: [ 706.508236] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
>>> May 23 11:32:10 data2 kernel: [ 717.890420] drbd r0/0 drbd0: recounting of set bits took additional 1256ms
>>> May 23 11:32:10 data2 kernel: [ 717.890435] drbd r0/0 drbd0: disk( Attaching -> Outdated )
>>> May 23 11:32:10 data2 kernel: [ 717.890439] drbd r0/0 drbd0: attached to current UUID: 244DD61D2781DF44
>>> May 23 11:32:10 data2 kernel: [ 717.918473] drbd r0 data1: Starting sender thread (from drbdsetup [4544])
>>> May 23 11:32:10 data2 kernel: [ 717.922534] drbd r0 data1: conn( StandAlone -> Unconnected )
>>> May 23 11:32:10 data2 kernel: [ 717.922820] drbd r0 data1: Starting receiver thread (from drbd_w_r0 [4498])
>>> May 23 11:32:10 data2 kernel: [ 717.922973] drbd r0 data1: conn( Unconnected -> Connecting )
>>> May 23 11:32:10 data2 kernel: [ 718.421219] drbd r0 data1: Handshake to peer 1 successful: Agreed network protocol version 113
>>> May 23 11:32:10 data2 kernel: [ 718.421229] drbd r0 data1: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>> May 23 11:32:10 data2 kernel: [ 718.421259] drbd r0 data1: Starting ack_recv thread (from drbd_r_r0 [4550])
>>> May 23 11:32:10 data2 kernel: [ 718.424095] drbd r0: Preparing cluster-wide state change 1205605755 (0->1 499/146)
>>> May 23 11:32:10 data2 kernel: [ 718.437172] drbd r0: State change 1205605755: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
>>> May 23 11:32:10 data2 kernel: [ 718.437185] drbd r0: Aborting cluster-wide state change 1205605755 (12ms) rv = -22
>>> May 23 11:32:12 data2 kernel: [ 719.896223] drbd r0: Preparing cluster-wide state change 445952355 (0->1 499/146)
>>> May 23 11:32:12 data2 kernel: [ 719.896498] drbd r0: State change 445952355: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
>>> May 23 11:32:12 data2 kernel: [ 719.896508] drbd r0: Committing cluster-wide state change 445952355 (0ms)
>>> May 23 11:32:12 data2 kernel: [ 719.896541] drbd r0 data1: conn( Connecting -> Connected ) peer( Unknown -> Primary )
>>> May 23 11:32:12 data2 kernel: [ 719.912186] drbd r0/0 drbd0 data1: drbd_sync_handshake:
>>> May 23 11:32:12 data2 kernel: [ 719.912198] drbd r0/0 drbd0 data1: self 244DD61D2781DF44:0000000000000000:0000000000000000:0000000000000000 bits:52035 flags:20
>>> May 23 11:32:12 data2 kernel: [ 719.912207] drbd r0/0 drbd0 data1: peer E38BE51FE782EAE0:244DD61D2781DF44:934CAB8662DF0410:E555BDC58E528356 bits:53162 flags:20
>>> May 23 11:32:12 data2 kernel: [ 719.912214] drbd r0/0 drbd0 data1: uuid_compare()=-2 by rule 50
>>> May 23 11:32:12 data2 kernel: [ 719.912248] drbd r0/0 drbd0 data1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
>>> May 23 11:32:32 data2 kernel: [ 740.397026] drbd r0 data1: PingAck did not arrive in time.
>>> May 23 11:32:32 data2 kernel: [ 740.397121] drbd r0 data1: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
>>> May 23 11:32:32 data2 kernel: [ 740.397131] drbd r0/0 drbd0 data1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
>>> May 23 11:32:32 data2 kernel: [ 740.397176] drbd r0 data1: ack_receiver terminated
>>> May 23 11:32:32 data2 kernel: [ 740.397182] drbd r0 data1: Terminating ack_recv thread
>>> May 23 11:32:32 data2 kernel: [ 740.458608] drbd r0 data1: Connection closed
>>> May 23 11:32:32 data2 kernel: [ 740.458650] drbd r0 data1: conn( NetworkFailure -> Unconnected )
>>> May 23 11:32:32 data2 kernel: [ 740.458688] drbd r0 data1: Restarting receiver thread
>>> May 23 11:32:32 data2 kernel: [ 740.458723] drbd r0 data1: conn( Unconnected -> Connecting )
>>>
>>> resources:
>>>
>>> resource r0 {
>>>     on data1 {
>>>         device /dev/drbd0;
>>>         disk /dev/mapper/mapper_secure;
>>>         address 172.16.11.21:7789;
>>>         meta-disk internal;
>>>     }
>>>     on data2 {
>>>         device /dev/drbd0;
>>>         disk /dev/mapper/mapper_secure;
>>>         address 172.16.11.22:7789;
>>>         meta-disk internal;
>>>     }
>>> }
>>>
>>> drbd configuration:
>>>
>>> global {
>>>     usage-count yes;
>>> }
>>>
>>> common {
>>>     #handlers {
>>>     #    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
>>>     #    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
>>>     #}
>>>     #disk {
>>>     #    on-io-error detach;
>>>     #    disk-barrier no;
>>>     #    disk-flushes no;
>>>     #    al-extents 3833;
>>>     #    c-plan-ahead 7;
>>>     #    c-fill-target 2M;
>>>     #    c-min-rate 80M;
>>>     #    c-max-rate 720M;
>>>     #}
>>>     net {
>>>         protocol C;
>>>         #fencing resource-only;
>>>         #cram-hmac-alg sha1;
>>>         #verify-alg sha1;
>>>         #shared-secret 1e69dc721fd2e65368ae3ba1e5929979;
>>>         #after-sb-0pri disconnect;
>>>         #after-sb-1pri disconnect;
>>>         #after-sb-2pri disconnect;
>>>         #max-buffers 8000;
>>>         #max-epoch-size 8000;
>>>         #sndbuf-size 0;
>>>         #rcvbuf-size 2048k;
>>>     }
>>> }
>>>
>>> _______________________________________________
>>> drbd-user mailing list
>>> [email protected]
>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>> --
>> ProActive Software
>> Dirk Bonenkamp
>> CTO <https://www.proactive-software.com>
>> Phone: +31 (0)23 54 222 99
>> Mobile: +31 (0)6 250 787 93
>> Richard Holkade 9
>> 2033 PZ Haarlem
>> LinkedIn <http://linkd.in/1V6egnk> Facebook <http://bit.ly/FBProActive> YouTube <http://bit.ly/1Mc23L9>
>> www.proactive.nl <https://www.proactive.nl>

--
ProActive Software
Dirk Bonenkamp
CTO <https://www.proactive-software.com>
Phone: +31 (0)23 54 222 99
Mobile: +31 (0)6 250 787 93
Richard Holkade 9
2033 PZ Haarlem
LinkedIn <http://linkd.in/1V6egnk> Facebook <http://bit.ly/FBProActive> YouTube <http://bit.ly/1Mc23L9>
www.proactive.nl <https://www.proactive.nl>
