Two things:
- I would use DRBD8 instead of DRBD9 for a two-node setup.
- I would first test with one NIC instead of two.

On Wed, May 23, 2018 at 11:01 AM, Dirk Bonenkamp - ProActive <[email protected]> wrote:
> Hi List,
>
> I'm struggling with a new DRBD9 setup. It's a simple Master/Slave setup.
> I'm running Ubuntu 16.04 LTS with the DRBD9 packages from the Launchpad
> PPA.
>
> I've been running some DRBD8 systems in production for quite a few years,
> so I have some experience. This setup is very similar; the only major
> difference is that this is DRBD9 and I use LUKS-encrypted partitions as
> the backend.
>
> I keep running into a 'PingAck did not arrive in time.' error, which, if
> I'm correct, points to network issues (see the complete log snippet
> below). The error occurs when I try to reattach the secondary node after
> a reboot. The initial sync works fine.
>
> The servers are interconnected with two 10Gb NICs. I had bonding and
> jumbo frames configured, but deactivated all of this, to no avail. I've
> also stripped the DRBD configuration to the bare minimum (see below).
>
> I've tested the connection with iperf and some other tools, and it seems
> just fine.
>
> Could somebody point me in the right direction?
>
> Thank you in advance, regards,
>
> Dirk Bonenkamp
>
> syslog messages:
>
> May 23 11:31:56 data2 kernel: [ 704.111755] drbd: loading out-of-tree module taints kernel.
> May 23 11:31:56 data2 kernel: [ 704.112290] drbd: module verification failed: signature and/or required key missing - tainting kernel
> May 23 11:31:56 data2 kernel: [ 704.127677] drbd: initialized. Version: 9.0.14-1 (api:2/proto:86-113)
> May 23 11:31:56 data2 kernel: [ 704.127680] drbd: GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by root@data2, 2018-05-23 09:19:54
> May 23 11:31:56 data2 kernel: [ 704.127683] drbd: registered as block device major 147
> May 23 11:31:56 data2 kernel: [ 704.153565] drbd r0: Starting worker thread (from drbdsetup [4495])
> May 23 11:31:56 data2 kernel: [ 704.183031] drbd r0/0 drbd0: disk( Diskless -> Attaching )
> May 23 11:31:56 data2 kernel: [ 704.183066] drbd r0/0 drbd0: Maximum number of peer devices = 1
> May 23 11:31:56 data2 kernel: [ 704.183293] drbd r0: Method to ensure write ordering: flush
> May 23 11:31:56 data2 kernel: [ 704.183308] drbd r0/0 drbd0: drbd_bm_resize called with capacity == 273437203064
> May 23 11:31:58 data2 kernel: [ 706.508228] drbd r0/0 drbd0: resync bitmap: bits=34179650383 words=534057038 pages=1043081
> May 23 11:31:58 data2 kernel: [ 706.508234] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
> May 23 11:31:58 data2 kernel: [ 706.508236] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
> May 23 11:32:10 data2 kernel: [ 717.890420] drbd r0/0 drbd0: recounting of set bits took additional 1256ms
> May 23 11:32:10 data2 kernel: [ 717.890435] drbd r0/0 drbd0: disk( Attaching -> Outdated )
> May 23 11:32:10 data2 kernel: [ 717.890439] drbd r0/0 drbd0: attached to current UUID: 244DD61D2781DF44
> May 23 11:32:10 data2 kernel: [ 717.918473] drbd r0 data1: Starting sender thread (from drbdsetup [4544])
> May 23 11:32:10 data2 kernel: [ 717.922534] drbd r0 data1: conn( StandAlone -> Unconnected )
> May 23 11:32:10 data2 kernel: [ 717.922820] drbd r0 data1: Starting receiver thread (from drbd_w_r0 [4498])
> May 23 11:32:10 data2 kernel: [ 717.922973] drbd r0 data1: conn( Unconnected -> Connecting )
> May 23 11:32:10 data2 kernel: [ 718.421219] drbd r0 data1: Handshake to peer 1 successful: Agreed network protocol version 113
> May 23 11:32:10 data2 kernel: [ 718.421229] drbd r0 data1: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
> May 23 11:32:10 data2 kernel: [ 718.421259] drbd r0 data1: Starting ack_recv thread (from drbd_r_r0 [4550])
> May 23 11:32:10 data2 kernel: [ 718.424095] drbd r0: Preparing cluster-wide state change 1205605755 (0->1 499/146)
> May 23 11:32:10 data2 kernel: [ 718.437172] drbd r0: State change 1205605755: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
> May 23 11:32:10 data2 kernel: [ 718.437185] drbd r0: Aborting cluster-wide state change 1205605755 (12ms) rv = -22
> May 23 11:32:12 data2 kernel: [ 719.896223] drbd r0: Preparing cluster-wide state change 445952355 (0->1 499/146)
> May 23 11:32:12 data2 kernel: [ 719.896498] drbd r0: State change 445952355: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
> May 23 11:32:12 data2 kernel: [ 719.896508] drbd r0: Committing cluster-wide state change 445952355 (0ms)
> May 23 11:32:12 data2 kernel: [ 719.896541] drbd r0 data1: conn( Connecting -> Connected ) peer( Unknown -> Primary )
> May 23 11:32:12 data2 kernel: [ 719.912186] drbd r0/0 drbd0 data1: drbd_sync_handshake:
> May 23 11:32:12 data2 kernel: [ 719.912198] drbd r0/0 drbd0 data1: self 244DD61D2781DF44:0000000000000000:0000000000000000:0000000000000000 bits:52035 flags:20
> May 23 11:32:12 data2 kernel: [ 719.912207] drbd r0/0 drbd0 data1: peer E38BE51FE782EAE0:244DD61D2781DF44:934CAB8662DF0410:E555BDC58E528356 bits:53162 flags:20
> May 23 11:32:12 data2 kernel: [ 719.912214] drbd r0/0 drbd0 data1: uuid_compare()=-2 by rule 50
> May 23 11:32:12 data2 kernel: [ 719.912248] drbd r0/0 drbd0 data1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
> May 23 11:32:32 data2 kernel: [ 740.397026] drbd r0 data1: PingAck did not arrive in time.
> May 23 11:32:32 data2 kernel: [ 740.397121] drbd r0 data1: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
> May 23 11:32:32 data2 kernel: [ 740.397131] drbd r0/0 drbd0 data1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
> May 23 11:32:32 data2 kernel: [ 740.397176] drbd r0 data1: ack_receiver terminated
> May 23 11:32:32 data2 kernel: [ 740.397182] drbd r0 data1: Terminating ack_recv thread
> May 23 11:32:32 data2 kernel: [ 740.458608] drbd r0 data1: Connection closed
> May 23 11:32:32 data2 kernel: [ 740.458650] drbd r0 data1: conn( NetworkFailure -> Unconnected )
> May 23 11:32:32 data2 kernel: [ 740.458688] drbd r0 data1: Restarting receiver thread
> May 23 11:32:32 data2 kernel: [ 740.458723] drbd r0 data1: conn( Unconnected -> Connecting )
>
> resources:
>
> resource r0 {
>     on data1 {
>         device /dev/drbd0;
>         disk /dev/mapper/mapper_secure;
>         address 172.16.11.21:7789;
>         meta-disk internal;
>     }
>     on data2 {
>         device /dev/drbd0;
>         disk /dev/mapper/mapper_secure;
>         address 172.16.11.22:7789;
>         meta-disk internal;
>     }
> }
>
> drbd configuration:
>
> global {
>     usage-count yes;
> }
>
> common {
>     #handlers {
>     #    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
>     #    after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
>     #}
>     #disk {
>     #    on-io-error detach;
>     #    disk-barrier no;
>     #    disk-flushes no;
>     #    al-extents 3833;
>     #    c-plan-ahead 7;
>     #    c-fill-target 2M;
>     #    c-min-rate 80M;
>     #    c-max-rate 720M;
>     #}
>     net {
>         protocol C;
>         #fencing resource-only;
>         #cram-hmac-alg sha1;
>         #verify-alg sha1;
>         #shared-secret 1e69dc721fd2e65368ae3ba1e5929979;
>         #after-sb-0pri disconnect;
>         #after-sb-1pri disconnect;
>         #after-sb-2pri disconnect;
>         #max-buffers 8000;
>         #max-epoch-size 8000;
>         #sndbuf-size 0;
>         #rcvbuf-size 2048k;
>     }
> }
>
> _______________________________________________
> drbd-user mailing list
> [email protected]
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
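
For scale: the connection drops while the secondary is in WFBitMapT (waiting to receive the bitmap as sync target), and the bitmap for this 127 TB device is large. The back-of-the-envelope sketch below reproduces the logged "resync bitmap" figures from the reported capacity. It assumes one bitmap bit per 4 KiB of backing storage, 64-bit bitmap words and 4 KiB kernel pages; those values are not stated in the thread. It also computes the gap between entering WFBitMapT and the PingAck timeout from the kernel timestamps. It only quantifies the log; it does not establish that the bitmap exchange causes the timeout.

```python
# Back-of-the-envelope check of the DRBD bitmap figures in the syslog above.
# Assumptions (not from the thread): one bitmap bit tracks 4 KiB of storage,
# bitmap words are 64 bits, and kernel pages are 4 KiB (x86_64).

SECTOR = 512      # "capacity ==" in the log is given in 512-byte sectors
BM_BLOCK = 4096   # bytes of storage tracked per bitmap bit (assumed)
WORD_BITS = 64    # bits per bitmap word (assumed 64-bit kernel)
PAGE = 4096       # bytes per kernel page (assumed)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division; avoids float rounding on large values."""
    return -(-a // b)


# From "drbd_bm_resize called with capacity == 273437203064"
capacity_sectors = 273437203064

size_kib = capacity_sectors * SECTOR // 1024
bits = ceil_div(capacity_sectors * SECTOR, BM_BLOCK)
words = ceil_div(bits, WORD_BITS)
pages = ceil_div(words * (WORD_BITS // 8), PAGE)
bitmap_gib = pages * PAGE / 2**30

# Kernel timestamps: entering WFBitMapT vs. "PingAck did not arrive in time."
t_wfbitmap = 719.912248
t_pingack = 740.397026
gap_s = t_pingack - t_wfbitmap

print(f"size   = {size_kib} KiB ({size_kib / 2**30:.1f} TiB)")
print(f"bits   = {bits}, words = {words}, pages = {pages}")
print(f"bitmap = {bitmap_gib:.2f} GiB")
print(f"WFBitMapT -> PingAck timeout: {gap_s:.1f} s")
```

Under these assumptions the computed bits/words/pages match the values in the log exactly, the bitmap comes to roughly 4 GiB, and the PingAck timeout fires about 20.5 s after the secondary starts waiting for the bitmap.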
