Is there any chance this could be an MTU mismatch between the two nodes? If you ping from one node to the other with varying packet sizes, do the pings stop working above a specific size? Does ifconfig report the same MTU for the interface on both nodes?

Examples:

ifconfig | grep -i mtu

ping -s 500 <other_ip>

ping -s 1400 <other_ip>

ping -s 1472 <other_ip>

ping -s 2000 <other_ip>
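The sizes above bracket the standard 1500-byte Ethernet MTU. The value passed to `ping -s` is the ICMP payload, so the largest unfragmented IPv4 ping is the MTU minus a 20-byte IP header (assuming no IP options) and an 8-byte ICMP header: 1500 - 28 = 1472. Below is a minimal Python sketch that builds probe commands at and just over that limit. It assumes Linux iputils `ping`, where `-M do` prohibits fragmentation so an oversized probe fails instead of being silently fragmented. `max_ping_payload` and `ping_probe` are hypothetical helper names, and 9000 is only an assumed typical jumbo-frame MTU.

```python
import shlex

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` payload that fits in a single IPv4 packet of this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_probe(peer: str, payload: int, count: int = 3) -> list[str]:
    """Argv for iputils ping with fragmentation prohibited (-M do)."""
    return ["ping", "-M", "do", "-c", str(count), "-s", str(payload), peer]


if __name__ == "__main__":
    peer = "172.16.11.21"  # data1's replication address from the resource config
    for mtu in (1500, 9000):  # standard Ethernet and an assumed jumbo-frame MTU
        limit = max_ping_payload(mtu)
        # At the limit the probe should pass and one byte over it should fail,
        # provided the path MTU between the nodes really is `mtu`.
        for payload in (limit, limit + 1):
            print(f"MTU {mtu}: {shlex.join(ping_probe(peer, payload))}")
```

Run the printed commands in both directions. If the largest payload that gets through differs between data1 and data2, the two ends do not agree on the MTU.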

Thanks,

- Nelson Hicks




On 05/23/2018 02:07 PM, Dirk Bonenkamp - ProActive wrote:
Hi,

Thank you for your reply.

I am (or was) under the impression that DRBD9 is the new and improved DRBD, so I figured I would use that version. Is that not the case? Could somebody enlighten me?

I have already disabled all bonding and other fancy network stuff, so I'm currently using 1 NIC. Unfortunately, this doesn't solve anything.

Kind regards,

Dirk

On 23-05-18 14:20, Yannis Milios wrote:
Two things:

- I would use drbd8 instead of drbd9 for a 2-node setup.
- I would first test with 1 NIC instead of 2.

On Wed, May 23, 2018 at 11:01 AM, Dirk Bonenkamp - ProActive <[email protected]> wrote:

    Hi List,

    I'm struggling with a new DRBD9 setup. It's a simple Master/Slave
    setup. I'm running Ubuntu 16.04 LTS with the DRBD9 packages from the
    Launchpad PPA.

    I have been running some DRBD8 systems in production for quite a few
    years, so I have some experience. This setup is very similar; the only
    major difference is that this is DRBD9 and I use LUKS-encrypted
    partitions as the backend.

    I keep running into this 'PingAck did not arrive in time.' error,
    which, if I am correct, points to network issues (see the complete log
    snippet below). This error occurs when I try to reattach the secondary
    node after a reboot. The initial sync works fine.
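The kernel timestamps in the log below show when the failure happens relative to the connection: the connection reaches Connected at 719.896541 s and the PingAck failure is logged at 740.397026 s, roughly 20.5 s later, while the replication state is still WFBitMapT. A minimal Python sketch that extracts the bracketed seconds-since-boot timestamp from such syslog lines and computes that gap; `kernel_ts` is a hypothetical helper, and its regex assumes the `kernel: [  seconds.micro]` layout shown in this log.

```python
import re

# Matches the "kernel: [  740.397026]" prefix used in the quoted log lines.
KERNEL_TS = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def kernel_ts(line: str) -> float:
    """Return the seconds-since-boot timestamp from a kernel syslog line."""
    match = KERNEL_TS.search(line)
    if match is None:
        raise ValueError(f"no kernel timestamp in: {line!r}")
    return float(match.group(1))


connected = ("May 23 11:32:12 data2 kernel: [  719.896541] drbd r0 data1: "
             "conn( Connecting -> Connected ) peer( Unknown -> Primary )")
ping_ack = ("May 23 11:32:32 data2 kernel: [  740.397026] drbd r0 data1: "
            "PingAck did not arrive in time.")

elapsed = kernel_ts(ping_ack) - kernel_ts(connected)
print(f"PingAck failure {elapsed:.3f} s after Connected")
```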

    The servers are interconnected with two 10Gb NICs. I had bonding and
    jumbo frames configured, but deactivated all of this, to no avail. I've
    also stripped the DRBD configuration to the bare minimum (see below).

    I've tested the connection with iperf and some other tools, and it
    seems just fine.

    Could somebody point me in the right direction?

    Thank you in advance, regards,

    Dirk Bonenkamp

    syslog messages:

    May 23 11:31:56 data2 kernel: [  704.111755] drbd: loading out-of-tree module taints kernel.
    May 23 11:31:56 data2 kernel: [  704.112290] drbd: module verification failed: signature and/or required key missing - tainting kernel
    May 23 11:31:56 data2 kernel: [  704.127677] drbd: initialized. Version: 9.0.14-1 (api:2/proto:86-113)
    May 23 11:31:56 data2 kernel: [  704.127680] drbd: GIT-hash: 62f906cf44ef02a30ce0c148fec223b40c51c533 build by root@data2, 2018-05-23 09:19:54
    May 23 11:31:56 data2 kernel: [  704.127683] drbd: registered as block device major 147
    May 23 11:31:56 data2 kernel: [  704.153565] drbd r0: Starting worker thread (from drbdsetup [4495])
    May 23 11:31:56 data2 kernel: [  704.183031] drbd r0/0 drbd0: disk( Diskless -> Attaching )
    May 23 11:31:56 data2 kernel: [  704.183066] drbd r0/0 drbd0: Maximum number of peer devices = 1
    May 23 11:31:56 data2 kernel: [  704.183293] drbd r0: Method to ensure write ordering: flush
    May 23 11:31:56 data2 kernel: [  704.183308] drbd r0/0 drbd0: drbd_bm_resize called with capacity == 273437203064
    May 23 11:31:58 data2 kernel: [  706.508228] drbd r0/0 drbd0: resync bitmap: bits=34179650383 words=534057038 pages=1043081
    May 23 11:31:58 data2 kernel: [  706.508234] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
    May 23 11:31:58 data2 kernel: [  706.508236] drbd r0/0 drbd0: size = 127 TB (136718601532 KB)
    May 23 11:32:10 data2 kernel: [  717.890420] drbd r0/0 drbd0: recounting of set bits took additional 1256ms
    May 23 11:32:10 data2 kernel: [  717.890435] drbd r0/0 drbd0: disk( Attaching -> Outdated )
    May 23 11:32:10 data2 kernel: [  717.890439] drbd r0/0 drbd0: attached to current UUID: 244DD61D2781DF44
    May 23 11:32:10 data2 kernel: [  717.918473] drbd r0 data1: Starting sender thread (from drbdsetup [4544])
    May 23 11:32:10 data2 kernel: [  717.922534] drbd r0 data1: conn( StandAlone -> Unconnected )
    May 23 11:32:10 data2 kernel: [  717.922820] drbd r0 data1: Starting receiver thread (from drbd_w_r0 [4498])
    May 23 11:32:10 data2 kernel: [  717.922973] drbd r0 data1: conn( Unconnected -> Connecting )
    May 23 11:32:10 data2 kernel: [  718.421219] drbd r0 data1: Handshake to peer 1 successful: Agreed network protocol version 113
    May 23 11:32:10 data2 kernel: [  718.421229] drbd r0 data1: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
    May 23 11:32:10 data2 kernel: [  718.421259] drbd r0 data1: Starting ack_recv thread (from drbd_r_r0 [4550])
    May 23 11:32:10 data2 kernel: [  718.424095] drbd r0: Preparing cluster-wide state change 1205605755 (0->1 499/146)
    May 23 11:32:10 data2 kernel: [  718.437172] drbd r0: State change 1205605755: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
    May 23 11:32:10 data2 kernel: [  718.437185] drbd r0: Aborting cluster-wide state change 1205605755 (12ms) rv = -22
    May 23 11:32:12 data2 kernel: [  719.896223] drbd r0: Preparing cluster-wide state change 445952355 (0->1 499/146)
    May 23 11:32:12 data2 kernel: [  719.896498] drbd r0: State change 445952355: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
    May 23 11:32:12 data2 kernel: [  719.896508] drbd r0: Committing cluster-wide state change 445952355 (0ms)
    May 23 11:32:12 data2 kernel: [  719.896541] drbd r0 data1: conn( Connecting -> Connected ) peer( Unknown -> Primary )
    May 23 11:32:12 data2 kernel: [  719.912186] drbd r0/0 drbd0 data1: drbd_sync_handshake:
    May 23 11:32:12 data2 kernel: [  719.912198] drbd r0/0 drbd0 data1: self 244DD61D2781DF44:0000000000000000:0000000000000000:0000000000000000 bits:52035 flags:20
    May 23 11:32:12 data2 kernel: [  719.912207] drbd r0/0 drbd0 data1: peer E38BE51FE782EAE0:244DD61D2781DF44:934CAB8662DF0410:E555BDC58E528356 bits:53162 flags:20
    May 23 11:32:12 data2 kernel: [  719.912214] drbd r0/0 drbd0 data1: uuid_compare()=-2 by rule 50
    May 23 11:32:12 data2 kernel: [  719.912248] drbd r0/0 drbd0 data1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
    May 23 11:32:32 data2 kernel: [  740.397026] drbd r0 data1: PingAck did not arrive in time.
    May 23 11:32:32 data2 kernel: [  740.397121] drbd r0 data1: conn( Connected -> NetworkFailure ) peer( Primary -> Unknown )
    May 23 11:32:32 data2 kernel: [  740.397131] drbd r0/0 drbd0 data1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
    May 23 11:32:32 data2 kernel: [  740.397176] drbd r0 data1: ack_receiver terminated
    May 23 11:32:32 data2 kernel: [  740.397182] drbd r0 data1: Terminating ack_recv thread
    May 23 11:32:32 data2 kernel: [  740.458608] drbd r0 data1: Connection closed
    May 23 11:32:32 data2 kernel: [  740.458650] drbd r0 data1: conn( NetworkFailure -> Unconnected )
    May 23 11:32:32 data2 kernel: [  740.458688] drbd r0 data1: Restarting receiver thread
    May 23 11:32:32 data2 kernel: [  740.458723] drbd r0 data1: conn( Unconnected -> Connecting )
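The bitmap lines in the log above are internally consistent, and the arithmetic gives a sense of scale for a 127 TB backing device. Assuming one bitmap bit per 4 KiB block, 64-bit bitmap words and 4 KiB pages (the values these numbers imply), the reported capacity of 273437203064 512-byte sectors reproduces exactly the logged bits, words and pages, which comes to about 4 GiB of bitmap. A Python sketch of that calculation; the constant names are illustrative, not DRBD identifiers.

```python
SECTOR = 512            # bytes per sector, the unit of "capacity ==" in the log
BYTES_PER_BIT = 4096    # assumed bitmap granularity: one bit per 4 KiB block
BITS_PER_WORD = 64      # assumed 64-bit bitmap words (64-bit kernel)
PAGE_SIZE = 4096        # assumed 4 KiB pages


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


capacity_sectors = 273437203064                        # from drbd_bm_resize
bits = ceil_div(capacity_sectors * SECTOR, BYTES_PER_BIT)
words = ceil_div(bits, BITS_PER_WORD)
pages = ceil_div(words * (BITS_PER_WORD // 8), PAGE_SIZE)
size_kib = capacity_sectors * SECTOR // 1024            # the "KB" figure in the log

print(f"bits={bits} words={words} pages={pages}")
print(f"size = {size_kib} KiB, bitmap = {pages * PAGE_SIZE / 2**30:.2f} GiB")
```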

    resources:

    resource r0 {
            on data1 {
                    device    /dev/drbd0;
                    disk      /dev/mapper/mapper_secure;
                    address 172.16.11.21:7789;
                    meta-disk internal;
            }
            on data2 {
                    device    /dev/drbd0;
                    disk      /dev/mapper/mapper_secure;
                    address 172.16.11.22:7789;
                    meta-disk internal;
            }
    }

    drbd configuration:

    global {
            usage-count yes;
    }

    common {
            #handlers {
            #        fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
            #        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
            #}
            #disk {
            #        on-io-error detach;
            #       disk-barrier no;
            #       disk-flushes no;
            #       al-extents 3833;
            #        c-plan-ahead 7;
            #        c-fill-target 2M;
            #        c-min-rate 80M;
            #        c-max-rate 720M;
            #}
            net {
                    protocol C;
                    #fencing resource-only;
                    #cram-hmac-alg sha1;
                    #verify-alg sha1;
                    #shared-secret 1e69dc721fd2e65368ae3ba1e5929979;
                    #after-sb-0pri disconnect;
                    #after-sb-1pri disconnect;
                    #after-sb-2pri disconnect;
                    #max-buffers    8000;
                    #max-epoch-size 8000;
                    #sndbuf-size 0;
                    #rcvbuf-size 2048k;
            }
    }
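The 'PingAck did not arrive in time.' message relates to DRBD's keep-alive options in the `net` section, which this stripped-down configuration leaves at their defaults. According to the drbd.conf(5) manual page, `ping-int` and `connect-int` are given in seconds (default 10 each), while `ping-timeout` and `timeout` are given in tenths of a second (defaults 5 and 60, i.e. 0.5 s and 6 s), and `timeout` must be lower than both `connect-int` and `ping-int`. These defaults are recalled from the manual and should be checked against the drbd-utils version in use. The `NetTimings` class below is a hypothetical sketch that only converts the units and checks that constraint; it does not parse a real drbd.conf.

```python
from dataclasses import dataclass


@dataclass
class NetTimings:
    """Keep-alive related DRBD net options, in the units drbd.conf uses.

    Defaults are the drbd.conf(5) defaults as recalled here; verify them
    against the manual page for your drbd-utils version.
    """
    ping_int: int = 10      # seconds
    ping_timeout: int = 5   # tenths of a second
    timeout: int = 60       # tenths of a second
    connect_int: int = 10   # seconds

    def in_seconds(self) -> dict[str, float]:
        """Convert every option to seconds."""
        return {
            "ping-int": float(self.ping_int),
            "ping-timeout": self.ping_timeout / 10,
            "timeout": self.timeout / 10,
            "connect-int": float(self.connect_int),
        }

    def check(self) -> list[str]:
        """Return violations of 'timeout < connect-int and timeout < ping-int'."""
        s = self.in_seconds()
        problems = []
        for other in ("connect-int", "ping-int"):
            if s["timeout"] >= s[other]:
                problems.append(
                    f"timeout ({s['timeout']} s) must be lower than {other} ({s[other]} s)"
                )
        return problems


defaults = NetTimings()
print(defaults.in_seconds())
print(defaults.check() or "constraints satisfied")
```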



    _______________________________________________
    drbd-user mailing list
    [email protected]
    http://lists.linbit.com/mailman/listinfo/drbd-user



--
ProActive Software
Dirk Bonenkamp
CTO
https://www.proactive-software.com
Phone: +31 (0)23 54 222 99
Mobile: +31 (0)6 250 787 93
Richard Holkade 9
2033 PZ Haarlem
LinkedIn <http://linkd.in/1V6egnk> Facebook <http://bit.ly/FBProActive> YouTube <http://bit.ly/1Mc23L9> www.proactive.nl <https://www.proactive.nl>


