"buffer modified by upper layers during write" means whatever sits on
top of drbd changes data "in flight".
Please search the list archives, this is a FAQ.
Swap and some file systems do that - usually it's some kind of
optimization. I suspect VMware VMs hosted on ext4 do that too.
There's probably nothing wrong, but DRBD can't know. You should do your
data integrity checking at some higher level (fsck, for example).
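As a toy illustration (plain Python, not DRBD code): the sender computes a digest over a shared buffer, an upper layer rewrites that buffer before the write completes, and the digest no longer matches what was actually transmitted. zlib.crc32 stands in for crc32c here; the names and the scenario are illustrative.

```python
import zlib

def send_with_digest(buf: bytearray):
    """Sender side: the digest is computed over the buffer as it was
    when the write was submitted."""
    digest = zlib.crc32(bytes(buf))
    return digest, buf  # buf is shared, not copied -- like a zero-copy write

# A 4 KiB "page" submitted for writing.
page = bytearray(b"\x00" * 4096)
digest, in_flight = send_with_digest(page)

# An upper layer (swap, a filesystem, a hosted VM) rewrites the page
# while the write is still in flight.
page[0:4] = b"NEW!"

# Receiver side: recompute the digest over what actually arrived.
mismatch = zlib.crc32(bytes(in_flight)) != digest
print("Digest mismatch:", mismatch)  # -> Digest mismatch: True
```

This is exactly why the mismatch is not necessarily corruption: the data on disk ends up being whatever the upper layer last wrote, it just differs from what the digest was taken over.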
Lionel.
On 10/10/2014 01:45, aurelien panizza wrote:
Hi all,
I've got a problem in my environment.
I set up my primary server (Pacemaker + DRBD), which ran alone for a
while, and then I added the second server (currently DRBD only).
Both servers can see each other and /proc/drbd reports "UpToDate/UpToDate".
If I run a verify on that resource (right after the full resync), it
reports some blocks out of sync (generally from 100 to 1500 on my
80 GB LVM partition).
So I disconnect/reconnect the secondary and oos reports 0 blocks.
I run the verify again and some blocks are still out of sync. What I've
noticed is that it is almost always the same blocks that are out of
sync.
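For reference, the out-of-sync count after a verify shows up as the oos: counter in /proc/drbd (counted in KiB). A small sketch that pulls it out; the /proc/drbd excerpt below is illustrative, not taken from the poster's system:

```python
import re

# Illustrative /proc/drbd excerpt; the oos: field is the amount of
# out-of-sync data (in KiB) reported after an online verify.
SAMPLE = """version: 8.4.4 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1500
"""

def out_of_sync(proc_drbd_text: str) -> int:
    """Return the oos: counter for the first device found, 0 if absent."""
    m = re.search(r"\boos:(\d+)", proc_drbd_text)
    return int(m.group(1)) if m else 0

print(out_of_sync(SAMPLE))  # -> 1500
```

Watching this counter across verify runs is an easy way to confirm that it really is the same amount (and, via the kernel log, the same sectors) flagged each time.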
I tried to do a full resync multiple times but had the same issue.
I also tried replacing the physical secondary server with a virtual
machine (in order to check whether the issue came from the secondary
server) but had the same issue.
I then activated "data-integrity-alg crc32c" and got a couple of
"Digest mismatch, buffer modified by upper layers during write:
167134312s +4096" entries in the primary's log.
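The log entry encodes a start sector and a length. DRBD logs sector offsets in 512-byte units, so "167134312s +4096" can be turned into a byte range to see where the mismatching block sits on the device. A quick sketch:

```python
SECTOR_SIZE = 512  # DRBD logs sector offsets in 512-byte units

def locate(sector: int, length: int):
    """Convert a 'SECTORs +LENGTH' log entry into a (start, end) byte range."""
    start = sector * SECTOR_SIZE
    return start, start + length

start, end = locate(167134312, 4096)
print(start, end)        # byte range of the mismatching 4 KiB block
print(start / 2**30)     # roughly how many GiB into the device it sits
```

167134312 x 512 works out to about 79.7 GiB into the device, i.e. near the end of the ~80 GB LV, which would fit the observation that the same blocks keep showing up.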
I tried on a different network card but got the same errors.
My full configuration file:
protocol C;
meta-disk internal;
device /dev/drbd0;
disk /dev/sysvg/drbd;
handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh xxx@xxx";
    out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh xxx@xxx";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
net {
    cram-hmac-alg "sha1";
    shared-secret "drbd";
    sndbuf-size 512k;
    max-buffers 8000;
    max-epoch-size 8000;
    verify-alg md5;
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    data-integrity-alg crc32c;
}
disk {
    al-extents 3389;
    fencing resource-only;
}
syncer {
    rate 90M;
}
on host1 {
    address 10.110.1.71:7799;
}
on host2 {
    address 10.110.1.72:7799;
}
}
My OS: Red Hat 6, kernel 2.6.32-431.20.3.el6.x86_64
DRBD version: drbd84-8.4.4-1
ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
ntuple-filters: off
receive-hashing: off
The secondary server is currently not in the HA cluster (Pacemaker),
but I don't think this is the problem.
I have another HA pair on 2 physical hosts with the exact same
configuration and DRBD/OS versions (but not the same server model) and
everything is OK.
As the primary server is in production, I can't stop the application
(a database) to check whether the alerts are false positives.
Would you have any advice?
Could it be that the primary server has corrupted blocks or wrong
metadata?
Regards,
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user