"buffer modified by upper layers during write" means whatever sits on
top of drbd changes data "in flight".
Please search the list archives, this is a FAQ.
Swap and some file systems do that - usually it's some kind of
optimization. I suspect VMware VMs hosted on ext4 do that too.
There's probably nothing wrong, but DRBD can't know. You should do your
data integrity checking at some higher level (fsck, for example).
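As a toy illustration (plain Python, not DRBD code): the sender computes a digest over a shared buffer, an upper layer rewrites that buffer before the write completes, and the digest no longer matches what was actually transmitted. zlib.crc32 stands in for crc32c here; the names and the scenario are illustrative.

```python
import zlib

def send_with_digest(buf: bytearray):
    """Sender side: the digest is computed over the buffer as it was
    when the write was submitted."""
    digest = zlib.crc32(bytes(buf))
    return digest, buf  # buf is shared, not copied -- like a zero-copy write

# A 4 KiB "page" submitted for writing.
page = bytearray(b"\x00" * 4096)
digest, in_flight = send_with_digest(page)

# An upper layer (swap, a filesystem, a hosted VM) rewrites the page
# while the write is still in flight.
page[0:4] = b"NEW!"

# Receiver side: recompute the digest over what actually arrived.
mismatch = zlib.crc32(bytes(in_flight)) != digest
print("Digest mismatch:", mismatch)  # -> Digest mismatch: True
```

This is exactly why the mismatch is not necessarily corruption: the data on disk ends up being whatever the upper layer last wrote, it just differs from what the digest was taken over.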
Lionel.
On 10/10/2014 01:45, aurelien panizza wrote:
Hi all,
I've got a problem in my environment.
I set up my primary server (Pacemaker + DRBD), which ran alone for a
while, and then I added the second server (currently DRBD only).
Both servers can see each other and /proc/drbd reports "UpToDate/UpToDate".
If I run a verify on that resource (right after the full resync), it
reports some blocks out of sync (generally from 100 to 1500 on my
80 GB LVM partition).
So I disconnect/reconnect the secondary and oos reports 0 blocks.
I run the verify again and some blocks are still out of sync. What I've
noticed is that it is almost always the same blocks that are out of
sync.
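For reference, the out-of-sync count after a verify shows up as the oos: counter in /proc/drbd (counted in KiB). A small sketch that pulls it out; the /proc/drbd excerpt below is illustrative, not taken from the poster's system:

```python
import re

# Illustrative /proc/drbd excerpt; the oos: field is the amount of
# out-of-sync data (in KiB) reported after an online verify.
SAMPLE = """version: 8.4.4 (api:1/proto:86-101)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1500
"""

def out_of_sync(proc_drbd_text: str) -> int:
    """Return the oos: counter for the first device found, 0 if absent."""
    m = re.search(r"\boos:(\d+)", proc_drbd_text)
    return int(m.group(1)) if m else 0

print(out_of_sync(SAMPLE))  # -> 1500
```

Watching this counter across verify runs is an easy way to confirm that it really is the same amount (and, via the kernel log, the same sectors) flagged each time.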
I tried to do a full resync multiple times but had the same issue.
I also tried replacing the physical secondary server with a virtual
machine (in order to check whether the issue came from the secondary
server) but had the same issue.
I then activated "data-integrity-alg crc32c" and got a couple of
"Digest mismatch, buffer modified by upper layers during write:
167134312s +4096" entries in the primary's log.
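The log entry encodes a start sector and a length. DRBD logs sector offsets in 512-byte units, so "167134312s +4096" can be turned into a byte range to see where the mismatching block sits on the device. A quick sketch:

```python
SECTOR_SIZE = 512  # DRBD logs sector offsets in 512-byte units

def locate(sector: int, length: int):
    """Convert a 'SECTORs +LENGTH' log entry into a (start, end) byte range."""
    start = sector * SECTOR_SIZE
    return start, start + length

start, end = locate(167134312, 4096)
print(start, end)        # byte range of the mismatching 4 KiB block
print(start / 2**30)     # roughly how many GiB into the device it sits
```

167134312 x 512 works out to about 79.7 GiB into the device, i.e. near the end of the ~80 GB LV, which would fit the observation that the same blocks keep showing up.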
I tried on a different network card but got the same errors.
My full configuration file:
protocol C;
meta-disk internal;
device /dev/drbd0;
disk /dev/sysvg/drbd;
handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh xxx@xxx";
    out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh xxx@xxx";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
net {
    cram-hmac-alg "sha1";
    shared-secret "drbd";
    sndbuf-size 512k;
    max-buffers 8000;
    max-epoch-size 8000;
    verify-alg md5;
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    data-integrity-alg crc32c;
}
disk {
    al-extents 3389;
    fencing resource-only;
}
syncer {
    rate 90M;
}
on host1 {
    address 10.110.1.71:7799;
}
on host2 {
    address 10.110.1.72:7799;
}
}
My OS: Red Hat 6, kernel 2.6.32-431.20.3.el6.x86_64
DRBD version: drbd84-8.4.4-1
ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: off
ntuple-filters: off
receive-hashing: off
The secondary server is currently not in the HA cluster (Pacemaker),
but I don't think this is the problem.
I have another HA pair on 2 physical hosts with the exact same
configuration and DRBD/OS versions (but not the same server model) and
everything is OK.
As the primary server is in production, I can't stop the application
(a database) to check whether the alerts are false positives.
Would you have any advice?
Could it be that the primary server has corrupted blocks or wrong
metadata?
Regards,
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user