Alexandre,

This is all correct, but it is not related to the inconsistency issue.
Stanislav

On Thu, May 28, 2015 at 10:44 AM, Alexandre DERUMIER <aderum...@odiso.com> wrote:
> >> That is right and you just can't use O_DIRECT without alignment. You
> >> would just get an error on the "write" system call. If you check
> >> drbd_oos_test.c you will find posix_memalign there.
>
> http://people.redhat.com/msnitzer/docs/io-limits.txt
>
> "Direct I/O best practices
> -------------------------
> Users must always take care to use properly aligned and sized IO. This
> is especially important for Direct I/O access. Direct I/O should be
> aligned on a 'logical_block_size' boundary and in multiples of the
> 'logical_block_size'. With native 4K devices (logical_block_size is 4K)
> it is now critical that applications perform Direct I/O that is a
> multiple of the device's 'logical_block_size'. This means that
> applications that do not perform 4K aligned I/O, but 512-byte aligned
> I/O, will break with native 4K devices."
>
> About qemu (for qcow2; I think raw is fine):
>
> http://lists.gnu.org/archive/html/qemu-discuss/2015-01/msg00051.html
>
> "qcow2 cannot store the "physical block size" as an explicit
> property. But what you can do is the following:
>
> 1. Make sure the host physical disk partition system that stores
> the qcow2 file is aligned to a multiple of 4K (or the RAID block
> size if on a RAID system).
>
> 2. Make sure the host file system that stores the qcow2 file has
> a block size of 4K or a multiple of 4K.
>
> 3. Make sure the internal qcow2 cluster_size is 4K or a multiple
> of 4K (I think this is the default). Otherwise this is set using
> the "-o cluster_size=4096" option to qemu-img create/convert.
>
> 4. Make sure the guest partition on the virtual disk (backed by
> the qcow2 file) is aligned on a multiple of the qcow2
> cluster_size.
>
> 5. Make sure the guest file system of the guest partition on the
> virtual disk has a block size which is a multiple of the qcow2
> cluster_size.
>
> In other words, the usual "4K issue" procedures, but on both the
> physical and virtual machine."
>
> ----- Original message -----
> From: "Stanislav German-Evtushenko" <ginerm...@gmail.com>
> To: "aderumier" <aderum...@odiso.com>
> Cc: "dietmar" <diet...@proxmox.com>, "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Thursday, 28 May 2015 09:38:15
> Subject: Re: [pve-devel] Default cache mode for VM hard drives
>
> >> not sure it's related, but with O_DIRECT I think that the write needs
> >> to be aligned to a multiple of the 4k block size (or 512 bytes).
>
> That is right and you just can't use O_DIRECT without alignment. You would
> just get an error on the "write" system call. If you check drbd_oos_test.c
> you will find posix_memalign there.
>
> On Thu, May 28, 2015 at 10:33 AM, Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> > Hi,
> >
> > not sure it's related, but with O_DIRECT I think that the write needs
> > to be aligned to a multiple of the 4k block size (or 512 bytes).
> >
> > (and I remember some bug with qemu and 512b-logical/4k-physical disks:
> > http://pve.proxmox.com/pipermail/pve-devel/2012-November/004530.html)
> >
> > I'm not an expert so I can't confirm.
>
> ----- Original message -----
> From: "Stanislav German-Evtushenko" <ginerm...@gmail.com>
> To: "dietmar" <diet...@proxmox.com>
> Cc: "aderumier" <aderum...@odiso.com>, "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Thursday, 28 May 2015 09:22:12
> Subject: Re: [pve-devel] Default cache mode for VM hard drives
>
> Hi Dietmar,
>
> I did it a couple of times already, and every time I got the same answer:
> "upper layer problem". Well, as we have come this long way up to this
> point, I would like to continue.
>
> I have just done the same test with mdadm instead of DRBD, and I found
> that the problem was reproducible on software RAID too, just as Lars
> Ellenberg claimed.
> It means that the problem is not only related to DRBD but to O_DIRECT mode
> in general, whenever the host cache is not used and the block device reads
> data directly from userspace memory.
>
> The testcase is below.
>
> 1. Prepare:
>
> dd if=/dev/zero of=/tmp/mdadm1 bs=1M count=100
> dd if=/dev/zero of=/tmp/mdadm2 bs=1M count=100
> losetup /dev/loop1 /tmp/mdadm1
> losetup /dev/loop2 /tmp/mdadm2
> mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/loop{1,2}
>
> 2. Write data with O_DIRECT:
>
> ./a.out /dev/md0
>
> 3. Check consistency with vbindiff:
>
> vbindiff /tmp/mdadm{1,2}   # press Enter multiple times to skip metadata
>
> And here we find that the data on the "physical devices" differs, and md
> RAID did not catch this.
>
> On Thu, May 28, 2015 at 7:40 AM, Dietmar Maurer <diet...@proxmox.com> wrote:
>
> > > What does this mean?
> >
> > I still think you should discuss that on the DRBD list.
>
> Best regards,
> Stanislav German-Evtushenko

--
www.helplinux.ru - Find yourself a Guru
_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel