Hi All,
    I'd like to accomplish 2 things with this message:
1) Unblock (one way or another) https://review.openstack.org/#/c/123957
2) Create some form of consensus on when it's okay to add temporary code to
   nova to work around bugs in external utilities.

So some background on this specific issue.  The issue was first reported in
July 2014 at [1] and then clarified at [2].  The synopsis of the bug is that
calling qemu-img convert -O raw /may/ generate a corrupt output file if the
source image isn't fully flushed to disk.  The coreutils folk discovered
something similar in 2011 *sigh*.

The clear and correct solution is to ensure that qemu-img uses
FIEMAP_FLAG_SYNC.  This in turn produces a measurable slowdown in that code
path, so additionally it's best if qemu-img uses an alternate method to
determine data status in a disk image.  This has been done and will be included
in qemu 2.2.0 when it's released.  These fixes prompted a more substantial
rework of that code in qemu, which is awesome but not *required* to fix the
bug in qemu.

While we wait for $distros to get the fixed qemu, nova is still vulnerable to
the bug.  To that end I proposed a workaround in nova that forces images
retrieved from glance to disk with an fsync() prior to calling qemu-img on
them.  I admit that this is ugly and has a performance impact.
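
For readers who want to picture the workaround, here is a minimal sketch of
the idea.  It is not the code under review: the helper names fsync_path(),
build_convert_cmd() and convert_to_raw() are hypothetical, and it assumes a
Linux host where fsync() on a read-only file descriptor is permitted.

```python
import os
import subprocess


def fsync_path(path):
    """Ask the kernel to flush a file's data to disk before it is read."""
    # Assumption: Linux semantics, where fsync() works on an O_RDONLY fd.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def build_convert_cmd(source, dest):
    """Build the qemu-img command line discussed in this message."""
    return ["qemu-img", "convert", "-O", "raw", source, dest]


def convert_to_raw(source, dest):
    """Flush the downloaded image first, then hand it to qemu-img."""
    fsync_path(source)
    subprocess.check_call(build_convert_cmd(source, dest))
```

The cost is the fsync() itself, which scales with the amount of unflushed
image data.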

In order to reduce the impact of the fsync() I considered:
1) Testing the qemu version and only fsync()ing on affected versions.
   - Vendors will backport the fix to their version of qemu.  The fixed version
     will still claim to be 2.1.0 (for example) and therefore trigger the
     fsync() when not required.  Given how unreliable this will be I dismissed
     it as an option.

2) API Change
   - In the case of this specific bug we only need to fsync() in certain
     scenarios.  It would be easy to add a flag to IMAGE_API.download() to
     determine if this fsync() is required.  This has the nice property of only
     having a performance impact in the suspect case (personally I'll take
     slow-and-correct over fast-and-buggy any day).  My hesitation is that
     after we've modified the API it's very hard to remove that change when we
     decide the workaround is redundant.

3) Config file option
   - For many of the same reasons as the API change this seemed like a bad
     idea.
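
To illustrate why option 1 is unreliable, here is a rough sketch of a naive
version check.  The names FIXED_IN, parse_qemu_version() and needs_fsync() are
hypothetical, and the banner format is an assumption about what
"qemu-img --version" prints.

```python
import re

# Upstream release expected to carry the fix, per this message.
FIXED_IN = (2, 2, 0)


def parse_qemu_version(banner):
    """Extract (major, minor, micro) from a version banner string."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("unrecognised version banner: %r" % banner)
    return tuple(int(part) for part in match.groups())


def needs_fsync(banner):
    """Naive check: treat anything older than FIXED_IN as affected."""
    return parse_qemu_version(banner) < FIXED_IN
```

A vendor build with the fix backported still reports 2.1.0, so needs_fsync()
cannot tell it apart from an unfixed upstream 2.1.0 and asks for the fsync()
in both cases.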

Does anyone have any other ideas?

One thing that I haven't done is measure the impact of the fsync() on any
reasonable workload.  This is mainly because I don't really know how.  Sure I
could do some statistics in devstack but I don't really think they'd be
meaningful.  Also the size of the image in glance is fairly important.  An
fsync() of a 100GB image is many times more painful than that of a 1GB image.

While in Paris I was asked to look at other code paths in nova where we use
qemu-img convert.  I'm doing this analysis.  To date I have some suspicions
that snapshot (and migration) are affected, but no data that confirms or
refutes that.  I continue to look at the appropriate code in nova, libvirt and
qemu.

I understand that there is more work to be done in this area, and I'm happy to
do it.  Having said that, from where I sit that work is not directly related
to the bug that started this.

As the idea is to remove this code as soon as all the distros we care about
have a fixed qemu, I started a (brief) discussion at [3] on which distros are
in that list.  Armed with that list I have opened (or am in the process of
opening) bugs for each version of each distribution to make them aware of the
issue and the fix.  I have a status page at [4].

Okay, I think I'm done raving.

So moving forward:

1) So what should I do with the open review?
2) What can we learn from this in terms of how we work around key utilities
   that are not in our direct power to change?
   - Is taking ugly code for "some time" okay?  I understand that this is a
     complex issue as we're relying on $developer to be around (or leave enough
     information for those that follow) to determine when it's okay to remove
     the ugliness.

If you made it this far bravo!

Yours Tony.

[1] https://bugs.launchpad.net/nova/+bug/1350766
[2] https://bugs.launchpad.net/qemu/+bug/1368815
[3] http://lists.openstack.org/pipermail/openstack-dev/2014-November/050526.html
[4] https://wiki.openstack.org/wiki/Bug1368815


_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
