On Wed, May 24, 2017 at 04:50:03PM -0400, Jeff Cody wrote: > On Wed, May 24, 2017 at 11:02:02AM +0200, Niels de Vos wrote: > > On Tue, May 23, 2017 at 01:27:50PM -0400, Jeff Cody wrote: > > > On current released versions of glusterfs, glfs_lseek() will sometimes > > > return invalid values for SEEK_DATA or SEEK_HOLE. For SEEK_DATA and > > > SEEK_HOLE, the returned value should be >= the passed offset, or < 0 in > > > the case of error: > > > > > > LSEEK(2): > > > > > > off_t lseek(int fd, off_t offset, int whence); > > > > > > [...] > > > > > > SEEK_HOLE > > > Adjust the file offset to the next hole in the file greater > > > than or equal to offset. If offset points into the middle > > > of > > > a hole, then the file offset is set to offset. If there is > > > no > > > hole past offset, then the file offset is adjusted to the > > > end > > > of the file (i.e., there is an implicit hole at the end of > > > any file). > > > > > > [...] > > > > > > RETURN VALUE > > > Upon successful completion, lseek() returns the > > > resulting > > > offset location as measured in bytes from the beginning of > > > the > > > file. On error, the value (off_t) -1 is returned and errno > > > is > > > set to indicate the error > > > > > > However, occasionally glfs_lseek() for SEEK_HOLE/DATA will return a > > > value less than the passed offset, yet greater than zero. > > > > > > For instance, here are example values observed from this call: > > > > > > offs = glfs_lseek(s->fd, start, SEEK_HOLE); > > > if (offs < 0) { > > > return -errno; /* D1 and (H3 or H4) */ > > > } > > > > > > start == 7608336384 > > > offs == 7607877632 > > > > > > This causes QEMU to abort on the assert test. When this value is > > > returned, errno is also 0. > > > > > > This is a reported and known bug to glusterfs: > > > https://bugzilla.redhat.com/show_bug.cgi?id=1425293 > > > > > > Although this is being fixed in gluster, we still should work around it > > > in QEMU, given that multiple released versions of gluster behave this > > > way. > > > > Versions of GlusterFS 3.8.0 - 3.8.8 are affected, 3.8.9 has the fix > > already and was released in February this year. We encourage users to > > update to recent versions, and provide a stable (bugfix only) update > > each month. > > I am able to reproduce this bug on a server running glusterfs 3.11.0rc0. Is > that expected?
No, that really is not expected. The backport that was reported to fix it for a 3.8.4 version is definitely part of 3.11 already. Could you pass me some details about your Gluster environment and volume, either by email or in a bug? I'll try to reproduce and debug it from the Gluster side. There is a holiday tomorrow (I'm in The Netherlands), and I'm travelling the whole weekend. I might not be able to look into this before next week. Sorry about that! Thanks, Niels > > > > > The Red Hat Gluster Storage product that is often used in combination > > with QEMU (+oVirt) does unfortunately not have an update where this is > > fixed. > > > > Using an unfixed Gluster storage environment can cause QEMU to segfault. > > Although fixes exist for Gluster, not all users will have them > > installed. Preventing the segfault in QEMU due to a broken storage > > environment makes sense to me. > > > > > This patch treats the return case of (offs < start) the same as if an > > > error value other than ENXIO is returned; we will assume we learned > > > nothing, and there are no holes in the file. > > > > > > Signed-off-by: Jeff Cody <jc...@redhat.com> > > > --- > > > block/gluster.c | 18 ++++++++++++++++-- > > > 1 file changed, 16 insertions(+), 2 deletions(-) > > > > > > diff --git a/block/gluster.c b/block/gluster.c > > > index 7c76cd0..c147909e 100644 > > > --- a/block/gluster.c > > > +++ b/block/gluster.c > > > @@ -1275,7 +1275,14 @@ static int find_allocation(BlockDriverState *bs, > > > off_t start, > > > if (offs < 0) { > > > return -errno; /* D3 or D4 */ > > > } > > > - assert(offs >= start); > > > + > > > + if (offs < start) { > > > + /* This is not a valid return by lseek(). We are safe to just > > > return > > > + * -EIO in this case, and we'll treat it like D4. Unfortunately > > > some > > > + * versions of libgfapi will return offs < start, so an assert > > > here > > > + * will unneccesarily abort QEMU. */ > > > > This is not really correct, the problem is not in libgfapi, but in the > > protocol translator on the server-side. The version of libgfapi does not > > matter here, it is the version on the server. But that might be too much > > detail for the comment. > > > > > + return -EIO; > > > + } > > > > > > if (offs > start) { > > > /* D2: in hole, next data at offs */ > > > @@ -1307,7 +1314,14 @@ static int find_allocation(BlockDriverState *bs, > > > off_t start, > > > if (offs < 0) { > > > return -errno; /* D1 and (H3 or H4) */ > > > } > > > - assert(offs >= start); > > > + > > > + if (offs < start) { > > > + /* This is not a valid return by lseek(). We are safe to just > > > return > > > + * -EIO in this case, and we'll treat it like H4. Unfortunately > > > some > > > + * versions of libgfapi will return offs < start, so an assert > > > here > > > + * will unneccesarily abort QEMU. */ > > > + return -EIO; > > > + } > > > > > > if (offs > start) { > > > /* > > > -- > > > 2.9.3 > > > > > > > You might want to explain the problem a little different in the commit > > message. It is fine too if you think it would become too detailed, my > > explanation is in the archives now in any case. > > > > Reviewed-by: Niels de Vos <nde...@redhat.com>