Bug#705124: base: Filesystem corruption issue

Anthony Sheetz Mon, 15 Apr 2013 06:00:25 -0700

Replies inline below.

On Mon, Apr 15, 2013 at 4:54 AM, Ian Campbell <i...@hellion.org.uk> wrote:


> On Wed, 2013-04-10 at 08:17 -0400, Anthony Sheetz wrote:
> > Steps to reproduce:
> > Install Debian Testing from Netinstall CD, amd64.
> > Choose LVM and Full Disk Encryption, with a separate /home
> > Resize /home to be 80GB
> > Install openswan, connect to remote network
> > Install xen
> > Set up a virtual machine with Debian Stable using logical volumes as the
> > backing store.
> >       fs: ext3
> >       network: NAT
> > Transfer a large (multigigabyte) file from a remote server over the
> > internet to the virtual machine
> >
> > Expected behavior: File transfers fine, md5sum agrees with remote system
> > Observed behavior: md5sum never matches; done enough times, the ext3 fs
> > becomes corrupted
>
> Can I just confirm a few things please:
>
> The VM disk backend is an LVM volume which is included in the full disk
> encryption? I suppose it is using dm-crypt?
>

Correct on both counts.

>
> The ext3 fs which becomes corrupted is the guest VM filesystem, not the
> dom0 filesystem, nor a filesystem which is what the large multigigabyte
> file transferred over the network consists of?
>

Correct again.


> On the face of it it sounds to me like the network corruption (md5sum
> issue) and the eventual ext3 corruption must be separate issues. Or I
> suppose it is possible that the file is received correctly but is
> corrupted when written to the disk, but it's probably better to consider
> them separately until we know one way or the other.
>
> WRT the file transfer corruption: Is the file being transferred over the
> openswan link?

Yes. Dom0 is set up with the openswan connection, DomU is set up to use NAT
through Dom0 - file was transferred that way.

> Did you ever happen to try a transfer over a
> non-tunnelled connection?


Yes, tried file transfers from another machine on the local network - never
had a problem with those.

> Were you able to successfully transfer the
> file to the dom0 filesystem or to any other system (e.g. one not running
> Xen) on this end of the openswan link?


Yes - tried that several times, and was able to do the transfer with no
corruption, and md5sum matched.

> I'm not sure what error
> detection/correction scp/rsync do, or if they have any additional
> verification options which could be tried or perhaps it is possible to
> run md5sum on the stream before it hits the disk (can one rsync/scp to
> stdout? I doubt it).


Tried doing 'scp file.sql | md5sum' on DomU which resulted in a matching
md5sum. We decided this eliminated the openswan link as the culprit.
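
For anyone repeating this check, hashing the data as a stream before it touches the disk can be sketched roughly as below. This is only an illustration, not the exact command used above; the helper name md5_of_stream and the 1 MiB chunk size are assumptions.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    # iter() with a sentinel stops once read() returns b"" at end of stream.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Example with an in-memory stream; a real check would pass sys.stdin.buffer.
sample = io.BytesIO(b"example payload")
print(md5_of_stream(sample))
```

Reading in chunks keeps memory use flat for a multigigabyte transfer, and nothing is written to the guest disk along the way.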


> If you can transfer to dom0 OK then it might be
> interesting to try turning off the various offloads (GSO, SG etc) on the
> vif link.
>

Any instructions on doing that?
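
For reference, offloads are normally toggled with 'ethtool -K' in dom0. The sketch below only builds the command lines rather than running them; the interface name vif1.0, the feature list and the helper name offload_commands are assumptions to adapt, not something confirmed in this thread.

```python
def offload_commands(interface, features=("gso", "tso", "sg", "tx"), state="off"):
    """Build one 'ethtool -K' command per offload feature for the interface."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    return [["ethtool", "-K", interface, feature, state] for feature in features]


# vif1.0 is a placeholder; the real name depends on the domain ID and device.
for command in offload_commands("vif1.0"):
    print(" ".join(command))
```

Running 'ethtool -k vif1.0' (lowercase k) lists the current settings, which makes it easy to confirm that a change took effect. The commands need root.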

> WRT the filesystem corruption: How did the ext3 corruption manifest
> itself?


Initially with errors on the console (and in kernel.log and other places)
about writes beyond the end of the logical volume. After a time, the
filesystem would be set to read-only, and refuse to mount in read/write
mode.
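
One thing such messages can indicate is a filesystem that believes it is larger than the volume underneath it. A rough sketch of that comparison follows, assuming the block count and block size are taken from 'tune2fs -l' in the guest and the device size from 'blockdev --getsize64'. The helper name and the numbers in the example are made up.

```python
def filesystem_overrun_bytes(block_count, block_size, device_bytes):
    """Return how many bytes the filesystem extends past the device (0 if it fits)."""
    filesystem_bytes = block_count * block_size
    return max(0, filesystem_bytes - device_bytes)


# Made-up example: 5242880 blocks of 4096 bytes on a 20 GiB volume.
GIB = 1024 ** 3
print(filesystem_overrun_bytes(block_count=5242880, block_size=4096, device_bytes=20 * GIB))
```

A non-zero result would point at a size mismatch rather than at the data path.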

> I wonder if the layering of crypto+lvm+xen-blkback is causing
> the barriers which ext3 requires to function correctly to not occur in
> the right places. Does something need to be manually configured to
> enable barriers at some layer? (or perhaps I am thinking of DISCARD
> support). If you were able to attempt to reproduce without the crypto
> bit in dom0 for the VM disk that would be really useful. It might also
> be interesting to try using the ext3 barrier mount option in the guest
> to switch barriers either off or on (I can't remember what the default
> was for Squeeze).
>

Google led me to try mounting the file system with barrier=0, but no luck.
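
To confirm which barrier setting actually took effect in the guest, the mount options can be read back from /proc/mounts. A small sketch, assuming the usual whitespace-separated /proc/mounts layout; the sample line is invented, the helper name is hypothetical, and whether ext3 lists the option at all depends on the kernel.

```python
def barrier_setting(mounts_text, mount_point):
    """Return the value of the 'barrier' option for mount_point, or None if not listed."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            for option in fields[3].split(","):
                if option.startswith("barrier="):
                    return option.split("=", 1)[1]
    return None


# Invented sample line; on a real guest, read the text from /proc/mounts instead.
sample = "/dev/xvda2 / ext3 rw,relatime,errors=remount-ro,barrier=0,data=ordered 0 0"
print(barrier_setting(sample, "/"))
```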


> I appreciate that you may have redeployed/downgraded the systems so some
> of the above experiments might be quite hard to try out but if you could
> setup a spare system or something it would be very much appreciated.
>

We planned for this, and once we have some ideas to try (with some detailed
instructions for trying them) we'll be purchasing a spare hard drive to try
them out. We'd like this problem solved, and we're willing to spend a
little to do it.

>
> Ian.
>
>
