Replies inline below.

On Mon, Apr 15, 2013 at 4:54 AM, Ian Campbell <i...@hellion.org.uk> wrote:
> On Wed, 2013-04-10 at 08:17 -0400, Anthony Sheetz wrote:
> > Steps to reproduce:
> > Install Debian Testing from Netinstall CD, amd64.
> > Choose LVM and Full Disk Encryption, with a separate /home
> > Resize /home to be 80GB
> > Install openswan, connect to remote network
> > Install xen
> > Set up a virtual machine with Debian Stable using logical volumes as the
> > backing store.
> > fs: ext3
> > network: NAT
> > transfer a large (multigigabyte) file from a remote server over the
> > internet to the virtual machine
> >
> > Expected behavior: File transfers fine, md5sum agrees with remote system
> > Observed behavior: md5sum never matches; done enough times, the ext3 fs
> > becomes corrupted
>
> Can I just confirm a few things please:
>
> The VM disk backend is an LVM volume which is included in the full disk
> encryption? I suppose it is using dm-crypt?

Correct on both counts.

> The ext3 fs which becomes corrupted is the guest VM filesystem, not the
> dom0 filesystem nor a filesystem contained in the large multigigabyte
> file which is transferred over the network?

Correct again.

> On the face of it, it sounds to me like the network corruption (md5sum
> issue) and the eventual ext3 corruption must be separate issues. Or I
> suppose it is possible that the file is received correctly but is
> corrupted when written to the disk, but it's probably better to consider
> them separately until we know one way or the other.
>
> WRT the file transfer corruption: Is the file being transferred over the
> openswan link?

Yes. Dom0 is set up with the openswan connection, and DomU is set up to use NAT through Dom0; the file was transferred that way.

> Did you ever happen to try a transfer over a
> non-tunnelled connection?

Yes, we tried file transfers from another machine on the local network and never had a problem with those.

> Were you able to successfully transfer the
> file to the dom0 filesystem or to any other system (e.g. one not running
> Xen) on this end of the openswan link?

Yes, we tried that several times and were able to do the transfer with no corruption; the md5sum matched.

> I'm not sure what error
> detection/correction scp/rsync do, or if they have any additional
> verification options which could be tried, or perhaps it is possible to
> run md5sum on the stream before it hits the disk (can one rsync/scp to
> stdout? I doubt it).

We tried doing 'scp file.sql | md5sum' on DomU, which resulted in a matching md5sum. We decided this eliminated the openswan link as the culprit.

> If you can transfer to dom0 OK then it might be
> interesting to try turning off the various offloads (GSO, SG, etc.) on the
> vif link.

Any instructions on doing that?

> WRT the filesystem corruption: How did the ext3 corruption manifest
> itself?

Initially with errors on the console (and in kernel.log and other places) about writes beyond the end of the logical volume. After a time, the filesystem would be set to read-only and would refuse to mount in read/write mode.

> I wonder if the layering of crypto+lvm+xen-blkback is causing
> the barriers which ext3 requires to function correctly to not occur in
> the right places. Does something need to be manually configured to
> enable barriers at some layer? (or perhaps I am thinking of DISCARD
> support). If you were able to attempt to reproduce without the crypto
> bit in dom0 for the VM disk that would be really useful. It might also
> be interesting to try using the ext3 barrier mount option in the guest
> to switch barriers either off or on (I can't remember what the default
> was for Squeeze).

Google led me to try mounting the file system with barrier=0, with no luck.

> I appreciate that you may have redeployed/downgraded the systems so some
> of the above experiments might be quite hard to try out, but if you could
> set up a spare system or something it would be very much appreciated.
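Ian's point that the file might be received correctly but corrupted on its way to disk can be separated from network corruption by hashing the bytes as they arrive and then hashing the written file again. The following is a minimal sketch, assuming the data is piped in on stdin (for example from an ssh/cat pipeline); the script and function names are hypothetical, not part of any existing tool.

```python
import hashlib
import os
import sys
from typing import BinaryIO

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time


def copy_and_hash(src: BinaryIO, dest_path: str) -> str:
    """Copy src to dest_path and return the MD5 of the bytes as received."""
    digest = hashlib.md5()
    with open(dest_path, "wb") as dest:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            digest.update(chunk)  # hashed before the data is written out
            dest.write(chunk)
        dest.flush()
        os.fsync(dest.fileno())  # ask the kernel to flush to the device
    return digest.hexdigest()


def hash_file(path: str) -> str:
    """Re-read a file and return its MD5."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def main(dest_path: str) -> int:
    received = copy_and_hash(sys.stdin.buffer, dest_path)
    on_disk = hash_file(dest_path)
    print(f"received: {received}")
    print(f"on disk:  {on_disk}")
    return 0 if received == on_disk else 1


# Only act when invoked as a script with exactly one destination argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Usage would be something like `ssh user@remote cat file.sql | python3 hash_copy.py /srv/file.sql` (host, user and paths are placeholders), comparing the `received` hash with md5sum on the remote side. The immediate re-read is probably served from the page cache, so to test what actually landed on the LVM/dm-crypt stack it is more meaningful to run `sync; echo 3 > /proc/sys/vm/drop_caches` as root (or remount the filesystem) and hash the file again.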

We planned for this, and once we have some ideas to try (with some detailed instructions for trying them), we'll be purchasing a spare hard drive to try them out. We'd like this problem solved, and we're willing to spend a little to do it.

> Ian.
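For the offload experiment Ian suggests, the usual tool is `ethtool -K`, run in dom0 against the guest's backend interface. The Xen toolstack names these `vif<domid>.<devid>`, so `vif1.0` below is only a placeholder, and the choice of features (SG and GSO from the thread, plus TSO) is an assumption rather than a known fix. This sketch only builds and optionally runs the command.

```python
import shlex
import subprocess
from typing import Iterable, List

# SG and GSO are named in the thread; TSO is added as an assumption.
DEFAULT_FEATURES = ("sg", "tso", "gso")


def offload_off_command(iface: str,
                        features: Iterable[str] = DEFAULT_FEATURES) -> List[str]:
    """Build an `ethtool -K <iface> <feature> off ...` argument list."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def disable_offloads(iface: str, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it (needs root in dom0)."""
    cmd = offload_off_command(iface)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    disable_offloads("vif1.0")  # hypothetical interface name
```

The default dry run just prints `ethtool -K vif1.0 sg off tso off gso off`. Running `ethtool -k vif1.0` (lowercase `k`) before and after shows the current offload settings, so the change can be confirmed and later reverted with `on`. The same could be tried on the guest's own interface if the dom0 side alone makes no difference.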