You have been subscribed to a public bug:

We observed severe data loss/filesystem corruption when executing fstrim
on a filesystem hosted on an Eternus DX600 S3 system.

There is multipathing via a Fibre Channel fabric, but the issue could
also be reproduced with multipathing disabled, using one of the block
devices directly.

It could not be reproduced when creating a multipathing device via
dmsetup with four paths pointing to four loop devices mapping the same
file.

The observed behavior is that XFS cannot read vital filesystem metadata
as the underlying storage device returns blocks of 0x00. The blocks are
discarded via UNMAP commands and since thin provisioning is used, the
SAN deallocates them and returns 0x00 on subsequent reads. Invoking find
yields error messages like "find: ./dir_16: Structure needs cleaning".
In other tests, where more data had been written, files were accessible
but their checksums no longer matched.
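
To illustrate the symptom described above, here is a minimal sketch that
scans a file or block device for runs of blocks containing only 0x00.
It is not part of the attached reproduction script; the function name
`find_zero_runs` and the 4096-byte block size are assumptions made for
illustration. Reading a block device normally requires root, and reads
may be served from the page cache unless the device is re-read after an
unmount.

```python
def find_zero_runs(path, block_size=4096):
    """Return a list of (offset, length) runs of all-zero blocks in path.

    Hypothetical helper: block_size is an assumption, not a value taken
    from the bug report.
    """
    runs = []
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # A block counts as "zeroed" if every byte in it is 0x00.
            if not any(block):
                if runs and runs[-1][0] + runs[-1][1] == offset:
                    # Adjacent to the previous zero run: extend it.
                    start, length = runs[-1]
                    runs[-1] = (start, length + len(block))
                else:
                    runs.append((offset, len(block)))
            offset += len(block)
    return runs
```

Zero runs are not proof of corruption by themselves, since unwritten
regions of a fresh filesystem are also zero; the output is only useful
when compared against a scan taken before the discard.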

As a consequence, the XFS filesystem is in an unusable state and has to
be recreated from scratch, which amounts to complete data loss.
Repairing the filesystem proved not to be worth the effort, as backups
were available and trust in the data had already been compromised.

The problem was discovered after installing a new storage server with
Ubuntu 16.04, intending to replace the current machine running 14.04.
Every weekend, the test volumes were corrupted. Investigation pointed
towards Sunday, 06:47, which is the time `cron.weekly` is run. The job
file `/etc/cron.weekly/fstrim` seemed most likely, so `fstrim -a` was
run manually after `mkfs.xfs` and the filesystem became damaged. The
damage only became apparent after a `umount`/`mount` cycle, when all
buffers were flushed and data was re-read from the device.
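
The checksum comparison around the remount can be sketched as follows.
This is not the attached script, only a minimal illustration under
stated assumptions: the helper names `md5_manifest` and `changed_files`
are hypothetical, and the caller is expected to run `fstrim` and the
`umount`/`mount` cycle between the two manifest calls.

```python
import hashlib
import os


def md5_manifest(root):
    """Map each file's path (relative to root) to its MD5 hex digest."""
    manifest = {}
    # os.walk skips directories it cannot list, so unreadable
    # directories simply yield no entries in the manifest.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(full, "rb") as f:
                # Read in 1 MiB chunks to keep memory use bounded.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def changed_files(before, after):
    """Return paths from `before` that are missing or differ in `after`."""
    return sorted(p for p in before if after.get(p) != before[p])
```

A manifest taken right after writing the test data serves as `before`;
one taken after the remount serves as `after`. Files in directories that
can no longer be listed show up as missing and are therefore reported as
changed.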

We could now use config management to install a cron job that (every
minute!) checks for /sbin/fstrim and renames it if present. This would
be extremely unsatisfactory, as it is a brittle workaround. So for now,
we are stuck on Ubuntu 14.04. Since util-linux is one of the most
central packages, there is no way to avoid having fstrim or the cron job
on an Ubuntu system.

I have attached a script used to reproduce the bug reliably on our
system and its log output, as well as excerpts from syslog and md5sum.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
fstrim destroying XFS on SAN
https://bugs.launchpad.net/bugs/1686687
You received this bug notification because you are a member of Kernel Packages, 
which is subscribed to linux in Ubuntu.

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp