On Wed, Jul 22, 2015 at 12:24 PM, Changbin Liu <changbin....@gmail.com> wrote:
> But now I wonder: is it "by design" that EC does not handle an accidental
> deletion of just the data file?

Well, the design goal was not "do not handle the accidental deletion of just the data file" - it was "make replication fast enough that it works" - and that required not listing all the dirs all the time.

> Deleting both data file and hashes.pkl file is more like a
> deliberately-created failure case instead of a normal one.

To me, deleting a file that Swift wrote to disk without updating (or removing) the index it normally updates during write/delete/replicate to accelerate replication seems like a deliberately created failure case?

You could try to flip a bit or truncate a data file and let the auditor pick it up. Or rm a suffix and wait for the every-so-often suffixdir listdir to catch it, or remove an entire partition, or wipe a new filesystem onto the disk. Or shut down a node and do a PUT, then shut down the handoff node, and run the reconstructor. Any of the "normal" failure conditions like that (and plenty more!) are detected and handled efficiently.

> To me Swift EC repairing seems different from the triple-replication mode,
> where you delete any data file copy, it will be restored.

Well, replication and reconstruction are different in lots of ways - but not this part. If you rm a .data file without updating the index, you'll need some activity (post/copy/put/quarantine) in the suffix before the replication engine can notice. Luckily (?) people don't often go under the covers into the middle of the storage system and rm data like that?

-Clay
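To make the point about the index concrete, here is a minimal toy model, assuming a simplified picture of the per-partition index: one cached hash per suffix dir (loosely like hashes.pkl), invalidated only by the normal write path, plus an occasional listdir of the partition. The class `SuffixHashIndex`, its methods, and `demo()` are hypothetical illustrations, not Swift's actual code or API. The sketch shows that an rm of a .data file that bypasses the write path stays invisible until some normal activity touches the suffix, while removing a whole suffix dir is caught by the periodic listdir.

```python
import hashlib
import os
import shutil
import tempfile


class SuffixHashIndex:
    """Hypothetical toy model of a per-partition replication index.

    Caches one hash per suffix directory and only re-hashes suffixes that
    the normal write path marked invalid. Not Swift's implementation.
    """

    def __init__(self, partition_dir):
        self.partition_dir = partition_dir
        self.hashes = {}  # suffix -> cached hash, or None meaning "needs rehash"

    def _hash_suffix(self, suffix):
        # Hash the sorted relative paths of every file under the suffix dir.
        suffix_path = os.path.join(self.partition_dir, suffix)
        names = []
        for root, _dirs, files in os.walk(suffix_path):
            rel = os.path.relpath(root, suffix_path)
            names.extend(os.path.join(rel, name) for name in files)
        md5 = hashlib.md5()
        for name in sorted(names):
            md5.update(name.encode("utf-8"))
        return md5.hexdigest()

    def write(self, suffix, hash_dir, filename):
        # Normal write path: create the file AND invalidate its suffix entry.
        path = os.path.join(self.partition_dir, suffix, hash_dir)
        os.makedirs(path, exist_ok=True)
        open(os.path.join(path, filename), "wb").close()
        self.hashes[suffix] = None

    def suffix_hashes(self, do_listdir=False):
        if do_listdir:
            # Occasional full listing: notices suffix dirs added or removed,
            # but not files changed inside a suffix dir that still exists.
            on_disk = set(os.listdir(self.partition_dir))
            for suffix in on_disk - set(self.hashes):
                self.hashes[suffix] = None
            for suffix in set(self.hashes) - on_disk:
                del self.hashes[suffix]
        for suffix, cached in self.hashes.items():
            if cached is None:  # only invalidated suffixes are re-hashed
                self.hashes[suffix] = self._hash_suffix(suffix)
        return dict(self.hashes)


def demo():
    with tempfile.TemporaryDirectory() as part:
        index = SuffixHashIndex(part)
        index.write("abc", "0123abc", "1437590000.00000.data")
        before = index.suffix_hashes()

        # Go "under the covers": rm the .data file without touching the index.
        os.remove(os.path.join(part, "abc", "0123abc", "1437590000.00000.data"))
        unnoticed = index.suffix_hashes() == before
        unnoticed_after_listdir = index.suffix_hashes(do_listdir=True) == before

        # Normal activity in the suffix (e.g. a POST writing a .meta file)
        # invalidates the entry, so the next pass sees the suffix changed.
        index.write("abc", "0123abc", "1437590001.00000.meta")
        noticed_after_post = index.suffix_hashes() != before

        # Removing a whole suffix dir is caught by the periodic listdir.
        shutil.rmtree(os.path.join(part, "abc"))
        suffix_gone = "abc" not in index.suffix_hashes(do_listdir=True)

        return unnoticed, unnoticed_after_listdir, noticed_after_post, suffix_gone


print(demo())  # (True, True, True, True)
```

The trade-off in this toy model mirrors the one in the reply: trusting cached suffix hashes avoids walking every directory on every pass, at the cost of not noticing changes made behind the index's back.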
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev