Re: [openstack-dev] [Openstack] [Swift] Erasure coding reconstructor doesn't work

Luse, Paul E Wed, 22 Jul 2015 12:42:41 -0700

Correct, it’s by design.  Swift doesn’t expect people to delete things “under 
the covers”.  When the auditor finds a corrupted file, it’s the one that 
quarantines it and knows that it also needs to invalidate the hashes.pkl file.  
This mechanism is there to minimize extra ‘stuff’ going on both at the node and 
on the cluster when it comes to making sure there is durability in the system.


Wrt why the replication code seems to work if you delete just a .data (again, 
you shouldn’t do this as files don’t just disappear, the intention is that the 
auditor is in charge here): it’s because of some code in the replicator that I 
didn’t ‘mimic’ in the reconstructor, and it doesn’t look like Clay did either 
when he worked on it.  Not really sure why it was there; it forces a listing 
every 10 passes for some reason.  Clay? (see do_listdir in update() in the 
replicator)
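
A toy sketch of the general pattern only: a pass counter where every tenth pass bypasses a cached directory listing. It is not Swift's code, and the thread does not say what the replicator's listing actually covers; CachedLister and demo are hypothetical names made up for illustration.

```python
import os
import tempfile


class CachedLister:
    """Hypothetical helper: trust a cached listing except on every Nth pass."""

    def __init__(self, path, every=10):
        self.path = path
        self.every = every
        self.passes = 0
        self.cached = None

    def listing(self):
        # Force a real listdir on the first pass and on every Nth pass after.
        do_listdir = self.cached is None or self.passes % self.every == 0
        if do_listdir:
            self.cached = sorted(os.listdir(self.path))
        self.passes += 1
        return self.cached


def demo():
    """Return the pass number on which a hand-deleted file is noticed."""
    with tempfile.TemporaryDirectory() as d:
        victim = os.path.join(d, "a.data")
        open(victim, "w").close()
        lister = CachedLister(d)
        lister.listing()          # pass 0: real listing, sees a.data
        os.remove(victim)         # deletion "under the covers"
        for _ in range(20):
            pass_number = lister.passes
            if lister.listing() == []:
                return pass_number
    return None
```

In this toy the deletion goes unnoticed on passes 1 through 9 and is picked up on pass 10, when the listing is forced again.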

Thx
Paul

From: Changbin Liu [mailto:changbin....@gmail.com]
Sent: Wednesday, July 22, 2015 12:24 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Openstack] [Swift] Erasure coding reconstructor 
doesn't work

Thanks, Paul and Clay.

By "deleted one data fragment" I meant I "rm" only the data file. I did not 
delete the hashes.pkl file in the outer directory.

I tried it again, this time deleting both the data file and the hashes.pkl 
file. The reconstructor was able to restore the data file correctly.

But now I wonder: is it "by design" that EC does not handle an accidental 
deletion of just the data file? Deleting both the data file and the hashes.pkl 
file is more like a deliberately created failure case than a normal one.  To me 
Swift EC repair seems different from the triple-replication mode, where if you 
delete any data file copy, it will be restored.



Thanks

Changbin

On Tue, Jul 21, 2015 at 5:28 PM, Luse, Paul E <paul.e.l...@intel.com> wrote:
I was about to ask that very same thing and, at the same time, if you can 
indicate if you’ve seen errors in any logs and if so please provide those as 
well.  I’m hoping you just didn’t delete the hashes.pkl file though ☺

-Paul

From: Clay Gerrard [mailto:clay.gerr...@gmail.com]
Sent: Tuesday, July 21, 2015 2:22 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Openstack] [Swift] Erasure coding reconstructor 
doesn't work

How did you "deleted one data fragment"?

Like replication, the EC consistency engine uses some sub directory hashing to 
accelerate replication requests in a consistent system - so if you just rm a 
file down in a hashdir somewhere you also need to delete the hashes.pkl up in 
the part dir (or call the invalidate_hash method like PUT, DELETE, POST, and 
quarantine do).
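
A minimal toy model of the caching behaviour described above, not Swift's actual implementation: per-suffix hashes are cached in a hashes.pkl in the part dir and only recomputed when an entry has been invalidated or the cache file is missing. The function names, the use of SHA-256 over file names, and the directory layout are all assumptions made for illustration.

```python
import hashlib
import os
import pickle
import tempfile

HASH_FILE = "hashes.pkl"


def hash_suffix(suffix_dir):
    """Toy hash: digest the file names found under one suffix directory."""
    digest = hashlib.sha256()
    for hashdir in sorted(os.listdir(suffix_dir)):
        for name in sorted(os.listdir(os.path.join(suffix_dir, hashdir))):
            digest.update(name.encode("utf-8"))
    return digest.hexdigest()


def load_hashes(part_dir):
    try:
        with open(os.path.join(part_dir, HASH_FILE), "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}


def save_hashes(part_dir, hashes):
    with open(os.path.join(part_dir, HASH_FILE), "wb") as f:
        pickle.dump(hashes, f)


def get_hashes(part_dir):
    """Return {suffix: hash}, rehashing only entries with no cached value."""
    hashes = load_hashes(part_dir)
    if not hashes:
        # No cache file: discover the suffix dirs from disk.
        hashes = {s: None for s in os.listdir(part_dir)
                  if os.path.isdir(os.path.join(part_dir, s))}
    for suffix, value in list(hashes.items()):
        if value is None:
            hashes[suffix] = hash_suffix(os.path.join(part_dir, suffix))
    save_hashes(part_dir, hashes)
    return hashes


def invalidate_hash(part_dir, suffix):
    """Mark one suffix as needing a rehash on the next get_hashes call."""
    hashes = load_hashes(part_dir)
    hashes[suffix] = None
    save_hashes(part_dir, hashes)


def demo():
    with tempfile.TemporaryDirectory() as part_dir:
        hashdir = os.path.join(part_dir, "abc", "d41d8cd9")
        os.makedirs(hashdir)
        data = os.path.join(hashdir, "1437500000.00000.data")
        open(data, "w").close()
        before = get_hashes(part_dir)["abc"]
        os.remove(data)                        # plain rm, cache untouched
        stale = get_hashes(part_dir)["abc"]    # cached value is still served
        invalidate_hash(part_dir, "abc")
        fresh = get_hashes(part_dir)["abc"]    # recomputed from disk
        return before, stale, fresh
```

In this model, after the plain rm the cached suffix hash is unchanged, so a comparison against the other nodes would still look consistent; only invalidating the entry (or removing the cache file) makes the difference visible.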

Every so often someone discusses the idea of having the auditor invalidate a 
hash after "long enough" or take some action on empty hashdirs (mind the 
races!) - but it's really only an issue when someone deletes something by hand, 
so we normally manage to get distracted with other things.

-Clay

On Tue, Jul 21, 2015 at 1:38 PM, Changbin Liu <changbin....@gmail.com> wrote:
Folks,

To test the latest feature of Swift erasure coding, I followed this document 
(http://docs.openstack.org/developer/swift/overview_erasure_code.html) to 
deploy a simple cluster. I used Swift 2.3.0.

I am glad that operations like object PUT/GET/DELETE worked fine. I can see 
that objects were correctly encoded/uploaded and downloaded at proxy and object 
servers.

However, I noticed that swift-object-reconstructor did not seem to work as 
expected. Here is my setup: my cluster has three object servers, and I use this 
policy:

[storage-policy:1]
policy_type = erasure_coding
name = jerasure-rs-vand-2-1
ec_type = jerasure_rs_vand
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_object_segment_size = 1048576
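
For reference, the arithmetic implied by this policy can be sketched as follows. This only parses the settings shown above with Python's standard configparser and applies the usual Reed-Solomon property that up to as many fragments as there are parity fragments can be lost; the summarize helper is a hypothetical name, and per-fragment metadata overhead is ignored.

```python
import configparser

POLICY = """
[storage-policy:1]
policy_type = erasure_coding
name = jerasure-rs-vand-2-1
ec_type = jerasure_rs_vand
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_object_segment_size = 1048576
"""


def summarize(text, section="storage-policy:1"):
    """Derive fragment counts and raw storage overhead from the policy."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    policy = parser[section]
    k = policy.getint("ec_num_data_fragments")
    m = policy.getint("ec_num_parity_fragments")
    return {
        "fragments_per_segment": k + m,   # one fragment per object server here
        "tolerated_missing": m,           # any k of the k + m fragments suffice
        "storage_overhead": (k + m) / k,  # raw bytes stored per byte uploaded
        "segment_mib": policy.getint("ec_object_segment_size") / 2**20,
    }
```

With 2 data and 1 parity fragments, each segment yields 3 fragments, any 2 of which are enough to rebuild the third, at 1.5x raw storage.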

After I uploaded one object, I verified that: there was one data fragment on 
each of two object servers, and one parity fragment on the third object server. 
However, when I deleted one data fragment, no matter how long I waited, it 
never got repaired, i.e., the deleted data fragment was never regenerated by 
the swift-object-reconstructor process.

My question: is swift-object-reconstructor supposed to be "NOT WORKING" given 
the current implementation status? Or, is there any configuration I missed in 
setting up swift-object-reconstructor?

Thanks

Changbin

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

