Hello all, for my master's thesis I'm analyzing different storage policies in OpenStack Swift. I'm mainly interested in the reconstruction speed of the different EC implementations.
I've noticed in my tests that there is no reconstruction of fragments/parity to other nodes/disks when a disk fails. My test setup consists of 8 nodes with 4 disks each, the OS is Ubuntu 16.04 LTS, and the Swift version is 2.15.1 (Pike). Here are my two example policies:

---
[storage-policy:2]
name = liberasurecode-rs-vand-4-2
policy_type = erasure_coding
ec_type = liberasurecode_rs_vand
ec_num_data_fragments = 4
ec_num_parity_fragments = 2
ec_object_segment_size = 1048576

[storage-policy:3]
name = liberasurecode-rs-vand-3-1
policy_type = erasure_coding
ec_type = liberasurecode_rs_vand
ec_num_data_fragments = 3
ec_num_parity_fragments = 1
ec_object_segment_size = 1048576
---

So far I have only tested the ec_type liberasurecode_rs_vand; with the other implementations Swift fails to start, but I think that is a separate topic. To simulate a disk failure I'm using fault injection [1] (exact commands are in the P.S. below).

Test run example:
1. fill with objects (32,768 objects of 1 MB each, 32 GB in total)
2. make a disk "fail"
3. disk failure is detected, /but no reconstruction happens/
4. replace the "failed" disk, mount the "new" empty disk
5. the missing fragments/parity are reconstructed on the new, empty disk

Expected:
1. fill with objects (32,768 objects of 1 MB each, 32 GB in total)
2. make a disk "fail"
3. disk failure is detected, reconstruction to the remaining disks/nodes
4. replace the "failed" disk, mount the "new" empty disk
5. data in the ring is rearranged back to the pre-failure state

Shouldn't the missing fragments/parity be reconstructed on the remaining disks/nodes (see point 3 in the test run example)?

[1] https://www.kernel.org/doc/Documentation/fault-injection/fault-injection.txt

Cheers,
Hannes Fuchs
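P.S. A few details in case someone wants to reproduce the test run. For step 1 I fill the cluster with a simple loop around the python-swiftclient CLI, roughly like this (container and object names are arbitrary choices of mine):

---
# create a 1 MB test object
dd if=/dev/urandom of=/tmp/obj.bin bs=1M count=1

# upload it 32,768 times under different names
# (auth environment variables for the swift client are set beforehand)
for i in $(seq 1 32768); do
    swift upload testcontainer /tmp/obj.bin --object-name "obj-$i"
done
---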
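For step 2 ("failing" a disk) I use the fail_make_request knobs described in [1]; sdX stands for the device backing the disk I want to fail:

---
# mark the block device so that fault injection applies to it
echo 1 > /sys/block/sdX/make-it-fail

# fail 100% of I/O requests, with no limit on the number of failures
echo 100 > /sys/kernel/debug/fail_make_request/probability
echo -1 > /sys/kernel/debug/fail_make_request/times
---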
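And to check steps 3-5 I don't wait for the reconstructor's next cycle, but trigger a single pass by hand on the storage nodes and watch the logs (the log location is the Ubuntu default; adjust if you log elsewhere):

---
# run one reconstruction pass
swift-init object-reconstructor once

# look for reconstructor activity
grep object-reconstructor /var/log/syslog | tail
---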