The answer always starts with "it depends...". It depends on your hardware, 
where it's physically located, the durability you need, the access patterns, 
etc.

There have been whole PhD dissertations on the right way to calculate 
durability. Two parity segments aren't exactly equivalent to three replicas, 
because in the EC case you've also got to figure in the chance of failing to 
retrieve all of the necessary remaining segments to satisfy a read request[1].
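
To put a (very rough) number on that, here's a toy Python model. It assumes 
every drive fails independently with the same probability and ignores repair, 
rebalancing, and the read-path problem above; those are exactly the 
assumptions the dissertations are about relaxing, so treat the output as 
directional only.

    from math import comb

    def p_loss_replica(n, p):
        """Probability that all n replicas fail."""
        return p ** n

    def p_loss_ec(k, m, p):
        """Probability that a k+m stripe loses more than m fragments,
        i.e. fewer than k survive and the data is unrecoverable."""
        n = k + m
        return sum(comb(n, j) * p**j * (1 - p)**(n - j)
                   for j in range(m + 1, n + 1))

    p = 0.01  # toy per-drive failure probability over some fixed period
    print(p_loss_replica(3, p))  # 3 replicas: 1e-06
    print(p_loss_ec(10, 2, p))   # 10+2: ~2.1e-04, worse than 3 replicas
    print(p_loss_ec(10, 4, p))   # 10+4: ~1.9e-07, better than 3 replicas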

In your case, using 3 or 4 parity fragments will probably get you better 
durability and availability than a 3x replica system and still use less 
overall drive space[2]. My company's product has three "canned" EC policy 
settings to make it simpler for customers to choose. We've got 4+3, 8+4, and 
15+4 settings, and we steer people to one of them based on how many servers 
are in their cluster.

Note that there's nothing special about the m=4 examples in Swift's docs, at 
least in the sense of recommending 4 parity as better than 3 or 5 (or any other 
number).
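
For reference, the parity count is just a number you pick when declaring the 
policy in swift.conf, along the lines of the following (the section index, 
name, and ec_type here are illustrative; ec_type has to be something your 
liberasurecode build actually supports):

    [storage-policy:2]
    name = ec104
    policy_type = erasure_coding
    ec_type = liberasurecode_rs_vand
    ec_num_data_fragments = 10
    ec_num_parity_fragments = 4
    ec_object_segment_size = 1048576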

In your case, you'll want to take into account how many drives you can lose 
and how many servers you can lose. Suppose you have a 10+4 scheme and two 
servers with 12 drives each. You'll be able to lose 4 drives, yes, but if 
either server goes down, you won't be able to access your data, because each 
server will hold 7 of the 14 fragments (on 7 of its disks). However, if you 
had 6 servers with 4 drives each, for the same total of 24 drives, you could 
still lose four drives, and you could potentially lose two whole servers and 
still be able to read your data[3].
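
If you want to play with that arithmetic, here's a small Python sketch. It 
assumes fragments are spread as evenly as possible across servers and asks 
for the worst case, i.e. the fullest servers fail first; as footnote [3] 
notes, a luckier failure pattern can do better.

    def max_server_losses(k, m, num_servers):
        """Worst-case number of whole servers you can lose and still
        read, with fragments spread as evenly as possible."""
        n = k + m
        per_server = sorted(
            (n // num_servers + (1 if i < n % num_servers else 0)
             for i in range(num_servers)),
            reverse=True)
        for down in range(1, num_servers + 1):
            if n - sum(per_server[:down]) < k:  # fullest servers die first
                return down - 1
        return num_servers

    print(max_server_losses(10, 4, 2))  # 0: either server holds 7 fragments
    print(max_server_losses(10, 4, 6))  # 1: two servers hold 3 fragments, so
                                        #    losing both of those leaves only 8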

Another consideration is how much space overhead you want to have. Increasing 
the data segments lowers the overhead, while increasing the parity segments 
improves your durability and availability (up to the limits of your physical 
hardware failure domains).
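
The space math is just (k+m)/k bytes of raw disk per byte stored:

    for k, m in [(4, 3), (8, 4), (15, 4), (10, 4)]:
        print(f"{k}+{m}: {(k + m) / k:.2f}x")
    # 4+3: 1.75x, 8+4: 1.50x, 15+4: 1.27x, 10+4: 1.40x
    # versus 3.00x for a 3-replica policy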

Finally, and probably most simply, you'll want to take into account the 
increased CPU and network cost of a particular EC scheme. A 3x replica write 
needs 3 network connections, and a read needs 1. For an EC policy, a write 
needs k+m connections, and a read needs k. If you're using something really 
large like an 18+3 scheme, that's 21 connections per write, a 7x increase 
over a 3x replica policy (and a read needs 18 connections instead of 1). The 
increased socket management and packet shuffling can add significant burden 
to your proxy servers[4]. Good news on the CPU, though: the EC algorithms are 
old and well tuned, especially when using libraries like jerasure or isa-l, 
and CPUs are really fast. Erasure code policies do not add significant 
overhead from the encode/decode steps.
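
Swift drives the encode/decode through PyECLib, and you can measure the cost 
for yourself. A minimal sketch (again, the ec_type string has to be one your 
liberasurecode build supports; liberasurecode_rs_vand is a common default):

    from pyeclib.ec_iface import ECDriver

    driver = ECDriver(k=10, m=4, ec_type='liberasurecode_rs_vand')

    data = b'x' * 1048576            # one segment's worth of object data
    fragments = driver.encode(data)  # returns k + m = 14 fragments
    assert len(fragments) == 14

    # Any k fragments are enough to reassemble the original data.
    recovered = driver.decode(fragments[4:])  # pretend we lost four
    assert recovered == data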

So, in summary: it's complicated, there isn't a "right" answer, and it 
depends a lot on everything else about your cluster. But you've got this! 
You'll do great, and keep asking questions.

I hope all this helps.

--John



[1] At a high level, it's fairly intuitive that a 2+2 scheme is very different 
than a 10+2 scheme, even though they both have 2 parity segments and can 
survive the loss of any two segments.
[2] "probably", because it depends a lot on your specific situation.
[3] The fragments are distributed across the servers, so 14 fragments across 
6 servers means that four servers have 2 fragments and two servers have 3. If 
you're "lucky", the two failed servers would each have only 2 fragments, and 
you'd still be able to read your data.
[4] Similarly, the EC reconstructor process needs to do much more work than 
the replicator when it discovers a missing fragment: it has to fetch k other 
fragments and re-encode to rebuild the one that's lost, rather than just 
copying a whole replica.






On 4 Apr 2018, at 2:12, Mark Kirkwood wrote:

> ...hearing crickets - come on guys, I know you have some thoughts about this 
> :-) !
>
>
> On 29/03/18 13:08, Mark Kirkwood wrote:
>> Hi,
>>
>> We are looking at implementing EC Policies with similar durability to 3x
>> replication. Now naively this corresponds to m=2 (using notation from
>> previous thread). However we could take the opportunity to 'do better'
>> and use m=3 or 4. I note that m=4 seems to be used in some of the Swift
>> documentation. I'd love to get some guidance about how to decide on the
>> 'right amount' of parity!
>>
>> Cheers
>>
>> Mark
