On 9/30/20 8:58 AM, Xavi Hernandez wrote:
> This is normal. A dispersed volume writes encoded fragments of each block in each brick.
> In this case it's a 2+1 configuration, so each block is divided into 2 fragments.
> A third fragment is generated for redundancy and stored on the third brick.
OK. But for a Distributed-Replicate 2 x 3 setup with 64K shards, a 4M file should be
split into (4096 / 64) * 3 = 192 shards, not 189. So why 189?
And if all bricks are considered equal and have enough free space,
the shard distribution {24, 24, 24, 39, 39, 39} looks suboptimal.
Why not {31, 32, 31, 32, 31, 32}? Isn't this a bug?
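
To make the arithmetic explicit, here is a minimal sketch. It only restates the numbers from this message and does not explain the difference. The helper name `expected_copies` is hypothetical, and I am assuming the first three counts belong to one replica set and the last three to the other.

```python
# Figures taken from the message: 4 MiB file, 64 KiB shards, replica count 3.
FILE_KIB = 4096
SHARD_KIB = 64
REPLICA = 3


def expected_copies(file_kib: int, shard_kib: int, replica: int) -> int:
    """Shard-sized pieces of the file, multiplied by the replica count."""
    pieces = -(-file_kib // shard_kib)  # ceiling division
    return pieces * replica


# Per-brick shard counts reported above (assumed order: set 0, then set 1).
observed = [24, 24, 24, 39, 39, 39]

expected = expected_copies(FILE_KIB, SHARD_KIB, REPLICA)
per_replica_set = observed[0] + observed[3]
missing_per_replica = (expected - sum(observed)) // REPLICA

print("expected copies:", expected)
print("observed copies:", sum(observed))
print("distinct shards seen per replica:", per_replica_set)
print("pieces unaccounted for per replica:", missing_per_replica)
```

So the observed total is exactly one shard-sized piece short per replica, which is the part I cannot account for.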
> This is not right. A disperse 2+1 configuration only supports a single failure.
> Wiping 2 fragments from the same file makes the file unrecoverable.
> Disperse works using the Reed-Solomon erasure code, which requires at least
> 2 healthy fragments to recover the data (in a 2+1 configuration).
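
To check my understanding of the "at least 2 healthy fragments" rule, here is a toy 2+1 sketch. Note that it uses plain XOR parity, not the Reed-Solomon code that disperse actually uses, and the function names are made up for illustration. It only shows the recovery property: any one lost fragment can be rebuilt, two cannot.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(block: bytes) -> list:
    """Split a block into 2 data fragments plus 1 XOR parity fragment."""
    if len(block) % 2:
        raise ValueError("toy encoder expects an even-length block")
    half = len(block) // 2
    d0, d1 = block[:half], block[half:]
    return [d0, d1, xor_bytes(d0, d1)]


def decode(fragments: list) -> bytes:
    """Rebuild the block; a lost fragment is passed as None."""
    lost = [i for i, frag in enumerate(fragments) if frag is None]
    if len(lost) > 1:
        raise ValueError("fewer than 2 healthy fragments: unrecoverable")
    d0, d1, parity = fragments
    if d0 is None:
        d0 = xor_bytes(d1, parity)
    elif d1 is None:
        d1 = xor_bytes(d0, parity)
    return d0 + d1


frags = encode(b"glusterfs!")
print(decode([None, frags[1], frags[2]]))  # one fragment lost: still readable
```

Passing two `None` entries to `decode` raises `ValueError`, which corresponds to the unrecoverable case described in the quoted text.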
It seems I missed the point that all bricks are considered equal,
regardless of the physical host they're attached to.
So, for a Distributed-Disperse 2 x (2 + 1) setup with 3 hosts, 2 bricks
each, and two files, A and B, it's possible to have
the following layout:
Host0:               Host1:               Host2:
|- Brick0: A0 B0     |- Brick0: A1        |- Brick0: A2
|- Brick1: B1        |- Brick1: B2        |- Brick1:
This setup can tolerate a single brick failure but not a single host failure,
because if Host0 is down, two fragments of B are lost
and B becomes unrecoverable (while A remains recoverable).
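
The same reasoning as a small check, assuming the layout is exactly as drawn above. This is only a model of my example, not of how Gluster places fragments. The function name is hypothetical, and the first letter of each fragment label is taken to be the file name.

```python
from collections import Counter

# Layout from the diagram above: host -> brick -> fragment labels.
layout = {
    "Host0": {"Brick0": ["A0", "B0"], "Brick1": ["B1"]},
    "Host1": {"Brick0": ["A1"], "Brick1": ["B2"]},
    "Host2": {"Brick0": ["A2"], "Brick1": []},
}


def files_lost_if_host_fails(layout: dict, host: str, redundancy: int = 1) -> list:
    """Files that lose more fragments than the redundancy allows."""
    lost = Counter(
        frag[0]  # first character of the label is the file name
        for frags in layout[host].values()
        for frag in frags
    )
    return sorted(name for name, count in lost.items() if count > redundancy)


for host in layout:
    print(host, "down ->", files_lost_if_host_fails(layout, host))
```

Only the Host0 case reports a lost file (B). Losing Host1 or Host2 costs each file at most one fragment.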
If this is so, is it possible (and how hard would it be) to enforce
'one fragment per *host*' behavior? If we can guarantee the following:
Host0:               Host1:               Host2:
|- Brick0: A0        |- Brick0: A1        |- Brick0: A2
|- Brick1: B1        |- Brick1: B2        |- Brick1: B0
this setup can tolerate both a single brick failure and a single host failure.
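
For illustration, the layout I have in mind could come from a rule like the one sketched below. The `place` function is entirely hypothetical and says nothing about Gluster's real placement logic. It assumes one brick per file on every host (brick index equals file index) and rotates the fragment numbers per file, which reproduces the diagram above.

```python
def place(files: list, hosts: list) -> dict:
    """Toy rule: each host gets exactly one fragment of each file."""
    n = len(hosts)
    layout = {host: {} for host in hosts}
    for k, name in enumerate(files):
        for h, host in enumerate(hosts):
            # Rotate fragment numbers by the file index k.
            layout[host][f"Brick{k}"] = [f"{name}{(h + k) % n}"]
    return layout


def one_fragment_per_host(layout: dict) -> bool:
    """True if no host holds two fragments of the same file."""
    for bricks in layout.values():
        names = [frag[0] for frags in bricks.values() for frag in frags]
        if len(names) != len(set(names)):
            return False
    return True


layout = place(["A", "B"], ["Host0", "Host1", "Host2"])
for host, bricks in layout.items():
    print(host, bricks)
print("one fragment per host:", one_fragment_per_host(layout))
```

With this property, a host failure removes at most one fragment of any file, which a 2+1 configuration can survive.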
Dmitry
________
Gluster-users mailing list
https://lists.gluster.org/mailman/listinfo/gluster-users