Hi Sam,

Your mail prompted some thoughts from me. EC is more useful for a
centralized storage solution than for a distributed one: it puts a heavy
load on internal network traffic.
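For example, a back-of-the-envelope sketch of my own (assuming a
Reed-Solomon-style k:m code, where rebuilding one lost block requires
reading k surviving blocks; the names and parameters here are mine, not
the blueprint's):

    # Compare storage overhead and repair traffic: 3-replication vs. k:m EC.
    # Toy model only; the k-reads-per-repair assumption is typical of
    # Reed-Solomon codes but not of every erasure code.
    def storage_overhead(k, m):
        """Bytes stored per byte of user data."""
        return (k + m) / float(k)

    def repair_reads(k):
        """Bytes read over the network per byte of lost data."""
        return k

    print("3-replication: %.1fx storage, 1x repair reads" % 3.0)
    for k, m in [(10, 2), (6, 3)]:
        print("%d:%d EC: %.1fx storage, %dx repair reads"
              % (k, m, storage_overhead(k, m), repair_reads(k)))

So a 10:2 code cuts storage from 3x to 1.2x, but every repair moves
roughly ten times the lost data across the internal network.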
Actually, from a network-performance point of view, the three copies are
themselves something of a compromise. There should be a way to combine
replication and EC and tune the parameters for different scenarios; the
different settings would, however, bring different reliability and
performance.

Regards,
Howard

On Thu, Oct 18, 2012 at 7:30 AM, Eugene Kirpichov <ekirpic...@gmail.com> wrote:
> Hi Sam,
>
> My five cents.
>
> Using Fountain codes, which are also a class of EC, one can make all the
> blocks equivalent in role (no separation into data and parity blocks):
> http://en.wikipedia.org/wiki/Fountain_code
>
> They resolve a few of the issues that you raised; however, they may
> raise others - e.g. it's more difficult to determine how many blocks you
> need to fetch to reconstruct the data.
>
> On Wed, Oct 17, 2012 at 4:24 PM, Samuel Merritt <s...@swiftstack.com> wrote:
> > On 10/15/12 5:36 PM, Duan, Jiangang wrote:
> >>
> >> Some of our customers are more interested in erasure coding than in
> >> tri-replication, to save disk space. We propose a BP "Lightweight
> >> erasure code framework for Swift", which can be found here:
> >> https://blueprints.launchpad.net/swift/+spec/swift-ec
> >> The general idea is to have a daemon on each storage node do an
> >> offline scan and select objects big enough to be worth erasure-coding.
> >>
> >> We will be glad to hear any feedback on this.
> >
> > Here, in no particular order, are some thoughts I have.
> >
> > - Object blocks (both data blocks and parity blocks) will need to be
> > marked somehow so that 3 replicas of each block aren't kept. This is a
> > pretty fundamental change to Swift; up until now, all objects have
> > been treated the same. It's essentially introducing the notion of
> > tiered storage into Swift.
> >
> > - Who's responsible for ensuring the presence of all the blocks? That
> > is, assume you have an object that's been split into ten data blocks
> > (D1, D2, ..., D10) and 2 parity blocks (P1, P2). The drive with D7 on
> > it dies. Which replicator(s) is (are) responsible for rebuilding D7
> > and storing it on a handoff node?
> >
> > If you have the replicators on each block's machine checking for
> > failures, then you'll wind up with more processes checking each piece
> > of data: here it would be 11 replicators ensuring that each block is
> > present, compared to the full-replication case, where there are 2
> > replicators checking on each replica. That's going to result in more
> > traffic on the internal network.
> >
> > - There will need to be throttles on the transformation daemons
> > (replica -> EC and vice versa), as that work is very IO-intensive. If
> > a big batch of data is uploaded at one time and then not accessed
> > (think large backups), it becomes a ticking time bomb for my cluster's
> > performance: once those objects go "cold", the transformation daemons
> > will thrash my disks and network turning them into EC-type objects.
> >
> > - Does this open up a Swift cluster to a DoS attack? If my objects
> > are stored with EC, can someone go through and request a few bytes
> > from each object in my cluster a few times and cause all my objects to
> > become "hot"? Under the proposed scheme, this would turn my objects
> > from EC-storage to replica-storage, filling up my disks and killing my
> > cluster. To mitigate that, I'd have to keep enough disk around to hold
> > 3 replicas of everything, and at that point I may as well just keep
> > the 3 replicas.
> >
> > - Another thought for a resource-consumption attack: can someone
> > slowly walk my objects and make a large fraction (say, 5%) of them
> > hot each day? That seems like it would keep the transformation
> > daemons running at maximum capacity all the time, trying to keep up.
> >
> > - Retrieval of EC-stored objects becomes more failure-prone. With
> > replica-stored objects, 1 out of 3 object servers has to be available
> > for a GET request to work. With EC-stored objects and a 10:2 coding,
> > 10 out of 12 object servers have to be available. That makes network
> > partitions much worse for data availability.
> >
> > - EC-storage is at odds with geographic replication. Of course, Swift
> > supports neither one today. However, with geographic replication one
> > wants a local replica of each object in each geographic region, which
> > means more copies for lower latency; with EC-storage, less data is
> > stored. Combine the two and the result is a whole lot of traffic
> > across slow, expensive WAN links.
> >
> > - Recombining EC-stored object chunks is going to chew up a lot more
> > CPU on either the object servers or the proxy servers, depending on
> > which one does it. If the proxy, it'll add to an already CPU-heavy
> > workload. If the object server, it'll make big storage boxes (like
> > the 48-drives-in-4U servers one can buy) less practical.
> >
> > - Can one change the EC coding level? That is, if I'm using 10:2
> > coding (so each object turns into 10 data blocks and 2 parity
> > blocks), can I change that later? Will that have massive performance
> > impacts on my cluster as the data blocks are recomputed?
> >
> > It may be that this is like changing the replica count, and the
> > answer is "yes, but your cluster will thrash for a long time after
> > you do it".
> >
> > - Where is the original checksum stored? Clearly, each block will
> > have its own checksum for the auditors to use. However, the response
> > to a client request like "HEAD /a/c/o" contains the checksum of the
> > original object. Does that live somewhere, or will the proxy have to
> > read all the bytes and compute the checksum?
> >
> > - I wonder what effect this will have on internal-network traffic.
> > With a replica-stored object, the proxy opens one connection to an
> > object server, sends a request, gets a response, and streams the
> > bytes out to the client.
> >
> > With an EC-stored object, the proxy has to open connections to, say,
> > 10 different object servers. Further, if one of the data blocks is
> > unavailable (say, data block 5), the proxy has to re-request all the
> > data blocks plus a parity block so that it can fill in the gaps. That
> > may be a significant increase in traffic on Swift's internal network.
> > And by using such a large number of connections, it considerably
> > increases the probability of a connection failure, which would mean
> > more client requests failing with truncated downloads.
> >
> > Those are all the thoughts I have right now that are coherent enough
> > to put into text. Clearly, adding erasure coding (or any other form
> > of tiered storage) to Swift is not something to be undertaken lightly.
> >
> > Hope this helps.
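To make Sam's D7 rebuild question concrete, here is a toy sketch of my
own of block recovery. It uses a single XOR parity block over four data
blocks; a real 10:2 scheme would need a Reed-Solomon-style code, but the
recovery pattern is the same: fetch the survivors, recompute the missing
block, write it to a handoff node.

    # Toy single-parity recovery (Python 3). Not the blueprint's scheme,
    # just the principle behind rebuilding a lost data block.
    def xor_blocks(blocks):
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # D1..D4
    parity = xor_blocks(data)                    # P1 = D1^D2^D3^D4

    # The drive holding D3 dies; some replicator must notice, fetch the
    # survivors plus parity, and rebuild D3:
    survivors = [data[0], data[1], data[3], parity]
    assert xor_blocks(survivors) == data[2]

Even this toy version shows the traffic concern: rebuilding one lost
block meant reading four others.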
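And to put rough numbers on Sam's retrieval-availability point, under my
own assumptions (each object server is independently up with probability
p; replication needs any 1 of 3 servers, 10:2 EC needs any 10 of 12):

    # P(GET succeeds) under independent server availability p.
    # Needs Python 3.8+ for math.comb.
    from math import comb

    def at_least(n, k, p):
        """P(at least k of n independent servers are up)."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    for p in (0.99, 0.95, 0.90):
        replication = 1 - (1 - p)**3   # any 1 of 3 replicas reachable
        ec = at_least(12, 10, p)       # any 10 of 12 blocks reachable
        print("p=%.2f  3-replica: %.6f  10:2 EC: %.6f"
              % (p, replication, ec))

At p = 0.90 this gives roughly 0.999 for replication but only about 0.89
for 10:2 EC, which is Sam's network-partition worry in numbers.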
>
> --
> Eugene Kirpichov
> http://www.linkedin.com/in/eugenekirpichov
> We're hiring! http://tinyurl.com/mirantis-openstack-engineer
_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp