I had similar questions earlier. You might find [1] useful. As John mentioned, under highly concurrent workloads the container itself can become a bottleneck, and sharding across containers can help speed things up. Also, as a container grows, the SQLite database that tracks the objects in that container grows with it, and each write can start to become slower. It is therefore a good idea to also "shard vertically" and restrict the number of objects in a container. The recommended count was 1M if the Swift cluster is running on rotational disks. It's higher for SSDs, but I don't know whether there are experiments that suggest a good number.
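To make the two ideas above concrete, here is a minimal Python sketch of client-side sharding. It is only an illustration under my own assumptions, not something Swift provides: the helper names (shard_count, container_for) and the "blobs" container prefix are hypothetical, and the 1M cap is just the rotational-disk figure quoted above. It picks enough containers to stay under the cap, then maps each object name to a container with a stable hash so concurrent writes spread across containers.

```python
import hashlib
import math

# Assumption: the ~1M objects-per-container figure quoted above for
# rotational disks. Adjust it for your own cluster.
MAX_OBJECTS_PER_CONTAINER = 1_000_000


def shard_count(expected_objects, max_per_container=MAX_OBJECTS_PER_CONTAINER):
    """Return how many containers keep each one at or under the cap."""
    return max(1, math.ceil(expected_objects / max_per_container))


def container_for(object_name, num_shards, prefix="blobs"):
    """Map an object name to one of num_shards containers.

    The hash is stable, so the same name always lands in the same
    container. The "blobs" prefix is a hypothetical naming scheme.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"{prefix}-{shard:04d}"


if __name__ == "__main__":
    shards = shard_count(3_000_000)  # e.g. the ~3M blobs in my experiment
    print(shards, container_for("photos/cat.jpg", shards))
```

One caveat with plain modulo hashing: changing the number of shards later remaps most object names, so it is worth picking the shard count with some headroom up front.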
Unfortunately, the link above doesn't have the images from my experiments, so I'm attaching them again. Note that the experiment measuring how slow each object write gets as the number of blobs increases could have been done at a bigger scale; I only pumped in ~3M blobs. The Swift cluster at Rackspace was using SSDs.

Hope that helps.
-Shri

[1] https://www.mail-archive.com/openstack@lists.openstack.org/msg01760.html

On Wed, Dec 4, 2013 at 3:15 PM, John Dickinson <m...@not.mn> wrote:
> correct. a single container is replicated like other data in the system
> (typically 3x). This means that a single container is on only 3 spindles, and
> any number of concurrent writes to objects in that container will attempt to
> update the container listing (with graceful failure handling). This means
> that under significant concurrency, the concurrent object write speed is
> limited by the time it takes to update one of those container replicas.
>
> There are two easy "fixes" for this: (1) shard your data across containers
> and (2) use faster, dedicated drives for the containers (eg SSDs).
>
> The hard fix for this is to implement container sharding within swift, but
> this is a hard problem to solve (although nobody would be opposed to a
> successful solution).
>
> --John
>
> On Dec 4, 2013, at 3:01 PM, Stephen Wood <smwo...@gmail.com> wrote:
>
>> Can someone explain to me (or point me to some good literature) about why
>> sharding across containers is such a big deal in terms of performance? Is it
>> that a single container is typically localized across a small number of
>> shards?
>>
>> --
>> Stephen Wood
>> www.heystephenwood.com
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to : openstack@lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
<<attachment: average_blob_write_time.png>>
<<attachment: sharding_across_containers_new.png>>