I had similar questions earlier. You might find [1] useful.

As John mentioned, under highly concurrent workloads the container
itself can become a bottleneck, and sharding across containers can
help speed things up. Also, as a container grows, the SQLite database
that tracks the objects in that container grows with it, and each
write may start to become slower. It is therefore a good idea to also
"shard vertically", i.e. to restrict the number of objects in any one
container. The recommended count was 1M objects per container if the
Swift cluster is running on rotational disks. The limit is higher for
SSDs, but I don't know of any experiments that suggest a good number.
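
To make the two ideas concrete, here is a minimal client-side sketch,
not anything built into Swift. It assumes you pick the container name
yourself before each PUT. The function names, the "blobs" prefix and
the default shard count are all made up for illustration; the 1M cap
is the rotational-disk figure mentioned above.

```python
import hashlib


def shards_needed(expected_objects: int, max_per_container: int = 1_000_000) -> int:
    """Number of containers needed to keep each one under the cap.

    The 1M default is the rotational-disk guideline from this thread;
    treat it as a starting assumption, not a measured limit.
    """
    # Ceiling division, with a floor of one container.
    return max(1, -(-expected_objects // max_per_container))


def shard_container(object_name: str, base: str = "blobs", num_shards: int = 16) -> str:
    """Map an object name to one of num_shards container names.

    Hashing the name spreads writes roughly evenly, so concurrent PUTs
    update many container databases instead of a single one.
    """
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"{base}_{shard:04d}"


if __name__ == "__main__":
    n = shards_needed(3_000_000)
    print(n, shard_container("photo-123.jpg", num_shards=n))
```

Because the mapping is deterministic, a reader can recompute the
container from the object name alone. The catch is that changing
num_shards later remaps existing names, so it is worth sizing the
shard count for the expected growth up front.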

Unfortunately, the archived message at [1] doesn't include the images
from my experiments, so I'm attaching them again. Note that the
experiment that measured how slow each object write gets as the number
of blobs increases could have been done at a bigger scale; I only
pumped in ~3M blobs. The Swift cluster at Rackspace was using SSDs.

Hope that helps.
-Shri

[1] https://www.mail-archive.com/openstack@lists.openstack.org/msg01760.html

On Wed, Dec 4, 2013 at 3:15 PM, John Dickinson <m...@not.mn> wrote:
> correct. a single container is replicated like other data in the system 
> (typically 3x). This means that a single container is on only 3 spindles, and 
> any number of concurrent writes to objects in that container will attempt to 
> update the container listing (with graceful failure handling). This means 
> that under significant concurrency, the concurrent object write speed is 
> limited by the time it takes to update one of those container replicas.
>
> There are two easy "fixes" for this: (1) shard your data across containers 
> and (2) use faster, dedicated drives for the containers (eg SSDs).
>
> The hard fix for this is to implement container sharding within swift, but 
> this is a hard problem to solve (although nobody would be opposed to a 
> successful solution).
>
> --John
>
> On Dec 4, 2013, at 3:01 PM, Stephen Wood <smwo...@gmail.com> wrote:
>
>> Can someone explain to me (or point me to some good literature) about why 
>> sharding across containers is such a big deal in terms of performance? Is it 
>> that a single container is typically localized across a small number of 
>> shards?
>>
>> --
>> Stephen Wood
>> www.heystephenwood.com
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to     : openstack@lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

<<attachment: average_blob_write_time.png>>

<<attachment: sharding_across_containers_new.png>>
