Thanks Chuck. To actually measure this, I ran some tests on Rackspace: I got a VM on Rackspace, and that VM talked to a Rackspace Cloudfiles-US Swift cluster. The VM and the object store were both in the Chicago region. The downside of using a public object store is that I have little idea how that Swift deployment is configured. But installing and configuring one's own enterprise-class Swift cluster is no child's play either (to put it mildly :D).
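A rough sketch of how writes can be spread across N containers, purely as an illustration and not the actual test harness. Everything named here is an assumption: the hash-based `container_for` naming scheme, the `run_writes` helper, and the in-memory `fake_put` that stands in for a real Swift client call. It also writes a fixed number of 1-byte blobs rather than running for a fixed amount of time.

```python
import hashlib
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def container_for(object_name, num_containers, prefix="shard"):
    """Pick a container by hashing the object name (hypothetical scheme)."""
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    return "%s-%04d" % (prefix, int(digest, 16) % num_containers)


def run_writes(put, object_names, num_containers, num_threads=128):
    """Write a 1-byte blob per name by calling put(container, name, data)."""
    def write_one(name):
        put(container_for(name, num_containers), name, b"x")

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for all writes and re-raises any worker exception.
        list(pool.map(write_one, object_names))


# In-memory stand-in for the object store (assumption: replace with a
# real client call to talk to an actual cluster).
_lock = threading.Lock()
writes_per_container = Counter()


def fake_put(container, name, data):
    with _lock:
        writes_per_container[container] += 1


run_writes(fake_put, ["blob-%d" % i for i in range(10000)], num_containers=32)
```

Hashing the object name keeps the mapping stateless, so a reader can find the right container later without a lookup table. Round-robin or random assignment would spread the load just as well but would need the container recorded somewhere.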
In the first experiment, 128 threads continuously wrote 1-byte blobs into N containers, where N was one of (1, 32, 64, 128, 256, 512). Each run lasted 15 minutes. The experiment was run three times for each N, and the results below are the average of the three runs.

[image: Inline image 1]

The number of writes completed in 15 minutes is ~87K for a single container, whereas when the writes are sharded across 32 containers, this number is ~135K (roughly 55% more).

The second experiment was to find out whether Swift becomes slower as the number of objects in a container increases. To do this, I measured the time it took to write blobs into a single container. Here again, I ran the experiment three times, and the graph below is the average of the three runs.

[image: Inline image 2]

If a container has fewer than 1.6M blobs, the average time to write a blob is ~12.58ms, whereas if the container has more than 1.6M blobs, the average time to write a blob is ~13.29ms (about 6% slower). The trend definitely seems to be that as the number of objects increases, the time to write also increases.

I guess the absolute numbers may differ depending on factors like the memory, CPU and disks (SSDs vs. rotational) of the servers running Swift. But the relative numbers give a better picture of the benefits of:

i) Sharding across containers to increase throughput
ii) Restricting the number of objects per container

Let me know if I have missed anything or if there are more experiments to run that would make Swift #awesome!!

-Shri

On Tue, Sep 3, 2013 at 7:47 AM, Chuck Thier <cth...@gmail.com> wrote:

> Hi Shri,
>
> The short answer is that sharding your data across containers in swift is
> generally a good idea.
>
> The limitations with containers has a lot more to do with overall
> concurrency rather than total objects in a container. The number of
> objects in a container can have an affect on that, but will be less of an
> issue if you are not putting objects in at a high concurrency.
>
> --
> Chuck
>
>
> On Sun, Sep 1, 2013 at 9:39 PM, Shrinand Javadekar <shrin...@maginatics.com> wrote:
>
>> Hi,
>>
>> There have been several articles which talk about keeping the number of
>> objects in a container to about 1M. Beyond that sqlite starts becoming the
>> bottleneck. I am going to make sure we abide by this number.
>>
>> However, has anyone measured whether putting objects among multiple
>> containers right from the start gives any performance benefits. For e.g. I
>> could create 32 containers right at the start and split the objects among
>> these as I write more and more objects. In the average case, I would have
>> several partially filled containers instead of a few fully filled ones
>> (fully filled means having 1M objects). Would this be better for the
>> overall performance? Any downsides of this approach? Has anyone tried this
>> before and published numbers on this?
>>
>> Thanks in advance.
>> -Shri
>>
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to     : openstack@lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack