Thanks for the input, Chuck. Please see my responses inline.

On Thu, Oct 10, 2013 at 7:56 AM, Chuck Thier <cth...@gmail.com> wrote:

> Hi Shri,
>
> I think your observations are fairly spot on. Here are a couple of
> thoughts/comments.
>
> 1. I wonder if you are maxing out how much your client can push at 128
> threads. If you were to increase the number of threads (or number of
> clients) for the higher container counts, you could get more
> transactions through.

[SJ] In this experiment I simply wanted to see how sharding across
containers helps given the same input data rate. I will run some numbers
with a higher number of threads to see the maximum number of operations
per second I can get.

> 2. Cloudfiles rate limits PUTs at 100 per second to a single container.
> This helps ensure fairly consistent performance to a single container.
> We also put our container data on SSD drives to help drive better
> performance. So your max theoretical performance is
> 100 x NUM_CONTAINERS PUTs/sec.

[SJ] Great to know about the SSDs and the rate limits. Is it also possible
to know what version of Swift has been deployed at Rackspace Cloudfiles?

> 3. It would be worthwhile to test even larger containers to test how
> much container size affects performance. I don't think your sample size
> is large enough.

[SJ] Yeah, especially with SSDs, this is definitely not a large enough
sample. I guess I should start at 1M and go up to 10M or so. Will keep
you all posted.

-Shri

> On Wed, Oct 9, 2013 at 11:11 PM, Shrinand Javadekar
> <shrin...@maginatics.com> wrote:
>
>> Thanks Chuck.
>>
>> In order to really measure this, I ran some tests on Rackspace; i.e. I
>> got a VM on Rackspace and that VM was talking to a Rackspace
>> Cloudfiles-US Swift cluster. The VM and object store were both in the
>> Chicago region. The downside of using a public object store is that I
>> have little idea about the configuration of Swift being used. But
>> installing and configuring one's own enterprise-class Swift cluster is
>> no child's play either (to put it mildly :D).
>>
>> In the first experiment, 128 threads were continuously trying to write
>> 1-byte blobs into N containers, where N was in (1, 32, 64, 128, 256,
>> 512). The experiment ran for 15 minutes. The experiment was run thrice
>> for each N and the results below are the average of the three runs.
>>
>> [image: Inline image 1]
>>
>> The number of writes completed in 15 minutes is ~87K for a single
>> container, whereas when these writes are sharded across 32 containers,
>> this number is ~135K.
>>
>> The second experiment was to find out whether Swift becomes slower as
>> the number of objects in a container increases. To do this, I measured
>> the time it was taking to write blobs in a single container. Here
>> again, I ran the experiment three times and the graph below is the
>> average of the three runs.
>>
>> [image: Inline image 2]
>>
>> If a container has fewer than 1.6M blobs, the average time to write a
>> blob is ~12.58ms, whereas if the container has > 1.6M blobs, the
>> average time to write a blob is ~13.29ms. The trend definitely seems to
>> be that as the number of objects increases, the time to write also
>> increases.
>>
>> I guess the absolute numbers may differ depending on factors like
>> memory, CPU, and disk (SSDs vs. rotational) of the servers running
>> Swift. But the relative numbers give a better picture of the benefits
>> of:
>>
>> i) Sharding across containers to increase throughput
>> ii) Restricting the number of objects per container
>>
>> Let me know if I have missed out on anything or if there are more
>> experiments to run that would make Swift #awesome!!
>>
>> -Shri
>>
>> On Tue, Sep 3, 2013 at 7:47 AM, Chuck Thier <cth...@gmail.com> wrote:
>>
>>> Hi Shri,
>>>
>>> The short answer is that sharding your data across containers in
>>> Swift is generally a good idea.
>>>
>>> The limitations with containers have a lot more to do with overall
>>> concurrency than with the total number of objects in a container. The
>>> number of objects in a container can have an effect on that, but will
>>> be less of an issue if you are not putting objects in at a high
>>> concurrency.
>>>
>>> --
>>> Chuck
>>>
>>> On Sun, Sep 1, 2013 at 9:39 PM, Shrinand Javadekar
>>> <shrin...@maginatics.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> There have been several articles which talk about keeping the number
>>>> of objects in a container to about 1M. Beyond that, sqlite starts
>>>> becoming the bottleneck. I am going to make sure we abide by this
>>>> number.
>>>>
>>>> However, has anyone measured whether putting objects among multiple
>>>> containers right from the start gives any performance benefits? For
>>>> example, I could create 32 containers right at the start and split
>>>> the objects among these as I write more and more objects. In the
>>>> average case, I would have several partially filled containers
>>>> instead of a few fully filled ones (fully filled means having 1M
>>>> objects). Would this be better for the overall performance? Any
>>>> downsides of this approach? Has anyone tried this before and
>>>> published numbers on this?
>>>>
>>>> Thanks in advance.
>>>> -Shri
>>>>
>>>> _______________________________________________
>>>> Mailing list:
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>> Post to : openstack@lists.openstack.org
>>>> Unsubscribe :
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
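
For anyone who wants to try the sharding idea from this thread, here is a rough sketch of one way to spread objects across N containers and to compare a measured write rate against the 100 x NUM_CONTAINERS ceiling Chuck mentions. This is only an illustration under stated assumptions: the thread does not say how objects were assigned to containers in the experiment, so the hash-based placement, the "shard-NNNN" naming scheme, and the function names below are all hypothetical. The only figures taken from the thread are the 100 PUTs/sec per-container limit and the ~87K and ~135K write counts over 15 minutes.

```python
import hashlib

# Per-container PUT rate limit quoted in the thread for Cloudfiles.
PER_CONTAINER_PUT_LIMIT = 100  # PUTs/sec


def container_for(object_name: str, num_containers: int, prefix: str = "shard") -> str:
    """Pick a container for an object by hashing its name.

    Hypothetical placement scheme (not from the thread): the same name
    always maps to the same container, so reads need no lookup table.
    """
    if num_containers < 1:
        raise ValueError("num_containers must be >= 1")
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % num_containers
    return f"{prefix}-{index:04d}"


def theoretical_put_ceiling(num_containers: int,
                            per_container_limit: int = PER_CONTAINER_PUT_LIMIT) -> int:
    """Upper bound on PUTs/sec: per-container limit times container count."""
    return per_container_limit * num_containers


def observed_rate(writes_completed: int, duration_seconds: float) -> float:
    """Average writes per second over a run."""
    return writes_completed / duration_seconds


if __name__ == "__main__":
    run_seconds = 15 * 60  # each run in the thread lasted 15 minutes

    # Which container a given (made-up) object name would land in.
    print(container_for("photos/cat.jpg", 32))

    # Approximate figures reported in the thread.
    print(round(observed_rate(87_000, run_seconds), 1))   # 1 container   -> 96.7
    print(round(observed_rate(135_000, run_seconds), 1))  # 32 containers -> 150.0

    print(theoretical_put_ceiling(1))   # -> 100
    print(theoretical_put_ceiling(32))  # -> 3200
```

Read this way, the single-container run (about 97 writes/sec) sits close to the 100 PUTs/sec limit, while the 32-container run (about 150 writes/sec) is far below its 3200 PUTs/sec ceiling, which would be consistent with Chuck's guess that the 128-thread client was the limiting factor there. One design caveat with modulo placement: changing the number of containers later remaps most object names, so the container count is best fixed up front.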