On Jun 27, 2012, at 7:48 AM, Yousuf Fauzan wrote:

> This is great.
>
> I was loading data using Python. My code would spawn 10 threads and put data
> in a queue. All threads would read data from this queue.
> However, all threads were hitting the same server/load balancer.
>
> I tried a different setup too, where I spawned processes, each process
> having its own queue. In this case too, all processes were hitting the
> same server.
>
> I just made a change to my code, so now I have 10 threads randomly
> selecting a node and storing data in it.
> Again, I am getting around 50 writes/sec.

When the threads randomly pick a node, do they create a new connection to it,
or do they pull the connection from a pool? As you saw with the throughput
difference between curl and python, persistent connections make a big
difference.
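For illustration, here is a minimal sketch of the kind of threaded loader
described above, with one client created per thread and reused for every
write rather than a new connection per request. The node IPs, bucket name,
thread count, and key count are made up for the example, and it assumes the
same riak Python client calls as the original loader script quoted further
down the thread.

import random
import threading
import Queue

import riak

NODES = ["10.112.2.185", "10.112.2.186", "10.112.2.187"]  # made-up node IPs
NUM_THREADS = 10

work = Queue.Queue()
for i in xrange(10000):
    work.put(str(i))

def worker():
    # Build the client once per thread and reuse it, so the underlying
    # connection stays open instead of being recreated for every write.
    client = riak.RiakClient(random.choice(NODES))
    bucket = client.bucket("test")
    while True:
        try:
            key = work.get_nowait()
        except Queue.Empty:
            return
        bucket.new(key, key).store()

threads = [threading.Thread(target=worker) for _ in xrange(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()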
> Could there be something wrong with the way I have written my loader script?
>
> On Wed, Jun 27, 2012 at 5:10 PM, Russell Brown <russell.br...@mac.com> wrote:
>
> On 27 Jun 2012, at 12:36, Yousuf Fauzan wrote:
>
>> So I changed concurrency to 10 and put all the IPs of the nodes in the
>> basho_bench config.
>> Throughput is now around 1500.
>
> I guess you can now try 5 or 15 concurrent workers and see which is optimal
> for that setup, to get a good feel for the sizing of any connection pools
> for your application.
>
> You can also see how adding nodes and adding workers affects your results,
> to help you size the cluster you need for your expected usage.
>
> Cheers
>
> Russell
>
>> On Wed, Jun 27, 2012 at 4:40 PM, Russell Brown <russell.br...@mac.com> wrote:
>>
>> On 27 Jun 2012, at 12:09, Yousuf Fauzan wrote:
>>
>>> I used examples/riakc_pb.config
>>>
>>> {mode, max}.
>>> {duration, 10}.
>>> {concurrent, 1}.
>>
>> Try upping this. On my local 3-node cluster, with 8 GB of RAM and an old,
>> cheap quad core per box, I'd set concurrency to 10 workers.
>>
>>> {driver, basho_bench_driver_riakc_pb}.
>>> {key_generator, {int_to_bin, {uniform_int, 10000}}}.
>>> {value_generator, {fixed_bin, 10000}}.
>>> {riakc_pb_ips, [{<IP of one of the nodes>}]}.
>>
>> I add all the IPs here, one entry per node.
>>
>>> {riakc_pb_replies, 1}.
>>> {operations, [{get, 1}, {update, 1}]}.
>>>
>>> On Wed, Jun 27, 2012 at 4:37 PM, Russell Brown <russell.br...@mac.com> wrote:
>>>
>>> On 27 Jun 2012, at 12:05, Yousuf Fauzan wrote:
>>>
>>>> I did use basho bench on my clusters. It showed throughput of around 150.
>>>
>>> Could you share the config you used, please?
>>>
>>>> On Wed, Jun 27, 2012 at 4:24 PM, Russell Brown <russell.br...@mac.com> wrote:
>>>>
>>>> On 27 Jun 2012, at 11:50, Yousuf Fauzan wrote:
>>>>
>>>>> It's not about the difference in throughput between the two approaches
>>>>> I took. Rather, the issue is that even 200 writes/sec is a bit on the
>>>>> low side. I could be doing something wrong with the configuration,
>>>>> because people are reporting throughputs of 2-3k ops/sec.
>>>>>
>>>>> It would be great if someone here could guide me in setting up a
>>>>> cluster that gives that kind of throughput.
>>>>
>>>> To get that kind of throughput I use multiple threads / workers. Have
>>>> you looked at basho_bench [1]? It is a simple, reliable tool for
>>>> benchmarking Riak clusters.
>>>>
>>>> Cheers
>>>>
>>>> Russell
>>>>
>>>> [1] Basho Bench - https://github.com/basho/basho_bench and
>>>> http://wiki.basho.com/Benchmarking.html
>>>>
>>>>> Thanks,
>>>>> Yousuf
>>>>>
>>>>> On Wed, Jun 27, 2012 at 4:02 PM, Eric Anderson <ander...@copperegg.com> wrote:
>>>>> On Jun 27, 2012, at 5:13 AM, Yousuf Fauzan <yousuffau...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I set up a 3-machine Riak SmartMachine cluster. Each machine used
>>>>>> 4 GB of RAM and the Riak open-source SmartMachine image.
>>>>>>
>>>>>> Afterwards I tried loading data using the following two methods:
>>>>>>
>>>>>> 1. Bash script
>>>>>> #!/bin/bash
>>>>>> echo $(date)
>>>>>> for (( c=1; c<=1000; c++ ))
>>>>>> do
>>>>>>   curl -s -d 'this is a test' -H "Content-Type: text/plain" http://127.0.0.1:8098/buckets/test/keys
>>>>>> done
>>>>>> echo $(date)
>>>>>>
>>>>>> 2. Python Riak Client
>>>>>> c = riak.RiakClient("10.112.2.185")
>>>>>> b = c.bucket("test")
>>>>>> for i in xrange(10000):
>>>>>>     o = b.new(str(i), str(i)).store()
>>>>>>
>>>>>> For case 1, throughput was 25 writes/sec.
>>>>>> For case 2, throughput was 200 writes/sec.
>>>>>>
>>>>>> Maybe I am making a fundamental mistake somewhere. I tried the above
>>>>>> two scripts on EC2 clusters too and still got the same performance.
>>>>>>
>>>>>> Please, someone help.
>>>>>
>>>>> The major difference between these two is that the first is executing a
>>>>> binary, which has to basically create everything (connection, payload,
>>>>> etc.) every time through the loop. The second does not - it creates the
>>>>> client once, then iterates, keeping the same client and presumably the
>>>>> same connection as well. That makes a huge difference.
>>>>>
>>>>> I would not use curl to do performance testing. What you probably want
>>>>> is something like your Python script that will work on many
>>>>> threads/processes at once (or fire them up many times).
>>>>>
>>>>> Eric Anderson
>>>>> Co-Founder
>>>>> CopperEgg
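Following Eric's suggestion above to run the loader across many processes at
once, here is a minimal sketch of that idea using Python's multiprocessing
module, with one client (and therefore one connection) created per process and
reused for every write. The node IPs, bucket name, key count, and process
count are again made up for the example, and it assumes the same riak Python
client calls used in the scripts quoted above.

import multiprocessing

import riak

NODES = ["10.112.2.185", "10.112.2.186", "10.112.2.187"]  # made-up node IPs

def load(args):
    proc_id, keys = args
    # One client per process, reused for every write in that process,
    # with the processes spread across the nodes in the cluster.
    client = riak.RiakClient(NODES[proc_id % len(NODES)])
    bucket = client.bucket("test")
    for key in keys:
        bucket.new(key, key).store()
    return len(keys)

if __name__ == "__main__":
    num_procs = 10
    keys = [str(i) for i in xrange(10000)]
    chunks = [(p, keys[p::num_procs]) for p in xrange(num_procs)]
    pool = multiprocessing.Pool(num_procs)
    written = sum(pool.map(load, chunks))
    print "stored %d keys" % written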
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com