The first thing I noticed is that your script uses Python's threading library, which is hampered by the Global Interpreter Lock (GIL): http://docs.python.org/2/library/threading.html
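To make the GIL point concrete, here is a rough sketch of a process-based layout using the standard multiprocessing module, where each worker gets its own interpreter and its own GIL. It is not taken from the attached script: fetch() is a hypothetical stand-in for a single read, the "user%d" key format is made up, and only the counts (4 workers, 76896 keys) come from the setup described in the thread.

```python
import multiprocessing
import time


def fetch(key):
    """Hypothetical stand-in for one read; a real client would query Cassandra here."""
    return sum(ord(c) for c in key)  # a little CPU work per key


def split_keys(keys, n_workers):
    """Deal the keys out round-robin into n_workers lists."""
    return [keys[i::n_workers] for i in range(n_workers)]


def run_chunk(chunk):
    """Runs inside one worker: read every key, return (reads done, seconds taken)."""
    start = time.perf_counter()
    for key in chunk:
        fetch(key)
    return len(chunk), time.perf_counter() - start


def run_benchmark(keys, n_workers=4):
    """Fan the chunks out to separate processes instead of threads."""
    chunks = split_keys(keys, n_workers)
    with multiprocessing.Pool(processes=n_workers) as pool:
        results = pool.map(run_chunk, chunks)
    # Total number of reads completed across all workers.
    return sum(count for count, _ in results)


if __name__ == "__main__":
    # Assumed key format, for illustration only.
    keys = ["user%d" % i for i in range(76896)]
    started = time.perf_counter()
    total = run_benchmark(keys, n_workers=4)
    elapsed = time.perf_counter() - started
    print("%d reads in %.2fs (%.0f reads/s)" % (total, elapsed, total / elapsed))
```

If you adapt this, it is probably safest to open the Cassandra connection inside each worker process rather than sharing one created in the parent, since sockets generally should not be shared across processes.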
You don't really have multiple threads running in parallel; try using the multiprocessing library.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/02/2013, at 7:15 AM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:

> Hi,
>
> Could some one please let me know any hints, why the pycassa client(attached)
> is much slower than the YCSB?
> is it something to attribute to performance difference between python and
> Java? or the pycassa api has some performance limitations?
>
> I don't see any client statements affecting the pycassa performance. Please
> have a look at the simple python script attached and let me know
> your suggestions.
>
> thanks
> pradeep
>
> On Thu, Jan 31, 2013 at 4:53 PM, Pradeep Kumar Mantha <pradeep...@gmail.com>
> wrote:
>
>
> On Thu, Jan 31, 2013 at 4:49 PM, Pradeep Kumar Mantha <pradeep...@gmail.com>
> wrote:
> Thanks.. Please find the script as attachment.
>
> Just re-iterating.
> Its just a simple python script which submit 4 threads.
> This script has been scheduled on 8 cores using taskset unix command , thus
> running 32 threads/node.
> and then scaling to 16 nodes
>
> thanks
> pradeep
>
>
> On Thu, Jan 31, 2013 at 4:38 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> Can you provide the python script that you're using?
>
> (I'm moving this thread to the pycassa mailing list
> (pycassa-disc...@googlegroups.com), which is a better place for this
> discussion.)
>
>
> On Thu, Jan 31, 2013 at 6:25 PM, Pradeep Kumar Mantha <pradeep...@gmail.com>
> wrote:
> Hi,
>
> I am trying to benchmark cassandra on a 12 Data Node cluster using 16 clients
> ( each client uses 32 threads) using custom pycassa client and YCSB.
>
> I found the maximum number of operations/seconds achieved using pycassa
> client is nearly 70k+ reads/second.
> Whereas with YCSB it is ~ 120k reads/second.
>
> Any thoughts, why I see this huge difference in performance?
>
>
> Here is the description of setup.
>
> Pycassa client (a simple python script).
> 1. Each pycassa client starts 4 threads - where each thread queries 76896
> queries.
> 2. a shell script is used to submit 4threads/each core using taskset unix
> command on a 8 core single node. ( 8 * 4 * 76896 queries)
> 3. Another shell script is used to scale the single node shell script to 16
> nodes ( total queries now - 16 * 8 * 4 * 76896 queries )
>
> I tried to keep YCSB configuration as much as similar to my custom pycassa
> benchmarking setup.
>
> YCSB -
>
> Launched 16 YCSB clients on 16 nodes where each client uses 32 threads for
> execution and need to query ( 32 * 76896 keys ), i.e 100% reads
>
> The dataset is different in each case, but has
>
> 1. same number of total records.
> 2. same number of fields.
> 3. field length is almost same.
>
> Could you please let me know, why I see this huge performance difference and
> is there any way I can improve the operations/second using pycassa client.
>
> thanks
> pradeep
>
>
> --
> Tyler Hobbs
> DataStax
>
>
> <pycassa_client.py>