On Tue, 2013-02-05 at 21:38 +1300, aaron morton wrote:
> The first thing I noticed is your script uses python threading library, which
> is hampered by the Global Interpreter Lock
> http://docs.python.org/2/library/threading.html
>
> You don't really have multiple threads running in parallel, try using the
> multiprocessor library.
Python _should_ release the GIL around IO-bound work, so this is a situation
where the GIL shouldn't be an issue. (It's actually a very good use for
python's threads, as there's no serialization overhead for message passing
between processes as there would be in most multi-process setups.)

A constant factor-2 slowdown really doesn't seem that significant for two
different implementations, and I wouldn't worry about it unless you're
talking about thousands of machines. If you are talking about enough
machines that this is real $$$, then I do think the python code can be
optimised a lot.

I'm talking about language/VM-specific optimisations, so I'm assuming
cpython (the standard /usr/bin/python, as in the shebang). I don't know how
much of a difference this will make, but I'd be interested in hearing your
results.

I would start by trying to rewrite this:

    def start_cassandra_client(Threadname):
        f = open(Threadname, "w")
        for key in lines:
            key = key.strip()
            st = time.time()
            f.write(str(cf.get(key)) + "\n")
            et = time.time()
            f.write("Time taken for a single query is "
                    + str(round(1000 * (et - st), 2)) + " milli secs\n")
        f.close()

as something like this:

    def start_cassandra_client(Threadname):
        # Bind frequently used globals to local names; local lookups are
        # cheaper than global/attribute lookups in cpython.
        time_fn = time.time
        colfam = cf
        f = open(Threadname, "w")
        for key in lines:
            key = key.strip()
            st = time_fn()
            f.write(str(colfam.get(key)) + "\n")
            et = time_fn()
            f.write("Time taken for a single query is "
                    + str(round(1000 * (et - st), 2)) + " milli secs\n")
        f.close()

If you don't consider it cheating compared to the java version, I would
also move the "key.strip()" calls to module initialisation instead of doing
them once per key in every thread, as there's a lot of function dispatch
overhead in python.

I'd also closely compare the IO going on in both versions (the .write
calls). For example, combining the two writes per key into one may be
significantly faster (note the query result has to be captured before
taking the end timestamp, so the timing stays correct):

            st = time_fn()
            result = str(colfam.get(key))
            et = time_fn()
            f.write(result + "\nTime taken for a single query is "
                    + str(round(1000 * (et - st), 2)) + " milli secs\n")
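For what it's worth, here's a minimal standard-library sketch of why the
local-name-binding trick helps: `time.time()` resolves the module global and
its attribute on every call, while a pre-bound local name is resolved once.
The exact numbers are machine-dependent; this just illustrates the mechanism:

```python
# Compare global attribute lookup vs a pre-bound local name, using timeit.
import timeit

# Global lookup: "time.time" is resolved through the module dict and the
# attribute lookup on every single call.
global_lookup = timeit.timeit("time.time()", setup="import time",
                              number=1000000)

# Local binding: the function object is looked up once in setup, then the
# timed statement only does a fast local-name call.
local_binding = timeit.timeit("time_fn()",
                              setup="import time\ntime_fn = time.time",
                              number=1000000)

print("global lookup: %.3fs" % global_lookup)
print("local binding: %.3fs" % local_binding)
```

On most cpython builds the local-binding variant comes out measurably
faster, which is the same effect the rewritten client function relies on.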
I haven't read your java code, and I don't know Java IO semantics well
enough to compare the behaviour of both.

Tim

> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/02/2013, at 7:15 AM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
>
> > Hi,
> >
> > Could someone please give me any hints on why the pycassa
> > client (attached) is much slower than YCSB?
> > Is it something to attribute to the performance difference between python
> > and Java, or does the pycassa api have some performance limitations?
> >
> > I don't see any client statements affecting the pycassa performance. Please
> > have a look at the simple python script attached and let me know
> > your suggestions.
> >
> > thanks
> > pradeep
> >
> > On Thu, Jan 31, 2013 at 4:53 PM, Pradeep Kumar Mantha
> > <pradeep...@gmail.com> wrote:
> >
> > On Thu, Jan 31, 2013 at 4:49 PM, Pradeep Kumar Mantha
> > <pradeep...@gmail.com> wrote:
> > Thanks. Please find the script as an attachment.
> >
> > Just re-iterating:
> > it's just a simple python script which submits 4 threads.
> > This script has been scheduled on 8 cores using the taskset unix command,
> > thus running 32 threads/node, and then scaled to 16 nodes.
> >
> > thanks
> > pradeep
> >
> > On Thu, Jan 31, 2013 at 4:38 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> > Can you provide the python script that you're using?
> >
> > (I'm moving this thread to the pycassa mailing list
> > (pycassa-disc...@googlegroups.com), which is a better place for this
> > discussion.)
> >
> > On Thu, Jan 31, 2013 at 6:25 PM, Pradeep Kumar Mantha
> > <pradeep...@gmail.com> wrote:
> > Hi,
> >
> > I am trying to benchmark cassandra on a 12 Data Node cluster using 16
> > clients (each client uses 32 threads) with a custom pycassa client and
> > YCSB.
> >
> > I found the maximum number of operations/second achieved using the pycassa
> > client is nearly 70k+ reads/second.
> > Whereas with YCSB it is ~120k reads/second.
> >
> > Any thoughts on why I see this huge difference in performance?
> >
> > Here is the description of the setup.
> >
> > Pycassa client (a simple python script):
> > 1. Each pycassa client starts 4 threads, where each thread runs 76896
> > queries.
> > 2. A shell script is used to submit 4 threads per core using the taskset
> > unix command on an 8-core single node (8 * 4 * 76896 queries).
> > 3. Another shell script is used to scale the single-node shell script to
> > 16 nodes (total queries now: 16 * 8 * 4 * 76896).
> >
> > I tried to keep the YCSB configuration as similar as possible to my custom
> > pycassa benchmarking setup.
> >
> > YCSB:
> >
> > Launched 16 YCSB clients on 16 nodes, where each client uses 32 threads
> > and queries (32 * 76896 keys), i.e. 100% reads.
> >
> > The dataset is different in each case, but has
> >
> > 1. the same number of total records,
> > 2. the same number of fields,
> > 3. almost the same field length.
> >
> > Could you please let me know why I see this huge performance difference,
> > and whether there is any way I can improve the operations/second using
> > the pycassa client?
> >
> > thanks
> > pradeep
> >
> > --
> > Tyler Hobbs
> > DataStax
> >
> > <pycassa_client.py>
>