The simple thing to do would be to use the multiprocessing package and eliminate all shared state. On a multicore box python threads can run on different cores and battle over obtaining the GIL.
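For example, something along these lines (a rough, untested sketch - the keyspace, column family, server and file names are placeholders for whatever your script actually uses, and I'm splitting the key list across the workers; pass the full list to every process if that's what your script does):

    #!/usr/bin/env python
    import multiprocessing
    import time

    import pycassa

    NUM_PROCS = 4

    def run_client(out_name, keys):
        # Build the connection pool inside the worker process so that
        # nothing is shared between processes.
        pool = pycassa.ConnectionPool('Keyspace1',          # placeholder
                                      server_list=['localhost:9160'])
        cf = pycassa.ColumnFamily(pool, 'usertable')        # placeholder
        f = open(out_name, 'w')
        for key in keys:
            st = time.time()
            f.write(str(cf.get(key)) + '\n')
            et = time.time()
            f.write('Time taken for a single query is '
                    + str(round(1000 * (et - st), 2)) + ' milli secs\n')
        f.close()
        pool.dispose()

    if __name__ == '__main__':
        lines = [line.strip() for line in open('keys.txt')]  # placeholder
        procs = []
        for i in range(NUM_PROCS):
            # Each process gets its own slice of the keys and its own
            # output file.
            p = multiprocessing.Process(
                target=run_client,
                args=('worker_%d.out' % i, lines[i::NUM_PROCS]))
            p.start()
            procs.append(p)
        for p in procs:
            p.join()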
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/02/2013, at 11:34 PM, Tim Wintle <timwin...@gmail.com> wrote:

> On Tue, 2013-02-05 at 21:38 +1300, aaron morton wrote:
>> The first thing I noticed is your script uses the python threading library,
>> which is hampered by the Global Interpreter Lock
>> http://docs.python.org/2/library/threading.html
>>
>> You don't really have multiple threads running in parallel; try using the
>> multiprocessing library.
>
> Python _should_ release the GIL around IO-bound work, so this is a
> situation where the GIL shouldn't be an issue. (It's actually a very good
> use for python's threads, as there's no serialization overhead for
> message passing between processes as there would be in most
> multi-process examples.)
>
> A constant factor 2 slowdown really doesn't seem that significant for
> two different implementations, and I would not worry about this unless
> you're talking about thousands of machines.
>
> If you are talking about enough machines that this is real $$$, then I
> do think the python code can be optimised a lot.
>
> I'm talking about language/VM specific optimisations - so I'm assuming
> cpython (the standard /usr/bin/python as in the shebang).
>
> I don't know how much of a difference this will make, but I'd be
> interested in hearing your results.
>
> I would start by rewriting this:
>
>     def start_cassandra_client(Threadname):
>         f = open(Threadname, "w")
>         for key in lines:
>             key = key.strip()
>             st = time.time()
>             f.write(str(cf.get(key)) + "\n")
>             et = time.time()
>             f.write("Time taken for a single query is " +
>                     str(round(1000 * (et - st), 2)) + " milli secs\n")
>         f.close()
>
> as something like this:
>
>     def start_cassandra_client(Threadname):
>         # Avoid looking up variable names outside this scope in the loop
>         time_fn = time.time
>         colfam = cf
>         f = open(Threadname, "w")
>         for key in lines:
>             key = key.strip()
>             st = time_fn()
>             f.write(str(colfam.get(key)) + "\n")
>             et = time_fn()
>             f.write("Time taken for a single query is " +
>                     str(round(1000 * (et - st), 2)) + " milli secs\n")
>         f.close()
>
> If you don't consider it cheating compared to the java version, I would
> also move the "key.strip()" call to module initialisation instead of
> doing it in every thread, as there's a lot of function dispatch overhead
> in python.
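> For instance (a sketch - I'm assuming here that the keys are loaded from a
> file, so adapt this to however `lines` is actually built in your script):
>
>     # strip each key exactly once, at import time, rather than per thread
>     lines = [line.strip() for line in open("keys.txt")]
>
> and then drop the per-key "key = key.strip()" from the loop.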
> I'd also closely compare the IO going on in both versions (the .write
> calls). For example this may be significantly faster:
>
>     et = time_fn()
>     f.write(str(colfam.get(key)) + "\nTime taken for a single query is "
>             + str(round(1000 * (et - st), 2)) + " milli secs\n")
>
> .. I haven't read your java code and I don't know Java IO semantics well
> enough to compare the behaviour of both.
>
> Tim
>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 5/02/2013, at 7:15 AM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Could someone please give me any hints as to why the pycassa client
>>> (attached) is much slower than YCSB? Is it something to attribute to a
>>> performance difference between python and java, or does the pycassa api
>>> have some performance limitations?
>>>
>>> I don't see any client statements affecting the pycassa performance.
>>> Please have a look at the simple python script attached and let me know
>>> your suggestions.
>>>
>>> thanks
>>> pradeep
>>>
>>> On Thu, Jan 31, 2013 at 4:53 PM, Pradeep Kumar Mantha
>>> <pradeep...@gmail.com> wrote:
>>>
>>> On Thu, Jan 31, 2013 at 4:49 PM, Pradeep Kumar Mantha
>>> <pradeep...@gmail.com> wrote:
>>> Thanks. Please find the script as an attachment.
>>>
>>> Just re-iterating:
>>> It's just a simple python script which starts 4 threads.
>>> This script has been scheduled on 8 cores using the taskset unix command,
>>> thus running 32 threads/node, and then scaled to 16 nodes.
>>>
>>> thanks
>>> pradeep
>>>
>>> On Thu, Jan 31, 2013 at 4:38 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>>> Can you provide the python script that you're using?
>>>
>>> (I'm moving this thread to the pycassa mailing list
>>> (pycassa-disc...@googlegroups.com), which is a better place for this
>>> discussion.)
>>>
>>> On Thu, Jan 31, 2013 at 6:25 PM, Pradeep Kumar Mantha
>>> <pradeep...@gmail.com> wrote:
>>> Hi,
>>>
>>> I am trying to benchmark cassandra on a 12 data node cluster using 16
>>> clients (each client uses 32 threads), with a custom pycassa client and
>>> with YCSB.
>>>
>>> I found the maximum number of operations/second achieved using the
>>> pycassa client is nearly 70k+ reads/second, whereas with YCSB it is
>>> ~120k reads/second.
>>>
>>> Any thoughts on why I see this huge difference in performance?
>>>
>>> Here is the description of the setup.
>>>
>>> Pycassa client (a simple python script):
>>> 1. Each pycassa client starts 4 threads, where each thread runs 76896
>>> queries.
>>> 2. A shell script is used to submit 4 threads per core using the taskset
>>> unix command on an 8-core single node (8 * 4 * 76896 queries).
>>> 3. Another shell script is used to scale the single-node shell script to
>>> 16 nodes (total queries now: 16 * 8 * 4 * 76896).
>>>
>>> I tried to keep the YCSB configuration as similar as possible to my
>>> custom pycassa benchmarking setup.
>>>
>>> YCSB:
>>>
>>> Launched 16 YCSB clients on 16 nodes, where each client uses 32 threads
>>> for execution and needs to query 32 * 76896 keys, i.e. 100% reads.
>>>
>>> The dataset is different in each case, but has:
>>>
>>> 1. the same number of total records.
>>> 2. the same number of fields.
>>> 3. almost the same field length.
>>>
>>> Could you please let me know why I see this huge performance difference,
>>> and whether there is any way I can improve the operations/second using
>>> the pycassa client?
>>>
>>> thanks
>>> pradeep
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax
>>>
>>> <pycassa_client.py>