Thanks, Tyler. I just moved pool and cf, which hold the connection pool and column family objects, to global scope.
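A minimal sketch of that arrangement: one pool and one ColumnFamily shared by every thread, with the pool sized to the thread count so that no thread has to wait for a connection. The keyspace, column family, and key file names are the ones used in this thread, and the server address is the elided placeholder from the original post. The `pool_settings` helper and its `headroom` parameter are illustrative names, not part of pycassa; `pool_size` is the only pool option assumed here.

```python
import threading

KEYSPACE = "Blast"
COLUMN_FAMILY = "Blast_NR"
SERVER_LIST = ["xxx.xx.xx.xx"]  # placeholder, elided in the original post
NUM_THREADS = 8


def pool_settings(num_threads, headroom=0):
    """Return ConnectionPool keyword arguments sized for num_threads workers.

    Hypothetical helper: one connection per worker thread, plus optional spare.
    """
    return {"pool_size": num_threads + headroom}


def build_shared_cf(num_threads=NUM_THREADS):
    """Create one pool and one ColumnFamily to be shared by all threads."""
    import pycassa  # imported here so the sizing helper works without pycassa

    pool = pycassa.ConnectionPool(
        KEYSPACE, server_list=SERVER_LIST, **pool_settings(num_threads)
    )
    return pycassa.ColumnFamily(pool, COLUMN_FAMILY)


def run_queries(cf, keys):
    """Worker body: issue one read per key on the shared ColumnFamily."""
    for key in keys:
        cf.get(key.strip())


def start_threads(cf, keys, num_threads=NUM_THREADS):
    """Start num_threads workers that all use the same cf, then wait for them."""
    threads = [
        threading.Thread(target=run_queries, args=(cf, keys))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a cluster available, this would be driven by something like `cf = build_shared_cf()` at module level, followed by `start_threads(cf, open("pycassa_100%_query").readlines())`.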
I increased the number of server_list entries from 1 to 4. (I think I can increase it to a maximum of 12, since I have 12 data nodes.) When I created 8 threads using the Python threading package, I saw the error below.

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "my_cc.py", line 20, in run
    start_cassandra_client(self.name)
  File "my_cc.py", line 33, in start_cassandra_client
    cf.get(key)
  File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py", line 652, in get
    read_consistency_level or self.read_consistency_level)
  File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 553, in execute
    conn = self.get()
  File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 536, in get
    raise NoConnectionAvailable(message)
NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable to obtain connection after 30 seconds

Please have a look at the attached script and let me know if I need to change something. Please bear with me if I am doing something terribly wrong.

I am running the script on an 8-processor node.

thanks
pradeep

On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's best
> to share them across multiple threads. Of course, when you do that, make
> sure to make the ConnectionPool large enough to support all of the threads
> making queries concurrently. I'm also not sure if you're just omitting
> this, but pycassa's ConnectionPool will only open connections to servers you
> explicitly include in server_list; there's no autodiscovery of other nodes
> going on.
>
> Depending on your network latency, you'll top out on python performance with
> a fairly low number of threads due to the GIL. It's best to use multiple
> processes if you really want to benchmark something.
>
>
> On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha <pradeep...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> Thanks. I would like to benchmark cassandra with our application so
>> that we understand the details of how the actual benchmarking is done.
>> Not sure how easy it would be to integrate YCSB with our application.
>>
>> So, I am trying different client interfaces to cassandra.
>>
>> I found
>>
>> for a 12 Data Node Cassandra cluster and 1 Client Node which runs 32
>> threads ( each querying X number of queries ).
>>
>> cassandra-cli took 133 seconds
>> pycassa took 521 seconds.
>>
>> Here is the python pycassa code used to query and passed to each
>> thread....
>>
>> def start_cassandra_client(Threadname):
>>     pool = pycassa.ConnectionPool('Blast',
>>                                   server_list=['xxx.xx.xx.xx'])
>>     cf = pycassa.ColumnFamily(pool, 'Blast_NR')
>>     inp_file=open("pycassa_100%_query")
>>     for key in inp_file:
>>         key=key.strip()
>>         cf.get(key)
>>
>> Do Java clients like Hector/Astyanax help here? I am more
>> comfortable with Python than Java and our existing application is also
>> in Python.
>>
>> thanks
>> pradeep
>>
>>
>> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo <edlinuxg...@gmail.com>
>> wrote:
>> > Wow you managed to do a load test through the cassandra-cli. There
>> > should be a merit badge for that.
>> >
>> > You should use the built in stress tool or YCSB.
>> >
>> > The CLI has to do much more string conversion than a normal client would
>> > and it is not built for performance. You will definitely get better
>> > numbers through other means.
>> >
>> > On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha
>> > <pradeep...@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am trying to maximize execution of the number of read queries/second.
>> >>
>> >> Here is my cluster configuration.
>> >>
>> >> Replication - Default
>> >> 12 Data Nodes.
>> >> 16 Client Nodes - used for querying.
>> >>
>> >> Each client node executes 32 threads - each thread executes 76896 read
>> >> queries using cassandra-cli tool.
>> >> i.e. all the read queries are stored in a file and that file is
>> >> given to cassandra-cli tool ( using -f option ) which is executed by a
>> >> thread.
>> >> so, total number of queries for 16 client Nodes is 16 * 32 * 76896.
>> >>
>> >> The read queries on each client node are submitted at the same time. The
>> >> time taken for 16 * 32 * 76896 read queries is nearly 742 seconds -
>> >> which is nearly 53k transactions/second.
>> >>
>> >> I would like to know if there is any other way/tool through which I
>> >> can improve the number of transactions/second.
>> >> Is the performance affected by cassandra-cli tool?
>> >>
>> >> thanks
>> >> pradeep
>> >
>> >
>
>
>
> --
> Tyler Hobbs
> DataStax
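For reference, the figures quoted in this thread can be checked with a few lines of arithmetic. This is only a sanity check of the numbers as reported (16 client nodes, 32 threads each, 76896 queries per thread, 742 seconds, and the 133 s versus 521 s single-client comparison); the variable names are illustrative.

```python
# Figures from the original cassandra-cli run described in the thread.
client_nodes = 16
threads_per_node = 32
queries_per_thread = 76896
elapsed_seconds = 742

total_queries = client_nodes * threads_per_node * queries_per_thread
throughput = total_queries / float(elapsed_seconds)

# Single-client comparison from the follow-up message.
cli_seconds = 133
pycassa_seconds = 521
slowdown = pycassa_seconds / float(cli_seconds)

print(total_queries)       # 39370752
print(int(throughput))     # 53060
print(round(slowdown, 2))  # 3.92
```

So the reported "nearly 53k transactions/second" is consistent with the stated totals, and the per-thread pycassa run was roughly 3.9 times slower than the cassandra-cli run on the same client node.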
Attachment: pycassa_client.py (binary data)