You just need to increase the ConnectionPool size to handle the number of threads you have using it concurrently. Set the pool_size kwarg to at least the number of threads you're using.
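To make the failure mode concrete, here is a minimal, stdlib-only sketch. It is not pycassa's ConnectionPool. `ToyPool` and `count_starved_threads` are hypothetical names made up for this illustration. The sketch only mimics the behaviour in the traceback quoted below: there is a fixed number of connections, and a caller that cannot get one within a timeout gives up. Every thread holds its connection at the same moment, as a slow query would. With 8 threads and a pool of 5, three threads are starved. With a pool sized to the thread count, none are.

```python
import queue
import threading


class ToyPool:
    """Hypothetical fixed-size pool, used only to illustrate exhaustion.

    NOT pycassa's ConnectionPool: get() waits up to pool_timeout seconds
    for a free connection and then gives up, like the error in the thread.
    """

    def __init__(self, pool_size, pool_timeout):
        self.pool_timeout = pool_timeout
        self._free = queue.Queue()
        for i in range(pool_size):
            self._free.put("conn-%d" % i)

    def get(self):
        try:
            return self._free.get(timeout=self.pool_timeout)
        except queue.Empty:
            raise RuntimeError("pool limit reached") from None

    def put(self, conn):
        self._free.put(conn)


def count_starved_threads(num_threads, pool_size, pool_timeout=0.1):
    """Return how many threads fail to obtain a connection when all
    threads try to hold one at the same time."""
    pool = ToyPool(pool_size, pool_timeout)
    release = threading.Event()    # set once every thread has settled
    settled = threading.Condition()
    state = {"settled": 0, "starved": 0}

    def worker():
        try:
            conn = pool.get()
        except RuntimeError:
            conn = None
        with settled:
            state["settled"] += 1
            if conn is None:
                state["starved"] += 1
            settled.notify()
        if conn is not None:
            release.wait()         # keep the connection checked out
            pool.put(conn)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    with settled:
        settled.wait_for(lambda: state["settled"] == num_threads)
    release.set()
    for t in threads:
        t.join()
    return state["starved"]


print(count_starved_threads(num_threads=8, pool_size=5))  # 3
print(count_starved_threads(num_threads=8, pool_size=8))  # 0
```

In pycassa, the corresponding change is to pass pool_size when the single shared pool is created. For the 8-thread run, that would be something like pycassa.ConnectionPool('Blast', server_list=[...], pool_size=8).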
On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
> Thanks Tyler.
>
> I just moved the pool and cf, which store the connection pool and CF
> information, to global scope.
>
> I increased the number of server_list entries from 1 to 4. (I think I can
> increase it to at most 12, since I have 12 data nodes.)
>
> When I create 8 threads using the python threading package, I see the
> error below.
>
> Exception in thread Thread-3:
> Traceback (most recent call last):
>   File "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
>     self.run()
>   File "my_cc.py", line 20, in run
>     start_cassandra_client(self.name)
>   File "my_cc.py", line 33, in start_cassandra_client
>     cf.get(key)
>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py", line 652, in get
>     read_consistency_level or self.read_consistency_level)
>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 553, in execute
>     conn = self.get()
>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 536, in get
>     raise NoConnectionAvailable(message)
> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable to obtain connection after 30 seconds
>
>
> Please have a look at the attached script and let me know if I need
> to change something. Please bear with me if I am doing something terribly
> wrong.
>
> I am running the script on an 8-processor node.
>
> thanks
> pradeep
>
> On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> > ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's
> > best to share them across multiple threads. Of course, when you do that,
> > make sure the ConnectionPool is large enough to support all of the
> > threads making queries concurrently.
> > I'm also not sure if you're just omitting this, but pycassa's
> > ConnectionPool will only open connections to servers you explicitly
> > include in server_list; there's no autodiscovery of other nodes going on.
> >
> > Depending on your network latency, you'll top out on python performance
> > with a fairly low number of threads due to the GIL. It's best to use
> > multiple processes if you really want to benchmark something.
> >
> >
> > On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Thanks. I would like to benchmark cassandra with our application so
> >> that we understand the details of how the actual benchmarking is done.
> >> I'm not sure how easy it would be to integrate YCSB with our application.
> >>
> >> So I am trying different client interfaces to cassandra.
> >>
> >> With a 12-data-node Cassandra cluster and 1 client node running 32
> >> threads (each issuing X queries), I found:
> >>
> >> cassandra-cli took 133 seconds
> >> pycassa took 521 seconds
> >>
> >> Here is the python pycassa code used to query, which is passed to each
> >> thread:
> >>
> >> import pycassa
> >>
> >> def start_cassandra_client(Threadname):
> >>     pool = pycassa.ConnectionPool('Blast', server_list=['xxx.xx.xx.xx'])
> >>     cf = pycassa.ColumnFamily(pool, 'Blast_NR')
> >>     inp_file = open("pycassa_100%_query")
> >>     for key in inp_file:
> >>         key = key.strip()
> >>         cf.get(key)
> >>
> >> Do Java clients like Hector/Astyanax help here? I am more
> >> comfortable with Python than Java, and our existing application is also
> >> in Python.
> >>
> >> thanks
> >> pradeep
> >>
> >>
> >> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >> > Wow, you managed to do a load test through the cassandra-cli. There
> >> > should be a merit badge for that.
> >> >
> >> > You should use the built-in stress tool or YCSB.
> >> >
> >> > The CLI has to do much more string conversion than a normal client
> >> > would, and it is not built for performance. You will definitely get
> >> > better numbers through other means.
> >> >
> >> > On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I am trying to maximize the number of read queries executed per second.
> >> >>
> >> >> Here is my cluster configuration:
> >> >>
> >> >> Replication - Default
> >> >> 12 Data Nodes
> >> >> 16 Client Nodes (used for querying)
> >> >>
> >> >> Each client node runs 32 threads, and each thread executes 76896 read
> >> >> queries using the cassandra-cli tool. That is, all the read queries
> >> >> are stored in a file, and a thread passes that file to the
> >> >> cassandra-cli tool (using the -f option).
> >> >> So the total number of queries for 16 client nodes is 16 * 32 * 76896.
> >> >>
> >> >> The read queries on all client nodes are submitted at the same time.
> >> >> The time taken for 16 * 32 * 76896 read queries is nearly 742 seconds,
> >> >> which is nearly 53k transactions/second.
> >> >>
> >> >> I would like to know if there is any other way/tool through which I
> >> >> can improve the number of transactions/second.
> >> >> Is the performance affected by the cassandra-cli tool?
> >> >>
> >> >> thanks
> >> >> pradeep
> >
> >
> > --
> > Tyler Hobbs
> > DataStax
>

--
Tyler Hobbs
DataStax <http://datastax.com/>
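For reference, the throughput figure in the original question can be reproduced with a few lines of Python. The inputs are taken directly from that message: 16 client nodes, 32 threads per node, 76,896 queries per thread, and roughly 742 seconds. The variable names are only illustrative.

```python
# Figures taken from the original question in this thread.
client_nodes = 16
threads_per_node = 32
queries_per_thread = 76896
elapsed_seconds = 742  # "nearly 742 seconds"

total_queries = client_nodes * threads_per_node * queries_per_thread
reads_per_second = total_queries / elapsed_seconds

print(total_queries)            # 39370752
print(round(reads_per_second))  # 53060
```

That works out to about 53,000 reads/second, which matches the "53k transactions/second" stated in the message.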