Hi, Thanks Tyler.

Below is the *global* connection pool I am trying to use. server_list contains the IPs of all 12 DataNodes I am using, pool_size is the number of threads, and I set the timeout to 60 to avoid connection retry errors.

pool = pycassa.ConnectionPool('Blast', server_list=server_list, pool_size=32, timeout=60)

The performance still seems stuck at 521 seconds, versus 177 seconds for cassandra-cli. Am I still missing something?

thanks
Pradeep

On Fri, Jan 18, 2013 at 7:12 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> You just need to increase the ConnectionPool size to handle the number of
> threads you have using it concurrently. Set the pool_size kwarg to at least
> the number of threads you're using.
>
>
> On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha <pradeep...@gmail.com>
> wrote:
>>
>> Thanks Tyler.
>>
>> I just moved the pool and cf which store the connection pool and CF
>> information to have global scope.
>>
>> Increased the server_list values from 1 to 4. ( i think i can increase
>> them max to 12 since I have 12 data nodes )
>>
>> when I created 8 threads using python threading package , I see the
>> below error.
>>
>> Exception in thread Thread-3:
>> Traceback (most recent call last):
>>   File "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
>>     self.run()
>>   File "my_cc.py", line 20, in run
>>     start_cassandra_client(self.name)
>>   File "my_cc.py", line 33, in start_cassandra_client
>>     cf.get(key)
>>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py", line 652, in get
>>     read_consistency_level or self.read_consistency_level)
>>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 553, in execute
>>     conn = self.get()
>>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 536, in get
>>     raise NoConnectionAvailable(message)
>> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable
>> to obtain connection after 30 seconds
>>
>>
>> Please have a look at the script attached.. and let me know if I need
>> to change something.. Please bear with me, if I do something terribly
>> wrong..
>>
>> I am running the script on a 8 processor node.
>>
>> thanks
>> pradeep
>>
>> On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>> > ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's best
>> > to share them across multiple threads. Of course, when you do that, make
>> > sure to make the ConnectionPool large enough to support all of the threads
>> > making queries concurrently. I'm also not sure if you're just omitting
>> > this, but pycassa's ConnectionPool will only open connections to servers you
>> > explicitly include in server_list; there's no autodiscovery of other nodes
>> > going on.
>> >
>> > Depending on your network latency, you'll top out on python performance with
>> > a fairly low number of threads due to the GIL.
>> > It's best to use multiple processes if you really want to benchmark
>> > something.
>> >
>> >
>> > On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha
>> > <pradeep...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks. I would like to benchmark cassandra with our application so
>> >> that we understand the details of how the actual benchmarking is done.
>> >> Not sure, how easy it would be to integrate YCSB with our application.
>> >>
>> >> So, i am trying different client interfaces to cassandra.
>> >>
>> >> I found
>> >>
>> >> for 12 Data Nodes Cassandra cluster and 1 Client Node which run 32
>> >> threads ( each querying X number of queries ).
>> >>
>> >> cassandra-cli took 133 seconds
>> >> pycassa took 521 seconds.
>> >>
>> >> Here is the python pycassa code used to query and passed to each
>> >> thread....
>> >>
>> >> def start_cassandra_client(Threadname):
>> >>     pool = pycassa.ConnectionPool('Blast', server_list=['xxx.xx.xx.xx'])
>> >>     cf = pycassa.ColumnFamily(pool, 'Blast_NR')
>> >>     inp_file=open("pycassa_100%_query")
>> >>     for key in inp_file:
>> >>         key=key.strip()
>> >>         cf.get(key)
>> >>
>> >> Does Java clients like Hector/Astynax help here.. I am more
>> >> comfortable with Python than Java and our existing application is also
>> >> in Python.
>> >>
>> >> thanks
>> >> pradeep
>> >>
>> >>
>> >> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo
>> >> <edlinuxg...@gmail.com> wrote:
>> >> > Wow you managed to do a load test through the cassandra-cli. There
>> >> > should be a merit badge for that.
>> >> >
>> >> > You should use the built in stress tool or YCSB.
>> >> >
>> >> > The CLI has to do much more string conversion then a normal client would
>> >> > and it is not built for performance. You will definitely get better
>> >> > numbers through other means.
>> >> >
>> >> > On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha
>> >> > <pradeep...@gmail.com> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I am trying to maximize execution of the number of read
>> >> >> queries/second.
>> >> >>
>> >> >> Here is my cluster configuration.
>> >> >>
>> >> >> Replication - Default
>> >> >> 12 Data Nodes.
>> >> >> 16 Client Nodes - used for querying.
>> >> >>
>> >> >> Each client node executes 32 threads - each thread executes 76896 read
>> >> >> queries using cassandra-cli tool.
>> >> >> i.e all the read queries are stored in a file and that file is
>> >> >> given to cassandra-cli tool ( using -f option ) which is executed by a
>> >> >> thread.
>> >> >> so, total number of queries for 16 client Nodes is 16 * 32 * 76896.
>> >> >>
>> >> >> The read queries on each client node submitted at the same time. The
>> >> >> time taken for 16 * 32 * 76896 read queries is nearly 742 seconds -
>> >> >> which is nearly 53k transactions/second.
>> >> >>
>> >> >> I would like to know if there is any other way/tool through which I
>> >> >> can improve the number of transactions/second.
>> >> >> Is the performance affected by cassandra-cli tool?
>> >> >>
>> >> >> thanks
>> >> >> pradeep
>> >> >
>> >> >
>> >
>> >
>> > --
>> > Tyler Hobbs
>> > DataStax
>
>
> --
> Tyler Hobbs
> DataStax
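
For reference, here is a minimal sketch of the setup discussed in this thread: one ConnectionPool and ColumnFamily shared by all threads, with pool_size set to the thread count. The keyspace, column family, and keyword arguments are the ones quoted above. The helper names make_column_family and run_benchmark are hypothetical, and the harness accepts any get callable so it can be tried without a cluster. As in the original script, every thread reads the full key list. Given the GIL point raised above, thread-based timings from something like this may understate what the cluster can serve.

```python
import threading
import time


def make_column_family(server_list, n_threads):
    """Build one shared ColumnFamily whose pool has a connection per thread."""
    import pycassa  # imported lazily so the harness below runs without pycassa
    pool = pycassa.ConnectionPool('Blast', server_list=server_list,
                                  pool_size=n_threads, timeout=60)
    return pycassa.ColumnFamily(pool, 'Blast_NR')


def run_benchmark(get, keys, n_threads):
    """Call get(key) for every key from each of n_threads threads.

    Returns (number of calls made, elapsed seconds).
    """
    counts = [0] * n_threads  # one slot per thread, so no lock is needed

    def worker(idx):
        for key in keys:
            get(key)
            counts[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts), time.time() - start


# Hypothetical usage against a live cluster (server_list as defined earlier):
#   cf = make_column_family(server_list, 32)
#   keys = [line.strip() for line in open("pycassa_100%_query")]
#   done, seconds = run_benchmark(cf.get, keys, 32)
```

Passing cf.get rather than building a pool inside each thread keeps the pool global, which is the change suggested earlier in the thread.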