The fact that it's still exactly 521 seconds is very suspicious. I can't debug your script over the mailing list, but do some sanity checks to make sure there's not a bottleneck somewhere you don't expect.
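Here is a minimal sketch of one such sanity check. It assumes nothing about the real cluster. It runs the same workload at several thread counts and checks that wall time actually falls as threads are added. If the total stays pinned (for example at 521 seconds) no matter how many threads run, concurrency is not taking effect and the bottleneck is somewhere else. `fake_get` and `run_with_threads` are hypothetical names. `fake_get` only sleeps for a fixed simulated latency and stands in for `cf.get`.

```python
import threading
import time


def fake_get(key, latency=0.01):
    """Hypothetical stand-in for cf.get(key): simulates one network round trip."""
    time.sleep(latency)  # sleep releases the GIL, as blocking socket I/O does
    return key


def run_with_threads(keys, num_threads, get=fake_get):
    """Split keys across num_threads threads, call get() on each key, return wall time in seconds."""
    # Round-robin partition so every thread gets roughly the same number of keys.
    chunks = [keys[i::num_threads] for i in range(num_threads)]

    def worker(chunk):
        for key in chunk:
            get(key)

    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    keys = ["key%d" % i for i in range(64)]
    for n in (1, 2, 4, 8):
        elapsed = run_with_threads(keys, n)
        print("%d threads: %.2f s, %.0f reads/s" % (n, elapsed, len(keys) / elapsed))
```

You can pass the real `cf.get` as `get=` to turn this into a quick scaling check against the cluster, as long as it uses a single shared pool that is large enough for all the threads. The simulated version scales almost linearly because `time.sleep` releases the GIL. A real client also spends some CPU time in Python on each request, so its curve will flatten out sooner.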
On Fri, Jan 18, 2013 at 12:44 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
> Hi,
>
> Thanks, Tyler.
>
> Below is the *global* connection pool I am trying to use. server_list contains the IPs of all 12 data nodes I am using, pool_size is the number of threads, and I set timeout to 60 to avoid connection retry errors.
>
>     pool = pycassa.ConnectionPool('Blast', server_list=server_list, pool_size=32, timeout=60)
>
> The performance still seems stuck at 521 seconds, versus 177 seconds for cassandra-cli.
>
> Am I still missing something?
>
> thanks
> Pradeep
>
> On Fri, Jan 18, 2013 at 7:12 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> > You just need to increase the ConnectionPool size to handle the number of threads you have using it concurrently. Set the pool_size kwarg to at least the number of threads you're using.
> >
> > On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
> >>
> >> Thanks, Tyler.
> >>
> >> I just moved pool and cf (the connection pool and ColumnFamily objects) to global scope.
> >>
> >> I also increased the number of entries in server_list from 1 to 4 (I think I can go up to 12, since I have 12 data nodes).
> >>
> >> When I created 8 threads using Python's threading package, I saw the error below.
> >>
> >> Exception in thread Thread-3:
> >> Traceback (most recent call last):
> >>   File "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
> >>     self.run()
> >>   File "my_cc.py", line 20, in run
> >>     start_cassandra_client(self.name)
> >>   File "my_cc.py", line 33, in start_cassandra_client
> >>     cf.get(key)
> >>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py", line 652, in get
> >>     read_consistency_level or self.read_consistency_level)
> >>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 553, in execute
> >>     conn = self.get()
> >>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 536, in get
> >>     raise NoConnectionAvailable(message)
> >> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable to obtain connection after 30 seconds
> >>
> >> Please have a look at the attached script and let me know if I need to change something. Please bear with me if I am doing something terribly wrong.
> >>
> >> I am running the script on an 8-processor node.
> >>
> >> thanks
> >> pradeep
> >>
> >> On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> >> > ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's best to share them across multiple threads. Of course, when you do that, make sure the ConnectionPool is large enough to support all of the threads making queries concurrently. I'm also not sure if you're just omitting this, but pycassa's ConnectionPool will only open connections to servers you explicitly include in server_list; there's no autodiscovery of other nodes going on.
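A toy fixed-size pool built on the standard library's queue shows why the pool has to be at least as large as the number of threads querying through it. This is a hypothetical model, not pycassa's implementation. `ToyPool`, `PoolExhausted` and the `wait` parameter are invented for illustration. Once every connection is checked out, `get()` blocks for up to `wait` seconds and then fails. That is the same shape of failure as the `NoConnectionAvailable ... limit of size 5 reached ... after 30 seconds` error quoted above.

```python
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2.7, as used in this thread


class PoolExhausted(Exception):
    """Hypothetical error: no connection was returned within the wait period."""


class ToyPool(object):
    """Hypothetical fixed-size pool: pool_size connections shared by all threads."""

    def __init__(self, pool_size, wait):
        self.pool_size = pool_size
        self.wait = wait
        self._free = queue.Queue()
        for i in range(pool_size):
            self._free.put("conn-%d" % i)  # placeholder connection objects

    def get(self):
        # Block until some thread returns a connection, or give up after `wait` seconds.
        try:
            return self._free.get(timeout=self.wait)
        except queue.Empty:
            raise PoolExhausted(
                "pool limit of size %d reached, unable to obtain connection "
                "after %s seconds" % (self.pool_size, self.wait))

    def put(self, conn):
        self._free.put(conn)


if __name__ == "__main__":
    pool = ToyPool(pool_size=5, wait=0.5)
    held = [pool.get() for _ in range(5)]  # five threads each holding a connection
    try:
        pool.get()  # a sixth concurrent caller waits 0.5 s, then fails
    except PoolExhausted as exc:
        print(exc)
```

If the pool is sized to the thread count (for example `pool_size=32` for 32 threads), there is always a free connection for every thread, so no thread is left waiting on the pool itself.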
> >> >
> >> > Depending on your network latency, you'll top out on Python performance with a fairly low number of threads due to the GIL. It's best to use multiple processes if you really want to benchmark something.
> >> >
> >> > On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Thanks. I would like to benchmark Cassandra with our own application so that we understand the details of how the actual benchmarking is done. I'm not sure how easy it would be to integrate YCSB with our application.
> >> >>
> >> >> So I am trying different client interfaces to Cassandra.
> >> >>
> >> >> For a 12-data-node Cassandra cluster and 1 client node running 32 threads (each executing X queries), I found:
> >> >>
> >> >> cassandra-cli took 133 seconds
> >> >> pycassa took 521 seconds
> >> >>
> >> >> Here is the pycassa code used to run the queries; it is passed to each thread:
> >> >>
> >> >> def start_cassandra_client(Threadname):
> >> >>     pool = pycassa.ConnectionPool('Blast', server_list=['xxx.xx.xx.xx'])
> >> >>     cf = pycassa.ColumnFamily(pool, 'Blast_NR')
> >> >>     inp_file = open("pycassa_100%_query")
> >> >>     for key in inp_file:
> >> >>         key = key.strip()
> >> >>         cf.get(key)
> >> >>
> >> >> Would Java clients like Hector/Astyanax help here? I am more comfortable with Python than Java, and our existing application is also in Python.
> >> >>
> >> >> thanks
> >> >> pradeep
> >> >>
> >> >> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >> >> > Wow, you managed to do a load test through the cassandra-cli. There should be a merit badge for that.
> >> >> >
> >> >> > You should use the built-in stress tool or YCSB.
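Tyler's suggestion to benchmark with multiple processes rather than threads, to get around the GIL, might look roughly like the sketch below. It splits the key file across worker processes, and each process opens its own pycassa pool because each process needs its own connections. Each worker is single-threaded, so `pool_size=1` already meets the "at least one connection per thread" rule. The keyspace, column family and key-file name come from the thread. `SERVER_LIST` keeps the thread's elided placeholder. `split_round_robin`, `query_chunk` and `main` are hypothetical names. The pycassa calls used (`ConnectionPool`, `ColumnFamily`, `get`) are the ones shown in the thread, and the sketch is untested against a real cluster.

```python
import multiprocessing
import time

# Values taken from the thread: keyspace, column family, and query-file format
# (one row key per line). SERVER_LIST keeps the thread's elided placeholder;
# replace it with the data-node addresses, since pycassa only connects to
# servers listed here.
KEYSPACE = "Blast"
COLUMN_FAMILY = "Blast_NR"
SERVER_LIST = ["xxx.xx.xx.xx"]


def split_round_robin(keys, n):
    """Partition keys into n chunks whose sizes differ by at most one."""
    return [keys[i::n] for i in range(n)]


def query_chunk(keys):
    """Run in a worker process: open a pool owned by this process and read each key."""
    import pycassa  # imported lazily so split_round_robin works without pycassa installed

    # One thread per process, so a single pooled connection is enough.
    pool = pycassa.ConnectionPool(KEYSPACE, server_list=SERVER_LIST, pool_size=1)
    cf = pycassa.ColumnFamily(pool, COLUMN_FAMILY)
    for key in keys:
        cf.get(key)
    return len(keys)


def main(path, num_procs):
    with open(path) as f:
        keys = [line.strip() for line in f if line.strip()]
    workers = multiprocessing.Pool(processes=num_procs)
    start = time.time()
    done = sum(workers.map(query_chunk, split_round_robin(keys, num_procs)))
    elapsed = time.time() - start
    workers.close()
    workers.join()
    print("%d reads in %.1f s (%.0f reads/s)" % (done, elapsed, done / elapsed))


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 3:
        print("usage: %s QUERY_FILE NUM_PROCS" % sys.argv[0])
    else:
        main(sys.argv[1], int(sys.argv[2]))
```

On the 8-processor client node mentioned in the thread, you would run it with the query file and `8` as arguments. The measured time includes each worker's connection setup.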
> >> >> >
> >> >> > The CLI has to do much more string conversion than a normal client would, and it is not built for performance. You will definitely get better numbers through other means.
> >> >> >
> >> >> > On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> I am trying to maximize the number of read queries executed per second.
> >> >> >>
> >> >> >> Here is my cluster configuration:
> >> >> >>
> >> >> >> Replication: default
> >> >> >> 12 data nodes
> >> >> >> 16 client nodes, used for querying
> >> >> >>
> >> >> >> Each client node runs 32 threads, and each thread executes 76896 read queries using the cassandra-cli tool. That is, all the read queries are stored in a file, and each thread runs cassandra-cli on that file (using the -f option). So the total number of queries across the 16 client nodes is 16 * 32 * 76896.
> >> >> >>
> >> >> >> The read queries on each client node are submitted at the same time. The 16 * 32 * 76896 read queries take nearly 742 seconds, which is roughly 53k transactions/second.
> >> >> >>
> >> >> >> I would like to know if there is any other way or tool that can increase the number of transactions/second.
> >> >> >> Is the performance affected by the cassandra-cli tool?
> >> >> >>
> >> >> >> thanks
> >> >> >> pradeep
> >> >
> >> > --
> >> > Tyler Hobbs
> >> > DataStax
> >
> > --
> > Tyler Hobbs
> > DataStax
>

--
Tyler Hobbs
DataStax <http://datastax.com/>
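The aggregate figure in the original question can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted in the thread and assumes every thread was busy for the full 742 seconds. `read_throughput` is a hypothetical helper name.

```python
def read_throughput(client_nodes, threads_per_node, reads_per_thread, seconds):
    """Return (total reads, aggregate reads/s, ms per read for a single thread).

    Assumes every thread was busy for the whole run, i.e. all threads start
    together and finish at about the same time.
    """
    total_reads = client_nodes * threads_per_node * reads_per_thread
    reads_per_sec = total_reads / float(seconds)
    ms_per_read_per_thread = 1000.0 * seconds / reads_per_thread
    return total_reads, reads_per_sec, ms_per_read_per_thread


# Figures from the original question: 16 client nodes x 32 threads x 76896 reads in ~742 s.
total, rate, ms = read_throughput(16, 32, 76896, 742)
print("%d reads, %.0f reads/s, %.2f ms per read per thread" % (total, rate, ms))
# -> 39370752 reads, 53060 reads/s, 9.65 ms per read per thread
```

The per-thread figure gives a baseline to compare other clients against, one thread at a time. Those clients include pycassa, the built-in stress tool and YCSB.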