Re: Cassandra Performance Benchmarking.

Tyler Hobbs Fri, 18 Jan 2013 07:12:50 -0800

You just need to increase the ConnectionPool size to handle the number of
threads you have using it concurrently.  Set the pool_size kwarg to at
least the number of threads you're using.
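The traceback in the quoted message (`ConnectionPool limit of size 5 reached`) is what happens when more threads share a pool than it has connections. A toy model may make the sizing rule concrete. This is a minimal sketch, not pycassa's implementation: `ToyPool`, `run_threads`, and the local `NoConnectionAvailable` class are hypothetical stand-ins, a `queue.Queue` plays the role of the pool, and `pool_timeout` mirrors the 30-second wait from the traceback (shortened here so the example runs quickly).

```python
import queue
import threading
import time


class NoConnectionAvailable(Exception):
    """Toy stand-in for pycassa's exception of the same name."""


class ToyPool:
    """Toy fixed-size pool (not pycassa's code): at most pool_size
    connections can be checked out at the same time."""

    def __init__(self, pool_size, pool_timeout):
        self._free = queue.Queue()
        for i in range(pool_size):
            self._free.put("conn-%d" % i)  # placeholder connection objects
        self._pool_timeout = pool_timeout

    def execute(self, hold_seconds):
        # Wait up to pool_timeout seconds for a free connection.
        try:
            conn = self._free.get(timeout=self._pool_timeout)
        except queue.Empty:
            raise NoConnectionAvailable("pool limit reached")
        try:
            time.sleep(hold_seconds)  # simulate a query occupying the connection
        finally:
            self._free.put(conn)  # always check the connection back in


def run_threads(pool_size, num_threads, hold_seconds=0.5, pool_timeout=0.05):
    """Share one ToyPool among num_threads threads; return how many failed."""
    pool = ToyPool(pool_size, pool_timeout)
    failures = []
    lock = threading.Lock()

    def worker():
        try:
            pool.execute(hold_seconds)
        except NoConnectionAvailable:
            with lock:
                failures.append(1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(failures)


if __name__ == "__main__":
    print(run_threads(pool_size=5, num_threads=8))  # some threads time out
    print(run_threads(pool_size=8, num_threads=8))  # 0
```

With a pool of 5 and 8 threads, the threads that cannot check out a connection before the timeout fail, much like Thread-3 in the traceback; with a pool of 8, none do. In pycassa itself, per the advice above, that means passing `pool_size` when creating the shared pool, e.g. `pycassa.ConnectionPool('Blast', server_list=[...], pool_size=8)` for eight threads.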



On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:

> Thanks, Tyler.
>
> I moved the pool and cf variables, which hold the connection pool and
> ColumnFamily objects, to global scope.
>
> I also increased the number of entries in server_list from 1 to 4 (I think
> I can increase it to at most 12, since I have 12 data nodes).
>
> When I created 8 threads using the Python threading package, I saw the
> error below.
>
> Exception in thread Thread-3:
> Traceback (most recent call last):
>   File "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
>     self.run()
>   File "my_cc.py", line 20, in run
>     start_cassandra_client(self.name)
>   File "my_cc.py", line 33, in start_cassandra_client
>     cf.get(key)
>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py", line 652, in get
>     read_consistency_level or self.read_consistency_level)
>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 553, in execute
>     conn = self.get()
>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 536, in get
>     raise NoConnectionAvailable(message)
> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable to obtain connection after 30 seconds
>
>
> Please have a look at the attached script and let me know if I need to
> change anything. Please bear with me if I am doing something terribly
> wrong.
>
> I am running the script on an 8-processor node.
>
> thanks
> pradeep
>
> On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> > ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's best
> > to share them across multiple threads.  Of course, when you do that, make
> > sure the ConnectionPool is large enough to support all of the threads
> > making queries concurrently.  I'm also not sure if you're just omitting
> > this, but pycassa's ConnectionPool will only open connections to servers you
> > explicitly include in server_list; there's no autodiscovery of other nodes
> > going on.
> >
> > Depending on your network latency, you'll top out on Python performance with
> > a fairly low number of threads due to the GIL.  It's best to use multiple
> > processes if you really want to benchmark something.
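Following the advice to use processes rather than threads, one way to structure a Python benchmark is to split the key list across worker processes so each runs its own interpreter (and its own GIL). A minimal sketch using the standard `multiprocessing` module; `split_keys`, `query_chunk`, and `run_in_processes` are hypothetical names, and the actual Cassandra read is stubbed out. In a real run, each worker would presumably create its own client connections inside the child process rather than inheriting them from the parent.

```python
import multiprocessing


def split_keys(keys, num_procs):
    """Deal keys round-robin into num_procs chunks, one per worker process."""
    return [keys[i::num_procs] for i in range(num_procs)]


def query_chunk(chunk):
    """Worker body (stub). A real worker would open its own client
    connections here, inside the child process, and read each key."""
    done = 0
    for key in chunk:
        done += 1  # placeholder for a real read such as cf.get(key)
    return done


def run_in_processes(keys, num_procs):
    """Run query_chunk in num_procs separate processes; return total reads."""
    chunks = split_keys(keys, num_procs)
    with multiprocessing.Pool(processes=num_procs) as pool:
        return sum(pool.map(query_chunk, chunks))


if __name__ == "__main__":
    keys = ["key%d" % i for i in range(1000)]
    print(run_in_processes(keys, num_procs=8))  # 1000
```

Each process can still run a few threads of its own, as long as that process's pool is sized to its thread count.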
> >
> >
> > On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> Thanks. I would like to benchmark Cassandra with our application so
> >> that we understand the details of how the actual benchmarking is done.
> >> I am not sure how easy it would be to integrate YCSB with our application.
> >>
> >> So I am trying different client interfaces to Cassandra.
> >>
> >> Here is what I found for a 12-data-node Cassandra cluster and 1 client
> >> node running 32 threads (each issuing X queries):
> >>
> >> cassandra-cli took 133 seconds
> >> pycassa       took 521 seconds
> >>
> >> Here is the pycassa code that each thread runs to issue its queries:
> >>
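> >> import pycassa
> >>
> >> def start_cassandra_client(thread_name):
> >>     # Opens a new pool and ColumnFamily on every call, i.e. one per thread
> >>     pool = pycassa.ConnectionPool('Blast', server_list=['xxx.xx.xx.xx'])
> >>     cf = pycassa.ColumnFamily(pool, 'Blast_NR')
> >>     # Read one row key per line and fetch that row
> >>     with open("pycassa_100%_query") as inp_file:
> >>         for key in inp_file:
> >>             cf.get(key.strip())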
> >>
> >> Would Java clients like Hector/Astyanax help here? I am more
> >> comfortable with Python than Java, and our existing application is also
> >> in Python.
> >>
> >> thanks
> >> pradeep
> >>
> >>
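To compare clients such as cassandra-cli and pycassa on equal terms, it can help to time the query loop itself and report queries per second rather than total wall time. A minimal sketch (Python 3), assuming the caller passes in the per-key read function, which in the quoted code would be `cf.get`; `time_queries` is a hypothetical helper, and a plain dict stands in for the column family so the example runs without a cluster. Unlike the quoted loop, it also skips blank lines, an assumption about the key file's format.

```python
import time


def time_queries(keys, get):
    """Call get(key) for every non-blank key; return (count, seconds, queries/sec)."""
    count = 0
    start = time.perf_counter()
    for raw in keys:
        key = raw.strip()
        if not key:
            continue  # skip blank lines, e.g. a trailing newline in the key file
        get(key)
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


if __name__ == "__main__":
    store = {"k1": "v1", "k2": "v2"}   # stand-in for the column family
    lines = ["k1\n", "k2\n", "\n"]     # stand-in for the lines of the key file
    n, secs, qps = time_queries(lines, store.get)
    print(n)  # 2
```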
> >> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> >> > Wow, you managed to do a load test through the cassandra-cli. There
> >> > should be a merit badge for that.
> >> >
> >> > You should use the built-in stress tool or YCSB.
> >> >
> >> > The CLI has to do much more string conversion than a normal client
> >> > would, and it is not built for performance. You will definitely get
> >> > better numbers through other means.
> >> >
> >> > On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha <pradeep...@gmail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I am trying to maximize the number of read queries executed per second.
> >> >>
> >> >> Here is my cluster configuration.
> >> >>
> >> >> Replication - Default
> >> >> 12 Data Nodes.
> >> >> 16 Client Nodes - used for querying.
> >> >>
> >> >> Each client node runs 32 threads, and each thread executes 76896 read
> >> >> queries using the cassandra-cli tool. That is, all the read queries are
> >> >> stored in a file, and a thread passes that file to cassandra-cli (using
> >> >> the -f option).
> >> >> So the total number of queries for 16 client nodes is 16 * 32 * 76896.
> >> >>
> >> >> The read queries on each client node are submitted at the same time. The
> >> >> 16 * 32 * 76896 read queries take nearly 742 seconds, which is roughly
> >> >> 53k transactions/second.
> >> >>
> >> >> I would like to know if there is any other way/tool through which I
> >> >> can improve the number of transactions/second.
> >> >> Is the performance affected by cassandra-cli tool?
> >> >>
> >> >> thanks
> >> >> pradeep
> >> >
> >> >
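The aggregate rate quoted above can be checked directly from the reported numbers (16 client nodes, 32 threads each, 76896 queries per thread, 742 seconds). `aggregate_rate` is a hypothetical helper used only for this arithmetic:

```python
def aggregate_rate(client_nodes, threads_per_node, queries_per_thread, seconds):
    """Return the total number of queries and the average queries/second."""
    total = client_nodes * threads_per_node * queries_per_thread
    return total, total / seconds


total, rate = aggregate_rate(16, 32, 76896, 742)
print(total)        # 39370752
print(round(rate))  # 53060
```

That is about 39.4 million reads at roughly 53,060 per second, consistent with the "nearly 53k" figure.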
> >
> >
> >
> >
> > --
> > Tyler Hobbs
> > DataStax
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>
