Hi,

Thanks Tyler.

Below is the *global* connection pool I am trying to use, where
server_list contains the IPs of all 12 DataNodes, pool_size is set to
the number of threads, and timeout is set to 60 to avoid connection
retry errors.

pool = pycassa.ConnectionPool('Blast', server_list=server_list,
                              pool_size=32, timeout=60)
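
For reference, here is a minimal sketch of the full pattern this implies: one
global ConnectionPool and ColumnFamily shared by every thread, with pool_size
at least as large as the thread count. The placeholder IPs, the thread count
of 32, and the query file name are assumptions drawn from the earlier
messages, not a verified configuration.

import threading

import pycassa

# One global pool and ColumnFamily, shared by every thread.
# pool_size should be at least the number of threads using the pool concurrently.
server_list = ['10.0.0.1', '10.0.0.2']   # placeholder: all 12 DataNode IPs go here
pool = pycassa.ConnectionPool('Blast', server_list=server_list,
                              pool_size=32, timeout=60)
cf = pycassa.ColumnFamily(pool, 'Blast_NR')

def start_cassandra_client(query_file):
    # Reuse the shared pool/CF instead of building a new pool per thread.
    with open(query_file) as inp_file:
        for key in inp_file:
            cf.get(key.strip())

threads = [threading.Thread(target=start_cassandra_client,
                            args=('pycassa_100%_query',))
           for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()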


It seems the performance is still stuck at 521 seconds, compared to 177
seconds for cassandra-cli.

Am I still missing something?

thanks
Pradeep



On Fri, Jan 18, 2013 at 7:12 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> You just need to increase the ConnectionPool size to handle the number of
> threads you have using it concurrently.  Set the pool_size kwarg to at least
> the number of threads you're using.
>
>
> On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha <pradeep...@gmail.com>
> wrote:
>>
>> Thanks Tyler.
>>
>> I just moved the pool and cf objects (the connection pool and column
>> family) to global scope.
>>
>> Increased the number of servers in server_list from 1 to 4 (I think I
>> can increase it to a maximum of 12, since I have 12 data nodes).
>>
>> When I created 8 threads using the Python threading package, I saw
>> the error below.
>>
>> Exception in thread Thread-3:
>> Traceback (most recent call last):
>>   File
>> "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py",
>> line 530, in __bootstrap_inner
>>     self.run()
>>   File "my_cc.py", line 20, in run
>>     start_cassandra_client(self.name)
>>   File "my_cc.py", line 33, in start_cassandra_client
>>     cf.get(key)
>>   File
>> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py",
>> line 652, in get
>>     read_consistency_level or self.read_consistency_level)
>>   File
>> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
>> line 553, in execute
>>     conn = self.get()
>>   File
>> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
>> line 536, in get
>>     raise NoConnectionAvailable(message)
>> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable
>> to obtain connection after 30 seconds
>>
>>
>> Please have a look at the attached script and let me know if I need
>> to change something. Please bear with me if I am doing something
>> terribly wrong.
>>
>> I am running the script on an 8-processor node.
>>
>> thanks
>> pradeep
>>
>> On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>> > ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's
>> > best
>> > to share them across multiple threads.  Of course, when you do that,
>> > make
>> > sure to make the ConnectionPool large enough to support all of the
>> > threads
>> > making queries concurrently.  I'm also not sure if you're just omitting
>> > this, but pycassa's ConnectionPool will only open connections to servers
>> > you
>> > explicitly include in server_list; there's no autodiscovery of other
>> > nodes
>> > going on.
>> >
>> > Depending on your network latency, you'll top out on python performance
>> > with
>> > a fairly low number of threads due to the GIL.  It's best to use
>> > multiple
>> > processes if you really want to benchmark something.
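
As a rough illustration of that multiprocessing suggestion, here is a minimal
sketch in which each worker process opens its own ConnectionPool and works
through its own slice of the key file. The process count, placeholder server
IPs, and key-chunking are illustrative assumptions, not part of the original
script:

import multiprocessing

import pycassa

# Placeholder: the 12 DataNode IPs would go here.
SERVER_LIST = ['10.0.0.1', '10.0.0.2']

def worker(keys):
    # Each process opens its own pool; connections are never shared
    # across process boundaries.
    pool = pycassa.ConnectionPool('Blast', server_list=SERVER_LIST,
                                  pool_size=4, timeout=60)
    cf = pycassa.ColumnFamily(pool, 'Blast_NR')
    for key in keys:
        cf.get(key)
    pool.dispose()

if __name__ == '__main__':
    with open('pycassa_100%_query') as f:
        keys = [line.strip() for line in f]
    num_procs = 8
    # Split the keys round-robin across the worker processes.
    chunks = [keys[i::num_procs] for i in range(num_procs)]
    procs = [multiprocessing.Process(target=worker, args=(chunk,))
             for chunk in chunks]
    for p in procs:
        p.start()
    for p in procs:
        p.join()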
>> >
>> >
>> > On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha
>> > <pradeep...@gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks. I would like to benchmark Cassandra with our application so
>> >> that we understand the details of how the actual benchmarking is done.
>> >> I am not sure how easy it would be to integrate YCSB with our application.
>> >>
>> >> So, I am trying different client interfaces to Cassandra.
>> >>
>> >> For a 12-DataNode Cassandra cluster and 1 client node running 32
>> >> threads (each issuing X queries), I found:
>> >>
>> >> cassandra-cli     took 133 seconds
>> >> pycassa took 521 seconds.
>> >>
>> >> Here is the Python pycassa code used to query, which is run by each
>> >> thread:
>> >>
>> >> def start_cassandra_client(Threadname):
>> >>     pool = pycassa.ConnectionPool('Blast', server_list=['xxx.xx.xx.xx'])
>> >>     cf = pycassa.ColumnFamily(pool, 'Blast_NR')
>> >>     inp_file = open("pycassa_100%_query")
>> >>     for key in inp_file:
>> >>         key = key.strip()
>> >>         cf.get(key)
>> >>
>> >> Would Java clients like Hector/Astyanax help here? I am more
>> >> comfortable with Python than Java, and our existing application is
>> >> also in Python.
>> >>
>> >> thanks
>> >> pradeep
>> >>
>> >>
>> >> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo
>> >> <edlinuxg...@gmail.com>
>> >> wrote:
>> >> > Wow you managed to do a load test through the cassandra-cli. There
>> >> > should be
>> >> > a merit badge for that.
>> >> >
>> >> > You should use the built in stress tool or YCSB.
>> >> >
>> >> > The CLI has to do much more string conversion than a normal client
>> >> > would, and it is not built for performance. You will definitely get
>> >> > better numbers through other means.
>> >> >
>> >> > On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha
>> >> > <pradeep...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> I am trying to maximize the number of read queries executed per
>> >> >> second.
>> >> >>
>> >> >> Here is my cluster configuration.
>> >> >>
>> >> >> Replication - Default
>> >> >> 12 Data Nodes.
>> >> >> 16 Client Nodes - used for querying.
>> >> >>
>> >> >> Each client node runs 32 threads, and each thread executes 76896 read
>> >> >> queries using the cassandra-cli tool, i.e. all the read queries are
>> >> >> stored in a file that is given to cassandra-cli (using the -f option)
>> >> >> and executed by a thread. So the total number of queries for 16 client
>> >> >> nodes is 16 * 32 * 76896.
>> >> >>
>> >> >> The read queries on each client node are submitted at the same time.
>> >> >> The time taken for 16 * 32 * 76896 read queries is nearly 742 seconds,
>> >> >> which is nearly 53k transactions/second.
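
(As a quick sanity check on those figures, the short arithmetic below
reproduces the roughly 53k reads/second; the numbers are taken from the
paragraph above.)

# Throughput implied by the figures above.
total_queries = 16 * 32 * 76896          # client nodes * threads * queries per thread
elapsed_seconds = 742.0
print(total_queries)                     # 39370752
print(total_queries / elapsed_seconds)   # ~53060 reads/second, i.e. roughly 53k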
>> >> >>
>> >> >> I would like to know if there is any other way/tool through which I
>> >> >> can improve the number of transactions/second.
>> >> >> Is the performance affected by the cassandra-cli tool?
>> >> >>
>> >> >> thanks
>> >> >> pradeep
>> >> >
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > Tyler Hobbs
>> > DataStax
>
>
>
>
> --
> Tyler Hobbs
> DataStax
