It does not have a C extension as far as I know.

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Wednesday, October 27, 2010 5:01 PM
To: user
Subject: Re: cassandra + avro | python client vs java client

Does Avro have a Python C extension yet? If not, 10x is right in line
with how much faster I would expect Java to be than pure Python.

On Wed, Oct 27, 2010 at 11:59 AM, Koert Kuipers <koert.kuip...@diamondnotch.com> wrote:
> Hey all,
>
> I have Cassandra 0.7 (a nightly build from mid-September) running on one
> test machine with the Avro interface. The node holds about 16 million
> values across 10k keys.
>
> As a simple test I ran two queries from a client: one that asks for all
> columns for 100 keys, and one that asks for all columns for a single key
> (which I know to have a lot of columns). I am not using any buffering
> for columns. I ran the tests multiple times to make sure file caching on
> the server would not skew the comparison.
>
> Using a Java client the results are:
>
> *** test1 ***
> running test get_range_slices
> 2.672 seconds.
> 100 keys
> 81849 total columns
>
> *** test2 ***
> running test multiget_slice
> 1.0 seconds.
> 1 keys
> 36626 total columns
>
> That's pretty impressive to me. I also later confirmed that with multiple
> nodes the query across multiple keys is much faster. Using a client pool
> would probably speed it up further.
>
> Then I ran a Python client. The results are:
>
> *** test1 ***
> client:rpc get_range_slices
> client:rpc call took 30.6 seconds
> 100 keys
> 81849 total columns
>
> *** test2 ***
> client:rpc multiget_slice
> client:rpc call took 13.9 seconds
> 1 keys
> 36626 total columns
>
> So the Python client took about 11.5 times as long on the first query and
> 13.9 times as long on the second. That is a big difference! I suspect the
> Avro deserialization is causing the slowdown (since the rpc call consists
> of contacting the server, retrieving the results, and deserializing them).
> Has anyone seen a similar performance difference? This would mean that for
> a production system Python Avro is not acceptable to me at the moment....
>
> Both clients use only the Avro library.
>
> Best, Koert

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
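
One way to test the suspicion that deserialization dominates the Python timings is to time the network fetch and the decode step separately. The sketch below is only an illustration: `timed`, `slowdown`, and `fake_decode` are hypothetical helpers, not part of the Avro or Cassandra client APIs, and `fake_decode` is a stand-in for whatever decode call the real client makes. It also recomputes the slowdown factors from the timings quoted in the thread.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def slowdown(python_seconds, java_seconds):
    """How many times longer the Python run took than the Java run."""
    return python_seconds / java_seconds


def fake_decode(payload):
    # Stand-in for the real decode step (assumption, not the avro API).
    return payload.split(b",")


# Slowdown factors from the timings reported in the thread (seconds).
print("get_range_slices: %.2fx" % slowdown(30.6, 2.672))
print("multiget_slice:   %.2fx" % slowdown(13.9, 1.0))

# Time only the decode step on bytes that were already fetched.
raw = b",".join([b"column"] * 1000)
columns, decode_seconds = timed(fake_decode, raw)
print("decoded %d columns in %.6f seconds" % (len(columns), decode_seconds))
```

If the decode time measured this way accounts for most of the 30.6 and 13.9 seconds, the bottleneck is the pure-Python deserializer rather than the server or the network.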