Then you should use Thrift from Python if you are concerned about speed. (I think the speed penalty there is only about 2x w/ the extension.)
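
As a quick sanity check on the figures quoted below, here is a minimal Python sketch that recomputes the slowdown from the reported timings. It also shows one way to time a single client call. The helper names `timed` and `slowdown` are hypothetical and are not part of any Cassandra, Avro, or Thrift client; the timings are copied from the quoted message.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds).

    Hypothetical helper: wrap whatever client call you want to measure.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def slowdown(slow_seconds, fast_seconds):
    """Return how many times longer the slow run took than the fast run."""
    return slow_seconds / fast_seconds


# Timings (in seconds) as reported in the quoted message.
reported = {
    "get_range_slices": {"java": 2.672, "python": 30.6},
    "multiget_slice": {"java": 1.0, "python": 13.9},
}

ratios = {
    name: slowdown(t["python"], t["java"]) for name, t in reported.items()
}

for name, ratio in ratios.items():
    print(f"{name}: python took {ratio:.1f}x as long as java")
```

Wrapping the RPC call and the deserialization step separately with `timed` would be one way to check whether deserialization really dominates, assuming the client exposes those steps separately.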

On Wed, Oct 27, 2010 at 4:15 PM, Koert Kuipers <koert.kuip...@diamondnotch.com> wrote:
> It does not have a c extension as far as I know
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Wednesday, October 27, 2010 5:01 PM
> To: user
> Subject: Re: cassandra + avro | python client vs java client
>
> Does Avro have a Python C extension yet?
>
> If not, 10x is right in line with how much faster I would expect Java
> to be than pure Python.
>
> On Wed, Oct 27, 2010 at 11:59 AM, Koert Kuipers
> <koert.kuip...@diamondnotch.com> wrote:
>> Hey all,
>>
>> I have Cassandra 0.7 (nightly build from halfway September) running on one
>> test machine with the avro interface. The node holds about 16mm values
>> across 10k keys.
>>
>> As a simple test I ran 2 test queries from a client, one query where I ask
>> for all columns for 100 keys and one query where I ask all columns for one
>> key (which I know to have a lot of columns). I am not using any buffering
>> for columns. I ran the tests multiple times to make sure file caching on
>> server wouldn't mess up the comparison.
>>
>> Using a java client the results are:
>>
>> *** test1 ***
>> running test get_range_slices
>> 2.672 seconds.
>> 100 keys
>> 81849 total columns
>>
>> *** test2 ***
>> running test multiget_slice
>> 1.0 seconds.
>> 1 keys
>> 36626 total columns
>>
>> That's pretty impressive to me. I also later confirmed that with multiple
>> nodes the query across multiple keys is much faster. Also using a clientpool
>> would probably speed it up more too.
>>
>> Then I ran a python client. The results are:
>>
>> *** test1 ***
>> client:rpc get_range_slices
>> client:rpc call took 30.6 seconds
>> 100 keys
>> 81849 total columns
>>
>> *** test2 ***
>> client:rpc multiget_slice
>> client:rpc call took 13.9 seconds
>> 1 keys
>> 36626 total columns
>>
>> So the python client took 11.4 times as long with the first query and 13.9
>> times as long with the second query. That is a big difference! I suspect the
>> avro deserialization is causing the slowdown (since the rpc call consists of
>> contacting the server, retrieving results and deserializing results). Has
>> anyone seen a similar performance difference? This would mean that for a
>> production system python avro is not acceptable to me at the moment....
>>
>> Both client use only the avro library.
>>
>> Best, Koert
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com