If you are using Python and raw Thrift, use the following:

protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
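A minimal sketch of wiring this into a raw Thrift connection against a Cassandra 0.6 node might look like the following. The host, port, and the import path of the generated cassandra bindings are assumptions; TBinaryProtocolAccelerated falls back to the pure-Python protocol if the compiled fastbinary extension is not installed.

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra  # generated Thrift bindings shipped with Cassandra 0.6 (assumed path)

    # Plain socket wrapped in a buffered transport (0.6 uses an unframed transport by default).
    socket = TSocket.TSocket("localhost", 9160)
    transport = TTransport.TBufferedTransport(socket)

    # The accelerated protocol does its encoding/decoding in the fastbinary C extension.
    protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)

    client = Cassandra.Client(protocol)
    transport.open()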
The serialization/deserialization is done directly in C.

On Wed, Oct 20, 2010 at 11:53 AM, Wayne <wav...@gmail.com> wrote:

> We did some testing and the object is 23 megs that is taking more than 3 seconds for thrift to return as a python object. We also tested pickling this object to/from a string: to pickle it takes 1.5s, and to convert the pickled string back to a python object takes .75s. Added together they still take less than the 3 seconds Thrift is taking to create a python object. I think our 1s before also was an actual deep copy.
>
> We are definitely going to a streaming model and getting small batches of data at a time per the recommendation. The bigger concern, though, of why thrift takes more time than Cassandra itself is still out there. Thrift is taking too much time to convert to a python object and there is no explanation we can find for why it takes so long. We have also tested with smaller and larger data requests and they all seem to have the same math - thrift takes a little more time to convert than Cassandra itself takes to respond. Is this specific to Python accessing thrift? Would it be faster to get the data into C and write our own python wrapper around C?
>
> On Tue, Oct 19, 2010 at 7:16 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
>
>> Not sure how pycassa does it, but it's a simple case of...
>>
>> - get_slice with start="", finish="" and count = 100,001
>> - pop the last column and store its name
>> - get_slice with start as the last column name, finish="" and count = 100,001
>>
>> repeat.
>>
>> A
>>
>> On 20 Oct, 2010, at 03:08 PM, Wayne <wav...@gmail.com> wrote:
>>
>> Thanks for all of the feedback. I may very well not be doing a deep copy, so my numbers might not be accurate. I will test with writing to/from the disk to verify how long native python takes. I will also check how large the data coming from cassandra is, for comparison.
>>
>> Our high expectations are based on actual MySQL time, which is in the range of 3-4 seconds for the exact same data.
>>
>> I will also try to work with getting the data in batches. Not as easy of course in Cassandra, which is probably why we have not tried that yet.
>>
>> Thanks for all of the feedback!
>>
>> On Tue, Oct 19, 2010 at 8:51 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>
>>> Hard to say why your code performs that way; it may not be creating as many objects, for example strings may not be re-created, just referenced. Are you creating new objects for every column returned?
>>>
>>> Bringing 600,000 to 10M columns back at once is always going to take time. I think any python database client would take a while to create objects for 600,000 rows. Do you have an example of pulling 600,000 rows through MySQL into python to compare against?
>>>
>>> Is it possible to break up the get_slice into chunks of 10,000 or 100,000? IMHO you will get more consistent performance if you bound the requests, so you have an idea of the upper level of latency for each request and create a more consistent memory footprint.
>>>
>>> For example, in the rough test below, 100,000 objects takes 0.75 secs but 600,000 takes 13.
>>>
>>> As an example of reprocessing the results, I called go2 with the output of go below.
>>>
>>> def go2(buffer):
>>>     start = time.time()
>>>     buffer2 = [
>>>         {"name" : csc.column.name, "value" : csc.column.value}
>>>         for csc in buffer
>>>     ]
>>>     print "Done2 in %s" % (time.time() - start)
>>>
>>> {977} > python decode_test.py 100000
>>> Done in 0.75460100174
>>> Done2 in 0.314303874969
>>>
>>> {978} > python decode_test.py 600000
>>> Done in 13.2945489883
>>> Done2 in 7.32861185074
>>>
>>> My general advice is to pull back less data in a single request.
>>>
>>> Aaron
>>>
>>> On 20 Oct, 2010, at 11:30 AM, Wayne <wav...@gmail.com> wrote:
>>>
>>> I am not sure how many bytes, but we do convert the cassandra object that is returned in 3s into a dictionary in ~1s, and then again into a custom python object in about 1.5s. Expectations are based on this timing. If we can convert what thrift returns into a completely new python object in 1s, why does thrift need 3s to give it to us?
>>>
>>> To us it is like the MySQL client we use in python. It is really C wrapped in python and adds almost zero overhead to the time it takes mysql to return the data. That is the expectation we have and the performance we are looking to get to. Disk I/O + 20%.
>>>
>>> We are returning one big row, and this is not our normal use case but a requirement for us to use Cassandra. We need to get all data for a specific value, as this is a secondary index. It is like getting all users in the state of CA: CA is the key and there is a column for every user id. We are testing with 600,000 but this will grow to 10+ million in the future.
>>>
>>> We cannot test .7 as we are only using .6.6. We are trying to evaluate Cassandra and stability is one concern, so .7 is definitely not for us at this point.
>>>
>>> Thanks.
>>>
>>> On Tue, Oct 19, 2010 at 4:27 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>>
>>>> Just wondering how many bytes you are returning to the client, to get an idea of how slow it is.
>>>>
>>>> The call to fastbinary is decoding the wire format and creating the Python objects. When you ask for 600,000 columns you are creating a lot of python objects. Each column will be a ColumnOrSuperColumn, wrapping a Column, which has probably 2 Strings. So 2.4 million python objects.
>>>>
>>>> Here's my rough test script.
>>>>
>>>> def go(count):
>>>>     start = time.time()
>>>>     buffer = [
>>>>         ttypes.ColumnOrSuperColumn(column=ttypes.Column(
>>>>             "column_name_%s" % i, "row_size of something something", 0, 0))
>>>>         for i in range(count)
>>>>     ]
>>>>     print "Done in %s" % (time.time() - start)
>>>>
>>>> On my machine that takes 13 seconds for 600,000 and 0.04 for 10,000. The fastbinary module is running a lot faster because it's all in C. It's not a great test, but I think it gives an idea of what you are asking for.
>>>>
>>>> I think there is an element of python being slower than other languages. But IMHO you are asking for a lot of data. Can you ask for less data?
>>>>
>>>> Out of interest, are you able to try the avro client? It's still experimental (0.7 only) but may give you something to compare it against.
>>>>
>>>> Aaron
>>>>
>>>> On 20 Oct, 2010, at 07:23 AM, Wayne <wav...@gmail.com> wrote:
>>>>
>>>> It is an entire row which is 600,000 cols. We pass a limit of 10 million to make sure we get it all. Our issue is that it seems Thrift itself adds more overhead/latency to a read than Cassandra takes to do the read itself.
>>>> If cfstats for the slowest node reports 2.25s, to us it is not acceptable that the data comes back to the client in 5.5s. After working with Jonathan we have optimized Cassandra itself to return the quorum read in 2.7s, but we still have 3s getting lost in the thrift call (fastbinary.decode_binary).
>>>>
>>>> We have seen this pattern totally hold for ms reads as well for a few cols, but it is easier to look at things in seconds. If Cassandra can get the data off of the disks in 2.25s we expect to have the data in a Python object in under 3s. That is a totally realistic expectation from our experience. All latency needs to be pushed down to disk random read latency, as that should always be what takes the longest. Everything else is passing through memory.
>>>>
>>>> On Tue, Oct 19, 2010 at 2:06 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>>>
>>>>> Wayne,
>>>>> I'm calling cassandra from Python and have not seen too many 3 second reads.
>>>>>
>>>>> Your last email with log messages in it looks like you are asking for 10,000,000 columns. How much data is this request actually transferring to the client? The column names suggest only a few.
>>>>>
>>>>> DEBUG [pool-1-thread-64] 2010-10-18 19:25:28,867 StorageProxy.java (line 471) strongread reading data for SliceFromReadCommand(table='table', key='key1', column_parent='QueryPath(columnFamilyName='fact', superColumnName='null', columnName='null')', start='503a', finish='503a7c', reversed=false, count=10000000) from 698@/x.x.x.6
>>>>>
>>>>> Aaron
>>>>>
>>>>> On 20 Oct 2010, at 06:18, Jonathan Ellis wrote:
>>>>>
>>>>> > I would expect C++ or Java to be substantially faster than Python.
>>>>> > However, I note that Hector (and I believe Pelops) don't yet use the
>>>>> > newest, fastest Thrift library.
>>>>> >
>>>>> > On Tue, Oct 19, 2010 at 8:21 AM, Wayne <wav...@gmail.com> wrote:
>>>>> >> The changes seem to do the trick. We are down to about 1/2 of the original
>>>>> >> quorum read performance. I did not see any more errors.
>>>>> >>
>>>>> >> More than 3 seconds on the client side is still not acceptable to us. We
>>>>> >> need the data in Python, but would we be better off going through Java or
>>>>> >> something else to increase performance? All three seconds are taken up in
>>>>> >> Thrift itself (fastbinary.decode_binary(self, iprot.trans, (self.__class__,
>>>>> >> self.thrift_spec))) so I am not sure what other options we have.
>>>>> >>
>>>>> >> Thanks for your help.
>>>>> >>
>>>>> >
>>>>> > --
>>>>> > Jonathan Ellis
>>>>> > Project Chair, Apache Cassandra
>>>>> > co-founder of Riptano, the source for professional Cassandra support
>>>>> > http://riptano.com
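Following Aaron's paging recipe earlier in the thread (bounded get_slice calls that resume from the last column name seen), a rough sketch against the 0.6-style Thrift API might look like the code below. It assumes the 0.6 get_slice signature that takes the keyspace on every call, the generated cassandra.ttypes bindings, and a client built as at the top of the thread; the keyspace name and page size are illustrative, and the "CA"/'fact' values only echo the example discussed above.

    from cassandra.ttypes import (ColumnParent, ConsistencyLevel,
                                  SlicePredicate, SliceRange)

    def iter_row_columns(client, keyspace, row_key, column_family, page_size=100000):
        # Yield every column of one wide row in bounded pages instead of a
        # single get_slice with a 10 million column limit.
        parent = ColumnParent(column_family=column_family)
        start = ""
        while True:
            predicate = SlicePredicate(slice_range=SliceRange(
                start=start, finish="", reversed=False, count=page_size))
            page = client.get_slice(keyspace, row_key, parent, predicate,
                                    ConsistencyLevel.QUORUM)
            last_page = len(page) < page_size   # fewer than asked for: row is exhausted
            if start:
                page = page[1:]  # first column repeats the last one of the previous page
            for csc in page:
                yield csc.column
            if last_page or not page:
                return
            start = page[-1].column.name        # resume from the last column seen

    # e.g. stream all columns of the wide "CA" row from the 'fact' column family:
    # for column in iter_row_columns(client, "Keyspace1", "CA", "fact"):
    #     ...

Each request is then bounded to page_size columns, which keeps the fastbinary decode and the Python object creation per call at a predictable size and memory footprint, per the advice above.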