Hi,

We did some testing and found that range queries are much quicker than fetching the same data with regular key lookups. I am guessing that a range query seeks much more efficiently on disk.

This is where the idea of sorting our tokens comes in. We have a batch request of, say, 1000 items, and instead of doing a multiget from Cassandra, which involves a lot of random I/O seeks, we would like a way to read the covering range in one go. It doesn't actually matter if the range is slightly bigger than the number of items we want to retrieve, because the time we lose filtering out unneeded items in code is less than the time a multiget for 1000 items takes in the first place.
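
To make that concrete, here is a rough sketch of what I have in mind, assuming RandomPartitioner (where the token is the MD5 hash of the key taken as a positive BigInteger); the key names are made up. It sorts the batch keys by token, which gives the bounds for a single range read, and anything in that range we didn't ask for gets filtered out in code:

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Rough sketch: sort a batch of row keys by partitioner token so that one
// token-range read can replace a 1000-key multiget. Assumes RandomPartitioner
// (token = abs(MD5) of the key); with Murmur3Partitioner you would swap in
// that hash instead. Key names are made up.
public class TokenSortSketch {

    static BigInteger token(byte[] key) throws Exception {
        return new BigInteger(MessageDigest.getInstance("MD5").digest(key)).abs();
    }

    public static void main(String[] args) throws Exception {
        List<String> batch = new ArrayList<String>();
        batch.add("item-17");
        batch.add("item-4");
        batch.add("item-912");

        // token -> key, kept in token order
        TreeMap<BigInteger, String> byToken = new TreeMap<BigInteger, String>();
        for (String key : batch) {
            byToken.put(token(key.getBytes("UTF-8")), key);
        }

        BigInteger startToken = byToken.firstKey();
        BigInteger endToken = byToken.lastKey();

        // A single range read over [startToken, endToken] would replace the
        // multiget; rows that fall in the range but were not requested are
        // dropped client-side:
        Set<String> wanted = new HashSet<String>(batch);
        System.out.println("range " + startToken + " .. " + endToken
                + ", keys in token order: " + byToken.values());
        // for each row returned by the range read:
        //     if (!wanted.contains(row.getKey())) skip it;
    }
}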

Is there a way to base token ranges on a certain value in our schema? Say every row has a value A and a value B. A is just a random identifier and we can't really rely on what it will be, but all our queries are shaped so that B is the same for every item in the query. If the tokens were still random, but generated from the B value, then all items with the same B would sit close together in the token range, optimized for range queries rather than individual gets. That could possibly speed up read performance significantly.
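
One way I could imagine expressing that, purely as a sketch, is to prefix the row key with B. That only gives the contiguity we want if the partitioner preserves key order (e.g. ByteOrderedPartitioner, with its usual hot-spot caveats); the key format and names below are made up:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Rough sketch of the "token range based on B" idea: build every row key as
// "B:A". Under an order-preserving partitioner such as ByteOrderedPartitioner
// the token order follows the key bytes, so all items sharing a B value form
// one contiguous key range that a single range query can cover. (Ordered
// partitioners are known to create hot spots, so this is only an illustration
// of the idea, not a recommendation.)
public class KeyByB {

    static String rowKey(String b, String a) {
        return b + ":" + a;
    }

    public static void main(String[] args) {
        String b = "customer-42";                   // same B for the whole batch
        List<String> as = new ArrayList<String>();  // the A identifiers in the batch
        as.add("a-0007");
        as.add("a-1093");
        as.add("a-0421");

        List<String> keys = new ArrayList<String>();
        for (String a : as) {
            keys.add(rowKey(b, a));
        }
        Collections.sort(keys); // contiguous under byte-ordered tokens

        // One range query over [first key, last key] then covers the whole
        // batch; rows in between that we didn't ask for are filtered out in
        // code, as above.
        System.out.println("range: " + keys.get(0) + " .. " + keys.get(keys.size() - 1));
    }
}

With a hash partitioner the prefix wouldn't help, of course, since the token is a hash of the whole key.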

Thanks!

Artur

On 21/10/13 16:58, Edward Capriolo wrote:
I am not sure that what you are working on will have an effect. You cannot actually control the way the operating system seeks data on disk; the I/O scheduling is done outside Cassandra. You can try to write the code in an optimistic way, taking the physical hardware into account, but then you have to consider that there are n concurrent requests on the I/O system.

On Friday, October 18, 2013, Viktor Jevdokimov <viktor.jevdoki...@adform.com> wrote:
> Read latency depends on many factors; don't forget "physics".
> If it meets your requirements, it is good.
>
>
> -----Original Message-----
> From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com]
> Sent: Friday, October 18, 2013 1:03 PM
> To: user@cassandra.apache.org
> Subject: Re: Sorting keys for batch reads to minimize seeks
>
> Hi,
>
> Thanks for your reply. Our read latency is currently 23.618 ms. However, I just read that off one node while it wasn't under a load test, so I will be able to get a better number after the next test run.
>
> What is a good value for read latency?
>
>
> On 18/10/13 08:31, Viktor Jevdokimov wrote:
>> The only thing you may win is avoiding unnecessary network hops, if:
>> - you request keys sorted by token from the appropriate replica with ConsistencyLevel.ONE and "dynamic_snitch: false";
>> - nodes have the same load;
>> - the replica is not doing GC (GC pauses are much longer than internode communication).
>>
>> For a multiple-key request C* will do multiple single-key reads, except for range scan requests, where only the starting key and batch size are used in the request.
>>
>> Consider a multiple-key request slow by design; try to model your data for low-latency single-key requests.
>>
>> So, what latencies do you want to achieve?
>>
>>
>>
>> Best regards / Pagarbiai
>>
>> Viktor Jevdokimov
>> Senior Developer
>>
>> Email: viktor.jevdoki...@adform.com
>> Phone: +370 5 212 3063
>> Fax: +370 5 261 0453
>>
>> J. Jasinskio 16C,
>> LT-03163 Vilnius,
>> Lithuania
>>
>>
>>
>> -----Original Message-----
>> From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com]
>> Sent: Thursday, October 17, 2013 7:40 PM
>> To: user@cassandra.apache.org
>> Subject: Sorting keys for batch reads to minimize seeks
>>
>> Hi,
>>
>> I am looking to increase read performance on Cassandra. We are still playing with configuration, but I was wondering whether there are things we could do in software to speed up our reads.
>>
>> E.g. one idea, not sure how sane it is, was to sort read batches by row key before submitting them to Cassandra. The idea is that the row keys should then be closer together on the physical disk, which may minimize the number of random seeks when querying, say, 1000 entries. Does that make any sense?
>>
>> Is there anything else we can do in software to improve performance, like specific batch sizes for reads? We are using the astyanax library to access Cassandra.
>>
>> Thanks!
>>
>>
>
>
