Re: Sorting keys for batch reads to minimize seeks

Edward Capriolo Mon, 21 Oct 2013 09:00:24 -0700

I am not sure that what you are working on will have an effect. You cannot
actually control the way the operating system seeks data on disk. The I/O
scheduling is done outside Cassandra. You can try to write the code in an
optimistic way, taking physical hardware into account, but then you have to
consider that there are n concurrent requests on the I/O system.


On Friday, October 18, 2013, Viktor Jevdokimov <[email protected]>
wrote:
> Read latency depends on many factors, don't forget "physics".
> If it meets your requirements, it is good.
>
>
> -----Original Message-----
> From: Artur Kronenberg [mailto:[email protected]]
> Sent: Friday, October 18, 2013 1:03 PM
> To: [email protected]
> Subject: Re: Sorting keys for batch reads to minimize seeks
>
> Hi,
>
> Thanks for your reply. Our latency currently is 23.618 ms. However, I
> simply read that off one node just now while it wasn't under a load test. I
> am going to be able to get a better number after the next test run.
>
> What is a good value for read latency?
>
>
> On 18/10/13 08:31, Viktor Jevdokimov wrote:
>> The only thing you may gain is avoiding unnecessary network hops, if:
>> - you request sorted keys (by token) from the appropriate replica with
>>   ConsistencyLevel.ONE and "dynamic_snitch: false";
>> - the nodes have the same load;
>> - the replica is not doing GC (GC pauses are much longer than internode
>>   communication).
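
The first condition above, sorting keys by token and sending each group to
the replica that owns it, can be sketched roughly as follows. This is a
minimal illustration, not Cassandra or Astyanax code: `token` is an MD5-based
stand-in for the partitioner's hash, and `ring` and `group_by_owner` are
hypothetical names for a sorted list of (token, node) pairs and the grouping
helper.

```python
import bisect
import hashlib
from collections import defaultdict


def token(key: bytes) -> int:
    # Assumption: an MD5-based stand-in for the partitioner's hash.
    # It is not guaranteed to match the tokens Cassandra computes.
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def group_by_owner(keys, ring):
    """Group keys by owning node, each group sorted by token.

    ring is a hypothetical list of (token, node) pairs sorted by token.
    A key is assigned to the first node whose token is >= the key's
    token, wrapping around to the first node past the end of the ring.
    """
    ring_tokens = [t for t, _ in ring]
    groups = defaultdict(list)
    for key in keys:
        t = token(key)
        idx = bisect.bisect_left(ring_tokens, t) % len(ring)
        groups[ring[idx][1]].append((t, key))
    return {node: [k for _, k in sorted(pairs)] for node, pairs in groups.items()}


# Hypothetical four-node ring with evenly spaced tokens over 0 .. 2**128 - 1.
ring = [((i + 1) * 2**126 - 1, f"node{i + 1}") for i in range(4)]
keys = [f"user{i}".encode() for i in range(20)]
for node, owned in sorted(group_by_owner(keys, ring).items()):
    print(node, [k.decode() for k in owned])
```

The sketch only considers the primary owner of each token range and ignores
the replication factor; a real client would take the ring from the cluster
rather than build it by hand.
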
>>
>> For a multiple-key request, C* will do multiple single-key reads, except
>> for range scan requests, where only the starting key and batch size are
>> used in the request.
>>
>> Consider a multiple-key request a slow request by design; try to model
>> your data for low-latency single-key requests.
>>
>> So, what latencies do you want to achieve?
>>
>>
>>
>> Best regards / Pagarbiai
>>
>> Viktor Jevdokimov
>> Senior Developer
>>
>> Email: [email protected]
>> Phone: +370 5 212 3063
>> Fax: +370 5 261 0453
>>
>> J. Jasinskio 16C,
>> LT-03163 Vilnius,
>> Lithuania
>>
>>
>>
>> -----Original Message-----
>> From: Artur Kronenberg [mailto:[email protected]]
>> Sent: Thursday, October 17, 2013 7:40 PM
>> To: [email protected]
>> Subject: Sorting keys for batch reads to minimize seeks
>>
>> Hi,
>>
>> I am looking to somehow increase read performance on Cassandra. We are
>> still playing with configurations, but I was wondering whether there are
>> solutions in software that might help us speed up our read performance.
>>
>> E.g. one idea, not sure how sane it is, was to sort read batches by
>> row key before submitting them to Cassandra. The idea is that row keys
>> would then be closer together on the physical disk, and therefore this may
>> minimize the number of random seeks we have to do when querying, say, 1000
>> entries from Cassandra. Does that make any sense?
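
Whether sorting helps depends on the partitioner. Assuming a hashing
partitioner (such as RandomPartitioner or Murmur3Partitioner), rows are
placed by the token of the hashed key rather than by the raw key, so keys
that are adjacent when sorted are not necessarily adjacent in token order.
A small sketch of that effect, using an MD5-based `token` function as an
illustrative stand-in (the helper names are hypothetical, not a Cassandra or
Astyanax API):

```python
import hashlib


def token(key: bytes) -> int:
    # Assumption: an MD5-based stand-in for the partitioner's hash.
    # It is not guaranteed to match the tokens Cassandra computes.
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def order_by_token(keys):
    # Order keys the way a hashing partitioner would see them.
    return sorted(keys, key=token)


# Row keys in plain lexical order, as a client-side sort would produce.
row_keys = sorted(f"user{i:04d}".encode() for i in range(1000))
token_order = order_by_token(row_keys)

# Show the first few keys in token order next to their tokens.
for key in token_order[:5]:
    print(key.decode(), hex(token(key)))
```

Under that assumption, sorting a batch by raw row key says little about
where the rows sit relative to each other; sorting by token is the ordering
that would correspond to placement.
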
>>
>> Is there anything else that we can do in software to improve
>> performance? Like specific batch sizes for reads? We are using the Astyanax
>> library to access Cassandra.
>>
>> Thanks!
>>
>>
>
>
