Hi Javier,

Glad to hear it is solved now. Cassandra 3.11.1 should be a more stable version, and 3.11 a better series.
Excuse my misunderstanding, your table seems to be better designed than I thought.

Welcome to the Apache Cassandra community!

C*heers ;-)
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-02-19 9:31 GMT+00:00 Javier Pareja <pareja.jav...@gmail.com>:

> Hi,
>
> Thank you for your reply.
>
> As I was bothered by this problem, last night I upgraded the cluster to
> version 3.11.1 and everything is working now. As far as I can tell, the
> counter table can be read now. I will be doing more testing today with
> this version, but it is looking good.
>
> To answer your questions:
> - I might not have explained the table definition very well: the table
> does not have 6 partitions, but 6 partition key columns. There are
> thousands of partitions in that table, combinations of all those partition
> keys. I also made sure that the partitions remained small when designing
> the table.
> - I also enabled tracing in CQLSH, but it showed nothing when querying
> this row. It did, however, when querying other tables...
>
> Thanks again for your reply!! I am very excited to be part of the
> Cassandra user base.
>
> Javier
>
> F Javier Pareja
>
> On Mon, Feb 19, 2018 at 8:08 AM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> Hello,
>>
>>> This table has 6 partition keys, 4 primary keys and 5 counters.
>>
>> I think the root issue is this ^. There might be some inefficiencies or
>> issues with counters, but this design makes Cassandra relatively
>> inefficient in most cases, whether you use standard columns or counters.
>>
>> Cassandra data is supposed to be well distributed for maximal
>> efficiency. With only 6 partitions, if you have 6+ nodes, the load is
>> guaranteed to be imbalanced. If you have fewer nodes, it is still
>> probably poorly balanced. Ideally, reads hit a small number of SSTables
>> and are spread in parallel across many nodes to split the work and make
>> queries efficient, but in this case Cassandra is most probably reading
>> huge partitions from one node. When the size of the request is too big,
>> it can time out. I am not sure how pagination works with counters, but I
>> believe that even if pagination is working, at some point you are simply
>> reading too much (or too inefficiently) and the timeout is reached.
>>
>> I imagine it worked well for a while, as counters are very small columns
>> / tables compared to any event data, but at some point you may have
>> reached a 'physical' limit, because you are pulling *all* the information
>> you need from one partition (and probably many SSTables).
>>
>> Is there really no other way to design this use case?
>>
>>> When data starts to be inserted, I can query the counters correctly from
>>> that particular row but after a few minutes updating the table with
>>> thousands of events, I get a read timeout every time
>>
>> Troubleshooting:
>> - Use tracing to understand what takes so long with your queries (see the
>> example commands just below this list).
>> - Check for warnings / errors in the logs. Cassandra tends to complain
>> when it is unhappy with its configuration. There is a lot of interesting
>> information there, and it has been a while since I last saw a failure
>> with no relevant information in the logs.
>> - Check SSTables per read and other read performance metrics for this
>> counter table. Some monitoring could make the reason for this timeout
>> obvious. If you use Datadog, for example, I guess that a quick look at
>> the "Read Path" dashboard would help. If you are using any other tool,
>> look for SSTables per read, tombstones scanned (if any), key cache hit
>> rate, and resources (as maybe the fast insert rate, compactions, and the
>> implicit 'read-before-write' of counters are making the machines less
>> responsive).
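>>
>> For example, something along these lines (keyspace and table names are
>> placeholders here, adjust them to your schema; pk1..pk6 stand for your
>> six partition key columns):
>>
>>     -- in cqlsh: trace one of the slow reads to see where the time goes
>>     TRACING ON;
>>     SELECT * FROM my_keyspace.my_counter_table
>>      WHERE pk1 = 'a' AND pk2 = 'b' AND pk3 = 'c'
>>        AND pk4 = 'd' AND pk5 = 'e' AND pk6 = 'f';
>>
>>     # from a shell: SSTables per read and partition sizes for that table
>>     nodetool tablehistograms my_keyspace my_counter_table
>>     nodetool tablestats my_keyspace.my_counter_table
>>
>> If the histograms show dozens of SSTables per read, or a very large max
>> partition size for this table, that would match the theory above.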
>>
>> Fix:
>> - Improve the table design to address the findings above ^
>> - Improve the compaction strategy or read operations depending on the
>> findings above ^
>>
>> I am not saying there is no bug in counters or in your version, but I
>> would say it is too early to state this: given the data model, other
>> reasons could explain this slowness.
>>
>> If you don't have any monitoring in place, tracing and logs are a nice
>> place to start digging. If you want to share those here, we can help
>> interpret the outputs if needed :).
>>
>> C*heers,
>>
>> Alain
>>
>> 2018-02-17 11:40 GMT+00:00 Javier Pareja <pareja.jav...@gmail.com>:
>>
>>> Hello everyone,
>>>
>>> I get a timeout error when reading a particular row from a large
>>> counters table.
>>>
>>> I have a Storm topology that inserts data into a Cassandra counter
>>> table. This table has 6 partition keys, 4 primary keys and 5 counters.
>>>
>>> When data starts to be inserted, I can query the counters correctly from
>>> that particular row, but after a few minutes of updating the table with
>>> thousands of events, I get a ReadTimeout every time I try to read a
>>> particular row from the table (the most frequently updated one). Other
>>> rows read quickly and fine. Also, if I run "select *", the top few
>>> hundred rows are returned quickly and fine, as expected. The Storm
>>> topology has been stopped, but the error is still there.
>>>
>>> I am using Cassandra 3.6.
>>>
>>> More information here:
>>> https://stackoverflow.com/q/48833146
>>>
>>> Are counters in this version broken? I run the query from CQLSH and get
>>> the same error every time. I tried running it with tracing enabled and
>>> got nothing but the error:
>>>
>>> ReadTimeout: Error from server: code=1200 [Coordinator node timed out
>>> waiting for replica nodes' responses] message="Operation timed out -
>>> received only 0 responses." info={'received_responses': 0,
>>> 'required_responses': 1, 'consistency': 'ONE'}
>>>
>>> Any ideas?
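>>
>> To make the shape concrete: a counter table with 6 partition key columns,
>> 4 clustering columns (reading "4 primary keys" that way) and 5 counters
>> would look roughly like this. The names below are invented for
>> illustration, they are not the actual schema:
>>
>>     CREATE TABLE my_keyspace.my_counter_table (
>>         pk1 text, pk2 text, pk3 text, pk4 text, pk5 text, pk6 text,
>>         ck1 text, ck2 text, ck3 text, ck4 text,
>>         c1 counter, c2 counter, c3 counter, c4 counter, c5 counter,
>>         PRIMARY KEY ((pk1, pk2, pk3, pk4, pk5, pk6), ck1, ck2, ck3, ck4)
>>     );
>>
>>     -- reading "one row" here means reading one whole partition:
>>     SELECT ck1, ck2, ck3, ck4, c1, c2, c3, c4, c5
>>       FROM my_keyspace.my_counter_table
>>      WHERE pk1 = 'a' AND pk2 = 'b' AND pk3 = 'c'
>>        AND pk4 = 'd' AND pk5 = 'e' AND pk6 = 'f';
>>
>> Every read has to restrict all six partition key columns, and all counter
>> updates for a given key combination land in the same partition, which is
>> why a frequently updated key can become a hot, heavily fragmented
>> partition.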