Hi Javier,

Glad to hear it is solved now. Cassandra 3.11.1 should be a more stable version, and 3.11 a better series.
Excuse my misunderstanding, your table seems to be better designed than I thought.

Welcome to the Apache Cassandra community!

C*heers ;-)
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-02-19 9:31 GMT+00:00 Javier Pareja <pareja.jav...@gmail.com>:

> Hi,
>
> Thank you for your reply.
>
> As I was bothered by this problem, last night I upgraded the cluster to
> version 3.11.1 and everything is working now. As far as I can tell, the
> counter table can be read now. I will be doing more testing today with
> this version, but it is looking good.
>
> To answer your questions:
> - I might not have explained the table definition very well: the table
> does not have 6 partitions, but 6 partition key columns. There are
> thousands of partitions in that table, combinations of all those partition
> keys. I also made sure that the partitions remained small when designing
> the table.
> - I also enabled tracing in CQLSH, but it showed nothing when querying
> this row. It did, however, when querying other tables...
>
> Thanks again for your reply!! I am very excited to be part of the
> Cassandra user base.
>
> Javier
>
> F Javier Pareja
>
> On Mon, Feb 19, 2018 at 8:08 AM, Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
>> Hello,
>>
>>> This table has 6 partition keys, 4 primary keys and 5 counters.
>>
>> I think the root issue is this ^. There might be some inefficiencies or
>> issues with counters, but this design makes Cassandra relatively
>> inefficient in most cases, whether you use standard columns or counters.
>>
>> Cassandra data is supposed to be well distributed for maximal
>> efficiency. With only 6 partitions, if you have 6+ nodes, the load is
>> guaranteed to be imbalanced. If you have fewer nodes, it is still
>> probably poorly balanced. Ideally, reads hit a small number of SSTables
>> and are spread in parallel across many nodes to split the work and make
>> queries efficient, but in this case Cassandra is most probably reading
>> huge partitions from one node. When the size of the request is too big,
>> it can time out. I am not sure how pagination works with counters, but I
>> believe that even if pagination is working, at some point you are simply
>> reading too much (or too inefficiently) and the timeout is reached.
>>
>> I imagine it worked well for a while, as counters are very small columns
>> / tables compared to any event data, but at some point you may have
>> reached a 'physical' limit, because you are pulling *all* the information
>> you need from one partition (and probably many SSTables).
>>
>> Is there really no other way to design this use case?
>>
>>> When data starts to be inserted, I can query the counters correctly from
>>> that particular row but after a few minutes updating the table with
>>> thousands of events, I get a read timeout every time
>>
>> Troubleshooting:
>> - Use tracing to understand what takes so long with your queries (see the
>> example commands just below this list).
>> - Check for warnings / errors in the logs. Cassandra tends to complain
>> when it is unhappy with its configuration. There is a lot of interesting
>> information there, and it has been a while since I last saw a failure
>> with no relevant information in the logs.
>> - Check SSTables per read and other read performance metrics for this
>> counter table. Some monitoring could make the reason for this timeout
>> obvious. If you use Datadog, for example, I guess that a quick look at
>> the "Read Path" dashboard would help. If you are using any other tool,
>> look for SSTables per read, tombstones scanned (if any), key cache hit
>> rate, and resources (as maybe the fast insert rate, compactions, and the
>> implicit 'read-before-write' of counters are making the machines less
>> responsive).
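>>
>> For example, something along these lines (keyspace and table names are
>> placeholders here, adjust them to your schema; pk1..pk6 stand for your
>> six partition key columns):
>>
>>     -- in cqlsh: trace one of the slow reads to see where the time goes
>>     TRACING ON;
>>     SELECT * FROM my_keyspace.my_counter_table
>>      WHERE pk1 = 'a' AND pk2 = 'b' AND pk3 = 'c'
>>        AND pk4 = 'd' AND pk5 = 'e' AND pk6 = 'f';
>>
>>     # from a shell: SSTables per read and partition sizes for that table
>>     nodetool tablehistograms my_keyspace my_counter_table
>>     nodetool tablestats my_keyspace.my_counter_table
>>
>> If the histograms show dozens of SSTables per read, or a very large max
>> partition size for this table, that would match the theory above.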
>>
>> Fix:
>> - Improve the table design to address the findings above ^
>> - Improve the compaction strategy or read operations depending on the
>> findings above ^
>>
>> I am not saying there is no bug in counters or in your version, but I
>> would say it is too early to state this: given the data model, other
>> reasons could explain this slowness.
>>
>> If you don't have any monitoring in place, tracing and logs are a nice
>> place to start digging. If you want to share those here, we can help
>> interpret the outputs if needed :).
>>
>> C*heers,
>>
>> Alain
>>
>> 2018-02-17 11:40 GMT+00:00 Javier Pareja <pareja.jav...@gmail.com>:
>>
>>> Hello everyone,
>>>
>>> I get a timeout error when reading a particular row from a large
>>> counters table.
>>>
>>> I have a Storm topology that inserts data into a Cassandra counter
>>> table. This table has 6 partition keys, 4 primary keys and 5 counters.
>>>
>>> When data starts to be inserted, I can query the counters correctly from
>>> that particular row, but after a few minutes of updating the table with
>>> thousands of events, I get a ReadTimeout every time I try to read a
>>> particular row from the table (the most frequently updated one). Other
>>> rows read quickly and fine. Also, if I run "select *", the top few
>>> hundred rows are returned quickly and fine, as expected. The Storm
>>> topology has been stopped, but the error is still there.
>>>
>>> I am using Cassandra 3.6.
>>>
>>> More information here:
>>> https://stackoverflow.com/q/48833146
>>>
>>> Are counters in this version broken? I run the query from CQLSH and get
>>> the same error every time. I tried running it with tracing enabled and
>>> got nothing but the error:
>>>
>>> ReadTimeout: Error from server: code=1200 [Coordinator node timed out
>>> waiting for replica nodes' responses] message="Operation timed out -
>>> received only 0 responses." info={'received_responses': 0,
>>> 'required_responses': 1, 'consistency': 'ONE'}
>>>
>>> Any ideas?
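>>
>> To make the shape concrete: a counter table with 6 partition key columns,
>> 4 clustering columns (reading "4 primary keys" that way) and 5 counters
>> would look roughly like this. The names below are invented for
>> illustration, they are not the actual schema:
>>
>>     CREATE TABLE my_keyspace.my_counter_table (
>>         pk1 text, pk2 text, pk3 text, pk4 text, pk5 text, pk6 text,
>>         ck1 text, ck2 text, ck3 text, ck4 text,
>>         c1 counter, c2 counter, c3 counter, c4 counter, c5 counter,
>>         PRIMARY KEY ((pk1, pk2, pk3, pk4, pk5, pk6), ck1, ck2, ck3, ck4)
>>     );
>>
>>     -- reading "one row" here means reading one whole partition:
>>     SELECT ck1, ck2, ck3, ck4, c1, c2, c3, c4, c5
>>       FROM my_keyspace.my_counter_table
>>      WHERE pk1 = 'a' AND pk2 = 'b' AND pk3 = 'c'
>>        AND pk4 = 'd' AND pk5 = 'e' AND pk6 = 'f';
>>
>> Every read has to restrict all six partition key columns, and all counter
>> updates for a given key combination land in the same partition, which is
>> why a frequently updated key can become a hot, heavily fragmented
>> partition.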