Does the last_update_date constraint filter out a lot of rows? In that case the server may be reading a large number of rows, only to throw them away since they get filtered out.

If you apply the filter on the client side, you shouldn't see timeouts (but overall the process will be slower since you have to transfer more data).


btw, from the logs it looks like the client is multi-threaded, there are different token ranges in the same time period.


On 01/26/2018 10:39 PM, Juan Manuel Alonso wrote:
Hi guys,

I'm having some trouble while using paged queries on Cassandra's Java driver (version 3.3.2). I'm using Cassandra 3.11.0.

I have to fetch a page of data from the DB, then make some trivial changes, and then update these rows.

A simplified version of the code i'm running would be:

                    Integer rowCounter = 0;
                    Statement selectQuery = QueryBuilder.select()...setFetchSize(pageSize)...;                     ResultSet result = cassandraSession.execute(selectQuery);
                    List<MyClass> mappedResults = new ArrayList<>();
                    for (Row row : result) {
                        rowCounter++;
                        mappedResults.add(map(row));
                        if (rowCounter % pageSize == 0) {

                            List<MyClass> resultsToUpdate = modifyData(mappedResults);                             for (MyClass resultToUpdate : resultsToUpdate){                                 Statement updateQuery = QueryBuilder.update(KEYSPACE, tableName)...;
cassandraSession.execute(query);
                            }
TimeUnit.SECONDS.sleep(sleepSeconds); //Sleep for a few seconds to let the DB... breathe
                        }

                    }

I'm using consistency level ONE on both select and update queries, the value of sleepSeconds is 5 and the pageSize is 47.

My problem is that I have to use very small page sizes, otherwise queries start to timeout on Cassandra.

There is only one thread running this long update process, but when i check Cassandra's debug.log, it looks like this:

...
DEBUG [ScheduledTasks:1] 2018-01-26 12:43:09,221 MonitoringTask.java:173 - 55 operations were slow in the last 5001 msecs: <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 1940709131428868672 AND token(id) <= 1976881771356013545 LIMIT 47>, time 1232 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > -31240603717813337 AND token(id) <= 93066413544676618 LIMIT 47>, time 672 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > -2746601914911102981 AND token(id) <= -2679503374406295369 LIMIT 47>, time 722 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 8697901506577253630 AND token(id) <= 8756251242481074941 LIMIT 47>, time 1737 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > -2566441277217350674 AND token(id) <= -2410488306633473620 LIMIT 47>, time 997 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 5186947162422827855 AND token(id) <= 5251256039266177164 LIMIT 47>, time 1619 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 523415566358416448 AND token(id) <= 558165594730430519 LIMIT 47>, time 793 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > -6313110054894254305 AND token(id) <= -6149701678898756666 LIMIT 47>, time 510 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 133117363640100699 AND token(id) <= 326755086351479456 LIMIT 47>, time 594 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > -5773756298752768296 AND token(id) <= -5672224259310839216 LIMIT 47>, time 631 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 9138868762246577790 AND token(id) <= 9184809921750217730 LIMIT 47>, time 1680 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 1481347618188085389 AND token(id) <= 1529429375374220120 LIMIT 47>, time 1337 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > -3179570044050246190 AND token(id) <= -2975237200717735765 LIMIT 47>, time 773 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > -1992364373944487162 AND token(id) <= -1754930707218513982 LIMIT 47>, time 793 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 7461256765584395144 AND token(id) <= 7513523865647503158 LIMIT 47>, time 1569 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 2199511646454841639 AND token(id) <= 2235092311035533306 LIMIT 47>, time 1157 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 5981009014177068366 AND token(id) <= 6193847522724693984 LIMIT 47>, time 1549 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 6587824379305518475 AND token(id) <= 6941621185441223079 LIMIT 47>, time 1491 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > -2888016351766682341 AND token(id) <= -2832466742668731344 LIMIT 47>, time 642 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 4599678137499867302 AND token(id) <= 4681791682494977137 LIMIT 47>, time 1222 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 4097891947569113599 AND token(id) <= 4205652216148641874 LIMIT 47>, time 1421 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 1313069522772322867 AND token(id) <= 1345209653063462051 LIMIT 47>, time 1042 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 687538582456175035 AND token(id) <= 714387656474794527 LIMIT 47>, time 1262 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 3166906839309766962 AND token(id) <= 3220931935607646481 LIMIT 47>, time 1589 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 5661754624946584572 AND token(id) <= 5764445628939515857 LIMIT 47>, time 1491 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > 6988693948300673349 AND token(id) <= 7029878484744478913 LIMIT 47>, time 1579 msec - slow timeout 500 msec/cross-node <SELECT * FROM keyspace.table WHERE last_update_date = 2017-04-07 AND token(id) > -601653023540100942 AND token(id) <= -570711602903916229 LIMIT 47>, time 1032 msec - slow timeout 500 msec/cross-node
... (5 were dropped)
DEBUG [ScheduledTasks:1] 2018-01-26 12:43:14,222 MonitoringTask.java:173 - 52 operations were slow in the last 4998 msecs:
...


I can't understand why my process is making lots of SELECT queries in just 5 seconds, when it should be making one fetch, updating the rows, and then sleeping for five seconds. If i check my application logs, everything seems to be correct, there is only one fetch every five seconds.

I tried running the same code on another environment (with the same volume of data). It runs much faster and the debug.log looks correct:
...
DEBUG [ScheduledTasks:1] 2018-01-26 12:32:25,574 MonitoringTask.java:173 - 1 operations were slow in the last 5003 msecs: <SELECT * FROM keyspace.table WHERE last_update_date = 2017-03-13 AND token(id) > token(AVrJ6PqGIUe_WZqLXGLv) LIMIT 200>, time 517 msec - slow timeout 500 msec/cross-node DEBUG [ScheduledTasks:1] 2018-01-26 12:32:35,575 MonitoringTask.java:173 - 1 operations were slow in the last 5000 msecs: <SELECT * FROM keyspace.table WHERE last_update_date = 2017-03-13 AND token(id) > token(AVrJ6GJqVlin-KDUvTKL) LIMIT 200>, time 680 msec - slow timeout 500 msec/cross-node DEBUG [ScheduledTasks:1] 2018-01-26 12:32:40,577 MonitoringTask.java:173 - 1 operations were slow in the last 5001 msecs: <SELECT * FROM keyspace.table WHERE last_update_date = 2017-03-13 AND token(id) > token(AVrJ6xVrVlin-KDUvXep) LIMIT 200>, time 647 msec - slow timeout 500 msec/cross-node
...

Any thoughts on why is this happening would be highly appreciated.

Thanks in advance.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Reply via email to