We've recently started upgrading from 1.2.12 to 2.1.7. In 1.2.12 we wrote code that used the well-known token-based pagination pattern to process all rows in one of our tables. For 2.1.7 we tried replacing that code with the new built-in automatic paging:

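    List<Row> queryRows = new ArrayList<>();
    String query = "select * from " + schema + "." + table;
    Statement stmt = new SimpleStatement(query);
    stmt.setFetchSize(rowLimit);            // rows per page
    ResultSet rs = session.execute(stmt);   // returns once the first page has arrived
    for (Row row : rs)
    {
        queryRows.add(row);
        int avail = rs.getAvailableWithoutFetching();
        // Start fetching the next page in the background as soon as at most
        // (rowLimit - 10) rows remain buffered, i.e. about 10 rows into each page.
        if (!rs.isFullyFetched() && avail <= rowLimit - 10)
        {
            rs.fetchMoreResults(); // async
        }

        // Hand the accumulated rows to the processor whenever the buffer is empty.
        if (avail == 0)
        {
            processor.process(queryRows);
            queryRows.clear();
        }
    }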
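One thing worth checking in the loop above: the batch boundary depends on avail reaching 0. Depending on timing, if the background prefetch stays ahead of the loop, avail may not drop to 0 until much later (possibly the last row), so queryRows can grow well beyond rowLimit before process() is called, which could add memory and GC pressure. A hedged alternative is to flush on a fixed row count instead. The sketch below is generic over Iterable so it runs without a cluster; CountBatcher, forEachBatch and batchSize are hypothetical names, not part of the driver. In the real code, rows would be the ResultSet (which is Iterable<Row>) and the consumer would wrap processor.process().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CountBatcher {
    /**
     * Hypothetical helper: passes rows to `process` in batches of at most
     * `batchSize`, then flushes the final partial batch. Returns the number
     * of batches handed to `process`.
     */
    public static <T> int forEachBatch(Iterable<T> rows, int batchSize,
                                       Consumer<List<T>> process) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        for (T row : rows) {
            batch.add(row);
            if (batch.size() == batchSize) {
                process.accept(batch);
                batch.clear();   // buffer is reused, so the consumer must not keep a reference
                batches++;
            }
        }
        if (!batch.isEmpty()) {  // flush whatever is left after the last full batch
            process.accept(batch);
            batch.clear();
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) rows.add(i);
        List<Integer> sizes = new ArrayList<>();
        int n = forEachBatch(rows, 10, b -> sizes.add(b.size()));
        System.out.println(n + " batches, sizes " + sizes);
    }
}
```

With automatic paging the driver still fetches pages as the loop iterates, so the manual fetchMoreResults() prefetch can be kept alongside this; only the flush condition changes.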
The schema:
create table x.messages (

sourceday           text,       // partition-key
seqnumber           int,        // partition-key

sourcetimeus        bigint,     // clustering-key
unique              bigint,     // clustering-key

tags                set<text>,
dc                  text,
sc                  set<text>,

dn                  text,
type                text,
subtype             text,
das                 int,

ingesttimems        bigint,
vs                  int,

chunknum            bigint,

humantext           text,
fields              map<text, text>,

primary key ((sourceday, seqnumber), sourcetimeus, unique)
)
with clustering order by (sourcetimeus ASC, unique ASC) and compression = { 'sstable_compression' : 'LZ4Compressor' };
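For comparison, the token-based pattern mentioned at the top usually pages on the token of the partition key, which for this table is token(sourceday, seqnumber). The sketch below only builds the CQL for one page; TokenPageQuery and pageQuery are hypothetical names, and this is an assumption about the general shape of that pattern, not the code we actually ran under 1.2.12.

```java
public class TokenPageQuery {
    /**
     * Hypothetical helper: builds the CQL for one page of a token-range scan.
     * lastToken == null means "first page"; otherwise it is the token of the
     * last partition key returned by the previous page (the caller has to
     * track it, e.g. by selecting token(sourceday, seqnumber) explicitly).
     */
    public static String pageQuery(String keyspace, String table, Long lastToken, int limit) {
        StringBuilder cql = new StringBuilder("SELECT * FROM ")
                .append(keyspace).append('.').append(table);
        if (lastToken != null) {
            // Resume strictly after the last partition seen.
            cql.append(" WHERE token(sourceday, seqnumber) > ").append(lastToken);
        }
        return cql.append(" LIMIT ").append(limit).toString();
    }

    public static void main(String[] args) {
        System.out.println(pageQuery("x", "messages", null, 1000));
        System.out.println(pageQuery("x", "messages", -42L, 1000));
    }
}
```

A known caveat of this simple form: if LIMIT cuts a partition in half, "token(...) > lastToken" skips the rest of that partition, so a complete implementation must first re-query the remainder of the last partition before moving on.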

Messages average about 1 KB in size, most of it in the "fields" map.

In this test, the processor.process() call just prints a progress message to System.out.

In a direct comparison reading our test data set (24.1M rows on a single node), we see the following averages over 3 runs each:

 * old paging: 908 seconds, 26k rows/sec
 * new paging: 1044 seconds, 23k rows/sec
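The percentage depends on which quantity is compared: the new paging takes about 15% longer, which corresponds to about 13% lower throughput. A small worked check using only the figures listed above (PagingThroughput and its methods are just illustrative names):

```java
public class PagingThroughput {
    /** Rows processed per second. */
    public static double rowsPerSec(double rows, double secs) {
        return rows / secs;
    }

    /** Relative change from `before` to `after`, in percent. */
    public static double pctChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        double rows = 24.1e6;                      // rows in the test data set
        double oldSecs = 908, newSecs = 1044;      // averages of 3 runs each
        double oldRate = rowsPerSec(rows, oldSecs); // about 26,542 rows/s
        double newRate = rowsPerSec(rows, newSecs); // about 23,084 rows/s
        System.out.printf("old: %.0f rows/s, new: %.0f rows/s%n", oldRate, newRate);
        System.out.printf("runtime change: %+.1f%%%n", pctChange(oldSecs, newSecs));    // +15.0%
        System.out.printf("throughput change: %+.1f%%%n", pctChange(oldRate, newRate)); // -13.0%
    }
}
```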


Is this roughly 13% drop in throughput (about 15% longer run time) with the new paging known or expected? If not, how would we diagnose the cause? We'd definitely prefer to use the new paging since the code is MUCH simpler.
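One possible first step in diagnosing it is to split wall-clock time into time spent inside the result iterator and time spent in our own per-row code. With automatic paging, the iterator should be where the driver blocks when the next page has not arrived yet, so if the extra time shows up on the iterator side, the difference is in fetching rather than processing. A minimal sketch under that assumption; WaitTimer is a hypothetical helper that works with any Iterable, so passing the ResultSet (which is Iterable<Row>) would time the driver:

```java
import java.util.Iterator;
import java.util.function.Consumer;

public final class WaitTimer {
    public long iteratorNanos; // time inside hasNext()/next(); with the driver this includes page waits
    public long consumerNanos; // time spent in our own per-row code
    public long rows;

    public <T> void run(Iterable<T> source, Consumer<T> perRow) {
        Iterator<T> it = source.iterator();
        while (true) {
            long t0 = System.nanoTime();
            boolean more = it.hasNext();
            T row = more ? it.next() : null;
            long t1 = System.nanoTime();
            iteratorNanos += t1 - t0;
            if (!more) break;
            perRow.accept(row);
            consumerNanos += System.nanoTime() - t1;
            rows++;
        }
    }

    public String summary() {
        return String.format("%d rows, %.1f s in iterator, %.1f s in consumer",
                rows, iteratorNanos / 1e9, consumerNanos / 1e9);
    }
}
```

Running it against both versions of the loop on the same data would show on which side the extra ~136 seconds fall; the two System.nanoTime() calls per row add a small, roughly constant overhead to both runs.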

