We've recently started upgrading from 1.2.12 to 2.1.7. On 1.2.12 we
wrote code that used the well-known token-based pagination pattern to
process all rows in one of our tables. For 2.1.7 we tried replacing
that code with the new built-in paging:
    List<Row> queryRows = new ArrayList<>();
    String query = "select * from " + schema + "." + table;
    Statement stmt = new SimpleStatement(query);
    stmt.setFetchSize(rowLimit);
    ResultSet rs = session.execute(stmt);
    for (Row row : rs)
    {
        queryRows.add(row);
        int avail = rs.getAvailableWithoutFetching();
        if ((!rs.isFullyFetched()) && (avail <= rowLimit - 10))
        {
            rs.fetchMoreResults(); // async
        }
        if (avail == 0)
        {
            processor.process(queryRows);
            queryRows.clear();
        }
    }
The schema:
    create table x.messages (
        sourceday text,       // partition-key
        seqnumber int,        // partition-key
        sourcetimeus bigint,  // clustering-key
        unique bigint,        // clustering-key
        tags set<text>,
        dc text,
        sc set<text>,
        dn text,
        type text,
        subtype text,
        das int,
        ingesttimems bigint,
        vs int,
        chunknum bigint,
        humantext text,
        fields map<text, text>,
        primary key ((sourceday, seqnumber), sourcetimeus, unique)
    )
    with clustering order by (sourcetimeus ASC, unique ASC) and
        compression = { 'sstable_compression' : 'LZ4Compressor' };
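For anyone unfamiliar with the token pattern mentioned at the top, it generally pages with statements shaped like the sketch below, built against the partition key above. This is a generic illustration, not our actual 1.2.12 code: the class and method names are hypothetical, and the sketch leaves out the extra query needed to finish a partition when the limit cuts it off partway through.

```java
// Hypothetical helper: builds the two query shapes used by token-based paging.
// It only produces CQL strings; executing them and tracking the last-seen
// partition key is left to the caller.
final class TokenPagingQueries {

    /** First page: no lower bound, just a row limit. */
    static String firstPage(String schema, String table, int pageSize) {
        return "select * from " + schema + "." + table + " limit " + pageSize;
    }

    /**
     * Later pages: resume after the partition of the last row seen.
     * One bind marker is emitted per partition-key column.
     * Caveat: a strict '>' skips any rows left unread in that last partition,
     * so real code must drain the partition separately before moving on.
     */
    static String nextPage(String schema, String table, int pageSize,
                           String... partitionKeyColumns) {
        StringBuilder columns = new StringBuilder();
        StringBuilder markers = new StringBuilder();
        for (int i = 0; i < partitionKeyColumns.length; i++) {
            if (i > 0) {
                columns.append(", ");
                markers.append(", ");
            }
            columns.append(partitionKeyColumns[i]);
            markers.append("?");
        }
        return "select * from " + schema + "." + table
                + " where token(" + columns + ") > token(" + markers + ")"
                + " limit " + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(firstPage("x", "messages", 500));
        System.out.println(nextPage("x", "messages", 500, "sourceday", "seqnumber"));
    }
}
```

The caller would bind the sourceday and seqnumber of the last row from the previous page to the two markers, which is the bookkeeping the built-in paging makes unnecessary.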
Messages average about 1 KB in size (most of that in the "fields" map).
In this test, the processor.process() call just prints a progress
message to System.out.
In a direct comparison reading our test data set (24.1M rows on a single
node) we see (average of 3 runs each):
* old paging: 908 seconds, 26k rows/sec
* new paging: 1044 seconds, 23k rows/sec
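The rates and the relative slowdown follow directly from the totals above. The small check below is only a sketch of that arithmetic; the class and method names are made up for illustration. It also shows that the same two timings can be stated two ways: as a drop in throughput or as an increase in elapsed time.

```java
// Hypothetical helper that recomputes the figures from the reported totals.
final class PagingThroughput {

    /** Rows per second for a run of the given duration. */
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    /** Fractional drop in throughput when the same rows take newSeconds instead of oldSeconds. */
    static double throughputDrop(double oldSeconds, double newSeconds) {
        return 1.0 - oldSeconds / newSeconds;
    }

    /** Fractional increase in elapsed time for the same rows. */
    static double elapsedIncrease(double oldSeconds, double newSeconds) {
        return newSeconds / oldSeconds - 1.0;
    }

    public static void main(String[] args) {
        long rows = 24_100_000L;   // test data set size
        double oldSeconds = 908;   // old (token) paging, average of 3 runs
        double newSeconds = 1044;  // new (built-in) paging, average of 3 runs

        System.out.printf("old paging: %.0f rows/sec%n", rowsPerSecond(rows, oldSeconds));
        System.out.printf("new paging: %.0f rows/sec%n", rowsPerSecond(rows, newSeconds));
        System.out.printf("throughput drop: %.1f%%%n", 100 * throughputDrop(oldSeconds, newSeconds));
        System.out.printf("elapsed-time increase: %.1f%%%n", 100 * elapsedIncrease(oldSeconds, newSeconds));
    }
}
```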
Is this roughly 13% drop in throughput with the new paging known or
expected? If not, how would we diagnose the cause? We'd definitely
prefer to use the new paging, since the code is MUCH simpler.