> I think that, from a client perspective it would be nicer in many scenarios
> just to "ask for all rows in a cf" and to receive some kind of stream and
> read the rows one by one from that stream instead of receiving all rows and
> then iterating over them (and being limited by the count of rows). Of course
> client side libraries could hide the paging stuff, but that would not
> improve latency.
Well, a high-level client could pre-fetch pages asynchronously such that the latency issue goes away (given sufficient read-ahead). Assuming a reasonably sized page size/count, hopefully the latency is not huge relative to the time it takes to do the actual work. Further performance (in terms of a single client, not overall throughput) could be had by increasing concurrency (i.e., still doing read-ahead of pages, but pre-fetching multiple pages at the same time - within reason).

Not saying that true streaming wouldn't be nice though.

> Is something like this possible? Is it perhaps already implemented?

Not implemented AFAIK; certainly possible, though non-trivial (e.g., Thrift doesn't directly support streaming, so as long as Thrift is used, an underlying request/response-oriented approach would be needed anyway).

I can't speak to what the plans are, so leaving that for someone else... But my personal feeling is that simply implementing pre-fetching pagination in higher-level clients seems easier to pull off than orchestrating proper streaming support natively in Cassandra's internals and its wire-level API. But maybe I'm being too paranoid about the issues involved; if I'm way off, maybe someone more familiar with the code base will correct me.

-- 
/ Peter Schuller
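PS: to make the read-ahead idea concrete, here is a rough sketch of what client-side pre-fetching paging could look like. All names here are hypothetical (in particular, fetchPage() is a stand-in for a real paged call such as get_range_slices keyed by the last row seen); this is just to illustrate overlapping the fetch of page N+1 with the processing of page N, not a real client implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of client-side read-ahead paging: while the caller is processing
// page N, page N+1 is already in flight on a background thread, so the
// per-page round-trip latency overlaps with the caller's own work.
public class ReadAheadPager {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final int pageSize;
    private int nextStart = 0;               // stand-in for a "start key" token
    private Future<List<String>> inFlight;   // the pre-fetched next page

    public ReadAheadPager(int pageSize) {
        this.pageSize = pageSize;
        // Kick off the first fetch immediately.
        this.inFlight = executor.submit(() -> fetchPage(0, pageSize));
    }

    // Simulated server call: returns up to 'count' "rows" starting at
    // 'start', out of 25 total; an empty list means we are done. A real
    // client would issue a paged range query here instead.
    private static List<String> fetchPage(int start, int count) {
        List<String> rows = new ArrayList<>();
        for (int i = start; i < Math.min(start + count, 25); i++)
            rows.add("row-" + i);
        return rows;
    }

    // Hand back the page that was pre-fetched, and immediately start
    // fetching the one after it before returning to the caller.
    public List<String> nextPage() throws Exception {
        List<String> page = inFlight.get();
        nextStart += page.size();
        if (!page.isEmpty()) {
            final int start = nextStart;
            inFlight = executor.submit(() -> fetchPage(start, pageSize));
        }
        return page;
    }

    public void close() { executor.shutdown(); }

    public static void main(String[] args) throws Exception {
        ReadAheadPager pager = new ReadAheadPager(10);
        List<String> page;
        while (!(page = pager.nextPage()).isEmpty()) {
            // Process this page; the next one is already being fetched.
            System.out.println("got " + page.size() + " rows");
        }
        pager.close();
    }
}
```

Pre-fetching several pages at once (the concurrency point above) would just mean keeping a small bounded queue of in-flight futures instead of a single one.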