> I think that, from a client perspective it would be nicer in many scenarios
> just to "ask for all rows in a cf" and to receive some kind of stream and
> read the rows one by one from that stream instead of receiving all rows and
> then iterating over them (and being limited by the count of rows). Of course
> client side libraries could hide the paging stuff, but that would not
> improve latency.
Well, a high-level client could pre-fetch pages asynchronously such that the latency issue goes away (given sufficient read-ahead). Assuming a reasonably sized page size/count, hopefully the latency is not huge relative to the time it takes to do the actual work. Further performance (in terms of a single client, not overall throughput) could be had by increasing concurrency (i.e., still doing read-ahead of pages, but pre-fetching multiple pages at the same time - within reason).

Not saying that true streaming wouldn't be nice though.

> Is something like this possible? Is it perhaps already implemented?

Not implemented AFAIK; certainly possible, though non-trivial (e.g., Thrift doesn't directly support streaming, so as long as Thrift is used, an underlying request/response-oriented approach would be needed anyway).

I can't speak to what the plans are, so leaving that for someone else... But my personal feeling is that simply implementing pre-fetching pagination in higher-level clients seems easier to pull off than orchestrating proper streaming support natively in Cassandra's internals and its wire-level API. But maybe I'm being too paranoid about the issues involved; if I'm way off, maybe someone more familiar with the code base will correct me.

-- 
/ Peter Schuller
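PS: to make the read-ahead idea concrete, here is a rough sketch of what client-side pre-fetching paging could look like. All names here are hypothetical (in particular, fetchPage() is a stand-in for a real paged call such as get_range_slices keyed by the last row seen); this is just to illustrate overlapping the fetch of page N+1 with the processing of page N, not a real client implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of client-side read-ahead paging: while the caller is processing
// page N, page N+1 is already in flight on a background thread, so the
// per-page round-trip latency overlaps with the caller's own work.
public class ReadAheadPager {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final int pageSize;
    private int nextStart = 0;               // stand-in for a "start key" token
    private Future<List<String>> inFlight;   // the pre-fetched next page

    public ReadAheadPager(int pageSize) {
        this.pageSize = pageSize;
        // Kick off the first fetch immediately.
        this.inFlight = executor.submit(() -> fetchPage(0, pageSize));
    }

    // Simulated server call: returns up to 'count' "rows" starting at
    // 'start', out of 25 total; an empty list means we are done. A real
    // client would issue a paged range query here instead.
    private static List<String> fetchPage(int start, int count) {
        List<String> rows = new ArrayList<>();
        for (int i = start; i < Math.min(start + count, 25); i++)
            rows.add("row-" + i);
        return rows;
    }

    // Hand back the page that was pre-fetched, and immediately start
    // fetching the one after it before returning to the caller.
    public List<String> nextPage() throws Exception {
        List<String> page = inFlight.get();
        nextStart += page.size();
        if (!page.isEmpty()) {
            final int start = nextStart;
            inFlight = executor.submit(() -> fetchPage(start, pageSize));
        }
        return page;
    }

    public void close() { executor.shutdown(); }

    public static void main(String[] args) throws Exception {
        ReadAheadPager pager = new ReadAheadPager(10);
        List<String> page;
        while (!(page = pager.nextPage()).isEmpty()) {
            // Process this page; the next one is already being fetched.
            System.out.println("got " + page.size() + " rows");
        }
        pager.close();
    }
}
```

Pre-fetching several pages at once (the concurrency point above) would just mean keeping a small bounded queue of in-flight futures instead of a single one.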