The short answer is yes, we are looking into adding streaming of results to solve that problem (https://issues.apache.org/jira/browse/CASSANDRA-4415).
-- Sylvain On Tue, Jul 24, 2012 at 6:51 PM, Josep Blanquer <blanq...@rightscale.com> wrote: > Thank Sylvain, > > The main argument for this is pagination. Let me try to explain the use > cases, and compare it to RDBMS for better illustration: > 1- Right now, Cassandra doesn't stream the requests, so large resultsets > are a royal pain in the neck to deal with. I.e., if I have a range_slice, or > even a slice query that cuts across 1 million columns...I have to completely > "eat it all" in the client receiving the response. That is, I'll need to > store 1 million results in the client no matter what, and that can be quite > prohibitive. > 2- In an effort to alleviate that, one can be smarter in the client and > play the pagination game...i.e., start slicing at some column and get the > next N results, then start the slice at the last column seen and get N > more....etc. That results in many more queries from the smart client, but at > least it would allow you to handle large result sets. (That's where the need > for the CQL query in my original email was about). > 3- There's another important factor related to this problem in my opinion: > the LIMIT clause in Cassandra (in both CQL or Thrift) is a "required" field. > What I mean by "required" is that cassandra requires an explicit "count" to > operate underneath. So it is really different from RDBMS' semantics where no > LIMIT means you'll get "all" the results (instead of the high, yet still > bound count of 10K or 20K max resultset row cassandra enforces by > defaul)...and I cannot tell you how many problems we've had with developers > forgetting about these "default" counts in queries, and realizing that some > had results truncated because of that...in my mind, LIMIT should be to only > used restrict results...queries with no LIMIT should always return all > results (much like RDBMS)...otherwise the query "seems" the same but it is > semantically different. > > So, all in all I think that the main problem/use case I'm facing is that > Cassandra cannot stream resultsets. If it did, I believe that the need for > my pagination use case would basically disappear, since it'd be the > transport/client that would throttle how many results are stored in the > client buffer at any point time. At the same time, I believe that with a > streaming protocol you could simply change Cassandra internals to have > "infinite" default limits...since there wouldn't be no reason to stop > scanning (unless an explicit LIMIT clause was specified by the client). That > would give you not only the SQL-equivalent syntax, but also the equivalent > semantics of most current DBs. > > I hope that makes sense. That being said, are there any plans for streaming > results? I believe that without that (and especially with the new CQL > restrictions) it make much more difficult to use Cassandra with wide rows > and large resultsets (which, in my mind is one of its sweet spots ). I > believe that if that doesn't happen it would a) force the clients to be > built in a much more complex and inefficient way to handle wide rows or b) > will force users to use different, less efficient datamodels for their data. > Both seem bad propositions to me, as they wouldn't be taking advantage of > Cassandra's power, therefore diminishing its value. > > Cheers, > > Josep M. > > > On Tue, Jul 24, 2012 at 3:11 AM, Sylvain Lebresne <sylv...@datastax.com> > wrote: >> >> On Tue, Jul 24, 2012 at 12:09 AM, Josep Blanquer >> <blanq...@rightscale.com> wrote: >> > is there some way to express that in CQL3? something logically >> > equivalent to >> > >> > SELECT * FROM bug_test WHERE a:b:c:d:e > 1:1:1:1:2 ?? >> >> No, there isn't. Not currently at least. But feel free of course to >> open a ticket/request on >> https://issues.apache.org/jira/browse/CASSANDRA. >> >> I note that I would be curious to know the concrete use case you have >> for such type of queries. It would also help as an argument to add >> such facilities more quickly (or at all). Typically, "we should >> support it in CQL3 because it was possible with thrift" is >> definitively an argument, but a much weaker one without concrete >> examples of why it might be useful in the first place. >> >> -- >> Sylvain > >