Thanks Sylvain,

The main argument for this is pagination. Let me try to explain the use cases, and compare them to an RDBMS for better illustration:

1- Right now, Cassandra doesn't stream responses, so large resultsets are a royal pain in the neck to deal with. I.e., if I have a range_slice, or even a slice query that cuts across 1 million columns...I have to completely "eat it all" in the client receiving the response. That is, I'll need to store 1 million results in the client no matter what, and that can be quite prohibitive.

2- In an effort to alleviate that, one can be smarter in the client and play the pagination game...i.e., start slicing at some column and get the next N results, then start the next slice at the last column seen and get N more...etc. That results in many more queries from the smart client, but at least it allows you to handle large result sets. (That's what the CQL query in my original email was about.)

3- There's another important factor related to this problem in my opinion: the LIMIT clause in Cassandra (in both CQL and Thrift) is a "required" field. What I mean by "required" is that Cassandra requires an explicit "count" to operate underneath. So it is really different from RDBMS semantics, where no LIMIT means you'll get "all" the results (instead of the high, yet still bounded, count of 10K or 20K max resultset rows that Cassandra enforces by default)...and I cannot tell you how many problems we've had with developers forgetting about these "default" counts in queries, and only later realizing that some results had been truncated because of that. In my mind, LIMIT should only be used to restrict results...queries with no LIMIT should always return all results (much like an RDBMS)...otherwise the query "seems" the same but is semantically different.

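To make point 2 a bit more concrete, here is a rough Python sketch of the pagination game. It is only an illustration under assumptions: `fetch_slice` is a made-up stand-in for a slice query (it is not a real Cassandra client call), the row is faked in memory, and I'm assuming the slice start is inclusive, which is why every page after the first has to drop the column it starts on.

```python
from bisect import bisect_left
from typing import Callable, Iterator, List, Optional, Tuple

Column = Tuple[str, str]  # (column name, value)
FetchSlice = Callable[[Optional[str], int], List[Column]]


def make_fake_row(n: int) -> FetchSlice:
    """Build a hypothetical slice query over one in-memory wide row of n columns."""
    names = [f"col{i:07d}" for i in range(n)]  # zero-padded, so already sorted

    def fetch_slice(start: Optional[str], count: int) -> List[Column]:
        # Return up to `count` columns whose name is >= `start` (inclusive start).
        begin = 0 if start is None else bisect_left(names, start)
        return [(name, f"value-{name}") for name in names[begin:begin + count]]

    return fetch_slice


def paginate(fetch_slice: FetchSlice, page_size: int) -> Iterator[Column]:
    """Yield every column while holding at most one page in memory."""
    if page_size < 2:
        # With an inclusive start, a page of 1 would return the same column forever.
        raise ValueError("page_size must be at least 2")
    start: Optional[str] = None
    first_page = True
    while True:
        page = fetch_slice(start, page_size)
        # Pages after the first begin with the last column already seen: skip it.
        fresh = page if first_page else page[1:]
        for column in fresh:
            yield column
        if len(page) < page_size:
            return  # a short page means the row is exhausted
        start = page[-1][0]  # resume from the last column name seen
        first_page = False
```

The caller then just iterates, e.g. `for name, value in paginate(make_fake_row(1_000_000), 1000): ...`, and never holds more than one page at a time. The cost is exactly the one described above: many more round trips, plus one repeated column per page.
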
So, all in all, I think that the main problem/use case I'm facing is that Cassandra cannot stream resultsets. If it did, I believe that the need for my pagination use case would basically disappear, since it'd be the transport/client that would throttle how many results are stored in the client buffer at any point in time. At the same time, I believe that with a streaming protocol you could simply change Cassandra internals to have "infinite" default limits...since there would be no reason to stop scanning (unless an explicit LIMIT clause was specified by the client). That would give you not only the SQL-equivalent syntax, but also the equivalent semantics of most current DBs.

I hope that makes sense.

That being said, are there any plans for streaming results? I believe that without that (and especially with the new CQL restrictions) it becomes much more difficult to use Cassandra with wide rows and large resultsets (which, in my mind, is one of its sweet spots). I believe that if that doesn't happen it would a) force clients to be built in a much more complex and inefficient way to handle wide rows, or b) force users to use different, less efficient data models for their data. Both seem bad propositions to me, as they wouldn't be taking advantage of Cassandra's power, therefore diminishing its value.

Cheers,

Josep M.

On Tue, Jul 24, 2012 at 3:11 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> On Tue, Jul 24, 2012 at 12:09 AM, Josep Blanquer
> <blanq...@rightscale.com> wrote:
> > is there some way to express that in CQL3? something logically equivalent to
> >
> > SELECT * FROM bug_test WHERE a:b:c:d:e > 1:1:1:1:2 ??
>
> No, there isn't. Not currently at least. But feel free of course to
> open a ticket/request on
> https://issues.apache.org/jira/browse/CASSANDRA.
>
> I note that I would be curious to know the concrete use case you have
> for such type of queries. It would also help as an argument to add
> such facilities more quickly (or at all). Typically, "we should
> support it in CQL3 because it was possible with thrift" is
> definitively an argument, but a much weaker one without concrete
> examples of why it might be useful in the first place.
>
> --
> Sylvain
>