Thanks Sylvain,

The main argument for this is pagination. Let me try to explain the use
cases and compare them to an RDBMS for illustration:
1- Right now, Cassandra doesn't stream results, so large result sets
are a royal pain in the neck to deal with. I.e., if I have a range_slice,
or even a slice query that cuts across 1 million columns, I have to
completely "eat it all" in the client receiving the response. That is, I'll
need to hold all 1 million results in the client no matter what, and that
can be quite prohibitive.
2- To alleviate that, one can be smarter in the client and play the
pagination game, i.e., start slicing at some column and get the next N
results, then start the slice at the last column seen and get N more, and
so on. That results in many more queries from the smart client, but at
least it lets you handle large result sets. (That's what the CQL query in
my original email was about.)
3- There's another important factor related to this problem, in my opinion:
the LIMIT clause in Cassandra (in both CQL and Thrift) is effectively a
"required" field. What I mean by "required" is that Cassandra needs an
explicit "count" to operate underneath. That is really different from RDBMS
semantics, where no LIMIT means you get "all" the results, instead of the
high, yet still bounded, cap of 10K or 20K rows that Cassandra enforces by
default. I cannot tell you how many problems we've had with developers
forgetting about these "default" counts in queries and later realizing
that some of them had truncated results because of that. In my mind, LIMIT
should only be used to restrict results; queries with no LIMIT should
always return all results (much like an RDBMS). Otherwise the query
"seems" the same, but it is semantically different.
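
To make points 2 and 3 concrete, here's a rough sketch in Python against an in-memory stand-in for a wide row. `ROW`, `slice_after`, `paginate`, `naive_query` and `DEFAULT_COUNT` are all hypothetical names (not a Cassandra or client-library API), and the composite "greater than" comparison is modeled with Python tuple ordering, assuming every component sorts ascending.

```python
from bisect import bisect_right
from itertools import product
from typing import Iterator, List, Optional, Tuple

# Hypothetical composite column name (a, b, c, d, e).
Key = Tuple[int, int, int, int, int]

# In-memory stand-in for one wide row: 3**5 = 243 sorted composite columns.
ROW: List[Key] = sorted(product(range(3), repeat=5))

# Hypothetical implicit cap, standing in for Cassandra's default count.
DEFAULT_COUNT = 100

def slice_after(last: Optional[Key], count: int) -> List[Key]:
    """Up to `count` columns strictly greater than `last` (None = row start).
    Tuples compare component by component, which models the logical
    equivalent of a:b:c:d:e > 1:1:1:1:2 for ascending components."""
    i = 0 if last is None else bisect_right(ROW, last)
    return ROW[i:i + count]

def paginate(page_size: int) -> Iterator[List[Key]]:
    """The 'pagination game': resume each slice after the last column seen."""
    last: Optional[Key] = None
    while True:
        page = slice_after(last, page_size)
        if page:
            yield page              # the client holds one page at a time
        if len(page) < page_size:   # short page: end of the row
            return
        last = page[-1]

def naive_query() -> List[Key]:
    """A query that relies on the implicit count gets cut off at the cap."""
    return slice_after(None, DEFAULT_COUNT)
```

With 243 columns and a page size of 100, paginate() yields pages of 100, 100 and 43, at the cost of three round trips, while naive_query() returns only the first 100 columns with nothing telling the caller the rest were dropped.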

So, all in all, I think the main problem/use case I'm facing is that
Cassandra cannot stream result sets. If it did, I believe the need for my
pagination use case would basically disappear, since the transport/client
would throttle how many results are held in the client buffer at any point
in time. At the same time, I believe that with a streaming protocol you
could simply change Cassandra internals to have "infinite" default
limits, since there would be no reason to stop scanning (unless the client
specified an explicit LIMIT clause). That would give you not only the
SQL-equivalent syntax, but also the equivalent semantics of most current
DBs.
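
Roughly what I mean, as a Python sketch under stated assumptions: `fetch`, `stream`, `fake_fetch` and `DATA` are hypothetical stand-ins (not a Cassandra or driver API) for whatever a streaming protocol would do under the hood. The caller sees one lazy iterator, the scan holds at most one batch at a time, and only an explicit limit stops it early.

```python
from itertools import islice
from typing import Callable, Iterator, List, Optional

# Hypothetical transport call: up to n items strictly after `after`
# (None = from the start). Not a real Cassandra/driver API.
Fetch = Callable[[Optional[int], int], List[int]]

def stream(fetch: Fetch, batch: int = 1000,
           limit: Optional[int] = None) -> Iterator[int]:
    """Expose a whole scan as one lazy iterator; the scan itself holds
    at most `batch` items, and only an explicit `limit` cuts it short."""
    def scan() -> Iterator[int]:
        after: Optional[int] = None
        while True:
            chunk = fetch(after, batch)   # transport-sized batch
            yield from chunk
            if len(chunk) < batch:        # short batch: end of data
                return
            after = chunk[-1]
    rows = scan()
    # No limit means "all results", like SQL without LIMIT.
    return rows if limit is None else islice(rows, limit)

# In-memory stand-in for a large result set (each value equals its index).
DATA = list(range(100_000))

def fake_fetch(after: Optional[int], n: int) -> List[int]:
    start = 0 if after is None else after + 1
    return DATA[start:start + n]
```

Here `sum(1 for _ in stream(fake_fetch))` walks all 100,000 items with no implicit cap, while `list(stream(fake_fetch, batch=7, limit=10))` returns just the first 10. The batch size is a transport concern rather than a query semantic, which is the separation I'm arguing for.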

I hope that makes sense. That being said, are there any plans for streaming
results? I believe that without it (and especially with the new CQL
restrictions) it becomes much more difficult to use Cassandra with wide rows
and large result sets (which, in my mind, is one of its sweet spots). If
that doesn't happen, I believe it would either a) force clients to be built
in a much more complex and inefficient way to handle wide rows, or b) force
users to adopt different, less efficient data models for their data. Both
seem like bad propositions to me, as they wouldn't take advantage of
Cassandra's power, thereby diminishing its value.

 Cheers,

 Josep M.


On Tue, Jul 24, 2012 at 3:11 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> On Tue, Jul 24, 2012 at 12:09 AM, Josep Blanquer
> <blanq...@rightscale.com> wrote:
> > is there some way to express that in CQL3? something logically
> > equivalent to
> >
> > SELECT *  FROM bug_test WHERE a:b:c:d:e > 1:1:1:1:2    ??
>
> No, there isn't. Not currently at least. But feel free of course to
> open a ticket/request on
> https://issues.apache.org/jira/browse/CASSANDRA.
>
> I note that I would be curious to know the concrete use case you have
> for such type of queries. It would also help as an argument to add
> such facilities more quickly (or at all). Typically, "we should
> support it in CQL3 because it was possible with thrift" is
> definitely an argument, but a much weaker one without concrete
> examples of why it might be useful in the first place.
>
> --
> Sylvain
>
