The short answer is yes, we are looking into adding streaming of
results to solve that problem
(https://issues.apache.org/jira/browse/CASSANDRA-4415).

--
Sylvain

On Tue, Jul 24, 2012 at 6:51 PM, Josep Blanquer <blanq...@rightscale.com> wrote:
> Thanks Sylvain,
>
>  The main argument for this is pagination. Let me try to explain the use
> cases, and compare it to RDBMS for better illustration:
>  1- Right now, Cassandra doesn't stream the requests, so large resultsets
> are a royal pain in the neck to deal with. I.e., if I have a range_slice, or
> even a slice query that cuts across 1 million columns...I have to completely
> "eat it all" in the client receiving the response. That is, I'll need to
> store 1 million results in the client no matter what, and that can be quite
> prohibitive.
>  2- In an effort to alleviate that, one can be smarter in the client and
> play the pagination game...i.e., start slicing at some column and get the
> next N results, then start the slice at the last column seen and get N
> more....etc. That results in many more queries from the smart client, but at
> least it would allow you to handle large result sets. (That's what the need
> for the CQL query in my original email was about).
> 3- There's another important factor related to this problem in my opinion:
> the LIMIT clause in Cassandra (in both CQL and Thrift) is a "required" field.
> What I mean by "required" is that Cassandra requires an explicit "count" to
> operate underneath. So it is really different from RDBMS' semantics where no
> LIMIT means you'll get "all" the results (instead of the high, yet still
> bounded count of 10K or 20K max resultset rows Cassandra enforces by
> default)...and I cannot tell you how many problems we've had with developers
> forgetting about these "default" counts in queries, and realizing that some
> had results truncated because of that...in my mind, LIMIT should only be
> used to restrict results...queries with no LIMIT should always return all
> results (much like RDBMS)...otherwise the query "seems" the same but it is
> semantically different.
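The client-side pagination described in point 2 above can be sketched roughly as follows. This is a minimal illustration, not a real client: `get_slice` is a hypothetical stand-in that simulates a slice query over an in-memory, sorted list of columns, and it assumes the slice start is inclusive, so each page after the first begins with the last column already seen.

```python
from bisect import bisect_left
from typing import Iterator, List, Optional, Tuple

Column = Tuple[str, int]  # (column name, value)


def get_slice(row: List[Column], start: Optional[str], count: int) -> List[Column]:
    """Hypothetical stand-in for a slice query (not a real driver call).

    Returns up to `count` columns in sorted order, beginning at `start`
    (inclusive), or at the first column when `start` is None.
    """
    names = [name for name, _ in row]
    begin = 0 if start is None else bisect_left(names, start)
    return row[begin:begin + count]


def paginate(row: List[Column], page_size: int) -> Iterator[Column]:
    """Yield every column while holding at most one page in memory."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    start: Optional[str] = None
    while True:
        page = get_slice(row, start, page_size)
        # After the first page, the first column repeats the last one seen
        # (the slice start is inclusive), so drop it.
        fresh = page if start is None else page[1:]
        yield from fresh
        if len(page) < page_size:
            return  # a short page means the row is exhausted
        start = page[-1][0]


if __name__ == "__main__":
    wide_row = [(f"col{i:03d}", i) for i in range(10)]
    for name, value in paginate(wide_row, page_size=4):
        print(name, value)
```

Each query after the first returns only N-1 new columns, and the client issues roughly (total / (N-1)) queries, which is the extra cost of the "pagination game" compared with a streamed result.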
>
> So, all in all I think that the main problem/use case I'm facing is that
> Cassandra cannot stream resultsets. If it did, I believe that the need for
> my pagination use case would basically disappear, since it'd be the
> transport/client that would throttle how many results are stored in the
> client buffer at any point in time. At the same time, I believe that with a
> streaming protocol you could simply change Cassandra internals to have
> "infinite" default limits...since there would be no reason to stop
> scanning (unless an explicit LIMIT clause was specified by the client). That
> would give you not only the SQL-equivalent syntax, but also the equivalent
> semantics of most current DBs.
>
> I hope that makes sense. That being said, are there any plans for streaming
> results? I believe that without that (and especially with the new CQL
> restrictions) it becomes much more difficult to use Cassandra with wide rows
> and large resultsets (which, in my mind, is one of its sweet spots). I
> believe that if that doesn't happen it would a) force the clients to be
> built in a much more complex and inefficient way to handle wide rows or b)
> will force users to use different, less efficient datamodels for their data.
> Both seem bad propositions to me, as they wouldn't be taking advantage of
> Cassandra's power, therefore diminishing its value.
>
>  Cheers,
>
>  Josep M.
>
>
> On Tue, Jul 24, 2012 at 3:11 AM, Sylvain Lebresne <sylv...@datastax.com>
> wrote:
>>
>> On Tue, Jul 24, 2012 at 12:09 AM, Josep Blanquer
>> <blanq...@rightscale.com> wrote:
>> > is there some way to express that in CQL3? something logically
>> > equivalent to
>> >
>> > SELECT *  FROM bug_test WHERE a:b:c:d:e > 1:1:1:1:2    ??
>>
>> No, there isn't. Not currently at least. But feel free of course to
>> open a ticket/request on
>> https://issues.apache.org/jira/browse/CASSANDRA.
>>
>> I would be curious to know the concrete use case you have
>> for this type of query. It would also help as an argument to add
>> such facilities more quickly (or at all). Typically, "we should
>> support it in CQL3 because it was possible with Thrift" is
>> definitely an argument, but a much weaker one without concrete
>> examples of why it might be useful in the first place.
>>
>> --
>> Sylvain
>
>
