Thanks Ondřej! Definitely much easier. N.B., this is a new feature in 2.0.x; it will not work in 1.2.x.

cqlsh:scratch> SELECT * FROM foo WHERE partitionkey = 1 and (ck1, ck2) > (1,2) limit 2;
Bad Request: line 1:45 no viable alternative at input '('
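For the Java-driver side of this, here is a rough sketch of paging with the (ck1, ck2) tuple relation Ondřej describes below. It assumes a 2.0.x cluster, the foo table from the thread, and an illustrative contact point and keyspace name; a production version would bind values through a prepared statement rather than concatenating them. Note that ">" here means "later in clustering order" (ck1 ASC, ck2 DESC), which is why the pages below come out the way they do.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.List;

public class TuplePager {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("scratch");   // keyspace name is an assumption
        final int pageSize = 2;

        Integer lastCk1 = null, lastCk2 = null;         // page state: clustering key of the last row seen
        while (true) {
            String cql = "SELECT * FROM foo WHERE partitionkey = 1"
                    + (lastCk1 == null ? "" : " AND (ck1, ck2) > (" + lastCk1 + ", " + lastCk2 + ")")
                    + " LIMIT " + pageSize;
            List<Row> rows = session.execute(cql).all();
            for (Row row : rows) {
                // process the row, then remember where we are
                lastCk1 = row.getInt("ck1");
                lastCk2 = row.getInt("ck2");
            }
            if (rows.size() < pageSize) break;          // short or empty page: nothing left
        }
        cluster.close();
    }
}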
On Thu, Feb 12, 2015 at 8:44 AM, Ondřej Nešpor <ondrej.nes...@gmail.com> wrote:

> There is a much easier way to do that (and I suppose the Java driver does
> it this way):
>
> page 1:
>
> SELECT * FROM foo WHERE partitionkey = 1 limit 2;
>
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   1 |   3 |    3 |    3
>             1 |   1 |   2 |    2 |    2
>
> page 2:
>
> SELECT * FROM foo WHERE partitionkey = 1 and (ck1, ck2) > (1,2) limit 2;
>
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   1 |   1 |    1 |    1
>             1 |   2 |   3 |    6 |    6
>
> page 3:
>
> SELECT * FROM foo WHERE partitionkey = 1 and (ck1, ck2) > (2,3) limit 2;
>
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   2 |   2 |    5 |    5
>             1 |   2 |   1 |    4 |    4
>
> Basically you pass ck1 and ck2 values from the last row of the previous
> result and tell C* you want results that are greater (next results in case
> the first clustering key is ASC).
>
> Andrew
>
> Dne 12.2.2015 v 15:50 Eric Stevens napsal(a):
>
> Your page state then needs to track the last ck1 and last ck2 you saw.
> Pages 2+ will end up needing to be up to two queries if the first query
> doesn't fill the page size.
>
> CREATE TABLE foo (
>     partitionkey int,
>     ck1 int,
>     ck2 int,
>     col1 int,
>     col2 int,
>     PRIMARY KEY ((partitionkey), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);
>
> If you're pulling the whole of partition 1 and your page size is 2, your
> first page looks like:
>
> *PAGE 1*
>
> SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
>
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   1 |   3 |    3 |    3
>             1 |   1 |   2 |    2 |    2
>
> You got enough rows to satisfy the page; your page state is taken from
> the last row: (ck1 = 1, ck2 = 2).
>
> *PAGE 2*
>
> Notice that you have a page state, and add some limiting clauses to the
> statement:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
>
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   1 |   1 |    1 |    1
>
> Oops, we didn't get enough rows to satisfy the page limit, so we need to
> continue on; we just need one more:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
>
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   2 |   3 |    6 |    6
>
> We have enough to satisfy page 2 now; our new page state: (ck1 = 2, ck2 = 3).
>
> *PAGE 3*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
>
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   2 |   2 |    5 |    5
>             1 |   2 |   1 |    4 |    4
>
> Great, we satisfied this page with only one query; page state: (ck1 = 2, ck2 = 1).
>
> *PAGE 4*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
>
> (0 rows)
>
> Oops, our initial query was on the boundary of ck1, but this looks like
> any other time that the initial query returns < pageSize rows; we just move
> on to the next page:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
>
> (0 rows)
>
> Aha, we've exhausted ck1 as well, so there are no more pages; page 3
> actually pulled the last possible value, page 4 is empty, and we're all
> done. Generally speaking, you know you're done when your first clustering
> key is the only non-equality operator in the statement and you got no rows
> back.
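To make the bookkeeping above concrete, here is a rough Java-driver sketch of that same two-query paging. The contact point, keyspace name, and plain string queries are illustrative assumptions (a real implementation would use prepared statements); the page-state handling follows Eric's steps: equality on ck1 with ck2 < last, then ck1 > last if the page is still short, and stop once a page comes back empty.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.ArrayList;
import java.util.List;

public class ManualPager {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("scratch");    // keyspace name is an assumption
        final int pageSize = 2;

        Integer lastCk1 = null, lastCk2 = null;          // page state from the last row of the previous page
        while (true) {
            List<Row> page = new ArrayList<Row>();
            if (lastCk1 == null) {
                // Page 1: no page state yet.
                page.addAll(session.execute(
                        "SELECT * FROM foo WHERE partitionkey = 1 LIMIT " + pageSize).all());
            } else {
                // Query 1: stay inside the last ck1; ck2 is DESC, so "next" means ck2 < last.
                page.addAll(session.execute(
                        "SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = " + lastCk1
                        + " AND ck2 < " + lastCk2 + " LIMIT " + pageSize).all());
                if (page.size() < pageSize) {
                    // Query 2: the page is still short, so move on to the next ck1.
                    page.addAll(session.execute(
                            "SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > " + lastCk1
                            + " LIMIT " + (pageSize - page.size())).all());
                }
            }
            if (page.isEmpty()) break;                   // both queries came back empty: no more pages
            for (Row row : page) {
                // process the row ...
            }
            Row last = page.get(page.size() - 1);
            lastCk1 = last.getInt("ck1");                // new page state
            lastCk2 = last.getInt("ck2");
        }
        cluster.close();
    }
}

The same loop generalizes to more clustering columns, but every extra column adds another possible follow-up query per page, which is why the tuple form shown above is so much more convenient on 2.0.x.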
> On Wed, Feb 11, 2015 at 10:55 AM, Ajay <ajay.ga...@gmail.com> wrote:
>
>> Basically I am trying different queries with your approach.
>>
>> One such query is like
>>
>> Select * from mycf where <condition on partition key> order by ck1 asc,
>> ck2 desc, where ck1 and ck2 are clustering keys in that order.
>>
>> Here, how do we achieve pagination support?
>>
>> Thanks
>> Ajay
>>
>> On Feb 11, 2015 11:16 PM, "Ajay" <ajay.ga...@gmail.com> wrote:
>>
>>> Hi Eric,
>>>
>>> Thanks for your reply.
>>>
>>> I am using Cassandra 2.0.11, and in that I cannot append a condition like
>>> last clustering key column > value of the last row in the previous batch.
>>> It fails with "Preceding column is either not restricted or by a non-EQ
>>> relation." It means I need to specify an equal condition for all preceding
>>> clustering key columns. With this I cannot get the pagination correct.
>>>
>>> Thanks
>>> Ajay
>>>
>>> > I can't believe that everyone read & process all rows at once
>>> (without pagination).
>>>
>>> Probably not too many people try to read all rows in a table as a
>>> single rolling operation with a standard client driver. But those who do
>>> would use token() to keep track of where they are and be able to resume
>>> with that as well.
>>>
>>> But it sounds like you're talking about paginating a subset of data -
>>> larger than you want to process as a unit, but prefiltered by some other
>>> criteria which prevents you from being able to rely on token(). For this
>>> there is no general purpose solution, but it typically involves you
>>> maintaining your own paging state, typically keeping track of the last
>>> partitioning and clustering key seen, and using that to construct your next
>>> query.
>>>
>>> For example, we have client queries which can span several
>>> partitioning keys. We make sure that the list of partition keys generated
>>> by a given client query, List(Pq), is deterministic; then our paging
>>> state is the index offset of the final Pq in the response, plus the
>>> value of the final clustering column. A query coming in with a paging
>>> state attached to it starts the next set of queries from the provided Pq
>>> offset, where clusteringKey > the provided value.
>>>
>>> So if you can just track partition key offset (if spanning multiple
>>> partitions) and clustering key offset, you can construct your next query
>>> from those instead.
>>>
>>> On Tue, Feb 10, 2015 at 6:58 PM, Ajay <ajay.ga...@gmail.com> wrote:
>>>
>>>> Thanks Alex.
>>>>
>>>> But is there any workaround possible? I can't believe that everyone
>>>> read & process all rows at once (without pagination).
>>>>
>>>> Thanks
>>>> Ajay
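As a side note on the token() approach Eric mentions above for walking a whole table (rather than a pre-filtered subset), a rough Java-driver sketch could look like the following. The contact point, keyspace name, and page size are assumptions, and the caveat in the comments matters when partitions hold many rows.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import java.util.List;

public class TokenWalk {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("scratch");    // keyspace name is an assumption
        final int pageSize = 100;

        Long lastToken = null;                           // resume point: last token(partitionkey) seen
        while (true) {
            String cql = "SELECT token(partitionkey) AS t, partitionkey, ck1, ck2, col1, col2 FROM foo"
                    + (lastToken == null ? "" : " WHERE token(partitionkey) > " + lastToken)
                    + " LIMIT " + pageSize;
            List<Row> rows = session.execute(cql).all();
            for (Row row : rows) {
                lastToken = row.getLong("t");            // Murmur3 tokens come back as bigint
                // process the row ...
            }
            // Caveat: token(partitionkey) > lastToken resumes at the *next* partition, so if a
            // page boundary fell inside a partition, its remaining rows should first be drained
            // with the clustering-key queries shown earlier in the thread before advancing.
            if (rows.size() < pageSize) break;           // short or empty page: table exhausted
        }
        cluster.close();
    }
}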
>>>> On Feb 10, 2015 11:46 PM, "Alex Popescu" <al...@datastax.com> wrote:
>>>>
>>>>> On Tue, Feb 10, 2015 at 4:59 AM, Ajay <ajay.ga...@gmail.com> wrote:
>>>>>
>>>>>> 1) The Java driver implicitly supports pagination in the ResultSet
>>>>>> (using Iterator), which can be controlled through FetchSize. But it is
>>>>>> limited in a way that we cannot skip or go previous. The FetchState is
>>>>>> not exposed.
>>>>>
>>>>> Cassandra doesn't support skipping, so this is not really a limitation
>>>>> of the driver.
>>>>>
>>>>> --
>>>>>
>>>>> [:>-a)
>>>>>
>>>>> Alex Popescu
>>>>> Sen. Product Manager @ DataStax
>>>>> @al3xandru
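On the driver point above: with the 2.x Java driver you normally don't page by hand at all. You set a fetch size and let iteration pull pages behind the scenes, and in recent enough driver versions (roughly 2.0.10 / 2.1.1 onward, if memory serves) the paging state is exposed so a later, stateless request can resume where the previous one stopped. It is forward-only, which matches Alex's point that skipping isn't supported. A rough sketch, with an assumed contact point and keyspace:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class DriverPaging {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("scratch");    // keyspace name is an assumption

        Statement stmt = new SimpleStatement("SELECT * FROM foo WHERE partitionkey = 1");
        stmt.setFetchSize(2);                            // rows fetched from the server per page
        ResultSet rs = session.execute(stmt);

        // Consume only the rows of the current page, without triggering the next fetch.
        int available = rs.getAvailableWithoutFetching();
        for (int i = 0; i < available; i++) {
            Row row = rs.one();
            // render the row ...
        }

        // Hand the position to a later request (e.g. the next HTTP call):
        PagingState state = rs.getExecutionInfo().getPagingState();  // null if there are no more rows
        if (state != null) {
            Statement next = new SimpleStatement("SELECT * FROM foo WHERE partitionkey = 1");
            next.setFetchSize(2);
            next.setPagingState(state);                  // must be the same statement; forward-only
            session.execute(next);
        }

        cluster.close();
    }
}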