There is a much easier way to do that (and I suppose the Java driver
does it this way):

page 1:

SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;

 partitionkey | ck1 | ck2 | col1 | col2
--------------+-----+-----+------+------
            1 |   1 |   3 |    3 |    3
            1 |   1 |   2 |    2 |    2


page 2:

SELECT * FROM foo WHERE partitionkey = 1 AND (ck1, ck2) > (1, 2) LIMIT 2;

 partitionkey | ck1 | ck2 | col1 | col2
--------------+-----+-----+------+------
            1 |   1 |   1 |    1 |    1
            1 |   2 |   3 |    6 |    6



page 3:

SELECT * FROM foo WHERE partitionkey = 1 AND (ck1, ck2) > (2, 3) LIMIT 2;

 partitionkey | ck1 | ck2 | col1 | col2
--------------+-----+-----+------+------
            1 |   2 |   2 |    5 |    5
            1 |   2 |   1 |    4 |    4



Basically you pass the ck1 and ck2 values from the last row of the
previous result and tell C* you want rows that are greater, i.e. the
next rows in clustering order (given that the first clustering key is
ASC).
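The marker-based paging above can be sketched as a small in-memory
simulation in Python, using the sample table from Eric's message
(`fetch_page` and `clustering_key` are my names for illustration, not
driver API; a real implementation would run the CQL queries shown
above instead of filtering a list):

```python
# Clustering order is (ck1 ASC, ck2 DESC), so "greater" means "later in
# clustering order", which we model by sorting on the key (ck1, -ck2).

def clustering_key(row):
    ck1, ck2 = row[1], row[2]
    return (ck1, -ck2)  # negate ck2 because it is DESC

def fetch_page(rows, limit, after=None):
    """Return the next `limit` rows past the (ck1, ck2) marker `after`."""
    ordered = sorted(rows, key=clustering_key)
    if after is not None:
        marker = (after[0], -after[1])
        ordered = [r for r in ordered if clustering_key(r) > marker]
    return ordered[:limit]

# (partitionkey, ck1, ck2, col1, col2) -- the sample data from the thread
rows = [(1, 1, 1, 1, 1), (1, 1, 2, 2, 2), (1, 1, 3, 3, 3),
        (1, 2, 1, 4, 4), (1, 2, 2, 5, 5), (1, 2, 3, 6, 6)]

page1 = fetch_page(rows, 2)                # rows (1,3) and (1,2)
page2 = fetch_page(rows, 2, after=(1, 2))  # rows (1,1) and (2,3)
page3 = fetch_page(rows, 2, after=(2, 3))  # rows (2,2) and (2,1)
```

An empty result for a marker means there are no further pages.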



Andrew



Dne 12.2.2015 v 15:50 Eric Stevens napsal(a):
> Your page state then needs to track the last ck1 and last ck2 you
> saw.  Pages 2+ will end up needing to be up to two queries if the
> first query doesn't fill the page size.
>
> CREATE TABLE foo (
>   partitionkey int,
>   ck1 int,
>   ck2 int,
>   col1 int,
>   col2 int,
>   PRIMARY KEY ((partitionkey), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);
>
> If you're pulling the whole of partition 1 and your page size is 2,
> your first page looks like:
>
> *PAGE 1*
>
> SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   1 |   3 |    3 |    3
>             1 |   1 |   2 |    2 |    2
>
> You got enough rows to satisfy the page. Your page state is taken from
> the last row: (ck1 = 1, ck2 = 2).
>
>
> *PAGE 2*
> Notice that you have a page state, and add some limiting clauses on
> the statement:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   1 |   1 |    1 |    1
>
> Oops, we didn't get enough rows to satisfy the page limit, so we need
> to continue; we just need one more:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   2 |   3 |    6 |    6
>
> We have enough to satisfy page 2 now, our new page state: (ck1 = 2,
> ck2 = 3).
>
>
> *PAGE 3*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --------------+-----+-----+------+------
>             1 |   2 |   2 |    5 |    5
>             1 |   2 |   1 |    4 |    4
>
> Great, we satisfied this page with only one query, page state: (ck1 =
> 2, ck2 = 1).  
>
>
> *PAGE 4*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
> (0 rows)
>
> Oops, our initial query was on the boundary of ck1. But this looks
> like any other time the initial query returns fewer than pageSize
> rows: we just move on to the next ck1 value:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
> (0 rows)
>
> Aha, we've exhausted ck1 as well, so there are no more pages, page 3
> actually pulled the last possible value; page 4 is empty, and we're
> all done.  Generally speaking you know you're done when your first
> clustering key is the only non-equality operator in the statement, and
> you got no rows back.
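The two-query scheme Eric walks through can be sketched as a Python
simulation over the same sample data (`query` and `next_page` are
hypothetical names for illustration; `query` stands in for the CQL
statements quoted above):

```python
# In-memory stand-in for the CQL queries, honoring the clustering
# order (ck1 ASC, ck2 DESC) of the sample table.

def query(rows, limit, ck1_eq=None, ck2_lt=None, ck1_gt=None):
    out = sorted(rows, key=lambda r: (r[1], -r[2]))  # clustering order
    if ck1_eq is not None:
        out = [r for r in out if r[1] == ck1_eq and r[2] < ck2_lt]
    if ck1_gt is not None:
        out = [r for r in out if r[1] > ck1_gt]
    return out[:limit]

def next_page(rows, page_size, state=None):
    """Return (page, new_state); state is the last (ck1, ck2) seen."""
    if state is None:
        page = query(rows, page_size)             # first page: unrestricted
    else:
        ck1, ck2 = state
        page = query(rows, page_size, ck1_eq=ck1, ck2_lt=ck2)
        if len(page) < page_size:                 # fall through to next ck1
            page += query(rows, page_size - len(page), ck1_gt=ck1)
    new_state = (page[-1][1], page[-1][2]) if page else None
    return page, new_state

# (partitionkey, ck1, ck2, col1, col2) -- the sample data from the thread
rows = [(1, 1, 1, 1, 1), (1, 1, 2, 2, 2), (1, 1, 3, 3, 3),
        (1, 2, 1, 4, 4), (1, 2, 2, 5, 5), (1, 2, 3, 6, 6)]
```

Feeding each page's state into the next call reproduces pages 1-3
above; an empty page (state None) means we are done.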
>
>
>
>
>
>
> On Wed, Feb 11, 2015 at 10:55 AM, Ajay <ajay.ga...@gmail.com
> <mailto:ajay.ga...@gmail.com>> wrote:
>
>     Basically I am trying different queries with your approach.
>
>     One such query is like
>
>     SELECT * FROM mycf WHERE <condition on partition key> ORDER BY
>     ck1 ASC, ck2 DESC, where ck1 and ck2 are clustering keys in that
>     order.
>
>     Here how do we achieve pagination support?
>
>     Thanks
>     Ajay
>
>     On Feb 11, 2015 11:16 PM, "Ajay" <ajay.ga...@gmail.com
>     <mailto:ajay.ga...@gmail.com>> wrote:
>
>
>         Hi Eric,
>
>         Thanks for your reply.
>
>         I am using Cassandra 2.0.11, and there I cannot append a
>         condition like "last clustering key column > value from the
>         last row of the previous batch". It fails with "Preceding
>         column is either not restricted or by a non-EQ relation",
>         which means I need to specify an equality condition for all
>         preceding clustering key columns. With this I cannot get the
>         pagination right.
>
>         Thanks
>         Ajay
>
>         > I can't believe that everyone read & process all rows at
>         once (without pagination).
>
>         Probably not too many people try to read all rows in a table
>         as a single rolling operation with a standard client driver. 
>         But those who do would use token() to keep track of where they
>         are and be able to resume with that as well.
>
>         But it sounds like you're talking about paginating a subset of
>         data - larger than you want to process as a unit, but
>         prefiltered by some other criteria which prevents you from
>         being able to rely on token().  For this there is no general
>         purpose solution, but it typically involves you maintaining
>         your own paging state, typically keeping track of the last
>         partitioning and clustering key seen, and using that to
>         construct your next query.
>
>         For example, we have client queries which can span several
>         partitioning keys.  We make sure that the List of partition
>         keys generated by a given client query List(Pq) is
>         deterministic, then our paging state is the index offset of
>         the final Pq in the response, plus the value of the final
>         clustering column.  A query coming in with a paging state
>         attached to it starts the next set of queries from the
>         provided Pq offset where clusteringKey > the provided value.
>
>         So if you can just track partition key offset (if spanning
>         multiple partitions), and clustering key offset, you can
>         construct your next query from those instead.  
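The multi-partition paging state described here can be sketched in
Python (a hypothetical illustration with names of my choosing; the
`query_partition` callback stands in for one per-partition CQL query
with a clustering-key restriction):

```python
def page_across_partitions(query_partition, partition_keys, page_size, state=None):
    """query_partition(pk, after_ck, limit) returns up to `limit` rows of
    one partition with clustering key > after_ck (all rows if None).
    `state` is (index into partition_keys, last clustering value seen)."""
    idx, after_ck = state if state is not None else (0, None)
    page = []
    while idx < len(partition_keys) and len(page) < page_size:
        rows = query_partition(partition_keys[idx], after_ck, page_size - len(page))
        page.extend(rows)
        if len(page) < page_size:       # this partition is exhausted
            idx, after_ck = idx + 1, None
        else:
            after_ck = page[-1][1]      # clustering value of the last row
    return page, (idx, after_ck)

# Tiny in-memory stand-in: rows are (partitionkey, clustering_value).
data = {10: [(10, 1), (10, 2), (10, 3)], 20: [(20, 1), (20, 2)]}

def query_partition(pk, after_ck, limit):
    rows = [r for r in data[pk] if after_ck is None or r[1] > after_ck]
    return rows[:limit]
```

A short page plus an index past the end of the partition key list
signals that the result set is exhausted.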
>
>         On Tue, Feb 10, 2015 at 6:58 PM, Ajay <ajay.ga...@gmail.com
>         <mailto:ajay.ga...@gmail.com>> wrote:
>
>             Thanks Alex.
>
>             But is there any workaround possible?. I can't believe
>             that everyone read & process all rows at once (without
>             pagination).
>
>             Thanks
>             Ajay
>
>             On Feb 10, 2015 11:46 PM, "Alex Popescu"
>             <al...@datastax.com <mailto:al...@datastax.com>> wrote:
>
>
>                 On Tue, Feb 10, 2015 at 4:59 AM, Ajay
>                 <ajay.ga...@gmail.com <mailto:ajay.ga...@gmail.com>>
>                 wrote:
>
>                     1) Java driver implicitly support Pagination in
>                     the ResultSet (using Iterator) which can be
>                     controlled through FetchSize. But it is limited in
>                     a way that we cannot skip or go previous. The
>                     FetchState is not exposed.
>
>
>                 Cassandra doesn't support skipping so this is not
>                 really a limitation of the driver. 
>
>
>                 -- 
>
>                 [:>-a)
>
>                 Alex Popescu
>                 Sen. Product Manager @ DataStax
>                 @al3xandru
>
>
>
