OpenstackPaginationEmail

I think there is a lot of confusion between the two uses of the word 'marker', 
and maybe under Jay's proposal we need another word for it.

Suppose we have the following images:

PK   Created_at            Updated_at            Deleted_at
1    2011-05-25 12:00:00   2011-05-25 12:00:04
2    2011-05-25 12:00:01   2011-05-25 12:00:03
3    2011-05-25 12:00:02   2011-05-25 12:00:05
4    2011-05-25 12:00:03                         2011-05-25 12:00:09

Under the current 1.1 spec, 'marker' means the id of the last element you saw, 
such that:

/images?marker=3&limit=2

will give you the 2 images *updated* before the image with id '3':

[1, 2]

(assuming order by updated_at desc)

This is *not* what the current code does because we do not ORDER BY updated_at 
yet, as Jay pointed out.  I'm just showing what the spec wants as it is 
currently written.
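
To make the spec's meaning of 'marker' concrete, here is a minimal Python 
sketch of the lookup above.  It is not nova's code: IMAGES and page_by_marker 
are hypothetical names, and I left image 4 out on the assumption that a row 
with no updated_at (and a deleted_at) would not be listed.  It also sorts the 
whole list in Python, which is exactly the kind of logic we want to move out 
of Python later.

```python
from datetime import datetime

# Hypothetical in-memory copy of the example table: (id, updated_at).
# Assumption: image 4 has no updated_at and is deleted, so it is left out.
IMAGES = [
    (1, datetime(2011, 5, 25, 12, 0, 4)),
    (2, datetime(2011, 5, 25, 12, 0, 3)),
    (3, datetime(2011, 5, 25, 12, 0, 5)),
]


def page_by_marker(images, marker=None, limit=2):
    """Return the ids that follow `marker` when sorted by updated_at desc."""
    ordered = [
        image_id
        for image_id, _ in sorted(images, key=lambda row: row[1], reverse=True)
    ]
    # Start just past the marker; with no marker, start at the first row.
    start = ordered.index(marker) + 1 if marker is not None else 0
    return ordered[start:start + limit]


print(page_by_marker(IMAGES, marker=3, limit=2))  # [1, 2]
```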

If I understand Jay's ideas correctly, he wants us to pass marker (a different 
kind of marker), offset, AND limit.  So a query would go something like this:

/images?marker=<timestamp>&offset=3&limit=10

I believe that marker can be left empty here, and it will default to now(), but 
whatever <timestamp> gets set to, it will return images that were created 
*at or before* <timestamp> and that were either never deleted or deleted 
*after* <timestamp>.  The main advantage here is that it gives you a 
persistent 'collection snapshot' of your query results, based on the time that 
you made the initial query.  If it takes you a minute to page through results, 
and images were deleted or added during that time, it won't throw off your 
pagination as long as you keep your marker constant.

If we passed in marker = '2011-05-25 12:00:03', offset = '0', and limit = '4', 
we would get:

[4, 3, 2, 1]

(assuming order by created_at desc)

using Jay's method.  If we kept everything the same, but passed in '2011-05-25 
12:00:10' as the marker, image 4 would not be on the list because by that time 
image 4 had already been deleted.
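
Here is a rough Python sketch of how I read the timestamp marker, using the 
table above.  The names (ts, IMAGES, snapshot_page) are made up for 
illustration and are not from Jay's proposal.  I am also assuming the 
created_at comparison is inclusive, since the example returns image 4, which 
was created exactly at the marker time.

```python
from datetime import datetime


def ts(hms):
    """Parse a time of day on 2011-05-25, e.g. '12:00:03'."""
    return datetime.strptime("2011-05-25 " + hms, "%Y-%m-%d %H:%M:%S")


# (id, created_at, deleted_at) from the example table; None means not deleted.
IMAGES = [
    (1, ts("12:00:00"), None),
    (2, ts("12:00:01"), None),
    (3, ts("12:00:02"), None),
    (4, ts("12:00:03"), ts("12:00:09")),
]


def snapshot_page(images, marker, offset=0, limit=10):
    """Page through the images that existed at time `marker`."""
    visible = [
        (image_id, created)
        for image_id, created, deleted in images
        # Assumption: created at or before the marker, and not yet deleted.
        if created <= marker and (deleted is None or deleted > marker)
    ]
    visible.sort(key=lambda row: row[1], reverse=True)  # created_at desc
    return [image_id for image_id, _ in visible[offset:offset + limit]]


print(snapshot_page(IMAGES, ts("12:00:03"), offset=0, limit=4))  # [4, 3, 2, 1]
print(snapshot_page(IMAGES, ts("12:00:10"), offset=0, limit=4))  # [3, 2, 1]
```

As long as the client keeps sending the same marker, the 'visible' set stays 
the same from page to page, so offset keeps pointing at the same rows.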

Please correct me if something above is incorrect.


As for thoughts, I talked with Mark Washenberger and Brian Waldon, and we came 
up with 2 possible ways to move forward.  

Things that we agree on in both cases:
        
* The current way nova handles paging is inefficient, and needs to be improved. 
* We need to use ORDER BY in all of our queries, and not assume that ids will 
be ordered by time.  
* We order our queries by created_at, *not* updated_at as specified in the 
current spec (you can see the confusion this may cause in my first example 
above).

I personally like Jay's proposal (except maybe keeping 'pages' out for now in 
favor of just having 1 way to do things, rather than many ways to do the same 
thing), but I feel that the term 'marker' should probably be renamed.  Maybe 
'timestamp' would be better?  I'm open to other suggestions.

Another idea that we had was to still use marker/limit with marker being an id, 
but to move the existing inefficient python logic into the db layer.  This will 
give us the sharding/scaling advantages that Greg mentioned, and also get rid 
of a lot of the problems Jay outlined with our current implementation.

With this method, however, I believe we will need glance to support 
marker/limit as well in order to make things efficient.  
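
As a sketch of what 'moving the logic into the db layer' could look like, here 
is a small example using Python's sqlite3 module with an in-memory table.  
This is not nova's or glance's actual schema or query; the table layout and 
the page_after helper are assumptions for illustration, and deleted_at 
filtering is left out to keep it short.  The point is that the database does 
the ordering and the LIMIT, with id as a tie-breaker for equal created_at 
values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO images (id, created_at) VALUES (?, ?)",
    [
        (1, "2011-05-25 12:00:00"),
        (2, "2011-05-25 12:00:01"),
        (3, "2011-05-25 12:00:02"),
        (4, "2011-05-25 12:00:03"),
    ],
)


def page_after(conn, marker=None, limit=2):
    """Return up to `limit` ids that sort after image `marker`.

    Ordering is created_at desc, then id desc to break ties.
    """
    if marker is None:
        rows = conn.execute(
            "SELECT id FROM images ORDER BY created_at DESC, id DESC LIMIT ?",
            (limit,),
        )
    else:
        # Join against the marker row so the db compares sort keys itself.
        rows = conn.execute(
            """
            SELECT i.id
            FROM images AS i, images AS m
            WHERE m.id = ?
              AND (i.created_at < m.created_at
                   OR (i.created_at = m.created_at AND i.id < m.id))
            ORDER BY i.created_at DESC, i.id DESC
            LIMIT ?
            """,
            (marker, limit),
        )
    return [row[0] for row in rows]


print(page_after(conn))            # [4, 3]
print(page_after(conn, marker=3))  # [2, 1]
```

Only 'limit' rows ever leave the database, instead of the full list being 
fetched and sliced in Python.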

So, to summarize, our two suggestions currently are:

1) Follow Jay's proposal, but find a better name for 'marker'.
2) Keep with marker/limit but still move inefficient logic out of python to db 
layer, and have glance support marker/limit as well.

Thoughts on these two paths of moving forward?  Anyone have ideas for other 
routes we could take?



-----Original Message-----
From: "Jay Pipes" <jaypi...@gmail.com>
Sent: Wednesday, May 25, 2011 15:57
To: "Greg Holt" <gh...@rackspace.com>
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Getting pagination right

On Wed, May 25, 2011 at 3:43 PM, Greg Holt <gh...@rackspace.com> wrote:
> Okay, I give up then. Not sure what's different with what you have vs. Swift 
> dbs. Just trying to offer up what we do and have been doing for a while now.

The pagination in Swift is not consistent. Inserts into the Swift
databases between the time of the initial query and the request for
the "next page" can result in rows from the original first page
ending up on the new second page.

Code in swift/common/db.py lines 958 through 974 shows an ORDER BY
name. Newly inserted objects (or records that are deleted) with a name
value > marker and < end_marker can result in a page changing its
contents on refresh. This is why I was saying it's not a consistent
view of the data.

-jay

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


