Also, using marker/limit rather than offset/limit makes it much easier to shard 
for scale if desired. One of the reasons we chose that route with Swift. 
Regular databases with indexes work better with marker/limit than offset/limit 
too.

On May 25, 2011, at 12:38 PM, Justin Santa Barbara wrote:

> I definitely agree that the paging logic is questionable; we definitely 
> should be specifying the sort order if we're relying on it, as I believe a 
> database is otherwise free to return results however it sees fit.
> 
> I'm not convinced that the proposed logic around the queries and update 
> timestamps is typo-free.  But even if it was correct, the problem with 
> timestamps is that I don't think you can get a sensible set of results.  If 
> you order by increasing timestamp, then I think you see items repeatedly if 
> they are concurrently updated.  If you order by decreasing timestamp, I think 
> you can miss an item altogether if it's updated in between page fetches.
> 
> I think the 'normal' way to make this work is to order by primary key, and to 
> have the marker be the PK of the 'last seen row'.  Because the client can't 
> know what the pk is, the marker is normally just an opaque value.  It's also 
> (normally) much easier for the DB to sort by primary key than by any other 
> value.
> 
> The behaviour of an iteration using the PK is fairly nice.  You never see a 
> row twice, so you don't need to de-dup client side.  You are guaranteed to 
> see all rows, unless they are concurrently inserted or deleted.  If a row is 
> inserted you'll see it if you haven't yet fetched the page it would be on.  
> If a row is deleted you don't see it unless you've already fetched the page 
> it was on.
> 
> Zones may complicate any scheme, though I think both proposed methods could 
> be made to work.  I think in general the next N rows have to be fetched from 
> each child zone and then combined (top N), throwing away the extra records.   
> One way around this would be to define an order on the child zones, and then 
> show the records from zone 1, then those from zone 2 etc etc.  You just need 
> to map from the marker to a child zone, and if you don't get the full 
> complement of rows you need to merge in the first results from the next zone 
> as well.
> 
> The SQL query using the PK would look like this (and there's no need for an 
> OFFSET clause):
> 
> SELECT *
> FROM <table>
> ORDER BY <pk>
> WHERE <pk> > @marker
> AND (<additional filters>)
> LIMIT <pagesize>
> 
> Without additional filters, that's a _very_ efficient query for a sensible 
> database.
> 
> Justin
> 
> 
> 
> On Wed, May 25, 2011 at 9:28 AM, Jay Pipes <jaypi...@gmail.com> wrote:
> When a user issues a GET request for a resource collection, it's clear
> that you don't want to return ALL results in the collection. Returning
> hundreds of thousands of records per request isn't particularly a good
> idea.
> 
> Therefore, we use the concept of "pagination" -- returning a subset of
> records matching the search criteria.
> 
> In Glance's API, we've added this ability (currently in code review),
> but our approach is a bit at odds with the way that the OpenStack
> Compute API is structured. I'd like to explain why I think our
> approach is better, both in terms of usability and in terms of
> efficiency.
> 
> First, here is how Nova does its pagination in the OpenStack API.
> 
> 1) Look for a "marker" query parameter in the request
> 2) Look for a "limit" query parameter in the request, otherwise use a
> default max limit.
> 3) If marker parameter is found, the value of the "marker" query
> parameter is assumed to be an *integer* and assumed to be the
> *identifier of the last object on the previous "page" of results*
> 4) Request *all* results
> 5) In the code, find the "page" of results by looking for the object
> in the set of results that matches the marker parameter
> 6) When found, grab the next object in the result set, plus X objects,
> where X is the limit parameter
> 
> The relevant code is in /nova/api/openstack/common.py, and looks like so:
> 
> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>    """Return a slice of items according to the requested marker and limit."""
> 
> <snipped for brevity... >
> 
>    limit = min(max_limit, limit)
>    start_index = 0
>    if marker:
>        start_index = -1
>        for i, item in enumerate(items):
>            if item['id'] == marker:
>                start_index = i + 1
>                break
>        if start_index < 0:
>            raise webob.exc.HTTPBadRequest(_('marker [%s] not found' % marker))
>    range_end = start_index + limit
>    return items[start_index:range_end]
> 
> There are three MAJOR problems with the above method of pagination:
> 
> 1) Very inefficient
> 
> The Python code does the job of determining the page of results to
> find. This means that the database query returns EVERY row in the
> resultset, instead of only returning a small subset of the results
> using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands
> of records in the various databases, this method is simply not
> scalable.
> 
> 2) There is no ordering of the results in the queries the Nova
> database API makes
> 
> The only queries in the Nova database API that order results is the
> instance_type_get_all() method.
> 
> Without an order to the search results, paging is almost meaningless.
> A page of results depends on the order of the list of results. The
> reason the existing Nova code works at all is because of the fact that
> all objects have an autoincrementing integer primary key, and by
> nature, this autoincrementing integer primary key increases over time.
> If the identifier was switched to, say, UUID, the results ordering
> would fall apart completely, and "pages" of results would get skewed
> and corrupted as records were inserted over time.
> 
> For instance, assume a character primary key (like UUID keys would be
> stored in hex string format). Assume the following simplified set of
> instances:
> 
> PK         Created_at
> ABC       2011-05-25 12:00:00
> DEF       2011-05-25 12:00:01
> GHJ       2011-05-25 12:00:02
> KLM       2011-05-25 12:00:03
> 
> Assume you want to get a list of instances with GET /servers?limit=2.
> With the existing code, you'd get this:
> 
> ABC       2011-05-25 12:00:00
> DEF       2011-05-25 12:00:01
> 
> With GET /servers?limit=2?marker=DEF, you'd get this:
> 
> GHJ       2011-05-25 12:00:00
> KLM       2011-05-25 12:00:03
> 
> However, let's say a new instance is added:
> 
> BCD       2011-05-25 12:00:04
> 
> Because there is no sort order specified in the database queries, the
> order of results is indeterminate. Page 1 may now contain the BCD
> record, or it may not. It totally depends on the implementation of the
> relational database, because the SQL standard does not mandate a
> default sort order when one is not specified in the ORDER BY clause.
> 
> 3) The marker parameter assumes an autoincrementing primary key
> 
> Because the marker parameter assumes an autoincrementing primary key,
> the paging of results currently DOES solve the problem of new records
> messing up paging.
> 
> As an example, let's assume I've fixed #2 above and put a default sort
> order on the primary key, descending. Also assume a simple resultset
> like so:
> 
> PK         Created_at
> 1      2011-05-25 12:00:00
> 2      2011-05-25 12:00:01
> 3      2011-05-25 12:00:02
> 4      2011-05-25 12:00:03
> 
> Assume I make a request for GET /servers?limit=2. I would get:
> 
> 4      2011-05-25 12:00:03
> 3      2011-05-25 12:00:02
> 
> The results returned would be the "last" 10 server instances created.
> If I did a request for GET /servers?limit=2?marker=3, I would get the
> results:
> 
> 2      2011-05-25 12:00:01
> 1      2011-05-25 12:00:00
> 
> If I add a new instance, say:
> 
> 5      2011-05-25 12:00:05
> 
> and did the request for GET /servers?limit=2?marker=3, I would still
> get the same results:
> 
> 2      2011-05-25 12:00:01
> 1      2011-05-25 12:00:00
> 
> However, the ONLY reason this works is because the primary key is
> auto-incrementing, and new records always come at the "end" of the
> primary key. If I change the primary key to a UUID, the call to GET
> /servers?limit=2?marker=3 would no longer be guaranteed to return the
> same results. The newly-added instance's UUID may come between the
> values of the primary keys in the original "page 2", and therefore
> messed up the consistency of the pagination of my original query.
> 
> The way to avoid this inconsistency is to ensure that you have a
> consistent view of the set of results of your original query. To do
> that, you must always use a WHERE condition on a temporal,
> always-incrementing (timestamp) column.
> 
> In SQL, this is what is necessary to have a consistent view of pages
> of a result set (assuming 10 elements to a page):
> 
> Page 1:
> 
> SELECT i.* FROM instances i
> ORDER BY i.id DESC;
> LIMIT 10;
> 
> SET @marker = <ID of *first* record>
> SET @updated_at_marker = SELECT updated_at FROM instances WHERE id = @marker;
> 
> Page 2:
> 
> SELECT i.* FROM instances i
> WHERE updated_at <= @updated_at_marker
> ORDER BY i.id DESC
> LIMIT 10 OFFSET 10;
> 
> 
> SUMMARY:
> 
> My proposal is to overhaul the way that Nova does pagination so that we:
> 
> 1) Pass the LIMIT and OFFSET parameters down into the database API queries
> 2) Add a default ORDER BY for all queries returning result sets
> 3) Add a "page" query parameter that would be used to calculate the
> OFFSET programmatically
> 4) Change the concept of the "marker" parameter to refer to *the ID of
> the FIRST record returned in the original query*
> 
> Thoughts?
> 
> -jay
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

Reply via email to