Also, using marker/limit rather than offset/limit makes it much easier to shard for scale if desired; that's one of the reasons we chose that route with Swift. Regular databases with indexes also handle marker/limit better than offset/limit.
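To make the offset/limit vs. marker/limit difference concrete, here is a minimal sketch of the two query shapes (plain Python/sqlite3 against a made-up "objects" table; this is not actual Swift or Nova code):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO objects (name) VALUES (?)",
                     [("obj%03d" % i,) for i in range(100)])

    PAGE_SIZE = 10

    def page_by_offset(page):
        # offset/limit: the database has to walk past page * PAGE_SIZE rows
        # before it can return anything, so the cost grows with the offset.
        return [row[0] for row in conn.execute(
            "SELECT name FROM objects ORDER BY name LIMIT ? OFFSET ?",
            (PAGE_SIZE, page * PAGE_SIZE))]

    def page_by_marker(marker=""):
        # marker/limit: an index seek straight to the marker, then a short
        # scan; the query costs the same no matter how deep the listing goes.
        return [row[0] for row in conn.execute(
            "SELECT name FROM objects WHERE name > ? ORDER BY name LIMIT ?",
            (marker, PAGE_SIZE))]

    page1 = page_by_marker()
    page2 = page_by_marker(marker=page1[-1])
    assert page2 == page_by_offset(1)

The marker query can also be sent unchanged to every shard and the per-shard pages merged, whereas an offset can't be pushed down to a shard without knowing how many matching rows the other shards hold.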
On May 25, 2011, at 12:38 PM, Justin Santa Barbara wrote:

> I definitely agree that the paging logic is questionable; we definitely should be specifying the sort order if we're relying on it, as I believe a database is otherwise free to return results however it sees fit.
>
> I'm not convinced that the proposed logic around the queries and update timestamps is typo-free. But even if it were correct, the problem with timestamps is that I don't think you can get a sensible set of results. If you order by increasing timestamp, I think you see items repeatedly if they are concurrently updated. If you order by decreasing timestamp, I think you can miss an item altogether if it's updated in between page fetches.
>
> I think the 'normal' way to make this work is to order by primary key, and to have the marker be the PK of the 'last seen row'. Because the client can't know what the PK is, the marker is normally just an opaque value. It's also (normally) much easier for the DB to sort by primary key than by any other value.
>
> The behaviour of an iteration using the PK is fairly nice. You never see a row twice, so you don't need to de-dup client side. You are guaranteed to see all rows, unless they are concurrently inserted or deleted. If a row is inserted, you'll see it if you haven't yet fetched the page it would be on. If a row is deleted, you don't see it unless you've already fetched the page it was on.
>
> Zones may complicate any scheme, though I think both proposed methods could be made to work. I think in general the next N rows have to be fetched from each child zone and then combined (top N), throwing away the extra records. One way around this would be to define an order on the child zones, and then show the records from zone 1, then those from zone 2, and so on. You just need to map from the marker to a child zone, and if you don't get the full complement of rows you need to merge in the first results from the next zone as well.
>
> The SQL query using the PK would look like this (and there's no need for an OFFSET clause):
>
> SELECT *
> FROM <table>
> WHERE <pk> > @marker
>   AND (<additional filters>)
> ORDER BY <pk>
> LIMIT <pagesize>
>
> Without additional filters, that's a _very_ efficient query for a sensible database.
>
> Justin
>
>
> On Wed, May 25, 2011 at 9:28 AM, Jay Pipes <jaypi...@gmail.com> wrote:
> When a user issues a GET request for a resource collection, it's clear that you don't want to return ALL results in the collection. Returning hundreds of thousands of records per request isn't particularly a good idea.
>
> Therefore, we use the concept of "pagination" -- returning a subset of records matching the search criteria.
>
> In Glance's API, we've added this ability (currently in code review), but our approach is a bit at odds with the way that the OpenStack Compute API is structured. I'd like to explain why I think our approach is better, both in terms of usability and in terms of efficiency.
>
> First, here is how Nova does its pagination in the OpenStack API:
>
> 1) Look for a "marker" query parameter in the request
> 2) Look for a "limit" query parameter in the request, otherwise use a default max limit.
> 3) If the marker parameter is found, its value is assumed to be an *integer* and assumed to be the *identifier of the last object on the previous "page" of results*
> 4) Request *all* results
> 5) In the code, find the "page" of results by looking for the object in the set of results that matches the marker parameter
> 6) When found, grab the next object in the result set, plus X objects, where X is the limit parameter
>
> The relevant code is in /nova/api/openstack/common.py, and looks like so:
>
> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>     """Return a slice of items according to the requested marker and limit."""
>
>     <snipped for brevity...>
>
>     limit = min(max_limit, limit)
>     start_index = 0
>     if marker:
>         start_index = -1
>         for i, item in enumerate(items):
>             if item['id'] == marker:
>                 start_index = i + 1
>                 break
>         if start_index < 0:
>             raise webob.exc.HTTPBadRequest(_('marker [%s] not found' % marker))
>     range_end = start_index + limit
>     return items[start_index:range_end]
>
> There are three MAJOR problems with the above method of pagination:
>
> 1) Very inefficient
>
> The Python code does the job of determining the page of results to find. This means that the database query returns EVERY row in the result set, instead of only returning a small subset of the results using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands of records in the various databases, this method is simply not scalable.
>
> 2) There is no ordering of the results in the queries the Nova database API makes
>
> The only query in the Nova database API that orders its results is the instance_type_get_all() method.
>
> Without an order to the search results, paging is almost meaningless. A page of results depends on the order of the list of results. The reason the existing Nova code works at all is that all objects have an auto-incrementing integer primary key, which by nature increases over time. If the identifier were switched to, say, a UUID, the results ordering would fall apart completely, and "pages" of results would get skewed and corrupted as records were inserted over time.
>
> For instance, assume a character primary key (as UUID keys, stored in hex string format, would be). Assume the following simplified set of instances:
>
> PK   Created_at
> ABC  2011-05-25 12:00:00
> DEF  2011-05-25 12:00:01
> GHJ  2011-05-25 12:00:02
> KLM  2011-05-25 12:00:03
>
> Assume you want to get a list of instances with GET /servers?limit=2. With the existing code, you'd get this:
>
> ABC  2011-05-25 12:00:00
> DEF  2011-05-25 12:00:01
>
> With GET /servers?limit=2&marker=DEF, you'd get this:
>
> GHJ  2011-05-25 12:00:02
> KLM  2011-05-25 12:00:03
>
> However, let's say a new instance is added:
>
> BCD  2011-05-25 12:00:04
>
> Because there is no sort order specified in the database queries, the order of results is indeterminate. Page 1 may now contain the BCD record, or it may not. It totally depends on the implementation of the relational database, because the SQL standard does not mandate a default sort order when one is not specified in the ORDER BY clause.
>
> 3) The marker parameter assumes an auto-incrementing primary key
>
> Because the marker parameter assumes an auto-incrementing primary key, the paging of results currently DOES solve the problem of new records messing up paging.
>
> As an example, let's assume I've fixed #2 above and put a default sort order on the primary key, descending. Also assume a simple result set like so:
>
> PK  Created_at
> 1   2011-05-25 12:00:00
> 2   2011-05-25 12:00:01
> 3   2011-05-25 12:00:02
> 4   2011-05-25 12:00:03
>
> Assume I make a request for GET /servers?limit=2. I would get:
>
> 4   2011-05-25 12:00:03
> 3   2011-05-25 12:00:02
>
> The results returned would be the "last" 2 server instances created. If I did a request for GET /servers?limit=2&marker=3, I would get the results:
>
> 2   2011-05-25 12:00:01
> 1   2011-05-25 12:00:00
>
> If I add a new instance, say:
>
> 5   2011-05-25 12:00:05
>
> and did the request for GET /servers?limit=2&marker=3, I would still get the same results:
>
> 2   2011-05-25 12:00:01
> 1   2011-05-25 12:00:00
>
> However, the ONLY reason this works is that the primary key is auto-incrementing, and new records always come at the "end" of the primary key. If I changed the primary key to a UUID, the call to GET /servers?limit=2&marker=3 would no longer be guaranteed to return the same results. The newly added instance's UUID might come between the values of the primary keys in the original "page 2", and therefore mess up the consistency of the pagination of my original query.
>
> The way to avoid this inconsistency is to ensure that you have a consistent view of the set of results of your original query. To do that, you must always use a WHERE condition on a temporal, always-incrementing (timestamp) column.
>
> In SQL, this is what is necessary to have a consistent view of pages of a result set (assuming 10 elements to a page):
>
> Page 1:
>
> SELECT i.* FROM instances i
> ORDER BY i.id DESC
> LIMIT 10;
>
> SET @marker = <ID of *first* record>;
> SET @updated_at_marker = (SELECT updated_at FROM instances WHERE id = @marker);
>
> Page 2:
>
> SELECT i.* FROM instances i
> WHERE updated_at <= @updated_at_marker
> ORDER BY i.id DESC
> LIMIT 10 OFFSET 10;
>
> SUMMARY:
>
> My proposal is to overhaul the way that Nova does pagination so that we:
>
> 1) Pass the LIMIT and OFFSET parameters down into the database API queries
> 2) Add a default ORDER BY for all queries returning result sets
> 3) Add a "page" query parameter that would be used to calculate the OFFSET programmatically
> 4) Change the concept of the "marker" parameter to refer to *the ID of the FIRST record returned in the original query*
>
> Thoughts?
>
> -jay
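On Justin's point that the marker is normally just an opaque value: one way to get that property (a purely illustrative sketch, not something the current code does) is to hand the client a base64-wrapped copy of the last-seen primary key:

    import base64

    def encode_marker(last_seen_pk):
        # The client just echoes this string back; it never needs to know
        # that it is really the primary key of the last row it was shown.
        return base64.urlsafe_b64encode(last_seen_pk.encode("utf-8")).decode("ascii")

    def decode_marker(marker):
        return base64.urlsafe_b64decode(marker.encode("ascii")).decode("utf-8")

The server decodes the marker back into the WHERE <pk> > :marker bound, and the representation of the key (integer, UUID, whatever) stays an implementation detail.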
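And on the zones question: Justin's "fetch the next N rows from each child zone, combine, and keep the top N" merge falls out naturally once each zone supports marker paging. A toy sketch (the in-memory zones and the list_servers() helper are invented for illustration; this is not real Nova zone code):

    PAGE_SIZE = 3

    # Hypothetical stand-ins for two child zones; in practice each would be
    # a separate API call such as GET /servers?marker=<m>&limit=<n>.
    ZONES = {
        "zone1": ["a1", "c3", "e5", "g7"],
        "zone2": ["b2", "d4", "f6", "h8"],
    }

    def list_servers(zone, marker, limit):
        # Per-zone keyset page: IDs greater than the marker, in PK order.
        return [sid for sid in sorted(ZONES[zone]) if sid > marker][:limit]

    def merged_page(marker=""):
        # Fetch the next page from every child zone, combine them, keep the
        # top PAGE_SIZE rows and throw the extras away.
        candidates = []
        for zone in ZONES:
            candidates.extend(list_servers(zone, marker, PAGE_SIZE))
        page = sorted(candidates)[:PAGE_SIZE]
        return page, (page[-1] if page else None)

    page, marker = merged_page()        # ['a1', 'b2', 'c3'], marker 'c3'
    page, marker = merged_page(marker)  # ['d4', 'e5', 'f6'], marker 'f6'

Because every zone answers the same "WHERE pk > marker ORDER BY pk LIMIT n" question against its own database, no zone ever needs to reason about a global offset.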
_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp