We can handle pagination whether we have a single database, multiple databases with a cache, or query each zone on each request. In the last case, an instance would be identified by the zone it exists in (for example, the marker would be a fully qualified zone:instance name), and we can just pick up where we left off, using a deterministic order of zones/instances for all API frontends. I don't think we need this, though; we need an active cache with one DB per zone (the same thing I've been saying since the Austin summit).
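To make the query-each-zone case concrete, here is a minimal sketch of marker-based paging over a deterministic zone/instance order. Everything in it is an assumption for illustration: the `paginate` function, the in-memory `zones` dict standing in for per-zone queries, plain name sorting as the "deterministic order", and zone names that contain no colon. It is not how any existing API frontend works.

```python
from typing import Dict, List, Optional, Tuple


def paginate(zones: Dict[str, List[str]], limit: int,
             marker: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return up to `limit` zone:instance names after `marker`, plus the next marker.

    `zones` is a hypothetical stand-in for asking each zone for its instances.
    """
    # Split "zone:instance" once; assumes zone names never contain ":".
    marker_key = tuple(marker.split(":", 1)) if marker else None
    page: List[str] = []
    for zone in sorted(zones):                    # same zone order on every frontend
        if marker_key and zone < marker_key[0]:
            continue                              # zone fully consumed by earlier pages
        for instance in sorted(zones[zone]):      # same instance order within a zone
            if marker_key and (zone, instance) <= marker_key:
                continue                          # already returned on an earlier page
            page.append(f"{zone}:{instance}")
            if len(page) == limit:
                return page, page[-1]             # last item doubles as the next marker
    return page, None                             # ran out of instances: no next marker


# Hypothetical data: two zones, listed out of order on purpose.
zones = {"zone-b": ["i-2", "i-1"], "zone-a": ["i-3", "i-1"]}
first_page, next_marker = paginate(zones, limit=3)
second_page, end_marker = paginate(zones, limit=3, marker=next_marker)
```

Because the marker carries the zone name, any frontend can resume without shared state, but every page still has to walk the zones in order, which is the cost an active cache avoids.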
I have a number of issues with using a central DB for this application, but I'll save my usual rant and focus on a main issue you already mentioned: hybrid clouds. If someone stands up a large public cloud, let's say dozens of zones, and customers are allowed to connect their private clouds to their accounts (possibly thousands of zones), do folks expect to use a central DB? If so, please explain in detail how this will work, focusing on scalability and security.

I propose we stick with the original proposal of each zone having its own DB and the ability to do active caching for zones that need it (aggregate zones). We should be doing active caching so we don't have the staleness issues that Ed mentions. All records should be timestamped (and indexed) so parent zones can efficiently ask for "all updates since X" if they need to resync. Child zones will push updates to any subscribed parent zones, which can keep a list that should hardly ever be out of sync (for listing/pagination/etc.). We should batch updates between each zone level to ensure efficient data flow.

-Eric

On Wed, Mar 16, 2011 at 04:45:46PM +0000, Ed Leafe wrote:
> On Mar 16, 2011, at 12:23 PM, Paul Voccio wrote:
>
> > Not only is this expensive, but there is no way I can see at the moment to
> > do pagination, which is what makes this really expensive. If someone asked
> > for an entire list of all their instances and it was > 10,000 then I would
> > think they're ok with waiting while that response is gathered and returned.
> > However, since the API spec says we should be able to do pagination, this
> > is where asking each zone for all its children every time gets untenable.
>
> This gets us into the caching issues that were discussed at the last
> summit. We could run the query and then cache the results at the endpoint,
> but this would require accepting some level of staleness of the results.
> The cache would handle the paging, and some sort of TTL would have to be
> established as a balance between performance and staleness.
>
> -- Ed Leafe
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp