Hi Rich,

Thanks for contributing!
Regarding the 412: you're right, I mainly went by the 412 being returned if a database exists and we try to create one [1]. The precondition there is in the start line of the HTTP request, just like `...?limit=...` would be. However, a 400 would work better in general, so I agree. In principle I do like the bookmarks better, but it also seems like a large change to the API, so I am on the fence there. Also, interestingly, it would almost be easier to change _all_dbs, _dbs_info and other endpoints which return plain arrays to return {"total": ..., "items": [...], ...} objects, since the breakage would be really obvious; in other words, old clients will fail right away as opposed to returning misleading data.

Cheers,
-Nick

[1] https://docs.couchdb.org/en/stable/api/database/common.html#put--db

On Thu, Apr 9, 2020 at 12:10 PM Richard Ellis <ricel...@uk.ibm.com> wrote:
>
> Hi Nick,
>
> I think that if client side code is expected to make multiple requests, then those requests should be made as easy as possible. So whilst a client library can implement complicated pagination recipes (like the current Couch view one), it is much simpler to collect and send a single bookmark/token. Especially so if the naming and structural position of the bookmark in requests and responses is consistent across all endpoints supporting pagination, such that the client side code for pagination is easily reusable. I'm in favour of anything supporting pagination providing a bookmark/token based system.
>
> Also, if there are caps applied to limits, then I'd expect anything out of the accepted range to be a 400 Bad Request. IIUC, 412 Precondition Failed has a specific meaning relating to matching headers (https://tools.ietf.org/html/rfc2616#section-10.4.13), which I don't think applies in this case.
>
> Rich
>
>
>
> From: Nick Vatamaniuc <vatam...@gmail.com>
> To: dev@couchdb.apache.org
> Date: 09/04/2020 00:25
> Subject: [EXTERNAL] Re: [DISCUSS] Streaming API in CouchDB 4.0
>
> Thanks for replying, Adam!
>
> Thinking about it some more, it seems there are two benefits to changing the streaming APIs:
>
> 1) To provide users with a serializable snapshot. We don't currently have that, as Mike pointed out, unless we use n=1&q=1 or CouchDB version 1.x. It would be nice to get that with a new release.
>
> 2) To avoid the general anti-pattern of streaming all the data in a single request or using very large skip or limit values.
>
> However, I think the two improvements are not necessarily tied to each other. For example, we could set configurable mandatory max limits (option E) for all the streaming endpoints in 3.x as well. On the other hand, even with a single transaction we could stream, say, 150k rows in 5 seconds. If at some future point FDB allowed minute-long transactions, we could stream millions of rows before timing out, and it would still not be a desirable pattern of usage. This is also basically option F in the read-only case (we can emit a snapshot as long as there are no writes to that particular db), and I think we agree that it is not that appealing of an option.
>
> What do we think, then, about having per-endpoint configurable max limits (option E)? The configuration could look something like:
>
> [request_limits]
> all_docs = 5000
> views = 2500
> list_dbs = 1000
> dbs_info = 500
>
> If those limits are set, and a request is made against an endpoint without the limit parameter, or a limit or skip is provided but is greater than the maximum, it would return back immediately with an error (412) and an indication of what the max limit value is.
>
> And I agree that client libraries are important in helping here. So for example, Cloudant client libraries could detect that error and either return it to the user or, as a compatibility mode, use a few consecutive requests behind the scenes to stream all the data back to the user as requested, without the user's application code needing any updates at all.
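>
> To sketch what that compatibility mode might look like on the client side (a rough illustration, not any real library's API; the helper name is made up, and the page size would come from whatever maximum the server advertises):
>
> import json
> import requests
>
> def all_docs_paged(base_url, db, page_size=5000):
>     # Hypothetical client-library helper: fetch _all_docs in
>     # page_size chunks but present them as one stream of rows.
>     params = {"limit": page_size}
>     while True:
>         resp = requests.get(f"{base_url}/{db}/_all_docs", params=params)
>         resp.raise_for_status()
>         rows = resp.json()["rows"]
>         yield from rows
>         if len(rows) < page_size:
>             break  # short page: no more data
>         # Resume just after the last key we saw; skip=1 avoids
>         # re-emitting that row.
>         params = {"limit": page_size,
>                   "start_key": json.dumps(rows[-1]["key"]),
>                   "skip": 1}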
>
> If those limits are not set, the API would behave as it does now. This would provide a smoother upgrade path from 3.x to 4.x, so users wouldn't have to rewrite their applications.
>
> I still think the bookmarking approach is interesting, but I think it might work better for a new API endpoint, or enabled with an explicit parameter. I can see a case where users might be using the Python requests library to fetch _all_docs, assuming it gets all the data, and then after the upgrade the same API endpoint suddenly returns only a fraction of the rows. There might be another "bookmark" field there, but it is buried under a few application layers and gets ignored. The users just notice the missing data at some point, and it could be perceived as data loss in a sense.
>
> > how much value do we derive from that streaming behavior if we aggressively limit the `limit`?
>
> Oh, good point! That makes sense. We might be able to simplify quite a bit of logic internally if we didn't actually stream the data. We buffer thousands of doc updates for _bulk_docs already, so perhaps it is not that different doing it when reading data in these APIs as well. It is something that we'd have to experiment with and see how it behaves.
>
> -Nick
>
> On Wed, Apr 1, 2020 at 9:07 PM Adam Kocoloski <kocol...@apache.org> wrote:
> >
> > This is a really important topic; thanks, Nick, for bringing it up. Sorry I didn't comment earlier. I think Mike neatly captures my perspective with this bit:
> >
> > >> Our current behaviour seems extremely subtle and, I'd argue, unexpected. It is hard to reason about if you really need a particular guarantee.
> > >>
> > >> Is there an opportunity to clarify behaviour here, such that we really _do_ guarantee point-in-time within _any_ single request, but only do this by leveraging FoundationDB's transaction isolation semantics, and as such are only able to offer this based on the 5s timeout in place? The request boundary offers a very clear-cut, user-visible boundary. This would obviously need to cover reads/writes of single docs and so on, as well as probably needing further work w.r.t. bulk docs etc.
> > >>
> > >> This restriction may naturally loosen as FoundationDB improves and the 5s timeout may be increased.
> >
> > It'd be great if we could agree on this use of serializable snapshot isolation under the hood for each response to a CouchDB API request (excepting _changes) as an optimal state.
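> >
> > To make the FDB side of that concrete, a minimal sketch against the FoundationDB Python bindings (the API version and key names here are just illustrative): every read made through one transaction handle sees a single serializable snapshot.
> >
> > import fdb
> >
> > fdb.api_version(620)
> > db = fdb.open()
> >
> > @fdb.transactional
> > def render_response(tr):
> >     # All reads through `tr` observe the same snapshot (at the
> >     # transaction's read version), within the ~5 second limit.
> >     return list(tr.get_range(b'doc/', b'doc0'))
> >
> > rows = render_response(db)  # the decorator runs this in one transaction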
> >
> > Of course, we have this complicating factor of an existing API and a community of users running applications in production against that API :) As you can imagine from the above, I'd be opposed to A); I think that squanders a real opportunity that we have here with a new major version. I also think that the return on investment for F) is too low; a large portion of our production databases see a 24/7 write load, so a code path that only activates when a DB is quiesced doesn't get my vote.
> >
> > When I look at the other options, I think it's important to take a broader view and consider the user experience in the client libraries as well as the API. Our experience at IBM Cloud is that a large majority of API requests come from a well-defined set of client libraries, and as we consider non-trivial changes to the API we can look to those libraries as a way to smooth over the API breakage, and intelligently surface new capabilities even if the least-disruptive way to introduce them to the API is a bit janky.
> >
> > As a concrete example, I would support an aggressive ceiling on `limit` and `skip` in the 4.0 API, while enhancing popular client libraries as needed to allow users to opt in to automatic pagination through larger result sets.
> >
> > Nick rightly points out that we don't have a good way to declare a read version timeout when we've already streamed a portion of the result set to the client, which is something we ought to consider even if we do apply the restrictions in E). I acknowledge that I may be opening a can of worms, but ... how much value do we derive from that streaming behavior if we aggressively limit the `limit`? We wouldn't be holding that much data in memory on the CouchDB side, and I don't think many of our clients are parsing half-completed JSON objects for anything beyond the _changes feed. Something to think about.
> >
> > Cheers, Adam
> >
> > > On Feb 25, 2020, at 2:52 PM, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> > >
> > > Hi Mike,
> > >
> > > Good point about CouchDB not actually providing point-in-time snapshots. I missed those cases when thinking about it.
> > >
> > > I wonder if that points to defaulting to option A, since it maintains the API compatibility and doesn't loosen the current constraints anyway. At least it will un-break the current version of the branch until we figure out something better. Otherwise it's completely unusable for dbs with more than 200-300k documents.
> > >
> > > I like the idea of returning a bookmark and a completed/not-completed flag. That is, it would be option D for _all_docs and map-reduce views, but instead of the complex continuation object it would be a base64-encoded, opaque object. Passing a bookmark back in as a parameter would be exclusive with passing in start, end, skip, limit, and direction (forward/reverse) parameters. For _all_dbs and _dbs_info, where we don't have a place for metadata rows, we might need a new API endpoint. And maybe that opens the door to exposing more transactional features in the API in general...
> > >
> > > Also, it seems B, C and F have too many corner cases and inconsistencies, so they can probably be discarded, unless someone disagrees.
> > >
> > > Configurable skip and limit maximums (E) may still be interesting, though they don't necessarily have to be related to transactions; they can instead be used to ensure streaming APIs are consumed in smaller chunks.
> > >
> > > Cheers,
> > > -Nick
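> > >
> > > To illustrate, the opaque bookmark could be produced and consumed roughly like this (the server side would really be Erlang; Python here just shows the shape, and the fields inside the token mirror the continuation example from option D):
> > >
> > > import base64
> > > import json
> > >
> > > def encode_bookmark(state):
> > >     # Clients treat the token as opaque and simply echo it back.
> > >     return base64.urlsafe_b64encode(json.dumps(state).encode()).decode()
> > >
> > > def decode_bookmark(token):
> > >     return json.loads(base64.urlsafe_b64decode(token.encode()))
> > >
> > > bookmark = encode_bookmark({"key": "...", "skip": 599, "limit": 5})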
> > >
> > > On Mon, Feb 24, 2020 at 7:26 AM Mike Rhodes <couc...@dx13.co.uk> wrote:
> > >>
> > >> Nick,
> > >>
> > >> Thanks for thinking this through; it's certainly subtle, and it's very unclear what a "good" solution is :(
> > >>
> > >> I have a couple of thoughts, firstly about the guarantees we currently offer, and then wondering whether there is an opportunity to improve our API by offering a single guarantee across all request types rather than bifurcating guarantees.
> > >>
> > >> ---
> > >>
> > >> The first point is that, by my reasoning, CouchDB 2.x doesn't actually offer a point-in-time guarantee of the following sort currently. I read this as you saying Couch does offer this guarantee; apologies if I'm misreading:
> > >>
> > >>> Document the API behavior change that the view of the data it presents may not be a point-in-time[4] snapshot of the DB.
> > >> ...
> > >>> [4] For example, they have a constraint that documents "a" and "z" cannot both be in the database at the same time. But when iterating, it's possible that "a" was there at the start. Then by the end, "a" was removed and "z" added, so both "a" and "z" would appear in the emitted stream. Note that FoundationDB has APIs which exhibit the same "relaxed" constraints: https://apple.github.io/foundationdb/api-python.html#module-fdb.locality
> > >>
> > >> I don't believe we offer this guarantee, because different database shards will respond to the scatter-gather inherent in a single global query type request at different times. This means that, given the following sequence of events:
> > >>
> > >> (1) The shard containing "a" may start returning at time N.
> > >> (2) "a" may be deleted at N+1, but (1) will still be streaming from time N.
> > >> (3) "z" may be written to a second shard at time N+2.
> > >> (4) That second shard may not start returning until time N+3.
> > >>
> > >> By my reasoning, "a" and "z" could thus appear in the same result set in current CouchDB, even if they never actually appear in the primary data at the same time (regardless of the latency of shard replicas coming into agreement), voiding [4].
> > >>
> > >> By my reckoning, you have point-in-time across a query request when you are working with a single shard, meaning we do have point in time for two scenarios:
> > >>
> > >> - Partitioned queries.
> > >> - Q=1 databases.
> > >>
> > >> Albeit this guarantee is still talking about the point in time of a single shard's replica rather than all replicas, meaning that further requests may produce different results if the shards are not in agreement. Which can perhaps be fixed by using stable=true.
> > >>
> > >> I _think_ the working here is correct, but I'd welcome corrections to my understanding!
> > >>
> > >> ---
> > >>
> > >> Our current behaviour seems extremely subtle and, I'd argue, unexpected. It is hard to reason about if you really need a particular guarantee.
> > >>
> > >> Is there an opportunity to clarify behaviour here, such that we really _do_ guarantee point-in-time within _any_ single request, but only do this by leveraging FoundationDB's transaction isolation semantics, and as such are only able to offer this based on the 5s timeout in place? The request boundary offers a very clear-cut, user-visible boundary. This would obviously need to cover reads/writes of single docs and so on, as well as probably needing further work w.r.t. bulk docs etc.
> > >>
> > >> This restriction may naturally loosen as FoundationDB improves and the 5s timeout may be increased.
> > >>
> > >> In this approach, my preference would be to add a closing line to the result stream containing both a bookmark (based on the FoundationDB key, perhaps, rather than the index key itself, to avoid problems with skip/limit?) and a complete/not-complete boolean, to enable clients to avoid the extra HTTP round-trip for completed result sets that Nick mentions.
> > >>
> > >> ---
> > >>
> > >> For option (F), I feel that the "it sometimes works and sometimes doesn't" effect of checking the update-seq to see if we can continue streaming will be a confusing experience. I also find something similar with option (A), where a single request covers potentially many points in time and so feels hard to reason about, although it's a bit less subtle than today.
> > >>
> > >> Footnote [2] seems quite a major problem with the single transaction approach, however, and as Nick says, it is hard to pick "good" maximums for skip -- perhaps users need to just avoid use of these in the new system given its behaviour? It feels like there's a definite "against the grain" aspect to these.
> > >>
> > >> --
> > >> Mike.
> > >>
> > >> On Wed, 19 Feb 2020, at 22:39, Nick Vatamaniuc wrote:
> > >>> Hello everyone,
> > >>>
> > >>> I'd like to discuss the shape and behavior of streaming APIs for CouchDB 4.x.
> > >>>
> > >>> By "streaming APIs" I mean APIs which stream data row by row as it gets read from the database. These are the endpoints I was thinking of:
> > >>>
> > >>> _all_docs, _all_dbs, _dbs_info and query results
> > >>>
> > >>> I want to focus on what happens when FoundationDB transactions time out after 5 seconds. Currently, all those APIs except _changes[1] feeds will crash or freeze. The reason is that the transaction_too_old error at the end of 5 seconds is retry-able by default, so the request handlers run again and end up shoving the whole request down the socket again, headers and all, which is obviously broken and not what we want.
> > >>>
> > >>> There are a few alternatives discussed in the couchdb-dev channel. I'll present some behaviors, but feel free to add more. Some ideas might have been discounted in the IRC discussion already, but I'll present them anyway in case it sparks further conversation:
> > >>>
> > >>> A) Do what _changes[1] feeds do. Start a new transaction and continue streaming the data from the next key after the last one emitted in the previous transaction. Document the API behavior change that the view of the data it presents may not be a point-in-time[4] snapshot of the DB.
> > >>>
> > >>> - Keeps the API shape the same as CouchDB <4.0. Client libraries don't have to change to continue using these CouchDB 4.0 endpoints.
> > >>> - This is the easiest to implement, since it would re-use the implementation for the _changes feed (an extra option passed to the fold function).
> > >>> - Breaks API behavior if users relied on having a point-in-time[4] snapshot view of the data.
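> > >>>
> > >>> To make A concrete, the resume-on-timeout loop is roughly the following sketch against the FoundationDB Python bindings (the key handling is illustrative; error code 1007 is transaction_too_old):
> > >>>
> > >>> import fdb
> > >>>
> > >>> fdb.api_version(620)
> > >>>
> > >>> def stream_range(db, begin, end):
> > >>>     # Restart the transaction when it times out, resuming from
> > >>>     # just after the last key emitted in the previous attempt.
> > >>>     last_key = None
> > >>>     while True:
> > >>>         tr = db.create_transaction()
> > >>>         try:
> > >>>             start = (fdb.KeySelector.first_greater_than(last_key)
> > >>>                      if last_key is not None else begin)
> > >>>             for kv in tr.get_range(start, end):
> > >>>                 yield kv.key, kv.value
> > >>>                 last_key = kv.key
> > >>>             return
> > >>>         except fdb.FDBError as e:
> > >>>             if e.code != 1007:  # 1007 = transaction_too_old
> > >>>                 raise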
> > >>>
> > >>> B) Simply end the stream. Let the users pass a `?transaction=true` param which indicates they are aware the stream may end early, and so would have to paginate from the last emitted key with a skip=1. This will keep the response bodies the same as current CouchDB. However, if the users got all the data in one request, they will end up wasting another request to see if there is more data available. If they didn't get any data, they might have too large of a skip value (see [2]), so they would have to guess different values for start/end keys. Or we impose a max limit for the `skip` parameter.
> > >>>
> > >>> C) End the stream and add a final metadata row like a "transaction": "timeout" at the end. That will let the user know to keep paginating from the last key onward. This won't work for `_all_dbs` and `_dbs_info`[3]. Maybe let those two endpoints behave like _changes feeds and only use this for views and _all_docs? If we like this choice, let's think about what happens for those, as I couldn't come up with anything decent there.
> > >>>
> > >>> D) Same as C, but to solve the issue with skips[2], emit a bookmark "key" of where the iteration stopped and the current "skip" and "limit" params, which would keep decreasing. Then the user would pass those in "start_key=..." in the next request, along with the limit and skip params. So something like "continuation":{"skip":599, "limit":5, "key":"..."}. This has the same issue with array results for `_all_dbs` and `_dbs_info`[3].
> > >>>
> > >>> E) Enforce low `limit` and `skip` parameters. Enforce maximum values there such that the response time is likely to fit in one transaction. This could be tricky, as different runtime environments will have different characteristics. Also, if the timeout happens there isn't a nice way to send an HTTP error since we already sent the 200 response. The downside is that this might break how some users use the API, if, say, they are using large skips and limits already. Perhaps here we do both B and D, such that if users want transactional behavior, they specify that `transaction=true` param and only then do we enforce low limit and skip maximums.
> > >>>
> > >>> F) At least for `_all_docs`, it seems providing a point-in-time snapshot view doesn't necessarily need to be tied to transaction boundaries. We could check the update sequence of the database at the start of the next transaction, and if it hasn't changed we can continue emitting a consistent view. This can apply to C and D and would just determine when the stream ends. If there are no writes happening to the db, this could potentially stream all the data, just like option A would. Not entirely sure if this would work for views.
> > >>>
> > >>> So what do we think? I can see different combinations of options here, maybe even different ones for each API endpoint. For example, `_all_dbs` and `_dbs_info` are always A, while `_all_docs` and views default to A but have parameters to do F, etc.
> > >>>
> > >>> Cheers,
> > >>> -Nick
> > >>>
> > >>> Some footnotes:
> > >>>
> > >>> [1] _changes feeds are the only ones that work currently. They behave as per the RFC https://github.com/apache/couchdb-documentation/blob/master/rfcs/003-fdb-seq-index.md#access-patterns. That is, we continue streaming the data by resetting the transaction object and restarting from the last emitted key (the db sequence in this case). However, because the transaction restarts, if a document is updated while the streaming takes place, it may appear in the _changes feed twice. That's a behavior difference from CouchDB < 4.0, and we'd have to document it, since previously we presented a point-in-time snapshot of the database from when we started streaming.
> > >>>
> > >>> [2] Our streaming APIs have both skips and limits. Since FDB doesn't currently support efficient offsets for key selectors (https://apple.github.io/foundationdb/known-limitations.html#dont-use-key-selectors-for-paging), we implemented skip by iterating over the data. This means that a skip of, say, 100000 could keep timing out the transaction without yielding any data.
> > >>>
> > >>> [3] _all_dbs and _dbs_info return a JSON array, so they don't have an obvious place to insert a last metadata row.
> > >>>
> > >>> [4] For example, they have a constraint that documents "a" and "z" cannot both be in the database at the same time. But when iterating, it's possible that "a" was there at the start. Then by the end, "a" was removed and "z" added, so both "a" and "z" would appear in the emitted stream. Note that FoundationDB has APIs which exhibit the same "relaxed" constraints: https://apple.github.io/foundationdb/api-python.html#module-fdb.locality