Yeah, not a bad idea. An extra query arg (akin to open_revs=all, conflicts=true, etc.) would avoid compatibility breaks and would clearly put the onus on those supplying it to tolerate the presence of the extra reserved field.
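Below is a minimal client-side sketch of the opt-in query argument, in Python. The parameter name `r_met` is hypothetical: no such CouchDB option exists, and only `r` is a real read parameter. Paul explains below that CouchDB rejects unknown underscore-prefixed fields. A client that opts in would therefore need to strip `_r_met` before writing the document back.

```python
from urllib.parse import quote, urlencode

# HYPOTHETICAL: "r_met" is not a real CouchDB query parameter; it stands in
# for the opt-in argument proposed in this thread.
R_MET_PARAM = "r_met"
R_MET_FIELD = "_r_met"


def doc_url(base, db, doc_id, r, want_r_met=False):
    """Build a document read URL, asking for the extra field only on request."""
    params = {"r": r}
    if want_r_met:
        params[R_MET_PARAM] = "true"
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}?{urlencode(params)}"


def strip_reserved(doc):
    """Return a copy without the injected field, so the doc can be written back.

    CouchDB rejects unknown top-level fields that start with an underscore,
    so a client that opted in must drop the field before any PUT.
    """
    return {k: v for k, v in doc.items() if k != R_MET_FIELD}
```

Clients that never pass the argument see exactly the same URLs and bodies as before. That is the compatibility property the opt-in approach is meant to preserve.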
+1

> On 2 Apr 2015, at 10:32, Benjamin Bastian <bbast...@apache.org> wrote:
>
> What about adding an optional query parameter to indicate whether or not Couch should include the _r_met flag in the document body/bodies (defaulting to false)? That wouldn't break older clients and it'd work for the bulk API as well. As for the case where there are conflicts, it seems like the most intuitive thing would be for the "r" in "_r_met" to have the same semantic meaning as the "r" in "?r=" (i.e. "?r=" means "wait for r copies of the same doc rev until a timeout" and "_r_met" would mean "we got/didn't get r copies of the same doc rev within the timeout").
>
> Just my two cents.
>
> On Thu, Apr 2, 2015 at 1:22 AM, Robert Samuel Newson <rnew...@apache.org> wrote:
>
>> Paul outlined his previous efforts to introduce this indication, and the problems he faced doing so. Can we come up with an acceptable mechanism?
>>
>> A different status code will break a lot of users. While the HTTP spec says you can treat any 2xx code as success, plenty of libraries, etc., only recognise 201/202 as a successful write and 200 (and maybe 204, 206) for reads.
>>
>> My preference is for a change that "can’t" break anyone, which I think only leaves an "X-CouchDB-R-Met: 2" response header, which isn’t the most pleasant thing.
>>
>> Suggestions?
>>
>> B.
>>
>>> On 1 Apr 2015, at 06:55, Mutton, James <jmut...@akamai.com> wrote:
>>>
>>> For my part, at least, I agree with Adam. BigCouch has made an effort to inform the user when it fails to satisfy W. I've seen it lead to confusion when the same logic was not applied to R.
>>>
>>> I also agree that W and R are not binding contracts. There's no agreement protocol to ensure that W is met before a write is committed to disk. But they are exposed as blocking parameters of the request, so consistent notification appeared to me to be the best compromise (vs. removing them outright).
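Here is a small Python model of Benjamin's proposed semantics, where the "r" in "_r_met" means "r copies of the same doc rev arrived before the timeout". It is an illustration only, not fabric's actual (Erlang) code. The function name and the revision strings are hypothetical. The model also covers Paul's MVCC case: two replies carrying different revisions do not add up to r=2.

```python
from collections import Counter
from typing import Iterable


def r_met(revs_received: Iterable[str], r: int) -> bool:
    """True if at least r shard copies returned the *same* revision.

    revs_received holds the revision reported by each copy that answered
    before the timeout; copies that never answered are simply absent.
    """
    counts = Counter(revs_received)
    return any(n >= r for n in counts.values())
```

For example, `r_met(["1-abc", "2-def"], 2)` is False even though two copies answered. James suggests below that the result should be carried alongside the document rather than returning a bare `{ok, Doc}`, and under these semantics that carried result is this boolean.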
>>>
>>> </JamesM>
>>>
>>>> On Mar 31, 2015, at 13:15, Robert Newson <rnew...@apache.org> wrote:
>>>>
>>>> If a way can be found that doesn't break things, and that can be sent in all or most cases, sure. What I focused on was what a user can really infer from that signal: not as much, I think, as the users who want that info really want.
>>>>
>>>>> On 31 Mar 2015, at 21:08, Adam Kocoloski <kocol...@apache.org> wrote:
>>>>>
>>>>> I hope we can all agree that CouchDB should inform the user when it is unable to satisfy the requested read "quorum".
>>>>>
>>>>> Adam
>>>>>
>>>>>> On Mar 31, 2015, at 3:20 PM, Paul Davis <paul.joseph.da...@gmail.com> wrote:
>>>>>>
>>>>>> Sounds like there's a bit of confusion here.
>>>>>>
>>>>>> What Nathan is asking for is the ability to have Couch respond with some information on the actual number of replicas that responded to a read request. That way a user could tell that they issued an r=2 request when only r=1 was actually performed. Depending on your point of view in an MVCC world, this is either a bug or a feature. :)
>>>>>>
>>>>>> It was generally agreed that if we could return this information it would be beneficial. However, when I started implementing this patch, I found that we could either return it in only a subset of the cases where it happens, return it inconsistently across various responses, or break replication.
>>>>>>
>>>>>> There were three general approaches. The first was to include a new "_r_met" key in the doc body: a boolean indicating whether the requested read quorum was actually met for the document. The second was to return a custom X-R-Met-style header, and the last was the status code, as described.
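Of the three approaches Paul lists, the header is the one Robert Newson considers unable to break existing clients. The sketch below shows how a client might consume it. It assumes the header is named X-CouchDB-R-Met, as Robert suggests, and that it carries the number of matching copies. Neither assumption reflects anything CouchDB implements, and the function name is hypothetical.

```python
from typing import Mapping, Optional


def read_quorum_met(status: int, headers: Mapping[str, str], requested_r: int) -> Optional[bool]:
    """Interpret a read response under the (hypothetical) header variant.

    Returns True/False when the X-CouchDB-R-Met header is present, or None
    when it is absent (for example, a server that predates the header).
    """
    if not 200 <= status < 300:
        raise ValueError(f"read failed with HTTP {status}")
    # HTTP header names are case-insensitive; normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("x-couchdb-r-met")
    if value is None:
        return None
    return int(value) >= requested_r
```

This client accepts the whole 2xx range, as the HTTP spec permits, so it would also tolerate the 203 floated later in the thread. The worry raised above is that many existing libraries do not, which is why the header leaves status codes untouched.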
>>>>>>
>>>>>> The _r_met member was thought to be the best option, but unfortunately it breaks replication with older clients, because we throw an error rather than ignore any unknown underscore-prefixed field name. Thus having something that was just dynamically injected into the document body was a non-starter. But if we don't inject into the document body, then we limit ourselves to only the set of APIs where a single document is returned. This is due both to streaming semantics (we can't buffer an entire response in memory for large requests to _all_docs) and to multi-doc responses (a single boolean doesn't say which document may not have had R properly met).
>>>>>>
>>>>>> On top of that, the other confusing part of meeting the read quorum is how, given MVCC semantics, you respond to documents with different revision histories. For instance, if we read two docs, we have technically met the r=2 requirement, but what should our response be if those two revisions are different? (Technically, in this case we wait for the third response, but the decision on what to return for the "r met" value is still unclear.)
>>>>>>
>>>>>> While I think everyone is in agreement that it'd be nice to return some of the information about the copies read, I think it's much less clear what should be returned, and how, in the multitude of cases where we can specify a value for R.
>>>>>>
>>>>>> While that doesn't offer a concrete path forward, hopefully it clarifies some of the issues at hand.
>>>>>>
>>>>>> On Tue, Mar 31, 2015 at 1:47 PM, Robert Samuel Newson <rnew...@apache.org> wrote:
>>>>>>
>>>>>>> It’s testament to my friendship with Mike that we can disagree on such things and remain friends. I am sorry he misled you, though.
>>>>>>>
>>>>>>> CouchDB 2.0 (like Cloudant) does not have read or write quorums at all, at least not in the formal sense, the only one that matters. "Quorum" is unfortunately sloppy language used in too many places to correct.
>>>>>>>
>>>>>>> The r= and w= parameters control only how many of the n possible responses are collected before returning an HTTP response.
>>>>>>>
>>>>>>> It’s not true that returning 202 when one write is made but fewer than 'w' writes are made means we’ve chosen availability over consistency, since even if we returned a 500 or closed the connection without responding, a subsequent GET could return the document (a probability that increases over time as anti-entropy makes the missing copies). A write attempt that returned a 409 could, likewise, introduce a new edit branch into the document, which might then 'win', altering the results of a subsequent GET.
>>>>>>>
>>>>>>> The essential thing to remember is this: the 'n' copies of your data are completely independent when written/read by the clustered layer (fabric). It is internal replication (anti-entropy) that converges those copies, pair-wise, to the same eventual state. Fabric is converting the 3 independent results into a single result as best it can. Older versions did not expose the 201 vs 202 distinction, calling both of them 201. I do agree with you that there’s little value in the 202 distinction. About the only thing you could do is investigate your cluster for connectivity issues or overloading if you get a sustained period of 202s, as that would indicate that the system is partitioned.
>>>>>>>
>>>>>>> In order to achieve your goals, CouchDB 2.0 would have to ensure that the result of a write did not change after the fact. That is, anti-entropy would need to be disabled, or somehow agree to roll forward or backward based on the initial circumstances. In short, we’d have to introduce strong consistency (Paxos or Raft or Zab, say). While this would be a great feature to add, it’s not currently present, and no amount of twiddling the status codes will achieve it. We’d rather be honest about our position on the CAP triangle.
>>>>>>>
>>>>>>> B.
>>>>>>>
>>>>>>>> On 30 Mar 2015, at 22:37, Nathan Vander Wilt <nate-li...@calftrail.com> wrote:
>>>>>>>>
>>>>>>>> A technical co-founder of Cloudant agreed that this was a bug when I first hit it a few years ago. I tracked down the original thread here; this is the discussion I was trying to recall in my OP:
>>>>>>>> It sounds like perhaps there is a related issue tracked internally at Cloudant as a result of that conversation.
>>>>>>>>
>>>>>>>> JamesM, thanks for your support here and for tracking this down. 203 seemed like the best status code to "steal" for this to me too. Best wishes in getting this fixed!
>>>>>>>>
>>>>>>>> regards,
>>>>>>>> -natevw
>>>>>>>>
>>>>>>>>> On Mar 25, 2015, at 4:49 AM, Robert Newson <rnew...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> 2.0 is explicitly an AP system; the behaviour you describe is not classified as a bug.
>>>>>>>>>
>>>>>>>>> Anti-entropy is the main reason that you cannot get strong consistency from the system: it will transform "failed" writes (those that succeeded on one node but on fewer than W nodes) into successes (N copies) as long as the nodes have enough healthy uptime.
>>>>>>>>>
>>>>>>>>> True of Cloudant and 2.0.
>>>>>>>>>
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>>> On 24 Mar 2015, at 15:14, Mutton, James <jmut...@akamai.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Funny you should mention it. I drafted an email in early February to queue up the same discussion whenever I could get involved again (which I promptly forgot about). What happens currently in 2.0 appears unchanged from earlier versions. When R is not satisfied in fabric, fabric_doc_open:handle_message eventually responds with a {stop, …} but leaves the acc state as the original r_not_met, which triggers a read_repair from the response handler. read_repair results in an {ok, …} with the only doc available, because no other docs are in the list. The final doc returned to chttpd_db:couch_doc_open, and thus to chttpd_db:db_doc_req, is simply {ok, Doc}, which has now lost the fact that the answer was not complete.
>>>>>>>>>>
>>>>>>>>>> This seems straightforward to fix with a change in fabric_doc_open:handle_response and read_repair. handle_response knows whether R has been met and could pass that forward, or allow read_repair to pass it forward if read_repair is able to satisfy acc.r. I can’t speak for community interest in the behavior of sending a 202, but it’s something I’d definitely like for the same reasons you cite. Plus it just seems disconnected to do it on writes but not reads.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> </JamesM>
>>>>>>>>>>
>>>>>>>>>>> On Mar 24, 2015, at 14:06, Nathan Vander Wilt <nate-li...@calftrail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Sorry, I have not been following the CouchDB 2.0 roadmap, but I was extending my fermata-couchdb plugin today and realized that perhaps the Apache release of BigCouch as CouchDB 2.0 might provide an opportunity to fix a serious issue I had using Cloudant's implementation.
>>>>>>>>>>>
>>>>>>>>>>> See https://github.com/cloudant/bigcouch/issues/55#issuecomment-30186518 for some additional background/explanation, but my understanding is that Cloudant for all practical purposes ignores the read durability parameter. So you can write with ?w=N to attempt some level of quorum, and get a 202 back if that quorum is unmet. _However_, when you use ?r=N it really doesn't matter if fewer than N nodes are available: if even just a single available node has some version of the requested document, you will get a successful response (!).
>>>>>>>>>>>
>>>>>>>>>>> So in practice, there's no way to actually use the quasi-Dynamo features to dynamically _choose_ between consistency or availability: when it comes time to read back a consistent result, BigCouch instead just always gives you availability* regardless of what a given request actually needs. (In my usage I ended up treating a 202 write as a 500, rather than proceeding with no way of ever knowing whether a write did NOT ACTUALLY conflict or just hadn't YET because $who_knows_how_many nodes were still down…)
>>>>>>>>>>>
>>>>>>>>>>> IIRC, this was both confirmed and acknowledged as a serious bug by a Cloudant engineer (or support personnel at least) but could not be quickly fixed as it could introduce backwards-compatibility concerns. So…
>>>>>>>>>>>
>>>>>>>>>>> Is CouchDB 2.0 already breaking backwards compatibility with BigCouch? If so, could this read durability issue now be fixed during the merge?
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>> -natevw
>>>>>>>>>>>
>>>>>>>>>>> * DISCLAIMER: this statement has not been endorsed by actual uptime of *any* Couch fork…
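Nathan's workaround of treating a 202 write as a failure can be sketched as a strict status check. The exception class and function name are hypothetical. The status meanings follow the thread: 201 when w copies acknowledged the write, and 202 when at least one copy stored it but fewer than w did before the response.

```python
class DurabilityError(Exception):
    """Raised when a write cannot be confirmed at the requested w."""


def check_write(status: int) -> None:
    """Strict policy: only 201 counts as success; 202 is handled like a 500."""
    if status == 201:
        return  # w copies acknowledged the write
    if status == 202:
        # At least one copy stored the write, but fewer than w answered in time.
        raise DurabilityError("write accepted, but requested w was not met")
    raise DurabilityError(f"write failed with HTTP {status}")
```

As Robert Newson points out above, this error does not mean the write is absent. Anti-entropy may still bring the document to all n copies, so a retry may see a 409 conflict rather than a clean success.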