Re: Could CouchDB 2.0 fix actual read quorum?

Robert Samuel Newson Sat, 04 Apr 2015 03:18:08 -0700

I’ve made branch 2655-r-met2 in fabric which will indicate the consistency of 
the response. I’ve kept the is_r_met and r_met names for now, but if this is 
the right direction we will want to change that.


When fabric returns r_met:"consistent" it means complete agreement among all R 
responses
When fabric returns r_met:"divergent" it means we saw more than one distinct 
revision from the R responses but all divergent copies are ancestors (i.e., 
they’re missing an update rather than being an alternate branch)
When fabric returns r_met:"disagreement" we saw truly divergent responses. 
Fabric blocks for the repair, so the response is "healed", but nevertheless it 
indicates an issue like a recent partition not yet fully healed by anti-entropy.
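
To make the three states concrete, here is a rough sketch in Python. It is not 
the fabric code (which is Erlang) and it assumes a simplified model: each 
response is a revision path from the root to that copy's leaf, and one copy is 
an ancestor of another when its path is a prefix of the other's. The names 
is_prefix and classify_responses are made up for illustration.

```python
from typing import Sequence


def is_prefix(shorter: Sequence[str], longer: Sequence[str]) -> bool:
    """True if `shorter` is an ancestor path of (or equal to) `longer`."""
    return len(shorter) <= len(longer) and tuple(longer[:len(shorter)]) == tuple(shorter)


def classify_responses(paths: Sequence[Sequence[str]]) -> str:
    """Classify revision paths from the responding copies.

    Hypothetical model: each path lists revision ids from root to leaf.
    """
    if not paths:
        raise ValueError("need at least one response")
    distinct = {tuple(p) for p in paths}
    if len(distinct) == 1:
        # Every copy returned the same revision.
        return "consistent"
    newest = max(distinct, key=len)
    if all(is_prefix(p, newest) for p in distinct):
        # Other copies are merely behind the newest one (missing updates).
        return "divergent"
    # At least one copy is on an alternate branch.
    return "disagreement"


print(classify_responses([["1-a", "2-b"], ["1-a", "2-b"]]))
print(classify_responses([["1-a"], ["1-a", "2-b"]]))
print(classify_responses([["1-a", "2-b"], ["1-a", "2-c"]]))
```

In this model a copy that is only behind yields "divergent", while two copies 
that extend the same parent in different ways yield "disagreement".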

Obviously these names are terrible and we’ll need to brainstorm on those, but 
let’s first establish if this is the right kind of metadata.

B.


> On 4 Apr 2015, at 10:41, Robert Samuel Newson <rnew...@apache.org> wrote:
> 
> 
> Ok, most of those make sense to me (I think the last two, and particularly 
> the last one, are confounded by the fact couch will initiate read repair if 
> it sees a lack of convergence, i.e, R to N* different revisions, and will 
> perform the usual arbitrary-but-consistent winner algorithm right there).
> 
> So, what we want is not really r_met in the sense that fabric means it; which 
> is the minimum number of responses to wait for before returning, regardless 
> of whether they are the same revision or not.
> 
> It’s as you said, did we see at least R responses with the same revision? 
> Would we want additional nuance like whether the responses were so 
> inconsistent that we ran read repair? This would distinguish the case where 
> there are simply fewer than R responses (for nodes down / slow / partitioned) 
> that are returning the same revision versus the case where all R to N* 
> responses return different revisions.
> 
> I’ll see how easy it is to return the first value while we ponder the other 
> question.
> 
> * I say "R to N" to mean fabric will wait for at least R responses (or 
> timeout) but up to N responses (or timeout) if the responses vary.
> 
> B.
> 
>> On 4 Apr 2015, at 02:08, Mutton, James <jmut...@akamai.com> wrote:
>> 
>> * Report the number of r_met failed conditions to a statistical aggregator 
>> for alerting or trending on client-visible behavior.
>> * Pause some operation for a time if possible, retry later.
>> * Possibly re-resolve and use another cluster that is more healthy or less 
>> loaded
>> * Indicate some hidden failure or bug in how shards got moved 
>> around/restored from down nodes
>> 
>> </JamesM>
>> 
>> On Apr 3, 2015, at 17:27, Robert Samuel Newson <rnew...@apache.org> wrote:
>> 
>>> 
>>> I’ve pushed an update to the fabric branch which accounts for when the r= 
>>> value is higher than the number of replicas (so that it returns r_met:false)
>>> 
>>> Changing this so that r_met is true only if R matching revisions are seen 
>>> doesn’t sound too difficult.
>>> 
>>> Where I struggle is seeing what a client can usefully do with this 
>>> information. When you receive the r_met:false indication, however we end up 
>>> conveying it, what will you do? Retry until r_met:true?
>>> 
>>> B.
>>> 
>>>> On 4 Apr 2015, at 00:55, Mutton, James <jmut...@akamai.com> wrote:
>>>> 
>>>> Based on Paul’s description it sounds like we may need to decide 3 things 
>>>> to close this out:
>>>> * What does satisfying R mean?
>>>> * What is the appropriate scope of when R is applied?
>>>> * How do we most appropriately convey the lack of R?
>>>> 
>>>> I’m basing my opinions of R on W.  W is satisfied when a write succeeds to 
>>>> W nodes.  For behavior to be consistent between R and W, R should be 
>>>> considered to be met when R “matching” results have been found, if we 
>>>> treat “matching” == “successful”.  I believe this to be a more-correct 
>>>> interpretation of R-W consistency than treating R-satisfied as 
>>>> “found-but-not-matching” since it matches the complete positive of W's 
>>>> “successfully-written”.  For scope, W acts for both current versions and 
>>>> historical revision updates (e.g. resolving conflicts).  W also functions 
>>>> in bulk operations so R should function in multi-key requests as well if 
>>>> it’s to be consistent.
>>>> 
>>>> The last question is how to appropriately convey lack of R.  I tested 
>>>> these branches to see that the _r_met was present, that worked.  I also 
>>>> made some quick modifications to return a 203 to see how some clients 
>>>> behaved.  Here are my test results: 
>>>> https://gist.github.com/jamutton/c823fdac328777e22646
>>>> 
>>>> I tested a few clients including an old version of couchdbkit and all 
>>>> worked while the server was returning a 203 and/or the meta-field.  A 
>>>> quick test with replication was mixed.  I did a replicate into a couchdb 
>>>> 1.6 machine and although I did see some errors, replication succeeded (the 
>>>> errors were related to checkpointing the target and my 1.6 could have been 
>>>> messed up).  All that to say that where I tested it, returning a 203 on R 
>>>> was accepted behavior by clients, just as returning a 202 on W.  By no 
>>>> means is that extensive but at least indicative.  So, I think both 
>>>> approaches, field and status-code, are possible for single key requests 
>>>> (more on that in a second) and whether it’s status or field, I favor at 
>>>> least having consistency with W.  We could also have consistency by 
>>>> converting W’s 202 to be an in-document meta field like _w_met and 
>>>> only present when ?is_w_met=true is present on the query string.  That 
>>>> feels more drastic.
>>>> 
>>>> So the last issue is for the bulk/multi-doc responses.  Here the entire 
>>>> approach of reads and writes diverges.  Writes are still individual 
>>>> doc-updates, whereas reads of multi-docs are basically a “view” even if 
>>>> it’s all_docs.  IMHO, views could be called out of scope for when R is 
>>>> applied.  It doesn’t even descend into doc_open to apply R unless “keys” 
>>>> are specified and normal views without include_docs would do the same 
>>>> IIRC.  This approach of calling all views out of scope because they could 
>>>> only even be in scope under certain circumstances, leaves the door open 
>>>> still for either a status-code or field (and again, if using a field it 
>>>> would be more consistent API behavior to switch W to behave the same)
>>>> 
>>>> Cheers,
>>>> </JamesM>
>>>> 
>>>> On Apr 2, 2015, at 3:51, Robert Samuel Newson <rnew...@apache.org> wrote:
>>>> 
>>>>> To move this along I have COUCHDB-2655 and three branches with a working 
>>>>> solution;
>>>>> 
>>>>> https://git-wip-us.apache.org/repos/asf?p=couchdb-chttpd.git;h=b408ce5
>>>>> https://git-wip-us.apache.org/repos/asf?p=couchdb-couch.git;h=7d811d3
>>>>> https://git-wip-us.apache.org/repos/asf?p=couchdb-fabric.git;h=90e9691
>>>>> 
>>>>> All three branches are called 2655-r-met if you want to try this locally 
>>>>> (and please do!)
>>>>> 
>>>>> Sample output;
>>>>> 
>>>>> curl -v 'foo:bar@localhost:15984/db1/doc1?is_r_met=true'
>>>>> 
>>>>> {"_id":"doc1","_rev":"1-967a00dff5e02add41819138abb3284d","_r_met":true}
>>>>> 
>>>>> By making it opt-in, I think we avoid all the collateral damage that Paul 
>>>>> was concerned about.
>>>>> 
>>>>> B.
>>>>> 
>>>>> 
>>>>>> On 2 Apr 2015, at 10:36, Robert Samuel Newson <rnew...@apache.org> wrote:
>>>>>> 
>>>>>> 
>>>>>> Yeah, not a bad idea. An extra query arg (akin to open_revs=all, 
>>>>>> conflicts=true, etc) would avoid compatibility breaks and would clearly 
>>>>>> put the onus on those supplying it to tolerate the presence of the extra 
>>>>>> reserved field.
>>>>>> 
>>>>>> +1
>>>>>> 
>>>>>> 
>>>>>>> On 2 Apr 2015, at 10:32, Benjamin Bastian <bbast...@apache.org> wrote:
>>>>>>> 
>>>>>>> What about adding an optional query parameter to indicate whether or not
>>>>>>> Couch should include the _r_met flag in the document body/bodies
>>>>>>> (defaulting to false)? That wouldn't break older clients and it'd work 
>>>>>>> for
>>>>>>> the bulk API as well. As far as the case where there are conflicts, it
>>>>>>> seems like the most intuitive thing would be for the "r" in "_r_met" to
>>>>>>> have the same semantic meaning as the "r" in "?r=" (i.e. "?r=" means 
>>>>>>> "wait
>>>>>>> for r copies of the same doc rev until a timeout" and "_r_met" would 
>>>>>>> mean
>>>>>>> "we got/didn't get r copies of the same doc rev within the timeout").
>>>>>>> 
>>>>>>> Just my two cents.
>>>>>>> 
>>>>>>> On Thu, Apr 2, 2015 at 1:22 AM, Robert Samuel Newson 
>>>>>>> <rnew...@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> Paul outlined his previous efforts to introduce this indication, and 
>>>>>>>> the
>>>>>>>> problems he faced doing so. Can we come up with an acceptable 
>>>>>>>> mechanism?
>>>>>>>> 
>>>>>>>> A different status code will break a lot of users. While the http spec
>>>>>>>> says you can treat any 2xx code as success, plenty of libraries, etc, 
>>>>>>>> only
>>>>>>>> recognise 201 / 202 as successful write and 200 (and maybe 204, 206) 
>>>>>>>> for
>>>>>>>> reads.
>>>>>>>> 
>>>>>>>> My preference is for a change that "can’t" break anyone, which I think
>>>>>>>> only leaves an "X-CouchDB-R-Met: 2" response header, which isn’t the 
>>>>>>>> most
>>>>>>>> pleasant thing.
>>>>>>>> 
>>>>>>>> Suggestions?
>>>>>>>> 
>>>>>>>> B.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 1 Apr 2015, at 06:55, Mutton, James <jmut...@akamai.com> wrote:
>>>>>>>>> 
>>>>>>>>> For at least my part of it, I agree with Adam. Bigcouch has made an
>>>>>>>> effort to inform in the case of a failure to apply W. I've seen it 
>>>>>>>> lead to
>>>>>>>> confusion when the same logic was not applied on R.
>>>>>>>>> 
>>>>>>>>> I also agree that W and R are not binding contracts. There's no
>>>>>>>> agreement protocol to assure that W is met before being committed to 
>>>>>>>> disk.
>>>>>>>> But they are exposed as a blocking parameter of the request, so
>>>>>>>> notification being consistent appeared to me to be the best compromise 
>>>>>>>> (vs
>>>>>>>> straight up removal).
>>>>>>>>> 
>>>>>>>>> </JamesM>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Mar 31, 2015, at 13:15, Robert Newson <rnew...@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> If a way can be found that doesn't break things that can be sent in 
>>>>>>>>>> all
>>>>>>>> or most cases, sure. It's what a user can really infer from that which 
>>>>>>>> I
>>>>>>>> focused on. Not as much, I think, as users that want that info really 
>>>>>>>> want.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 31 Mar 2015, at 21:08, Adam Kocoloski <kocol...@apache.org> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I hope we can all agree that CouchDB should inform the user when it 
>>>>>>>>>>> is
>>>>>>>> unable to satisfy the requested read "quorum".
>>>>>>>>>>> 
>>>>>>>>>>> Adam
>>>>>>>>>>> 
>>>>>>>>>>>> On Mar 31, 2015, at 3:20 PM, Paul Davis 
>>>>>>>>>>>> <paul.joseph.da...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Sounds like there's a bit of confusion here.
>>>>>>>>>>>> 
>>>>>>>>>>>> What Nathan is asking for is the ability to have Couch respond with
>>>>>>>> some
>>>>>>>>>>>> information on the actual number of replicas that responded to a 
>>>>>>>>>>>> read
>>>>>>>>>>>> request. That way a user could tell that they issued an r=2 request
>>>>>>>> when
>>>>>>>>>>>> only r=1 was actually performed. Depending on your point of view in
>>>>>>>> an MVCC
>>>>>>>>>>>> world this is either a bug or a feature. :)
>>>>>>>>>>>> 
>>>>>>>>>>>> It was generally agreed upon that if we could return this 
>>>>>>>>>>>> information
>>>>>>>> it
>>>>>>>>>>>> would be beneficial. Although what happened when I started
>>>>>>>> implementing
>>>>>>>>>>>> this patch was that we are either only able to return it in a 
>>>>>>>>>>>> subset
>>>>>>>> of
>>>>>>>>>>>> cases where it happens, return it inconsistently between various
>>>>>>>> responses,
>>>>>>>>>>>> or break replication.
>>>>>>>>>>>> 
>>>>>>>>>>>> The three general methods for this would be to either include a new
>>>>>>>>>>>> "_r_met" key in the doc body that would be a boolean indicating if 
>>>>>>>>>>>> the
>>>>>>>>>>>> requested read quorum was actually met for the document. The second
>>>>>>>> was to
>>>>>>>>>>>> return a custom X-R-Met type header, and lastly was the status 
>>>>>>>>>>>> code as
>>>>>>>>>>>> described.
>>>>>>>>>>>> 
>>>>>>>>>>>> The _r_met member was thought to be the best, but unfortunately 
>>>>>>>>>>>> that
>>>>>>>> breaks
>>>>>>>>>>>> replication with older clients because we throw an error rather 
>>>>>>>>>>>> than
>>>>>>>> ignore
>>>>>>>>>>>> any unknown underscore prefixed field name. Thus having something
>>>>>>>> that was
>>>>>>>>>>>> just dynamically injected into the document body was a non-starter.
>>>>>>>>>>>> Unfortunately, if we don't inject into the document body then we 
>>>>>>>>>>>> limit
>>>>>>>>>>>> ourselves to only the set of APIs where a single document is
>>>>>>>> returned. This
>>>>>>>>>>>> is due to both streaming semantics (we can't buffer an entire
>>>>>>>> response in
>>>>>>>>>>>> memory for large requests to _all_docs) as well as multi-doc
>>>>>>>> responses (a
>>>>>>>>>>>> single boolean doesn't say which document may have not had a 
>>>>>>>>>>>> properly
>>>>>>>> met
>>>>>>>>>>>> R).
>>>>>>>>>>>> 
>>>>>>>>>>>> On top of that, the other confusing part of meeting the read quorum
>>>>>>>> is that
>>>>>>>>>>>> given MVCC semantics it becomes a bit confusing on how you respond 
>>>>>>>>>>>> to
>>>>>>>>>>>> documents with different revision histories. For instance, if we 
>>>>>>>>>>>> read
>>>>>>>> two
>>>>>>>>>>>> docs, we have technically made the r=2 requirement, but what should
>>>>>>>> our
>>>>>>>>>>>> response be if those two revisions are different (technically, in
>>>>>>>> this case
>>>>>>>>>>>> we wait for the third response, but the decision on what to return
>>>>>>>> for the
>>>>>>>>>>>> "r met" value is still unclear).
>>>>>>>>>>>> 
>>>>>>>>>>>> While I think everyone is in agreement that it'd be nice to return
>>>>>>>> some of
>>>>>>>>>>>> the information about the copies read, I think it's much less clear
>>>>>>>> what and
>>>>>>>>>>>> how it should be returned in the multitude of cases that we can
>>>>>>>> specify a
>>>>>>>>>>>> value for R.
>>>>>>>>>>>> 
>>>>>>>>>>>> While that doesn't offer a concrete path forward, hopefully it
>>>>>>>> clarifies
>>>>>>>>>>>> some of the issues at hand.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Mar 31, 2015 at 1:47 PM, Robert Samuel Newson <
>>>>>>>> rnew...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It’s testament to my friendship with Mike that we can disagree on
>>>>>>>> such
>>>>>>>>>>>>> things and remain friends. I am sorry he misled you, though.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> CouchDB 2.0 (like Cloudant) does not have read or write quorums at
>>>>>>>> all, at
>>>>>>>>>>>>> least in the formal sense, the only one that matters, this is
>>>>>>>> unfortunately
>>>>>>>>>>>>> sloppy language in too many places to correct.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The r= and w= parameters control only how many of the n possible
>>>>>>>> responses
>>>>>>>>>>>>> are collected before returning an http response.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It’s not true that returning 202 in the situation where one write 
>>>>>>>>>>>>> is
>>>>>>>> made
>>>>>>>>>>>>> but fewer than 'w' writes are made means we’ve chosen availability
>>>>>>>> over
>>>>>>>>>>>>> consistency since even if we returned a 500 or closed the 
>>>>>>>>>>>>> connection
>>>>>>>>>>>>> without responding, a subsequent GET could return the document (a
>>>>>>>>>>>>> probability that increases over time as anti-entropy makes the
>>>>>>>> missing
>>>>>>>>>>>>> copies). A write attempt that returned a 409 could, likewise,
>>>>>>>> introduce a
>>>>>>>>>>>>> new edit branch into the document, which might then 'win', 
>>>>>>>>>>>>> altering
>>>>>>>> the
>>>>>>>>>>>>> results of a subsequent GET.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The essential thing to remember is this: the ’n’ copies of your 
>>>>>>>>>>>>> data
>>>>>>>> are
>>>>>>>>>>>>> completely independent when written/read by the clustered layer
>>>>>>>> (fabric).
>>>>>>>>>>>>> It is internal replication (anti-entropy) that converges those
>>>>>>>> copies,
>>>>>>>>>>>>> pair-wise, to the same eventual state. Fabric is converting the 3
>>>>>>>>>>>>> independent results into a single result as best it can. Older
>>>>>>>> versions did
>>>>>>>>>>>>> not expose the 201 vs 202 distinction, calling both of them 201. I
>>>>>>>> do agree
>>>>>>>>>>>>> with you that there’s little value in the 202 distinction. About 
>>>>>>>>>>>>> the
>>>>>>>> only
>>>>>>>>>>>>> thing you could do is investigate your cluster for connectivity
>>>>>>>> issues or
>>>>>>>>>>>>> overloading if you get a sustained period of 202’s, as it would 
>>>>>>>>>>>>> be an
>>>>>>>>>>>>> indicator that the system is partitioned.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In order to achieve your goals, CouchDB 2.0 would have to ensure
>>>>>>>> that the
>>>>>>>>>>>>> result of a write did not change after the fact. That is,
>>>>>>>> anti-entropy
>>>>>>>>>>>>> would need to be disabled, or somehow agree to roll forward or
>>>>>>>> backward
>>>>>>>>>>>>> based on the initial circumstances. In short, we’d have to 
>>>>>>>>>>>>> introduce
>>>>>>>> strong
>>>>>>>>>>>>> consistency (paxos or raft or zab, say). While this would be a 
>>>>>>>>>>>>> great
>>>>>>>>>>>>> feature to add, it’s not currently present, and no amount of
>>>>>>>> twiddling the
>>>>>>>>>>>>> status codes will achieve it. We’d rather be honest about our
>>>>>>>> position on
>>>>>>>>>>>>> the CAP triangle.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> B.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 30 Mar 2015, at 22:37, Nathan Vander Wilt <
>>>>>>>> nate-li...@calftrail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> A technical co-founder of Cloudant agreed that this was a bug 
>>>>>>>>>>>>>> when I
>>>>>>>>>>>>> first hit it a few years ago. I found back the original thread 
>>>>>>>>>>>>> here
>>>>>>>> — this
>>>>>>>>>>>>> is the discussion I was trying to recall in my OP:
>>>>>>>>>>>>>> It sounds like perhaps there is a related issue tracked 
>>>>>>>>>>>>>> internally
>>>>>>>> at
>>>>>>>>>>>>> Cloudant as a result of that conversation.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> JamesM, thanks for your support here and tracking this down. 203
>>>>>>>> seemed
>>>>>>>>>>>>> like the best status code to "steal" for this to me too. Best 
>>>>>>>>>>>>> wishes
>>>>>>>> in
>>>>>>>>>>>>> getting this fixed!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> regards,
>>>>>>>>>>>>>> -natevw
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mar 25, 2015, at 4:49 AM, Robert Newson <rnew...@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 2.0 is explicitly an AP system, the behaviour you describe is 
>>>>>>>>>>>>>>> not
>>>>>>>>>>>>> classified as a bug.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Anti-entropy is the main reason that you cannot get strong
>>>>>>>> consistency
>>>>>>>>>>>>> from the system, it will transform "failed" writes (those that
>>>>>>>> succeeded on
>>>>>>>>>>>>> one node but fewer than W nodes) into success (N copies) as long 
>>>>>>>>>>>>> as
>>>>>>>> the
>>>>>>>>>>>>> nodes have enough healthy uptime.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> True of cloudant and 2.0.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Sent from my iPhone
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 24 Mar 2015, at 15:14, Mutton, James <jmut...@akamai.com>
>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Funny you should mention it.  I drafted an email in early
>>>>>>>> February to
>>>>>>>>>>>>> queue up the same discussion whenever I could get involved again
>>>>>>>> (which I
>>>>>>>>>>>>> promptly forgot about).  What happens currently in 2.0 appears
>>>>>>>> unchanged
>>>>>>>>>>>>> from earlier versions.  When R is not satisfied in fabric,
>>>>>>>>>>>>> fabric_doc_open:handle_message eventually responds with a {stop, 
>>>>>>>>>>>>> …}
>>>>>>>> but
>>>>>>>>>>>>> leaves the acc-state as the original r_not_met which triggers a
>>>>>>>> read_repair
>>>>>>>>>>>>> from the response handler.  read_repair results in an {ok, …} with
>>>>>>>> the only
>>>>>>>>>>>>> doc available, because no other docs are in the list.  The final 
>>>>>>>>>>>>> doc
>>>>>>>>>>>>> returned to chttpd_db:couch_doc_open and thusly to
>>>>>>>> chttpd_db:db_doc_req is
>>>>>>>>>>>>> simply {ok, Doc}, which has now lost the fact that the answer was 
>>>>>>>>>>>>> not
>>>>>>>>>>>>> complete.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This seems straightforward to fix by a change in
>>>>>>>>>>>>> fabric_doc_open:handle_response and read_repair.  handle_response
>>>>>>>> knows
>>>>>>>>>>>>> whether it has R met and could pass that forward, or allow
>>>>>>>> read-repair to
>>>>>>>>>>>>> pass it forward if read_repair is able to satisfy acc.r.  I can’t
>>>>>>>> speak for
>>>>>>>>>>>>> community interest in the behavior of sending a 202, but it’s
>>>>>>>> something I’d
>>>>>>>>>>>>> definitely like for the same reasons you cite.  Plus it just seems
>>>>>>>>>>>>> disconnected to do it on writes but not reads.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> </JamesM>
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Mar 24, 2015, at 14:06, Nathan Vander Wilt <
>>>>>>>>>>>>> nate-li...@calftrail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Sorry, I have not been following CouchDB 2.0 roadmap but I was
>>>>>>>>>>>>> extending my fermata-couchdb plugin today and realized that 
>>>>>>>>>>>>> perhaps
>>>>>>>> the
>>>>>>>>>>>>> Apache release of BigCouch as CouchDB 2.0 might provide an
>>>>>>>> opportunity to
>>>>>>>>>>>>> fix a serious issue I had using Cloudant's implementation.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>> https://github.com/cloudant/bigcouch/issues/55#issuecomment-30186518
>>>>>>>> for
>>>>>>>>>>>>> some additional background/explanation, but my understanding is 
>>>>>>>>>>>>> that
>>>>>>>>>>>>> Cloudant for all practical purposes ignores the read durability
>>>>>>>> parameter.
>>>>>>>>>>>>> So you can write with ?w=N to attempt some level of quorum, and 
>>>>>>>>>>>>> get
>>>>>>>> a 202
>>>>>>>>>>>>> back if that quorum is unmet. _However_ when you ?r=N it really
>>>>>>>> doesn't
>>>>>>>>>>>>> matter if only <N nodes are available…if even just a single
>>>>>>>> available node
>>>>>>>>>>>>> has some version of the requested document you will get a 
>>>>>>>>>>>>> successful
>>>>>>>>>>>>> response (!).
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> So in practice, there's no way to actually use the 
>>>>>>>>>>>>>>>>> quasi-Dynamo
>>>>>>>>>>>>> features to dynamically _choose_ between consistency or 
>>>>>>>>>>>>> availability
>>>>>>>> — when
>>>>>>>>>>>>> it comes time to read back a consistent result, BigCouch instead 
>>>>>>>>>>>>> just
>>>>>>>>>>>>> always gives you availability* regardless of what a given request
>>>>>>>> actually
>>>>>>>>>>>>> needs. (In my usage I ended up treating a 202 write as a 500, 
>>>>>>>>>>>>> rather
>>>>>>>> than
>>>>>>>>>>>>> proceeding with no way of ever knowing whether a write did NOT
>>>>>>>> ACTUALLY
>>>>>>>>>>>>> conflict or just hadn't YET because $who_knows_how_many nodes were
>>>>>>>> still
>>>>>>>>>>>>> down…)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> IIRC, this was both confirmed and acknowledged as a serious 
>>>>>>>>>>>>>>>>> bug
>>>>>>>> by a
>>>>>>>>>>>>> Cloudant engineer (or support personnel at least) but could not be
>>>>>>>> quickly
>>>>>>>>>>>>> fixed as it could introduce backwards-compatibility concerns. So…
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Is CouchDB 2.0 already breaking backwards compatibility with
>>>>>>>>>>>>> BigCouch? If true, could this read durability issue now be fixed
>>>>>>>> during the
>>>>>>>>>>>>> merge?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>>> -natevw
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> * DISCLAIMER: this statement has not been endorsed by actual
>>>>>>>> uptime
>>>>>>>>>>>>> of *any* Couch fork…
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
