Re: [DISCUSS] couchdb 4.0 transactional semantics

Robert Newson Thu, 07 Jan 2021 02:33:37 -0800

Apologies for resurrecting this thread after so long.

I’ve looked over the thread again today and it seems there is general consensus 
on the desired semantics. I will start a vote thread.


B.

> On 24 Jul 2020, at 18:27, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> 
> Great discussion everyone!
> 
> For normal replications, I think it might be nice to make an exception
> and allow server-side pagination for compatibility at first, with a
> new option to explicitly enable strict snapshot behavior. Then, in a
> later release, make that the default to match _all_docs and _view reads.
> In other words, for a short while we'd support bi-directional
> replications between 4.x and 1/2/3.x on any replicator and document
> that fact; after a while we'd switch that capability off, and
> users would have to run replications on a 4.x replicator only, or on
> specially updated 3.x replicators.
> 
>> I'd rather support this scenario than have to support explaining why the 
>> "one shot" replication back to an old 1.x, when initiated by a 1.x cluster, 
>> is returning results "ahead" of the time at which the one-shot replication 
>> was started.
> 
> Ah, that won't happen in the current fdb prototype branch
> implementation. What might happen is that the changes feed would
> include changes made _after_ the request started. That is no
> different from what happens if the node running the replication
> restarts, or if there is a network glitch. The changes feed would
> proceed from the last checkpoint, see changes that happened after the
> initial starting sequence, and apply them in order (e.g. document "a"
> was deleted, then updated again, then deleted again; each change is
> applied incrementally to the target).
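A toy model of the resume-from-checkpoint behavior described above: a replication interrupted partway through, then resumed from its last checkpoint after more changes have arrived, ends with the same target state as an uninterrupted run. This is a hypothetical simplification, not the replicator's API: revisions are plain integers, the feed keeps every change (a real _changes feed reports only each document's latest revision), and `Change`, `apply_change` and `replicate` are illustrative names.

```python
from dataclasses import dataclass


# Toy model (an assumption, not CouchDB's replicator): the changes feed is a
# list of Change records ordered by seq, and a revision is a plain integer.
@dataclass(frozen=True)
class Change:
    seq: int
    doc_id: str
    rev: int
    deleted: bool


def changes_since(feed, since):
    """Return the changes with seq > since, in sequence order."""
    return [c for c in feed if c.seq > since]


def apply_change(target, change):
    """Apply one change to the target, keeping a tombstone for deletions.

    A change whose rev is not newer than what the target already holds is
    skipped, so replaying a change after a restart is harmless.
    """
    current = target.get(change.doc_id)
    if current is not None and current[0] >= change.rev:
        return
    target[change.doc_id] = (change.rev, change.deleted)


def replicate(feed, target, checkpoint, max_changes=None):
    """Apply changes after `checkpoint` and return the new checkpoint.

    `max_changes` simulates an interruption (node restart, network glitch).
    """
    for i, change in enumerate(changes_since(feed, checkpoint)):
        if max_changes is not None and i >= max_changes:
            break
        apply_change(target, change)
        checkpoint = change.seq
    return checkpoint


# Document "a" is created and deleted before the replication starts...
feed = [Change(1, "a", 1, False), Change(2, "a", 2, True)]
interrupted = {}
checkpoint = replicate(feed, interrupted, 0, max_changes=1)  # stops after seq 1

# ...then updated and deleted again while the replication is down.
feed += [Change(3, "a", 3, False), Change(4, "a", 4, True)]
checkpoint = replicate(feed, interrupted, checkpoint)

uninterrupted = {}
replicate(feed, uninterrupted, 0)
print(interrupted, checkpoint)  # {'a': (4, True)} 4
```

The restarted run sees changes made after the original starting sequence, but because each change is applied in order and stale ones are skipped, it converges to the same result.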
> 
> We'd have to document the fact that a single-snapshot replication from
> 4.x -> 1/2/3.x is impossible anyway (unless we use the trick of
> comparing update sequences to confirm the db was not updated in the
> meantime, or the new FDB storage engine allows it). The question then
> becomes whether we let pagination happen on the client or on the server.
> For normal replication, I think it would be nice to let it happen
> on the server for a while, to allow for maximum initial replication
> interoperability.
> 
>> For cases where you’re not concerned about the snapshot isolation (e.g. 
>> streaming an entire _changes feed), there is a small performance benefit to 
>> requesting a new FDB transaction asynchronously before the old one actually 
>> times out and swapping over to it. That’s a pattern I’ve seen in other FDB 
>> layers but I’m not sure we’ve used it anywhere in CouchDB yet.
> 
> Good point, Adam. We could optimize that part, yeah: for example, fetch
> a GRV (get read version) after 4.9 seconds or so and keep it ready to go.
> So far we have reacted to the transaction_too_old exception rather than
> starting a timer, so that we can use the maximum time a tx is alive and
> save a few seconds or milliseconds. That required some tricks, such as
> handling the exception bubbling up from either the range read itself or
> from the user's callback (say, if user code in the callback fetched a doc
> body and that blew up with a transaction_too_old exception). As an
> interesting aside, in quick experiments I noticed we were able to stream
> about 100-150k rows from a single tx snapshot, which I thought wasn't
> too bad.
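A minimal sketch of the reactive approach described above: stream a range until the transaction becomes too old, whether the error surfaces in the range read itself or in a read made by the user's callback, then open a fresh transaction and resume just after the last completed row. Everything here is hypothetical: `ToyTransaction` expires after a fixed number of reads as a stand-in for FDB's roughly 5 second lifetime, and `TransactionTooOld` and `stream_rows` are illustrative, not CouchDB or FDB binding APIs. Rows emitted by different transactions come from different read versions, so the stream as a whole is not one snapshot.

```python
class TransactionTooOld(Exception):
    """Stand-in for FoundationDB's transaction_too_old error."""


class ToyTransaction:
    """Hypothetical transaction that expires after a fixed number of reads,
    standing in for FDB's roughly 5 second transaction lifetime."""

    def __init__(self, store, max_reads):
        self.store = store
        self.reads_left = max_reads

    def _tick(self):
        if self.reads_left <= 0:
            raise TransactionTooOld()
        self.reads_left -= 1

    def get(self, key):
        self._tick()
        return self.store[key]

    def keys_after(self, start_after):
        """Range read: yield keys greater than start_after, in key order."""
        for key in sorted(self.store):
            if start_after is None or key > start_after:
                self._tick()
                yield key


def stream_rows(store, callback, max_reads):
    """Call callback(tx, key) once per key, in order. When the transaction
    becomes too old, whether inside the range read or inside the callback,
    open a fresh transaction and resume after the last completed key.
    Returns the number of restarts."""
    last_key = None
    restarts = 0
    while True:
        tx = ToyTransaction(store, max_reads)
        progress_before = last_key
        try:
            for key in tx.keys_after(last_key):
                callback(tx, key)
                last_key = key  # only advance once the callback finished
            return restarts
        except TransactionTooOld:
            if last_key == progress_before:
                raise  # a fresh transaction cannot fit even one row
            restarts += 1


store = {f"doc{i:02d}": {"n": i} for i in range(5)}
rows = []


def emit(tx, key):
    body = tx.get(key)  # a read in user code can also hit TransactionTooOld
    rows.append((key, body["n"]))  # emit only after all reads succeeded


restarts = stream_rows(store, emit, max_reads=3)
print(len(rows), restarts)  # 5 4
```

The callback must finish its reads before emitting anything, otherwise a row could be emitted twice when it is retried. Adam's alternative, fetching the next read version shortly before the current one expires, would swap this exception handler for a timer and is not shown here.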
> 
> Speaking of replication, I am exploring what the 4.x replicator might
> look like in https://github.com/apache/couchdb/pull/3015
> (prototype/fdb-replicator branch). It's very much a WIP and a hot mess
> currently. I will issue an RFC once I have a better handle on its
> general shape. So far it's based on couch_jobs with a global queue,
> and it looks like it might be smaller overall, since it leverages the
> scheduling capabilities already present in couch_jobs. Once started,
> though, an individual replication job's process hierarchy is largely
> the same as before.
> 
> Cheers,
> -Nick
>
> On Wed, Jul 22, 2020 at 8:48 AM Bessenyei Balázs Donát
> <bes...@apache.org> wrote:
>> 
>> On Tue, 21 Jul 2020 at 18:45, Jan Lehnardt <j...@apache.org> wrote:
>>> I’m not sure why a URL parameter vs. a path makes a big difference?
>>> 
>>> Do you have an example?
>>> 
>>> Best
>>> Jan
>>> —
>> 
>> Oh, sure! OpenAPI Generator [1] and similar tools, for example, generate
>> one Java method per path per verb (like [2], generated from spec [3]).
>> Java's type safety and the way methods are currently generated don't
>> really provide an easy way to retrieve multiple kinds of responses, so
>> having them separate would help a lot there.
>> 
>> 
>> Donat
>> 
>> PS. I'm getting self-conscious about discussing this in this thread.
>> Should I open a new one?
>> 
>> 
>> [1] https://openapi-generator.tech/
>> [2] https://github.com/OpenAPITools/openapi-generator/blob/c49d8fd/samples/client/petstore/java/okhttp-gson/src/main/java/org/openapitools/client/api/PetApi.java#L606
>> [3] https://github.com/OpenAPITools/openapi-generator/blob/c49d8fd/samples/client/petstore/java/okhttp-gson/api/openapi.yaml#L208
