Re: [DISCUSS] Rebase CouchDB on top of FoundationDB

Jan Lehnardt Wed, 23 Jan 2019 10:36:13 -0800

Hi Eli,

Thanks for chiming in. These are all good topics, and all of them are
already, in some form or another, on our list to be discussed. Re query
servers: for now, the only things affected are custom reduces and arbitrary
startkey/endkey ranges. JS views aren't going anywhere.


Cheers
Jan
—

> On 23. Jan 2019, at 18:54, Eli Stevens (Gmail) <wickedg...@gmail.com> wrote:
> 
> I'd like to request that there be threads where it's appropriate to discuss:
> 
> - Managing the refactoring/merge process to avoid the previous situation
> where 1.x was mostly dead, but 2.x wasn't going to land for a few years.
> - Other features to deprecate at the same time as losing JS reduce (I
> assume that this really means "all external query servers" are going away?).
> - What the support for users who will be stuck on 2.x will be.
> 
> Apologies for the noise if those are already on the list of topics.  :)
> 
> Cheers,
> Eli
> 
>> On Wed, Jan 23, 2019 at 5:33 AM Jan Lehnardt <j...@apache.org> wrote:
>> 
>> Hi Bob,
>> 
>> this is all very exciting!
>> 
>> First up, full disclosure, the CouchDB PMC has had about two weeks to
>> think about this already, so if any of the following doesn’t sound like a
>> knee-jerk reaction, that’s why.
>> 
>> I’m personally tentatively optimistic about this proposal and I’m willing
>> to work through all open questions from governance, contribution management
>> to the technical bits to see if we as the CouchDB project arrive at a point
>> where we are comfortable going down this path.
>> 
>> The PMC has already identified a set of discussion areas for this dev@
>> mailing list to go through before any definite decision can be made.
>> Separate emails for those discussions are going to be posted on this list
>> shortly, so I won’t go into further detail here.
>> 
>> If anyone sees a need for discussion beyond the threads that will appear
>> here, please speak up at your earliest convenience. This proposal would
>> mean a big step for our project, and we must make sure to hear all voices.
>> 
>> Once we’ve gone through all this, the resulting answers to all the open
>> questions coming up will end up in a consensus finding process on this
>> mailing list, which will signify the final project decision.
>> 
>> * * *
>> 
>> That said, I’d like to highlight one of these topics: IBM/Cloudant’s
>> contributions going forward.
>> 
>> Looking at how 2.0 came to be, the contributions were mostly taken on good
>> faith (and legal review), and from the trust Cloudant built up operating a
>> large number of large instances of clusters of what would eventually become
>> CouchDB 2.0. It has clearly paid off for CouchDB and our current level of
>> success wouldn’t have been possible without IBM/Cloudant.
>> 
>> However, some of the ways we work with the IBM team leave things to be
>> desired. Specifically, the Apache CouchDB community is frequently not
>> involved in design discussions around new features. Those happen inside IBM
>> and we “only” get a PR that then goes through the regular review process.
>> Again, this has served us well, but we can do even better, so I’d like to
>> take the opportunity of this larger proposal to suggest we actually do
>> better. As promised, a more detailed thread about this is going to come up,
>> and it’ll be the right place to go through the minutiae of this.
>> 
>> With this structural change, I believe we are in a great position to work
>> through the details of this proposal and the subsequent design and
>> engineering steps.
>> 
>> * * *
>> 
>> Finally, I want to reiterate Bob’s point: while this proposal is largely
>> driven by IBM, IBM has no power to unilaterally force the CouchDB project
>> to accept this proposal and they have already signalled and worked towards
>> making this a mutually beneficial endeavour. The CouchDB project has
>> different objectives from IBM and it is up to us to come up with a proposal
>> that satisfies all of our objectives as well as IBM’s, should this motion
>> pass.
>> 
>> Best
>> Jan
>> —
>> 
>> 
>>> On 23. Jan 2019, at 11:00, Robert Samuel Newson <rnew...@apache.org>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> CouchDB 2.0 introduced clustering: the ability to scale a single
>> database across multiple nodes, both increasing the maximum size of a
>> database and adding native fault-tolerance. This welcome and considerable
>> step forward was not without its trade-offs. In the years since 2.0 was
>> released, users have frequently encountered the following issues as a direct
>> consequence of the 2.0 clustering approach:
>>> 
>>> 1. Conflict revisions can be created on normal concurrent updates issued
>> to a single database, since each replica of a database shard independently
>> chooses whether to accept a given update, and all replicas will eventually
>> propagate updates that any one of them has chosen to accept.
>>> 2. Secondary indexes ("views") do not scale the same way as document
>> lookups, as they are sharded by doc id, not emitted view key (thus forcing
>> a consultation of all shard ranges for each query).
>>> 3. The changes feed is no longer totally ordered and, worse, could
>> replay earlier changes in the event of a node failure (even a temporary
>> one).
>>> 
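Point 2 in the list above can be illustrated with a small, self-contained sketch. It is a toy model only, not CouchDB or FoundationDB code: the document set, the shard count, the hash choice and all function names are assumptions made for illustration. It contrasts a view index sharded by doc id, where a key-range query has to consult every shard, with a single index ordered by emitted key, where the same query is one contiguous range read.

```python
import bisect
import zlib

# Toy data (assumed): doc id -> key emitted by a hypothetical view function.
DOCS = {"doc-a": "apple", "doc-b": "banana", "doc-c": "cherry",
        "doc-d": "apricot", "doc-e": "blueberry", "doc-f": "avocado"}
NUM_SHARDS = 4  # assumed shard count


def shard_of(doc_id):
    """Pick a shard from the doc id alone, using a stable hash."""
    return zlib.crc32(doc_id.encode()) % NUM_SHARDS


def build_sharded_view(docs):
    """Each shard indexes only the docs it stores, sorted by emitted key."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for doc_id, key in docs.items():
        shards[shard_of(doc_id)].append((key, doc_id))
    for rows in shards:
        rows.sort()
    return shards


def query_sharded(shards, start, end):
    """Range query over [start, end): all shards are consulted, then merged."""
    consulted = 0
    rows = []
    for shard_rows in shards:
        consulted += 1
        rows.extend(r for r in shard_rows if start <= r[0] < end)
    return sorted(rows), consulted


def build_ordered_view(docs):
    """A single keyspace ordered by emitted key (ordered key-value model)."""
    return sorted((key, doc_id) for doc_id, key in docs.items())


def query_ordered(index, start, end):
    """Range query over [start, end): one slice found by binary search."""
    lo = bisect.bisect_left(index, (start,))
    hi = bisect.bisect_left(index, (end,))
    return index[lo:hi]
```

Both queries return the same rows for a given range; the difference is that `query_sharded` cannot know which shards hold matching keys, so it always touches all of them, while `query_ordered` reads only the matching slice.
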
>>> The idea is to use FoundationDB as the new CouchDB foundational layer,
>> letting it take care of data storage and placement. An introduction to
>> FoundationDB would take up too much space here so I will summarise it as a
>> highly scalable ordered key-value store with transactional semantics that
>> provides strong consistency and scales from a single node to many. It is
>> licensed under the ASLv2 but is not an Apache project.
>>> 
>>> By using FoundationDB we can solve all three of the problems listed
>> above and deliver semantics much closer to CouchDB 1.x's behaviour while
>> improving upon the scalability advantages that 2.0 introduced. The
>> essential character of CouchDB would be preserved (MVCC for documents,
>> replication between CouchDB databases) but the underlying plumbing would
>> change significantly. In addition, this new foundation will allow us to add
>> long wished-for features more easily. For example, multi-document
>> transactions become possible, as does efficient field-level reading and
>> writing. A further thought is the ability to update views transactionally
>> with the database update.
>>> 
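The idea of updating a view transactionally with the database update can be sketched as follows. This is a minimal in-memory stand-in, assuming nothing about the real FoundationDB API or about how CouchDB would lay out its keys: `ToyStore`, `save_doc`, `query_view` and the `("doc", ...)` / `("view", ...)` key shapes are all hypothetical. It only shows the property being discussed: the document write and its view row become visible together, or not at all.

```python
class ToyStore:
    """In-memory stand-in for a transactional key-value store (hypothetical)."""

    def __init__(self):
        self.data = {}

    def transact(self, fn):
        """Run fn against a staged copy; publish it only if fn succeeds."""
        staged = dict(self.data)
        result = fn(staged)  # an exception here leaves self.data untouched
        self.data = staged
        return result


def save_doc(store, doc_id, doc, view_field):
    """Write the document and its view row in one all-or-nothing step."""
    def txn(kv):
        old = kv.get(("doc", doc_id))
        if old is not None:
            # Remove the stale view row before adding the new one.
            kv.pop(("view", old[view_field], doc_id), None)
        kv[("doc", doc_id)] = doc
        kv[("view", doc[view_field], doc_id)] = None
    store.transact(txn)


def query_view(store, key):
    """Return the ids of documents whose emitted key equals `key`."""
    return sorted(k[2] for k in store.data if k[0] == "view" and k[1] == key)
```

For example, saving `{"type": "a"}` and then `{"type": "b"}` under the same doc id leaves exactly one view row, under `"b"`. A save that fails partway (here, a document missing `view_field`) changes neither the document nor the view.
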
>>> For those familiar with the CouchDB 2.0 architecture, the proposal is,
>> in effect, to change all the functions in fabric.erl so that they work
>> against a (possibly remote) FoundationDB cluster instead of the current
>> implementation of calling into the original CouchDB 1.x code (couch_btree,
>> couch_file, etc).
>>> 
>>> This is a large change and, for full disclosure, the IBM Cloudant team
>> are proposing it. We have done our due diligence in investigating
>> FoundationDB as well as detailed investigation into how CouchDB semantics
>> would be built on top of FoundationDB. Any and all decisions on that must
>> take place here on the CouchDB developer mailing list, of course, but we
>> are confident that this is feasible.
>>> During those investigations we have identified a small number of CouchDB
>> features that we do not yet see a way to do on FoundationDB, the main one
>> being custom (JavaScript) reduces. This is a direct consequence of no
>> longer rolling our own persistence layer (couch_btree and friends) and
>> would likely apply to any alternative technology.
>>> 
>>> I think this would be a great advance for CouchDB, preserving what makes
>> CouchDB special but taking advantage of the superbly engineered
>> FoundationDB software at the bottom of the stack.
>>> 
>>> Regards,
>>> Robert Newson
>> 
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>> 
>>
