Hi Mike,

> The trivial fix is to use DOCID/REVISIONID as DOC_KEY.

This doesn't solve the issue of scalar values exceeding the size limits that FoundationDB supports.
Best regards,
iilyak

On 2019/01/30 19:00:15, Michael Fair <mich...@daclubhouse.net> wrote:
> I know the claim was to avoid "revisions" and "conflicts" discussion in this thread, but isn't that unavoidable?
>
> In scheme #1 you have multiple keys with the same DOCID/PART_IDX but different data.
> In schemes #2 / #3 you have multiple copies of the JSON_PATH but different values.
>
> The trivial fix is to use DOCID/REVISIONID as DOC_KEY.
>
> Mike
>
> On Wed, Jan 30, 2019 at 9:53 AM Ilya Khlopotov <iil...@apache.org> wrote:
>
> > The FoundationDB Record Layer uses a global schema for its records. It also has a nice way of creating indexes and supports schema evolution. However, this support comes at the cost of extra lookups in a different subspace. With a local mapping table we can be almost certain (except for a corner case) that the schema and the JSON fields are colocated on a single node, due to their common prefix.
> >
> > Best regards,
> > iilyak
> >
> > On 2019/01/30 17:05:01, Jan Lehnardt <j...@apache.org> wrote:
> > > Ah sure, if we store the *cough* schema per doc, then it's not that easy. An iteration of this proposal could store paths globally with ids that the k/v store then uses for keys, which would enable what I described, but I'm happy to ignore this for the time being. :)
> > >
> > > Cheers
> > > Jan
> > > —
> > >
> > > > On 30. Jan 2019, at 17:58, Adam Kocoloski <kocol...@apache.org> wrote:
> > > >
> > > > Jan, I don’t think it does have that "fun property #2", as the mapping is created separately for each document. In this proposal the field name “foo” could map to 2 in one document and 42 in another.
> > > >
> > > > Thanks for the proposal Ilya. Personally I wonder if the 10KB limit on field paths is anything more than a theoretical concern.
> > > > It’s hard for me to imagine a useful schema that would get anywhere near that deep, but maybe I’m insufficiently creative :) There’s certainly a storage overhead from repeating the upper portion of a path over and over again, but that’s also something the storage engine can optimize away through prefix elision. The current production storage engine in FoundationDB does not do this elision, but the new one in development does.
> > > >
> > > > The value size limit is probably not so theoretical. I think as a project we could choose to impose a 100KB size limit on scalar values; a user who had a string longer than 100KB could chunk it up into an array of strings pretty easily to work around that limit. But let’s say we don’t want to impose that limit. In your design, how do I distinguish {PART_IDX} from the elements of the {JSON_PATH}? I was kind of expecting to see some magic value indicating that the subsequent set of keys with the same prefix are all elements of a “multi-part object”:
> > > >
> > > > {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} = kMULTIPART
> > > > {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX} = “First 100 KB …”
> > > > ...
> > > >
> > > > You might have figured out something more efficient that saves a KV here, but I can’t quite grok it.
> > > >
> > > > Cheers, Adam
> > > >
> > > >
> > > >> On Jan 30, 2019, at 8:24 AM, Jan Lehnardt <j...@apache.org> wrote:
> > > >>
> > > >>
> > > >>
> > > >>> On 30. Jan 2019, at 14:22, Jan Lehnardt <j...@apache.org> wrote:
> > > >>>
> > > >>> Thanks Ilya for getting this started!
> > > >>>
> > > >>> Two quick notes on this one:
> > > >>>
> > > >>> 1. Note that JSON does not guarantee object key order and that CouchDB has never guaranteed it either, and with, say, emit(doc.foo, doc.bar), if either emit() parameter was an object, the undefined sort order of SpiderMonkey would mix things up.
> > > >>> While worth bringing up, this is not a BC break.
> > > >>>
> > > >>> 2. This would have the fun property of being able to rename a key inside all docs that have that key.
> > > >>
> > > >> …in one short operation.
> > > >>
> > > >> Best
> > > >> Jan
> > > >> —
> > > >>>
> > > >>> Best
> > > >>> Jan
> > > >>> —
> > > >>>
> > > >>>> On 30. Jan 2019, at 14:05, Ilya Khlopotov <iil...@apache.org> wrote:
> > > >>>>
> > > >>>> # First proposal
> > > >>>>
> > > >>>> To overcome the FoundationDB limits on key size (10 kB) and value size (100 kB), we could use the following approach.
> > > >>>>
> > > >>>> Below, the paths use slashes for illustration purposes only. We can use nested subspaces, tuples, directories or something else.
> > > >>>>
> > > >>>> - Store documents in a subspace or directory (to keep the key prefix short)
> > > >>>> - When we store the document, enumerate all field names (0 and 1 are reserved) and store the mapping table under a key that looks like:
> > > >>>> ```
> > > >>>> {DB_DOCS_NS} / {DOC_KEY} / 0
> > > >>>> ```
> > > >>>> - Flatten the JSON document (convert it into key-value pairs where the key is `JSON_PATH` and the value is `SCALAR_VALUE`)
> > > >>>> - Replace the elements of JSON_PATH with the integers from the mapping table we constructed earlier
> > > >>>> - For arrays, use `1 / {array_idx}`
> > > >>>> - Store scalar values under keys that look like the following (using the integer-encoded `JSON_PATH`):
> > > >>>> ```
> > > >>>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> > > >>>> ```
> > > >>>> - If a scalar value exceeds 100 kB, split it and store every part under a key constructed as:
> > > >>>> ```
> > > >>>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> > > >>>> ```
> > > >>>>
> > > >>>> Since all parts of a document are stored under the common prefix `{DB_DOCS_NS} / {DOC_KEY}`, they will be stored on the same server most of the time.
> > > >>>> The document can be retrieved with a single range query (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} / 0xFF")`). We can reconstruct the document because the mapping table is returned as well.
> > > >>>>
> > > >>>> The downside of this approach is that we wouldn't be able to preserve the order of keys in a JSON object. Currently the `jiffy` JSON encoder respects the order of keys:
> > > >>>> ```
> > > >>>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> > > >>>> <<"{\"bbb\":1,\"aaa\":12}">>
> > > >>>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> > > >>>> <<"{\"aaa\":12,\"bbb\":1}">>
> > > >>>> ```
> > > >>>>
> > > >>>> Best regards,
> > > >>>> iilyak
> > > >>>>
> > > >>>>> On 2019/01/30 13:02:57, Ilya Khlopotov <iil...@apache.org> wrote:
> > > >>>>> As you might already know, FoundationDB has a number of limitations that influence the way we might store JSON documents. The limitations are:
> > > >>>>>
> > > >>>>> | limitation            | recommended value | recommended max | absolute max |
> > > >>>>> |-----------------------|------------------:|----------------:|-------------:|
> > > >>>>> | transaction duration  |                   |                 |        5 sec |
> > > >>>>> | transaction data size |                   |                 |        10 MB |
> > > >>>>> | key size              |          32 bytes |            1 kB |        10 kB |
> > > >>>>> | value size            |                   |           10 kB |       100 kB |
> > > >>>>>
> > > >>>>> To fit a JSON document into 100 kB, we would have to partition it in some way. There are three ways of partitioning the document:
> > > >>>>> 1. store multiple binary blobs (parts) under different keys
> > > >>>>> 2. flatten the JSON structure and store every path leading to a scalar value under its own key
> > > >>>>> 3. measure the size of the different branches of the tree representing the JSON document (while we parse it) and use another key for a branch when we are about to exceed the limit
> > > >>>>>
> > > >>>>> - The first approach is the simplest, but it wouldn't allow us to access parts of the document.
> > > >>>>> - The downsides of the second approach are:
> > > >>>>>   - the flattened JSON structure would have long paths, which means longer keys
> > > >>>>>   - a scalar value cannot be larger than 100 kB (unless we split it as well)
> > > >>>>> - The third approach falls short when the structure of the document doesn't allow branches to be cut off cleanly:
> > > >>>>>   - complex rules to handle all corner cases
> > > >>>>>
> > > >>>>> The goals of this thread are:
> > > >>>>> - to collect ideas on how to encode and store the JSON document
> > > >>>>> - to comment on the collected ideas
> > > >>>>>
> > > >>>>> Non-goals:
> > > >>>>> - the storage of metadata for the document (to be discussed elsewhere)
> > > >>>>> - tombstones
> > > >>>>> - edit conflicts
> > > >>>>> - revisions
> > > >>>>>
> > > >>>>> Best regards,
> > > >>>>> iilyak
> > > >>>>>
> > > >>>
> > > >>> --
> > > >>> Professional Support for Apache CouchDB:
> > > >>> https://neighbourhood.ie/couchdb-support/
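To make the first proposal and Adam's kMULTIPART question concrete, here is a minimal Python sketch of the encoding. It is an illustration under stated assumptions, not the proposal's implementation: a plain dict stands in for the FoundationDB subspace, Python tuples stand in for tuple-encoded keys (sorting them emulates a range read, since the FoundationDB tuple layer also orders integers numerically), `MULTIPART` is a hypothetical in-memory sentinel rather than an encoded value, and `encode`, `decode`, `_steps` and `_assign` are hypothetical names. Oversized strings are split into byte chunks under `{JSON_PATH} / {PART_IDX}` with the marker stored at `{JSON_PATH}`, which is one way to tell part indexes apart from path elements. Empty objects and arrays are not handled; a real encoding would need extra marker values for them.

```python
MAPPING_IDX = 0       # reserved path element: key holding the field-name mapping table
ARRAY_TAG = 1         # reserved path element: "the next element is an array index"
FIRST_FIELD_ID = 2    # field ids start after the two reserved integers
MULTIPART = object()  # hypothetical in-memory stand-in for Adam's kMULTIPART marker
PART_LIMIT = 100_000  # FoundationDB's absolute maximum value size (100 kB)


def encode(prefix, doc, part_limit=PART_LIMIT):
    """Flatten a JSON object into {key_tuple: value}; every key starts with prefix."""
    names = {}  # field name -> integer id, assigned in first-seen order
    kvs = {}

    def field_id(name):
        return names.setdefault(name, len(names) + FIRST_FIELD_ID)

    def walk(path, value):
        if isinstance(value, dict):
            for name, child in value.items():
                walk(path + (field_id(name),), child)
        elif isinstance(value, list):
            for idx, child in enumerate(value):
                walk(path + (ARRAY_TAG, idx), child)
        elif isinstance(value, str) and len(value.encode("utf-8")) > part_limit:
            # Oversized string: marker at the path, byte chunks under path / PART_IDX.
            data = value.encode("utf-8")
            kvs[prefix + path] = MULTIPART
            for part_idx, start in enumerate(range(0, len(data), part_limit)):
                kvs[prefix + path + (part_idx,)] = data[start:start + part_limit]
        else:
            kvs[prefix + path] = value  # ordinary scalar leaf

    walk((), doc)
    # Mapping table: position i holds the name whose id is i + FIRST_FIELD_ID.
    kvs[prefix + (MAPPING_IDX,)] = sorted(names, key=names.get)
    return kvs


def _steps(path, names):
    """Translate an integer path back into field names (str) and array indices (int)."""
    steps, i = [], 0
    while i < len(path):
        if path[i] == ARRAY_TAG:
            steps.append(path[i + 1])
            i += 2
        else:
            steps.append(names[path[i] - FIRST_FIELD_ID])
            i += 1
    return steps


def _assign(root, steps, value):
    """Set value at steps, creating dicts/lists on the way down."""
    node = root
    for step, nxt in zip(steps, steps[1:]):
        fresh = [] if isinstance(nxt, int) else {}
        if isinstance(node, list):
            if step == len(node):  # first leaf of a new array element
                node.append(fresh)
            node = node[step]
        else:
            node = node.setdefault(step, fresh)
    if isinstance(node, list):
        node.append(value)  # leaves arrive in ascending index order
    else:
        node[steps[-1]] = value


def decode(prefix, kvs):
    """Rebuild the document stored under prefix, reading its keys in sorted order."""
    n = len(prefix)
    # Assumption: a dict stands in for the subspace, and sorting emulates a range read.
    rows = sorted(((k[n:], v) for k, v in kvs.items() if k[:n] == prefix),
                  key=lambda row: row[0])
    names = kvs[prefix + (MAPPING_IDX,)]

    # Pass 1: merge the parts of each multi-part value back into one entry.
    entries = []
    for path, value in rows:
        if path == (MAPPING_IDX,):
            continue
        if value is MULTIPART:
            entries.append((path, bytearray()))
        elif (entries and isinstance(entries[-1][1], bytearray)
              and path[:-1] == entries[-1][0]):
            entries[-1][1].extend(value)  # a part of the preceding marker's value
        else:
            entries.append((path, value))

    # Pass 2: replay the leaves in key order to rebuild nested objects and arrays.
    doc = {}
    for path, value in entries:
        if isinstance(value, bytearray):
            value = value.decode("utf-8")  # parts are joined before decoding
        _assign(doc, _steps(path, names), value)
    return doc
```

Object keys come back ordered by their per-document field id rather than by their original position, so, for example, `{"a": {"x": 1, "y": 2}, "b": {"y": 3, "x": 4}}` decodes with `b`'s keys as `x, y`. This is the key-order downside mentioned in the proposal.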