From what I recall, Jiffy is able to cope with the valid-but-kinda-silly[1] thing where you have multiple JSON keys with the same name, i.e., { "foo": 1, "foo": 2 }.
Are the proposals on the table able to continue this support (or am I wrong about Jiffy)?

[1] https://tools.ietf.org/html/rfc8259#section-4, "The names within an object SHOULD be unique.", though https://tools.ietf.org/html/rfc7493#section-2.3 does sensibly close that down.

--
Mike.

On Wed, 30 Jan 2019, at 13:33, Jan Lehnardt wrote:
>
>
> > On 30. Jan 2019, at 14:22, Jan Lehnardt <j...@apache.org> wrote:
> >
> > Thanks Ilya for getting this started!
> >
> > Two quick notes on this one:
> >
> > 1. note that JSON does not guarantee object key order and that CouchDB has
> > never guaranteed it either, and with say emit(doc.foo, doc.bar), if either
> > emit() parameter was an object, the undefined-sort-order of SpiderMonkey
> > would mix things up. While worth bringing up, this is not a BC break.
> >
> > 2. This would have the fun property of being able to rename a key inside
> > all docs that have that key.
>
> …in one short operation.
>
> Best
> Jan
> —
> >
> > Best
> > Jan
> > —
> >
> >> On 30. Jan 2019, at 14:05, Ilya Khlopotov <iil...@apache.org> wrote:
> >>
> >> # First proposal
> >>
> >> In order to overcome FoundationDB limitations on key size (10 kB) and value
> >> size (100 kB) we could use the following approach.
> >>
> >> Below, the paths use slashes for illustration purposes only. We can
> >> use nested subspaces, tuples, directories or something else.
> >>
> >> - Store documents in a subspace or directory (to keep the prefix for a key
> >> short)
> >> - When we store the document we would enumerate all field names (0 and 1
> >> are reserved) and store the mapping table in the key which looks like:
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / 0
> >> ```
> >> - Flatten the JSON document (convert it into key-value pairs where the key
> >> is `JSON_PATH` and the value is `SCALAR_VALUE`)
> >> - Replace elements of JSON_PATH with integers from the mapping table we
> >> constructed earlier
> >> - When we have an array, use `1 / {array_idx}`
> >> - Store scalar values in the keys which look like the following (we use
> >> `JSON_PATH` with integers).
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> >> ```
> >> - If the scalar value exceeds 100 kB we would split it and store every part
> >> under a key constructed as:
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> >> ```
> >>
> >> Since all parts of the documents are stored under a common `{DB_DOCS_NS} /
> >> {DOC_KEY}` they will be stored on the same server most of the time. The
> >> document can be retrieved by using a range query
> >> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} / 0xFF")`).
> >> We can reconstruct the document since the mapping is returned as well.
> >>
> >> The downside of this approach is we wouldn't be able to ensure the same
> >> order of keys in the JSON object. Currently the `jiffy` JSON encoder
> >> respects the order of keys.
> >> ```
> >> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> >> <<"{\"bbb\":1,\"aaa\":12}">>
> >> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> >> <<"{\"aaa\":12,\"bbb\":1}">>
> >> ```
> >>
> >> Best regards,
> >> iilyak
> >>
> >> On 2019/01/30 13:02:57, Ilya Khlopotov <iil...@apache.org> wrote:
> >>> As you might already know, FoundationDB has a number of limitations
> >>> which influence the way we might store JSON documents.
> >>> The limitations are:
> >>>
> >>> | limitation            | recommended value | recommended max | absolute max |
> >>> |-----------------------|------------------:|----------------:|-------------:|
> >>> | transaction duration  |                   |                 |        5 sec |
> >>> | transaction data size |                   |                 |        10 MB |
> >>> | key size              |          32 bytes |            1 kB |        10 kB |
> >>> | value size            |                   |           10 kB |       100 kB |
> >>>
> >>> In order to fit the JSON document into 100 kB we would have to partition
> >>> it in some way. There are three ways of partitioning the document:
> >>> 1. store multiple binary blobs (parts) in different keys
> >>> 2. flatten the JSON structure and store every path leading to a scalar
> >>> value under its own key
> >>> 3. measure the size of different branches of a tree representing the JSON
> >>> document (while we parse) and use another key for the branch when we
> >>> are about to exceed the limit
> >>>
> >>> - The first approach is the simplest but it wouldn't allow us to access
> >>> parts of the document.
> >>> - The downsides of the second approach are:
> >>>   - the flattened JSON structure would have long paths, which means longer keys
> >>>   - the scalar value cannot be more than 100 kB (unless we split it as well)
> >>> - The third approach falls short in cases when the structure of the document
> >>> doesn't allow a clean cut-off of branches:
> >>>   - complex rules to handle all corner cases
> >>>
> >>> The goals of this thread are:
> >>> - to collect ideas on how to encode and store the JSON document
> >>> - to comment on the collected ideas
> >>>
> >>> Non goals:
> >>> - the storage of metadata for the document would be discussed elsewhere
> >>> - tombstones
> >>> - edit conflicts
> >>> - revisions
> >>>
> >>> Best regards,
> >>> iilyak
> >>>
> >
> > --
> > Professional Support for Apache CouchDB:
> > https://neighbourhood.ie/couchdb-support/
> >
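
To make the duplicate-key question concrete, here is a minimal Python sketch of the flattening step from the first proposal quoted above. It is an illustration under assumptions, not the proposed implementation: the names `flatten`, `unflatten`, `RESERVED_MAPPING` and `RESERVED_ARRAY` are hypothetical, paths are plain tuples rather than FoundationDB keys, no FoundationDB calls are made, and storing empty objects and arrays as leaf values is my own choice so that the round trip works.

```python
import json

RESERVED_MAPPING = 0  # key suffix that would hold the field-name mapping table
RESERVED_ARRAY = 1    # path element announcing that an array index follows


def flatten(doc):
    """Return (mapping, rows): field name -> int, and path tuple -> leaf value."""
    mapping, rows = {}, {}

    def field_id(name):
        # IDs start at 2 because 0 and 1 are reserved.
        if name not in mapping:
            mapping[name] = len(mapping) + 2
        return mapping[name]

    def walk(node, path):
        if isinstance(node, dict) and node:
            for name, value in node.items():
                walk(value, path + (field_id(name),))
        elif isinstance(node, list) and node:
            for idx, value in enumerate(node):
                walk(value, path + (RESERVED_ARRAY, idx))
        else:
            # Scalars, plus empty containers (an assumption of this sketch).
            rows[path] = node

    walk(doc, ())
    return mapping, rows


def unflatten(mapping, rows):
    """Rebuild the document from the mapping table and the flattened rows."""
    names = {fid: name for name, fid in mapping.items()}

    def steps(path):
        # Yield str for object keys and int for array indexes.
        it = iter(path)
        for token in it:
            yield next(it) if token == RESERVED_ARRAY else names[token]

    root = {}
    for path in sorted(rows):  # sorted order visits array indexes ascending
        parts = list(steps(path))
        node = root
        for part, nxt in zip(parts, parts[1:] + [None]):
            if nxt is None:
                child = rows[path]
            else:
                child = [] if isinstance(nxt, int) else {}
            if isinstance(part, int):
                if part == len(node):
                    node.append(child)
                node = node[part]
            else:
                node = node.setdefault(part, child)
    return root


doc = json.loads('{"bbb": 1, "aaa": [10, {"ccc": null}]}')
mapping, rows = flatten(doc)
# mapping == {"bbb": 2, "aaa": 3, "ccc": 4}
# rows == {(2,): 1, (3, 1, 0): 10, (3, 1, 1, 4): None}
assert unflatten(mapping, rows) == doc
```

In this sketch each path maps to exactly one row, so { "foo": 1, "foo": 2 } cannot round-trip: both values would land on the same path. Python's `json.loads` already collapses that input to `{"foo": 2}` before flattening even starts. Whether the real encoding could keep duplicates (for example with an extra path element) is an open question for the proposals, not something this sketch answers.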