Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

Mike Rhodes Wed, 30 Jan 2019 06:21:44 -0800

From what I recall, Jiffy is able to cope with the valid-but-kinda-silly[1]
thing where you have multiple JSON keys with the same name, i.e., { "foo": 1,
"foo": 2 }.


Are the proposals on the table able to continue this support (or am I wrong 
about Jiffy)?

[1] https://tools.ietf.org/html/rfc8259#section-4, "The names within an object 
SHOULD be unique.", though https://tools.ietf.org/html/rfc7493#section-2.3 does 
sensibly close that down.
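
A rough illustration of why duplicate names matter for the proposals, using Python's standard `json` module rather than Jiffy (so it says nothing about Jiffy's actual behaviour). A parser that keeps each object as a list of pairs can carry both "foo" entries, while anything keyed by field name, such as a name-to-integer mapping table, has only one slot per name.

```python
import json

raw = '{ "foo": 1, "foo": 2 }'

# Default decoding builds a dict, so the later "foo" replaces the earlier one.
as_dict = json.loads(raw)

# object_pairs_hook receives the members in document order, duplicates included.
as_pairs = json.loads(raw, object_pairs_hook=list)

print(as_dict)   # {'foo': 2}
print(as_pairs)  # [('foo', 1), ('foo', 2)]
```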

-- 
Mike.

On Wed, 30 Jan 2019, at 13:33, Jan Lehnardt wrote:
> 
> 
> > On 30. Jan 2019, at 14:22, Jan Lehnardt <j...@apache.org> wrote:
> > 
> > Thanks Ilya for getting this started!
> > 
> > Two quick notes on this one:
> > 
> > 1. note that JSON does not guarantee object key order and that CouchDB has 
> > never guaranteed it either, and with say emit(doc.foo, doc.bar), if either 
> > emit() parameter was an object, the undefined-sort-order of SpiderMonkey 
> > would mix things up. While worth bringing up, this is not a BC break.
> > 
> > 2. This would have the fun property of being able to rename a key inside 
> > all docs that have that key.
> 
> …in one short operation.
> 
> Best
> Jan
> —
> > 
> > Best
> > Jan
> > —
> > 
> >> On 30. Jan 2019, at 14:05, Ilya Khlopotov <iil...@apache.org> wrote:
> >> 
> >> # First proposal
> >> 
> >> In order to overcome FoundationDB limitations on key size (10 kB) and value 
> >> size (100 kB) we could use the following approach.
> >> 
> >> Below, the paths use slashes for illustration purposes only. We can 
> >> use nested subspaces, tuples, directories or something else. 
> >> 
> >> - Store documents in a subspace or directory (to keep the key prefix 
> >> short)
> >> - When we store the document we would enumerate all field names (0 and 1 
> >> are reserved) and store the mapping table under a key which looks like:
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / 0
> >> ```
> >> - Flatten the JSON document (convert it into key-value pairs where the key 
> >> is `JSON_PATH` and the value is `SCALAR_VALUE`)
> >> - Replace the elements of `JSON_PATH` with integers from the mapping table 
> >> we constructed earlier
> >> - When we encounter an array, use `1 / {array_idx}`
> >> - Store scalar values under keys which look like the following (we use 
> >> `JSON_PATH` with integers):
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
> >> ```
> >> - If the scalar value exceeds 100 kB we would split it and store every part 
> >> under a key constructed as:
> >> ```
> >> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
> >> ```
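
The steps above can be sketched roughly as follows. This is a minimal Python sketch, not the proposed implementation: plain tuples stand in for FoundationDB subspaces or the tuple layer, the `flatten` function and the `docs` namespace are hypothetical, field numbering simply starts at 2, and empty objects or arrays are dropped because they contain no scalar.

```python
import json

VALUE_LIMIT = 100_000  # absolute maximum value size in bytes


def flatten(doc_key, doc, ns="docs"):
    """Return {key tuple: bytes} for one JSON document (illustrative only)."""
    names = {}  # field name -> integer id; 0 and 1 are reserved
    rows = {}

    def field_id(name):
        if name not in names:
            names[name] = len(names) + 2
        return names[name]

    def walk(node, path):
        if isinstance(node, dict):
            for name, child in node.items():
                walk(child, path + (field_id(name),))
        elif isinstance(node, list):
            for idx, child in enumerate(node):
                walk(child, path + (1, idx))  # 1 marks "array index follows"
        else:
            blob = json.dumps(node).encode("utf-8")
            if len(blob) <= VALUE_LIMIT:
                rows[(ns, doc_key) + path] = blob
            else:
                # Oversized scalar: one row per part, suffixed with the part index.
                for part, start in enumerate(range(0, len(blob), VALUE_LIMIT)):
                    rows[(ns, doc_key) + path + (part,)] = blob[start:start + VALUE_LIMIT]

    walk(doc, ())
    rows[(ns, doc_key, 0)] = json.dumps(names).encode("utf-8")  # mapping table
    return rows


rows = flatten("doc1", {"a": 1, "b": {"c": [10, 20]}})
for key in sorted(rows):
    print(key, rows[key])
```

Note that a reader of such rows cannot tell a part index from a path element without extra information, which is one of the corner cases a real encoding would have to settle.
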
> >> 
> >> Since all parts of the document are stored under a common `{DB_DOCS_NS} / 
> >> {DOC_KEY}` prefix, they will be stored on the same server most of the 
> >> time. The document can be retrieved using a range query 
> >> (`txn.get_range("{DB_DOCS_NS} / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} 
> >> / 0xFF")`). We can reconstruct the document since the mapping is returned 
> >> as well.
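
Reconstruction from such a range read could look roughly like this in Python. It is a sketch under assumptions: rows are given as `(path, value)` pairs with the `{DB_DOCS_NS} / {DOC_KEY}` prefix already stripped, the mapping table is JSON stored under path `(0,)`, values are JSON-encoded scalars, and values split into parts are not handled. `unflatten` and `to_lists` are hypothetical helper names.

```python
import json


def to_lists(node):
    """Turn dicts whose keys are all integers back into lists."""
    if not isinstance(node, dict):
        return node
    if node and all(isinstance(k, int) for k in node):
        return [to_lists(node[k]) for k in sorted(node)]
    return {k: to_lists(v) for k, v in node.items()}


def unflatten(rows):
    """Rebuild a document from (path, value) rows (illustrative only)."""
    rows = sorted(rows)                      # a range read returns keys in order
    mapping = json.loads(rows[0][1])         # first row is the mapping table at (0,)
    names = {num: name for name, num in mapping.items()}
    tree = {}
    for path, blob in rows[1:]:
        steps, i = [], 0
        while i < len(path):
            if path[i] == 1:                 # reserved marker: array index follows
                steps.append(path[i + 1])
                i += 2
            else:
                steps.append(names[path[i]])
                i += 1
        node = tree
        for step in steps[:-1]:
            node = node.setdefault(step, {})
        node[steps[-1]] = json.loads(blob)
    return to_lists(tree)


rows = [
    ((0,), b'{"a": 2, "b": 3, "c": 4}'),
    ((2,), b"1"),
    ((3, 4, 1, 0), b"10"),
    ((3, 4, 1, 1), b"20"),
]
print(unflatten(rows))  # {'a': 1, 'b': {'c': [10, 20]}}
```
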
> >> 
> >> The downside of this approach is that we wouldn't be able to preserve the 
> >> order of keys in the JSON object. Currently the `jiffy` JSON encoder 
> >> respects the order of keys.
> >> ```
> >> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
> >> <<"{\"bbb\":1,\"aaa\":12}">>
> >> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
> >> <<"{\"aaa\":12,\"bbb\":1}">>
> >> ```
> >> 
> >> Best regards,
> >> iilyak
> >> 
> >> On 2019/01/30 13:02:57, Ilya Khlopotov <iil...@apache.org> wrote: 
> >>> As you might already know, FoundationDB has a number of limitations 
> >>> which influence the way we might store JSON documents. The limitations 
> >>> are:
> >>> 
> >>> | limitation            | recommended value | recommended max | absolute max |
> >>> |-----------------------|------------------:|----------------:|-------------:|
> >>> | transaction duration  |                   |                 |        5 sec |
> >>> | transaction data size |                   |                 |        10 MB |
> >>> | key size              |          32 bytes |            1 kB |        10 kB |
> >>> | value size            |                   |           10 kB |       100 kB |
> >>> 
> >>> In order to fit the JSON document into 100 kB we would have to partition 
> >>> it in some way. There are three ways of partitioning the document:
> >>> 1. store multiple binary blobs (parts) in different keys
> >>> 2. flatten the JSON structure and store every path leading to a scalar 
> >>> value under its own key
> >>> 3. measure the size of different branches of the tree representing the 
> >>> JSON document (while we parse) and use another key for a branch when we 
> >>> are about to exceed the limit
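
For comparison, the first option is small enough to sketch in a few lines of Python. `split_blob` and `join_blob` are hypothetical names, the list index stands in for the part number in the key, and 100,000 bytes is taken as the absolute value-size maximum from the table above.

```python
import json

VALUE_LIMIT = 100_000  # absolute maximum value size in bytes


def split_blob(doc):
    """Encode the whole document once and cut it into parts that fit a value."""
    blob = json.dumps(doc).encode("utf-8")
    return [blob[i:i + VALUE_LIMIT] for i in range(0, len(blob), VALUE_LIMIT)]


def join_blob(parts):
    """Reassemble the parts (in part order) and decode the document."""
    return json.loads(b"".join(parts))


doc = {"text": "x" * 250000}
parts = split_blob(doc)
print(len(parts), [len(p) for p in parts])
```

Every read has to fetch and join all parts before anything can be decoded, which is why this option gives no access to individual fields.
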
> >>> 
> >>> - The first approach is the simplest, but it wouldn't allow us to access 
> >>> parts of the document.
> >>> - The downsides of the second approach are:
> >>>   - the flattened JSON structure would have long paths, which means longer keys
> >>>   - a scalar value cannot be larger than 100 kB (unless we split it as well)
> >>> - The third approach falls short when the structure of the document 
> >>> doesn't allow branches to be cut off cleanly:
> >>>   - complex rules would be needed to handle all corner cases
> >>> The goals of this thread are:
> >>> - to collect ideas on how to encode and store the JSON document
> >>> - to comment on the collected ideas
> >>> 
> >>> Non-goals:
> >>> - the storage of metadata for the document, which would be discussed elsewhere:
> >>>   - tombstones
> >>>   - edit conflicts
> >>>   - revisions
> >>> 
> >>> Best regards,
> >>> iilyak
> >>> 
> > 
> > -- 
> > Professional Support for Apache CouchDB:
> > https://neighbourhood.ie/couchdb-support/
> > 
> 
> -- 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 
>
