Re: [DISCUSS] : things we need to solve/decide : storing JSON documents

Jan Lehnardt Wed, 30 Jan 2019 05:34:09 -0800


> On 30. Jan 2019, at 14:22, Jan Lehnardt <j...@apache.org> wrote:
> 
> Thanks Ilya for getting this started!
> 
> Two quick notes on this one:
> 
> 1. note that JSON does not guarantee object key order and that CouchDB has 
> never guaranteed it either, and with say emit(doc.foo, doc.bar), if either 
> emit() parameter was an object, the undefined-sort-order of SpiderMonkey 
> would mix things up. While worth bringing up, this is not a BC break.
> 
> 2. This would have the fun property of being able to rename a key inside all 
> docs that have that key.


…in one short operation.

Best
Jan
—
> 
> Best
> Jan
> —
> 
>> On 30. Jan 2019, at 14:05, Ilya Khlopotov <iil...@apache.org> wrote:
>> 
>> # First proposal
>> 
>> In order to overcome FoundationDB limitations on key size (10 kB) and value 
>> size (100 kB) we could use the following approach.
>> 
>> Below, the paths use slashes for illustration purposes only. We can use 
>> nested subspaces, tuples, directories or something else. 
>> 
>> - Store documents in a subspace or directory (to keep the key prefix 
>> short)
>> - When we store the document we would enumerate all field names (0 and 1 are 
>> reserved) and store the mapping table under a key which looks like:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / 0
>> ```
>> - Flatten the JSON document (convert it into key value pairs where the key 
>> is `JSON_PATH` and value is `SCALAR_VALUE`)
>> - Replace elements of JSON_PATH with integers from mapping table we 
>> constructed earlier
>> - When we have an array, use `1 / {array_idx}`
>> - Store scalar values in the keys which look like the following (we use 
>> `JSON_PATH` with integers). 
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}
>> ```
>> - If the scalar value exceeds 100kB we would split it and store every part 
>> under key constructed as:
>> ```
>> {DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH} / {PART_IDX}
>> ```
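
A minimal sketch of the flattening steps listed above, in Python for illustration only. The names `flatten`, `ARRAY_MARKER` and `FIRST_FIELD_ID` are hypothetical, keys are modelled as plain tuples rather than FoundationDB subspaces, and empty objects and arrays are not represented, so this is not a proposed implementation.

```python
from typing import Any, Dict, List, Tuple

ARRAY_MARKER = 1    # reserved: the next path element is an array index
FIRST_FIELD_ID = 2  # 0 (mapping table key) and 1 (array marker) are reserved

Path = Tuple[int, ...]


def flatten(doc: Any) -> Tuple[Dict[str, int], List[Tuple[Path, Any]]]:
    """Return (field name -> integer mapping, [(integer path, scalar), ...])."""
    mapping: Dict[str, int] = {}
    pairs: List[Tuple[Path, Any]] = []

    def field_id(name: str) -> int:
        # Assign integers in order of first appearance.
        if name not in mapping:
            mapping[name] = FIRST_FIELD_ID + len(mapping)
        return mapping[name]

    def walk(node: Any, path: Path) -> None:
        if isinstance(node, dict):
            for name, child in node.items():
                walk(child, path + (field_id(name),))
        elif isinstance(node, list):
            for idx, child in enumerate(node):
                walk(child, path + (ARRAY_MARKER, idx))
        else:
            pairs.append((path, node))  # scalar leaf

    walk(doc, ())
    return mapping, pairs


doc = {"name": "couch", "tags": ["db", "json"], "meta": {"name": "x"}}
mapping, pairs = flatten(doc)
```

Here `mapping` would be stored under `{DB_DOCS_NS} / {DOC_KEY} / 0`, and each entry of `pairs` under `{DB_DOCS_NS} / {DOC_KEY} / {JSON_PATH}`. Note that the repeated field name `name` gets a single integer, which is what keeps the keys short.
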
>> 
>> Since all parts of the document are stored under a common `{DB_DOCS_NS} / 
>> {DOC_KEY}` prefix, they will be stored on the same server most of the time. The 
>> document can be retrieved by using a range query (`txn.get_range("{DB_DOCS_NS} 
>> / {DOC_KEY} / 0", "{DB_DOCS_NS} / {DOC_KEY} / 0xFF")`). We can reconstruct 
>> the document since the mapping is returned as well.
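
To illustrate the reconstruction step, here is a rough Python sketch that rebuilds a document from rows in key order. It assumes the range read is simulated by a sorted in-memory list, that the top level of the document is an object, and that `1` marks an array index as described above. The names `rebuild` and `decode_path` are hypothetical.

```python
from typing import Any, Dict, List, Tuple, Union

ARRAY_MARKER = 1  # reserved path element: the next element is an array index
Row = Tuple[Tuple[int, ...], Any]


def decode_path(path: Tuple[int, ...], names: Dict[int, str]) -> List[Union[str, int]]:
    """Turn an integer path into field names (str) and array indexes (int)."""
    steps: List[Union[str, int]] = []
    i = 0
    while i < len(path):
        if path[i] == ARRAY_MARKER:
            steps.append(path[i + 1])
            i += 2
        else:
            steps.append(names[path[i]])
            i += 1
    return steps


def rebuild(rows: List[Row]) -> Dict[str, Any]:
    # Stand-in for a range read: rows come back sorted by key.
    rows = sorted(rows, key=lambda row: row[0])
    (mapping_key, mapping), *scalars = rows
    assert mapping_key == (0,), "mapping table is expected under key 0"
    names = {field_id: name for name, field_id in mapping.items()}

    doc: Dict[str, Any] = {}
    for path, value in scalars:
        steps = decode_path(path, names)
        container: Any = doc
        for step, nxt in zip(steps, steps[1:]):
            new_child: Any = [] if isinstance(nxt, int) else {}
            if isinstance(step, int):
                # Array indexes arrive in ascending order, so append works.
                if step == len(container):
                    container.append(new_child)
                container = container[step]
            else:
                container = container.setdefault(step, new_child)
        last = steps[-1]
        if isinstance(last, int):
            container.append(value)
        else:
            container[last] = value
    return doc


rows = [
    ((3, 1, 1), "json"),
    ((0,), {"name": 2, "tags": 3, "meta": 4}),
    ((2,), "couch"),
    ((4, 2), "x"),
    ((3, 1, 0), "db"),
]
restored = rebuild(rows)
```

The rebuilt object has the right content, but sibling keys come back in integer-ID order rather than in the order of the original JSON text.
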
>> 
>> The downside of this approach is that we wouldn't be able to ensure the same 
>> order of keys in the JSON object. Currently the `jiffy` JSON encoder 
>> preserves the order of keys.
>> ```
>> 4> jiffy:encode({[{bbb, 1}, {aaa, 12}]}).
>> <<"{\"bbb\":1,\"aaa\":12}">>
>> 5> jiffy:encode({[{aaa, 12}, {bbb, 1}]}).
>> <<"{\"aaa\":12,\"bbb\":1}">>
>> ```
>> 
>> Best regards,
>> iilyak
>> 
>> On 2019/01/30 13:02:57, Ilya Khlopotov <iil...@apache.org> wrote: 
>>> As you might already know, FoundationDB has a number of limitations 
>>> that influence the way we might store JSON documents. The limitations are:
>>> 
>>> | limitation            | recommended value | recommended max | absolute max |
>>> |-----------------------|------------------:|----------------:|-------------:|
>>> | transaction duration  |                   |                 |        5 sec |
>>> | transaction data size |                   |                 |        10 MB |
>>> | key size              |          32 bytes |            1 kB |        10 kB |
>>> | value size            |                   |           10 kB |       100 kB |
>>> In order to fit the JSON document into 100 kB we would have to partition it 
>>> in some way. There are three ways of partitioning the document:
>>> 1. store multiple binary blobs (parts) in different keys
>>> 2. flatten the JSON structure and store every path leading to a scalar value 
>>> under its own key
>>> 3. measure the size of different branches of a tree representing the JSON 
>>> document (while we parse) and use another key for the branch when we are about 
>>> to exceed the limit
>>> 
>>> - The first approach is the simplest, but it wouldn't allow us to access 
>>> parts of the document.
>>> - The downsides of the second approach are:
>>>   - the flattened JSON structure would have long paths, which means longer keys
>>>   - a scalar value cannot be more than 100 kB (unless we split it as well)
>>> - The third approach falls short when the structure of the document 
>>> doesn't allow a clean cut of branches:
>>>   - it needs complex rules to handle all corner cases
>>> The goals of this thread are:
>>> - to collect ideas on how to encode and store the JSON document
>>> - to comment on the collected ideas
>>> 
>>> Non goals:
>>> - the storage of metadata for the document would be discussed elsewhere
>>> - tombstones
>>> - edit conflicts
>>> - revisions 
>>> 
>>> Best regards,
>>> iilyak
>>> 
> 
> -- 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
