With the same data:

# create cast (jsonb as bytea) without function;
# select sum(length(data::text))::float/sum(octet_length((data::jsonb)::bytea)) from data.packets;
      ?column?
-------------------
 0.630663654967513

and 0.554666142734544 without spaces.
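That "without spaces" figure can be approximated by stripping the spaces the
jsonb output function adds before measuring, along these lines (a rough check
only: replace() also removes spaces inside string values, and it relies on the
same binary-coercible cast created above):

# select sum(length(replace(data::text, ' ', '')))::float
         / sum(octet_length((data::jsonb)::bytea))
  from data.packets;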
On Wed, Sep 24, 2014 at 9:22 PM, Seref Arikan <serefari...@gmail.com> wrote:
> This is interesting. Most binary encoding methods I use produce smaller
> files than text files for the same content.
> Having read your mail, I've realized that I have no reason to expect the
> same from jsonb. I did a quick Google search to see whether it is wrong to
> expect binary encoding to decrease size and saw that I'm not alone (which
> still does not mean I'm being reasonable).
> This project: http://ubjson.org/#size is one of the hits that mentions
> some nice space gains thanks to binary encoding.
>
> The "much larger" part is a bit scary. Is this documented somewhere?
>
> Best regards
> Seref
>
>
> On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>
>> On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koc...@gmail.com> wrote:
>> > I'm sorry about sending this email several times. I couldn't tell
>> > whether Gmail had sent it or not.
>> >
>> >
>> > On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pie...@hogranch.com> wrote:
>> >>
>> >> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>> >>>
>> >>> Are the spaces necessary in the text presentation of JSONB?
>> >>> In my data the resulting text contains ~12% spaces.
>> >>
>> >>
>> >> can you show us an example of this?
>> >
>> >
>> > One record:
>> > # select data from events.data limit 1;
>> > {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
>> > "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
>> > "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
>> > 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
>> > {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
>> > "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
>> > "altitude": 143, "latitude": 55.127888997395836, "longitude":
>> > 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
>> > "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>> >
>> > Whitespace percentage in this record:
>> > # select array_length(regexp_split_to_array(data::text, text ' '),
>> > 1)*100./length(data::text) from events.data limit 1;
>> >       ?column?
>> > ---------------------
>> >  12.3417721518987342
>> >
>> > Whitespace in the test data:
>> > # select count(*), avg(array_length(regexp_split_to_array(data::text,
>> > text ' '), 1)*100./length(data::text)) from events.data;
>> >  count  |         avg
>> > --------+---------------------
>> >  242222 | 12.3649234646118312
>>
>> For jsonb (unlike json), the data is not actually stored as json but in a
>> binary format. In fact it will generally be much larger than the text
>> representation, but in exchange many operations will be faster. The
>> spaces you see are generated when the jsonb value is converted to text
>> for output. I actually think it's pretty reasonable to want to redact all
>> spaces from such objects in all cases where conversion to text happens
>> (output functions, xxxto_json, etc.), because ~12% savings are nothing to
>> sneeze at when moving large documents in and out of the database.
>>
>> On the flip side, a more verbose prettification would be pretty nice
>> too. I wonder if a hypothetical GUC is the best way to control this
>> behavior...
>>
>> merlin

--
Best regards,
Ilya Ashchepkov
koc...@gmail.com
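P.S. To check the overhead Merlin describes on your own data without creating
the bytea cast, pg_column_size() reports the footprint of a value as stored
(possibly after TOAST compression, so this is only a rough comparison of the
two representations):

# select avg(pg_column_size(data)) as stored_bytes,
         avg(length(data::text))   as text_chars
  from events.data;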