With the same data:

# create cast (jsonb as bytea) without function;
# select sum(length(data::text))::float/sum(octet_length((data::jsonb)::bytea)) from data.packets;
      ?column?
-------------------
 0.630663654967513

and 0.554666142734544 without spaces.
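That "without spaces" figure can be approximated by stripping the spaces the
jsonb output function adds before measuring, along these lines (a rough check
only: replace() also removes spaces inside string values, and it relies on the
same binary-coercible cast created above):

# select sum(length(replace(data::text, ' ', '')))::float
         / sum(octet_length((data::jsonb)::bytea))
  from data.packets;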
On Wed, Sep 24, 2014 at 9:22 PM, Seref Arikan <serefari...@gmail.com> wrote:
> This is interesting. Most binary encoding methods I use produce smaller
> files than text files for the same content.
> Having read your mail, I've realized that I have no reason to expect the
> same from jsonb. I did a quick Google search to see whether it is wrong to
> expect binary encoding to decrease size and saw that I'm not alone (which
> still does not mean I'm being reasonable).
> This project: http://ubjson.org/#size is one of the hits that mentions
> some nice space gains thanks to binary encoding.
>
> The "much larger" part is a bit scary. Is this documented somewhere?
>
> Best regards
> Seref
>
>
> On Wed, Sep 24, 2014 at 2:44 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
>
>> On Wed, Sep 24, 2014 at 2:44 AM, Ilya I. Ashchepkov <koc...@gmail.com> wrote:
>> > I'm sorry about sending this email several times. I couldn't tell
>> > whether Gmail had sent it or not.
>> >
>> >
>> > On Wed, Sep 24, 2014 at 2:30 PM, John R Pierce <pie...@hogranch.com> wrote:
>> >>
>> >> On 9/24/2014 12:23 AM, Ilya I. Ashchepkov wrote:
>> >>>
>> >>> Are the spaces necessary in the text presentation of JSONB?
>> >>> In my data the resulting text contains ~12% spaces.
>> >>
>> >>
>> >> can you show us an example of this?
>> >
>> >
>> > One record:
>> > # select data from events.data limit 1;
>> > {"can": {"lls": {"1": 76.4}, "mhs": 4674.85, "rpm": 168.888, "speed": 74,
>> > "runned": 166855895, "fuel_consumption": 74213.5}, "crc": 10084, "gps": 1,
>> > "gsm": {"signal": 100}, "lls": {"1": 733, "2": 717}, "used": 19, "speed":
>> > 87.4, "valid": 1, "msg_id": 89, "runned": 72.75, "boot_no": 256, "digital":
>> > {"in": {"1": 1, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0}, "out": {"1": 0,
>> > "2": 0}}, "visible": 20, "ignition": 1, "location": {"course": 265,
>> > "altitude": 143, "latitude": 55.127888997395836, "longitude":
>> > 80.8046142578125}, "protocol": 4, "coldstart": 1, "timesource": "terminal",
>> > "receiver_on": 1, "external_power": 28.07, "internal_power": 4.19}
>> >
>> > Whitespace percentage in this record:
>> > # select array_length(regexp_split_to_array(data::text, text ' '),
>> > 1)*100./length(data::text) from events.data limit 1;
>> >       ?column?
>> > ---------------------
>> >  12.3417721518987342
>> >
>> > Whitespace in the test data:
>> > # select count(*), avg(array_length(regexp_split_to_array(data::text,
>> > text ' '), 1)*100./length(data::text)) from events.data;
>> >  count  |         avg
>> > --------+---------------------
>> >  242222 | 12.3649234646118312
>>
>> For jsonb (unlike json), the data is not actually stored as json but in a
>> binary format. In fact it will generally be much larger than the text
>> representation, but in exchange many operations will be faster. The
>> spaces you see are generated when the jsonb value is converted to text
>> for output. I actually think it's pretty reasonable to want to redact all
>> spaces from such objects in all cases where conversion to text happens
>> (output functions, xxxto_json, etc.), because ~12% savings are nothing to
>> sneeze at when moving large documents in and out of the database.
>>
>> On the flip side, a more verbose prettification would be pretty nice
>> too. I wonder if a hypothetical GUC is the best way to control this
>> behavior...
>>
>> merlin

--
Best regards,
Ilya Ashchepkov
koc...@gmail.com
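P.S. To check the overhead Merlin describes on your own data without creating
the bytea cast, pg_column_size() reports the footprint of a value as stored
(possibly after TOAST compression, so this is only a rough comparison of the
two representations):

# select avg(pg_column_size(data)) as stored_bytes,
         avg(length(data::text))   as text_chars
  from events.data;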