> It's a decent amount of work to define one though... It's clearly not
> acceptable to just dump out the internal representation, as already discussed
> in this thread.

I totally agree that it should be a well-defined format that doesn't
leak stuff like endianness and alignment of the underlying database.

With a bit of googling I found the UBJSON specification:
https://ubjson.org/#data_format
It seems like it would be possible to transform between JSONB and
UBJSON efficiently. As an example: for my recent use case, the main
thing that was slow was parsing JSON strings, because of the escape
characters. That's not needed with UBJSON, because strings are simply
UTF-8 encoded binary data prefixed with their length. So all that
would be needed is checking that the binary data is valid UTF-8.
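To illustrate the point, here's a minimal sketch (not a full decoder) of
reading a UBJSON string per the spec: a 'S' marker, a length encoded as
one of the spec's integer types (big-endian), then the raw UTF-8 bytes.
The only per-character work left is UTF-8 validation:

```python
import struct

# Length prefixes in UBJSON are themselves typed integers (big-endian).
_INT_TYPES = {
    b'U': ('B', 1),  # uint8
    b'i': ('b', 1),  # int8
    b'I': ('h', 2),  # int16
    b'l': ('l', 4),  # int32
    b'L': ('q', 8),  # int64
}

def decode_ubjson_string(buf: bytes) -> str:
    if buf[0:1] != b'S':
        raise ValueError("not a UBJSON string")
    fmt, size = _INT_TYPES[buf[1:2]]
    (length,) = struct.unpack('>' + fmt, buf[2:2 + size])
    data = buf[2 + size:2 + size + length]
    # No escape-sequence handling needed; just validate UTF-8.
    return data.decode('utf-8')

print(decode_ubjson_string(b'SU\x05hello'))  # -> hello
```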

Also there seem to be implementations in many languages for this spec:
https://ubjson.org/libraries/ So, that should make it easy for
Postgres client libraries to support this binary format.

> I'm still bemused by the proposition that that common interchange format
> shouldn't be, um, JSON. We've already seen BSON, BJSON, etc die
> well-deserved deaths.

@Tom Lane: The UBJSON project explicitly lists these specific failed
attempts at a binary encoding for JSON as the reason it was created,
aiming to fix the issues those specs had:
https://github.com/ubjson/universal-binary-json#why

Jelte
