On Tue, 2 Jan 2024 at 11:30, Peter Eisentraut <pe...@eisentraut.org> wrote:
>
> On 06.12.23 22:08, Matthias van de Meent wrote:
> > PFA a patch that reduces the output size of nodeToString by 50%+ in
> > most cases (measured on pg_rewrite), which on my system reduces the
> > total size of pg_rewrite by 33% to 472KiB. This does keep the textual
> > pg_node_tree format alive, but reduces its size significantly.
> >
> > The basic techniques used are
> > - Don't emit scalar fields when they contain a default value, and
> >   make the reading code aware of this.
> > - Reasonable defaults are set for most datatypes, and overrides can
> >   be added with new pg_node_attr() attributes. No introspection into
> >   non-null Node/Array/etc. is being done though.
> > - Reset more fields to their default values before storing the values.
> > - Don't write trailing 0s in outDatum calls for by-ref types. This
> >   saves many bytes for Name fields, but also some other pre-existing
> >   entry points.
>
> Based on our discussions, my understanding is that you wanted to produce
> an updated patch set that is split up a bit.

I mentioned that I've been working on implementing (but have not yet
completed) a binary serialization format, with an implementation based
on Andres' generated metadata idea. However, that requires more
elaborate infrastructure than is currently available, so while I said
I expected it to be complete before the Christmas weekend, it'll take
some more time - I'm not sure it'll be ready for PG17.

In the meantime here's an updated version of the v0 patch, formally
keeping the textual format alive, while reducing the size
significantly (nearing 2/3 reduction), taking your comments into
account. I think the gains are worth considering even without taking
into account the as-yet-unimplemented binary format.

> My suggestion is to make incremental patches along these lines:
> [...]

Something like the attached? It splits out into the following:

0001: basic 'omit default values'
0002: reset location and other querystring-related node fields for
      all catalogs of type pg_node_tree.
0003: add default marking on typmod fields.
0004 & 0006: various node fields marked with default() based on
      observed common or initial values of those fields
0005: truncate trailing 0s from outDatum
0007 (new): do run-length + gap coding for bitmapset and the various
      integer list types. This saves a surprising amount of bytes.

> The last one I have some doubts about, as previously expressed, but the
> first few seem sensible to me. By splitting it up we can consider these
> incrementally.

That makes a lot of sense. The numbers for the full patchset do seem
quite positive though: the metrics of the query below show a 40%
decrease in size of a fresh pg_rewrite (standard toast compression)
and a 5% decrease in size of the template0 database. The uncompressed
data of pg_rewrite.ev_action is also 60% smaller.

select pg_database_size('template0') as "template0"
     , pg_total_relation_size('pg_rewrite') as "pg_rewrite"
     , sum(pg_column_size(ev_action)) as "compressed"
     , sum(octet_length(ev_action)) as "raw"
from pg_rewrite;

 version | template0 | pg_rewrite | compressed |   raw
---------+-----------+------------+------------+---------
 master  |   7545359 |     761856 |     573307 | 2998712
 0001    |   7365135 |     622592 |     438224 | 1943772
 0002    |   7258639 |     573440 |     401660 | 1835803
 0003    |   7258639 |     565248 |     386211 | 1672539
 0004    |   7176719 |     483328 |     317099 | 1316552
 0005    |   7176719 |     483328 |     315556 | 1300420
 0006    |   7160335 |     466944 |     302806 | 1208621
 0007    |   7143951 |     450560 |     287659 | 1187237

While looking through the data, I noticed that a significant portion
of the larger views now consists of range table entries, specifically
the Alias and Var nodes (which are mostly repeated and/or repetitive
values, but split across Nodes). I think column-major storage would be
more efficient to write, but I'm not sure it's worth the effort in
planner code.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
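
For readers unfamiliar with the technique named in 0007, here is a minimal sketch of run-length plus gap coding for a sorted set of non-negative integers. It is an illustration only and assumes nothing about the patch's actual code or its output format: the GapRun struct and the gaprun_encode/gaprun_decode names are hypothetical. Each run of consecutive members is stored as the distance from the end of the previous run plus the run's length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair: distance from the previous run's last member
 * (or from 0 for the first run), and the number of consecutive members. */
typedef struct GapRun
{
    int gap;
    int len;
} GapRun;

/* Encode n sorted, non-negative members into (gap, len) pairs.
 * 'out' must have room for n pairs. Returns the number of pairs written. */
size_t
gaprun_encode(const int *members, size_t n, GapRun *out)
{
    size_t nruns = 0;
    size_t i = 0;
    int    prev = 0;            /* last member of the previous run */

    while (i < n)
    {
        int start = members[i];
        int len = 1;

        assert(start >= prev);  /* input must be sorted and non-negative */

        /* extend the run while the next member is the next integer */
        while (i + (size_t) len < n &&
               members[i + (size_t) len] == start + len)
            len++;

        out[nruns].gap = start - prev;
        out[nruns].len = len;
        nruns++;

        prev = start + len - 1;
        i += (size_t) len;
    }
    return nruns;
}

/* Inverse of gaprun_encode. 'out' must have room for all decoded members.
 * Returns the number of members written. */
size_t
gaprun_decode(const GapRun *runs, size_t nruns, int *out)
{
    size_t n = 0;
    int    prev = 0;

    for (size_t r = 0; r < nruns; r++)
    {
        int start = prev + runs[r].gap;

        for (int k = 0; k < runs[r].len; k++)
            out[n++] = start + k;
        prev = start + runs[r].len - 1;
    }
    return n;
}
```

With this scheme the members 1 2 3 4 10 11 20 become three pairs, (1,4) (6,2) (9,1), which is where the savings for dense bitmapsets and mostly-sequential integer lists would come from.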

v1-0001-pg_node_tree-Don-t-serialize-fields-with-type-def.patch
v1-0002-pg_node_tree-reset-node-location-before-catalog-s.patch
v1-0005-NodeSupport-Don-t-emit-trailing-0s-in-outDatum.patch
v1-0004-NodeSupport-add-some-more-default-markers-for-var.patch
v1-0003-Nodesupport-add-support-for-custom-default-values.patch
v1-0007-NodeSupport-Apply-RLE-and-differential-encoding-o.patch
v1-0006-NodeSupport-Apply-some-more-defaults-serializatio.patch
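
As an illustration of the idea behind 0005 (not the patch's code), the sketch below shows how trailing zero bytes of a fixed-width, zero-padded value such as a 64-byte Name could be dropped on output and restored on input. The function names trimmed_length and restore_padded are hypothetical, and the sketch assumes the reader knows the full width of the value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writer side: number of leading bytes that still need to be emitted
 * once all trailing zero bytes are dropped. */
size_t
trimmed_length(const unsigned char *data, size_t len)
{
    while (len > 0 && data[len - 1] == 0)
        len--;
    return len;
}

/* Reader side: rebuild the full-width value from the stored prefix,
 * zero-filling the bytes that were omitted on output. */
void
restore_padded(const unsigned char *stored, size_t stored_len,
               unsigned char *out, size_t full_len)
{
    assert(stored_len <= full_len);
    memset(out, 0, full_len);
    memcpy(out, stored, stored_len);
}
```

For a 64-byte buffer holding "pg_rewrite", only the first 10 bytes would need to be written, and the reader recovers the remaining 54 zero bytes from the known width.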