On Wed, Jul 01, 2020 at 06:24:18PM +0900, Michael Paquier wrote: > I am not sure either, still it looks worth thinking about it. > Attached is a rebased version of the last patch. What this currently > holds is just the switch to heap_multi_insert() for the three catalogs > pg_attribute, pg_depend and pg_shdepend. One point that looks worth > debating about is to how much to cap the data inserted at once. This > uses 64kB for all three, with a number of slots chosen based on the > size of each record, similarly to what we do for COPY.
I got an extra round of review done for this patch. One spot was missed in heap_multi_insert() for a comment telling catalogs not using multi inserts. After some consideration, I think that using 64kB as a base number to calculate the number of slots should be fine, similarly to COPY. While on it, I have done some measurements to see the difference in WAL produced and get an idea of the gain. For example, this function would create one table with a wanted number of attributes: CREATE OR REPLACE FUNCTION create_cols(tabname text, num_cols int) RETURNS VOID AS $func$ DECLARE query text; BEGIN query := 'CREATE TABLE ' || tabname || ' ('; FOR i IN 1..num_cols LOOP query := query || 'a_' || i::text || ' int'; IF i != num_cols THEN query := query || ', '; END IF; END LOOP; query := query || ')'; EXECUTE format(query); END $func$ LANGUAGE plpgsql; On HEAD, with a table that has 1300 attributes, this leads to 563kB of WAL produced. With the patch, we get down to 505kB. That's an extreme case of course, but that's nice a nice gain. A second test, after creating a database from a template that has roughly 10k entries in pg_shdepend (10k empty tables actually), showed a reduction from 2158kB to 1762kB in WAL. Finally comes the third catalog, pg_depend, and there is one thing that makes me itching about this part. We do a lot of useless work for the allocation and destruction of the slots when there are pinned dependencies, and there can be a lot of them. Just by running the main regression tests, it is easy to see that in 0003 we still do a lot of calls of recordMultipleDependencies() for one single dependency, and that most of these are actually pinned. So we finish by doing a lot of slot manipulation to insert nothing at the end, contrary to the counterparts with pg_shdepend and pg_attribute. In short, I think that for now it would be fine to commit a patch that does the multi-INSERT optimization for pg_attribute and pg_shdepend, but that we need more careful work for pg_depend. For example we could go through all the dependencies first and recalculate the number of slots to use depending on what is pinned or not, but this would make sense actually when more dependencies are inserted at once in more code paths, mostly for ALTER TABLE. So this needs more consideration IMO. Thoughts? -- Michael
signature.asc
Description: PGP signature