I wrote:
> I've not made any attempt to do performance testing on this,
> but I think that's about the only thing standing between us
> and committing this thing.

I think the main gating condition for committing this is "does it
make things worse for simple non-partitioned updates?".  The need
for an extra tuple fetch per row certainly makes it seem like there
could be a slowdown there.  However, in some tests here with current
HEAD (54bb91c30), I concur with Amit's findings that there's barely
any slowdown in that case.  I re-did Heikki's worst-case example [1]
of updating both columns of a 2-column table.  I also tried variants
of that involving updating two columns of a 6-column table and of a
10-column table, figuring that those cases might be a bit more
representative of typical usage (see attached scripts).  What I got
was

Times in ms, for the median of 3 runs:

Table width     --- jit on ---  --- jit off --
                HEAD     patch  HEAD     patch

2 columns       12528    13329  12574    12922
6 columns       15861    15862  14775    15249
10 columns      18399    16985  16994    16907

So even with the worst case, it's not too bad, just a few percent
worse, and once you get to a reasonable number of columns the v13
patch is starting to win.
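
(The attached scripts don't toggle jit themselves; to reproduce both
column groups it should be enough to flip it per-session before each
UPDATE, along the lines of

    set jit = on;    -- left pair of columns
    set jit = off;   -- right pair of columns

assuming otherwise-default settings.)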

However, I then tried a partitioned equivalent of the 6-column case
(script also attached), and it looks like

6 columns       16551    19097  15637    18201

which is really noticeably worse, 16% or so.  I poked at it with
"perf" to see if there were any clear bottlenecks, and didn't find
a smoking gun.  As best I can tell, the extra overhead must come
from the fact that all the tuples are now passing through an Append
node that's not there in the old-style plan tree.  I find this
surprising, because a (non-parallelized) Append doesn't really *do*
much; it certainly doesn't add any data copying or the like.
Maybe it's not so much the Append as that the rules for what kind of
tuple slot can be used have changed somehow?  Andres would have a
clearer idea about that than I do.
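
(For anyone poking at this, the extra node is visible directly in the
plan shape.  Using the table and partition names from the attached
partitioned script, the patched plan comes out schematically as

    explain (costs off) update tab set b = b, f6 = f6;
    -- Update on tab
    --   ->  Append
    --         ->  Seq Scan on tab0
    --         ... one Seq Scan per partition ...
    --         ->  Seq Scan on tab9

eliding the per-partition detail lines, while the non-partitioned plan
is just the Update over a single Seq Scan, with no Append in between.)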

Anyway, I'm satisfied that this patch isn't likely to seriously
hurt non-partitioned cases.  There may be some micro-optimization
that could help simple partitioned cases, though.

This leaves us with the question of whether to commit this patch now or
delay it till we have a better grip on why cases like this one are
slower.  I'm inclined to think that since there are a lot of clear
wins for users of partitioning, we shouldn't let the fact that there
are also some losses block committing.  But it's a judgment call.

                        regards, tom lane

[1] https://www.postgresql.org/message-id/2e50d782-36f9-e723-0c4b-d133e63c6127%40iki.fi
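
-- attached test scripts follow (concatenated): two non-partitioned
-- 6-column cases, then the partitioned one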

drop table if exists tab;
create unlogged table tab (a int4, b int4, f3 int, f4 int, f5 text, f6 float8);
\timing on
insert into tab select g, g, g, g, g::text, g from generate_series(1, 10000000) g;
vacuum tab;
explain verbose update tab set b = b, f6 = f6;
update tab set b = b, f6 = f6;
drop table tab;
create unlogged table tab (a int4, b int4, f3 int, f4 int, f5 text, f6 float8);
\timing on
insert into tab select g, g, g, g, g::text, g from generate_series(1, 10000000) g;
vacuum tab;
explain update tab set b = b, a = a;
update tab set b = b, a = a;
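
-- partitioned equivalent of the 6-column case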
drop table if exists tab;
create unlogged table tab (a int4, b int4, f3 int, f4 int, f5 text, f6 float8)
partition by range(a);
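-- create ten range partitions of 1,000,000 values each, covering a = 0 .. 9999999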
do $$
begin
for i in 0..9 loop
  execute 'create unlogged table tab'||i||' partition of tab for values from ('||i*1000000||') to ('||(i+1)*1000000||');';
end loop;
end$$;

\timing on
insert into tab select g, g, g, g, g::text, g from generate_series(1, 10000000-1) g;
vacuum tab;
explain verbose update tab set b = b, f6 = f6;
update tab set b = b, f6 = f6;
