Hi Hackers,

Here is a new thread for the next part of SQL:2011 Application Time: UPDATE and DELETE commands with FOR PORTION OF. This continues the long-running thread that ended with [1].

I don't have a new patch set yet, but I wanted to summarize the discussion at the PGConf.dev Advanced Patch Feedback session, especially to continue the conversation about triggers fired from inserting "temporal leftovers" as part of an UPDATE/DELETE FOR PORTION OF.

In my last patch series, I fire all statement & row triggers when the inserts happen for temporal leftovers. So let's assume there is a row with valid_at of [2000-01-01,2020-01-01) and the user's query is UPDATE t FOR PORTION OF valid_at FROM '2010-01-01' TO '2011-01-01'. So it changes one row, targeting only 2010. There are two temporal leftovers: one for 2000-2009 and one for 2011-2019 (inclusive). Then these triggers fire in the order given:

BEFORE UPDATE STATEMENT
BEFORE UPDATE ROW
BEFORE INSERT STATEMENT -- for the 2000-2009 leftovers
BEFORE INSERT ROW
AFTER INSERT ROW
AFTER INSERT STATEMENT
BEFORE INSERT STATEMENT -- for the 2011-2019 leftovers
BEFORE INSERT ROW
AFTER INSERT ROW
AFTER INSERT STATEMENT
AFTER UPDATE ROW
AFTER UPDATE STATEMENT

I think this is the correct behavior (as I'll get to below), but at the session none of us seemed completely sure. What we all agreed on is that we shouldn't implement it with SPI.

Before I switched to SPI, I feared that getting INSERT STATEMENT triggers to fire was going to cause a lot of code duplication. But I took my last pre-SPI patch (v39 from 7 Aug 2024), restored its implementation for ExecForPortionOfLeftovers, and got the desired behavior with just these lines (executed once per temporal leftover):

AfterTriggerBeginQuery()
ExecSetupTransitionCaptureState(mtstate, estate);
fireBSTriggers(mtstate);
ExecInsert(context, resultRelInfo, leftoverSlot, node->canSetTag, NULL, NULL);
fireASTriggers(mtstate);
AfterTriggerEndQuery(estate);

You'll be able to see all that with my next patch set, but for now I'm just saying: replacing SPI was easier than I thought.

There were different opinions about whether this behavior is correct. Robert and Tom both thought that firing INSERT STATEMENT triggers was weird. (Please correct me if I misrepresent anything you said!)

Robert pointed out that if you are using statement triggers for performance reasons (since that may be the only reason to prefer them to row triggers), you might be annoyed to find that your INSERT STATEMENT triggers fire up to two times every time you update a *row*.

Robert also warned that some people implement replication with statement triggers (though maybe not people running v18), and they might not like INSERT STATEMENT triggers firing when there was no user-issued insert statement. This is especially true since C-based triggers have access to the FOR PORTION OF details, as do PL/pgSQL triggers (in a follow-on patch), so they don't need to hear about the implicit inserts.

Also trigger-based auditing will see insert statements that were never 
explicitly sent by a user.
(OTOH this is also true for inserts made from triggers, and (as we'll see below) several other commands fire statement triggers for implicit actions.)

Robert & Tom agreed that if we leave out the statement triggers, then the NEW transition table for the overall UPDATE STATEMENT trigger should include all three rows: the updated version of the old row and the (up to) two temporal leftovers.

A philosophical argument I can see for omitting INSERT STATEMENT is that the temporal leftovers only preserve the history that was already there. They don't add to what is asserted by the table. But reporting them as statements feels a bit like treating them as user assertions. (I'm not saying I find this argument very strong, but I can see how someone would make it.)

Tom & Robert thought that firing the INSERT *ROW* triggers made sense and was valuable for some use-cases, e.g. auditing.

Robert also thought that nesting was weird. He thought that the order should be this (and even better if omitting the INSERT STATEMENTs):

BEFORE UPDATE STATEMENT
BEFORE UPDATE ROW
AFTER UPDATE ROW
AFTER UPDATE STATEMENT
BEFORE INSERT STATEMENT -- for the 2000-2009 leftovers
BEFORE INSERT ROW
AFTER INSERT ROW
AFTER INSERT STATEMENT
BEFORE INSERT STATEMENT -- for the 2011-2019 leftovers
BEFORE INSERT ROW
AFTER INSERT ROW
AFTER INSERT STATEMENT

But I think that the behavior I have is correct. My draft copy of the 2011 standard says this about inserting temporal leftovers (15.13, General Rules 10.c.ii):

> The following <insert statement> is effectively executed without further 
Access Rule
> and constraint checking:
>   INSERT INTO TN VALUES (VL1, ..., VLd)

When I compared IBM DB2 and MariaDB, I found that DB2 does this:

AFTER INSERT ROW -- for the 2000-2009 leftovers
AFTER INSERT STATEMENT
AFTER INSERT ROW -- for the 2011-2019 leftovers
AFTER INSERT STATEMENT
AFTER UPDATE ROW
AFTER UPDATE STATEMENT

(I didn't quickly find a way to observe BEFORE triggers firing, so they aren't show here. I was misremembering when I said at the session that it doesn't support BEFORE triggers. It does, but they can't do certain things, like insert into an auditing table.)

And MariaDB (which doesn't have statement triggers) does this:

BEFORE UPDATE ROW
BEFORE INSERT ROW -- for the 2000-2009 leftovers
AFTER INSERT ROW
BEFORE INSERT ROW -- for the 2011-2019 leftovers
AFTER INSERT ROW
AFTER UPDATE ROW

So both of those match the behavior I've implemented (including the nesting).

Peter later looked up the current text of the standard, and he found several parts that confirm the existing behavior. (Thank you for checking that for me Peter!) To paraphrase a note from him:

Paper SQL-026R2, which originally created this feature, says:

> All UPDATE triggers defined on the table will get activated in the usual way 
for all rows that are
> updated. In addition, all INSERT triggers will get activated for all rows 
that are inserted.

He also found the same text I quoted above (now in section 15.14).

He also brought up this other passage from SQL-026R2:

> Currently it is not possible
> for the body of an UPDATE trigger to gain access to the FROM and TO values in 
the FOR PORTION OF
> clause if one is specified. The syntax of <trigger definition> will need to 
be extended to allow
> such access. We are not proposing to enhance the syntax of <trigger 
definition> in this proposal.
> We leave it as a future Language Opportunity.

Since the standard still hasn't added that, firing at least INSERT ROW triggers is necessary if you want trigger-based replication. (I don't think this speaks strongly to INSERT STATEMENT triggers though.)

Incidentally, note that my patches *do* include this information (as noted above): both in the TriggerData struct passed to C triggers, and (in a separate patch) via PL/pgSQL variables. I don't include it for SQL-language triggers, and perhaps those should wait to see what the standard recommends.

In a world where we *do* fire statement triggers, I think each statement should get its own transition table contents.

Robert also said that we should choose behavior that is consistent with other 
features in Postgres.
I've attached a script to demonstrate a few interesting comparisons. It tests:

- INSERT ON CONFLICT DO NOTHING (without then with a conflict)
- INSERT ON CONFLICT DO UPDATE (without then with a conflict)
- INSERT ON CONFLICT DO UPDATE WHERE (with a conflict)
- MERGE DO NOTHING (without then with a conflict)
- MERGE UPDATE (without then with a conflict)
- cross-partition UPDATE
- ON DELETE CASCADE
- ON DELETE SET NULL

ON CONFLICT DO NOTHING and MERGE DO NOTHING do not fire an UPDATE STATEMENT 
trigger (naturally).

Cross-partition update does not fire extra statement triggers. Everything else does fire extra statement triggers. I think this is what I would have guessed if I hadn't tested it first. It feels like the natural choice for each feature.

Note that commands have to "decide" a priori which statement triggers they'll fire, before they process rows. So ON CONFLICT DO UPDATE fires first BEFORE INSERT STATEMENT, then BEFORE UPDATE STATEMENT, then row triggers, and finally AFTER UPDATE STATEMENT and AFTER INSERT STATEMENT. MERGE UPDATE is the same. It fires BEFORE INSERT STATEMENT, then BEFORE UPDATE STATEMENT, then row triggers, and finally AFTER UPDATE STATEMENT and AFTER INSERT STATEMENT. And the referential integrity actions fire statement triggers (as expected, since they are implemented with SPI).

In all cases we see nesting. With cross-partition update, the DELETE & INSERT triggers are nested inside the before/after UPDATE trigger (although interestingly the AFTER DELETE/INSERT triggers don't quite follow a nesting-like order with respect to each other):

BEFORE UPDATE STATEMENT
BEFORE UPDATE ROW
BEFORE DELETE ROW
BEFORE INSERT ROW
AFTER DELETE ROW
AFTER INSERT ROW
AFTER UPDATE STATEMENT

That covers all my research. My conclusion is that we *should* fire INSERT STATEMENT triggers, and they should be nested within the BEFORE & AFTER UPDATE triggers. I'm pleased that achieving that without SPI is not as hard as I expected.

Please stay tuned for some actual patches!

[1] https://www.postgresql.org/message-id/CA%2BrenyUZuWOxvY1Lv9O3F1LdpKc442EYvViR1DVzbD9ztaa6Yg%40mail.gmail.com

Yours,

--
Paul              ~{:-)
p...@illuminatedcomputing.com

Attachment: how-do-triggers-work.sql
Description: application/sql

Reply via email to