On Thu, 9 Jan 2025, 12:26 Feike Steenbergen, <feikesteenber...@gmail.com> wrote:
> I'm trying to change a few applications to fully use this, as PostgreSQL > 17 added this support. > > The application does something like this: > > - fetch information from a source system and store it in a temp table > - run a MERGE with a table (target) in this database, > updating, inserting and deleting in a single statement > - the target table holds information for multiple systems > > The temp table (source) doesn't contain the identifier for the system, as > we can inject that at runtime. > > This is the shape of the statement: > > MERGE INTO > merge_target AS t > USING > merge_source AS s ON (t.time = s.time AND t.device_id = $1) > WHEN MATCHED THEN > UPDATE set > value = s.value > WHEN NOT MATCHED THEN > INSERT (device_id, time, value) VALUES ($1, time, value) > WHEN NOT MATCHED BY SOURCE > AND t.device_id = $1 > THEN DELETE; > > If we run this however, there is always a Seq Scan against merge_target, > whereas the filter of `AND t.device_id = $1` uses a (Bitmap) Index scan > in other types of queries. > > Previously we would - in a CTE - do a delete *and then* the merge. > > Which performed much better as the DELETE would not do a Seq Scan. > > Attached a full reproducing test case. Some numbers: > > Previously, DELETE in CTE, then merge: > Planning Time: 6.700 ms > Execution Time: 7.473 ms > > Using the MERGE WHEN MATCHED ON SOURCE THEN DELETE: > Planning Time: 0.994 ms > Execution Time: 65.695 ms > > My questions are: > > - is the Seq Scan expected by others in this mailing list as well? > - Is it required to do the Seq Scan? > - is this something that could be optimized? > > Kind regards, > > Feike Steenbergen > An excellent post. I wish all posters provided nice contained example cases like this one. I note ,in the documentation, that a Warning box got added which says this... If both WHEN NOT MATCHED BY SOURCE and WHEN NOT MATCHED [BY TARGET] clauses are specified, the MERGE command will perform a FULL join between data_source and the target table. For this to work, at least one join_condition subexpression must use an operator that can support a hash join, or all of the subexpressions must use operators that can support a merge join. This could be a hint as to the reason maybe ? The NOT MATCHED BY SOURCE is new feature to 17. I'm looking forward to others replies here. >