Bruce Momjian <[email protected]> wrote: > I think the reporter above extrapolated this example to use ctid in > other, non-appropriate cases. Should we add a warning to the docs to > explain that the general use of ctid is discouraged?
I've been explaining this badly. Here's an example that I hope will make clear why I'm so concerned. (Read Committed default isolation level) drop table if exists b1, b2; create table b1 as select 1 id, 100 balance; create table b2 as select 1 id, 100 balance; With an immutable column self-join, it works as expected and serializes the balance changes. --session1 begin; with x as (select id from b1 order by id for update) update b1 set balance = balance + 1 from x where b1.id = x.id; --session2 with x as (select id from b1 order by id for update) update b1 set balance = balance - 100 from x where b1.id = x.id; select * from b1; --session1 commit; --session2 UPDATE 1 id | balance ----+--------- 1 | 1 (1 row) But with ctid, the second update is lost. --session1 begin; with x as (select ctid from b2 order by id for update) update b2 set balance = balance + 1 from x where b2.ctid = x.ctid; --session2 with x as (select ctid from b2 order by id for update) update b2 set balance = balance - 100 from x where b2.ctid = x.ctid; select * from b2; --session1 commit; --session2 UPDATE 0 id | balance ----+--------- 1 | 101 (1 row) The session2 CTE blocks until it gets the updated version of the row from session 1 commit. It gets b2.ctid = (0,2) giving x.ctid = (0,2). But the UPDATE gets b2.ctid = (0,1) at the start of the transaction. This doesn't change even after session 1 commits. The join fails. This is correct, but not obvious. This is why I think the docs should say something different to what they currently do. The UPDATE example is fine only because it's run repeatedly until there are no more rows to find. I learned the primary key self-join pattern for order by/limit in delete/update soon after I began with postgres. I'm pretty sure ctid would have confused me. So I think the doc examples with an id column will be very helpful, and safer. If users discover ctid on internet forums or LLMs and then check the docs, I think they should get some notice of the concurrency risk. Here's a patch with an attempt to do the above. Forum examples: https://stackoverflow.com/questions/10245560/deadlocks-in-postgresql-when-running-update/71163671#71163671 https://www.reddit.com/r/PostgreSQL/comments/1cn0q1c/comment/l36ppve/ Thanks, Bernice
From 69595aeb9f1e6eda74fa29fde22bd117a8044812 Mon Sep 17 00:00:00 2001 From: Bernice Southey <[email protected]> Date: Thu, 20 Nov 2025 11:53:57 +0000 Subject: [PATCH] Add ctid guidance and remove from examples --- doc/src/sgml/ddl.sgml | 5 ++++- doc/src/sgml/ref/delete.sgml | 10 +++++----- doc/src/sgml/ref/update.sgml | 9 +++------ 3 files changed, 12 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 09ad8400fd0..99c8facfe0b 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -1559,7 +1559,10 @@ CREATE TABLE circles ( <structfield>ctid</structfield> will change if it is updated or moved by <command>VACUUM FULL</command>. Therefore <structfield>ctid</structfield> is useless as a long-term row - identifier. A primary key should be used to identify logical rows. + identifier. Be aware that ctids will change in data-modifying + concurrent transactions, and they are not unique across table partitions. + A primary key should be used to identify logical rows. + </para> </listitem> </varlistentry> diff --git a/doc/src/sgml/ref/delete.sgml b/doc/src/sgml/ref/delete.sgml index 5b52f77e28f..48eb1b6755a 100644 --- a/doc/src/sgml/ref/delete.sgml +++ b/doc/src/sgml/ref/delete.sgml @@ -313,15 +313,15 @@ DELETE FROM tasks WHERE CURRENT OF c_tasks; documentation of <command>UPDATE</command></link>: <programlisting> WITH delete_batch AS ( - SELECT l.ctid FROM user_logs AS l - WHERE l.status = 'archived' - ORDER BY l.creation_date + SELECT a.id FROM accounts AS a + WHERE a.status = 'archived' + ORDER BY a.id FOR UPDATE LIMIT 10000 ) -DELETE FROM user_logs AS dl +DELETE FROM accounts AS dl USING delete_batch AS del - WHERE dl.ctid = del.ctid; + WHERE dl.id = del.id; </programlisting> </para> </refsect1> diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml index 40cca063946..7374b884080 100644 --- a/doc/src/sgml/ref/update.sgml +++ b/doc/src/sgml/ref/update.sgml @@ -486,13 +486,10 @@ UPDATE films SET kind = 'Dramatic' WHERE CURRENT OF c_films; no <literal>LIMIT</literal> clause for <command>UPDATE</command>, it is possible to get a similar effect through the use of a <link linkend="queries-with">Common Table Expression</link> and a - self-join. With the standard <productname>PostgreSQL</productname> - table access method, a self-join on the system - column <link linkend="ddl-system-columns-ctid">ctid</link> is very - efficient: + self-join: <programlisting> WITH exceeded_max_retries AS ( - SELECT w.ctid FROM work_item AS w + SELECT w.id FROM work_item AS w WHERE w.status = 'active' AND w.num_retries > 10 ORDER BY w.retry_timestamp FOR UPDATE @@ -500,7 +497,7 @@ WITH exceeded_max_retries AS ( ) UPDATE work_item SET status = 'failed' FROM exceeded_max_retries AS emr - WHERE work_item.ctid = emr.ctid; + WHERE work_item.id = emr.id; </programlisting> This command will need to be repeated until no rows remain to be updated. Use of an <literal>ORDER BY</literal> clause allows the command to -- 2.43.0
