Re: Bulk DML performance

Greg Sabino Mullane Mon, 17 Mar 2025 07:15:09 -0700

On Mon, Mar 17, 2025 at 4:19 AM <[email protected]> wrote:

 Can you help me understand why performing 3 million lookups on a b-tree
> index with all pages cached in memory takes so long?



It's not the lookup, it's writing the 3 million rows (and in this
particular upsert case, deleting 3 million, then inserting 3 million)

> Well, that is not a great statement.
>
> Understood, but I was highlighting the performance of deleting 3 million
> rows identified by 3 million IDs, as opposed to deleting rows in a given
> range of IDs or deleting the whole table. It seems like deleting 3 million
> rows identified by 3 million IDs should be faster than updating 3 million
> rows (also identified by 3 million IDs).
>

It should indeed be faster. But keep in mind a delete immediately after
that upsert now has twice as many rows to walk through as the upsert did.
Also, a subselect like your original query can lead to a large nested loop.
Try another variant such as this one:

with ids as (select x from generate_series(1, 3_000_000) x) delete from
test using ids where id=x;

> With the table as it is you won't get better performance if you want the
> features that a relational database provides.
>
> Sorry to hear that. I had hoped there was room to improve this performance.
>

If pure upsert performance is the goal, remove the unique index and store a
timestamp along with your inserted data. Back to pure inserts again! (and a
few new downsides). When querying, only use the version of the row with the
highest timestamp.

Other random ideas:

* remove or consolidate columns you don't need, or can store in another
table
* pre-filter the rows in the app, so you can do a pure-insert (or COPY) of
known-to-be-new rows, then upsert the remaining rows
* use the smallest data types possible
* avoid or minimize toasted values
* pack your columns efficiently (e.g. reorder for 8 byte blocks)
* put the indexes on a ram-based tablespace
* boost your work_mem (for things like giant deletes which build hashes)
* revisit unlogged tables and partitioning

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

Re: Bulk DML performance

Reply via email to