Hi, Here are some observations. Em seg., 17 de mar. de 2025 às 09:19, <bill.po...@ymail.com> escreveu:
> > PostgreSQL has a lot of overhead per row. > > Okay, thanks. I'm not actually too worried about this since in my > scenario, each row is about 1.5 kB, so the % overhead is negligible. > > > It is probably not the lookup, but the *modification* of the index that > is slow. > > Yes that makes sense for the original 3 million inserts, but when I > perform the update of the 3 million rows, the index doesn't change - they > are all HOT updates. > Using "perf" I can see that the overhead is indeed due to index lookup when we do HOT updates. > > Then the best you can do is to use COPY rather than INSERT. It will > perform better (but [not] vastly better). > > I need to perform a merge (INSERT ... ON CONFLICT ... DO UPDATE) on the > data, so sadly I cannot use COPY. > > I have discovered that for some reason, performing the original insert > without the ON CONFLICT statement is twice as fast as performing the > original insert with an ON CONFLICT ... DO UPDATE clause, completing in 4 > seconds instead of 8. That seems strange to me because I wouldn't have > thought it would be doing any additional work since a unique constraint is > on the primary key, so each inserted value would need to be checked in > either case, and there is no extra work to be done in either case. > > In the INSERT case, we do not check the unique constraint for each row. We run into an error when inserting a duplicate, aborting the operation. > INSERT INTO test (id, text1) > SELECT generate_series, 'x' > FROM generate_series(1, 3000000) > > It remains 4 seconds even when adding a clause to not insert duplicates. > > INSERT INTO test (id, text1) > SELECT generate_series, 'x' > FROM generate_series(1, 3000000) > WHERE NOT EXISTS ( > SELECT 1 > FROM test4 > WHERE id = generate_series > ) > > In this case, we are not checking duplicates inside the input dataset. If you can guarantee, at the application level, that there are no duplicates, this seems a good speedup. Perhaps the MERGE clause... > Furthermore, I have found that performing an UPDATE rather than an INSERT > ... ON CONFLICT ... DO UPDATE is twice as slow, completing in 16 seconds > instead of 14 seconds. > > UPDATE test > SET text1 = 'x' > FROM generate_series(1, 3000000) > WHERE test4.id = generate_series > > This also now means that updating 3 million rows takes 4x longer than > inserting those rows. Do we expect updates to be 4x slower than inserts? > > It is not the update that is slower. It is the attached where clause that makes it slower. Try: UPDATE test SET text1='x'; In my tests, the update of non-indexed columns is slightly faster than an insert. Regards, Renan Fonseca