Hi,

On 2025-02-13 10:52:31 +0100, Sébastien wrote:
> Introduce an INSERT FROZEN feature to bypass vacuum processing for
> large-scale cold data imports, reducing the impact on system performance
> post-import. For large imports, migrations and major version upgrades.

> Business Use-case:
>
> When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a heavily
> loaded server, we observed that the system struggled significantly weeks
> later when the autovacuum process had to freeze all the imported data
> pages. This led to severe performance degradation, requiring manual
> intervention to prioritize vacuum jobs to complete them as quickly as
> possible.
What version of postgres was this?

What batch sizes do you need to support? I.e. is all of this data inserted
at once, or in steps?

As already discussed, it seems unlikely that we'll ever support INSERT
FROZEN, due to the potential of causing concurrent queries to give bogus
answers. But there's actually a lot we can do to improve this short of
INSERT FROZEN.

The reason I asked for the version is that the behaviour would e.g. likely
be worse before autovacuum_vacuum_insert_scale_factor existed. We are
working on improvements around that in 18 too, ensuring that the gap
between insert-triggered vacuums does not grow forever.

Several recent releases have also improved the situation around this in
other ways, e.g. by just making vacuuming faster and by avoiding redundant
work in more cases (by increasing relfrozenxid more aggressively). We've
also been talking about performing freezing during e.g. checkpoints, if
possible.

If you're inserting all the data in a single transaction, however, it'll be
hard to improve most of this, because while that long-running transaction
is in progress, we can't do anything that needs to know the transaction has
finished. OTOH, if it were a single transaction, you could already use COPY
FREEZE.

A somewhat related issue is that bulk INSERTs, in contrast to COPY,
currently do not use the bulk-insert logic, leading INSERT to emit a lot
more WAL compared to inserting the same data via COPY.

> This issue is particularly critical during database *migrations* or *version
> upgrades*, where a full data reload is often necessary. Each time a major
> PostgreSQL upgrade occurs, users must reimport large datasets, leading to
> the same problem of vacuum storms post-import. An INSERT FROZEN feature
> would allow importing data that is known to be immutable, preventing
> unnecessary vacuum overhead and reducing system strain.

What are you using for such upgrades or migrations?
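For reference, the insert-triggered vacuum behaviour I mentioned is
controlled by two GUCs added in PostgreSQL 13. A sketch of a
postgresql.conf fragment (the values are purely illustrative, not tuning
advice for your workload):

```
# Illustrative values only.  These settings make autovacuum run on a table
# after (threshold + scale_factor * reltuples) rows have been inserted, so
# large cold imports are vacuumed/frozen incrementally rather than in one
# storm weeks later.
autovacuum_vacuum_insert_threshold = 1000
autovacuum_vacuum_insert_scale_factor = 0.05   # default is 0.2
```

On versions before 13 these settings do not exist, which is part of why the
postgres version matters here.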
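To illustrate the COPY FREEZE route mentioned above (table and file names
are made up for the example): FREEZE only works when the table was created
or truncated in the same transaction, which is what prevents concurrent
queries from seeing not-yet-committed frozen rows.

```sql
BEGIN;
-- The table must be created (or TRUNCATEd) in this same transaction,
-- otherwise COPY ... FREEZE will error out.
CREATE TABLE measurements (ts timestamptz, value double precision);
COPY measurements FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE true);
COMMIT;
```

Rows loaded this way are written already frozen, so the later anti-wraparound
freezing pass that caused your problem is not needed for them.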
I'd not expect INSERT to be used for that, due to the overhead it has
compared to COPY.

Greetings,

Andres Freund