Hi Sébastien,

You can check out the thread with the subject "Trigger more frequent
autovacuums of heavy insert tables"; I think it can alleviate the problem.
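A minimal sketch of that idea (my example, not from that thread), assuming
PostgreSQL 13 or later, where these reloptions exist, and a hypothetical
table name big_import:

    -- Insert-driven autovacuum fires once the number of inserted rows
    -- exceeds threshold + scale_factor * reltuples; lowering the scale
    -- factor from its 0.2 default spreads the freezing work over many
    -- smaller, more frequent vacuums instead of one big one weeks later.
    ALTER TABLE big_import SET (
        autovacuum_vacuum_insert_threshold    = 100000,
        autovacuum_vacuum_insert_scale_factor = 0.01
    );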
Thanks

On Sat, Feb 15, 2025 at 3:13 AM Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> On 2025-02-13 10:52:31 +0100, Sébastien wrote:
> > Introduce an INSERT FROZEN feature to bypass vacuum processing for
> > large-scale cold data imports, reducing the impact on system performance
> > post-import. For large imports, migrations and major version upgrades.
> > Business Use-case:
> >
> > When importing massive datasets (e.g., 6-10TB) into PostgreSQL on a
> > heavily loaded server, we observed that the system struggled
> > significantly weeks later when the autovacuum process had to freeze all
> > the imported data pages. This led to severe performance degradation,
> > requiring manual intervention to prioritize vacuum jobs to complete
> > them as quickly as possible.
>
> What version of postgres was this? What batch sizes do you need to
> support? I.e. is all of this data inserted at once, or in steps?
>
> As already discussed, it seems unlikely that we'll ever support INSERT
> FROZEN, due to the potential of causing concurrent queries to give bogus
> answers. But there's actually a lot we can do to improve this short of
> INSERT FROZEN.
>
> The reason I asked for the version is that the behaviour would e.g.
> likely be worse before autovacuum_vacuum_insert_scale_factor existed. We
> are working on improvements around that in 18 too, ensuring that the gap
> between insert-triggered vacuums does not grow forever.
>
> Several recent releases have also improved the situation around this in
> other ways, e.g. by just making vacuuming faster and by avoiding doing
> redundant work in more cases (by increasing relfrozenxid more
> aggressively).
>
> We've also been talking about performing freezing during e.g.
> checkpoints, if possible.
>
> If you're inserting all the data in a single transaction, however, it'll
> be hard to improve most of this, because while that long transaction
> runs, we can't do anything that needs to know the transaction has
> finished. OTOH, if it were a single transaction, you could already use
> COPY FREEZE.
>
> A somewhat related issue is that bulk INSERTs, in contrast to COPY,
> currently do not use the bulk-insert logic, leading the INSERT to emit a
> lot more WAL compared to inserting the same data via COPY.
>
> > This issue is particularly critical during database *migrations* or
> > *version upgrades*, where a full data reload is often necessary. Each
> > time a major PostgreSQL upgrade occurs, users must reimport large
> > datasets, leading to the same problem of vacuum storms post-import. An
> > INSERT FROZEN feature would allow importing data that is known to be
> > immutable, preventing unnecessary vacuum overhead and reducing system
> > strain.
>
> What are you using for such upgrades or migrations? I'd not expect
> INSERT to be used, due to the overhead that has compared to COPY.
>
> Greetings,
>
> Andres Freund
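For reference, the COPY FREEZE path Andres mentions only works when the
table is created or truncated in the same transaction as the COPY. A
minimal sketch (table and file names are placeholders):

    BEGIN;
    TRUNCATE big_import;  -- same-transaction truncate makes FREEZE legal
    COPY big_import FROM '/path/to/data.csv' WITH (FREEZE, FORMAT csv);
    COMMIT;

Rows loaded this way are written already frozen, so the later
anti-wraparound vacuum has far less to do, and going through COPY also
avoids the extra WAL overhead of bulk INSERTs noted above.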