On Tue, Mar 17, 2020 at 11:37 PM Justin Pryzby <pry...@telsasoft.com> wrote: > > On Tue, Mar 17, 2020 at 09:58:53PM -0400, James Coleman wrote: > > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund <and...@anarazel.de> wrote: > > > > > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote: > > > > > I think Andres was thinking this would maybe be an optimization > > > > > independent of > > > > > is_insert_only (?) > > > > > > > > I wasn't sure. > > > > > > I'm not sure myself - but I'm doubtful that using a 0 min age by default > > > will be ok. > > > > > > I was trying to say (in a later email) that I think it might be a good > > > compromise to opportunistically freeze if we're dirtying the page > > > anyway, but not optimize WAL emission etc. That's a pretty simple > > > change, and it'd address a lot of the potential performance regressions, > > > while still freezing for the "first" vacuum in insert only workloads. > > > > If we have truly insert-only tables, then doesn't vacuuming with > > freezing every tuple actually decrease total vacuum cost (perhaps > > significantly) since otherwise every vacuum keeps having to scan the > > heap for dead tuples on pages where we know there are none? Those > > pages could conceptually be frozen and ignored, but are not frozen > > because of the default behavior, correct? > > The essential part of this patch is to trigger vacuum *at all* on an > insert-only table. Before today's updated patch, it also used FREEZE on any > table which hit the new insert threshold. The concern I raised is for > insert-MOSTLY tables. I thought it might be an issue if repeatedly freezing > updated tuples caused vacuum to be too slow, especially if they're distributed > in pages all across the table rather than clustered.
Yeah, for some reason I'd completely forgotten (caught up in thinking about the best possible outcome re: freezing insert only tables) that the bigger problem was just triggering vacuum at all on those tables. > And I asked that the behavior (FREEZE) be configurable by a separate setting > than the one that triggers autovacuum to run. FREEZE is already controlled by > the vacuum_freeze_table_age param. > > I think you're right that VACUUM FREEZE on an insert-only table would be less > expensive than vacuum once without freeze and vacuum again later, which uses > freeze. To me, that suggests setting vacuum_freeze_table_age to a low value > on > those tables. > > Regular vacuum avoids scanning all-visible pages, so for an insert-only table > pages should only be vacuumed once (if frozen the 1st time) or twice (if not). > > * Except when aggressive is set, we want to skip pages that are > * all-visible according to the visibility map, but only when we can > skip > > postgres=# CREATE TABLE t (i int) ; INSERT INTO t SELECT > generate_series(1,999999); VACUUM VERBOSE t; VACUUM VERBOSE t; > ... > INFO: "t": found 0 removable, 999999 nonremovable row versions in 4425 out > of 4425 pages > ... > VACUUM > Time: 106.038 ms > INFO: "t": found 0 removable, 175 nonremovable row versions in 1 out of 4425 > pages > VACUUM > Time: 1.828 ms > > => That's its not very clear way of saying that it only scanned 1 page the 2nd > time around. I didn't realize that about the visibility map being taken into account. > > We have tables that log each change to a business object (as I suspect > > many transactional workloads do), and I've often thought that > > immediately freeze every page as soon as it fills up would be a real > > win for us. > > > > If that's all true, it seems to me that removing that part of the > > patch significantly lowers its value. > > > If we opportunistically freeze only if we're already dirtying a page, > > would that help a truly insert-only workload? E.g., are there hint > > bits on the page that would need to change the first time we vacuum a > > full page with no dead tuples? I would have assumed the answer was > > "no" (since if so I think it would follow that _all_ pages need > > updated the first time they're vacuumed?). > > You probably know that hint bits are written by the first process to access > the > tuple after it was written. I think you're asking if the first *vacuum* > requires additional writes beyond that. And I think vacuum wouldn't touch the > page until it decides to freeze tuples. I think my assumption is that (at least in our case), the first process to access will definitely not be vacuum on any regular basis. > I do have a patch to display the number of hint bits written and pages frozen. > https://www.postgresql.org/message-id/flat/20200126141328.GP13621%40telsasoft.com I'll take a look at that too. > > But if that's the case, then this kind of opportunistic freezing wouldn't > > help this kind of workload. Maybe there's something I'm misunderstanding > > about how vacuum works though. > > I am reminding myself about vacuum with increasing frequency and usually still > learn something new. For sure. Thanks, James