Re: PG12 autovac issues

2020-03-30 Thread Michael Paquier
On Sat, Mar 28, 2020 at 05:53:59PM +0900, Michael Paquier wrote: > And I'll follow up there with anything new I find. Please let me know > if there are any objections with the revert though, this will address > the problem reported by Justin. Okay. Done with this part now as of dd9ac7d. Now for

Re: PG12 autovac issues

2020-03-29 Thread Michael Paquier
On Sat, Mar 28, 2020 at 11:29:41AM -0700, Andres Freund wrote: > I assume you're still trying to track the actual cause of the problem > further? That's the plan, and I'll try to spend some time on it next week. Any new information I have will be added to the thread you have begun on -hackers a c

Re: PG12 autovac issues

2020-03-28 Thread Andres Freund
Hi, On 2020-03-28 17:47:19 +0900, Michael Paquier wrote: > On Fri, Mar 27, 2020 at 05:10:03PM -0500, Justin King wrote: > > This is encouraging. As I mentioned, we have a workaround in place for > > the moment, but don't hesitate if you need anything else from me. > > Thanks for jumping in on the

Re: PG12 autovac issues

2020-03-28 Thread Michael Paquier
On Fri, Mar 27, 2020 at 08:23:03PM +0100, Julien Rouhaud wrote: > FTR we reached the 200M transaxtion earlier, and I can see multiple logs of > the > form "automatic vacuum to prevent wraparound", so non-aggressive > antiwraparound > autovacuum, all on shared relations. Thanks Julien for sharing

Re: PG12 autovac issues

2020-03-28 Thread Michael Paquier
On Fri, Mar 27, 2020 at 05:10:03PM -0500, Justin King wrote: > Sounds great. I will email you directly with a link! Thanks. From the logs, the infinite loop on which autovacuum jobs are stuck is clear. We have a repetitive number of anti-wraparound and non-aggressive jobs happening for 7 shared

Re: PG12 autovac issues

2020-03-27 Thread Justin King
On Fri, Mar 27, 2020 at 12:12 AM Michael Paquier wrote: > > On Thu, Mar 26, 2020 at 09:46:47AM -0500, Justin King wrote: > > Nope, it was just these tables that were looping over and over while > > nothing else was getting autovac'd. I'm happy to share the full log > > if you'd like. > > Thanks,

Re: PG12 autovac issues

2020-03-27 Thread Julien Rouhaud
On Fri, Mar 27, 2020 at 02:12:04PM +0900, Michael Paquier wrote: > On Thu, Mar 26, 2020 at 09:46:47AM -0500, Justin King wrote: > > Nope, it was just these tables that were looping over and over while > > nothing else was getting autovac'd. I'm happy to share the full log > > if you'd like. > > T

Re: PG12 autovac issues

2020-03-26 Thread Michael Paquier
On Thu, Mar 26, 2020 at 09:46:47AM -0500, Justin King wrote: > Nope, it was just these tables that were looping over and over while > nothing else was getting autovac'd. I'm happy to share the full log > if you'd like. Thanks, that could help. If that's very large, it could be a problem to send

Re: PG12 autovac issues

2020-03-26 Thread Justin King
On Wed, Mar 25, 2020 at 8:43 PM Michael Paquier wrote: > > On Wed, Mar 25, 2020 at 10:39:17AM -0500, Justin King wrote: > > This started happening again. DEBUG1 is enabled: > > Thanks for enabling DEBUG1 logs while this happened. > > > Mar 25 14:48:26 cowtn postgres[39875]: [35298-1] 2020-03-25 >

Re: PG12 autovac issues

2020-03-25 Thread Michael Paquier
On Wed, Mar 25, 2020 at 07:59:56PM -0700, Andres Freund wrote: > FWIW, this kind of thing is why I think the added skipping logic is a > bad idea. Silently skipping things like this (same with the "bogus" > logic in datfrozenxid computation) is dangerous. I think we should > seriously consider back

Re: PG12 autovac issues

2020-03-25 Thread Andres Freund
Hi, On 2020-03-26 10:43:36 +0900, Michael Paquier wrote: > On Wed, Mar 25, 2020 at 10:39:17AM -0500, Justin King wrote: > > Mar 25 14:48:26 cowtn postgres[39875]: [35298-1] 2020-03-25 > > 14:48:26.329 GMT [39875] DEBUG: skipping redundant vacuum to prevent > > wraparound of table "postgres.pg_cat

Re: PG12 autovac issues

2020-03-25 Thread Michael Paquier
On Wed, Mar 25, 2020 at 10:39:17AM -0500, Justin King wrote: > This started happening again. DEBUG1 is enabled: Thanks for enabling DEBUG1 logs while this happened. > Mar 25 14:48:26 cowtn postgres[39875]: [35298-1] 2020-03-25 > 14:48:26.329 GMT [39875] DEBUG: skipping redundant vacuum to preve

Re: PG12 autovac issues

2020-03-25 Thread Justin King
All- This started happening again. DEBUG1 is enabled: Mar 25 14:48:03 cowtn postgres[39720]: [35294-1] 2020-03-25 14:48:03.972 GMT [39720] DEBUG: autovacuum: processing database "template0" Mar 25 14:48:06 cowtn postgres[39735]: [35294-1] 2020-03-25 14:48:06.545 GMT [39735] DEBUG: autovacuum:

Re: PG12 autovac issues

2020-03-24 Thread Andres Freund
Hi, On 2020-03-24 15:12:38 +0900, Michael Paquier wrote: > > Well, there's no logging of autovacuum launchers that don't do anything > > due to the "skipping redundant" logic, with normal log level. If somehow > > the horizon logic of autovacuum workers gets out of whack with what > > vacuumlazy.c

Re: PG12 autovac issues

2020-03-23 Thread Michael Paquier
On Mon, Mar 23, 2020 at 10:40:39PM -0700, Andres Freund wrote: > On 2020-03-24 14:26:06 +0900, Michael Paquier wrote: >> Nothing really fancy: >> - autovacuum_vacuum_cost_delay to 2ms (default of v12, but we used it >> in v11 as well). >> - autovacuum_naptime = 15s >> - autovacuum_max_workers = 6 >

Re: PG12 autovac issues

2020-03-23 Thread Andres Freund
Hi, On 2020-03-24 14:26:06 +0900, Michael Paquier wrote: > > Could you share what the config of the server was? > > Nothing really fancy: > - autovacuum_vacuum_cost_delay to 2ms (default of v12, but we used it > in v11 as well). > - autovacuum_naptime = 15s > - autovacuum_max_workers = 6 > - log_

Re: PG12 autovac issues

2020-03-23 Thread Michael Paquier
On Mon, Mar 23, 2020 at 01:00:51PM -0700, Andres Freund wrote: > On 2020-03-23 20:47:25 +0100, Julien Rouhaud wrote: >>> - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest >>> database >>> SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class WHERE >>> relfrozenx

Re: PG12 autovac issues

2020-03-23 Thread Justin King
On Mon, Mar 23, 2020 at 4:31 PM Justin King wrote: > > On Mon, Mar 23, 2020 at 3:00 PM Andres Freund wrote: > > > > Hi, > > > > On 2020-03-23 20:47:25 +0100, Julien Rouhaud wrote: > > > > - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest > > > > database > > > > SELECT oid::

Re: PG12 autovac issues

2020-03-23 Thread Andres Freund
Hi, On 2020-03-23 16:31:21 -0500, Justin King wrote: > This is occurring in our environment right now (started about 30 min > ago). Here 's the latest logs (grepped by vacuum): > > Mar 23 20:54:16 cowtn postgres[15569]: [12-1] 2020-03-23 20:54:16.542 > GMT [15569] LOG: automatic vacuum of table

Re: PG12 autovac issues

2020-03-23 Thread Justin King
On Mon, Mar 23, 2020 at 3:00 PM Andres Freund wrote: > > Hi, > > On 2020-03-23 20:47:25 +0100, Julien Rouhaud wrote: > > > - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest > > > database > > > SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class > > > WHERE r

Re: PG12 autovac issues

2020-03-23 Thread Andres Freund
Hi, On 2020-03-23 20:47:25 +0100, Julien Rouhaud wrote: > > - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest > > database > > SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class WHERE > > relfrozenxid <> 0 ORDER BY age(relfrozenxid) DESC LIMIT 1; > > The vm

Re: PG12 autovac issues

2020-03-23 Thread Julien Rouhaud
Hi, On Mon, Mar 23, 2020 at 09:23:03AM -0700, Andres Freund wrote: > Hi, > > On 2020-03-23 16:22:47 +0100, Julien Rouhaud wrote: > > On Fri, Mar 20, 2020 at 12:03:17PM -0700, Andres Freund wrote: > > > Hi, > > > > > > On 2020-03-20 12:42:31 -0500, Justin King wrote: > > > > When we get into this

Re: PG12 autovac issues

2020-03-23 Thread Andres Freund
Hi, On 2020-03-23 16:22:47 +0100, Julien Rouhaud wrote: > On Fri, Mar 20, 2020 at 12:03:17PM -0700, Andres Freund wrote: > > Hi, > > > > On 2020-03-20 12:42:31 -0500, Justin King wrote: > > > When we get into this state again, is there some other information > > > (other than what is in pg_stat_s

Re: PG12 autovac issues

2020-03-23 Thread Julien Rouhaud
On Fri, Mar 20, 2020 at 12:03:17PM -0700, Andres Freund wrote: > Hi, > > On 2020-03-20 12:42:31 -0500, Justin King wrote: > > When we get into this state again, is there some other information > > (other than what is in pg_stat_statement or pg_stat_activity) that > > would be useful for folks here

Re: PG12 autovac issues

2020-03-20 Thread Michael Lewis
> > We haven't isolated *which* table it is blocked on (assuming it is), > but all autovac's cease running until we manually intervene. > > When we get into this state again, is there some other information > (other than what is in pg_stat_statement or pg_stat_activity) that > would be useful for f

Re: PG12 autovac issues

2020-03-20 Thread Andres Freund
Hi, On 2020-03-20 12:42:31 -0500, Justin King wrote: > When we get into this state again, is there some other information > (other than what is in pg_stat_statement or pg_stat_activity) that > would be useful for folks here to help understand what is going on? If it's actually stuck on a single t

Re: PG12 autovac issues

2020-03-20 Thread Justin King
On Thu, Mar 19, 2020 at 6:56 PM Andres Freund wrote: > > Hi, > > On 2020-03-19 18:07:14 -0500, Justin King wrote: > > On Thu, Mar 19, 2020 at 5:35 PM Andres Freund wrote: > > > > > > Hi, > > > > > > On 2020-03-19 10:23:48 -0500, Justin King wrote: > > > > > From a single stats snapshot we can't a

Re: PG12 autovac issues

2020-03-19 Thread Andres Freund
Hi, On 2020-03-19 18:07:14 -0500, Justin King wrote: > On Thu, Mar 19, 2020 at 5:35 PM Andres Freund wrote: > > > > Hi, > > > > On 2020-03-19 10:23:48 -0500, Justin King wrote: > > > > From a single stats snapshot we can't actually understand the actual xid > > > > consumption - is it actually th

Re: PG12 autovac issues

2020-03-19 Thread Justin King
On Thu, Mar 19, 2020 at 5:35 PM Andres Freund wrote: > > Hi, > > On 2020-03-19 10:23:48 -0500, Justin King wrote: > > > From a single stats snapshot we can't actually understand the actual xid > > > consumption - is it actually the xid usage that triggers the vacuums? > > > > We have looked at thi

Re: PG12 autovac issues

2020-03-19 Thread Andres Freund
Hi, On 2020-03-19 10:23:48 -0500, Justin King wrote: > > From a single stats snapshot we can't actually understand the actual xid > > consumption - is it actually the xid usage that triggers the vacuums? > > We have looked at this and the xid consumption averages around 1250 > xid/sec -- this is

Re: Fwd: PG12 autovac issues

2020-03-19 Thread Justin King
On Thu, Mar 19, 2020 at 11:02 AM Michael Lewis wrote: > > On Thu, Mar 19, 2020 at 9:31 AM Justin King wrote: >> >> On Wed, Mar 18, 2020 at 1:40 PM Michael Lewis wrote: >> > >> > Do you have default fillfactor set on this table? If not, I would wonder >> > if reducing it to 50% or even 20% would

Re: Fwd: PG12 autovac issues

2020-03-19 Thread Michael Lewis
On Thu, Mar 19, 2020 at 9:31 AM Justin King wrote: > On Wed, Mar 18, 2020 at 1:40 PM Michael Lewis wrote: > > > > Do you have default fillfactor set on this table? If not, I would wonder > if reducing it to 50% or even 20% would allow many more HOT updates that > would reduce bloat. > > I don't

Re: Fwd: PG12 autovac issues

2020-03-19 Thread Justin King
On Wed, Mar 18, 2020 at 1:40 PM Michael Lewis wrote: > > Do you have default fillfactor set on this table? If not, I would wonder if > reducing it to 50% or even 20% would allow many more HOT updates that would > reduce bloat. I don't believe we have a default fillfactor, but I'm still trying t

Re: Re: PG12 autovac issues

2020-03-19 Thread Justin King
On Wed, Mar 18, 2020 at 10:13 AM Adrian Klaver wrote: > > On 3/18/20 6:57 AM, Justin King wrote: > Please reply to list also > Ccing list > > > >>> Here are the settings, these are the only ones that are not set to > >>> default with the exception of a few tables that have been overridden > >>> wi

RE: PG12 autovac issues

2020-03-19 Thread Justin King
Hi Andres- Thanks for the reply, answers below. On Tue, Mar 17, 2020 at 8:19 PM Andres Freund wrote: > > Hi, > > On 2020-03-17 17:18:57 -0500, Justin King wrote: > > As you can see in this table, there are only ~80K rows, but billions > > of updates. What we have observed is that the frozenxid

Re: Fwd: PG12 autovac issues

2020-03-18 Thread Michael Lewis
Do you have default fillfactor set on this table? If not, I would wonder if reducing it to 50% or even 20% would allow many more HOT updates that would reduce bloat. Also, is there any period of lower activity on your system that you could schedule a vacuum freeze for daily or weekly? I believe ha

Re: Fwd: PG12 autovac issues

2020-03-18 Thread Adrian Klaver
On 3/18/20 6:57 AM, Justin King wrote: Please reply to list also Ccing list Here are the settings, these are the only ones that are not set to default with the exception of a few tables that have been overridden with a different value due to lots of updates and few rows: And those values are?

Re: Fwd: PG12 autovac issues

2020-03-17 Thread Adrian Klaver
On 3/17/20 3:48 PM, Justin King wrote: On Tue, Mar 17, 2020 at 5:39 PM Adrian Klaver wrote: On 3/17/20 3:22 PM, Justin King wrote: Apologies, I accidentally sent this to the pgsql-admin list initially but intended it go here: We have a database that isn't overly large (~20G), but gets incred

Re: Fwd: PG12 autovac issues

2020-03-17 Thread Justin King
On Tue, Mar 17, 2020 at 5:39 PM Adrian Klaver wrote: > > On 3/17/20 3:22 PM, Justin King wrote: > > Apologies, I accidentally sent this to the pgsql-admin list initially > > but intended it go here: > > > > We have a database that isn't overly large (~20G), but gets incredibly > > frequent updates

Re: Fwd: PG12 autovac issues

2020-03-17 Thread Adrian Klaver
On 3/17/20 3:22 PM, Justin King wrote: Apologies, I accidentally sent this to the pgsql-admin list initially but intended it go here: We have a database that isn't overly large (~20G), but gets incredibly frequent updates. Here's an example table: feedi=# select * from pg_stat_all_tables where

Fwd: PG12 autovac issues

2020-03-17 Thread Justin King
Apologies, I accidentally sent this to the pgsql-admin list initially but intended it go here: We have a database that isn't overly large (~20G), but gets incredibly frequent updates. Here's an example table: feedi=# select * from pg_stat_all_tables where schemaname = 'production' and relname =