Re: Some pgq table rewrite incompatibility with logical decoding?

Tomas Vondra Tue, 28 Aug 2018 14:50:11 -0700

Hi Jeremy,

On 08/28/2018 10:46 PM, Jeremy Finzel wrote:
>     We have hit this error again, and we plan to snapshot the database
>     so as to be able to do whatever troubleshooting we can. 
> 
> 
> I am happy to report that we were able to get replication working again
> by running snapshots of the systems in question on servers running the
> latest point release 9.6.10, and replication simply works and skips over
> these previously erroring relfilenodes.  So whatever fixes were made to
> logical decoding in this point release seem to have fixed the issue.
>


Interesting.

So you were running 9.6.9 before, hit the issue, and replication was not
able to recover. You then took a filesystem snapshot, started 9.6.10 on
the snapshot, and it recovered without hitting the issue?

I quickly went through the commits in the 9.6 branch between 9.6.9 and
9.6.10, looking for anything that might be related, and these three
commits seem possibly related (mostly because they touch invalidations,
vacuum, ...):

  6a46aba1cd6dd7c5af5d52111a8157808cbc5e10
  Fix bugs in vacuum of shared rels, by keeping their relcache entries
  current.

  da10d6a8a94eec016fa072d007bced9159a28d39
  Fix "base" snapshot handling in logical decoding

  0a60a291c9a5b8ecdf44cbbfecc4504e3c21ef49
  Add table relcache invalidation to index builds.

But without more information, it's hard to say whether (and which of)
those commits did the trick.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
