Re: Checksum errors in pg_stat_database

Magnus Hagander Sun, 11 Dec 2022 12:19:18 -0800

On Thu, Dec 8, 2022 at 2:35 PM Drouvot, Bertrand <
bertranddrouvot...@gmail.com> wrote:


> On 4/2/19 7:06 PM, Magnus Hagander wrote:
> > On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <mich...@paquier.xyz> wrote:
> >
> >     On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
> >      > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <mich...@paquier.xyz> wrote:
> >      >>  One thing which is not
> >      >> proposed on this patch, and I am fine with it as a first draft, is
> >      >> that we don't have any information about the broken block number and
> >      >> the file involved.  My gut tells me that we'd want a separate view,
> >      >> like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
> >      >> blck) to be complete.  But that's just for future work.
> >      >
> >      > That could indeed be nice.
> >
> >     Actually, backpedaling on this one...  pg_stat_checksums_details may
> >     be a bad idea as we could finish with one row per broken block.  If
> >     a corruption is spreading quickly, pgstat would not be able to sustain
> >     that amount of objects.  Having pg_stat_checksums would allow us to
> >     plug in more data easily based on the last failure state:
> >     - last relid of failure
> >     - last fork type of failure
> >     - last block number of failure.
> >     Not saying to do that now, but having that in pg_stat_database does
> >     not seem very natural to me.  And on top of that we would have an
> >     extra row full of NULLs for shared objects in pg_stat_database if we
> >     adopt the unique view approach...  I find that rather ugly.
> >
> >
> > I think that tracking each and every block is of course a non-starter,
> > as you've noticed.
>
> I think that's less of a concern now that the stats collector process is
> gone and the stats are collected in shared memory. What do you think?
>

It would be less of a concern, yes, but I think it would still be a concern.
If you have a large amount of corruption, you could quickly get to millions
of rows to keep track of, which would definitely be a problem in shared
memory as well, wouldn't it?

But perhaps we could keep a list of "the last 100 checksum failures" or
something like that?
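
To make the idea concrete, here is a rough sketch in Python of what such a
bounded history could look like. This is not PostgreSQL code, and every name
in it (ChecksumFailure, RecentFailures, the field names) is made up for
illustration. The point is only that memory use stays fixed no matter how
fast corruption spreads, while a running total is still kept.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecksumFailure:
    """One failed block; hypothetical fields mirroring (dboid, rel, fork, blck)."""
    dboid: int
    relid: int
    fork: str
    blkno: int


class RecentFailures:
    """Keeps only the most recent `capacity` failures, plus a total count."""

    def __init__(self, capacity=100):
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=capacity)
        self.total = 0

    def record(self, failure):
        self._entries.append(failure)
        self.total += 1

    def snapshot(self):
        # Oldest retained failure first, newest last.
        return list(self._entries)


history = RecentFailures(capacity=100)
for blkno in range(250):
    history.record(ChecksumFailure(dboid=1, relid=16384, fork="main", blkno=blkno))
```

After 250 recorded failures, only the newest 100 remain in the list, but the
total still says 250, so you can tell that older entries were dropped.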

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ <http://www.hagander.net/>
 Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
