On Thu, Dec 8, 2022 at 2:35 PM Drouvot, Bertrand <bertranddrouvot...@gmail.com> wrote:
> On 4/2/19 7:06 PM, Magnus Hagander wrote:
> > On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <mich...@paquier.xyz> wrote:
> > > On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
> > > > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <mich...@paquier.xyz> wrote:
> > > > > One thing which is not proposed on this patch, and I am fine
> > > > > with it as a first draft, is that we don't have any
> > > > > information about the broken block number and the file
> > > > > involved.  My gut tells me that we'd want a separate view,
> > > > > like pg_stat_checksums_details with one tuple per (dboid, rel,
> > > > > fork, blck) to be complete.  But that's just for future work.
> > > >
> > > > That could indeed be nice.
> > >
> > > Actually, backpedaling on this one...  pg_stat_checksums_details may
> > > be a bad idea as we could finish with one row per broken block.  If
> > > a corruption is spreading quickly, pgstat would not be able to sustain
> > > that amount of objects.  Having pg_stat_checksums would allow us to
> > > plugin more data easily based on the last failure state:
> > > - last relid of failure
> > > - last fork type of failure
> > > - last block number of failure.
> > > Not saying to do that now, but having that in pg_stat_database does
> > > not seem very natural to me.  And on top of that we would have an
> > > extra row full of NULLs for shared objects in pg_stat_database if we
> > > adopt the unique view approach...  I find that rather ugly.
> >
> > I think that tracking each and every block is of course a non-starter,
> > as you've noticed.
>
> I think that's less of a concern now that the stats collector process has
> gone and that the stats are now collected in shared memory, what do you
> think?

It would be less of a concern yes, but I think it still would be a concern.
If you have a large amount of corruption you could quickly get to millions
of rows to keep track of which would definitely be a problem in shared
memory as well, wouldn't it?

But perhaps we could keep a list of "the last 100 checksum failures" or
something like that?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
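
The "last 100 checksum failures" idea above amounts to a fixed-capacity ring buffer: memory stays bounded no matter how fast corruption spreads, at the cost of forgetting the oldest entries. A minimal sketch of that behavior follows, in Python for brevity and not as proposed PostgreSQL code. The names ChecksumFailure, RecentChecksumFailures and MAX_TRACKED_FAILURES are hypothetical, and the fields simply mirror the (dboid, rel, fork, blck) tuple mentioned earlier in the thread.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical cap, taken from the "last 100" figure floated above.
MAX_TRACKED_FAILURES = 100


@dataclass(frozen=True)
class ChecksumFailure:
    """One failed block; fields mirror (dboid, rel, fork, blck)."""
    dboid: int
    relid: int
    fork: str
    blkno: int


class RecentChecksumFailures:
    """Keeps only the most recent failures, plus a running total."""

    def __init__(self, capacity=MAX_TRACKED_FAILURES):
        # A deque with maxlen silently drops the oldest entry when full,
        # so memory use never grows beyond `capacity` entries.
        self._entries = deque(maxlen=capacity)
        self.total = 0  # counts every failure, including evicted ones

    def record(self, failure):
        self._entries.append(failure)
        self.total += 1

    def snapshot(self):
        # Oldest retained failure first, newest last.
        return list(self._entries)
```

With a capacity of 3 and ten failures recorded for blocks 0 through 9, snapshot() returns only blocks 7, 8 and 9, while total still reports 10. The detail is lost for older failures, but the count is not.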