Re: Online enabling of checksums

Stephen Frost Fri, 23 Feb 2018 18:53:08 -0800

Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 02/24/2018 03:11 AM, Andres Freund wrote:
> > On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
> >> I agree having to restart the whole operation after a crash is not
> >> ideal, but I don't see how adding a flag actually solves it. The problem
> >> is the large databases often store most of the data (>80%) in one or two
> >> central tables (think fact tables in star schema, etc.). So if you
> >> crash, it's likely half-way while processing this table, so the whole
> >> table would still have relchecksums=false and would have to be processed
> >> from scratch.
> > 
> > I don't think it's quite as large a problem as you make it out to
> > be. Even in those cases you'll usually have indexes, toast tables and so
> > forth.
> 
> Hmmm, right. I've been focused on tables and kinda forgot that the other
> objects need to be transformed too ... :-/


There's also something of a difference between just scanning a table or
index, where you don't have to do much in the way of actual writes
because most of the table already has valid checksums, and having to
actually write out all the changes.

> >> But perhaps you meant something like "position" instead of just a simple
> >> true/false flag?
> > 
> > I think that'd incur a much larger complexity cost.
> 
> Yep, that was part of the point that I was getting to - that actually
> addressing the issue would be more expensive than simple flags. But as
> you pointed out, that was not quite ... well thought through.

No, but it's also not entirely wrong.  Huge tables aren't uncommon.

That said, I'm not entirely convinced that these new flags would be as
unnoticed as is being suggested here, but rather than focus on either
side of that, I'm thinking about what we want to have *next*- we know
that enabling/disabling checksums is an issue that needs to be solved,
and this patch is making progress towards that, but the next question is
what does one do when a page has been detected as corrupted?  Are there
flag fields which would be useful to have at a per-relation level to
support some kind of corrective action or setting that says "don't care
about checksums on this table, even though the entire database is
supposed to have valid checksums, but instead do X with failed pages" or
similar.

Beyond dealing with corruption-recovery cases, are there other use cases
for having a given table not have checksums?

Would it make sense to introduce a flag or field which indicates that an
entire table's pages has some set of attributes, of which 'checksums' is
just one attribute?  Perhaps a page version, which potentially allows us
to have a way to change page layouts in the future?

I'm happy to be told that we simply don't have enough information at
this point to make anything larger than a relchecksums field-level
decision, but perhaps these thoughts will spark an idea about how we
could define something a bit broader with clear downstream usefulness
that happens to also cover the "does this relation have checksums?"
question.

Thanks!

Stephen

signature.asc
Description: PGP signature

Re: Online enabling of checksums

Reply via email to