On 02/24/2018 01:34 AM, Robert Haas wrote:
> On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <mag...@hagander.net> wrote:
>> I would prefer that yes. But having to re-read 9TB is still significantly
>> better than not being able to turn on checksums at all (state today). And
>> adding a catalog column for it will carry the cost of the migration
>> *forever*, both for clusters that never have checksums and those that had it
>> from the beginning.
>>
>> Accepting that the process will start over (but only read, not re-write, the
>> blocks that have already been processed) in case of a crash does
>> significantly simplify the process, and reduce the long-term cost of it in
>> the form of entries in the catalogs. Since this is a on-time operation (or
>> for many people, a zero-time operation), paying that cost that one time is
>> probably better than paying a much smaller cost but constantly.
>
> That's not totally illogical, but to be honest I'm kinda surprised
> that you're approaching it that way. I would have thought that
> relchecksums and datchecksums columns would have been a sort of
> automatic design choice for this feature. The thing to keep in mind
> is that nobody's going to notice the overhead of adding those columns
> in practice, but someone will surely notice the pain that comes from
> having to restart the whole operation. You're talking about trading
> an effectively invisible overhead for a very noticeable operational
> problem.
>

I agree that having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem is
that large databases often store most of the data (>80%) in one or two
central tables (think fact tables in a star schema, etc.). So if you crash,
the crash will most likely happen half-way through processing such a table,
so the whole table would still have relchecksums=false and would have to be
processed from scratch.

But perhaps you meant something like a "position" instead of just a simple
true/false flag?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
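
To put rough numbers on the argument above, here is a toy model of how much already-done work is lost after a crash with a per-relation true/false flag vs. a persisted block position. It is only a sketch: the relation names, sizes and the `checkpoint_every` interval are hypothetical and not taken from any patch, and the worker is assumed to process relations strictly one after another.

```python
# Toy model (hypothetical): each relation is (name, number_of_blocks).
# The checksum worker processes relations in order and crashes after
# `done` blocks in total. Both functions return how many already
# processed blocks would have to be processed again after restart.

def blocks_to_redo_with_flag(relations, done):
    """Per-relation true/false flag: a relation is marked done only once
    all of its blocks are processed, so a partially processed relation
    restarts from block 0."""
    remaining = done
    for _name, nblocks in relations:
        if remaining < nblocks:
            return remaining  # all progress within this relation is lost
        remaining -= nblocks
    return 0


def blocks_to_redo_with_position(relations, done, checkpoint_every):
    """Per-relation flag plus a block position persisted every
    `checkpoint_every` blocks: only work since the last saved position
    in the current relation is lost."""
    remaining = done
    for _name, nblocks in relations:
        if remaining < nblocks:
            return remaining % checkpoint_every
        remaining -= nblocks
    return 0


# One big fact table holding 80% of the blocks, crash half-way through it.
relations = [("fact", 800_000), ("dim_a", 100_000), ("dim_b", 100_000)]
print(blocks_to_redo_with_flag(relations, 405_000))              # 405000
print(blocks_to_redo_with_position(relations, 405_000, 10_000))  # 5000
```

With the flag alone, nearly half of the cluster has to be processed again; with a saved position, only the blocks since the last saved position are.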