On 4/5/18 11:07 AM, Magnus Hagander wrote: > > > On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra > <tomas.von...@2ndquadrant.com <mailto:tomas.von...@2ndquadrant.com>> wrote: > > ... > > It however still fails to initialize the attempts field after allocating > the db entry in BuildDatabaseList, so if you try running with > -DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this: > > WARNING: attempts = -1684366952 > WARNING: attempts = 1010514489 > WARNING: attempts = -1145390664 > WARNING: attempts = 1162101570 > > I guess those are not the droids we're looking for? > > > When looking at that and after a quick discussion, we just decided it's > better to completely remove the retry logic. As you mentioned in some > earlier mail, we had all this logic to retry databases (unlikely) but > not relations (likely). Attached patch simplifies it to only detect the > "database was dropped" case (which is fine), and consider every other > kind of failure a permanent one and just not turn on checksums in those > cases. >
OK, works for me. > > > Likewise, I don't see where ChecksumHelperShmemStruct->abort gets > initialized. I think it only ever gets set in launcher_exit(), but that > does not seem sufficient. I suspect it's the reason for this behavior: > > > It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed. > (The whole memset-to-zero) > OK, seems fine now. > > test=# select pg_enable_data_checksums(10, 10); > ERROR: database "template0" does not allow connections > HINT: Allow connections using ALTER DATABASE and try again. > test=# alter database template0 allow_connections = true; > ALTER DATABASE > test=# select pg_enable_data_checksums(10, 10); > ERROR: could not start checksumhelper: already running > test=# select pg_disable_data_checksums(); > pg_disable_data_checksums > --------------------------- > > (1 row) > > test=# select pg_enable_data_checksums(10, 10); > ERROR: could not start checksumhelper: has been cancelled > > > Turns out that wasn't the problem. The problem was that we *set* it > before erroring out with the "does not allow connections", but never > cleared it. In that case, it would be listed as launcher-is-running even > though the launcher was never started. > > Basically the check for datallowconn was put in the wrong place. That > check should go away completely once we merge (because we should also > merge the part that allows us to bypass it), but for now I have moved > the check to the correct place. > Ah, OK. I was just guessing. > > > Attached is a diff with a couple of minor comment tweaks, and correct > initialization of the attempts field. > > > Thanks. This is included in the attached update, along with the above > fixes and some other small touches from Daniel. > This patch version seems fine to me. I'm inclined to mark it RFC. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services