On 04/05/2017 12:36 PM, Kuntal Ghosh wrote:
On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:
AFAICU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
StartupDataBase. While starting the startup process we check if any
bgworker is scheduled for a restart. If a bgworker is crashed and not
meant for a restart(parallel worker in our case), we call
ForgetBackgroundWorker() to discard it.
OK, so essentially we reset the counters, getting
parallel_register_count = 0
parallel_terminate_count = 0
and then ForgetBackgroundWworker() runs and increments the terminate_count,
breaking the logic?
And you're using (parallel_register_count > 0) to detect whether we're still
in the init phase or not. Hmmm.
Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count = 0) as an alternative to define a bgworker
crash. Hence, I've added an argument named 'wasCrashed' in
ForgetBackgroundWorker to indicate a bgworker crash.
Sure, and I agree that having an explicit flag is the right solution.
I'm just trying to make sure I understand what's happening.
The comment says that the counters are allowed to overflow, i.e. after a
long uptime you might get these values
parallel_register_count = UINT_MAX + 1 = 1
parallel_terminate_count = UINT_MAX
which is fine, because the C handles the overflow during subtraction and so
parallel_register_count - parallel_terminate_count = 1
But the assert is not doing subtraction, it's comparing the values directly:
Assert(parallel_register_count >= parallel_terminate_count);
and the (perfectly valid) values trivially violate this comparison.
Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
Because, it'll fail to track the scenario when parallel_register_count
is not overflowed, still less than parallel_terminate_count. :(
Actually, that assert would work, because C does handle overflows on
uint values during subtraction just fine. That is,
(UINT_MAX+x) - UINT_MAX = x
Also, the comment about overflows before BackgroundWorkerArray claims
this is the case.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers