Hello,

We encountered a rare and hard-to-investigate problem on Windows, which one of 
our customers reported.  Please find the attached patch to fix that.  I'll add 
this to the next CF.


PROBLEM
==============================

PostgreSQL sometimes crashes with the following messages.  This is infrequent 
(but frequent for the customer); it occurred about 10 times in the past 5 
months.

LOG:  server process (PID 2712) was terminated by exception 0xC0000005
HINT:  See C include file "ntstatus.h" for a description of the hexadecimal 
value.
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited abnormally 
and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat 
your command.
LOG:  all server processes terminated; reinitializing

"server process" shows that a client backend crashed.  The above messages
also indicate that the process was not running an SQL command, because there
is no "DETAIL:  Failed process was running: ..." line.

PostgreSQL runs as a Windows service.

No crash dump was produced anywhere, even though:
- the <PGDATA>/crashdumps folder exists and is writable by the PostgreSQL user
account (the account postgres.exe runs as)
- the Windows registry configuration allows crash dumps to be written


CAUSE
==============================

We believe WSAStartup() in main.c failed.  The only conceivable error is:

WSAEPROCLIM
10067
Too many processes.
A Windows Sockets implementation may have a limit on the number of applications 
that can use it simultaneously. WSAStartup may fail with this error if the 
limit has been reached.

But I couldn't find what the limit is and whether we can tune it.  We couldn't 
reproduce the problem.

When I simulated a WSAStartup() failure while a client backend was starting
up, I saw the same symptoms as the customer.  This problem only occurs when
PostgreSQL runs as a Windows service.

The bug is in write_eventlog().  It calls pgwin32_message_to_utf16(), which in
turn calls palloc(), and palloc() requires the memory management system to be
set up (CurrentMemoryContext != NULL).  That is not yet the case when the
WSAStartup() failure is reported.


FIX
==============================

Add the check "CurrentMemoryContext != NULL" in write_eventlog() as in 
write_console().
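
To illustrate the idea, here is a minimal, portable sketch of the guard.  It
is not the attached patch and not the real elog.c code: DemoContext,
demo_palloc(), demo_to_utf16() and demo_write_eventlog() are hypothetical
stand-ins, and the "event log" is reduced to a return value that says which
path was taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PostgreSQL's CurrentMemoryContext (assumption:
 * the real type is richer).  It stays NULL until memory management is up. */
typedef struct DemoContext { int unused; } DemoContext;
static DemoContext *CurrentMemoryContext = NULL;

/* Which reporting path the sketch took. */
typedef enum { LOGGED_WIDE, LOGGED_RAW } LogPath;

/* Hypothetical stand-in for palloc(): only valid once a context exists. */
static void *demo_palloc(size_t size)
{
    assert(CurrentMemoryContext != NULL);
    return malloc(size);
}

/* Hypothetical stand-in for the UTF-16 conversion; it needs to allocate. */
static unsigned short *demo_to_utf16(const char *line, size_t len)
{
    unsigned short *out = demo_palloc((len + 1) * sizeof(unsigned short));

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char) line[i];   /* naive widening, demo only */
    out[len] = 0;
    return out;
}

/* Sketch of the guarded logic: convert only when allocation is possible,
 * otherwise fall back to reporting the raw string without allocating. */
static LogPath demo_write_eventlog(const char *line)
{
    assert(line != NULL);

    if (CurrentMemoryContext != NULL)
    {
        unsigned short *utf16 = demo_to_utf16(line, strlen(line));

        if (utf16 != NULL)
        {
            /* real code would report the wide string here */
            free(utf16);
            return LOGGED_WIDE;
        }
    }

    /* memory management not set up (or conversion failed): no allocation */
    return LOGGED_RAW;
}
```

With CurrentMemoryContext still NULL, demo_write_eventlog() takes the raw
path instead of reaching the allocator; once a context is assigned, it takes
the conversion path.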


NOTE
==============================

The crash dump was not written for two reasons: a) the crash occurred before
the Windows exception handler was installed (the
pgwin32_install_crashdump_handler() call), and b) the effect of the following
call in the postmaster is inherited by the child process.

                /* In case of general protection fault, don't show GUI popup box */
                SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);

But I'm not sure in what order we should do
pgwin32_install_crashdump_handler(), startup_hacks() and the steps therein,
and MemoryContextInit().  I think that belongs in a separate patch.

Regards
Takayuki Tsunakawa


Attachment: write_eventlog_crash.patch
Description: write_eventlog_crash.patch
