[BUGS] PostgreSQL 7.4RC1 crashes on Panther
I've encountered a problem where the PostgreSQL database crashes when attempting to load pltcl.so on Mac OS 10.3. PostgreSQL fails because memory cannot be allocated during a shmget call. Here is the exact error message:

    FATAL: could not create shared memory segment: Cannot allocate memory
    DETAIL: Failed system call was shmget(key=5432001, size=3809280, 03600).
    HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory or swap space. To reduce the request size (currently 3809280 bytes), reduce PostgreSQL's shared_buffers parameter (currently 300) and/or its max_connections parameter (currently 50).
    The PostgreSQL documentation contains more information about shared memory configuration.

Here's the code that triggers it:

    create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER as 'pltcl.so' language 'c';

I have 1GB of memory and very little running on the powerbook (I rebooted just to be sure I started with a clean system). Not sure whether this is a PostgreSQL problem or a Mac OS 10.3 problem, but I can load plpgsql.so right before loading pltcl.so and it still only fails on the pltcl.so load. Commenting out the plpgsql.so load and trying again it still fails on the pltcl.so load.

I'm compiling against a locally compiled version of Tcl 8.4.4. Here are the configure settings:

    ./configure \
        --prefix=$INSTALL/postgresql \
        --with-tcl \
        --with-tclconfig=$INSTALL/tcl/lib \
        --with-includes=$INSTALL/tcl/include:$INSTALL/readline/include \
        --with-libraries=$INSTALL/readline/lib \
        --without-tk \
        --without-openssl

thanks,
/s.

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [BUGS] PostgreSQL 7.4RC1 crashes on Panther
Hi Tom,

On Nov 4, 2003, at 4:48 PM, Tom Lane wrote:

>> Here's the code that triggers it:
>>
>> create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER as 'pltcl.so' language 'c';
>
> I don't think so. That's a startup failure; it can not be triggered by executing a SQL command, because if the postmaster is alive enough to accept a SQL command in the first place, it's already gotten past creation of the shared memory segment.

I have to differ here. This problem is being triggered by the create function section above, it is doing it after startup, and it's doing it on Mac OS 10.3. Here are the commands I'm using, in the order I'm using them. I'll be glad to admit I'm the one screwing it up, but I don't see where.

    # Define vars
    ROOT=/Users/scott/m
    INSTALL=$ROOT/install
    PG=$INSTALL/postgresql
    PGLIB=$PG/lib
    PGDATA=$ROOT/var/db
    PORT=5432
    DB=m
    DYLD_LIBRARY_PATH=$INSTALL/tcl/lib:$INSTALL/postgresql/lib:$INSTALL/openssl/lib
    export DYLD_LIBRARY_PATH

    # Initialize the database cluster
    $PG/bin/initdb -D $PGDATA --locale=C -L $PG/share

...output of the above command is:

    The files belonging to this database system will be owned by user "scott".
    This user must also own the server process.
    The database cluster will be initialized with locale C.
    creating directory /Users/scott/m/var/db... ok
    creating directory /Users/scott/m/var/db/base... ok
    creating directory /Users/scott/m/var/db/global... ok
    creating directory /Users/scott/m/var/db/pg_xlog... ok
    creating directory /Users/scott/m/var/db/pg_clog... ok
    selecting default max_connections... 30
    selecting default shared_buffers... 200
    creating configuration files... ok
    creating template1 database in /Users/scott/m/var/db/base/1... ok
    initializing pg_shadow... ok
    enabling unlimited row size for system tables... ok
    initializing pg_depend... ok
    creating system views... ok
    loading pg_description... ok
    creating conversions... ok
    setting privileges on built-in objects... ok
    creating information schema... ok
    vacuuming database template1... ok
    copying template1 to template0... ok
    Success. You can now start the database server using:
        /Users/scott/m/install/postgresql/bin/postmaster -D /Users/scott/m/var/db
    or
        /Users/scott/m/install/postgresql/bin/pg_ctl -D /Users/scott/m/var/db -l logfile start

    # Start the database
    $PG/bin/pg_ctl start -D $PGDATA -l $ROOT/database/postgres.log -o "-i"

...at this point the database is running, as shown by ps:

    scott 2712 0.0 0.1 37288 936 std S 12:10PM 0:00.02 /Users/scott/m/install/postgresql/bin/postmaster -i -D /Users/scott/m/var/db
    scott 2715 0.0 0.0 38276 168 std S 12:10PM 0:00.00 /Users/scott/m/install/postgresql/bin/postmaster -i -D /Users/scott/m/var/db
    scott 2717 0.0 0.0 37288 260 std S 12:10PM 0:00.00 /Users/scott/m/install/postgresql/bin/postmaster -i -D /Users/scott/m/var/db

...and by the log file:

    LOG: database system was shut down at 2003-11-06 12:10:49 CST
    LOG: checkpoint record is at 0/9B13D8
    LOG: redo record is at 0/9B13D8; undo record is at 0/0; shutdown TRUE
    LOG: next transaction ID: 534; next OID: 17142
    LOG: database system is ready

    # Create the database
    $PG/bin/psql -d template1 -c "create database $DB"

...output on the command line:

    CREATE DATABASE

    # Add PL/pgsql and PL/tcl
    $PG/bin/psql -d $DB -f $OPS/database/sql/add_languages.sql

...output on the command line is:

    psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: server closed the connection unexpectedly
    This probably means the server terminated abnormally before or while processing the request.
    psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: connection to server was lost

...output in the log file is:

    LOG: server process (PID 2739) was terminated by signal 10
    LOG: terminating any other active server processes
    LOG: all server processes terminated; reinitializing
    FATAL: could not create shared memory segment: Cannot allocate memory
    DETAIL: Failed system call was shmget(key=5432001, size=3809280, 03600).
    HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory or swap space. To reduce the request size (currently 3809280 bytes), reduce PostgreSQL's shared_buffers parameter (currently 300) and/or its max_connections parameter (currently 50).
    The PostgreSQL documentation contains more information about shared memory configuration.

...at this point, the server is no longer running.

The add_languages.sql file contains:

    create function plpgsql_call_handler() RETURNS LANGUAGE_HANDLER as 'plpgsql.so' language 'c';
    create trusted procedural language 'plpgsql' HANDLER plpgsql_call_handler LANCOMPILER 'PL/pgSQL';
    create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER as 'pltcl.so' language 'c';
    create trusted procedural language 'pltcl' HANDLER pltcl_call_handler LANCOMPILER 'PL/Tcl';

(Line 13 of my add_languages.sql corresponds to the creation o
Re: [BUGS] PostgreSQL 7.4RC1 crashes on Panther
Just compiled PG 7.3.4 with GCC 3.1 on Panther and it exhibits the same problem, but generates a SIGSEGV instead of a SIGBUS. Here's the log:

    LOG: server process (pid 12078) was terminated by signal 11
    LOG: terminating any other active server processes
    LOG: all server processes terminated; reinitializing shared memory and semaphores
    LOG: database system was interrupted at 2003-11-06 14:19:26 CST
    LOG: checkpoint record is at 0/80212C
    LOG: redo record is at 0/80212C; undo record is at 0/0; shutdown TRUE
    LOG: next transaction id: 480; next oid: 16976
    LOG: database system was not properly shut down; automatic recovery in progress
    LOG: redo starts at 0/80216C
    LOG: ReadRecord: record with zero length at 0/81E754
    LOG: redo done at 0/81E730
    LOG: database system is ready

A reboot does not help -- it still fails. I recompiled with GCC 3.1 and it's failing at pltcl load again. I rebooted, then tried to add the languages again. plpgsql was already loaded from the last time, but shared memory failed again when it tried to load pltcl.

ipcs isn't installed on Panther. Strangely though, I've found ipcs in the Darwin source tree (previous version) under /usr/bin, and in the same place in the FreeBSD source tree.

/s.

On Nov 6, 2003, at 2:41 PM, Tom Lane wrote:

> Scott Goodwin <[EMAIL PROTECTED]> writes:
>> psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: server closed the connection unexpectedly
>> This probably means the server terminated abnormally before or while processing the request.
>> ...output in the log file is:
>> LOG: server process (PID 2739) was terminated by signal 10
>
> Here's the real problem --- why are you getting a SIGBUS while trying to load the pltcl handler function? I suspect something broken in Tcl's shared library, but dunno what. You should be getting a core file from the crashed process --- can you get a stack trace from it with gdb?
>
>> FATAL: could not create shared memory segment: Cannot allocate memory
>> DETAIL: Failed system call was shmget(key=5432001, size=3809280, 03600).
>
> This is evidently happening during attempted restart after the backend crash. I suspect it is a matter of the OS not having released the old memory segment yet, together with the SHMMAX limit being too tight to allow two such segments to exist concurrently. Are you able to start the server by hand immediately afterwards, or a few seconds afterwards? Or do you have to reboot before it will restart?
>
> regards, tom lane
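For readers puzzling over the third argument in the failed shmget() call, here is a minimal sketch that decodes the octal flag word 03600. It assumes the IPC_CREAT and IPC_EXCL values from the platform's <sys/ipc.h> (01000 and 02000 on the System V style headers I know of); the helper names are hypothetical and are not PostgreSQL functions.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/ipc.h>   /* IPC_CREAT, IPC_EXCL */

/* The low nine bits of a shmget() flag word are ordinary rwx permission bits. */
int shm_perm_bits(int shmflg)
{
    return shmflg & 0777;
}

/* Nonzero when the flag word asks for a brand-new segment and wants the call
 * to fail if a segment with the same key already exists. */
int shm_is_exclusive_create(int shmflg)
{
    return (shmflg & IPC_CREAT) != 0 && (shmflg & IPC_EXCL) != 0;
}

/* Write a readable breakdown of the flag word into buf (hypothetical helper). */
void describe_shmflg(int shmflg, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "%s%smode=%04o",
             (shmflg & IPC_CREAT) ? "IPC_CREAT|" : "",
             (shmflg & IPC_EXCL) ? "IPC_EXCL|" : "",
             (unsigned) shm_perm_bits(shmflg));
}
```

Under those assumptions, 03600 reads as an exclusive create with owner-only read/write permissions (0600). The flags themselves look ordinary, which fits Tom's reading that the "Cannot allocate memory" failure is about segment limits during the restart rather than about how the call was made.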
Re: [BUGS] PostgreSQL 7.4RC1 crashes on Panther
After recompiling with GCC 3.1 it fails when I'm running initdb to create the cluster -- it's a shmget error again. I believe that takes both Tcl and PostgreSQL out of the suspect pool and leaves Mac OS 10.3 as the primary culprit. I installed Panther last week from scratch (reformatted disk etc.) and haven't made any mods to it aside from the SystemTuning params today. I haven't had any other apps crash, and I use the system all day with Apple's apps, AOLserver, OpenSSL and others.

I tried gdb to get a backtrace but the signal gets caught by postgres, so it doesn't dump me back to the gdb command line. I'll have to set breakpoints, have GDB do something with the signal, or mod PG to not catch it. That'll have to wait until tomorrow or Saturday.

thanks for the assist,
/s.

On Nov 6, 2003, at 2:41 PM, Tom Lane wrote:

> Scott Goodwin <[EMAIL PROTECTED]> writes:
>> psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: server closed the connection unexpectedly
>> This probably means the server terminated abnormally before or while processing the request.
>> ...output in the log file is:
>> LOG: server process (PID 2739) was terminated by signal 10
>
> Here's the real problem --- why are you getting a SIGBUS while trying to load the pltcl handler function? I suspect something broken in Tcl's shared library, but dunno what. You should be getting a core file from the crashed process --- can you get a stack trace from it with gdb?
>
>> FATAL: could not create shared memory segment: Cannot allocate memory
>> DETAIL: Failed system call was shmget(key=5432001, size=3809280, 03600).
>
> This is evidently happening during attempted restart after the backend crash. I suspect it is a matter of the OS not having released the old memory segment yet, together with the SHMMAX limit being too tight to allow two such segments to exist concurrently. Are you able to start the server by hand immediately afterwards, or a few seconds afterwards? Or do you have to reboot before it will restart?
>
> regards, tom lane
Re: [BUGS] PostgreSQL 7.4RC1 crashes on Panther
Awesome! Thanks so much for the fix -- I depend on PostgreSQL and Tcl on my powerbook to do development work.

cheers,
/s.

On Nov 8, 2003, at 2:09 PM, Tom Lane wrote:

> It turns out that the "createlang pltcl" failure on OS X 10.3 was due to our ps_status code doing the wrong thing. I have committed a fix.
>
> regards, tom lane
Re: [BUGS] [HACKERS] Mac OS X, PostgreSQL, PL/Tcl
Found the problem. If I have a very long environment variable exported and I start PG, PG crashes when I try to load PL/Tcl. In my case I use color ls and I have a very long LS_COLORS environment variable set.

I have duplicated the problem by renaming my .bashrc and logging back in. With this clean environment, I started PG and loaded PL/Tcl without any problems. I then created the following environment variable on the command line:

    LONG_VAR=aa:bbb:cc: ddd:eee:fff: g::iii: j:kk:: mmm:n: ooo:pp:qqq: rrr:ss: ttt:u: vv:ww: xxx:y: zzz

and exported it. (Obviously the line above is going to be broken into multiple lines by the mailer...). Then I stopped and restarted PG, loaded PL/Tcl and PG crashed. You *must* stop and restart PG for the problem to exhibit itself, otherwise it won't pick up the change in the environment.

I suspect I'm running into a buffer overflow situation. Ok, it fails consistently when LONG_VAR is 523 characters or greater; works consistently when LONG_VAR is 522 characters or smaller. Might not fail at the same number for others.

/s.

To prove that this was the problem, I cleaned out my environment by moving my .bashrc file to another name, logged out, logged in, start

On Feb 21, 2004, at 1:51 AM, Tom Lane wrote:

> Scott Goodwin <[EMAIL PROTECTED]> writes:
>> Hoping someone can help me figure out why I can't get PL/Tcl to load without crashing the backend on Mac OS 10.3.2.
>
> FWIW, pltcl seems to work for me. Using up-to-date Darwin 10.3.2 and PG CVS tip, I did
>
>     configure --with-tcl --without-tk
>
> then make, make install, etc. pltcl installs and passes its regression test.
>
>> psql:/Users/scott/pgtest/add_languages.sql:12: server closed the connection unexpectedly
>> This probably means the server terminated abnormally before or while processing the request.
>
> Can you provide a stack trace for this?
>
> regards, tom lane
Re: [BUGS] [HACKERS] Mac OS X, PostgreSQL, PL/Tcl
I'm certain that the length of a single env var is the only factor involved, and not the size of the environment itself. If I log in to my normal environment and unset LS_COLORS, everything works fine. If I move my .bashrc out of the way, log in fresh and create an env var > 522 chars, it fails. My login environment is much larger than the environment I get without .bashrc, and setting a single env var to > 522 chars reproduces the problem in both environments, leading me to believe that env size doesn't have an effect on this problem.

I've now set my PG startup script to 'unset LS_COLORS' before starting PG, and this works great.

Has anyone else tried to duplicate this problem? I'm using Mac OS 10.3.2, PG 7.4.1, Tcl 8.4.5.

/s.

On Feb 22, 2004, at 12:21 PM, Tom Lane wrote:

> Scott Goodwin <[EMAIL PROTECTED]> writes:
>> Found the problem. If I have a very long environment variable exported and I start PG, PG crashes when I try to load PL/Tcl. In my case I use color ls and I have a very long LS_COLORS environment variable set.
>
> Interesting. Did you check whether the limiting factor is the longest variable length, or the total size of the environment? ("env|wc" would probably do as an approximation for the latter.)
>
> regards, tom lane
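Tom's question (longest variable versus total environment size) can be checked with a few lines of C rather than "env|wc". The following is only a sketch: struct env_stats and measure_env are hypothetical names, and it counts each whole "NAME=value" string, which may not be how the 522/523 figure above was measured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Summary of a NULL-terminated environment block (hypothetical type). */
struct env_stats {
    size_t count;    /* number of "NAME=value" entries */
    size_t longest;  /* length of the longest entry, excluding its NUL */
    size_t total;    /* bytes used by all entries, including each NUL */
};

/* Walk an environ-style array and collect both figures in one pass. */
struct env_stats measure_env(char *const *envp)
{
    struct env_stats s = {0, 0, 0};

    assert(envp != NULL);
    for (; *envp != NULL; envp++) {
        size_t len = strlen(*envp);

        s.count++;
        s.total += len + 1;
        if (len > s.longest)
            s.longest = len;
    }
    return s;
}
```

To inspect the live process environment, declare `extern char **environ;` (POSIX) and pass `environ` to measure_env. Comparing `longest` and `total` across a failing and a working login would separate the two hypotheses.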
Re: [BUGS] [HACKERS] Mac OS X, PostgreSQL, PL/Tcl
I'll grab the CVS PG copy and try it out. Is this something the Darwin folks should be notified about? It might cause problems with other apps.

thanks,
/s.

On Feb 22, 2004, at 4:47 PM, Tom Lane wrote:

> Scott Goodwin <[EMAIL PROTECTED]> writes:
>> Found the problem. If I have a very long environment variable exported and I start PG, PG crashes when I try to load PL/Tcl. In my case I use color ls and I have a very long LS_COLORS environment variable set.
>
> I was able to duplicate this. I am not entirely sure why the problem is dependent on the environment size, but I now know what causes it. It seems Darwin's libc keeps its own copy of the argv pointer, and when we move argv and then scribble on the original, it causes problems for subsequent code that tries to look at argv[0] to determine the executable's location. (It's a good thing Darwin is open source, 'cause I'm not sure we'd have ever seen the connection if we hadn't been able to look at the source code for their libc.)
>
> The fix is basically
>
> + #if defined(__darwin__)
> + #include <crt_externs.h>
> + #endif
>
> + #if defined(__darwin__)
> +     *_NSGetArgv() = new_argv;
> + #endif
>
> which you can stick into main.c if you need a workaround. I applied a more extensive patch to HEAD that refactors this code into ps_status.c, but I'm disinclined to apply that patch to stable branches...
>
> regards, tom lane
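The mechanism Tom describes (copy argv elsewhere, then tell Darwin's libc about the copy before the original area is overwritten) can be sketched as below. This is an illustration only, not the committed PostgreSQL code: copy_argv and relocate_argv are hypothetical names, and the sketch tests the compiler-predefined __APPLE__ macro where the patch uses __darwin__, which is presumably defined by PostgreSQL's own build. _NSGetArgv() is declared in <crt_externs.h> on Darwin.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#if defined(__APPLE__)
#include <crt_externs.h>   /* declares _NSGetArgv() on Darwin */
#endif

/* Return a heap-allocated, NULL-terminated deep copy of argv, or NULL if
 * memory runs out. The copy stays valid even if the original is overwritten. */
char **copy_argv(int argc, char *const argv[])
{
    char **new_argv;
    int i;

    assert(argc >= 0 && argv != NULL);
    new_argv = malloc(((size_t) argc + 1) * sizeof(char *));
    if (new_argv == NULL)
        return NULL;
    for (i = 0; i < argc; i++) {
        size_t n = strlen(argv[i]) + 1;

        new_argv[i] = malloc(n);
        if (new_argv[i] == NULL) {
            while (i > 0)
                free(new_argv[--i]);
            free(new_argv);
            return NULL;
        }
        memcpy(new_argv[i], argv[i], n);
    }
    new_argv[argc] = NULL;
    return new_argv;
}

/* Copy argv and, on Darwin, repoint libc's private argv at the copy so later
 * lookups of argv[0] do not read the area the caller is about to reuse. */
char **relocate_argv(int argc, char *argv[])
{
    char **new_argv = copy_argv(argc, argv);

    if (new_argv == NULL)
        return NULL;
#if defined(__APPLE__)
    *_NSGetArgv() = new_argv;
#endif
    return new_argv;
}
```

A caller would invoke relocate_argv(argc, argv) early in main() and use the returned array from then on; only after that is it safe to reuse the original argv storage for a process title.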