[BUGS] PostgreSQL 7.4RC1 crashes on Panther

2003-11-04 Thread Scott Goodwin
I've encountered a problem where the PostgreSQL database crashes when 
attempting to load pltcl.so on Mac OS 10.3. PostgreSQL fails because 
memory cannot be allocated during a shmget call. Here is the exact 
error message:

FATAL:  could not create shared memory segment: Cannot allocate memory
DETAIL:  Failed system call was shmget(key=5432001, size=3809280, 
03600).
HINT:  This error usually means that PostgreSQL's request for a shared 
memory segment exceeded available memory or swap space. To reduce the 
request size (currently 3809280 bytes), reduce PostgreSQL's 
shared_buffers parameter (currently 300) and/or its max_connections 
parameter (currently 50).
The PostgreSQL documentation contains more information about 
shared memory configuration.

Here's the code that triggers it:

create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER
   as 'pltcl.so' language 'c';

I have 1GB of memory and very little running on the powerbook (I 
rebooted just to be sure I started with a clean system).

Not sure whether this is a PostgreSQL problem or a Mac OS 10.3 problem, 
but I can load plpgsql.so right before loading pltcl.so and it still 
only fails on the pltcl.so load. Commenting out the plpgsql.so load and 
trying again it still fails on the pltcl.so load. I'm compiling against 
a locally compiled version of Tcl 8.4.4. Here are the configure 
settings:

./configure \
--prefix=$INSTALL/postgresql \
--with-tcl \
--with-tclconfig=$INSTALL/tcl/lib \
--with-includes=$INSTALL/tcl/include:$INSTALL/readline/include \
--with-libraries=$INSTALL/readline/lib \
--without-tk \
--without-openssl

thanks,

/s.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [BUGS] PostgreSQL 7.4RC1 crashes on Panther

2003-11-08 Thread Scott Goodwin
Hi Tom,

On Nov 4, 2003, at 4:48 PM, Tom Lane wrote:

>> Here's the code that triggers it:
>> create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER
>> as 'pltcl.so' language 'c';
>
> I don't think so.  That's a startup failure; it cannot be triggered by
> executing a SQL command, because if the postmaster is alive enough to
> accept a SQL command in the first place, it's already gotten past
> creation of the shared memory segment.

I have to differ here. This problem is being triggered by the create
function statement above, it is happening after startup, and it's
happening on Mac OS 10.3. Here are the commands I'm using, in the order
I'm using them. I'll be glad to admit I'm the one screwing it up, but I
don't see where.

# Define vars
ROOT=/Users/scott/m
INSTALL=$ROOT/install
PG=$INSTALL/postgresql
PGLIB=$PG/lib
PGDATA=$ROOT/var/db
PORT=5432
DB=m
DYLD_LIBRARY_PATH=$INSTALL/tcl/lib:$INSTALL/postgresql/lib:$INSTALL/openssl/lib
export DYLD_LIBRARY_PATH

# Initialize the database cluster
$PG/bin/initdb -D $PGDATA --locale=C -L $PG/share

...output of the above command is:

The files belonging to this database system will be owned by user "scott".
This user must also own the server process.

The database cluster will be initialized with locale C.

creating directory /Users/scott/m/var/db... ok
creating directory /Users/scott/m/var/db/base... ok
creating directory /Users/scott/m/var/db/global... ok
creating directory /Users/scott/m/var/db/pg_xlog... ok
creating directory /Users/scott/m/var/db/pg_clog... ok
selecting default max_connections... 30
selecting default shared_buffers... 200
creating configuration files... ok
creating template1 database in /Users/scott/m/var/db/base/1... ok
initializing pg_shadow... ok
enabling unlimited row size for system tables... ok
initializing pg_depend... ok
creating system views... ok
loading pg_description... ok
creating conversions... ok
setting privileges on built-in objects... ok
creating information schema... ok
vacuuming database template1... ok
copying template1 to template0... ok

Success. You can now start the database server using:

/Users/scott/m/install/postgresql/bin/postmaster -D /Users/scott/m/var/db
or
/Users/scott/m/install/postgresql/bin/pg_ctl -D /Users/scott/m/var/db -l logfile start



# Start the database
$PG/bin/pg_ctl start -D $PGDATA -l $ROOT/database/postgres.log -o "-i"

...at this point the database is running, as shown by ps:

scott  2712   0.0  0.1    37288    936 std  S    12:10PM   0:00.02 /Users/scott/m/install/postgresql/bin/postmaster -i -D /Users/scott/m/var/db
scott  2715   0.0  0.0    38276    168 std  S    12:10PM   0:00.00 /Users/scott/m/install/postgresql/bin/postmaster -i -D /Users/scott/m/var/db
scott  2717   0.0  0.0    37288    260 std  S    12:10PM   0:00.00 /Users/scott/m/install/postgresql/bin/postmaster -i -D /Users/scott/m/var/db

...and by the log file:

LOG:  database system was shut down at 2003-11-06 12:10:49 CST
LOG:  checkpoint record is at 0/9B13D8
LOG:  redo record is at 0/9B13D8; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 534; next OID: 17142
LOG:  database system is ready

# Create the database
$PG/bin/psql -d template1 -c "create database $DB"

...output on the command line:

CREATE DATABASE

# Add PL/pgsql and PL/tcl
$PG/bin/psql -d $DB -f $OPS/database/sql/add_languages.sql

...output on the command line is:

psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: server  
closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: connection  
to server was lost

...output in the log file is:

LOG:  server process (PID 2739) was terminated by signal 10
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
FATAL:  could not create shared memory segment: Cannot allocate memory
DETAIL:  Failed system call was shmget(key=5432001, size=3809280,  
03600).
HINT:  This error usually means that PostgreSQL's request for a shared  
memory segment exceeded available memory or swap space. To reduce the  
request size (currently 3809280 bytes), reduce PostgreSQL's  
shared_buffers parameter (currently 300) and/or its max_connections  
parameter (currently 50).
The PostgreSQL documentation contains more information about  
shared memory configuration.

...at this point, the server is no longer running.



The add_languages.sql file contains:

create function plpgsql_call_handler() RETURNS LANGUAGE_HANDLER
   as 'plpgsql.so' language 'c';
create trusted procedural language 'plpgsql'
   HANDLER plpgsql_call_handler
   LANCOMPILER 'PL/pgSQL';
create function pltcl_call_handler() RETURNS LANGUAGE_HANDLER
   as 'pltcl.so' language 'c';
create trusted procedural language 'pltcl'
   HANDLER pltcl_call_handler
   LANCOMPILER 'PL/Tcl';

(Line 13 of my add_languages.sql corresponds to the creation o

Re: [BUGS] PostgreSQL 7.4RC1 crashes on Panther

2003-11-08 Thread Scott Goodwin
Just compiled PG 7.3.4 with GCC 3.1 on Panther and it exhibits the same 
problem, but generates a SIGSEGV instead of a SIGBUS. Here's the log:

LOG:  server process (pid 12078) was terminated by signal 11
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing shared memory and 
semaphores
LOG:  database system was interrupted at 2003-11-06 14:19:26 CST
LOG:  checkpoint record is at 0/80212C
LOG:  redo record is at 0/80212C; undo record is at 0/0; shutdown TRUE
LOG:  next transaction id: 480; next oid: 16976
LOG:  database system was not properly shut down; automatic recovery in 
progress
LOG:  redo starts at 0/80216C
LOG:  ReadRecord: record with zero length at 0/81E754
LOG:  redo done at 0/81E730
LOG:  database system is ready
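
For readers matching the signal numbers in these logs to names, the shell's kill command can translate them. This is a small sketch and not part of the original report: signame is a hypothetical helper name, and the mapping depends on the platform the command runs on (signal 10 is SIGBUS on Darwin and the BSDs but a different signal on some other systems), so it should be run on the affected machine.

```shell
# signame N: print the name of signal number N on this platform.
# (hypothetical helper; it only wraps the standard `kill -l`)
signame() {
    kill -l "$1"
}

signame 11    # the signal reported by the PG 7.3.4 backend
signame 10    # the signal reported by the 7.4RC1 backend; platform-dependent
```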

A reboot does not help -- it still fails. I recompiled at GCC 3.1 and 
it's failing at pltcl load again. I rebooted, then tried to add the 
languages again. plpgsql was already loaded from the last time, but 
shared memory failed again when it tried to load pltcl.

ipcs isn't installed on Panther. Strangely though, I've found ipcs in 
the Darwin source tree (previous version) under /usr/bin, and in the 
same place in FreeBSD source tree.

/s.



On Nov 6, 2003, at 2:41 PM, Tom Lane wrote:

> Scott Goodwin <[EMAIL PROTECTED]> writes:
>> psql:/Users/scott/m/ops/database/sql/add_languages.sql:13: server
>> closed the connection unexpectedly
>>  This probably means the server terminated abnormally
>>  before or while processing the request.
>>
>> ...output in the log file is:
>>
>> LOG:  server process (PID 2739) was terminated by signal 10
>
> Here's the real problem --- why are you getting a SIGBUS while trying to
> load the pltcl handler function?  I suspect something broken in Tcl's
> shared library, but dunno what.  You should be getting a core file from
> the crashed process --- can you get a stack trace from it with gdb?
>
>> FATAL:  could not create shared memory segment: Cannot allocate memory
>> DETAIL:  Failed system call was shmget(key=5432001, size=3809280,
>> 03600).
>
> This is evidently happening during attempted restart after the backend
> crash.  I suspect it is a matter of the OS not having released the old
> memory segment yet, together with the SHMMAX limit being too tight to
> allow two such segments to exist concurrently.  Are you able to start
> the server by hand immediately afterwards, or a few seconds afterwards?
> Or do you have to reboot before it will restart?
>
> 			regards, tom lane
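
The hypothesis in the quoted reply (the old segment still exists when the restarted postmaster asks for a new one) comes down to simple arithmetic. The following is a minimal sketch under stated assumptions: segments_fit is a hypothetical helper, and the 4 MB LIMIT is an illustrative value, not a figure taken from this thread. On Darwin the actual kernel settings can be read with sysctl (for example kern.sysv.shmmax and kern.sysv.shmall).

```shell
# segments_fit COUNT SIZE LIMIT: succeed if COUNT segments of SIZE bytes
# fit within LIMIT bytes in total. (hypothetical helper for illustration)
segments_fit() {
    [ $(( $1 * $2 )) -le "$3" ]
}

REQUEST=3809280    # request size from the shmget() failure in the log
LIMIT=4194304      # ASSUMED 4 MB limit; substitute the value from sysctl

segments_fit 1 "$REQUEST" "$LIMIT" && echo "one segment fits"
segments_fit 2 "$REQUEST" "$LIMIT" || echo "two concurrent segments do not fit"
```

If the limit on the affected machine turns out to be large enough for two segments, this explanation for the restart failure would not apply.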




Re: [BUGS] PostgreSQL 7.4RC1 crashes on Panther

2003-11-08 Thread Scott Goodwin
After recompiling with GCC 3.1 it fails when I'm running initdb to 
create the cluster -- it's a shmget error again. I believe that takes 
both Tcl and PostgreSQL out of the suspect pool and leaves Mac OS 10.3 
as the primary culprit. I installed Panther last week from scratch 
(reformatted disk etc.) and haven't made any mods to it aside from the 
SystemTuning params today. I haven't had any other apps crash, and I'm 
using the system all day using Apple's apps, AOLserver, OpenSSL and 
others. I tried gdb to get a backtrace but the signal gets caught by 
postgres, so it doesn't dump me back to the gdb command line. I'll have 
to set breakpoints, have GDB do something with the signal, or mod PG to 
not catch it. That'll have to wait until tomorrow or Saturday.

thanks for the assist,

/s.



Re: [BUGS] PostgreSQL 7.4RC1 crashes on Panther

2003-11-09 Thread Scott Goodwin
Awesome! Thanks so much for the fix -- I depend on PostgreSQL and Tcl 
on my powerbook to do development work.

cheers,

/s.

On Nov 8, 2003, at 2:09 PM, Tom Lane wrote:

> It turns out that the "createlang pltcl" failure on OS X 10.3 was due to
> our ps_status code doing the wrong thing.  I have committed a fix.
>
> 			regards, tom lane




Re: [BUGS] [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

2004-02-21 Thread Scott Goodwin
Found the problem. If I have a very long environment variable exported
and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
color ls and I have a very long LS_COLORS environment variable set.

I have duplicated the problem by renaming my .bashrc and logging back
in. With this clean environment, I started PG and loaded PL/Tcl without
any problems. I then created the following environment variable on the
command line:

LONG_VAR=aa:bbb:cc: 
ddd:eee:fff: 
g::iii: 
j:kk:: 
mmm:n: 
ooo:pp:qqq: 
rrr:ss: 
ttt:u: 
vv:ww: 
xxx:y: 
zzz

and exported it. (Obviously the line above is going to be broken into  
multiple lines by the mailer...).

Then I stopped and restarted PG, loaded PL/Tcl and PG crashed. You
*must* stop and restart PG for the problem to exhibit itself, otherwise
it won't pick up the change in the environment. I suspect I'm running
into a buffer overflow situation.

Ok, it fails consistently when LONG_VAR is 523 characters or longer and
works consistently when LONG_VAR is 522 characters or shorter. It might
not fail at the same length on other systems.
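
To probe the threshold on another machine, a variable of an exact length can be generated rather than typed. This is a sketch only: make_long_var is a hypothetical helper, and the message does not say whether the 523 figure counts the value alone or the whole NAME=value string, so the length passed here is an assumption.

```shell
# make_long_var N: print a string of exactly N characters ('a' repeated).
# (hypothetical helper; awk is used so it works in any POSIX shell)
make_long_var() {
    awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) printf "a" }'
}

# Export a value of the length reported to fail. The server must then be
# stopped and restarted from this shell so that it inherits the variable.
LONG_VAR=$(make_long_var 523)
export LONG_VAR
echo "LONG_VAR value is ${#LONG_VAR} characters long"
```

Repeating the run with 522 and then with nearby lengths would show whether the boundary is the same on a given system.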

/s.



To prove that this was the problem, I cleaned out my environment by
moving my .bashrc file to another name, logged out, logged in, start

On Feb 21, 2004, at 1:51 AM, Tom Lane wrote:

> Scott Goodwin <[EMAIL PROTECTED]> writes:
>> Hoping someone can help me figure out why I can't get PL/Tcl to load
>> without crashing the backend on Mac OS 10.3.2.
>
> FWIW, pltcl seems to work for me.  Using up-to-date Darwin 10.3.2
> and PG CVS tip, I did
>     configure --with-tcl --without-tk
> then make, make install, etc.  pltcl installs and passes its regression
> test.
>
>> psql:/Users/scott/pgtest/add_languages.sql:12: server closed the
>> connection unexpectedly
>>  This probably means the server terminated abnormally
>>  before or while processing the request.
>
> Can you provide a stack trace for this?
>
> 			regards, tom lane





Re: [BUGS] [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

2004-02-22 Thread Scott Goodwin
I'm certain that the length of a single env var is the only factor
involved, and not the size of the environment itself. If I log in to my
normal environment and unset LS_COLORS, everything works fine. If I
move my .bashrc out of the way, log in fresh and create an env var > 522
chars, it fails. My login environment is much larger than the
environment I get without .bashrc, and setting a single env var to
> 522 chars reproduces the problem in both environments, which leads me
to believe that env size doesn't have an effect on this problem. I've
now set my PG startup script to 'unset LS_COLORS' before starting PG,
and this works great. Has anyone else tried to duplicate this problem?
I'm using Mac OS 10.3.2, PG 7.4.1, Tcl 8.4.5.
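
The workaround mentioned in this message (dropping LS_COLORS before the server starts) could look roughly like the following in a startup script. This is a sketch, not the actual script: without_ls_colors is a hypothetical name, and the commented pg_ctl line reuses the variables defined earlier in the thread.

```shell
# without_ls_colors CMD [ARGS...]: run CMD with LS_COLORS removed from its
# environment. The subshell keeps the variable intact in the calling shell.
# (hypothetical helper name)
without_ls_colors() {
    ( unset LS_COLORS; "$@" )
}

# Possible use, with the paths defined earlier in this thread:
# without_ls_colors "$PG/bin/pg_ctl" start -D "$PGDATA" -l "$ROOT/database/postgres.log" -o "-i"
```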

/s.

On Feb 22, 2004, at 12:21 PM, Tom Lane wrote:

> Scott Goodwin <[EMAIL PROTECTED]> writes:
>> Found the problem. If I have a very long environment variable exported
>> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
>> color ls and I have a very long LS_COLORS environment variable set.
>
> Interesting.  Did you check whether the limiting factor is the longest
> variable length, or the total size of the environment?  ("env|wc" would
> probably do as an approximation for the latter.)
>
> 			regards, tom lane
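
Both quantities from the quoted question can be measured in one pass. This is a rough sketch, assuming no variable value contains an embedded newline (env prints one NAME=value entry per line, so such a value would be counted as several shorter entries); env_sizes is a hypothetical helper name.

```shell
# env_sizes: print "<longest> <total>" for the current environment.
#   <longest> = length of the longest NAME=value entry
#   <total>   = sum of all entry lengths, plus one terminator byte each
# (hypothetical helper; lengths are counted in characters by awk)
env_sizes() {
    env | awk '
        { n = length($0); if (n > longest) longest = n; total += n + 1 }
        END { print longest + 0, total + 0 }
    '
}

env_sizes
```

Comparing the two numbers between a failing and a working login would help separate "one long variable" from "large environment overall".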





Re: [BUGS] [HACKERS] Mac OS X, PostgreSQL, PL/Tcl

2004-06-06 Thread Scott Goodwin
I'll grab the CVS PG copy and try it out. Is this something the Darwin 
folks should be notified about? It might cause problems with other 
apps.

thanks,
/s.

On Feb 22, 2004, at 4:47 PM, Tom Lane wrote:

> Scott Goodwin <[EMAIL PROTECTED]> writes:
>> Found the problem. If I have a very long environment variable exported
>> and I start PG, PG crashes when I try to load PL/Tcl. In my case I use
>> color ls and I have a very long LS_COLORS environment variable set.
>
> I was able to duplicate this.  I am not entirely sure why the problem is
> dependent on the environment size, but I now know what causes it.
> It seems Darwin's libc keeps its own copy of the argv pointer, and when
> we move argv and then scribble on the original, it causes problems for
> subsequent code that tries to look at argv[0] to determine the
> executable's location.  (It's a good thing Darwin is open source, 'cause
> I'm not sure we'd have ever seen the connection if we hadn't been able
> to look at the source code for their libc.)
>
> The fix is basically
>
> + #if defined(__darwin__)
> + #include <crt_externs.h>
> + #endif
>
> + #if defined(__darwin__)
> +   *_NSGetArgv() = new_argv;
> + #endif
>
> which you can stick into main.c if you need a workaround.  I applied a
> more extensive patch to HEAD that refactors this code into ps_status.c,
> but I'm disinclined to apply that patch to stable branches...
>
> 			regards, tom lane
