[BUGS] 7.3.5 initdb failure on Irix 6.5.18

2004-01-15 Thread Craig Ruff
I'm trying to use 7.3.5 (for an upgrade of 7.3.2) on Irix 6.5.18 using the
MIPSpro 7.4.1 compiler.  Everything compiles up ok, but 'make check' fails
at the "enabling unlimited row size for system tables..." step with
a core dump of postgres.

The failure is at /backend/access/transam/xlog.c:2544 with an
"unable to locate a valid checkpoint record" panic.  This happens
for both 7.3.4 and 7.3.5, either with -O or -g as the CFLAGS value.

Manually running the command being used by initdb:

tmp_check/install/stmgr/pgsql-7.3.5/bin/postgres -F \
-D/stmgr/src/postgresql-7.3.5/src/test/regress/data -O \
-c search_path=pg_catalog template1

gives:

LOG:  database system was shut down at 2004-01-15 11:20:44 MST
LOG:  ReadRecord: invalid magic number  in log file 0, segment 0, offset 32768
LOG:  invalid primary checkpoint record
LOG:  ReadRecord: record with zero length at 0/50
LOG:  invalid secondary checkpoint record
PANIC:  unable to locate a valid checkpoint record


Interestingly, using a copy of an existing database created by the 7.3.2
installation on the same system works fine.

Has anyone fixed this yet?  If not, does anyone have hints that I can
pursue since I have the source compiled up with debugging enabled?

-- 

Craig Ruff  NCAR[EMAIL PROTECTED]
(303) 497-1211  P.O. Box 3000
Boulder, CO  80307

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [BUGS] 7.3.5 initdb failure on Irix 6.5.18

2004-01-15 Thread Craig Ruff
On Thu, Jan 15, 2004 at 04:42:50PM -0500, Tom Lane wrote:
> It would seem that the culprit must be somewhere in the 7.3.2-to-7.3.4
> changes in xlog.c:
> ...
> but I sure don't see anything there that looks like a potential
> portability issue.

I have some further info.  7.3.5 compiled with MIPSpro 7.4.1 is broken
with respect to the transaction log files.  Restarting my 7.3.5 install
results in similar errors.

However, when compiled with gcc, 7.3.5 initdb works correctly.  I'm
in the process of testing the import of the 7.3.2 database and running
some transactions to see if the restart works.

Also, PostgreSQL 7.4.1 compiled with MIPSpro 7.4.1 appears to work
(at least the regression test).



Re: [BUGS] 7.3.5 initdb failure on Irix 6.5.18

2004-01-16 Thread Craig Ruff
Ok, I have further information on this problem.  I believe it is a compiler
problem.  PostgreSQL version 7.3.3 is also affected when compiled with the
MIPSpro 7.4.1 compiler, but when compiled with MIPSpro 7.4 it is ok.

Using the gcc compiled version of backend/access/transam/xlog.c, I have
gotten the regression test to work.  Next week I'll have to further
nail it down so I can send a bug report to SGI.  Just replacing XLogFlush
with the gcc compiled version allows initdb to finish, but the regression
tests show there are other problems.

So, a note should probably be made in the documentation that for the
moment, MIPSpro 7.4.1 should probably be avoided.



Re: [BUGS] 7.3.5 initdb failure on Irix 6.5.18

2004-01-21 Thread Craig Ruff
Here is what I discovered about this problem.

The MIPSpro 7.4.1 C compiler apparently has a structure assignment code
generation bug that is triggered at backend/access/transam/xlog.c:2683:

LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;

EndOfLog and LogwrtResult.Write are correct, but LogwrtResult.Flush ends
up corrupted.

I've opened a problem report with SGI (case ID 2505985 "MIPSpro 7.4.1 C
structure assignment bug") for those of you who need to track it.  From
what I can see, PostgreSQL 7.3.x is vulnerable, PostgreSQL 7.4.1 seems
to pass its regression test, but I'd probably think twice about using
it when compiled with MIPSpro 7.4.1.

Everything seems ok when compiled with the SGI provided version of GCC 3.2.2.



[BUGS] Irix initdb failure problem now fixed

2004-04-22 Thread Craig Ruff
Back in January, I posted a note (subject of "7.3.5 initdb failure on Irix
6.5.18") stating that I'd found a bug in the Irix MIPSpro 7.4.1 C
compiler that caused postgresql to fail reading in transaction logs,
which showed up while trying to run the regression tests.

I can now report that postgresql 7.4.2 works under Irix 6.5.22 when
compiled with the MIPSpro 7.4.2m C compiler.

-- 

Craig Ruff  NCAR[EMAIL PROTECTED]
(303) 497-1211  P.O. Box 3000
Boulder, CO  80307



[BUGS] Deadlock or other hang while vacuuming?

2004-11-08 Thread Craig Ruff
I have an annoying problem with some kind of hang or deadlock that is
sometimes triggered when a vacuum runs on a table while a pair of read-only
cursors are enumerating different subsets of rows of the same table.
(Originally, I also could have other queries that modified the table
running concurrently, but I prevented them from starting while the vacuum
query was running to narrow down the scope of the problem.)

I'm running PostgreSQL-7.4.5 on an SGI MIPS system running IRIX 6.5.24f,
compiled with the MIPSpro 7.4.2m C compiler.  The application driving
the database is multithreaded, and can have numerous sessions open to
backends.  I did make sure to verify I had compiled PostgreSQL with
threading enabled.

The table contains approximately 570,000 to 600,000 entries, and is defined
thusly:

CREATE TABLE seg (
    id             serial8      PRIMARY KEY,
    name           varchar(20)  NOT NULL,
    lv_id          int4         NOT NULL REFERENCES lv(id),
    size           int8         NOT NULL CHECK (size >= 0),
    creation_time  timestamp    NOT NULL,
    last_use_time  timestamp    DEFAULT timestamp 'epoch' NOT NULL,
    UNIQUE(lv_id, name)
) WITHOUT OIDS;

The enumeration sessions take a while, as the client system driving them
is slow.  Each enumeration session has an exclusive backend connection,
and takes place inside a transaction.  An example sequence of events
looks like this:

BEGIN;
DECLARE lsess CURSOR FOR
  SELECT name, size, to_char(creation_time, 'YY.DDD'),
         to_char(last_use_time, 'YY.DDD')
    FROM seg WHERE lv_id = 12 AND name ~ '^M*';

(wait for a request for the next batch):

FETCH 60 FROM lsess;

(repeat as necessary)

CLOSE lsess;
COMMIT;

I have a periodic task which kicks off vacuums of all of the tables in
the database every 20 minutes.  It vacuums the other tables, then runs
this query:

VACUUM ANALYZE seg;


I'm not yet certain about the relative timing of the vacuum and the
declaration of the cursors.  It may be that the vacuum starts first,
or not.  I haven't figured that out yet (some additional debug output
may be necessary).

What happens is that the application grinds to a halt.  Looking at
core files (generated with kill -ILL ) shows that the vacuum query
is waiting for the result.  The stack backtrace looks like this:

pqSocketPoll
pqSocketCheck
pqWaitTimed
pqWait
PQgetResult
PQexecFinish
PQexec("VACUUM ANALYZE seg;")

(When I allowed the other concurrent table modifying queries, many would
also block in pqSocketPoll waiting for results.)  This table is normally
vacuumed in less than 1 minute, but even waiting for 1.5 hours does not
change things.  No backend appears to be active at that point.

Gathering information from the pg_locks table produces this:

    relname    |  pid   |           mode           | granted
---------------+--------+--------------------------+---------
 seg           | 678547 | ShareUpdateExclusiveLock | t        (VACUUM)
 seg           | 678547 | ShareUpdateExclusiveLock | t
 seg_lv_id_key | 703519 | AccessShareLock          | t        (CURSOR lsess #1)
 seg           | 703519 | AccessShareLock          | t
 seg_lv_id_key | 703567 | AccessShareLock          | t        (CURSOR lsess #2)
 seg           | 703567 | AccessShareLock          | t
 pg_class      | 777441 | AccessShareLock          | t
 pg_locks      | 777441 | AccessShareLock          | t

I tried killing one of the backends handling one of the CURSORs to
see what its state looked like, but the core file was overwritten by
one from my app when it threw an exception cleaning up the aftermath. :-(
Nothing shows up in the serverlog output, other than the normal connection
and transaction log messages.

At this point, I'm ready to exclude the enumeration sessions from starting
when the vacuum is active, but I thought I'd try and gather information
just in case it is a problem in PostgreSQL.

Does anyone have any suggestions for tracking this down?

-- 

Craig Ruff  NCAR[EMAIL PROTECTED]
(303) 497-1211  P.O. Box 3000
Boulder, CO  80307



Re: [BUGS] Deadlock or other hang while vacuuming?

2004-11-09 Thread Craig Ruff
On Mon, Nov 08, 2004 at 08:06:02PM -0500, Tom Lane wrote:
> I believe that if VACUUM wants to delete a tuple that is on the same
> physical page that a cursor is currently stopped on, the vacuum has to
> wait until the cursor moves off that page.  So the vacuum could
> definitely be blocked by the cursor if the application is slow about
> advancing the cursor.  This isn't a deadlock though, unless the
> application is also waiting for the vacuum to finish.

Well, that puts me back to one of my first theories, that I have an
effective deadlock due to the lack of a dedicated request processing
thread to handle the enumeration session requests.  All the other
threads have blocked waiting to handle other types of requests.

I thought I had ruled that out, since my reading was that a read-only
cursor wouldn't block a vacuum, but I guess I was wrong.  Thanks, I'll
implement my workaround.



[BUGS] BUG #1418: RFC: Challenge/response authentication support

2005-01-19 Thread Craig Ruff

The following bug has been logged online:

Bug reference:      1418
Logged by:          Craig Ruff
Email address:      [EMAIL PROTECTED]
PostgreSQL version: 8.0
Operating system:   Any
Description:        RFC: Challenge/response authentication support
Details:

PAM supports challenge/response authentication.  It is desirable that psql
and the backend support this by displaying the PAM conversation routine
message(s) and returning a response (optionally echoed to the user).

I had a look at the code, and the backend support isn't too bad, but psql
itself does not appear to be structured in a way to handle this easily.  The
current method of just closing the backend connection, prompting for the
password and trying again does not work since the one-time password
challenge/response method is stateful.

Unfortunately, at the moment, I don't have the time to delve into fixing
this up further, but thought I'd let the list know in case someone else is
hacking on psql.
