[BUGS] 7.3.5 initdb failure on Irix 6.5.18
I'm trying to use 7.3.5 (for an upgrade of 7.3.2) on Irix 6.5.18 using the MIPSpro 7.4.1 compiler. Everything compiles up ok, but 'make check' fails at the "enabling unlimited row size for system tables..." step with a core dump of postgres. The failure is at /backend/access/transam/xlog.c:2544 with an "unable to locate a valid checkpoint record" panic. This happens for both 7.3.4 and 7.3.5, either with -O or -g as the CFLAGS value.

Manually running the command being used by initdb:

    tmp_check/install/stmgr/pgsql-7.3.5/bin/postgres -F \
        -D/stmgr/src/postgresql-7.3.5/src/test/regress/data -O \
        -c search_path=pg_catalog template1

gives:

    LOG:  database system was shut down at 2004-01-15 11:20:44 MST
    LOG:  ReadRecord: invalid magic number in log file 0, segment 0, offset 32768
    LOG:  invalid primary checkpoint record
    LOG:  ReadRecord: record with zero length at 0/50
    LOG:  invalid secondary checkpoint record
    PANIC:  unable to locate a valid checkpoint record

Interestingly, using a copy of an existing database created by the 7.3.2 installation on the same system works fine.

Has anyone fixed this yet? If not, does anyone have hints that I can pursue, since I have the source compiled up with debugging enabled?

--
Craig Ruff          NCAR            [EMAIL PROTECTED]
(303) 497-1211      P.O. Box 3000
Boulder, CO 80307
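P.S. For anyone puzzling over the log output: each WAL page begins with a small header whose magic field marks it as a valid XLOG page, and the "invalid magic number" line means the page read back at that offset did not look like a valid XLOG page at all, so startup could not locate its checkpoint record. The fragment below is only an illustration of that kind of check; the struct, the constant, and the names are made up, and it is not the actual xlog.c source.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical, simplified WAL page header (illustrative only). */
    typedef struct WalPageHeader
    {
        uint16_t    magic;          /* must match the expected constant */
        uint16_t    info;
        uint32_t    startup_id;
    } WalPageHeader;

    #define WAL_PAGE_MAGIC  0xD05BU     /* placeholder value, not PostgreSQL's */

    /* The flavor of validation behind "invalid magic number ...": when it
     * fails for both the primary and secondary checkpoint locations, the
     * startup code panics, which is what initdb tripped over here. */
    static bool
    wal_page_looks_valid(const WalPageHeader *hdr)
    {
        return hdr->magic == WAL_PAGE_MAGIC;
    }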
Re: [BUGS] 7.3.5 initdb failure on Irix 6.5.18
On Thu, Jan 15, 2004 at 04:42:50PM -0500, Tom Lane wrote:
> It would seem that the culprit must be somewhere in the 7.3.2-to-7.3.4
> changes in xlog.c:
> ...
> but I sure don't see anything there that looks like a potential
> portability issue.

I have some further info. 7.3.5 compiled with MIPSpro 7.4.1 is broken with respect to the transaction log files; restarting my 7.3.5 install results in similar errors. However, when compiled with gcc, 7.3.5 initdb works correctly. I'm in the process of testing the import of the 7.3.2 database and running some transactions to see whether the restart works.

Also, PostgreSQL 7.4.1 compiled with MIPSpro 7.4.1 appears to work (at least the regression test passes).
Re: [BUGS] 7.3.5 initdb failure on Irix 6.5.18
OK, I have further information on this problem. I believe it is a compiler problem. PostgreSQL 7.3.3 is also affected when compiled with the MIPSpro 7.4.1 compiler, but it is OK when compiled with MIPSpro 7.4. Using the gcc-compiled version of backend/access/transam/xlog.c, I have gotten the regression test to work. Next week I'll have to nail it down further so I can send a bug report to SGI. Just replacing XLogFlush with the gcc-compiled version allows initdb to finish, but the regression tests show there are other problems.

So a note should probably be made in the documentation that, for the moment, MIPSpro 7.4.1 should be avoided.
Re: [BUGS] 7.3.5 initdb failure on Irix 6.5.18
Here is what I discovered about this problem. The MIPSpro 7.4.1 C compiler apparently has a structure-assignment code generation bug that is triggered at backend/access/transam/xlog.c:2683:

    LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;

EndOfLog and LogwrtResult.Write are correct, but LogwrtResult.Flush ends up corrupted. I've opened a problem report with SGI (case ID 2505985, "MIPSpro 7.4.1 C structure assignment bug") for those of you who need to track it.

From what I can see, PostgreSQL 7.3.x is vulnerable. PostgreSQL 7.4.1 seems to pass its regression test, but I'd still think twice about using it when compiled with MIPSpro 7.4.1. Everything seems OK when compiled with the SGI-provided version of GCC 3.2.2.
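If you want to check your own MIPSpro installation, a standalone test of the same pattern (a chained assignment of a small two-member struct, which is what XLogRecPtr is in 7.3, into two members of another struct) would look something like the sketch below. The type and variable names are made up to mirror the xlog.c statement, and whether this minimal form actually triggers the miscompilation is untested here, since the real statement sits inside a much larger function.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical stand-in for 7.3's XLogRecPtr: two 32-bit fields. */
    typedef struct RecPtr
    {
        uint32_t    xlogid;
        uint32_t    xrecoff;
    } RecPtr;

    /* Mirrors the shape of LogwrtResult: two RecPtr members that receive
     * the same value in one chained assignment. */
    static struct
    {
        RecPtr      Write;
        RecPtr      Flush;
    } Result;

    int
    main(void)
    {
        RecPtr      EndOfLog = { 0x00000000U, 0x00000050U };

        /* The pattern reported to miscompile under MIPSpro 7.4.1. */
        Result.Write = Result.Flush = EndOfLog;

        if (Result.Flush.xlogid != EndOfLog.xlogid ||
            Result.Flush.xrecoff != EndOfLog.xrecoff)
            printf("FAIL: Flush corrupted (%u/%u)\n",
                   (unsigned) Result.Flush.xlogid,
                   (unsigned) Result.Flush.xrecoff);
        else
            printf("OK\n");
        return 0;
    }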
[BUGS] Irix initdb failure problem now fixed
Back in January, I posted a note (subject "7.3.5 initdb failure on Irix 6.5.18") stating that I'd found a bug in the Irix MIPSpro 7.4.1 C compiler that caused PostgreSQL to fail to read its transaction logs, which showed up while trying to run the regression tests. I can now report that PostgreSQL 7.4.2 works under Irix 6.5.22 when compiled with the MIPSpro 7.4.2m C compiler.

--
Craig Ruff          NCAR            [EMAIL PROTECTED]
(303) 497-1211      P.O. Box 3000
Boulder, CO 80307
[BUGS] Deadlock or other hang while vacuuming?
I have an annoying problem with some kind of a hang or deadlock triggered sometimes when running a vacuum on a table with a pair of read-only cursors enumerating different subsets of rows of the same table. (Originally I also had other queries that modified the table running concurrently, but I prevented them from starting while the vacuum query was running to narrow down the scope of the problem.)

I'm running PostgreSQL 7.4.5 on an SGI MIPS system running IRIX 6.5.24f, compiled with the MIPSpro 7.4.2m C compiler. The application driving the database is multithreaded and can have numerous sessions open to backends. I did verify that I had compiled PostgreSQL with threading enabled.

The table contains approximately 570,000 to 600,000 entries, and is defined thusly:

    CREATE TABLE seg (
        id             serial8 PRIMARY KEY,
        name           varchar(20) NOT NULL,
        lv_id          int4 NOT NULL REFERENCES lv(id),
        size           int8 NOT NULL CHECK (size >= 0),
        creation_time  timestamp NOT NULL,
        last_use_time  timestamp DEFAULT timestamp 'epoch' NOT NULL,
        UNIQUE(lv_id, name)
    ) WITHOUT OIDS;

The enumeration sessions take a while, as the client system driving them is slow. Each enumeration session has an exclusive backend connection and takes place inside a transaction. An example sequence of events looks like this:

    BEGIN;
    DECLARE lsess CURSOR FOR
        SELECT name, size, to_char(creation_time, 'YY.DDD'),
               to_char(last_use_time, 'YY.DDD')
        FROM seg
        WHERE lv_id = 12 AND name ~ '^M*';

    (wait for a request for the next batch)
    FETCH 60 FROM lsess;
    (repeat as necessary)

    CLOSE lsess;
    COMMIT;

I have a periodic task which kicks off vacuums of all of the tables in the database every 20 minutes. It vacuums the other tables, then runs this query:

    VACUUM ANALYZE seg;

I'm not yet certain about the relative timing of the vacuum and the declaration of the cursors. It may be that the vacuum starts first, or not; I haven't figured that out yet (some additional debug output may be necessary).

What happens is that the application grinds to a halt. Looking at core files (generated with kill -ILL <pid>) shows that the thread issuing the vacuum query is waiting for the result; its stack backtrace looks like this:

    pqSocketPoll
    pqSocketCheck
    pqWaitTimed
    pqWait
    PQgetResult
    PQexecFinish
    PQexec("VACUUM ANALYZE seg;")

(When I allowed the other concurrent table-modifying queries, many would also block in pqSocketPoll waiting for results.)

This table is normally vacuumed in less than 1 minute, but even waiting for 1.5 hours does not change things. No backend appears to be active at that point. Gathering information from the pg_locks table produces this:

       relname     |  pid   |           mode           | granted
    ---------------+--------+--------------------------+---------
     seg           | 678547 | ShareUpdateExclusiveLock | t   (VACUUM)
     seg           | 678547 | ShareUpdateExclusiveLock | t
     seg_lv_id_key | 703519 | AccessShareLock          | t   (CURSOR lsess #1)
     seg           | 703519 | AccessShareLock          | t
     seg_lv_id_key | 703567 | AccessShareLock          | t   (CURSOR lsess #2)
     seg           | 703567 | AccessShareLock          | t
     pg_class      | 777441 | AccessShareLock          | t
     pg_locks      | 777441 | AccessShareLock          | t

I tried killing one of the backends handling one of the CURSORs to see what its state looked like, but the core file was overwritten by one from my app when it threw an exception cleaning up the aftermath. :-(

Nothing shows up in the serverlog output other than the normal connection and transaction log messages.

At this point I'm ready to exclude the enumeration sessions from starting when the vacuum is active, but I thought I'd try to gather information, just in case it is a problem in PostgreSQL.
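For what it is worth, one way to keep the maintenance thread from blocking forever inside PQexec() is to issue the VACUUM through libpq's asynchronous interface and poll the socket with a timeout, so it can at least log which statement is stuck. This is only a sketch of that idea, not code from my application; the timeout and error handling are placeholders.

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/select.h>
    #include <libpq-fe.h>

    /* Send a command asynchronously and wait for its result, but never
     * sit longer than timeout_secs without noticing.  Returns 0 on
     * success, -1 on error or timeout. */
    static int
    exec_with_timeout(PGconn *conn, const char *command, int timeout_secs)
    {
        if (!PQsendQuery(conn, command))
            return -1;

        for (;;)
        {
            fd_set          rfds;
            struct timeval  tv;
            int             sock = PQsocket(conn);

            if (sock < 0)
                return -1;

            FD_ZERO(&rfds);
            FD_SET(sock, &rfds);
            tv.tv_sec = timeout_secs;
            tv.tv_usec = 0;

            /* Wait for data from the backend, but no longer than timeout_secs. */
            if (select(sock + 1, &rfds, NULL, NULL, &tv) <= 0)
            {
                fprintf(stderr, "no reply to \"%s\" after %d seconds\n",
                        command, timeout_secs);
                return -1;      /* caller can log pg_locks, dump core, etc. */
            }

            if (!PQconsumeInput(conn))
                return -1;

            /* Drain any results that have arrived in full. */
            while (!PQisBusy(conn))
            {
                PGresult   *res = PQgetResult(conn);

                if (res == NULL)
                    return 0;   /* command finished */
                PQclear(res);
            }
            /* Still busy: go back and wait on the socket again. */
        }
    }

A call like exec_with_timeout(conn, "VACUUM ANALYZE seg;", 300) would at least turn the silent hang into a log line naming the stuck statement. Note that on a timeout the query is still pending on that connection, so it would have to be cancelled (e.g. with PQrequestCancel) or the connection closed before reuse.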
Does anyone have any suggestions for tracking this down?

--
Craig Ruff          NCAR            [EMAIL PROTECTED]
(303) 497-1211      P.O. Box 3000
Boulder, CO 80307
Re: [BUGS] Deadlock or other hang while vacuuming?
On Mon, Nov 08, 2004 at 08:06:02PM -0500, Tom Lane wrote:
> I believe that if VACUUM wants to delete a tuple that is on the same
> physical page that a cursor is currently stopped on, the vacuum has to
> wait until the cursor moves off that page.  So the vacuum could
> definitely be blocked by the cursor if the application is slow about
> advancing the cursor.  This isn't a deadlock though, unless the
> application is also waiting for the vacuum to finish.

Well, that puts me back to one of my first theories: that I have an effective deadlock due to the lack of a dedicated request-processing thread to handle the enumeration session requests, because all the other threads have blocked waiting to handle other types of requests. I thought I had ruled that out, since my reading was that a read-only cursor wouldn't block a vacuum, but I guess I was wrong.

Thanks, I'll implement my workaround.
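As I understand it, the mechanism comes down to buffer pins: a cursor that is stopped on a heap page keeps that page's buffer pinned, and vacuum may only remove dead tuples from a page once it is the page's sole pinner. Here is a sketch of that condition with invented names; it is not PostgreSQL's actual buffer-manager code, just an illustration of why a parked cursor stalls the vacuum, and why it never resolves when every application thread is itself waiting on the vacuum.

    #include <stdbool.h>

    /* Hypothetical, simplified buffer state (illustrative only). */
    typedef struct PageBuffer
    {
        int     pin_count;      /* backends with this page pinned; a cursor
                                 * stopped on the page contributes one pin */
    } PageBuffer;

    /* Vacuum waits until this becomes true for the page it wants to clean.
     * As long as a cursor sits on the page, it never does. */
    static bool
    vacuum_may_clean_page(const PageBuffer *buf)
    {
        return buf->pin_count == 1;     /* only vacuum's own pin remains */
    }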
[BUGS] BUG #1418: RFC: Challenge/response authentication support
The following bug has been logged online:

Bug reference:      1418
Logged by:          Craig Ruff
Email address:      [EMAIL PROTECTED]
PostgreSQL version: 8.0
Operating system:   Any
Description:        RFC: Challenge/response authentication support
Details:

PAM supports challenge/response authentication. It would be desirable for psql and the backend to support this by displaying the PAM conversation routine's message(s) and returning a response (optionally echoed to the user). I had a look at the code; the backend support isn't too bad, but psql itself does not appear to be structured in a way that handles this easily. The current method of just closing the backend connection, prompting for the password, and trying again does not work, since the one-time password challenge/response method is stateful.

Unfortunately, at the moment I don't have the time to delve into fixing this up further, but I thought I'd let the list know in case someone else is hacking on psql.
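For anyone who does pick this up: the piece the client would ultimately have to drive interactively is a PAM conversation callback, along the lines of the sketch below. The callback signature and the message/response structures are the standard PAM API, but everything else here (the prompt helper in particular) is just a placeholder to show the shape of the exchange, not proposed PostgreSQL code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <security/pam_appl.h>

    /* Placeholder prompt helper: a real client would read from the
     * terminal and disable echo when echo == 0. */
    static char *
    prompt_user(const char *msg, int echo)
    {
        char    buf[256];

        (void) echo;
        fprintf(stderr, "%s", msg);
        if (fgets(buf, sizeof(buf), stdin) == NULL)
            buf[0] = '\0';
        buf[strcspn(buf, "\n")] = '\0';
        return strdup(buf);
    }

    /* Standard PAM conversation callback: PAM hands over one or more
     * messages (challenges, info, errors) and expects a malloc'd array
     * of responses, one slot per message. */
    static int
    client_conv(int num_msg, const struct pam_message **msg,
                struct pam_response **resp, void *appdata_ptr)
    {
        struct pam_response *replies;
        int         i;

        (void) appdata_ptr;
        if (num_msg <= 0)
            return PAM_CONV_ERR;

        replies = calloc((size_t) num_msg, sizeof(*replies));
        if (replies == NULL)
            return PAM_BUF_ERR;

        for (i = 0; i < num_msg; i++)
        {
            switch (msg[i]->msg_style)
            {
                case PAM_PROMPT_ECHO_ON:    /* e.g. a one-time-password challenge */
                    replies[i].resp = prompt_user(msg[i]->msg, 1);
                    break;
                case PAM_PROMPT_ECHO_OFF:   /* e.g. a conventional password */
                    replies[i].resp = prompt_user(msg[i]->msg, 0);
                    break;
                case PAM_TEXT_INFO:
                case PAM_ERROR_MSG:
                    fprintf(stderr, "%s\n", msg[i]->msg);
                    break;
                default:
                    /* a fuller version would also free responses already collected */
                    free(replies);
                    return PAM_CONV_ERR;
            }
        }
        *resp = replies;
        return PAM_SUCCESS;
    }

The statefulness is the crux: a challenge issued inside one PAM transaction on the backend has to be answered within that same transaction, so a client that drops the connection and reconnects with a plain password can never satisfy it.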