[BUGS] bgwriter interfering with consistent view of system tables?
When making lots of DDL changes to a database (I believe this includes temp tables too), delayed flushing of dirty buffers from the system catalogs is causing a severe problem with maintaining a consistent view of the structure of the database. For these examples, I'd create a quick Makefile to aid in testing:

    printf "testing_delay:\n" > Makefile.bug
    printf "\tpsql -c 'DROP DATABASE mydb' template1\n" >> Makefile.bug
    printf "\tpsql -c 'CREATE DATABASE mydb' template1\n" >> Makefile.bug

To reproduce and test this bug, issue `make -f Makefile.bug`. With the following config settings:

    # - Background writer -
    bgwriter_delay = 5000     # 10-5000 milliseconds
    bgwriter_percent = 1      # 0-100% of dirty buffers
    bgwriter_maxpages = 1     # 1-1000 buffers max at once

it is *very* easy to reproduce this problem (note, there is a bug in the default config: the min percent is 1, not 0 as the comment suggests). With the default settings, it has been harder to spot on my laptop. I believe that higher-end systems with higher values will trip over this problem less frequently. With those settings:

    % make -f Makefile.bug
    psql -c "DROP DATABASE mydb" template1
    DROP DATABASE
    psql -c "CREATE DATABASE mydb" template1
    ERROR: source database "template1" is being accessed by other users
    *** Error code 1

The problem being: I've disconnected from template1 already, but the database hasn't flushed this to disk, so the parent postmaster process isn't aware of the disconnection; when I connect to the backend again, the newly created child has an inconsistent view of the current connections, which prevents me from creating a new database (maybe the old backend is still around cleaning up and really hasn't exited, I'm not sure). I think the same phenomenon used to exist with temp tables across connections that reconnected to a backend with the same backend # (ie, connect to backend 123, create a temp table, disconnect, reconnect and get backend 123, recreate the same temp table and you'll get an error...
though I can't reproduce the temp table error right now, yay!). Anyway, Tom/Jan, this seems to be your area of expertise -- could either of you take a look?

-sc -- Sean Chittenden
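The printf one-liners in the message above are fragile: without trailing newlines the Makefile's target and recipe lines run together, and recipe lines must begin with a hard tab. The following sketch (not from the original message) generates the same Makefile.bug from Python, which sidesteps those pitfalls:

```python
# Sketch: write the Makefile.bug repro file described in the message.
# The target name and psql commands come from the bug report; the
# helper function itself is illustrative.

def makefile_text(target, commands):
    # Every Makefile recipe line must start with a hard tab.
    lines = [target + ":"] + ["\t" + cmd for cmd in commands]
    return "\n".join(lines) + "\n"

repro = makefile_text("testing_delay", [
    "psql -c 'DROP DATABASE mydb' template1",
    "psql -c 'CREATE DATABASE mydb' template1",
])

with open("Makefile.bug", "w") as f:
    f.write(repro)
```

Running `make -f Makefile.bug` against the generated file then exercises the drop/create cycle exactly as in the report.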
Re: [BUGS] PLPGSQL and FOUND strange behaviour after EXECUTE
Tom Lane wrote:
> Yeah, this has been on my to-do list for awhile...

Ah, ok. Is this something you want to handle, or should I take a look?

> One question here is whether Oracle's PL/SQL has a precedent, and if so
> which way does it point?

I did some limited testing of this, and it appears that PL/SQL's EXECUTE IMMEDIATE modifies SQL%FOUND.

-Neil
[BUGS] Denial of service via VACUUM, all backends exit and restart...
There exists a crash that could easily be used as a denial of service against PostgreSQL by any user who can call a trusted stored procedure that makes use of temp tables. This DoS does not exist without the use of a stored procedure (from what I can tell). The gist of it being:

    CREATE DATABASE mydb WITH OWNER somedba;
    \c mydb somedba
    BEGIN;
    -- Call a stored procedure that runs as SECURITY DEFINER, which creates a temp table
    -- Add one or more rows of data to the table
    COMMIT;
    VACUUM FULL ANALYZE;

Where somedba is the owner of the mydb database, but does not have any abnormal privs (is just a user who happens to be a database owner). When somedba runs VACUUM FULL ANALYZE, I get the following error during the VACUUM, which panics the entire cluster and causes all backends to shut down:

    [EMAIL PROTECTED]: [local] 22325 2004-10-03 10:51:15 PDT ERROR: relcache reference tmptbl is not owned by resource owner
    @: 21502 2004-10-03 10:51:20 PDT LOG: server process (PID 22325) was terminated by signal 10
    @: 21502 2004-10-03 10:51:20 PDT LOG: terminating any other active server processes
    @: 21502 2004-10-03 10:51:20 PDT LOG: all server processes terminated; reinitializing
    @: 22328 2004-10-03 10:51:21 PDT LOG: database system was interrupted at 2004-10-03 10:50:03 PDT
    @: 22328 2004-10-03 10:51:21 PDT LOG: checkpoint record is at 0/4C42FC8
    @: 22328 2004-10-03 10:51:21 PDT LOG: redo record is at 0/4C42FC8; undo record is at 0/0; shutdown FALSE
    @: 22328 2004-10-03 10:51:21 PDT LOG: next transaction ID: 14034; next OID: 32678
    @: 22328 2004-10-03 10:51:21 PDT LOG: database system was not properly shut down; automatic recovery in progress
    @: 22328 2004-10-03 10:51:21 PDT LOG: redo starts at 0/4C43008
    @: 22328 2004-10-03 10:51:21 PDT WARNING: could not remove database directory "/usr/local/pgsql/data/base/30827"
    @: 22328 2004-10-03 10:51:24 PDT LOG: record with zero length at 0/57AA09C
    @: 22328 2004-10-03 10:51:24 PDT LOG: redo done at 0/57AA070
    @: 22328 2004-10-03 10:51:24 PDT LOG: database system is ready

I think this could be related to the bug I sent in a few days ago regarding new databases not having the owner properly set when creating a new database (ie, public is still owned by the owner of the template database, same with information_schema, etc). Regardless, here's an SQL script that reproduces this fatal condition:

    \c template1 realdba
    DROP DATABASE testdb;
    CREATE USER testdba ENCRYPTED PASSWORD 'pass' NOCREATEDB NOCREATEUSER;
    CREATE DATABASE testdb WITH OWNER testdba;
    \c testdb realdba
    ALTER SCHEMA public OWNER TO testdba;
    \c testdb testdba
    CREATE FUNCTION plpgsql_call_handler() RETURNS language_handler
        AS '$libdir/plpgsql', 'plpgsql_call_handler' LANGUAGE c;
    CREATE TRUSTED LANGUAGE plpgsql HANDLER plpgsql_call_handler;
    REVOKE ALL PRIVILEGES ON DATABASE testdb FROM PUBLIC CASCADE;
    GRANT CREATE,TEMPORARY ON DATABASE testdb TO testdba;
    REVOKE ALL PRIVILEGES ON SCHEMA public FROM PUBLIC CASCADE;
    GRANT USAGE ON SCHEMA public TO PUBLIC;
    BEGIN;
    CREATE FUNCTION public.tmptbl_foo() RETURNS VOID AS '
    BEGIN
        PERFORM TRUE
          FROM pg_catalog.pg_class c
          LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
         WHERE c.relname = \'tmptbl\'::TEXT
           AND c.relkind = \'r\'::TEXT
           AND pg_catalog.pg_table_is_visible(c.oid);
        IF NOT FOUND THEN
            EXECUTE \'CREATE LOCAL TEMP TABLE tmptbl (key TEXT) WITHOUT OIDS ON COMMIT DELETE ROWS;\';
        ELSE
            TRUNCATE TABLE tmptbl;
        END IF;
        RETURN;
    END;' LANGUAGE 'plpgsql' SECURITY DEFINER;
    GRANT EXECUTE ON FUNCTION public.tmptbl_foo() TO PUBLIC;
    SELECT public.tmptbl_foo();
    -- There has to be data in the TEMP TABLE otherwise the backend does not crash
    INSERT INTO tmptbl VALUES ('goozfraba');
    COMMIT;
    VACUUM FULL ANALYZE;

The output:

    You are now connected to database "template1" as user "realdba".
    DROP DATABASE
    CREATE USER
    CREATE DATABASE
    You are now connected to database "testdb" as user "realdba".
    ALTER SCHEMA
    You are now connected to database "testdb" as user "testdba".
    CREATE FUNCTION
    CREATE LANGUAGE
    REVOKE
    GRANT
    REVOKE
    GRANT
    BEGIN
    CREATE FUNCTION
    GRANT
     tmptbl_foo
    ------------

    (1 row)

    COMMIT
    psql:test.sql:36: server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    psql:test.sql:36: connection to server was lost
    Exit 2

And what a user sees in a different window:

    % psql somedb
    somedb=> BEGIN;
    somedb=> INSERT INTO t1 (id) VALUES (1);
    somedb=> SELECT * FROM t1;
     id | i
    ----+---
      1 |
    (1 row)

    -- Run the SQL script from above
    somedb=> SELECT * FROM t1;
    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[BUGS] BUG #1278: PL/pgSQL: ROWTYPE does not care for dropped columns
The following bug has been logged online:

Bug reference: 1278
Logged by: Michael Howitz
Email address: [EMAIL PROTECTED]
PostgreSQL version: 7.3.7
Operating system: 2.6.8-gentoo-r3
Description: PL/pgSQL: ROWTYPE does not care for dropped columns

Details:

If you drop a column on a table which is accessed via a PL/pgSQL function using tablename%ROWTYPE, you get an error. It seems that ROWTYPE does not account for dropped columns. Example in code:

    CREATE TABLE test (
        id SERIAL,
        start_date DATE,
        testing INTEGER);
    INSERT INTO test (start_date, testing) VALUES ('2003-05-03', 1);
    SELECT * FROM test;  -- test.id will be 1
    ALTER TABLE test DROP COLUMN start_date;
    CREATE OR REPLACE FUNCTION dcTest(INTEGER) RETURNS INTEGER AS '
    DECLARE
        tid ALIAS FOR $1;
        test_rec test%ROWTYPE;
    BEGIN
        SELECT INTO test_rec * FROM test WHERE id = tid;
        RETURN test_rec.testing;
    END;
    ' LANGUAGE 'plpgsql';
    SELECT dcTest(1);

gives the following error:

    WARNING: plpgsql: ERROR during compile of dctest near line 0
    ERROR: cache lookup for type 0 of test.pg.dropped.2 failed

This code works correctly in 7.4.1-dbExperts but also fails in 7.3.4-dbExperts.
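The failure mode here is that a dropped column leaves a tombstone entry in the catalog (with its type zeroed out), and the pre-7.4 %ROWTYPE builder tries to resolve that tombstone's type anyway. A toy Python simulation of the bug and of the skip-dropped-columns fix; the class and flag names below are illustrative stand-ins (loosely modeled on pg_attribute's attisdropped), not PostgreSQL's actual structures:

```python
# Toy model: one catalog entry per column; a dropped column keeps a
# tombstone entry whose type OID is 0.

class Column:
    def __init__(self, name, type_oid, isdropped=False):
        self.name = name
        self.type_oid = type_oid
        self.isdropped = isdropped

def rowtype_fields_buggy(columns):
    # Pre-7.4-style behavior: resolves every column's type, including
    # tombstones, so it trips over type OID 0.
    fields = {}
    for col in columns:
        if col.type_oid == 0:
            raise LookupError("cache lookup for type 0 of %s failed" % col.name)
        fields[col.name] = col.type_oid
    return fields

def rowtype_fields_fixed(columns):
    # 7.4-style behavior: dropped columns are simply skipped.
    return {c.name: c.type_oid for c in columns if not c.isdropped}

catalog = [
    Column("id", 23),
    Column("........pg.dropped.2........", 0, isdropped=True),  # was start_date
    Column("testing", 23),
]
```

With this catalog, the buggy builder raises the "cache lookup for type 0" error from the report, while the fixed one returns only the live columns.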
Re: [BUGS] PLPGSQL and FOUND strange behaviour after EXECUTE
Neil Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Yeah, this has been on my to-do list for awhile...
> Ah, ok. Is this something you want to handle, or should I take a look?

Well, it's not *high* on my to-do list; feel free to take a look.

>> One question here is whether Oracle's PL/SQL has a precedent, and if so
>> which way does it point?
> I did some limited testing of this, and it appears that PL/SQL's EXECUTE
> IMMEDIATE modifies SQL%FOUND.

Hm, okay, then we should probably think about doing so too. If the EXECUTE executes something that's not SELECT/INSERT/UPDATE/DELETE, should it clear FOUND? Or leave it alone?

regards, tom lane
Re: [BUGS] BUG #1278: PL/pgSQL: ROWTYPE does not care for dropped columns
"PostgreSQL Bugs List" <[EMAIL PROTECTED]> writes: > This code works correct in 7.4.1-dbExperts but also fails in > 7.3.4-dbExperts. As you say, this has been fixed in 7.4. We considered the fix not a suitable candidate for back-patching into older releases. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [BUGS] Denial of service via VACUUM, all backends exit and restart...
Sean Chittenden <[EMAIL PROTECTED]> writes:
> There exists a crash that could easily be used as a denial of service
> against PostgreSQL by any user who can call a trusted stored procedure
> that makes use of temp tables.

What this is actually exposing is a case where CurrentResourceOwner is left pointing at garbage. PortalRun saves and restores the caller's value of CurrentResourceOwner, which is normally fine and dandy. When doing a top-level command such as the VACUUM, CurrentResourceOwner is TopTransactionResourceOwner. However, VACUUM does internal CommitTransaction and StartTransaction commands, which destroy and recreate the whole transaction, including TopTransactionResourceOwner. In many situations TopTransactionResourceOwner ends up getting recreated at the same address it was at before, but this is obviously not guaranteeable in the general case; Sean's test case simply exposes one path in which it isn't at the same address.

What I'm thinking of doing to fix it is having PortalRun note whether the saved value of CurrentResourceOwner is the same as (the original value of) TopTransactionResourceOwner, and at exit restore to the current value of TopTransactionResourceOwner if so. This is pretty grotty but should cure the problem. Anyone see another low-impact fix?

In the long run perhaps we should get rid of the idea of internal transaction start/commits in VACUUM et al, or at least reduce them to just "partial commits" that don't go through the full CommitTransaction process and in particular don't destroy major amounts of backend-internal state. The whole thing is just too reminiscent of Wile E. Coyote sawing off the tree limb he's standing on.

regards, tom lane
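The stale-pointer hazard and the conditional-restore fix described above can be illustrated outside of C: save a reference to a global, let an "internal commit" replace that global with a fresh object, then restore. A minimal Python sketch, assuming simplified stand-ins for the backend's structures (the names mirror the discussion, but this is an illustration, not the actual PortalRun code):

```python
# Illustration of the fix discussed above: if the saved
# CurrentResourceOwner *was* TopTransactionResourceOwner, restore to
# whatever TopTransactionResourceOwner is *now*, since the internal
# commit/restart cycle may have recreated it as a different object.

class ResourceOwner:
    def __init__(self, label):
        self.label = label

top_owner = ResourceOwner("TopTransaction")
current_owner = top_owner

def internal_commit_and_restart():
    # VACUUM-style internal CommitTransaction/StartTransaction: the
    # top-level resource owner is destroyed and recreated elsewhere.
    global top_owner, current_owner
    top_owner = ResourceOwner("TopTransaction")
    current_owner = top_owner

def run_portal(body):
    global current_owner
    saved = current_owner
    saved_was_top = saved is top_owner
    body()
    # A naive restore would reinstate `saved`, which may now be a stale
    # reference; the fix restores the *current* top owner instead.
    current_owner = top_owner if saved_was_top else saved

original_top = top_owner
run_portal(internal_commit_and_restart)
```

After the call, current_owner points at the recreated owner rather than the destroyed one, which is the property the fix is after.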
Re: [BUGS] bgwriter interfering with consistent view of system tables?
Sean Chittenden <[EMAIL PROTECTED]> writes:
> When making lots of DDL changes to a database (I believe this includes
> temp tables too), delayed flushing of dirty buffers from the system
> catalogs is causing a severe problem with maintaining a consistent view
> of the structure of the database.

This analysis is completely bogus.

> % make -f Makefile.bug
> psql -c "DROP DATABASE mydb" template1
> DROP DATABASE
> psql -c "CREATE DATABASE mydb" template1
> ERROR: source database "template1" is being accessed by other users

It's always been possible for this to happen, primarily because libpq doesn't wait around for the connected backend to exit. If the kernel prefers to schedule other processes, the old backend may still be alive when the new one tries to do CREATE DATABASE. There is nothing stopping the old one from exiting; it's just that the kernel hasn't given the old backend any cycles at all.

There's been some discussion of making PQfinish() wait to observe connection closure, which would guarantee that the backend has exited in the non-SSL-connection case. It's not clear how well that would work in the SSL case, though. In any case it's a bit of a band-aid solution. I think the real solution is to find a way to not need the "accessed by other users" interlock for CREATE DATABASE.

regards, tom lane
Re: [BUGS] BUG #1277: plpgsql EXECUTE bug in beta3
"PostgreSQL Bugs List" <[EMAIL PROTECTED]> writes: > Under beta3, the following behaviour is observed: > test=# create or replace function execute_sql(text) returns void AS $$begin > execute $1; return; end; $$ language plpgsql; > CREATE FUNCTION > test=# > test=# select execute_sql('create table a (i integer); insert into a(i) > values(1);'); > ERROR: relation "a" does not exist > CONTEXT: SQL query "create table a (i integer); insert into a(i) > values(1);" > PL/pgSQL function "execute_sql" line 1 at execute statement This is happening because EXECUTE now parses and plans the whole string in one go, so that it tries to plan the INSERT before the CREATE has been carried out. You would see the same behavior if you tried for instance to execute those two commands as the body of an SQL function. I am inclined to regard this as "not a bug", and tell you to execute the two queries in separate EXECUTE commands. I'm not sure it's worth the substantial additional complexity in spi.c that would be needed to preserve the old behavior --- especially when the documentation does not suggest anywhere that you can use EXECUTE to execute more than one command in the first place. Anyone else have an opinion? regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [BUGS] bgwriter interfering with consistent view of system tables?
>> When making lots of DDL changes to a database (I believe this includes
>> temp tables too), delayed flushing of dirty buffers from the system
>> catalogs is causing a severe problem with maintaining a consistent view
>> of the structure of the database.
>
> This analysis is completely bogus.

That doesn't surprise me at all: I couldn't think of what else it would've been.

>> % make -f Makefile.bug
>> psql -c "DROP DATABASE mydb" template1
>> DROP DATABASE
>> psql -c "CREATE DATABASE mydb" template1
>> ERROR: source database "template1" is being accessed by other users
>
> It's always been possible for this to happen, primarily because libpq
> doesn't wait around for the connected backend to exit. If the kernel
> prefers to schedule other processes then the old backend may still be
> alive when the new one tries to do CREATE DATABASE. There is nothing
> stopping the old one from exiting, it's just that the kernel hasn't
> given the old backend any cycles at all.
>
> There's been some discussion of making PQfinish() wait to observe
> connection closure, which would guarantee that the backend has exited
> in the non-SSL-connection case. It's not clear how well it would work
> in the SSL case, though. In any case it's a bit of a band-aid solution.
> I think the real solution is to find a way to not need the "accessed by
> other users" interlock for CREATE DATABASE.

*shrug* It'd be good from a security standpoint to wait if there is any chance the connection could be resurrected via a man-in-the-middle attack. As it stands, this isn't a really important bug given that the SQL is programmatically created and it's trivial to throw in some kind of a sleep... still, it did bother me. I figured locks on tables were stuffed into some kind of a refcount in a shared memory segment, and that the time needed to decrease the refcount would be insignificant or done as soon as the client signaled its intention to disconnect, not controlled by wait*(2) and the listening postmaster.

-sc -- Sean Chittenden
Re: [BUGS] Denial of service via VACUUM, all backends exit and restart...
>> There exists a crash that could easily be used as a denial of service
>> against PostgreSQL by any user who can call a trusted stored procedure
>> that makes use of temp tables.
>
> What this is actually exposing is a case where CurrentResourceOwner is
> left pointing at garbage. PortalRun saves and restores the caller's
> value of CurrentResourceOwner, which is normally fine and dandy. When
> doing a top-level command such as the VACUUM, CurrentResourceOwner is
> TopTransactionResourceOwner. However, VACUUM does internal
> CommitTransaction and StartTransaction commands, which destroy and
> recreate the whole transaction including TopTransactionResourceOwner.
> In many situations TopTransactionResourceOwner ends up getting
> recreated at the same address it was at before, but this is obviously
> not guaranteeable in the general case; Sean's test case simply exposes
> one path in which it isn't at the same address.

FYI, I can confirm that your commit fixes this issue. Thank you very much!

-sc -- Sean Chittenden
Re: [BUGS] BUG #1276: Backend panics on SETVAL('..', 0)...
    ERROR: setval: value 0 is out of bounds for sequence "foo_id_seq" (1..9223372036854775807)
    FATAL: block 0 of 1663/97972/98006 is still referenced (private 1, global 1)
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    connection to server was lost

This bug can be closed. I was able to confirm that it has been fixed by Tom's latest VACUUM fix. I was triggering this via a pg_autovacuum that was running with a 15-second sleep during a schema load that was taking roughly 3 minutes (just the DDL). After Tom's latest VACUUM commit, this bug no longer appears to exist. Thank you, Tom!

-sc -- Sean Chittenden