[BUGS]

2004-10-04 Thread Sean Chittenden
set nomail
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


[BUGS] bgwriter interfering with consistent view of system tables?

2004-10-04 Thread Sean Chittenden
When making lots of DDL changes to a database (I believe this includes 
temp tables too), delayed flushing of dirty buffers from the system 
catalogs is causing a severe problem with maintaining a consistent view 
of the structure of the database.  For these examples, I'd create a 
quick Makefile to aid in testing.

printf "testing_delay:" > Makefile.bug
printf "\tpsql -c 'DROP DATABASE mydb' template1" >> Makefile.bug
printf "\tpsql -c 'CREATE DATABASE mydb' template1" >> Makefile.bug
To reproduce and test this bug, issue `make -f Makefile.bug`.
With the following config settings:
# - Background writer -
bgwriter_delay = 5000   # 10-5000 milliseconds
bgwriter_percent = 1# 0-100% of dirty buffers
bgwriter_maxpages = 1   # 1-1000 buffers max at once
it is *very* easy to reproduce this problem (note, there is a bug in 
the default config, the min percent is 1, no 0 as the comment 
suggests).  With the default settings, it has been harder to spot on my 
laptop.  I believe that higher end systems with higher values will trip 
over this problem less frequently.

With the settings set:
% make -f Makefile.bug
psql -c "DROP DATABASE mydb" template1
DROP DATABASE
psql -c "CREATE DATABASE mydb" template1
ERROR:  source database "template1" is being accessed by other users
*** Error code 1
The problem being, I've disconnected from template1 already, but the 
database hasn't flushed this to disk so the parent postmaster process 
isn't aware of the disconnection, so when I connect to the backend 
again, the newly created child has an inconsistent view of the current 
connections which prevents me from creating a new database (maybe the 
old backend is still around cleaning up and really hasn't exited, I'm 
not sure).

I think the same phenomena used to exist with temp tables across 
connections that reconnected to a backend with the same backend # (ie, 
connect to backend 123, create a temp table, disconnect, reconnect and 
get backend 123, recreate the same temp table and you'll get an 
error... though I can't reproduce the temp table error right now, 
yay!).

Anyway, Tom/Jan, this code seems to be your areas of expertise, could 
either of you take a look? -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [BUGS] PLPGSQL and FOUND stange behaviour after EXECUTE

2004-10-04 Thread Neil Conway
Tom Lane wrote:
Yeah, this has been on my to-do list for awhile...
Ah, ok. Is this something you want to handle, or should I take a look?
One question here is whether Oracle's PL/SQL has a
precedent, and if so which way does it point?
I did some limited testing of this, and it appears that PL/SQL's EXECUTE 
IMMEDIATE modifies SQL%FOUND.

-Neil
---(end of broadcast)---
TIP 8: explain analyze is your friend


[BUGS] Denial of service via VACUUM, all backends exit and restart...

2004-10-04 Thread Sean Chittenden
There exists a crash that could easily be used as a denial of service 
against PostgreSQL by any user who can call a trusted stored procedure 
that makes use of temp tables.  This DoS does not exist without the use 
of a stored procedure (from what I can tell).  The gist of it being:

CREATE DATABASE mydb WITH OWNER somedba;
\c mydb somedba
BEGIN;
-- Call a stored procedure that runs as SECURITY DEFINER, which creates 
a temp table
-- Add one or more rows of data to the table
COMMIT;
VACUUM FULL ANALYZE;

Where somedba is the owner of the mydb database, but does not have any 
abnormal privs (is just a user who happens to be a database owner).  
When somedba runs VACUUM FULL ANALYZE, I get the following error during 
the VACUUM which panics the entire cluster and causes all backends to 
shutdown:

[EMAIL PROTECTED]: [local] 22325 2004-10-03 10:51:15 PDT ERROR:  relcache 
reference tmptbl is not owned by resource owner
@:  21502 2004-10-03 10:51:20 PDT LOG:  server process (PID 22325) 
was terminated by signal 10
@:  21502 2004-10-03 10:51:20 PDT LOG:  terminating any other 
active server processes
@:  21502 2004-10-03 10:51:20 PDT LOG:  all server processes 
terminated; reinitializing
@:  22328 2004-10-03 10:51:21 PDT LOG:  database system was 
interrupted at 2004-10-03 10:50:03 PDT
@:  22328 2004-10-03 10:51:21 PDT LOG:  checkpoint record is at 
0/4C42FC8
@:  22328 2004-10-03 10:51:21 PDT LOG:  redo record is at 
0/4C42FC8; undo record is at 0/0; shutdown FALSE
@:  22328 2004-10-03 10:51:21 PDT LOG:  next transaction ID: 14034; 
next OID: 32678
@:  22328 2004-10-03 10:51:21 PDT LOG:  database system was not 
properly shut down; automatic recovery in progress
@:  22328 2004-10-03 10:51:21 PDT LOG:  redo starts at 0/4C43008
@:  22328 2004-10-03 10:51:21 PDT WARNING:  could not remove 
database directory "/usr/local/pgsql/data/base/30827"
@:  22328 2004-10-03 10:51:24 PDT LOG:  record with zero length at 
0/57AA09C
@:  22328 2004-10-03 10:51:24 PDT LOG:  redo done at 0/57AA070
@:  22328 2004-10-03 10:51:24 PDT LOG:  database system is ready

I think this could be related to the bug I sent in a few days ago 
regarding new databases not having the owner properly set when creating 
a new database (ie, public is still owned by the owner of the template 
database, same with information_schema, etc).

Regardless, here's an SQL script that reproduces this fatal condition:
\c template1 realdba
DROP DATABASE testdb;
CREATE USER testdba ENCRYPTED PASSWORD 'pass' NOCREATEDB NOCREATEUSER;
CREATE DATABASE testdb WITH OWNER testdba;
\c testdb realdba
ALTER SCHEMA public OWNER TO testdba;
\c testdb testdba
CREATE FUNCTION plpgsql_call_handler()
RETURNS language_handler
AS '$libdir/plpgsql', 'plpgsql_call_handler'
LANGUAGE c;
CREATE TRUSTED LANGUAGE plpgsql HANDLER plpgsql_call_handler;
REVOKE ALL PRIVILEGES ON DATABASE testdb FROM PUBLIC CASCADE;
GRANT CREATE,TEMPORARY ON DATABASE testdb TO testdba;
REVOKE ALL PRIVILEGES ON SCHEMA public FROM PUBLIC CASCADE;
GRANT USAGE ON SCHEMA public TO PUBLIC;
BEGIN;
CREATE FUNCTION public.tmptbl_foo() RETURNS VOID AS 'BEGIN
PERFORM TRUE FROM pg_catalog.pg_class c LEFT JOIN 
pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relname = 
\'tmptbl\'::TEXT AND c.relkind = \'r\'::TEXT AND 
pg_catalog.pg_table_is_visible(c.oid);
IF NOT FOUND THEN
EXECUTE \'CREATE LOCAL TEMP TABLE tmptbl (key TEXT) 
WITHOUT OIDS ON COMMIT DELETE ROWS;\';
ELSE
TRUNCATE TABLE tmptbl;
END IF;
RETURN;
END;' LANGUAGE 'plpgsql' SECURITY DEFINER;
GRANT EXECUTE ON FUNCTION public.tmptbl_foo() TO PUBLIC;
SELECT public.tmptbl_foo();
-- There has to be data in the TEMP TABLE otherwise the backend does 
not crash
INSERT INTO tmptbl VALUES ('goozfraba');
COMMIT;
VACUUM FULL ANALYZE;

The output:
You are now connected to database "template1" as user "realdba".
DROP DATABASE
CREATE USER
CREATE DATABASE
You are now connected to database "testdb" as user "realdba".
ALTER SCHEMA
You are now connected to database "testdb" as user "testdba".
CREATE FUNCTION
CREATE LANGUAGE
REVOKE
GRANT
REVOKE
GRANT
BEGIN
CREATE FUNCTION
GRANT
 tmptbl_foo

(1 row)
COMMIT
psql:test.sql:36: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:test.sql:36: connection to server was lost
Exit 2
And what a user sees on a different window:
% psql somedb
somedb=> BEGIN ;
somedb=> INSERT INTO t1 (id) VALUES (1);
somedb=> SELECT * from t1;
 id | i
+---
  1 |
(1 row)
-- Run the SQL script from above
somedb=> SELECT * from t1;
WARNING:  terminating connection because of crash of another server 
process
DETAIL:  The postmaster has commanded this server process to roll back 
the current transaction and exit, because another server process exited 
abnormally and possibly corrupted shared memory.

[BUGS] BUG #1278: PL/pgSQL: ROWTYPE does not care for droped columns

2004-10-04 Thread PostgreSQL Bugs List

The following bug has been logged online:

Bug reference:  1278
Logged by:  Michael Howitz

Email address:  [EMAIL PROTECTED]

PostgreSQL version: 7.3.7

Operating system:   2.6.8-gentoo-r3

Description:PL/pgSQL: ROWTYPE does not care for droped columns

Details: 

IF you drop a column on a Table which is accessed via a PL/pgSQL-Function 
using tablename%ROWTYPE you get an Error. It seems that ROWTYPE does not 
take care about droped columns. 

Example in code:

CREATE TABLE test (
  id SERIAL,
  start_date DATE,
  testing INTEGER);

INSERT INTO test (start_date, testing) VALUES ('2003-05-03', 1);

SELECT * from test;
-- test.id will be 1

ALTER TABLE test DROP COLUMN start_date;

CREATE OR REPLACE FUNCTION dcTest(INTEGER) RETURNS INTEGER AS
'
DECLARE
tid ALIAS FOR $1;
test_rec test%ROWTYPE;
BEGIN
SELECT INTO test_rec *
FROM test
WHERE id = tid;
RETURN test_rec.testing;
END;
' LANGUAGE 'plpgsql';

SELECT dcTest(1);

gives the following Error:
WARNING:  plpgsql: ERROR during compile of dctest near line 0
ERROR:  cache lookup for type 0 of test.pg.dropped.2 failed

This code works correct in 7.4.1-dbExperts but also fails in 
7.3.4-dbExperts. 



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [BUGS] PLPGSQL and FOUND stange behaviour after EXECUTE

2004-10-04 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Yeah, this has been on my to-do list for awhile...

> Ah, ok. Is this something you want to handle, or should I take a look?

Well, it's not *high* on my to-do list; feel free to take a look.

>> One question here is whether Oracle's PL/SQL has a
>> precedent, and if so which way does it point?

> I did some limited testing of this, and it appears that PL/SQL's EXECUTE 
> IMMEDIATE modifies SQL%FOUND.

Hm, okay, then we should probably think about doing so too.

If the EXECUTE executes something that's not
SELECT/INSERT/UPDATE/DELETE, should it clear FOUND?  Or leave it alone?

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [BUGS] BUG #1278: PL/pgSQL: ROWTYPE does not care for droped columns

2004-10-04 Thread Tom Lane
"PostgreSQL Bugs List" <[EMAIL PROTECTED]> writes:
> This code works correct in 7.4.1-dbExperts but also fails in 
> 7.3.4-dbExperts. 

As you say, this has been fixed in 7.4.  We considered the fix not
a suitable candidate for back-patching into older releases.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [BUGS] Denial of service via VACUUM, all backends exit and restart...

2004-10-04 Thread Tom Lane
Sean Chittenden <[EMAIL PROTECTED]> writes:
> There exists a crash that could easily be used as a denial of service 
> against PostgreSQL by any user who can call a trusted stored procedure 
> that makes use of temp tables.

What this is actually exposing is a case where CurrentResourceOwner is
left pointing at garbage.  PortalRun saves and restores the caller's
value of CurrentResourceOwner, which is normally fine and dandy.
When doing a top-level command such as the VACUUM, CurrentResourceOwner
is TopTransactionResourceOwner.  However, VACUUM does internal
CommitTransaction and StartTransaction commands, which destroy and
recreate the whole transaction including TopTransactionResourceOwner.
In many situations TopTransactionResourceOwner ends up getting recreated
at the same address it was at before, but this is obviously not
guaranteeable in the general case; Sean's test case simply exposes one
path in which it isn't at the same address.

What I'm thinking of doing to fix it is having PortalRun note whether
the saved value of CurrentResourceOwner is the same as (the original
value of) TopTransactionResourceOwner, and at exit restore to the
current value of TopTransactionResourceOwner if so.  This is pretty
grotty but should cure the problem.  Anyone see another low-impact fix?

In the long run perhaps we should get rid of the idea of internal
transaction start/commits in VACUUM et al, or at least reduce them to be
just "partial commits" that don't go through the full CommitTransaction
process and in particular don't destroy major amounts of backend
internal state.  The whole thing is just too reminiscent of Wiley Coyote
sawing off the tree limb that he's standing on.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [BUGS] bgwriter interfering with consistent view of system tables?

2004-10-04 Thread Tom Lane
Sean Chittenden <[EMAIL PROTECTED]> writes:
> When making lots of DDL changes to a database (I believe this includes 
> temp tables too), delayed flushing of dirty buffers from the system 
> catalogs is causing a severe problem with maintaining a consistent view 
> of the structure of the database.

This analysis is completely bogus.

> % make -f Makefile.bug
> psql -c "DROP DATABASE mydb" template1
> DROP DATABASE
> psql -c "CREATE DATABASE mydb" template1
> ERROR:  source database "template1" is being accessed by other users

It's always been possible for this to happen, primarily because libpq
doesn't wait around for the connected backend to exit.  If the kernel
prefers to schedule other processes then the old backend may still be
alive when the new one tries to do CREATE DATABASE.  There is nothing
stopping the old one from exiting, it's just that the kernel hasn't
given the old backend any cycles at all.

There's been some discussion of making PQfinish() wait to observe
connection closure, which would guarantee that the backend has exited
in the non-SSL-connection case.  It's not clear how well it would work
in the SSL case, though.  In any case it's a bit of a band-aid solution.
I think the real solution is to find a way to not need the "accessed by
other users" interlock for CREATE DATABASE.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [BUGS] BUG #1277: plpgsql EXECUTE bug in beta3

2004-10-04 Thread Tom Lane
"PostgreSQL Bugs List" <[EMAIL PROTECTED]> writes:
> Under beta3, the following behaviour is observed:

> test=# create or replace function execute_sql(text) returns void AS $$begin 
> execute $1; return; end; $$ language plpgsql; 
> CREATE FUNCTION
> test=#
> test=# select execute_sql('create table a (i integer); insert into a(i) 
> values(1);'); 
> ERROR:  relation "a" does not exist
> CONTEXT:  SQL query "create table a (i integer); insert into a(i) 
> values(1);" 
> PL/pgSQL function "execute_sql" line 1 at execute statement

This is happening because EXECUTE now parses and plans the whole string
in one go, so that it tries to plan the INSERT before the CREATE has
been carried out.  You would see the same behavior if you tried for
instance to execute those two commands as the body of an SQL function.
I am inclined to regard this as "not a bug", and tell you to execute the
two queries in separate EXECUTE commands.  I'm not sure it's worth the
substantial additional complexity in spi.c that would be needed to
preserve the old behavior --- especially when the documentation does not
suggest anywhere that you can use EXECUTE to execute more than one
command in the first place.

Anyone else have an opinion?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [BUGS] bgwriter interfering with consistent view of system tables?

2004-10-04 Thread Sean Chittenden
When making lots of DDL changes to a database (I believe this includes
temp tables too), delayed flushing of dirty buffers from the system
catalogs is causing a severe problem with maintaining a consistent 
view
of the structure of the database.
This analysis is completely bogus.
That doesn't surprise me at all: I couldn't think of what else it 
would've been.

% make -f Makefile.bug
psql -c "DROP DATABASE mydb" template1
DROP DATABASE
psql -c "CREATE DATABASE mydb" template1
ERROR:  source database "template1" is being accessed by other users
It's always been possible for this to happen, primarily because libpq
doesn't wait around for the connected backend to exit.  If the kernel
prefers to schedule other processes then the old backend may still be
alive when the new one tries to do CREATE DATABASE.  There is nothing
stopping the old one from exiting, it's just that the kernel hasn't
given the old backend any cycles at all.
There's been some discussion of making PQfinish() wait to observe
connection closure, which would guarantee that the backend has exited
in the non-SSL-connection case.  It's not clear how well it would work
in the SSL case, though.  In any case it's a bit of a band-aid 
solution.
I think the real solution is to find a way to not need the "accessed by
other users" interlock for CREATE DATABASE.
*shrug*  It'd be good from a security stand point to wait if there is 
any chance the connection could be resurrected via a man-in-the-middle 
attack.  As it stands, this isn't a real important bug given that the 
SQL is programatically created and it's trivial to throw in some kind 
of a sleep... still, it did bother me.  I figured locks on tables were 
stored in stuffed into some kind of a refcount in shared memory segment 
and that the time needed to decrease the refcount would be 
insignificant or done as soon as the client signaled their intention to 
disconnect, not controlled by wait*(2) and the listening postmaster.

-sc
--
Sean Chittenden
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [BUGS] Denial of service via VACUUM, all backends exit and restart...

2004-10-04 Thread Sean Chittenden
There exists a crash that could easily be used as a denial of service
against PostgreSQL by any user who can call a trusted stored procedure
that makes use of temp tables.
What this is actually exposing is a case where CurrentResourceOwner is
left pointing at garbage.  PortalRun saves and restores the caller's
value of CurrentResourceOwner, which is normally fine and dandy.
When doing a top-level command such as the VACUUM, CurrentResourceOwner
is TopTransactionResourceOwner.  However, VACUUM does internal
CommitTransaction and StartTransaction commands, which destroy and
recreate the whole transaction including TopTransactionResourceOwner.
In many situations TopTransactionResourceOwner ends up getting 
recreated
at the same address it was at before, but this is obviously not
guaranteeable in the general case; Sean's test case simply exposes one
path in which it isn't at the same address.
FYI, I can confirm that your commit fixes this issue.  Thank you very 
much!  -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [BUGS] BUG #1276: Backend panics on SETVAL('..', 0)...

2004-10-04 Thread Sean Chittenden
ERROR:  setval: value 0 is out of bounds for sequence "foo_id_seq"
(1..9223372036854775807)
FATAL:  block 0 of 1663/97972/98006 is still referenced (private 1, 
global
1)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
This bug can be closed.  I was able to confirm that this bug has been 
fixed by Tom's latest VACUUM fix.  I was triggering this via a 
pg_autovacuum that was running with a running with a 15sec sleep on a 
schema load that was taking roughly 3 minutes (just the DDL).  After 
Tom's latest VACUUM commit, this bug does not appear to exist any more. 
 Thank you Tom!  -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster