Re: [BUGS] issue with integer nullable column and value 0

2009-10-15 Thread Sean Hsien
I'm using CentOS 5.2 64-bit with postgres 8.1.11 + java 6u16, and
Windows Vista 32-bit with postgres 8.4.1 + java 6u13.

Here is a small code snippet that shows the problem, where the user id
value is 0:

public void update(EntityVO vo) {
    StringBuilder sql = new StringBuilder();
    sql.append("UPDATE nvt_entity" +
               " SET user_id = ?" +
               " WHERE entity_id = ?");

    // a userId of -1 is the sentinel for "no user", so bind SQL NULL instead
    Object[] args = {
        vo.getUserId() == -1 ? null : vo.getUserId(),
        vo.getEntityId()
    };

    getJdbcTemplate().update(sql.toString(), args);
}

Thanks for your time.

2009/10/15 Kevin Grittner :
> Sean Hsien  wrote:
>
>> using the latest JDBC driver type 4.
>
>> I have a nullable integer column in one of my tables. When I'm
>> updating the column in 8.4 Windows with value 0, it stays as null,
>> but on the Linux 8.1 it will try to update it with the value 0.
>
> Could you post a small, self-contained example of code which exhibits
> this problem?  Also, what are the OS and Java versions on the client
> side?
>
> -Kevin
>

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #5117: Table type Function Bug when column dropped

2009-10-15 Thread cf of

The following bug has been logged online:

Bug reference:  5117
Logged by:  cf of
Email address:  ccooff...@hotmail.com
PostgreSQL version: 8.4 (maybe all)
Operating system:   Linux
Description:Table type Function Bug when column dropped
Details: 

create table test_table
   (
key1 integer,
key2 integer,
key3 integer
   );

   CREATE OR REPLACE FUNCTION test_function()
 RETURNS SETOF test_table AS
   $BODY$
   DECLARE
_Ret RECORD;
   BEGIN
FOR _Ret IN
 Select * From test_table
LOOP
 RETURN NEXT _Ret;
END LOOP;
   END;
   $BODY$
 LANGUAGE 'plpgsql';

   insert into test_table values( 1, 1, 1 );
   insert into test_table values( 2, 2, 2 );
   insert into test_table values( 3, 3, 3 );
   insert into test_table values( 4, 4, 4 );

   alter table test_table drop column key3;

   select * from test_function();
   ERROR:  wrong record type supplied in RETURN NEXT
   DETAIL:  Number of returned columns (2) does not match expected column
count (3).
   CONTEXT:  PL/pgSQL function "test_function" line 7 at RETURN NEXT


Thx.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Gerhard Leykam

The following bug has been logged online:

Bug reference:  5118
Logged by:  Gerhard Leykam
Email address:  gel...@sealsystems.de
PostgreSQL version: 8.4.0
Operating system:   linux
Description:start-status-insert-fatal
Details: 

Hi!

I am using a start script to set up my PostgreSQL database: it runs initdb
if that has not been done yet, starts the instance with pg_ctl start, and
checks that everything is fine with pg_ctl status.

If there is another PostgreSQL database on the same machine listening on the
same port, the postmaster comes up and pg_ctl status says everything is fine,
but the postmaster falls down with an appropriate message in postgres.log.

All SQL commands in my script after the status check run against the other
database!

What would you think of some sort of postmaster.up file next to
postmaster.pid, created only once everything is really up and OK?

Regards,
  Gerhard

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] issue with integer nullable column and value 0

2009-10-15 Thread Kevin Grittner
Sean Hsien  wrote:
> 2009/10/15 Kevin Grittner :
 
>> what are the OS and Java versions on the client side?
 
> I'm using CentOS 5.2 64-bit with postgres 8.1.11 + java 6u16, and
> Windows Vista 32-bit with postgres 8.4.1 + java 6u13.
 
So the Java code is running on the same machine as the database in
each case?
 
>> Could you post a small, self-contained example of code which
>> exhibits this problem?
 
> Here is a small code snippet
 
A self-contained example would include creation and population of the
table, as well as enough code to actually run the snippet and show the
results.  You should run this in both environments to confirm that the
problem indeed manifests as you describe with the example you provide.
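
For instance, a minimal self-contained test could look something like the
following (the table definition here is only a guess based on your snippet;
adjust it to match the real schema before running it in both environments):

CREATE TABLE nvt_entity (
    entity_id integer PRIMARY KEY,
    user_id   integer            -- nullable
);
INSERT INTO nvt_entity VALUES (1, NULL);

-- what the JDBC call is expected to execute when vo.getUserId() is 0:
UPDATE nvt_entity SET user_id = 0 WHERE entity_id = 1;
-- and when vo.getUserId() is -1:
UPDATE nvt_entity SET user_id = NULL WHERE entity_id = 1;

SELECT entity_id, user_id FROM nvt_entity;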
 
-Kevin

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5117: Table type Function Bug when column dropped

2009-10-15 Thread Alvaro Herrera
cf of wrote:

>alter table test_table drop column key3;
> 
>select * from test_function();
>ERROR:  wrong record type supplied in RETURN NEXT
>DETAIL:  Number of returned columns (2) does not match expected column
> count (3).

Known problem, fixed in HEAD (8.5-to-be).  Not sure how backpatchable
the fix was.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Kevin Grittner
"Gerhard Leykam"  wrote:
 
> I am using a start script to set up my PostgreSQL database: it runs
> initdb if that has not been done yet, starts the instance with pg_ctl
> start, and checks that everything is fine with pg_ctl status.
> 
> If there is another PostgreSQL database on the same machine listening
> on the same port, the postmaster comes up and pg_ctl status says
> everything is fine, but the postmaster falls down with an appropriate
> message in postgres.log.
 
This is definitely not a PostgreSQL bug.
 
Perhaps the best place to start, before suggesting a new PostgreSQL
feature to solve this, would be to post to one of the other lists
(admin, maybe?) and describe what you are trying to accomplish with
your script, along with the problems you've found with your current
version of the script.  With a little more information, someone might
be able to suggest a solution.  (Since you're running on Linux,
something involving the lockfile utility might suffice.)
 
-Kevin

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Tom Lane
"Kevin Grittner"  writes:
> "Gerhard Leykam"  wrote:
>> I am using a start script to set up my PostgreSQL database: it runs
>> initdb if that has not been done yet, starts the instance with pg_ctl
>> start, and checks that everything is fine with pg_ctl status.
>> 
>> If there is another PostgreSQL database on the same machine listening
>> on the same port, the postmaster comes up and pg_ctl status says
>> everything is fine, but the postmaster falls down with an appropriate
>> message in postgres.log.
 
> This is definitely not a PostgreSQL bug.

Well, it's arguably a start-script bug, but I think his point is that
it's hard to fix it without any additional support from PG.

While mulling that it occurred to me that some additional output from
the postmaster would help to solve another thing that's an acknowledged
shortcoming of pg_ctl, namely that it can't parse postgresql.conf to
find out where the postmaster's communication socket is; cf
http://archives.postgresql.org/pgsql-bugs/2009-10/msg00024.php
and other older complaints.

We could redefine things so that it doesn't need to do that (and also
doesn't need to try to intuit the postmaster's port number, which it
does do now, but not terribly well).  Suppose that after the postmaster
is fully up, it writes a file $PGDATA/postmaster.ports, with contents
along the lines of

5432
/tmp/.s.PGSQL.5432

ie, IP port number and full socket location (the latter missing on
Windows of course).  Then pg_ctl only needs to know $PGDATA, and it
can get the socket address without any guessing or hard work.

(Alternatively we could add this info to postmaster.pid, but a separate
file seems like a better idea, if only because the contents of the pid
file are subject to change that pg_ctl shouldn't have to deal with.)
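
(For reference, a client that is already connected can see this information
today, e.g. with the 8.4 parameter names:

SHOW port;                    -- e.g. 5432
SHOW unix_socket_directory;   -- the socket file is .s.PGSQL.<port> in there

The point of the proposed file is to make the same information available to
pg_ctl before it is able to connect at all.)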

I am not convinced that this solves the OP's complaint 100%.  The
failure case that remains is that if the postmaster crashes without
deleting this file, then pg_ctl could still be fooled by a competing
postmaster that's taken the same port number.  I don't know that there's
any good way around that one, though.  The main thing it does do is
remove the pressure to add a lot of code to pg_ctl to try to read
postgresql.conf.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Kevin Grittner
Tom Lane  wrote:
 
> Well, it's arguably a start-script bug
 
OK.
 
> While mulling that it occurred to me that some additional output
> from the postmaster would help to solve another thing that's an
> acknowledged shortcoming of pg_ctl, namely that it can't parse
> postgresql.conf to find out where the postmaster's communication
> socket is;
> cf http://archives.postgresql.org/pgsql-bugs/2009-10/msg00024.php
> and other older complaints.
> 
> We could redefine things so that it doesn't need to do that (and
> also doesn't need to try to intuit the postmaster's port number,
> which it does do now, but not terribly well).  Suppose that after
> the postmaster is fully up, it writes a file
> $PGDATA/postmaster.ports, with contents along the lines of
> 
>   5432
>   /tmp/.s.PGSQL.5432
 
The listen_addresses setting would need to figure in, too.
 
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00022.php
 
Matching that stuff up could start to get a little messy, but it
should be doable somehow.
 
This seems likely to overlap the review I was soon going to do of the
differences between pg_ctl behavior and what is required for LSB
conformance.  I'll make sure to test this behavior along with others.
One of my current complaints is that pg_ctl doesn't wait until the server
is actually ready to receive connections before returning an indication
of success.  I see that I neglected that point in my recently proposed
LSB-conforming script, but I'm guessing that this fits with other
points in the argument: if what I'm doing in the script is demonstrably
better than current pg_ctl behavior, we should change pg_ctl to support
it rather than scripting around it.  (Not that it would be hard to add
ten or twenty lines to the script to cover this.)
 
-Kevin

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Tom Lane
"Kevin Grittner"  writes:
> Tom Lane  wrote:
>> Suppose that after the postmaster is fully up, it writes a file
>> $PGDATA/postmaster.ports, with contents along the lines of
>> 
>> 5432
>> /tmp/.s.PGSQL.5432
 
> The listen_addresses setting would need to figure in, too.

Yeah, I'm not entirely sure how we'd want to deal with IP addresses,
but in principle there could be a line for each postmaster socket, not
only the Unix-domain socket.
 
> This seems likely to overlap the review I was soon going to do of the
> differences between pg_ctl behavior and what is required for LSB
> conformance.

Agreed, it would be good to do a holistic review of what pg_ctl needs.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Alvaro Herrera
Kevin Grittner wrote:

> This seems likely to overlap the review I was soon going to do of the
> differences between pg_ctl behavior and what is required for LSB
> conformance.  I'll make sure to test this behavior along with others.
> One of my current complaints is that pg_ctl doesn't wait until it is
> actually ready to receive connections before returning an indication
> of success.

Maybe write the file as postmaster.ports.starting or some such and
rename it to its final name when recovery has finished?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #5120: Performance difference between running a query with named cursor and straight SELECT

2009-10-15 Thread Steven McLellan

The following bug has been logged online:

Bug reference:  5120
Logged by:  Steven McLellan
Email address:  smclel...@mintel.com
PostgreSQL version: 8.3.x
Operating system:   FreeBSD 6.2
Description:Performance difference between running a query with
named cursor and straight SELECT
Details: 

Hi,

I've found what appears to be a bug seriously affecting performance running
a particular query using a named cursor versus running it as a simple
SELECT. Running with SELECT, the query below takes around .5s on a
reasonably specced dev database (4 core, 3GB RAM) running postgresql 8.3.3
and is a little faster on a better specced production database running
8.3.5. Running through a named cursor with FETCH FORWARD it takes upwards of
10 minutes (about 6 on the production box). Both boxes run FreeBSD 6.2. I
can reproduce the performance problem using:
BEGIN;
DECLARE named_cursor CURSOR FOR <>;
FETCH ALL FROM named_cursor;

The prompt hangs at the FETCH ALL. The CPU remains pegged the whole
execution time, disks are inactive, and the query returns around 20 results.
Curiously, FETCH NEXT returns faster (in 3-5 seconds), but subsequent calls
to FETCH  take minutes. 

Below is a slightly anonymized version of the query (company policy).
Reducing the filters speeds things up, and removing joins or WHERE clauses
has a beneficial impact, but I can't narrow it down to any individual piece
of the query slowing things down, and in any case, I'm not able to change it
much (it's machine generated). EXPLAIN ANALYZE follows below.

Does anyone know of any reason why this particular query would take so much
longer using a named cursor than running it without? We don't know of any
other queries that exhibit this behaviour.

Any help greatly appreciated, and if any more information is required please
ask.

Steve McLellan

SELECT fact_table."Instance Id"
FROM
 instance_core_fact AS fact_table
 INNER JOIN record_dimension ON fact_table.record_key =
record_dimension.record_key
 INNER JOIN originator_dimension ON fact_table.originator_key =
originator_dimension.originator_key
 INNER JOIN date_dimension AS "date_dimension_Published Date" ON
record_dimension.published_date_key = "date_dimension_Published
Date".date_key
 INNER JOIN record_type_mv ON fact_table.record_key =
record_type_mv.record_key
WHERE
 record_dimension."Sector" = 'Sector I' AND
 record_dimension."Category" = 'Cat 1' AND
 record_dimension."Country" = 'USA' AND
 (record_dimension."Dataset - A" OR record_dimension."Dataset - C" OR
record_dimension."Dataset - B") AND
 originator_dimension."Area" IN('National', 'Phoenix, AZ') AND
 "date_dimension_Published Date"."SQL Date" >= NOW() - INTERVAL '6 MONTH'
AND
 record_type_mv."Type" IN('Cat 1 - yyy', 'Cat 1 - www', 'Cat 1 - zzz') AND
 originator_dimension."Originator Type" = 'NO DATA';

==
EXPLAIN ANALYZE

Hash Join  (cost=33364.36..46606.90 rows=1 width=4) (actual
time=795.690..833.577 rows=14 loops=1)
   Hash Cond: (fact_table.originator_key =
originator_dimension.originator_key)
   ->  Nested Loop  (cost=9.89..46577.20 rows=1392 width=8) (actual
time=792.676..832.203 rows=186 loops=1)
 ->  Nested Loop  (cost=9.89..40678.24 rows=48 width=8) (actual
time=792.653..830.017 rows=110 loops=1)
   ->  Hash Join  (cost=9.89..40641.84 rows=66 width=12)
(actual time=467.029..790.075 rows=3884 loops=1)
 Hash Cond: (record_type_mv.record_key =
record_dimension.record_key)
 ->  Seq Scan on record_type_mv  (cost=0.00..7076.20
rows=30012 width=4) (actual time=0.920..270.790 rows=31056 loops=1)
   Filter: (("Type")::text = ANY (('{"Cat 1 -
Medicare","Cat 1 - www","Cat 1 - zzz"}'::character varying[])::text[]))
 ->  Hash  (cost=33287.67..33287.67 rows=4178 width=8)
(actual time=447.819..447.819 rows=10203 loops=1)
   ->  Bitmap Heap Scan on record_dimension 
(cost=18214.01..33287.67 rows=4178 width=8) (actual time=371.277..428.278
rows=10203 loops=1)
 Recheck Cond: (("Category" = 'Cat 1'::text)
AND ("Sector" = 'Sector I'::text) AND ("Dataset - A" OR "Dataset - C" OR
"Dataset - B"))
 Filter: (("Dataset - A" OR "Dataset - C" OR
"Dataset - B") AND ("Country" = 'USA'::text))
 ->  BitmapAnd  (cost=18214.01..18214.01
rows=4623 width=0) (actual time=367.336..367.336 rows=0 loops=1)
   ->  Bitmap Index Scan on "Category" 
(cost=0.00..3091.75 rows=123623 width=0) (actual time=53.713..53.713
rows=124053 loops=1)
 Index Cond: ("Category" = 'Cat
1'::text)
   ->  Bitmap Index Scan on "Sector" 
(cost=0.00..7525.37 rows=327577 width=0) (actual time=129.610..129.610
rows=328054 loops=1)
 Index Cond: ("Sector" = 'Sector
I'::text)
   

Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Kevin Grittner
"Kevin Grittner"  wrote:
 
> I neglected that point in my recently proposed LSB conforming script
 
Hmmm...  On review, I see that I assumed that the -w switch on pg_ctl
start would cover this.  The problem, though, is that this uses psql
to connect to the specified port.  Besides the problems Tom mentioned
with its heuristics to find the right port number for this cluster,
there is the OP's point that connections will go to the competing
cluster.  One thought that occurs to me is that instead of, or in
addition to, the new file Tom proposes, the "other cluster" issue
could be solved by having a pg_postmaster_pid function in addition to
the pg_backend_pid function.  This would allow pg_ctl or a script to
connect to a port and see if it is the expected postmaster process.
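
For illustration, pg_backend_pid() exists today; the postmaster variant is
the hypothetical piece:

SELECT pg_backend_pid();        -- PID of the backend serving this connection
-- SELECT pg_postmaster_pid();  -- proposed: PID of the postmaster, which a
--                              -- script could compare with postmaster.pid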
 
-Kevin

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Tom Lane
"Kevin Grittner"  writes:
> Hmmm...  On review, I see that I assumed that the -w switch on pg_ctl
> start would cover this.  I see that the problem is that this uses psql
> to connect to the specified port.  Besides the problems Tom mentioned
> with its heuristics to find the right port number for this cluster,
> there is the OP's point that connections will go to the competing
> cluster.  One thought that occurs to me is that instead of, or in
> addition to, the new file Tom proposes, the "other cluster" issue
> could be solved by having a pg_postmaster_pid function in addition to
> the pg_backend_pid function.  This would allow pg_ctl or a script to
> connect to a port and see if it is the expected postmaster process.

I would rather see us implement the hypothetical pg_ping protocol
and remember to include the postmaster's PID in the response.  One
of the worst misfeatures of pg_ctl is the need to be able to
authenticate itself to the postmaster, and having it rely on being
able to actually issue a SQL command would set that breakage in stone.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Kevin Grittner
Tom Lane  wrote:
 
> I would rather see us implement the hypothetical pg_ping protocol
> and remember to include the postmaster's PID in the response.  One
> of the worst misfeatures of pg_ctl is the need to be able to
> authenticate itself to the postmaster, and having it rely on being
> able to actually issue a SQL command would set that breakage in
> stone.
 
Sounds good to me, other than that it stalls the pg_ctl revamp until pg_ping
is done.  I don't remember a clear design of what pg_ping should look
like.  Does anyone have a clear plan in their head?
 
-Kevin

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5120: Performance difference between running a query with named cursor and straight SELECT

2009-10-15 Thread Tom Lane
"Steven McLellan"  writes:
> I've found what appears to be a bug seriously affecting performance running
> a particular query using a named cursor versus running it as a simple
> SELECT.

You haven't shown us a plan for the cursor case, but I'm thinking the
issue here is that Postgres prefers fast-start plans for cursors, on
the theory that if you're using a cursor you probably care more about
incremental fetching than the total elapsed time.  As of 8.4 you can
twiddle the strength of that preference via cursor_tuple_fraction.
http://www.postgresql.org/docs/8.4/static/runtime-config-query.html#GUC-CURSOR-TUPLE-FRACTION
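
For example, something along these lines on 8.4 (SET LOCAL keeps the change
local to the transaction that holds the cursor):

BEGIN;
SET LOCAL cursor_tuple_fraction = 1.0;   -- default is 0.1 (prefer fast-start plans)
DECLARE named_cursor CURSOR FOR SELECT ... ;   -- the problem query
FETCH ALL FROM named_cursor;
COMMIT;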

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Tom Lane
I wrote:
> "Kevin Grittner"  writes:
>> ... This would allow pg_ctl or a script to
>> connect to a port and see if it is the expected postmaster process.

> I would rather see us implement the hypothetical pg_ping protocol
> and remember to include the postmaster's PID in the response.

Although on second thought, any such test is worth approximately nothing
anyway.  You can check that the postmaster answering the doorbell
reports the same PID that you see in $PGDATA/postmaster.pid, but that
still doesn't prove that that postmaster is using that data directory.
It could be a random coincidence of PIDs.  And in the case of a start
script, the probability of random PID match to a stale lockfile is many
orders of magnitude higher than you might think; see prior discussions.

This could be addressed by having the postmaster report its $PGDATA
value in the pg_ping response, but I would be against that on security
grounds.  We don't let nonprivileged users know where PGDATA is, why
would we make the information available without any authentication at
all?

[ thinks... ]  Maybe we could have the postmaster generate a random
number at start and include that in both the postmaster.ports file
and its pg_ping responses.  That would have a substantially lower
collision probability than PID, if the number generation process
were well designed; and it wouldn't risk exposing anything sensitive
in the ping response.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Kevin Grittner
Tom Lane  wrote:
 
> [ thinks... ]  Maybe we could have the postmaster generate a random
> number at start and include that in both the postmaster.ports file
> and its pg_ping responses.  That would have a substantially lower
> collision probability than PID, if the number generation process
> were well designed; and it wouldn't risk exposing anything sensitive
> in the ping response.
 
Unless two postmasters could open the same server socket within a
microsecond of one another, a timestamp value captured on opening the
server socket seems even better than a random number.  Well, I guess
if someone subverted the clock it could mislead, but is that really
more likely to cause a false match than a random number?
 
-Kevin

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Tom Lane
"Kevin Grittner"  writes:
> Sounds good to me, other than it stalls pg_ctl revamp until pg_ping is
> done.  I don't remember a clear design of what pg_ping should look
> like.  Does anyone have a clear plan in their head?

I don't think anyone's written down a full spec, but it seems like a
relatively trivial thing to me.

* Client connects to the usual place and sends a packet that has a
special "protocol number" (similar to the way we handle SSL requests).
AFAICS there wouldn't need to be anything else in the packet.

* Postmaster responds with a suitable message and closes the connection.
The message should at least include the current postmaster
CanAcceptConnections status and the PID/magic number we were just
discussing.  I can't think of anything else offhand --- anyone else?

I'm not sure whether we'd want to provide a function within libpq
for this, or just code it in pg_ctl.  Within libpq the natural
thing would be to take a conninfo connection string, but I'm not
sure that suits pg_ctl's purposes.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-10-15 Thread Tom Lane
Richard Neill  writes:
> The good news is that the patch has now been in place for 5 days, and, 
> despite some very high loading, it has survived without a single crash.
> I'd venture to say that this issue is now fixed.

Great, thanks for the followup.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Tom Lane
"Kevin Grittner"  writes:
> Tom Lane  wrote:
>> [ thinks... ]  Maybe we could have the postmaster generate a random
>> number at start and include that in both the postmaster.ports file
>> and its pg_ping responses.
 
> Unless two postmasters could open the same server socket within a
> microsecond of one another, a timestamp value captured on opening the
> server socket seems even better than a random number.

Well, that raises the question of whether postmaster uptime could be
considered security-sensitive info.  I'd still rather use a random
number.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Kevin Grittner
Tom Lane  wrote:
 
> I'm not sure whether we'd want to provide a function within libpq
> for this, or just code it in pg_ctl.
 
I'm inclined to think there would be value to a pg_ping utility to
support automated monitoring by unprivileged users on other boxes.
That both suggests libpq as the location, and one or two additional
pieces of information.  An indication of "in archive recovery" versus
production or shutdown, for example, might be useful.  I'm not sure
what else might make sense.
 
> Within libpq the natural thing would be to take a conninfo
> connection string, but I'm not sure that suits pg_ctl's purposes.
 
I'm a little lost on that.  Would it cause any problems for pg_ctl,
or just be more than it would need if it's only implemented there?
 
-Kevin

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Tom Lane
"Kevin Grittner"  writes:
> Tom Lane  wrote:
>> I'm not sure whether we'd want to provide a function within libpq
>> for this, or just code it in pg_ctl.
 
> I'm inclined to think there would be value to a pg_ping utility to
> support automated monitoring by unprivileged users on other boxes.

True.  I had first thought that pg_ctl itself could serve that purpose,
but it's really designed around the assumption that it has direct access
to $PGDATA, so it wouldn't fit well for monitoring from another machine.

> That both suggests libpq as the location, and one or two additional
> pieces of information.  An indication of "in archive recovery" versus
> production or shutdown, for example, might be useful.  I'm not sure
> what else might make sense.

IIRC, that's already covered by the CanAcceptConnections state.
We need to be pretty conservative about how much information we
expose here, anyhow, since it will be handed out to absolutely
anybody who can reach the postmaster port.

>> Within libpq the natural thing would be to take a conninfo
>> connection string, but I'm not sure that suits pg_ctl's purposes.
 
> I'm a little lost on that.  Would it cause any problems for pg_ctl,
> or just be more than it would need if it's only implemented there?

Well, given what we were saying about a postmaster.ports file, pg_ctl
would typically be working with an absolute path to the socket file.
Which is not what normally goes into a conninfo string.  Perhaps that
could be addressed by specifying the file contents differently, but
I'd be wary of assuming that *all* users of the ports file will be
libpq-based --- for instance a Java version of pg_ctl wouldn't be.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5120: Performance difference between running a query with named cursor and straight SELECT

2009-10-15 Thread Steve McLellan
>"Steven McLellan"  writes:
>> I've found what appears to be a bug seriously affecting performance
running
>> a particular query using a named cursor versus running it as a simple
>> SELECT.

> You haven't shown us a plan for the cursor case, but I'm thinking the
> issue here is that Postgres prefers fast-start plans for cursors, on
> the theory that if you're using a cursor you probably care more about
> incremental fetching than the total elapsed time.  As of 8.4 you can
> twiddle the strength of that preference via cursor_tuple_fraction.
>
http://www.postgresql.org/docs/8.4/static/runtime-config-query.html#GUC-CURSOR-TUPLE-FRACTION
>
>regards, tom lane

Thanks! Installing 8.4 and setting cursor_tuple_fraction to 1.0 does seem to
force it to execute in the same time as not using the cursor, and we'll
probably go with this solution (the only reason we're using cursors is to
avoid retrieving vast result sets through psycopg2/fetchmany). Your
explanation makes sense, and I'm curious to see why this particular query
ends up being so different, but I couldn't figure out how to run the explain
- the DECLARE syntax doesn't seem to allow it. Do I need to do it through
plpgsql?

Thanks again,

Steve McLellan


Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-10-15 Thread Richard Neill

Dear Tom,

Thanks for this, and sorry for not replying earlier. We finally obtained 
a window to deploy this patch on the real (rather busy!) production 
system as of last Saturday evening.


The good news is that the patch has now been in place for 5 days, and, 
despite some very high loading, it has survived without a single crash.


I'd venture to say that this issue is now fixed.

Best wishes,

Richard




Tom Lane wrote:

I wrote:

I'll get you a real fix as soon as I can, but might not be till
tomorrow.


The attached patch (against 8.4.x) fixes the problem as far as I can
tell.  Please test.

regards, tom lane





Index: src/backend/utils/cache/relcache.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
retrieving revision 1.287
diff -c -r1.287 relcache.c
*** src/backend/utils/cache/relcache.c  11 Jun 2009 14:49:05 -  1.287
--- src/backend/utils/cache/relcache.c  25 Sep 2009 17:32:02 -
***************
*** 1386,1392 ****
   *
   * The data we insert here is pretty incomplete/bogus, but it'll serve to
   * get us launched.  RelationCacheInitializePhase2() will read the real
!  * data from pg_class and replace what we've done here.
   */
  relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE);

--- 1386,1394 ----
   *
   * The data we insert here is pretty incomplete/bogus, but it'll serve to
   * get us launched.  RelationCacheInitializePhase2() will read the real
!  * data from pg_class and replace what we've done here.  Note in particular
!  * that relowner is left as zero; this cues RelationCacheInitializePhase2
!  * that the real data isn't there yet.
   */
  relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE);

***************
*** 2603,2619 ****
   * rows and replace the fake entries with them. Also, if any of the
   * relcache entries have rules or triggers, load that info the hard way
   * since it isn't recorded in the cache file.
   */
  hash_seq_init(&status, RelationIdCache);

  while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
  {
      Relation    relation = idhentry->reldesc;

      /*
       * If it's a faked-up entry, read the real pg_class tuple.
       */
!     if (needNewCacheFile && relation->rd_isnailed)
      {
          HeapTuple   htup;
          Form_pg_class relp;
--- 2605,2635 ----
   * rows and replace the fake entries with them. Also, if any of the
   * relcache entries have rules or triggers, load that info the hard way
   * since it isn't recorded in the cache file.
+  *
+  * Whenever we access the catalogs to read data, there is a possibility
+  * of a shared-inval cache flush causing relcache entries to be removed.
+  * Since hash_seq_search only guarantees to still work after the *current*
+  * entry is removed, it's unsafe to continue the hashtable scan afterward.
+  * We handle this by restarting the scan from scratch after each access.
+  * This is theoretically O(N^2), but the number of entries that actually
+  * need to be fixed is small enough that it doesn't matter.
   */
  hash_seq_init(&status, RelationIdCache);

  while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
  {
      Relation    relation = idhentry->reldesc;
+     bool        restart = false;
+
+     /*
+      * Make sure *this* entry doesn't get flushed while we work with it.
+      */
+     RelationIncrementReferenceCount(relation);

      /*
       * If it's a faked-up entry, read the real pg_class tuple.
       */
!     if (relation->rd_rel->relowner == InvalidOid)
      {
          HeapTuple   htup;
          Form_pg_class relp;
***************
*** 2630,2636 ****
           * Copy tuple to relation->rd_rel. (See notes in
           * AllocateRelationDesc())
           */
-         Assert(relation->rd_rel != NULL);
          memcpy((char *) relation->rd_rel, (char *) relp, CLASS_TUPLE_SIZE);

          /* Update rd_options while we have the tuple */
--- 2646,2651 ----
***************
*** 2639,2660 ****
          RelationParseRelOptions(relation, htup);

          /*
!          * Also update the derived fields in rd_att.
           */
!         relation->rd_att->tdtypeid = relp->reltype;
!         relation->rd_att->tdtypmod = -1;  /* unnecessary, but... */
!         rel

Re: [BUGS] BUG #5120: Performance difference between running a query with named cursor and straight SELECT

2009-10-15 Thread Tom Lane
Steve McLellan  writes:
> Thanks! Installing 8.4 and setting cursor_tuple_fraction to 1.0 does seem to
> force it to execute in the same time as not using the cursor, and we'll
> probably go with this solution (the only reason we're using cursors is to
> avoid retrieving vast result sets through psycopg2/fetchmany). Your
> explanation makes sense, and I'm curious to see why this particular query
> ends up being so different, but I couldn't figure out how to run the explain
> - the DECLARE syntax doesn't seem to allow it.

"EXPLAIN [ANALYZE] DECLARE x CURSOR ..." works for me.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #5121: Segmentation Fault when using pam w/ krb5

2009-10-15 Thread Ryan Douglas

The following bug has been logged online:

Bug reference:  5121
Logged by:  Ryan Douglas
Email address:  rdoug...@arbinet.com
PostgreSQL version: 8.4.1
Operating system:   Fedora 11
Description:Segmentation Fault when using pam w/ krb5
Details: 

Whenever I use psql to remotely connect to the database the server crashes
(see log below). If I use psql with the '-W' option then it's fine. 

I also tested with pam_tacplus.so and in both cases the db didn't crash. It
just complained about not having credentials to authenticate when the -W
option is not used.

I can reproduce at will so let me know if you need more information.




- pam configuration

auth   sufficient   pam_krb5.so no_user_check
accountrequired pam_permit.so
sessionrequired pam_permit.so



 postgresql log -with krb5 configured in pam --

<[unkno...@[unknown]  2009-10-15 16:21:11.939 EDT>LOG:  connection received: host=10.0.20.38 port=42662
LOG:  could not receive data from client: Connection reset by peer
<@  2009-10-15 16:21:11.987 EDT>LOG:  server process (PID 16978) was terminated by signal 11: Segmentation fault
<@  2009-10-15 16:21:11.987 EDT>LOG:  terminating any other active server processes
<@  2009-10-15 16:21:11.989 EDT>LOG:  all server processes terminated; reinitializing
<@  2009-10-15 16:21:12.109 EDT>LOG:  database system was interrupted; last known up at 2009-10-15 16:21:07 EDT
<@  2009-10-15 16:21:12.109 EDT>LOG:  database system was not properly shut down; automatic recovery in progress
<@  2009-10-15 16:21:12.110 EDT>LOG:  record with zero length at 3/B7C396B8
<@  2009-10-15 16:21:12.110 EDT>LOG:  redo is not required
<@  2009-10-15 16:21:12.137 EDT>LOG:  database system is ready to accept connections
<@  2009-10-15 16:21:12.137 EDT>LOG:  autovacuum launcher started



 postgresql log -with tacplus configured in pam --

<[unkno...@[unknown]  2009-10-15 16:41:01.544 EDT>LOG:  connection received: host=10.0.20.38 port=58894
LOG:  could not receive data from client: Connection reset by peer
LOG:  pam_authenticate failed: Insufficient credentials to access authentication data
FATAL:  PAM authentication failed for user "rdouglas"
<[unkno...@[unknown]  2009-10-15 16:41:05.298 EDT>LOG:  connection received: host=10.0.20.38 port=58895
LOG:  connection authorized: user=rdouglas database=tacacs



 /var/log/messages 

Oct 15 16:21:07 va-mp-db02 kernel: postgres[16971]: segfault at 0 ip
00559624 sp 7fff43dbe180 error 4 in postgres[40+439000]
Oct 15 16:21:11 va-mp-db02 kernel: postgres[16978]: segfault at 0 ip
00559624 sp 7fff43dbe180 error 4 in postgres[40+439000]
-

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5120: Performance difference between running a query with named cursor and straight SELECT

2009-10-15 Thread Steve McLellan
> Steve McLellan  writes:
>> Thanks! Installing 8.4 and setting cursor_tuple_fraction to 1.0 does seem to
>> force it to execute in the same time as not using the cursor, and we'll
>> probably go with this solution (the only reason we're using cursors is to
>> avoid retrieving vast result sets through psycopg2/fetchmany). Your
>> explanation makes sense, and I'm curious to see why this particular query
>> ends up being so different, but I couldn't figure out how to run the explain
>> - the DECLARE syntax doesn't seem to allow it.
>
> "EXPLAIN [ANALYZE] DECLARE x CURSOR ..." works for me.
>
>   regards, tom lane

Sorry, I should've tried that. The two explains are below; without cursor
then with cursor. I don't know enough to reliably say whether there's
anything wrong with either (we use default_statistics_target=100 although
the estimates don't look right for some of the query fragments), but they're
certainly very different.



WITHOUT CURSOR:

Hash Join  (cost=33364.36..46606.90 rows=1 width=4) (actual
time=795.690..833.577 rows=14 loops=1)
   Hash Cond: (fact_table.originator_key =
originator_dimension.originator_key)
   ->  Nested Loop  (cost=9.89..46577.20 rows=1392 width=8) (actual
time=792.676..832.203 rows=186 loops=1)
 ->  Nested Loop  (cost=9.89..40678.24 rows=48 width=8) (actual
time=792.653..830.017 rows=110 loops=1)
   ->  Hash Join  (cost=9.89..40641.84 rows=66 width=12)
(actual time=467.029..790.075 rows=3884 loops=1)
 Hash Cond: (record_type_mv.record_key =
record_dimension.record_key)
 ->  Seq Scan on record_type_mv  (cost=0.00..7076.20
rows=30012 width=4) (actual time=0.920..270.790 rows=31056 loops=1)
   Filter: (("Type")::text = ANY (('{"Cat 1 -
yyy","Cat 1 - www","Cat 1 - zzz"}'::character varying[])::text[]))
 ->  Hash  (cost=33287.67..33287.67 rows=4178 width=8)
(actual time=447.819..447.819 rows=10203 loops=1)
   ->  Bitmap Heap Scan on record_dimension
 (cost=18214.01..33287.67 rows=4178 width=8) (actual time=371.277..428.278
rows=10203 loops=1)
 Recheck Cond: (("Category" = 'Cat 1'::text)
AND ("Sector" = 'Sector I'::text) AND ("Dataset - A" OR "Dataset - C" OR
"Dataset - B"))
 Filter: (("Dataset - A" OR "Dataset - C" OR
"Dataset - B") AND ("Country" = 'USA'::text))
 ->  BitmapAnd  (cost=18214.01..18214.01
rows=4623 width=0) (actual time=367.336..367.336 rows=0 loops=1)
   ->  Bitmap Index Scan on "Category"
 (cost=0.00..3091.75 rows=123623 width=0) (actual time=53.713..53.713
rows=124053 loops=1)
 Index Cond: ("Category" = 'Cat
1'::text)
   ->  Bitmap Index Scan on "Sector"
 (cost=0.00..7525.37 rows=327577 width=0) (actual time=129.610..129.610
rows=328054 loops=1)
 Index Cond: ("Sector" = 'Sector
I'::text)
   ->  BitmapOr  (cost=7594.30..7594.30
rows=410371 width=0) (actual time=128.983..128.983 rows=0 loops=1)
 ->  Bitmap Index Scan on
"Dataset - A"  (cost=0.00..550.30 rows=29579 width=0) (actual
time=11.393..11.393 rows=30016 loops=1)
   Index Cond: ("Dataset -
A" = true)
 ->  Bitmap Index Scan on
"Dataset - C"  (cost=0.00..6981.18 rows=377696 width=0) (actual
time=116.306..116.306 rows=380788 loops=1)
   Index Cond: ("Dataset -
C" = true)
 ->  Bitmap Index Scan on
"Dataset - B"  (cost=0.00..59.69 rows=3097 width=0) (actual
time=1.272..1.272 rows=3116 loops=1)
   Index Cond: ("Dataset -
B" = true)
   ->  Index Scan using date_dimension_pkey on date_dimension
"date_dimension_Published Date"  (cost=0.00..0.54 rows=1 width=4) (actual
time=0.007..0.007 rows=0 loops=3884)
 Index Cond: ("date_dimension_Published Date".date_key =
record_dimension.published_date_key)
 Filter: ("date_dimension_Published Date"."SQL Date" >=
(now() - '6 mons'::interval))
 ->  Index Scan using record_date on instance_core_fact fact_table
 (cost=0.00..122.48 rows=33 width=12) (actual time=0.007..0.011 rows=2
loops=110)
   Index Cond: (fact_table.record_key =
record_dimension.record_key)
   ->  Hash  (cost=24.45..24.45 rows=1 width=4) (actual time=1.004..1.004
rows=169 loops=1)
 ->  Bitmap Heap Scan on originator_dimension  (cost=20.44..24.45
rows=1 width=4) (actual time=0.313..0.662 rows=169 loops=1)
   Recheck Cond: (("Area" = ANY ('{National,"Phoenix,
AZ"}'::text[])) AND ("Originator Type" = 'NO DATA'::text))
   

Re: [BUGS] BUG #5121: Segmentation Fault when using pam w/ krb5

2009-10-15 Thread Tom Lane
"Ryan Douglas"  writes:
> Whenever I use psql to remotely connect to the database the server crashes
> (see log below). If I use psql with the '-W' option then it's fine. 

What this looks like at first glance is a bug in the PAM module you're
using, since Postgres really has no idea which PAM configuration is
being invoked.  Can you get a stack trace to narrow down where the sig11
is happening?  (You will probably need to insert "ulimit -c unlimited"
into the .bash_profile for Postgres to get the postmaster into a context
where it will produce a core dump.)

Also, is this your own build of Postgres, or if not whose?

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5121: Segmentation Fault when using pam w/ krb5

2009-10-15 Thread Heikki Linnakangas
Ryan Douglas wrote:
> The following bug has been logged online:
> 
> Bug reference:  5121
> Logged by:  Ryan Douglas
> Email address:  rdoug...@arbinet.com
> PostgreSQL version: 8.4.1
> Operating system:   Fedora 11
> Description:Segmentation Fault when using pam w/ krb5
> Details: 
> 
> Whenever I use psql to remotely connect to the database the server crashes
> (see log below). If I use psql with the '-W' option then it's fine. 
> 
> I also tested with pam_tacplus.so and in both cases the db didn't crash. It
> just complained about not having credentials to authenticate when the -W
> option is not used.
> 
> I can reproduce at will so let me know if you need more information.

Can you get a stack trace with gdb? Something along the lines of:

ulimit -c unlimited
(start postmaster)
(reproduce the crash)
gdb /usr/bin/postgres $PGDATA/core
bt

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5120: Performance difference between running a query with named cursor and straight SELECT

2009-10-15 Thread Tom Lane
Steve McLellan  writes:
> Sorry, I should've tried that. The two explains are below; without cursor
> then with cursor. I don't know enough to reliably say whether there's
> anything wrong with either (we use default_statistics_target=100 although
> the estimates don't look right for some of the query fragments), but they're
> certainly very different.

Right, so the cursor plan is one that is estimated to take 3x longer,
but can deliver rows starting right away, whereas the other plan will have
a long delay and then spit everything out in a bunch.  That wouldn't be a bad
tradeoff, if the estimate were right.  The killer mistake is here:

>->  Index Scan using "Originator Type" on originator_dimension
>  (cost=0.00..125.29 rows=1 width=4) (actual time=26.067..41.906 rows=169
> loops=1)
>  Index Cond: ("Originator Type" = 'NO DATA'::text)
>  Filter: ("Area" = ANY ('{National,"Phoenix, AZ"}'::text[]))

Since that's on the outside of a nestloop, the 169x rowcount error
translates directly to a 169x error in the total runtime estimate ---
and it looks like that's not very far at all from the real ratio,
so the other estimates are pretty good.

I'm not sure whether you'd get any real improvement from a further
increase in stats target, though that's certainly an easy thing to
try.  What this looks like to me is the longstanding problem of having
correlations between the distributions of two different columns
(here, Area and Originator Type), which the planner doesn't know about.
So its estimate of the combined probability of meeting the conditions
is off.  You could try checking estimated vs real rowcount for the
area and originator conditions separately to see if that's the case
or not.  If either one is off by itself, then a stats increase for that
column would help; if not, there's not much to be done except lean on
the cursor_tuple_fraction knob ...
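
Something along these lines would show it (table and column names taken from
the posted query):

-- compare estimated vs actual rows for each condition on its own:
EXPLAIN ANALYZE SELECT count(*) FROM originator_dimension
  WHERE "Originator Type" = 'NO DATA';
EXPLAIN ANALYZE SELECT count(*) FROM originator_dimension
  WHERE "Area" IN ('National', 'Phoenix, AZ');

-- if one of them is badly off by itself, raise that column's target and re-analyze:
ALTER TABLE originator_dimension ALTER COLUMN "Area" SET STATISTICS 500;
ANALYZE originator_dimension;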

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5120: Performance difference between running a query with named cursor and straight SELECT

2009-10-15 Thread Steve McLellan
>Since that's on the outside of a nestloop, the 169x rowcount error
>translates directly to a 169x error in the total runtime estimate ---
>and it looks like that's not very far at all from the real ratio,
>so the other estimates are pretty good.

You're correct - the query's exercising a very small subset of that table.
The table has 400,000 rows, of which only 400 have Originator Type = 'NO
DATA', and ALL the rows with those two Areas have Originator Type = 'NO
DATA'. As you say, increasing the stats target may not make much difference
because of the size of this sample when compared to the overall data set (in
fact the production servers have it set to 700 and it makes little
real-world difference), so we'll go with your initial suggestion and upgrade
to 8.4. Thanks again; this has been a good set of lessons to learn.

Steve


Re: [BUGS] BUG #5121: Segmentation Fault when using pam w/ krb5

2009-10-15 Thread Tom Lane
"Douglas, Ryan"  writes:
> (gdb) bt
> #0  0x00559624 in pam_passwd_conv_proc ()
> #1  0x7f738dfeedd8 in _pam_krb5_conv_call (pamh=, 
> messages=0xb51780, n_prompts=0, responses=0x7fff2e356668) at conv.c:99
> #2  0x7f738dfefb38 in _pam_krb5_generic_prompter (context= optimized out>, data=0x7fff2e357fe0, name=, 
> banner=, num_prompts=1,
> prompts=, suppress_password_prompts=1) at 
> prompter.c:330
> #3  0x7f738dfefe10 in _pam_krb5_normal_prompter (context=0x0, 
> data=0xb51890, name=0x7fff2e356668 "", banner=0x79df27 "", num_prompts=0, 
> prompts=0x101010101010101)
> at prompter.c:409

Okay, so the dump is definitely way down inside libpam.  The next question
is whether it's really libpam's fault, or are we passing it some bad
data.  Please install the pam-debuginfo RPM that corresponds to the
pam version you have, and retry --- hopefully that will add a bit more
information to the stack trace.

(The debuginfo-install hint you got might help, though I'm not sure why
it's referencing audit-libs and not pam ... maybe this code isn't
actually in libpam?)

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5118: start-status-insert-fatal

2009-10-15 Thread Pedro Gimeno

Tom Lane wrote:


This could be addressed by having the postmaster report its $PGDATA
value in the pg_ping response, but I would be against that on security
grounds.  We don't let nonprivileged users know where PGDATA is, why
would we make the information available without any authentication at
all?


Maybe a hash of it?

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] Re: BUG #5065: pg_ctl start fails as administrator, with "could not locate matching postgres executable"

2009-10-15 Thread Jesse Morris
On Sep 18, 7:31 pm, jmor...@coverity.com ("Jesse Morris") wrote:
> The following bug has been logged online:
>
> Bug reference:      5065
> Logged by:          Jesse Morris
> Email address:      jmor...@coverity.com
> PostgreSQL version: 8.3.7, 8.4.1
> Operating system:   Windows Server 2003 R2
> Description:        pg_ctl start fails as administrator, with "could not
> locate matching postgres executable"
> Details:
>
> I am logged in as domain\jmorris, a member of the local Administrators
> group.
>
...
>
> From cmd.exe:
> initdb.exe works fine.
> pg_ctl start complains "FATAL: postgres - could not locate matching postgres
> executable"
>
> Instrumentation and investigation reveals that the failure is in
> find_other_exec (exec.c) as the error text implies, but ultimately in
> pipe_read_line; CreatePipe fails with error 5 (Access Denied).  
...

I went back to the version that supposedly initially fixed this issue,
but I couldn't get it to work either.  So I think the DACL adjustment
code was always broken.  The DACL stuff that both Cygwin and Active
Perl use to simulate *nix file permissions masks this error, so any
test framework that uses them would get false negatives on this bug.
Since these DACLs are inheritable, a workaround is to run pg_ctl as a
child process of Active Perl or Cygwin.   The comments indicated
pg_ctl & initdb were already trying to do the same thing themselves
(that is, add the current user to the DACLs) but it didn't actually
work on any of the systems I tried it on.

I think that a number of other people have seen this bug; search for
"FATAL: postgres - could not locate matching postgres executable."
But that message is so misleading that it is probably why nobody seems to
have properly diagnosed it as a permissions issue before.  I didn't do
anything to fix pg_ctl's error reporting.  :D

The patch:

--begin patch--

diff -rup unfixed/postgresql-8.4.1/src/bin/initdb/initdb.c fixed/
postgresql-8.4.1/src/bin/initdb/initdb.c
--- unfixed/postgresql-8.4.1/src/bin/initdb/initdb.c  2009-06-11 07:49:07.0 -0700
+++ fixed/postgresql-8.4.1/src/bin/initdb/initdb.c  2009-10-15 16:31:12.651226900 -0700
@@ -2392,6 +2392,10 @@ CreateRestrictedProcess(char *cmd, PROCE
fprintf(stderr, "Failed to create restricted token: %lu\n",
GetLastError());
return 0;
}
+
+#ifndef __CYGWIN__
+AddUserToTokenDacl(restrictedToken);
+#endif

if (!CreateProcessAsUser(restrictedToken,
 NULL,
@@ -2409,11 +2413,7 @@ CreateRestrictedProcess(char *cmd, PROCE
fprintf(stderr, "CreateProcessAsUser failed: %lu\n", 
GetLastError
());
return 0;
}
-
-#ifndef __CYGWIN__
-   AddUserToDacl(processInfo->hProcess);
-#endif
-
+
return ResumeThread(processInfo->hThread);
 }
 #endif
Only in fixed/postgresql-8.4.1/src/bin/initdb: initdb.c.bak
diff -rup unfixed/postgresql-8.4.1/src/bin/pg_ctl/pg_ctl.c fixed/
postgresql-8.4.1/src/bin/pg_ctl/pg_ctl.c
--- unfixed/postgresql-8.4.1/src/bin/pg_ctl/pg_ctl.c  2009-09-01 19:40:59.0 -0700
+++ fixed/postgresql-8.4.1/src/bin/pg_ctl/pg_ctl.c  2009-10-15 16:31:00.096971600 -0700
@@ -1389,7 +1389,10 @@ CreateRestrictedProcess(char *cmd, PROCE
write_stderr("Failed to create restricted token: %lu\n",
GetLastError());
return 0;
}
-
+#ifndef __CYGWIN__
+   AddUserToTokenDacl(restrictedToken);
+#endif
+
r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL,
TRUE, CREATE_SUSPENDED, NULL, NULL, &si, processInfo);

Kernel32Handle = LoadLibrary("KERNEL32.DLL");
@@ -1488,9 +1491,6 @@ CreateRestrictedProcess(char *cmd, PROCE
}
}

-#ifndef __CYGWIN__
-   AddUserToDacl(processInfo->hProcess);
-#endif

CloseHandle(restrictedToken);

Only in fixed/postgresql-8.4.1/src/bin/pg_ctl: pg_ctl.c.bak
diff -rup unfixed/postgresql-8.4.1/src/include/port.h fixed/
postgresql-8.4.1/src/include/port.h
--- unfixed/postgresql-8.4.1/src/include/port.h  2009-06-11 07:49:08.0 -0700
+++ fixed/postgresql-8.4.1/src/include/port.h  2009-10-15 14:02:36.860635900 -0700
@@ -81,7 +81,7 @@ extern int find_other_exec(const char *a

 /* Windows security token manipulation (in exec.c) */
 #ifdef WIN32
-extern BOOL AddUserToDacl(HANDLE hProcess);
+extern BOOL AddUserToTokenDacl(HANDLE hToken);
 #endif


diff -rup unfixed/postgresql-8.4.1/src/port/exec.c fixed/
postgresql-8.4.1/src/port/exec.c
--- unfixed/postgresql-8.4.1/src/port/exec.c  2009-06-11 07:49:15.0 -0700
+++ fixed/postgresql-8.4.1/src/port/exec.c  2009-10-15 16:02:04.352805300 -0700
@@ -664,11 +664,10 @@ set_pglocale_pgservice(const char *argv0
 #ifdef WIN32

 /*
- * AddUserToDacl(HANDLE hProcess)
+ * AddUserToTokenDacl(HANDLE hToken)
  *
- * This function adds the current user account to the default DACL
- * which gets attached to the restricted token used when we create
- * a 

Re: [BUGS] BUG #5121: Segmentation Fault when using pam w/ krb5

2009-10-15 Thread Douglas, Ryan
Tom/ Heikki ,

   This is a custom build. I used  "./configure --with-pam --with-perl 
--with-python --enable-thread-safety --with-openssl --with-krb5".


Gdb output below...  


Core was generated by `postgres: rdouglas tacacs 10.0'. 
 
Program terminated with signal 11, Segmentation fault.
#0  0x00559624 in pam_passwd_conv_proc ()
Missing separate debuginfos, use: debuginfo-install 
audit-libs-1.7.13-1.fc11.x86_64
(gdb) bt
#0  0x00559624 in pam_passwd_conv_proc ()
#1  0x7f738dfeedd8 in _pam_krb5_conv_call (pamh=, 
messages=0xb51780, n_prompts=0, responses=0x7fff2e356668) at conv.c:99
#2  0x7f738dfefb38 in _pam_krb5_generic_prompter (context=, data=0x7fff2e357fe0, name=, banner=, num_prompts=1,
prompts=, suppress_password_prompts=1) at 
prompter.c:330
#3  0x7f738dfefe10 in _pam_krb5_normal_prompter (context=0x0, 
data=0xb51890, name=0x7fff2e356668 "", banner=0x79df27 "", num_prompts=0, 
prompts=0x101010101010101)
at prompter.c:409
#4  0x0031d3660bce in krb5_get_as_key_password (context=0xb4e710, 
client=, etype=23, prompter=, 
prompter_data=,
salt=0x7fff2e356f00, params=0x7fff2e356ef0, as_key=0x7fff2e356ec0, 
gak_data=0x7fff2e357120) at gic_pwd.c:61
#5  0x0031d3667713 in pa_enc_timestamp (context=0xb4e710, request=, in_padata=, out_padata=0x7fff2e356d30, 
salt=,
s2kparams=, etype=0x7fff2e356f4c, 
as_key=0x7fff2e356ec0, prompter=0x7f738dfefe00 <_pam_krb5_normal_prompter>, 
prompter_data=0x7fff2e357fe0,
gak_fct=0x31d36609f0 , gak_data=0x7fff2e357120) 
at preauth2.c:635
#6  0x0031d3667e0c in krb5_do_preauth (context=, 
request=0x7fff2e356e40, encoded_request_body=,
encoded_previous_request=, in_padata=0xb51060, 
out_padata=, salt=0x7fff2e356f00, 
s2kparams=0x7fff2e356ef0, etype=0x7fff2e356f4c,
as_key=0x7fff2e356ec0, prompter=0x7f738dfefe00 <_pam_krb5_normal_prompter>, 
prompter_data=0x7fff2e357fe0, gak_fct=0x31d36609f0 ,
gak_data=0x7fff2e357120, get_data_rock=0x7fff2e356ee0, opte=0xb4ec50) at 
preauth2.c:1586
#7  0x0031d365f251 in krb5_get_init_creds (context=0xb4e710, creds=, client=, prompter=,
prompter_data=, start_time=, 
in_tkt_service=0x7fff2e358050 "krbtgt/thexchange@thexchange.com", 
options=0xb4ec50,
gak_fct=0x31d36609f0 , gak_data=0x7fff2e357120, 
use_master=0x7fff2e35715c, as_reply=0x7fff2e357150) at get_in_tkt.c:1106
#8  0x0031d3660f18 in krb5_get_init_creds_password (context=0xb4e710, 
creds=, client=, password=,
prompter=0x7f738dfefe00 <_pam_krb5_normal_prompter>, data=, start_time=0, in_tkt_service=0x7fff2e358050 
"krbtgt/thexchange@thexchange.com",
options=0xb4ec50) at gic_pwd.c:139
#9  0x7f738dff5571 in v5_get_creds (ctx=0xb4e710, pamh=, creds=, user=, 
userinfo=0xb4efe0, options=0xb4ecb0,
service=0x7f738dff9bf8 "krbtgt", password=0x0, gic_options=0xb4ec50, 
prompter=0x7f738dfefe00 <_pam_krb5_normal_prompter>, result=0xb505c4) at 
v5.c:1014
#10 0x7f738dfeb3cf in pam_sm_authenticate (pamh=0xb5f460, flags=0, 
argc=, argv=) at auth.c:423
#11 0x0031d0202c1e in _pam_dispatch_aux (use_cached_chain=, resumed=, h=, flags=,
pamh=) at pam_dispatch.c:110
#12 _pam_dispatch (use_cached_chain=, resumed=, h=, flags=, 
pamh=)
at pam_dispatch.c:407
#13 0x0031d0202500 in pam_authenticate (pamh=0xb5f460, flags=0) at 
pam_auth.c:34
#14 0x005598ed in CheckPAMAuth.clone.0 ()
#15 0x00559b96 in ClientAuthentication ()
#16 0x005b25dc in BackendInitialize ()
#17 0x005b2ebc in ServerLoop ()
#18 0x005b559c in PostmasterMain ()
#19 0x005617d0 in main ()

-Original Message-
From: Heikki Linnakangas [mailto:heikki.linnakan...@enterprisedb.com] 
Sent: Thursday, October 15, 2009 5:23 PM
To: Douglas, Ryan
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #5121: Segmentation Fault when using pam w/ krb5

Ryan Douglas wrote:
> The following bug has been logged online:
> 
> Bug reference:  5121
> Logged by:  Ryan Douglas
> Email address:  rdoug...@arbinet.com
> PostgreSQL version: 8.4.1
> Operating system:   Fedora 11
> Description:Segmentation Fault when using pam w/ krb5
> Details: 
> 
> Whenever I use psql to remotely connect to the database the server crashes
> (see log below). If I use psql with the '-W' option then it's fine. 
> 
> I also tested with pam_tacplus.so and in both cases the db didn't crash. It
> just complained about not having credentials to authenticate when the -W
> option is not used.
> 
> I can reproduce at will so let me know if you need more information.

Can you get a stack trace with gdb? Something along the lines of:

ulimit -c unlimited
(start postmaster)
(reproduce the crash)
gdb /usr/bin/postgres $PGDATA/core
bt

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #5121: Segmentation Fault when using pam w/ krb5

2009-10-15 Thread Douglas, Ryan
When I initially ran gdb I got the following

.
.
Program terminated with signal 11, Segmentation fault.
#0  0x00559624 in pam_passwd_conv_proc ()
Missing separate debuginfos, use: debuginfo-install
audit-libs-1.7.13-1.fc11.x86_64 e2fsprogs-libs-1.41.4-12.fc11.x86_64
glibc-2.10.1-5.x86_64 keyutils-libs-1.2-5.fc11.x86_64
krb5-libs-1.6.3-20.fc11.x86_64 libselinux-2.0.80-1.fc11.x86_64
nss-softokn-freebl-3.12.4-3.fc11.x86_64 openssl-0.9.8k-5.fc11.x86_64
pam-1.0.91-6.fc11.x86_64 pam_krb5-2.3.5-1.fc11.x86_64
zlib-1.2.3-22.fc11.x86_64
(gdb) bt
#0  0x00559624 in pam_passwd_conv_proc ()
#1  0x7f738dfeedd8 in ?? () from /lib64/security/pam_krb5.so
.
.


I ran debuginfo-install, which successfully installed all the related
packages, including pam, except for audit-libs due to a missing
dependency, probably because of some package I didn't install.

For the sake of completeness, I'll install 8.4.1 on another machine
which has all the deps met and try to reproduce the problem. 



-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Thursday, October 15, 2009 6:07 PM
To: Douglas, Ryan
Cc: pgsql-bugs@postgreSQL.org
Subject: Re: [BUGS] BUG #5121: Segmentation Fault when using pam w/ krb5


"Douglas, Ryan"  writes:
> (gdb) bt
> #0  0x00559624 in pam_passwd_conv_proc ()
> #1  0x7f738dfeedd8 in _pam_krb5_conv_call (pamh=, messages=0xb51780, n_prompts=0, responses=0x7fff2e356668) at
conv.c:99
> #2  0x7f738dfefb38 in _pam_krb5_generic_prompter (context=, data=0x7fff2e357fe0, name=,
banner=, num_prompts=1,
> prompts=, suppress_password_prompts=1) at
prompter.c:330
> #3  0x7f738dfefe10 in _pam_krb5_normal_prompter (context=0x0,
data=0xb51890, name=0x7fff2e356668 "", banner=0x79df27 "",
num_prompts=0, prompts=0x101010101010101)
> at prompter.c:409

Okay, so the dump is definitely way down inside libpam.  The next question
is whether it's really libpam's fault, or are we passing it some bad
data.  Please install the pam-debuginfo RPM that corresponds to the
pam version you have, and retry --- hopefully that will add a bit more
information to the stack trace.

(The debuginfo-install hint you got might help, though I'm not sure why
it's referencing audit-libs and not pam ... maybe this code isn't
actually in libpam?)

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] issue with integer nullable column and value 0

2009-10-15 Thread Sean Hsien
Found the bug...was my fault, of course.

Sorry for the mistake.

Sean

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs