[HACKERS] obtaining row locking information

2005-08-07 Thread Tatsuo Ishii
Hi,

With help from Bruce, I wrote a small function which returns row
locking information (see attached file if you are interested). Here is
a sample result:

test=# select * from pgrowlocks('t1');
 locked_row | lock_type | locker | multi
------------+-----------+--------+-------
 (0,1)      | Shared    |      1 | t
 (0,3)      | Exclusive |    575 | f
(2 rows)

I think it would be more useful if actual xids were shown in the case
where "locker" is a multixid. It seems GetMultiXactIdMembers() does the
job. Unfortunately, that is a static function. Is there any
chance GetMultiXactIdMembers() could become a public function?
--
Tatsuo Ishii
/*
 * $PostgreSQL$
 *
 * Copyright (c) 2005   Tatsuo Ishii
 *
 * Permission to use, copy, modify, and distribute this software and
 * its documentation for any purpose, without fee, and without a
 * written agreement is hereby granted, provided that the above
 * copyright notice and this paragraph and the following two
 * paragraphs appear in all copies.
 *
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
 * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
 * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
 * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
 * OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
 * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
 * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 */

#include "postgres.h"

#include "funcapi.h"
#include "access/heapam.h"
#include "access/transam.h"
#include "catalog/namespace.h"
#include "catalog/pg_type.h"
#include "utils/builtins.h"


PG_FUNCTION_INFO_V1(pgrowlocks);

extern Datum pgrowlocks(PG_FUNCTION_ARGS);

/* --
 * pgrowlocks:
 * returns tids of rows being locked
 *
 * C FUNCTION definition
 * pgrowlocks(text) returns set of pgrowlocks_type
 * see pgrowlocks.sql for pgrowlocks_type
 * --
 */

#define DUMMY_TUPLE "public.pgrowlocks_type"
#define NCHARS 32

/*
 * define this if makeRangeVarFromNameList() has two arguments. As far
 * as I know, this only happens in 8.0.x.
 */
#undef MAKERANGEVARFROMNAMELIST_HAS_TWO_ARGS

typedef struct {
    HeapScanDesc scan;
    int ncolumns;
} MyData;

Datum
pgrowlocks(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;
    HeapScanDesc scan;
    HeapTuple   tuple;
    TupleDesc   tupdesc;
    AttInMetadata *attinmeta;
    Datum       result;
    MyData     *mydata;
    Relation    rel;

    if (SRF_IS_FIRSTCALL())
    {
        text       *relname;
        RangeVar   *relrv;
        MemoryContext oldcontext;

        funcctx = SRF_FIRSTCALL_INIT();
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        tupdesc = RelationNameGetTupleDesc(DUMMY_TUPLE);
        attinmeta = TupleDescGetAttInMetadata(tupdesc);
        funcctx->attinmeta = attinmeta;

        relname = PG_GETARG_TEXT_P(0);
#ifdef MAKERANGEVARFROMNAMELIST_HAS_TWO_ARGS
        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname,
                                                                 "pgrowlocks"));
#else
        relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
#endif
        rel = heap_openrv(relrv, AccessShareLock);
        scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
        mydata = palloc(sizeof(*mydata));
        mydata->scan = scan;
        mydata->ncolumns = tupdesc->natts;
        funcctx->user_fctx = mydata;

        MemoryContextSwitchTo(oldcontext);
    }

    funcctx = SRF_PERCALL_SETUP();
    attinmeta = funcctx->attinmeta;
    mydata = (MyData *) funcctx->user_fctx;
    scan = mydata->scan;

    /* scan the relation */
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        /* must hold a buffer lock to call HeapTupleSatisfiesUpdate */
        LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);

        if (HeapTupleSatisfiesUpdate(tuple->t_data,
                                     GetCurrentCommandId(), scan->rs_cbuf)
            == HeapTupleBeingUpdated)
        {
            char      **values;
            int         i;

            values = (char **) palloc(mydata->ncolumns * sizeof(char *));

            i = 0;
            values[i++] = (char *) DirectFunctionCall1(tidout,
                                        PointerGetDatum(&tuple->t_self));

#ifdef HEAP_XMAX_SHARED_LOCK
            if (tuple->t_data->t_infomask & HEAP_XMAX_SHARED_LOCK)
                values[i++] = pstrdup("Shared");
  

Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Andrew Dunstan



Tom Lane wrote:
> "Rocco Altier" <[EMAIL PROTECTED]> writes:
>> It looks like when we changed regress/GNUmakefile to pull rules from
>> Makefile.shlib, cygwin got broken in the process.
>> ...
>> I don't know enough about the rest of the way the cygwin port is put
>> together, but it seems that the other platforms all have
>> shlib=lib$(NAME)...
>
> Seems to me that defining shlib that way for Cygwin too would be a
> reasonable answer, but I'm not sure if there will be any side-effects.
> Can someone try it?

The attached patch worked for me. The second part should not be applied 
- I simply include it to illustrate the hack (taken from a recent clue 
on the Cygwin mailing list) that I found necessary to get around 
brokenness on the latest release of Cygwin. The good news is that they 
do seem to be trying to find out what broke and fix it.


cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Andrew Dunstan


er that would be this patch.

Andrew Dunstan wrote:
> Tom Lane wrote:
>> "Rocco Altier" <[EMAIL PROTECTED]> writes:
>>> It looks like when we changed regress/GNUmakefile to pull rules from
>>> Makefile.shlib, cygwin got broken in the process.
>>> ...
>>> I don't know enough about the rest of the way the cygwin port is put
>>> together, but it seems that the other platforms all have
>>> shlib=lib$(NAME)...
>>
>> Seems to me that defining shlib that way for Cygwin too would be a
>> reasonable answer, but I'm not sure if there will be any side-effects.
>> Can someone try it?
>
> The attached patch worked for me. The second part should not be
> applied - I simply include it to illustrate the hack (taken from a
> recent clue on the Cygwin mailing list) that I found necessary to get
> around brokenness on the latest release of Cygwin. The good news is
> that they do seem to be trying to find out what broke and fix it.

Index: src/Makefile.shlib
===================================================================
RCS file: /projects/cvsroot/pgsql/src/Makefile.shlib,v
retrieving revision 1.95
diff -c -r1.95 Makefile.shlib
*** src/Makefile.shlib  13 Jul 2005 17:00:44 -0000  1.95
--- src/Makefile.shlib  7 Aug 2005 13:21:58 -0000
***************
*** 234,240 ****
  endif
  
  ifeq ($(PORTNAME), cygwin)
!   shlib         = $(NAME)$(DLSUFFIX)
    # needed for /contrib modules, not sure why
    SHLIB_LINK    += $(LIBS)
    haslibarule   = yes
--- 234,240 ----
  endif
  
  ifeq ($(PORTNAME), cygwin)
!   shlib         = lib$(NAME)$(DLSUFFIX)
    # needed for /contrib modules, not sure why
    SHLIB_LINK    += $(LIBS)
    haslibarule   = yes
Index: src/backend/storage/file/fd.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/file/fd.c,v
retrieving revision 1.118
diff -c -r1.118 fd.c
*** src/backend/storage/file/fd.c   4 Jul 2005 04:51:48 -0000   1.118
--- src/backend/storage/file/fd.c   7 Aug 2005 13:22:00 -0000
***************
*** 327,332 ****
--- 327,334 ----
                  elog(WARNING, "dup(0) failed after %d successes: %m", used);
              break;
          }
+         if (used >= 250)
+             break;
  
          if (used >= size)
          {



Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> ... The second part should not be 
> applied - I simply include it to illustrate the hack (taken from a 
> recent clue on the Cygwin mailing list) that I found necessary to get 
> around brokenness on the latest release of Cygwin. The good news is 
> that they do seem to be trying to find out what broke and fix it.

You mean this?

> *** src/backend/storage/file/fd.c 4 Jul 2005 04:51:48 -0000   1.118
> --- src/backend/storage/file/fd.c 7 Aug 2005 13:22:00 -0000
> ***************
> *** 327,332 ****
> --- 327,334 ----
>                   elog(WARNING, "dup(0) failed after %d successes: %m", used);
>               break;
>           }
> +         if (used >= 250)
> +             break;
>   
>           if (used >= size)
>           {

Looking at that code, I wonder why we don't make the loop stop at
max_files_per_process opened files --- the useful result will be
bounded by that anyhow.  Actively running the system out of FDs,
even momentarily, doesn't seem like a friendly thing to do.

This wouldn't directly solve your problem unless you reduced the
default value of max_files_per_process, but at least that would
be something reasonable to do instead of hacking the code.

regards, tom lane
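
A minimal standalone sketch of the bounded probe suggested above, assuming POSIX dup() and close(). The name count_usable_fds_bounded and its limit parameter are hypothetical; limit merely stands in for max_files_per_process, and this is not the code in fd.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Count how many file descriptors can be opened, but never probe past
 * `limit`.  Hypothetical helper for illustration; `limit` plays the role
 * of max_files_per_process.
 */
int
count_usable_fds_bounded(int limit)
{
    int *fds;
    int  used = 0;
    int  i;

    assert(limit > 0);

    fds = malloc((size_t) limit * sizeof(int));
    if (fds == NULL)
        return 0;

    /* dup until failure OR until the caller's bound is reached */
    while (used < limit)
    {
        int thisfd = dup(0);

        if (thisfd < 0)
            break;          /* the kernel's own limit was hit first */
        fds[used++] = thisfd;
    }

    /* release the probe descriptors so nothing stays open */
    for (i = 0; i < used; i++)
        close(fds[i]);
    free(fds);

    return used;
}
```

With a bound in place the probe opens at most limit descriptors, so it never drives the whole system out of FDs just to learn a number that would be capped anyway.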



Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Seems to me that defining shlib that way for Cygwin too would be a
>>> reasonable answer, but I'm not sure if there will be any side-effects.
>>> Can someone try it?
>> 
>> The attached patch worked for me.
  
>   ifeq ($(PORTNAME), cygwin)
> !   shlib = $(NAME)$(DLSUFFIX)
> # needed for /contrib modules, not sure why
> SHLIB_LINK+= $(LIBS)
> haslibarule   = yes
> --- 234,240 
>   endif
  
>   ifeq ($(PORTNAME), cygwin)
> !   shlib = lib$(NAME)$(DLSUFFIX)
> # needed for /contrib modules, not sure why
> SHLIB_LINK+= $(LIBS)
> haslibarule   = yes

Couple thoughts here --- one, someone upthread suggested
"cyg$(NAME)$(DLSUFFIX)" as the proper value for shlib.  I didn't
see why at first, but now it occurs to me that it might avoid name
collisions with Windows-native builds, which use the "lib" prefix.
I'm not sure if DLLs for Cygwin and native builds would ever go
into the same directory though.  Is this worth worrying about?

Second, in view of Rocco's recent fixes for AIX, I wonder whether that
hack addition to SHLIB_LINK is still needed?

regards, tom lane



[HACKERS] Race condition in backend process exit

2005-08-07 Thread Tom Lane
I can fairly consistently crash CVS tip with the following:

Session 1:

regression=# begin;
BEGIN
regression=# select * from int4_tbl where f1 = 123456 for update;
   f1   
--------
 123456
(1 row)

Session 2:

regression=# begin;
BEGIN
regression=# select * from int4_tbl where f1 = 123456 for update;
<< blocks >>

Session 1:

regression=# \q

Session 2 now crashes:

TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "lmgr.c", Line: 464)
LOG:  server process (PID 2337) was terminated by signal 6

with this backtrace:

#4  0x3000c4 in ExceptionalCondition (
    conditionName=0xc4b3c "!(((xid) != ((TransactionId) 0)))",
    errorType=0xc4a14 "FailedAssertion", fileName=0xc49c8 "lmgr.c",
    lineNumber=464) at assert.c:51
#5  0x265058 in XactLockTableWait (xid=0) at lmgr.c:464
#6  0x105730 in heap_lock_tuple (relation=0x7b03c451, tuple=0x7b03bdd0,
    buffer=0x7b03bdec, cid=2063845554, mode=LockTupleExclusive, nowait=0)
    at heapam.c:2076
#7  0x1c5880 in ExecutePlan (estate=0x4010ab50, planstate=0xc0064b68,
    operation=CMD_SELECT, numberTuples=-1073329308,
    direction=ForwardScanDirection, dest=0x40106708) at execMain.c:1192

Apparently, session 1's locks are being released while it still shows as
an active transaction in the PGPROC array, causing XactLockTableWait to
suppose it was a subtransaction and look for the parent.  This indicates
something is being done incompletely or in the wrong order during
backend exit, because AbortTransaction is perfectly clear that you mark
yourself not running before you release your locks.  Haven't found it
yet.

I could not provoke the same crash in 8.0, but I suspect this may just
be a chance timing difference, and that the bug may be of long standing.

regards, tom lane



Re: [HACKERS] obtaining row locking information

2005-08-07 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> With help from Bruce, I wrote a small function which returns row
> locking information (see attached file if you are interested).

Scanning the whole table seems a bit slow :-(

There is another possibility: in CVS tip, anyone who is actually blocked
on a row lock will be holding a tuple lock that shows exactly what they
are waiting for.  For example:

Session 1:

regression=# begin;
BEGIN
regression=# select * from int4_tbl where f1 = 123456 for update;
   f1   
--------
 123456
(1 row)

Session 2:

<< same as above, leaving session 2 blocked >>

Session 1:

regression=# select * from pg_locks;
   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid  |      mode       | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+-----------------+---------
 transactionid |          |          |      |       |         14575 |         |       |          |       14576 | 2501 | ShareLock       | f
 tuple         |    48344 |    48369 |    0 |     2 |               |         |       |          |       14576 | 2501 | ExclusiveLock   | t
 relation      |    48344 |    48369 |      |       |               |         |       |          |       14576 | 2501 | AccessShareLock | t
 relation      |    48344 |    48369 |      |       |               |         |       |          |       14576 | 2501 | RowShareLock    | t
 transactionid |          |          |      |       |         14576 |         |       |          |       14576 | 2501 | ExclusiveLock   | t
 relation      |    48344 |    10339 |      |       |               |         |       |          |       14575 | 2503 | AccessShareLock | t
 relation      |    48344 |    48369 |      |       |               |         |       |          |       14575 | 2503 | AccessShareLock | t
 relation      |    48344 |    48369 |      |       |               |         |       |          |       14575 | 2503 | RowShareLock    | t
 transactionid |          |          |      |       |         14575 |         |       |          |       14575 | 2503 | ExclusiveLock   | t
(9 rows)

Session 2 (XID 14576) is blocked on session 1 (XID 14575) according to
the first row of this output.  The second row shows the exact tuple
that it is after.

This isn't an amazingly user-friendly way of displaying things, of
course, but maybe somebody could make a function that would show it
better using pg_locks as input.

> I think it would be more useful if actual xids were shown in the case
> where "locker" is a multixid. It seems GetMultiXactIdMembers() does the
> job. Unfortunately, that is a static function. Is there any
> chance GetMultiXactIdMembers() could become a public function?

No particular objection here.

regards, tom lane
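
As a rough illustration of the "who is blocked on whom" reading of pg_locks described above, here is a small sketch that works on an in-memory copy of the rows. The LockRow struct and blocking_xid() are hypothetical names invented for this example; only the three columns needed for the lookup are mirrored, and 0 is assumed to mean "not a transactionid lock".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Trimmed-down, hypothetical mirror of one pg_locks row. */
typedef struct
{
    unsigned int transactionid; /* xid being locked; 0 if not an xid lock */
    unsigned int transaction;   /* xid of the session holding or awaiting */
    bool         granted;       /* false means the session is waiting */
} LockRow;

/*
 * Return the xid that `waiter` is blocked on, or 0 if `waiter` has no
 * ungranted transactionid lock in the given rows.
 */
unsigned int
blocking_xid(const LockRow *rows, size_t nrows, unsigned int waiter)
{
    size_t i;

    assert(rows != NULL || nrows == 0);

    for (i = 0; i < nrows; i++)
    {
        if (rows[i].transaction == waiter &&
            rows[i].transactionid != 0 &&
            !rows[i].granted)
            return rows[i].transactionid;
    }
    return 0;
}
```

Fed the transactionid rows from the output above, blocking_xid() would report 14575 for waiter 14576, matching the first row of the listing.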



Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Marko Kreen
On Sun, Aug 07, 2005 at 12:08:28PM -0400, Tom Lane wrote:
> Couple thoughts here --- one, someone upthread suggested
> "cyg$(NAME)$(DLSUFFIX)" as the proper value for shlib.  I didn't
> see why at first, but now it occurs to me that it might avoid name
> collisions with Windows-native builds, which use the "lib" prefix.
> I'm not sure if DLLs for Cygwin and native builds would ever go
> into the same directory though.  Is this worth worrying about?

.exe's in different directories than .dll's but all in PATH.

-- 
marko




[HACKERS] psql and ROLES

2005-08-07 Thread Stefan Kaltenbrunner
Hi,

I'm currently working on syncing psql's tab-complete code with the docs,
especially wrt ROLES. While working on this I noticed the following things:

*) there is no backslash command for getting a list of roles (like \du &
\dg for users and groups) - I'm considering using \dr for that - does
that sound sensible?

*) the new connection limit code allows negative limits (besides -1),
like this:

playground=# CREATE ROLE testrole LOGIN CONNECTION LIMIT -9;
CREATE ROLE

that doesn't strike me as very useful (and it is not clear what it
should mean anyway, because such a user can still log in) - so maybe we
should reject that (and set a sensible upper bound for it too)


Stefan



Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Andrew Dunstan



Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> ... The second part should not be
>> applied - I simply include it to illustrate the hack (taken from a
>> recent clue on the Cygwin mailing list) that I found necessary to get
>> around brokenness on the latest release of Cygwin. The good news is
>> that they do seem to be trying to find out what broke and fix it.
>
> You mean this?
>
>> *** src/backend/storage/file/fd.c   4 Jul 2005 04:51:48 -0000   1.118
>> --- src/backend/storage/file/fd.c   7 Aug 2005 13:22:00 -0000
>> ***************
>> *** 327,332 ****
>> --- 327,334 ----
>>                   elog(WARNING, "dup(0) failed after %d successes: %m", used);
>>               break;
>>           }
>> +         if (used >= 250)
>> +             break;
>>   
>>           if (used >= size)
>>           {
>
> Looking at that code, I wonder why we don't make the loop stop at
> max_files_per_process opened files --- the useful result will be
> bounded by that anyhow.  Actively running the system out of FDs,
> even momentarily, doesn't seem like a friendly thing to do.
>
> This wouldn't directly solve your problem unless you reduced the
> default value of max_files_per_process, but at least that would
> be something reasonable to do instead of hacking the code.

Turns out that works as is on Cygwin - no adjustment necessary, at least 
for me. 250 was just a number I plucked out of the air to get me around 
the crashing problem. I just ran successfully with the attached patch. 
Given the problems the Cygwin people are having with the stable branch 
from just this piece of code, I think this or something similar should 
be applied to the 8.0 branch as well as HEAD.


cheers

andrew


Index: src/backend/storage/file/fd.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/file/fd.c,v
retrieving revision 1.118
diff -c -r1.118 fd.c
*** src/backend/storage/file/fd.c	4 Jul 2005 04:51:48 -0000	1.118
--- src/backend/storage/file/fd.c	7 Aug 2005 17:00:10 -0000
***************
*** 315,321 ****
  	fd = (int *) palloc(size * sizeof(int));
  
  	/* dup until failure ... */
! 	for (;;)
  	{
  		int			thisfd;
  
--- 315,321 ----
  	fd = (int *) palloc(size * sizeof(int));
  
  	/* dup until failure ... */
! 	for ( ; used <= max_files_per_process ; )
  	{
  		int			thisfd;
  



Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Andrew Dunstan



Marko Kreen wrote:
> On Sun, Aug 07, 2005 at 12:08:28PM -0400, Tom Lane wrote:
>> Couple thoughts here --- one, someone upthread suggested
>> "cyg$(NAME)$(DLSUFFIX)" as the proper value for shlib.  I didn't
>> see why at first, but now it occurs to me that it might avoid name
>> collisions with Windows-native builds, which use the "lib" prefix.
>> I'm not sure if DLLs for Cygwin and native builds would ever go
>> into the same directory though.  Is this worth worrying about?
>
> .exe's in different directories than .dll's but all in PATH.

Especially DLLs in the system directory. Anyway, I see no point *not* to 
observe the platform's convention.


I just tested it and make check worked fine.

cheers

andrew







Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Looking at that code, I wonder why we don't make the loop stop at
>> max_files_per_process opened files --- the useful result will be
>> bounded by that anyhow.  Actively running the system out of FDs,
>> even momentarily, doesn't seem like a friendly thing to do.

> Turns out that works as is on Cygwin - no adjustment necessary, at least 
> for me. 250 was just a number I plucked out of the air to get me around 
> the crashing problem. I just ran successfully with the attached patch. 
> Given the problems the Cygwin people are having with the stable branch 
> from just this piece of code, I think this or something similar should 
> be applied to the 8.0 branch as well as HEAD.

I back-patched 7.4 as well, which is the oldest branch that has this
code.  The Cygwin people still need to fix their bug, since it's
entirely possible to run the system out of FDs after we're up and
running ... but it's surely a waste of cycles to do it deliberately
during postmaster startup.

regards, tom lane



Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Marko Kreen wrote:
>> On Sun, Aug 07, 2005 at 12:08:28PM -0400, Tom Lane wrote:
>>> Couple thoughts here --- one, someone upthread suggested
>>> "cyg$(NAME)$(DLSUFFIX)" as the proper value for shlib.
>> 
>> .exe's in different directories than .dll's but all in PATH.

> Especially DLLs in the system directory. Anyway, I see no point *not* to 
> observe the platform's convention.

> I just tested it and make check worked fine.

OK, applied with the "cyg" prefix.

When you get a chance, would you see if the SHLIB_LINK += $(LIBS)
bit is still needed?

regards, tom lane



Re: [HACKERS] psql and ROLES

2005-08-07 Thread Tom Lane
Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> *) there is no backslash command for getting a list of Roles (like \du &
> \dg for Users and Groups) - I'm considering using \dr for that - does
> that sound sensible ?

We could just recycle \du and/or \dg for the purpose.  If those should
still exist as separate commands, what should they do differently from
\dr?  There's no longer any hard-and-fast distinction ...

> *) the new connectionlimit code allows for negative Limits (beside -1)

Right now, any negative value is interpreted as "no limit".  I don't
feel a pressing need to change that.

regards, tom lane
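
As an illustration only, the rule described above (any negative limit is read as "no limit") might be sketched like this. The function connection_allowed() and its parameters are hypothetical and not the backend's actual check; treating a limit of 0 as "no logins allowed" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: may one more session log in, given the role's
 * connection limit and the number of sessions it already has?
 */
bool
connection_allowed(int limit, int current_connections)
{
    assert(current_connections >= 0);

    /* -1, -9, or any other negative value all mean "no limit" */
    if (limit < 0)
        return true;

    /* assumption: a limit of N permits at most N concurrent sessions */
    return current_connections < limit;
}
```

Under this reading, CONNECTION LIMIT -9 behaves exactly like -1, which is why such a role can still log in.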



Re: [HACKERS] Race condition in backend process exit

2005-08-07 Thread Tom Lane
I wrote:
> I can fairly consistently crash CVS tip with the following:
> ...
> Apparently, session 1's locks are being released while it still shows as
> an active transaction in the PGPROC array, causing XactLockTableWait to
> suppose it was a subtransaction and look for the parent.  This indicates
> something is being done incompletely or in the wrong order during
> backend exit, because AbortTransaction is perfectly clear that you mark
> yourself not running before you release your locks.  Haven't found it
> yet.

It looks to me like the problem is that ShutdownPostgres tries to get
away with doing just a subset of AbortTransaction; in particular it
does nothing to mark the PGPROC as not running a transaction anymore.
So when ProcKill releases locks, the xact is still InProgress.

I'm thinking that the correct fix is to forget the notion that it's
safer to do a subset of AbortTransaction than to do the whole thing.
We should make ShutdownPostgres do this:

AbortOutOfAnyTransaction();

/* Drop user-level locks, which are not dropped by xact abort */
#ifdef USER_LOCKS
LockReleaseAll(USER_LOCKMETHOD, true);
#endif

and then remove the lock manager cleanup operations from ProcKill.

> I could not provoke the same crash in 8.0, but I suspect this may just
> be a chance timing difference, and that the bug may be of long standing.

I haven't done the experiment, but I'm pretty certain that it's possible
to provoke this same crash in 8.0 if the timing is right, which could be
forced by using gdb to delay execution at the right place in ProcKill.
In pre-8.0 releases XactLockTableWait doesn't try to chain up to parent
transactions, so the particular crash doesn't exist, but we still have
the problem that the exiting backend releases locks while its xact still
appears to be running.  That's been incorrect according to the comments
in xact.c since forever, so I would imagine that there are other race
conditions in which this is a Bad Thing.

This bug may well explain the known reports of failures from SIGTERM'ing
an individual backend, since (IIRC) that code path could also try to
exit the backend with a transaction still in progress.

I'm a bit hesitant to back-patch such a nontrivial and hard-to-test
change, but it sure looks badly broken to me.  Any thoughts about the
risks involved?

regards, tom lane



Re: [HACKERS] Race condition in backend process exit

2005-08-07 Thread Alvaro Herrera
On Sun, Aug 07, 2005 at 03:45:10PM -0400, Tom Lane wrote:

> I'm thinking that the correct fix is to forget the notion that it's
> safer to do a subset of AbortTransaction than to do the whole thing.
> We should make ShutdownPostgres do this:
> 
>   AbortOutOfAnyTransaction();
> 
>   /* Drop user-level locks, which are not dropped by xact abort */
> #ifdef USER_LOCKS
>   LockReleaseAll(USER_LOCKMETHOD, true);
> #endif
> 
> and then remove the lock manager cleanup operations from ProcKill.

I agree it's cleaner.  It'd be comforting however if any cleanup
procedure would severely report when it finds inconsistent state (Most
of xact.c throws at least a WARNING, IIRC).  That way we'd know about
bogus conditions quickly.  OTOH, that code is in much better shape than
it was when ShutdownPostgres was last heavily modified, which AFAICS was
around revision 1.82.

> I'm a bit hesitant to back-patch such a nontrivial and hard-to-test
> change, but it sure looks badly broken to me.  Any thoughts about the
> risks involved?

How far back?  I'd do it in 8.0 but not in earlier releases.  The
transaction management code changed a lot in between, and I think a lot
of bugs were corrected.

-- 
Alvaro Herrera
Jason Tesser: You might not have understood me or I am not understanding you.
Paul Thomas: It feels like we're 2 people divided by a common language...



[HACKERS] shrinking the postgresql.conf

2005-08-07 Thread Joshua D. Drake

Hello,

As I have been laboring over the documentation of the postgresql.conf 
file for 8.1dev it seems that it may be useful to rip out most of the 
options in this file?


Considering many of the options can already be altered using SET why
not make it the default for many of them?

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org



Re: [HACKERS] Race condition in backend process exit

2005-08-07 Thread Tom Lane
I wrote:
>> I could not provoke the same crash in 8.0, but I suspect this may just
>> be a chance timing difference, and that the bug may be of long standing.

> I haven't done the experiment, but I'm pretty certain that it's possible
> to provoke this same crash in 8.0 if the timing is right, which could be
> forced by using gdb to delay execution at the right place in ProcKill.

Having done the experiment, I can now say that 8.0 and prior versions
are *not* vulnerable, but the reason is, um, subtle.  The actual
execution order of on_shmem_exit callbacks in an exiting backend is

ShutdownPostgres
CleanupInvalidationState
ProcKill

CleanupInvalidationState removes the backend from the SI invalidation
message ring.  Until I recently refactored the code to separate the
PGPROC array from the SI mechanism, that had the side effect of making
the backend's PGPROC disappear from the set visible to
TransactionIdIsInProgress.  Which means that in fact the released
versions do honor the rule "stop being in-progress before you release
locks".

This behavior is obviously mighty fragile, not to say undocumented,
so I'm still strongly inclined to make ShutdownPostgres do a normal
transaction abort sequence.  But I'm no longer very excited about
back-patching it.

> This bug may well explain the known reports of failures from SIGTERM'ing
> an individual backend, since (IIRC) that code path could also try to
> exit the backend with a transaction still in progress.

The particular issue exhibited here evidently isn't the explanation
for SIGTERM problems in existing releases ... but I still suspect that
those reports might have something to do with ShutdownPostgres taking
shortcuts with transaction abort.

regards, tom lane
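
The ordering argument above can be illustrated with a toy model. Everything in this sketch is hypothetical (ToyBackend, the step_* functions, exit_breaks_rule); it does not reproduce any backend code and only encodes the rule "stop being in-progress before you release locks" so that different exit sequences can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an exiting backend; all names are hypothetical. */
typedef struct
{
    bool visible_in_progress;   /* do other backends still see our xact? */
    bool holds_locks;
    bool rule_violated;         /* locks dropped while still in progress */
} ToyBackend;

typedef void (*ExitStep) (ToyBackend *);

/* Partial cleanup: leaves the transaction looking in-progress. */
void
step_partial_abort(ToyBackend *b)
{
    (void) b;
}

/* Full abort: mark not running first, then release locks. */
void
step_full_abort(ToyBackend *b)
{
    b->visible_in_progress = false;
    b->holds_locks = false;
}

/* Side effect of the old SI cleanup: the backend stops being visible. */
void
step_hide_from_proc_array(ToyBackend *b)
{
    b->visible_in_progress = false;
}

/* Final lock release; records a violation if we still look in-progress. */
void
step_release_locks(ToyBackend *b)
{
    if (b->holds_locks && b->visible_in_progress)
        b->rule_violated = true;
    b->holds_locks = false;
}

/* Run the steps in order; report whether the rule was broken. */
bool
exit_breaks_rule(const ExitStep *steps, int nsteps)
{
    ToyBackend b = {true, true, false};
    int        i;

    assert(steps != NULL && nsteps >= 0);

    for (i = 0; i < nsteps; i++)
        steps[i](&b);
    return b.rule_violated;
}
```

In this model the sequence partial abort, hide, release honors the rule only by accident of the middle step; drop that step and the violation appears, while a full abort up front is safe regardless of what follows.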



Re: [HACKERS] shrinking the postgresql.conf

2005-08-07 Thread Tom Lane
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> As I have been laboring over the documentation of the postgresql.conf 
> file for 8.1dev it seems that it may be useful to rip out most of the 
> options in this file?

What?  The contents of postgresql.conf *are* documentation.

regards, tom lane



Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Andrew Dunstan



Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> Marko Kreen wrote:
>>> On Sun, Aug 07, 2005 at 12:08:28PM -0400, Tom Lane wrote:
>>>> Couple thoughts here --- one, someone upthread suggested
>>>> "cyg$(NAME)$(DLSUFFIX)" as the proper value for shlib.
>>>
>>> .exe's in different directories than .dll's but all in PATH.
>>
>> Especially DLLs in the system directory. Anyway, I see no point *not* to
>> observe the platform's convention.
>>
>> I just tested it and make check worked fine.
>
> OK, applied with the "cyg" prefix.
>
> When you get a chance, would you see if the SHLIB_LINK += $(LIBS)
> bit is still needed?

I commented it out of the Cygwin stanza and all seemed fine - contrib 
built and passed installcheck quite happily.


cheers

andrew



Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Jason Tishler
On Sun, Aug 07, 2005 at 02:51:12PM -0400, Tom Lane wrote:
> I back-patched 7.4 as well, which is the oldest branch that has this
> code.  The Cygwin people still need to fix their bug, since it's
> entirely possible to run the system out of FDs after we're up and
> running ... but it's surely a waste of cycles to do it deliberately
> during postmaster startup.

AFAICT, this should be fixed in Cygwin CVS:

http://cygwin.com/ml/cygwin/2005-08/msg00249.html

Jason

-- 
PGP/GPG Key: http://www.tishler.net/jason/pubkey.asc or key servers
Fingerprint: 7A73 1405 7F2B E669 C19D  8784 1AFD E4CC ECF4 8EF6



Re: [HACKERS] Cygwin - make check broken

2005-08-07 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> When you get a chance, would you see if the SHLIB_LINK += $(LIBS)
>> bit is still needed?

> I commented it out of the Cygwin stanza and all seemed fine - contrib 
> built and passed installcheck quite happily.

Great ... one less platform-specific kluge is always a good thing.

regards, tom lane
