[BUGS] deadlocks in postgresql 7.2.1

2003-07-28 Thread Philipp Reisner
Hi,

We use postgresql 7.2.1 (actually the Debian binary packet version 
7.2.1-2woody2)
Our database load has more QUERIES than INSERT/DELETES. But we have a 
sever problem with the wirting operations.

Once in a while (about 3 times a day) one or more INSERTS/DELETES simply
go into the "waiting" state, and block the whole database. The only way
out is to terminate the client connection (i.e. to abort the blocked
INSERT/DELETE query)

Further investigation with ps -e -o wchan... showed that the backed
process was simply sleeping in "semop".

Output of ps:

   762 ?S  0:00 /usr/lib/postgresql/bin/postmaster
   764 ?S  0:00 postgres: stats buffer process
   765 ?S  0:00 postgres: stats collector process
24872 ?S  0:00 postgres: sd sd 10.2.2.6 idle in transaction
24873 ?R 68:01 postgres: sd sd 10.2.2.6 SELECT
24932 ?S  3:09 postgres: sd sd 10.2.2.6 idle in transaction
24943 ?R  3:02 postgres: sd sd 10.2.2.6 SELECT
25004 ?S  0:01 postgres: sd sd 10.2.1.5 idle in transaction
21226 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21228 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21229 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21230 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21231 ?S  0:01 postgres: sd sd 10.2.1.5 idle in transaction
21232 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21227 ?S  0:01 postgres: sd sd 10.2.1.5 INSERT waiting
21391 pts/1S  0:00 grep postg

   762 ?S  0:00 /usr/lib/postgresql/bin/postmaster
   764 ?S  0:00 postgres: stats buffer process
   765 ?S  0:00 postgres: stats collector process
24872 ?S  0:00 postgres: sd sd 10.2.2.6 idle in transaction
24873 ?S 68:02 postgres: sd sd 10.2.2.6 idle in transaction
24932 ?S  3:09 postgres: sd sd 10.2.2.6 idle in transaction
24943 ?S  3:04 postgres: sd sd 10.2.2.6 idle in transaction
25004 ?S  0:01 postgres: sd sd 10.2.1.5 idle in transaction
21226 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21228 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21229 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21230 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21231 ?S  0:01 postgres: sd sd 10.2.1.5 idle in transaction
21232 ?S  0:00 postgres: sd sd 10.2.1.5 idle in transaction
21227 ?S  0:01 postgres: sd sd 10.2.1.5 INSERT waiting

   762 ?S  0:00 /usr/lib/postgresql/bin/postmaster
   764 ?S  0:00 postgres: stats buffer process
   765 ?S  0:00 postgres: stats collector process
10291 ?S  0:05 postgres: sd sd 10.2.1.5 idle in transaction
  4303 ?S  0:00 postgres: sd sd 10.2.2.6 idle in transaction
  4304 ?S 40:43 postgres: sd sd 10.2.2.6 idle in transaction
  4370 ?S  2:54 postgres: sd sd 10.2.2.6 idle in transaction
  4384 ?S  3:12 postgres: sd sd 10.2.2.6 idle in transaction
19961 ?S  4:32 postgres: sd sd 10.2.1.5 idle in transaction
19962 ?S  4:40 postgres: sd sd 10.2.1.5 idle in transaction
19963 ?S  4:24 postgres: sd sd 10.2.1.5 DELETE waiting
19964 ?S  4:58 postgres: sd sd 10.2.1.5 idle in transaction
19965 ?S  4:18 postgres: sd sd 10.2.1.5 idle in transaction
19966 ?S  3:49 postgres: sd sd 10.2.1.5 idle in transaction
  5574 ?S  2:50 postgres: sd sd 10.2.1.5 idle in transaction
  5575 ?S  2:54 postgres: sd sd 10.2.1.5 idle in transaction
  8744 pts/0S  0:00 /usr/lib/postgresql/bin/psql -d sd
  8751 ?S  0:00 postgres: sd sd [local] idle
16606 ?S  0:03 postgres: sd sd 10.2.1.5 idle in transaction


To get rid of the problem we tried to upgrade to 7.3.3. But see next
mail for our experiences with 7.3.3.

-Philipp
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[BUGS] Postgresql 7.3.3 crashing on query

2003-07-28 Thread Philipp Reisner
character varying(1) NOT NULL,
hasreservedquantity character varying(1) NOT NULL,
cancreate character varying(1) NOT NULL,
mustcalcpurchaseprice character varying(1) NOT NULL,
allownegativequantity character varying(1) NOT NULL
);

-- Name: bpartners_tmp; Type: TABLE; Schema: public; Owner: sd
--

CREATE TABLE bpartners_tmp (
objid integer,
id integer,
shortname character varying(12),
name character varying(50),
street character varying(50),
city character varying(50),
zip character varying(50),
tel character varying(50),
fax character varying(50),
email character varying(50),
"location" character varying(50),
memo character varying(2000),
sdcallflag character varying(1),
sdcallpingflag character varying(1),
sdcallmobileflag character varying(1),
sdinventoryflag character varying(1),
sdreportflag character varying(1),
sdbenchmarkflag character varying(1),
cou_objid integer,
tzo_objid integer,
logofilename character varying(50),
additionalmenulinks character varying(2000),
showonlyallowedmenuentries character varying(1),
duns character varying(50),
accountcode character varying(50),
bpa_objid integer
);

-- Name: documentheaders_tmp; Type: TABLE; Schema: public; Owner: sd
--

CREATE TABLE documentheaders_tmp (
objid integer,
id integer,
dot_objid integer,
documentdate date,
edittime timestamp with time zone,
edittimeutc timestamp with time zone,
tzo_objid integer,
edi_objid integer,
documentid character varying(50),
cal_id integer,
from_objid integer,
to_objid integer
);

-- Name: locations; Type: TABLE; Schema: public; Owner: sd
--

CREATE TABLE locations (
objid integer NOT NULL,
id integer NOT NULL,
shortname character varying(12) NOT NULL,
name character varying(50),
zip character varying(50),
city character varying(50),
street character varying(50),
cou_objid integer,
tel character varying(50),
fax character varying(50),
modemtelnr character varying(50),
description character varying(2000),
sco_objid integer,
contactfirstname character varying(50),
contactlastname character varying(50),
contactsalutation character varying(50),
contacttitle character varying(50),
moopenfromam time without time zone,
moopentoam time without time zone,
moopenfrompm time without time zone,
moopentopm time without time zone,
tuopenfromam time without time zone,
tuopentoam time without time zone,
tuopenfrompm time without time zone,
tuopentopm time without time zone,
weopenfromam time without time zone,
weopentoam time without time zone,
weopenfrompm time without time zone,
weopentopm time without time zone,
thopenfromam time without time zone,
thopentoam time without time zone,
thopenfrompm time without time zone,
thopentopm time without time zone,
fropenfromam time without time zone,
fropentoam time without time zone,
fropenfrompm time without time zone,
fropentopm time without time zone,
saopenfromam time without time zone,
saopentoam time without time zone,
saopenfrompm time without time zone,
saopentopm time without time zone,
suopenfromam time without time zone,
suopentoam time without time zone,
suopenfrompm time without time zone,
suopentopm time without time zone,
edittime timestamp with time zone NOT NULL,
edittimeutc timestamp with time zone NOT NULL,
tzo_objid integer NOT NULL,
edi_objid integer NOT NULL,
isactiv character varying(1) NOT NULL,
locationtype character varying(50)
);

-- Name: bppersons_tmp; Type: TABLE; Schema: public; Owner: sd
--

CREATE TABLE bppersons_tmp (
objid integer,
id integer,
shortname character varying(12),
firstname character varying(50),
lastname character varying(50),
title character varying(20),
salutation character varying(50),
street character varying(50),
city character varying(50),
zip character varying(50),
cou_objid integer,
tel character varying(50),
tel2 character varying(50),
mobiletel character varying(50),
fax character varying(50),
"password" character varying(50),
email character varying(50),
pgr_objid integer,
isactiv character varying(1),
"location" character varying(50),
loc_objid integer,
tzo_objid integer,
pin character varying(50),
description character varying(4000),
department character varying(50),
sign character varying(50)
);

-Philipp
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [BUGS] deadlocks in postgresql 7.2.1

2003-07-28 Thread Philipp Reisner
Am Montag, 28. Juli 2003 11:41 schrieb Peter Eisentraut:
> Philipp Reisner writes:
> > Once in a while (about 3 times a day) one or more INSERTS/DELETES simply
> > go into the "waiting" state, and block the whole database. The only way
> > out is to terminate the client connection (i.e. to abort the blocked
> > INSERT/DELETE query)
> >
> > Further investigation with ps -e -o wchan... showed that the backed
> > process was simply sleeping in "semop".
> >
> > Output of ps:
> >
> >762 ?S  0:00 /usr/lib/postgresql/bin/postmaster
> >764 ?S  0:00 postgres: stats buffer process
> >765 ?S  0:00 postgres: stats collector process
> > 24872 ?S  0:00 postgres: sd sd 10.2.2.6 idle in transaction
> > 24873 ?R 68:01 postgres: sd sd 10.2.2.6 SELECT
> > 24932 ?S  3:09 postgres: sd sd 10.2.2.6 idle in transaction
> > 24943 ?R  3:02 postgres: sd sd 10.2.2.6 SELECT
> > 25004 ?S  0:01 postgres: sd sd 10.2.1.5 idle in transaction
>
> [snip]
>
> All these "idle in transaction" sessions have unfinished transactions that
> are probably holding locks that the INSERT is waiting for.  If you
> constantly have loads of "idle in transaction" sessions, you need to fix
> your application.
>
> In 7.3 there is a system table called pg_locks that you can use to
> investigate locks.  I don't believe there was one in 7.2.

[ Sorry about this offtopic posting ]

Thanks a lot for the reply!

The applications uses the jdbc driver with autocommit turned off,
commit and rollback. Do you know if the jdbc driver just starts 
a new transaction as soon as the last one was ended with commit/
rollback ?

BTW, The development system is still on postgres 7.3.3. The is 
the pg_locks table:

 relation | database | transaction |  pid  |  mode   | granted
--+--+-+---+-+-
  6520806 |  6520640 | | 20988 | AccessShareLock | t
  |  |   63435 | 21154 | ExclusiveLock   | t
  6521098 |  6520640 | | 20988 | AccessShareLock | t
  6521041 |  6520640 | | 20988 | AccessShareLock | t
  6520884 |  6520640 | | 20988 | AccessShareLock | t
  6520894 |  6520640 | | 20988 | AccessShareLock | t
  |  |   63442 | 21153 | ExclusiveLock   | t
  |  |   57548 | 21140 | ExclusiveLock   | t
  6520810 |  6520640 | | 20988 | AccessShareLock | t
  |  |   63434 | 21160 | ExclusiveLock   | t
  |  |   63110 | 21109 | ExclusiveLock   | t
  6520833 |  6520640 | | 20988 | AccessShareLock | t
  |  |   57837 | 21096 | ExclusiveLock   | t
  6521078 |  6520640 | | 20988 | AccessShareLock | t
  |  |   63437 | 21156 | ExclusiveLock   | t
  6520702 |  6520640 | | 20988 | AccessShareLock | t
  |  |   63436 | 21155 | ExclusiveLock   | t
  |  |   63438 | 21157 | ExclusiveLock   | t
  |  |   63440 | 21159 | ExclusiveLock   | t
  6520814 |  6520640 | | 20988 | AccessShareLock | t
  6520899 |  6520640 | | 20988 | AccessShareLock | t
  6520652 |  6520640 | | 20988 | AccessShareLock | t
  |  |   57826 | 20987 | ExclusiveLock   | t
  |  |   63439 | 21158 | ExclusiveLock   | t
  6521092 |  6520640 | | 20988 | AccessShareLock | t
16757 |  6520640 | | 21153 | AccessShareLock | t
  |  |   57535 | 20988 | ExclusiveLock   | t
(27 rows)

-Philipp
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [BUGS] Postgresql 7.3.3 crashing on query

2003-07-30 Thread Philipp Reisner
Am Montag, 28. Juli 2003 15:53 schrieb Tom Lane:
> Philipp Reisner <[EMAIL PROTECTED]> writes:
> > If I execute this query several times (I receive the correct result set),
> > and then do an explain of the same query, the backend terminates with
> > sig 11 !!!
>
> Can you get us a debugger stack trace from the crash?
>

Ok, I attached GDB to the backend.
Attaching to program: /usr/lib/postgresql/bin/postgres, process 25380
Reading symbols from /lib/libpam.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpam.so.0
Reading symbols from /usr/lib/libssl.so.0.9.6...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/libssl.so.0.9.6
Reading symbols from /usr/lib/libcrypto.so.0.9.6...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.6
Reading symbols from /usr/lib/libkrb5.so.17...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/libkrb5.so.17
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libreadline.so.4...(no debugging symbols 
found)...done.
Loaded symbols for /lib/libreadline.so.4
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...(no debugging symbols 
found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libcom_err.so.1...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/libcom_err.so.1
Reading symbols from /usr/lib/libasn1.so.5...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/libasn1.so.5
Reading symbols from /usr/lib/libroken.so.9...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/libroken.so.9
Reading symbols from /lib/libncurses.so.5...(no debugging symbols 
found)...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libdb3.so.3...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/libdb3.so.3
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols 
found)...done.
Loaded symbols for /lib/libnss_compat.so.2
0x402be812 in recv () from /lib/libc.so.6
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x0812f9bc in DecodeDateTime ()
(gdb) where
#0  0x0812f9bc in DecodeDateTime ()
Cannot access memory at address 0xbf003030

Looks a lot like a corrupted stack :(


> Also you should specify the platform you're using in reports of this sort.
>

Its a PIII 733MHz 256MB RAM, running Debian Linux 3.0 (Woody) an a 2.4.18 
kernel.

BTW, it crashes are getting rarer on the system...

-Philipp
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [BUGS] Postgresql 7.3.3 crashing on query

2003-07-31 Thread Philipp Reisner
Am Mittwoch, 30. Juli 2003 16:00 schrieb Tom Lane:
> Philipp Reisner <[EMAIL PROTECTED]> writes:
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0812f9bc in DecodeDateTime ()
> > (gdb) where
> > #0  0x0812f9bc in DecodeDateTime ()
> > Cannot access memory at address 0xbf003030
>
> That's a fairly large routine :-(.  Could I trouble you to rebuild with
> debugging symbols (configure --enable-debug) so we can get a more exact
> fix on the problem location?  While you're at it, please add
> --enable-cassert too, just in case that produces any useful info.
>
> Alternatively: maybe you can now create a simple test case.  This is
> enough information to let us guess that the crash occurs while trying
> to read an unusual date or timestamp input value.  Do you know what's
> getting fed to the database?  If you can exhibit a crash with a test
> case like "SELECT 'something odd'::timestamp", we could take it from
> there.
>
>   regards, tom lane

Hi Tom,

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
DecodeDateTime (field=Cannot access memory at address 0x303038
) at datetime.c:1404
1404datetime.c: No such file or directory.
in datetime.c
(gdb) where
#0  DecodeDateTime (field=Cannot access memory at address 0x303038
) at datetime.c:1404
Cannot access memory at address 0x303030

Looks a lot like an access to freed memory to me. 0x303030 looks like
a poison value. 

The way I reproduced the bug:

1) I stared up the server compiled with debug
2) It was not possible to crash the server with the in the first 
   interactive psql session. -- I executed the query several times.
3) I exited the first session and started a new one.
4) Executing the same query again crashed the backend immediately!

BTW, this pattern is reproducable. It never happens in the first 
 session. But it immediately happens in the second session.

I was not able to find simple SELECT statement to reproduce
the bug, neither selecting some fancy timestamp value nor using
some fancy timestamp value in the where clause did the trick.

-Philipp
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [BUGS] Postgresql 7.3.3 crashing on query

2003-08-05 Thread Philipp Reisner
Am Donnerstag, 31. Juli 2003 15:53 schrieb Tom Lane:
> Philipp Reisner <[EMAIL PROTECTED]> writes:
> > Program received signal SIGSEGV, Segmentation fault.
> > DecodeDateTime (field=Cannot access memory at address 0x303038
> > ) at datetime.c:1404
> > 1404datetime.c: No such file or directory.
> > in datetime.c
> > (gdb) where
> > #0  DecodeDateTime (field=Cannot access memory at address 0x303038
> > ) at datetime.c:1404
> > Cannot access memory at address 0x303030
> >
> > Looks a lot like an access to freed memory to me. 0x303030 looks like
> > a poison value.
>
> No, it looks like ASCII text.  Something has clobbered the stack with
> a text string.  Unfortunately, that probably means we can't trust the
> claimed execution address --- DecodeDateTime may just happen to live at
> address 0x3030 or something like that in your executable.
>
> Did you try a backtrace (bt)?  The odds are it will fail, or give junk
> results, but you should try it.
>
> > I was not able to find simple SELECT statement to reproduce
> > the bug, neither selecting some fancy timestamp value nor using
> > some fancy timestamp value in the where clause did the trick.
>
> If the pointer to DecodeDateTime is a red herring then that's not too
> surprising.  Unfortunately, that conclusion leaves me with zero to go
> on :-(.  I think you'll need to work on setting up a self-contained test
> case.  You should try pg_dumping the tables involved, loading them into
> a fresh database, and then seeing if the bug can be reproduced there.
>
>   regards, tom lane

Hi Tom,

We (Clifford Wolf and me) have found the bug! 

the strncpy call results in fstr beeing not 0 terminated. In case
you are lucky (or not) there were some zero bytes there...

This was the case when the query worked for me. But after closing
the first interactive psql session and opening the second one
this particular page is no longer filled with zeros.

In our case the strlen() returned ~50, and the strcpy overwrote
the stack frame pointer of the current stack frame.

This small patch cures the bug for us!

--- datetime.c_o2003-08-05 14:16:01.0 +0200
+++ datetime.c  2003-08-05 14:16:25.0 +0200
@@ -2277,6 +2277,7 @@
 * integer.
 */
strncpy(fstr, (cp + 1), 7);
+   fstr[7]=0;
strcpy((fstr + strlen(fstr)), "00");
*(fstr + 6) = '\0';
*fsec = strtol(fstr, &cp, 10);


-Philipp
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[BUGS] Deadlock in PostgreSQL 7.3.4

2003-08-18 Thread Philipp Reisner
 unspsc, dev_objid, tzo_objid, isTestCall, que_objid, 
tec_objid from Calls
Aug 18 10:34:25 nut1 postgres[4933]: [44067-23]  where ID = 2265302
Aug 18 10:34:25 nut1 postgres[4933]: [44068] LOG:  duration: 0.008038 sec
Aug 18 10:34:25 nut1 postgres[4933]: [44069] LOG:  query: begin;
Aug 18 10:34:25 nut1 postgres[4933]: [44070] LOG:  duration: 0.000121 sec
Aug 18 10:34:25 nut1 postgres[4933]: [44071] LOG:  query: delete from Calls where 
objID = 2268645
Aug 18 10:34:25 nut1 postgres[4933]: [44072] LOG:  duration: 0.000675 sec
Aug 18 10:39:18 nut1 postgres[4933]: [44073] LOG:  pq_recvbuf: unexpected EOF on 
client connection

-Philipp
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [BUGS] Deadlock in PostgreSQL 7.3.4

2003-08-19 Thread Philipp Reisner
Am Montag, 18. August 2003 15:38 schrieb Tom Lane:
> Philipp Reisner <[EMAIL PROTECTED]> writes:
> > Now if the applications issues one delete statement concurrently on
> > two connections both block forever.
> >
> > Please correct me if I am wrong, but should not one of the statements
> > succeed and the other simply fail ?
> >
> > Aug 18 10:34:25 nut1 postgres[4934]: [44389] LOG:
> >  query: delete from Calls where objID = 2268645
> >
> > Aug 18 10:34:25 nut1 postgres[4933]: [44071] LOG:
> >  query: delete from Calls where objID = 2268645
> >
> >  4933 ?S  5:20 postgres: sd sd 10.2.1.5 idle in transaction
> >  4934 ?S  5:08 postgres: sd sd 10.2.1.5 DELETE waiting
>
> I see no deadlock.  4933 is waiting for its client to issue another
> command.  4934 is waiting to see if it can delete the row.
>
>   regards, tom lane

Hi Tom, 

Right, thanks for pointing it out for me. It seems to be somwhere in
the application code or PostgreSQL's jdbc driver. 

Finally we are able to reproduce the "Lockup" and we will continue
to work out if it is in the App or in the jdbc driver.

-Philipp
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [BUGS] Deadlock in PostgreSQL 7.3.4

2003-08-20 Thread Philipp Reisner
Am Dienstag, 19. August 2003 11:52 schrieb Philipp Reisner:
> Am Montag, 18. August 2003 15:38 schrieb Tom Lane:
> > Philipp Reisner <[EMAIL PROTECTED]> writes:
> > > Now if the applications issues one delete statement concurrently on
> > > two connections both block forever.
> > >
> > > Please correct me if I am wrong, but should not one of the statements
> > > succeed and the other simply fail ?
> > >
> > > Aug 18 10:34:25 nut1 postgres[4934]: [44389] LOG:
> > >  query: delete from Calls where objID = 2268645
> > >
> > > Aug 18 10:34:25 nut1 postgres[4933]: [44071] LOG:
> > >  query: delete from Calls where objID = 2268645
> > >
> > >  4933 ?S  5:20 postgres: sd sd 10.2.1.5 idle in transaction
> > >  4934 ?S  5:08 postgres: sd sd 10.2.1.5 DELETE waiting
> >
> > I see no deadlock.  4933 is waiting for its client to issue another
> > command.  4934 is waiting to see if it can delete the row.
> >
> > regards, tom lane
>
> Hi Tom,
>
> Right, thanks for pointing it out for me. It seems to be somwhere in
> the application code or PostgreSQL's jdbc driver.
>
> Finally we are able to reproduce the "Lockup" and we will continue
> to work out if it is in the App or in the jdbc driver.
>
> -Philipp

Just in order to end this thread. It turned out that the lockup was 
triggered by an old version of postgres' jdbc driver. (It was from 
the 7.2.1 days of postgresql)

Updating to a recent version finaly solved this issue. 

-Philipp


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[BUGS] Endless loop in ExecNestLoop

2006-01-30 Thread Philipp Reisner
in ExecProcNode ()
#13 0x004ee261 in ExecNestLoop ()   <---
#14 0x004e0d36 in ExecProcNode ()
#15 0x004ee261 in ExecNestLoop ()
#16 0x004e0d36 in ExecProcNode ()
#17 0x004ee261 in ExecNestLoop ()
#18 0x004e0d36 in ExecProcNode ()
#19 0x004ee261 in ExecNestLoop ()
#20 0x004e0d36 in ExecProcNode ()
#21 0x004ef57c in ExecSort ()
#22 0x004e0d6a in ExecProcNode ()
#23 0x004df73f in ExecEndPlan ()
#24 0x004dec09 in ExecutorRun ()
#25 0x0055f87a in PortalRun ()
#26 0x0055f5f9 in PortalRun ()
#27 0x0055b561 in pg_plan_queries ()
#28 0x0055df8c in PostgresMain ()
#29 0x00539b01 in ClosePostmasterPorts ()
#30 0x005395d0 in ClosePostmasterPorts ()
#31 0x00537b4d in PostmasterMain ()
#32 0x000000536e59 in PostmasterMain ()
#33 0x004fda0c in main ()

With gdb's ability to execute until the current function (stack frame) return,
I found out which function (stack frame) is the up-most in this endless loop.
That frame is marked with the arrow.

Is this bug-report of any use so far ?

best regards,
  Philipp Reisner
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [BUGS] Endless loop in ExecNestLoop

2006-01-31 Thread Philipp Reisner
Am Montag, 30. Januar 2006 17:42 schrieb Tom Lane:
> Philipp Reisner <[EMAIL PROTECTED]> writes:
> > Version: 8.0.2
>
> I don't think so ... neither bitmap scans nor slot_getattr exist in 8.0.
>

Sorry, a typo, 8.1.2 of course.

> > Is this bug-report of any use so far ?
>
> Not a lot.  You need to find out what the difference is between the
> cases where the query runs quickly and those where it doesn't.  I'm
> betting that the planner is choosing a very bad plan in the slow cases,
> but there's not a lot of evidence here to show what or why.
>
> The explain analyze output shows two levels of hash join underneath
> four levels of nestloop join, whereas the stack trace looks like there
> are five levels of nestloop and only one hash.  So this is some evidence
> that a different plan is being used in the slow cases.  The stack trace
> doesn't provide nearly enough info about what that plan is, though.
>

What reasons are there for the planer to choose a different plan ? 

> Do you have geqo_threshold set to less than its default value?  Do you
> sometimes execute the query with different sets of parameters?  Either
> of these might lead to changes of plan.

Since we had very bad experience with the generic query optimizer we
completely disabled it. The non geqo plans perform multitudes better 
on our data-set and work-load. -- And the time to plan the query is
always negligible compared to the execution time.

We have the endless loop about 10 times a day (but execute about 1.5 million 
queries a day) :

Program received signal SIGINT, Interrupt.
0x2ba687b4 in wcsrtombs () from /lib/libc.so.6
(gdb) bt
#0  0x2ba687b4 in wcsrtombs () from /lib/libc.so.6
#1  0x2ba163c8 in wcstombs () from /lib/libc.so.6
#2  0x0058dcc4 in text_oid ()
#3  0x0058deb9 in upper ()
#4  0x004e216e in ExecMakeFunctionResult ()
#5  0x004e210d in ExecMakeFunctionResult ()
#6  0x004e5063 in ExecQual ()
#7  0x004e554e in ExecScan ()
#8  0x004e0cc8 in ExecProcNode ()
#9  0x004ebeb6 in ExecEndHashJoin ()
#10 0x004eb7fa in ExecHashJoin ()
#11 0x004e0d50 in ExecProcNode ()
#12 0x004ee29d in ExecNestLoop ()
#13 0x004e0d36 in ExecProcNode ()
#14 0x004ee261 in ExecNestLoop () <<---
#15 0x004e0d36 in ExecProcNode ()
#16 0x004ee261 in ExecNestLoop ()
#17 0x004e0d36 in ExecProcNode ()
#18 0x004ee261 in ExecNestLoop ()
#19 0x004e0d36 in ExecProcNode ()
#20 0x004ee261 in ExecNestLoop ()
#21 0x004e0d36 in ExecProcNode ()
#22 0x004ef57c in ExecSort ()
#23 0x004e0d6a in ExecProcNode ()
#24 0x004df73f in ExecEndPlan ()
#25 0x004dec09 in ExecutorRun ()
#26 0x0055f87a in PortalRun ()
#27 0x0055f5f9 in PortalRun ()
#28 0x0055b561 in pg_plan_queries ()
#29 0x0055df8c in PostgresMain ()
#30 0x00539b01 in ClosePostmasterPorts ()
#31 0x005395d0 in ClosePostmasterPorts ()
#32 0x00537b4d in PostmasterMain ()
#33 0x00536e59 in PostmasterMain ()
#34 0x004fda0c in main ()


After detaching with gdb, I killed it. I configured postgres to dump 
the statement to the logfile with "log_min_error_statement = error".
So I have the exact query that caused the problem. 

This time the plainer says, only explain:

 Sort  (cost=16257.75..16257.76 rows=1 width=262)
   Sort Key: con.shortname
   ->  Nested Loop  (cost=1.18..16257.74 rows=1 width=262)
 ->  Nested Loop  (cost=1.18..16254.57 rows=1 width=130)
   ->  Nested Loop  (cost=1.18..16251.50 rows=1 width=125)
 ->  Nested Loop  (cost=1.18..16248.45 rows=1 width=111)
   ->  Nested Loop  (cost=1.18..16245.31 rows=1 
width=86)
 Join Filter: ("outer".cel_objid = 
"inner".objid)
 ->  Seq Scan on devices dev  
(cost=0.00..11984.25 rows=1 width=63)
   Filter: ((upper((COALESCE(isactiv, 
'Y'::character varying))::text) = 'Y'::text) AND 
(upper((COALESCE(iscommittedsp, 'Y'::character varying))::text) = 'Y'::text) 
AND (upper((COALESCE(iscommittedcust, 'Y'::character varying))::text) = 
'Y'::text))
 ->  Hash Left Join  (cost=1.18..4256.22 
rows=387 width=27)
   Hash Cond: ("outer".ser_objid = 
"inner".objid)
   ->  Seq Scan on contractelements cel  
(cost=0.00..4253.10 rows=387 width=21)
 Filter: (upper((isactiv)::text) = 
'Y'::text)
   ->  Hash  (co

Re: [BUGS] Endless loop in ExecNestLoop

2006-01-31 Thread Philipp Reisner
ws=1 width=63)
   Filter: ((upper((COALESCE(isactiv, 
'Y'::character varying))::text) = 'Y'::text) AND 
(upper((COALESCE(iscommittedsp, 'Y'::character varying))::text) = 'Y'::text) 
AND (upper((COALESCE(iscommittedcust, 'Y'::character varying))::text) = 
'Y'::text))
 ->  Hash Left Join  (cost=1.18..4256.22 
rows=387 width=27)
   Hash Cond: ("outer".ser_objid = 
"inner".objid)
   ->  Seq Scan on contractelements cel  
(cost=0.00..4253.10 rows=387 width=21)
 Filter: (upper((isactiv)::text) = 
'Y'::text)
   ->  Hash  (cost=1.14..1.14 rows=14 
width=14)
 ->  Seq Scan on servicetypes ser  
(cost=0.00..1.14 rows=14 width=14)
   ->  Index Scan using contracts_pkey on contracts con 
 (cost=0.00..3.13 rows=1 width=33)
 Index Cond: ("outer".con_objid = con.objid)
 Filter: ((ccu_objid = 3837149) OR (ccu_objid = 
3837320) OR (ccu_objid = 3837375) OR (ccu_objid = 3974872) OR (ccu_objid = 
5032031) OR (ccu_objid = 7220313) OR (ccu_objid = 15684899) OR (ccu_objid = 
16052168) OR (ccu_objid = 19196573) OR (ccu_objid = 20095712) OR (ccu_objid = 
22399813) OR (ccu_objid = 42092332) OR (ccu_objid = 42092454) OR (ccu_objid = 
42092574) OR (ccu_objid = 42886947) OR (ccu_objid = 43813234))
 ->  Index Scan using devicetypes_pkey on devicetypes dty  
(cost=0.00..3.04 rows=1 width=18)
   Index Cond: (dty.objid = "outer".dty_objid)
   ->  Index Scan using bpartners_pkey on bpartners man  
(cost=0.00..3.06 rows=1 width=13)
 Index Cond: (man.objid = "outer".bpa_objid)
 ->  Index Scan using locations_pkey on locations loc  (cost=0.00..3.12 
rows=1 width=136)
   Index Cond: (loc.objid = "outer".loc_objid)
   Filter: (upper((shortname)::text) = '5195'::text)

-philipp
-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Fwd: Re: [BUGS] Endless loop in ExecNestLoop

2006-01-31 Thread Philipp Reisner


--  Weitergeleitete Nachricht  --

Subject: Re: [BUGS] Endless loop in ExecNestLoop
Date: Dienstag, 31. Januar 2006 16:39
From: Tom Lane <[EMAIL PROTECTED]>
To: Philipp Reisner <[EMAIL PROTECTED]>
Cc: pgsql-bugs@postgresql.org

Philipp Reisner <[EMAIL PROTECTED]> writes:
> Well after more research it turned out that sometimes we have a plan
> that executes in less than 2 seconds, and somethimes we get a plan
> does not terminate within 1h30 of CPU time.

Could you send EXPLAIN ANALYZE output for this (at least for the fast
case)?  Presumably some of the planner's estimates are way off, but
there's no way to know which ones from just EXPLAIN.

>> What reasons are there for the planer to choose a different plan ?

It looks to me like the statistics have changed, because it's showing

different rowcount and width estimates for the same scan:
>->  Index Scan using
> contractelements_pkey on contractelements cel  (cost=0.00..5574.21 rows=459
> width=22) Filter: (upper((isactiv)::text) = 'Y'::text)
>
>->  Seq Scan on contractelements cel
>  (cost=0.00..4253.10 rows=387 width=21) Filter: (upper((isactiv)::text) =
> 'Y'::text)

Perhaps you ran VACUUM or ANALYZE in between, or autovacuum did so for
you?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

---

-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com :

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match