[BUGS] deadlocks in postgresql 7.2.1
Hi, We use postgresql 7.2.1 (actually the Debian binary packet version 7.2.1-2woody2) Our database load has more QUERIES than INSERT/DELETES. But we have a sever problem with the wirting operations. Once in a while (about 3 times a day) one or more INSERTS/DELETES simply go into the "waiting" state, and block the whole database. The only way out is to terminate the client connection (i.e. to abort the blocked INSERT/DELETE query) Further investigation with ps -e -o wchan... showed that the backed process was simply sleeping in "semop". Output of ps: 762 ?S 0:00 /usr/lib/postgresql/bin/postmaster 764 ?S 0:00 postgres: stats buffer process 765 ?S 0:00 postgres: stats collector process 24872 ?S 0:00 postgres: sd sd 10.2.2.6 idle in transaction 24873 ?R 68:01 postgres: sd sd 10.2.2.6 SELECT 24932 ?S 3:09 postgres: sd sd 10.2.2.6 idle in transaction 24943 ?R 3:02 postgres: sd sd 10.2.2.6 SELECT 25004 ?S 0:01 postgres: sd sd 10.2.1.5 idle in transaction 21226 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21228 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21229 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21230 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21231 ?S 0:01 postgres: sd sd 10.2.1.5 idle in transaction 21232 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21227 ?S 0:01 postgres: sd sd 10.2.1.5 INSERT waiting 21391 pts/1S 0:00 grep postg 762 ?S 0:00 /usr/lib/postgresql/bin/postmaster 764 ?S 0:00 postgres: stats buffer process 765 ?S 0:00 postgres: stats collector process 24872 ?S 0:00 postgres: sd sd 10.2.2.6 idle in transaction 24873 ?S 68:02 postgres: sd sd 10.2.2.6 idle in transaction 24932 ?S 3:09 postgres: sd sd 10.2.2.6 idle in transaction 24943 ?S 3:04 postgres: sd sd 10.2.2.6 idle in transaction 25004 ?S 0:01 postgres: sd sd 10.2.1.5 idle in transaction 21226 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21228 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21229 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21230 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21231 ?S 0:01 postgres: sd sd 10.2.1.5 idle in transaction 21232 ?S 0:00 postgres: sd sd 10.2.1.5 idle in transaction 21227 ?S 0:01 postgres: sd sd 10.2.1.5 INSERT waiting 762 ?S 0:00 /usr/lib/postgresql/bin/postmaster 764 ?S 0:00 postgres: stats buffer process 765 ?S 0:00 postgres: stats collector process 10291 ?S 0:05 postgres: sd sd 10.2.1.5 idle in transaction 4303 ?S 0:00 postgres: sd sd 10.2.2.6 idle in transaction 4304 ?S 40:43 postgres: sd sd 10.2.2.6 idle in transaction 4370 ?S 2:54 postgres: sd sd 10.2.2.6 idle in transaction 4384 ?S 3:12 postgres: sd sd 10.2.2.6 idle in transaction 19961 ?S 4:32 postgres: sd sd 10.2.1.5 idle in transaction 19962 ?S 4:40 postgres: sd sd 10.2.1.5 idle in transaction 19963 ?S 4:24 postgres: sd sd 10.2.1.5 DELETE waiting 19964 ?S 4:58 postgres: sd sd 10.2.1.5 idle in transaction 19965 ?S 4:18 postgres: sd sd 10.2.1.5 idle in transaction 19966 ?S 3:49 postgres: sd sd 10.2.1.5 idle in transaction 5574 ?S 2:50 postgres: sd sd 10.2.1.5 idle in transaction 5575 ?S 2:54 postgres: sd sd 10.2.1.5 idle in transaction 8744 pts/0S 0:00 /usr/lib/postgresql/bin/psql -d sd 8751 ?S 0:00 postgres: sd sd [local] idle 16606 ?S 0:03 postgres: sd sd 10.2.1.5 idle in transaction To get rid of the problem we tried to upgrade to 7.3.3. But see next mail for our experiences with 7.3.3. -Philipp -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[BUGS] Postgresql 7.3.3 crashing on query
character varying(1) NOT NULL, hasreservedquantity character varying(1) NOT NULL, cancreate character varying(1) NOT NULL, mustcalcpurchaseprice character varying(1) NOT NULL, allownegativequantity character varying(1) NOT NULL ); -- Name: bpartners_tmp; Type: TABLE; Schema: public; Owner: sd -- CREATE TABLE bpartners_tmp ( objid integer, id integer, shortname character varying(12), name character varying(50), street character varying(50), city character varying(50), zip character varying(50), tel character varying(50), fax character varying(50), email character varying(50), "location" character varying(50), memo character varying(2000), sdcallflag character varying(1), sdcallpingflag character varying(1), sdcallmobileflag character varying(1), sdinventoryflag character varying(1), sdreportflag character varying(1), sdbenchmarkflag character varying(1), cou_objid integer, tzo_objid integer, logofilename character varying(50), additionalmenulinks character varying(2000), showonlyallowedmenuentries character varying(1), duns character varying(50), accountcode character varying(50), bpa_objid integer ); -- Name: documentheaders_tmp; Type: TABLE; Schema: public; Owner: sd -- CREATE TABLE documentheaders_tmp ( objid integer, id integer, dot_objid integer, documentdate date, edittime timestamp with time zone, edittimeutc timestamp with time zone, tzo_objid integer, edi_objid integer, documentid character varying(50), cal_id integer, from_objid integer, to_objid integer ); -- Name: locations; Type: TABLE; Schema: public; Owner: sd -- CREATE TABLE locations ( objid integer NOT NULL, id integer NOT NULL, shortname character varying(12) NOT NULL, name character varying(50), zip character varying(50), city character varying(50), street character varying(50), cou_objid integer, tel character varying(50), fax character varying(50), modemtelnr character varying(50), description character varying(2000), sco_objid integer, contactfirstname character varying(50), contactlastname character varying(50), contactsalutation character varying(50), contacttitle character varying(50), moopenfromam time without time zone, moopentoam time without time zone, moopenfrompm time without time zone, moopentopm time without time zone, tuopenfromam time without time zone, tuopentoam time without time zone, tuopenfrompm time without time zone, tuopentopm time without time zone, weopenfromam time without time zone, weopentoam time without time zone, weopenfrompm time without time zone, weopentopm time without time zone, thopenfromam time without time zone, thopentoam time without time zone, thopenfrompm time without time zone, thopentopm time without time zone, fropenfromam time without time zone, fropentoam time without time zone, fropenfrompm time without time zone, fropentopm time without time zone, saopenfromam time without time zone, saopentoam time without time zone, saopenfrompm time without time zone, saopentopm time without time zone, suopenfromam time without time zone, suopentoam time without time zone, suopenfrompm time without time zone, suopentopm time without time zone, edittime timestamp with time zone NOT NULL, edittimeutc timestamp with time zone NOT NULL, tzo_objid integer NOT NULL, edi_objid integer NOT NULL, isactiv character varying(1) NOT NULL, locationtype character varying(50) ); -- Name: bppersons_tmp; Type: TABLE; Schema: public; Owner: sd -- CREATE TABLE bppersons_tmp ( objid integer, id integer, shortname character varying(12), firstname character varying(50), lastname character varying(50), title character varying(20), salutation character varying(50), street character varying(50), city character varying(50), zip character varying(50), cou_objid integer, tel character varying(50), tel2 character varying(50), mobiletel character varying(50), fax character varying(50), "password" character varying(50), email character varying(50), pgr_objid integer, isactiv character varying(1), "location" character varying(50), loc_objid integer, tzo_objid integer, pin character varying(50), description character varying(4000), department character varying(50), sign character varying(50) ); -Philipp -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [BUGS] deadlocks in postgresql 7.2.1
Am Montag, 28. Juli 2003 11:41 schrieb Peter Eisentraut: > Philipp Reisner writes: > > Once in a while (about 3 times a day) one or more INSERTS/DELETES simply > > go into the "waiting" state, and block the whole database. The only way > > out is to terminate the client connection (i.e. to abort the blocked > > INSERT/DELETE query) > > > > Further investigation with ps -e -o wchan... showed that the backed > > process was simply sleeping in "semop". > > > > Output of ps: > > > >762 ?S 0:00 /usr/lib/postgresql/bin/postmaster > >764 ?S 0:00 postgres: stats buffer process > >765 ?S 0:00 postgres: stats collector process > > 24872 ?S 0:00 postgres: sd sd 10.2.2.6 idle in transaction > > 24873 ?R 68:01 postgres: sd sd 10.2.2.6 SELECT > > 24932 ?S 3:09 postgres: sd sd 10.2.2.6 idle in transaction > > 24943 ?R 3:02 postgres: sd sd 10.2.2.6 SELECT > > 25004 ?S 0:01 postgres: sd sd 10.2.1.5 idle in transaction > > [snip] > > All these "idle in transaction" sessions have unfinished transactions that > are probably holding locks that the INSERT is waiting for. If you > constantly have loads of "idle in transaction" sessions, you need to fix > your application. > > In 7.3 there is a system table called pg_locks that you can use to > investigate locks. I don't believe there was one in 7.2. [ Sorry about this offtopic posting ] Thanks a lot for the reply! The applications uses the jdbc driver with autocommit turned off, commit and rollback. Do you know if the jdbc driver just starts a new transaction as soon as the last one was ended with commit/ rollback ? BTW, The development system is still on postgres 7.3.3. The is the pg_locks table: relation | database | transaction | pid | mode | granted --+--+-+---+-+- 6520806 | 6520640 | | 20988 | AccessShareLock | t | | 63435 | 21154 | ExclusiveLock | t 6521098 | 6520640 | | 20988 | AccessShareLock | t 6521041 | 6520640 | | 20988 | AccessShareLock | t 6520884 | 6520640 | | 20988 | AccessShareLock | t 6520894 | 6520640 | | 20988 | AccessShareLock | t | | 63442 | 21153 | ExclusiveLock | t | | 57548 | 21140 | ExclusiveLock | t 6520810 | 6520640 | | 20988 | AccessShareLock | t | | 63434 | 21160 | ExclusiveLock | t | | 63110 | 21109 | ExclusiveLock | t 6520833 | 6520640 | | 20988 | AccessShareLock | t | | 57837 | 21096 | ExclusiveLock | t 6521078 | 6520640 | | 20988 | AccessShareLock | t | | 63437 | 21156 | ExclusiveLock | t 6520702 | 6520640 | | 20988 | AccessShareLock | t | | 63436 | 21155 | ExclusiveLock | t | | 63438 | 21157 | ExclusiveLock | t | | 63440 | 21159 | ExclusiveLock | t 6520814 | 6520640 | | 20988 | AccessShareLock | t 6520899 | 6520640 | | 20988 | AccessShareLock | t 6520652 | 6520640 | | 20988 | AccessShareLock | t | | 57826 | 20987 | ExclusiveLock | t | | 63439 | 21158 | ExclusiveLock | t 6521092 | 6520640 | | 20988 | AccessShareLock | t 16757 | 6520640 | | 21153 | AccessShareLock | t | | 57535 | 20988 | ExclusiveLock | t (27 rows) -Philipp -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [BUGS] Postgresql 7.3.3 crashing on query
Am Montag, 28. Juli 2003 15:53 schrieb Tom Lane: > Philipp Reisner <[EMAIL PROTECTED]> writes: > > If I execute this query several times (I receive the correct result set), > > and then do an explain of the same query, the backend terminates with > > sig 11 !!! > > Can you get us a debugger stack trace from the crash? > Ok, I attached GDB to the backend. Attaching to program: /usr/lib/postgresql/bin/postgres, process 25380 Reading symbols from /lib/libpam.so.0...(no debugging symbols found)...done. Loaded symbols for /lib/libpam.so.0 Reading symbols from /usr/lib/libssl.so.0.9.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libssl.so.0.9.6 Reading symbols from /usr/lib/libcrypto.so.0.9.6...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libcrypto.so.0.9.6 Reading symbols from /usr/lib/libkrb5.so.17...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libkrb5.so.17 Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /lib/libreadline.so.4...(no debugging symbols found)...done. Loaded symbols for /lib/libreadline.so.4 Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/libm.so.6 Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/libc.so.6 Reading symbols from /usr/lib/libcom_err.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libcom_err.so.1 Reading symbols from /usr/lib/libasn1.so.5...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libasn1.so.5 Reading symbols from /usr/lib/libroken.so.9...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libroken.so.9 Reading symbols from /lib/libncurses.so.5...(no debugging symbols found)...done. Loaded symbols for /lib/libncurses.so.5 Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /usr/lib/libdb3.so.3...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libdb3.so.3 Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libnss_compat.so.2 0x402be812 in recv () from /lib/libc.so.6 (gdb) c Continuing. Program received signal SIGSEGV, Segmentation fault. 0x0812f9bc in DecodeDateTime () (gdb) where #0 0x0812f9bc in DecodeDateTime () Cannot access memory at address 0xbf003030 Looks a lot like a corrupted stack :( > Also you should specify the platform you're using in reports of this sort. > Its a PIII 733MHz 256MB RAM, running Debian Linux 3.0 (Woody) an a 2.4.18 kernel. BTW, it crashes are getting rarer on the system... -Philipp -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [BUGS] Postgresql 7.3.3 crashing on query
Am Mittwoch, 30. Juli 2003 16:00 schrieb Tom Lane: > Philipp Reisner <[EMAIL PROTECTED]> writes: > > Program received signal SIGSEGV, Segmentation fault. > > 0x0812f9bc in DecodeDateTime () > > (gdb) where > > #0 0x0812f9bc in DecodeDateTime () > > Cannot access memory at address 0xbf003030 > > That's a fairly large routine :-(. Could I trouble you to rebuild with > debugging symbols (configure --enable-debug) so we can get a more exact > fix on the problem location? While you're at it, please add > --enable-cassert too, just in case that produces any useful info. > > Alternatively: maybe you can now create a simple test case. This is > enough information to let us guess that the crash occurs while trying > to read an unusual date or timestamp input value. Do you know what's > getting fed to the database? If you can exhibit a crash with a test > case like "SELECT 'something odd'::timestamp", we could take it from > there. > > regards, tom lane Hi Tom, (gdb) c Continuing. Program received signal SIGSEGV, Segmentation fault. DecodeDateTime (field=Cannot access memory at address 0x303038 ) at datetime.c:1404 1404datetime.c: No such file or directory. in datetime.c (gdb) where #0 DecodeDateTime (field=Cannot access memory at address 0x303038 ) at datetime.c:1404 Cannot access memory at address 0x303030 Looks a lot like an access to freed memory to me. 0x303030 looks like a poison value. The way I reproduced the bug: 1) I stared up the server compiled with debug 2) It was not possible to crash the server with the in the first interactive psql session. -- I executed the query several times. 3) I exited the first session and started a new one. 4) Executing the same query again crashed the backend immediately! BTW, this pattern is reproducable. It never happens in the first session. But it immediately happens in the second session. I was not able to find simple SELECT statement to reproduce the bug, neither selecting some fancy timestamp value nor using some fancy timestamp value in the where clause did the trick. -Philipp -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [BUGS] Postgresql 7.3.3 crashing on query
Am Donnerstag, 31. Juli 2003 15:53 schrieb Tom Lane: > Philipp Reisner <[EMAIL PROTECTED]> writes: > > Program received signal SIGSEGV, Segmentation fault. > > DecodeDateTime (field=Cannot access memory at address 0x303038 > > ) at datetime.c:1404 > > 1404datetime.c: No such file or directory. > > in datetime.c > > (gdb) where > > #0 DecodeDateTime (field=Cannot access memory at address 0x303038 > > ) at datetime.c:1404 > > Cannot access memory at address 0x303030 > > > > Looks a lot like an access to freed memory to me. 0x303030 looks like > > a poison value. > > No, it looks like ASCII text. Something has clobbered the stack with > a text string. Unfortunately, that probably means we can't trust the > claimed execution address --- DecodeDateTime may just happen to live at > address 0x3030 or something like that in your executable. > > Did you try a backtrace (bt)? The odds are it will fail, or give junk > results, but you should try it. > > > I was not able to find simple SELECT statement to reproduce > > the bug, neither selecting some fancy timestamp value nor using > > some fancy timestamp value in the where clause did the trick. > > If the pointer to DecodeDateTime is a red herring then that's not too > surprising. Unfortunately, that conclusion leaves me with zero to go > on :-(. I think you'll need to work on setting up a self-contained test > case. You should try pg_dumping the tables involved, loading them into > a fresh database, and then seeing if the bug can be reproduced there. > > regards, tom lane Hi Tom, We (Clifford Wolf and me) have found the bug! the strncpy call results in fstr beeing not 0 terminated. In case you are lucky (or not) there were some zero bytes there... This was the case when the query worked for me. But after closing the first interactive psql session and opening the second one this particular page is no longer filled with zeros. In our case the strlen() returned ~50, and the strcpy overwrote the stack frame pointer of the current stack frame. This small patch cures the bug for us! --- datetime.c_o2003-08-05 14:16:01.0 +0200 +++ datetime.c 2003-08-05 14:16:25.0 +0200 @@ -2277,6 +2277,7 @@ * integer. */ strncpy(fstr, (cp + 1), 7); + fstr[7]=0; strcpy((fstr + strlen(fstr)), "00"); *(fstr + 6) = '\0'; *fsec = strtol(fstr, &cp, 10); -Philipp -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[BUGS] Deadlock in PostgreSQL 7.3.4
unspsc, dev_objid, tzo_objid, isTestCall, que_objid, tec_objid from Calls Aug 18 10:34:25 nut1 postgres[4933]: [44067-23] where ID = 2265302 Aug 18 10:34:25 nut1 postgres[4933]: [44068] LOG: duration: 0.008038 sec Aug 18 10:34:25 nut1 postgres[4933]: [44069] LOG: query: begin; Aug 18 10:34:25 nut1 postgres[4933]: [44070] LOG: duration: 0.000121 sec Aug 18 10:34:25 nut1 postgres[4933]: [44071] LOG: query: delete from Calls where objID = 2268645 Aug 18 10:34:25 nut1 postgres[4933]: [44072] LOG: duration: 0.000675 sec Aug 18 10:39:18 nut1 postgres[4933]: [44073] LOG: pq_recvbuf: unexpected EOF on client connection -Philipp -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [BUGS] Deadlock in PostgreSQL 7.3.4
Am Montag, 18. August 2003 15:38 schrieb Tom Lane: > Philipp Reisner <[EMAIL PROTECTED]> writes: > > Now if the applications issues one delete statement concurrently on > > two connections both block forever. > > > > Please correct me if I am wrong, but should not one of the statements > > succeed and the other simply fail ? > > > > Aug 18 10:34:25 nut1 postgres[4934]: [44389] LOG: > > query: delete from Calls where objID = 2268645 > > > > Aug 18 10:34:25 nut1 postgres[4933]: [44071] LOG: > > query: delete from Calls where objID = 2268645 > > > > 4933 ?S 5:20 postgres: sd sd 10.2.1.5 idle in transaction > > 4934 ?S 5:08 postgres: sd sd 10.2.1.5 DELETE waiting > > I see no deadlock. 4933 is waiting for its client to issue another > command. 4934 is waiting to see if it can delete the row. > > regards, tom lane Hi Tom, Right, thanks for pointing it out for me. It seems to be somwhere in the application code or PostgreSQL's jdbc driver. Finally we are able to reproduce the "Lockup" and we will continue to work out if it is in the App or in the jdbc driver. -Philipp -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [BUGS] Deadlock in PostgreSQL 7.3.4
Am Dienstag, 19. August 2003 11:52 schrieb Philipp Reisner: > Am Montag, 18. August 2003 15:38 schrieb Tom Lane: > > Philipp Reisner <[EMAIL PROTECTED]> writes: > > > Now if the applications issues one delete statement concurrently on > > > two connections both block forever. > > > > > > Please correct me if I am wrong, but should not one of the statements > > > succeed and the other simply fail ? > > > > > > Aug 18 10:34:25 nut1 postgres[4934]: [44389] LOG: > > > query: delete from Calls where objID = 2268645 > > > > > > Aug 18 10:34:25 nut1 postgres[4933]: [44071] LOG: > > > query: delete from Calls where objID = 2268645 > > > > > > 4933 ?S 5:20 postgres: sd sd 10.2.1.5 idle in transaction > > > 4934 ?S 5:08 postgres: sd sd 10.2.1.5 DELETE waiting > > > > I see no deadlock. 4933 is waiting for its client to issue another > > command. 4934 is waiting to see if it can delete the row. > > > > regards, tom lane > > Hi Tom, > > Right, thanks for pointing it out for me. It seems to be somwhere in > the application code or PostgreSQL's jdbc driver. > > Finally we are able to reproduce the "Lockup" and we will continue > to work out if it is in the App or in the jdbc driver. > > -Philipp Just in order to end this thread. It turned out that the lockup was triggered by an old version of postgres' jdbc driver. (It was from the 7.2.1 days of postgresql) Updating to a recent version finaly solved this issue. -Philipp ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[BUGS] Endless loop in ExecNestLoop
in ExecProcNode () #13 0x004ee261 in ExecNestLoop () <--- #14 0x004e0d36 in ExecProcNode () #15 0x004ee261 in ExecNestLoop () #16 0x004e0d36 in ExecProcNode () #17 0x004ee261 in ExecNestLoop () #18 0x004e0d36 in ExecProcNode () #19 0x004ee261 in ExecNestLoop () #20 0x004e0d36 in ExecProcNode () #21 0x004ef57c in ExecSort () #22 0x004e0d6a in ExecProcNode () #23 0x004df73f in ExecEndPlan () #24 0x004dec09 in ExecutorRun () #25 0x0055f87a in PortalRun () #26 0x0055f5f9 in PortalRun () #27 0x0055b561 in pg_plan_queries () #28 0x0055df8c in PostgresMain () #29 0x00539b01 in ClosePostmasterPorts () #30 0x005395d0 in ClosePostmasterPorts () #31 0x00537b4d in PostmasterMain () #32 0x000000536e59 in PostmasterMain () #33 0x004fda0c in main () With gdb's ability to execute until the current function (stack frame) return, I found out which function (stack frame) is the up-most in this endless loop. That frame is marked with the arrow. Is this bug-report of any use so far ? best regards, Philipp Reisner -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [BUGS] Endless loop in ExecNestLoop
Am Montag, 30. Januar 2006 17:42 schrieb Tom Lane: > Philipp Reisner <[EMAIL PROTECTED]> writes: > > Version: 8.0.2 > > I don't think so ... neither bitmap scans nor slot_getattr exist in 8.0. > Sorry, a typo, 8.1.2 of course. > > Is this bug-report of any use so far ? > > Not a lot. You need to find out what the difference is between the > cases where the query runs quickly and those where it doesn't. I'm > betting that the planner is choosing a very bad plan in the slow cases, > but there's not a lot of evidence here to show what or why. > > The explain analyze output shows two levels of hash join underneath > four levels of nestloop join, whereas the stack trace looks like there > are five levels of nestloop and only one hash. So this is some evidence > that a different plan is being used in the slow cases. The stack trace > doesn't provide nearly enough info about what that plan is, though. > What reasons are there for the planer to choose a different plan ? > Do you have geqo_threshold set to less than its default value? Do you > sometimes execute the query with different sets of parameters? Either > of these might lead to changes of plan. Since we had very bad experience with the generic query optimizer we completely disabled it. The non geqo plans perform multitudes better on our data-set and work-load. -- And the time to plan the query is always negligible compared to the execution time. We have the endless loop about 10 times a day (but execute about 1.5 million queries a day) : Program received signal SIGINT, Interrupt. 0x2ba687b4 in wcsrtombs () from /lib/libc.so.6 (gdb) bt #0 0x2ba687b4 in wcsrtombs () from /lib/libc.so.6 #1 0x2ba163c8 in wcstombs () from /lib/libc.so.6 #2 0x0058dcc4 in text_oid () #3 0x0058deb9 in upper () #4 0x004e216e in ExecMakeFunctionResult () #5 0x004e210d in ExecMakeFunctionResult () #6 0x004e5063 in ExecQual () #7 0x004e554e in ExecScan () #8 0x004e0cc8 in ExecProcNode () #9 0x004ebeb6 in ExecEndHashJoin () #10 0x004eb7fa in ExecHashJoin () #11 0x004e0d50 in ExecProcNode () #12 0x004ee29d in ExecNestLoop () #13 0x004e0d36 in ExecProcNode () #14 0x004ee261 in ExecNestLoop () <<--- #15 0x004e0d36 in ExecProcNode () #16 0x004ee261 in ExecNestLoop () #17 0x004e0d36 in ExecProcNode () #18 0x004ee261 in ExecNestLoop () #19 0x004e0d36 in ExecProcNode () #20 0x004ee261 in ExecNestLoop () #21 0x004e0d36 in ExecProcNode () #22 0x004ef57c in ExecSort () #23 0x004e0d6a in ExecProcNode () #24 0x004df73f in ExecEndPlan () #25 0x004dec09 in ExecutorRun () #26 0x0055f87a in PortalRun () #27 0x0055f5f9 in PortalRun () #28 0x0055b561 in pg_plan_queries () #29 0x0055df8c in PostgresMain () #30 0x00539b01 in ClosePostmasterPorts () #31 0x005395d0 in ClosePostmasterPorts () #32 0x00537b4d in PostmasterMain () #33 0x00536e59 in PostmasterMain () #34 0x004fda0c in main () After detaching with gdb, I killed it. I configured postgres to dump the statement to the logfile with "log_min_error_statement = error". So I have the exact query that caused the problem. This time the plainer says, only explain: Sort (cost=16257.75..16257.76 rows=1 width=262) Sort Key: con.shortname -> Nested Loop (cost=1.18..16257.74 rows=1 width=262) -> Nested Loop (cost=1.18..16254.57 rows=1 width=130) -> Nested Loop (cost=1.18..16251.50 rows=1 width=125) -> Nested Loop (cost=1.18..16248.45 rows=1 width=111) -> Nested Loop (cost=1.18..16245.31 rows=1 width=86) Join Filter: ("outer".cel_objid = "inner".objid) -> Seq Scan on devices dev (cost=0.00..11984.25 rows=1 width=63) Filter: ((upper((COALESCE(isactiv, 'Y'::character varying))::text) = 'Y'::text) AND (upper((COALESCE(iscommittedsp, 'Y'::character varying))::text) = 'Y'::text) AND (upper((COALESCE(iscommittedcust, 'Y'::character varying))::text) = 'Y'::text)) -> Hash Left Join (cost=1.18..4256.22 rows=387 width=27) Hash Cond: ("outer".ser_objid = "inner".objid) -> Seq Scan on contractelements cel (cost=0.00..4253.10 rows=387 width=21) Filter: (upper((isactiv)::text) = 'Y'::text) -> Hash (co
Re: [BUGS] Endless loop in ExecNestLoop
ws=1 width=63) Filter: ((upper((COALESCE(isactiv, 'Y'::character varying))::text) = 'Y'::text) AND (upper((COALESCE(iscommittedsp, 'Y'::character varying))::text) = 'Y'::text) AND (upper((COALESCE(iscommittedcust, 'Y'::character varying))::text) = 'Y'::text)) -> Hash Left Join (cost=1.18..4256.22 rows=387 width=27) Hash Cond: ("outer".ser_objid = "inner".objid) -> Seq Scan on contractelements cel (cost=0.00..4253.10 rows=387 width=21) Filter: (upper((isactiv)::text) = 'Y'::text) -> Hash (cost=1.14..1.14 rows=14 width=14) -> Seq Scan on servicetypes ser (cost=0.00..1.14 rows=14 width=14) -> Index Scan using contracts_pkey on contracts con (cost=0.00..3.13 rows=1 width=33) Index Cond: ("outer".con_objid = con.objid) Filter: ((ccu_objid = 3837149) OR (ccu_objid = 3837320) OR (ccu_objid = 3837375) OR (ccu_objid = 3974872) OR (ccu_objid = 5032031) OR (ccu_objid = 7220313) OR (ccu_objid = 15684899) OR (ccu_objid = 16052168) OR (ccu_objid = 19196573) OR (ccu_objid = 20095712) OR (ccu_objid = 22399813) OR (ccu_objid = 42092332) OR (ccu_objid = 42092454) OR (ccu_objid = 42092574) OR (ccu_objid = 42886947) OR (ccu_objid = 43813234)) -> Index Scan using devicetypes_pkey on devicetypes dty (cost=0.00..3.04 rows=1 width=18) Index Cond: (dty.objid = "outer".dty_objid) -> Index Scan using bpartners_pkey on bpartners man (cost=0.00..3.06 rows=1 width=13) Index Cond: (man.objid = "outer".bpa_objid) -> Index Scan using locations_pkey on locations loc (cost=0.00..3.12 rows=1 width=136) Index Cond: (loc.objid = "outer".loc_objid) Filter: (upper((shortname)::text) = '5195'::text) -philipp -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Fwd: Re: [BUGS] Endless loop in ExecNestLoop
-- Weitergeleitete Nachricht -- Subject: Re: [BUGS] Endless loop in ExecNestLoop Date: Dienstag, 31. Januar 2006 16:39 From: Tom Lane <[EMAIL PROTECTED]> To: Philipp Reisner <[EMAIL PROTECTED]> Cc: pgsql-bugs@postgresql.org Philipp Reisner <[EMAIL PROTECTED]> writes: > Well after more research it turned out that sometimes we have a plan > that executes in less than 2 seconds, and somethimes we get a plan > does not terminate within 1h30 of CPU time. Could you send EXPLAIN ANALYZE output for this (at least for the fast case)? Presumably some of the planner's estimates are way off, but there's no way to know which ones from just EXPLAIN. >> What reasons are there for the planer to choose a different plan ? It looks to me like the statistics have changed, because it's showing different rowcount and width estimates for the same scan: >-> Index Scan using > contractelements_pkey on contractelements cel (cost=0.00..5574.21 rows=459 > width=22) Filter: (upper((isactiv)::text) = 'Y'::text) > >-> Seq Scan on contractelements cel > (cost=0.00..4253.10 rows=387 width=21) Filter: (upper((isactiv)::text) = > 'Y'::text) Perhaps you ran VACUUM or ANALYZE in between, or autovacuum did so for you? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org --- -- : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 : : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 : : Schönbrunnerstr 244, 1120 Vienna, Austriahttp://www.linbit.com : ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match