Re: [PERFORM] optimizing large query with IN (...)
> UPDATE table SET table.par = table.par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200')

How fast is the query alone, i.e.

SELECT * FROM table
WHERE table.key IN ('value1', 'value2', ... , 'value1200')

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
Re: [PERFORM] optimizing large query with IN (...)
> UPDATE table SET table.par = table.par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200')

How fast is the query alone, i.e.

SELECT * FROM table
WHERE table.key IN ('value1', 'value2', ... , 'value1200')

Also, post the output of '\d table' and EXPLAIN ANALYZE UPDATE...

Chris
[PERFORM] Cluster failure due to space
I have had a cluster failure on a table. It was most likely due to space.
I do not have the error message anymore, but it indicated that it was most
likely a space problem. The partition was filled to 99%.

The table is about 56 GB, and what I believe to be the new table it was
writing to looks to be 40 files of 1 GB. The problem is that it did not
clean itself up properly. The oids that I believe it was writing to are
still there: there are 56 files of 102724113.* and 40 files of 361716097.*.
A vacuum had indicated that there was around 16 GB of free space.

I can not find any reference to 361716097 in the pg_class table. Am I
going to have to manually delete the 361716097.* files myself?

Dan.
Re: [PERFORM] optimizing large query with IN (...)
On Wed, Mar 10, 2004 at 12:35:15AM -0300, Marcus Andree S. Magalhaes wrote:
> Guys,
>
> I got a Java program to tune. It connects to a 7.4.1 postgresql server
> running Linux using JDBC.
>
> The program needs to update a counter on a somewhat large number of
> rows, about 1200 on a ~130k rows table. The query is something like
> the following:
>
> UPDATE table SET table.par = table.par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200')
>
> This query runs in a transaction (by issuing a call to
> setAutoCommit(false)) and a commit() right after the query
> is sent to the backend.
>
> The process of committing and updating the values is painfully slow
> (no surprises here). Any ideas?

I posted an analysis of use of IN () like this a few weeks ago on
pgsql-general. The approach you're using is optimal for < 3 values.

For any more than that, insert value1 ... value1200 into a temporary
table, then do

UPDATE table SET table.par = table.par + 1
WHERE table.key IN (SELECT value FROM temp_table);

Indexing the temporary table marginally increases the speed, but not
significantly.

Cheers,
Steve
Re: [PERFORM] Cluster failure due to space
"Shea,Dan [CIS]" <[EMAIL PROTECTED]> writes:
> The problem is that it did not clean itself up properly.

Hm. It should have done so. What were the exact filenames and sizes of
the not-deleted files?

> I can not find any reference to 361716097 in the pg_class table.

You are looking at pg_class.relfilenode, I hope, not pg_class.oid.

regards, tom lane
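For anyone following along, the distinction Tom is drawing can be checked
with a query along these lines (a sketch only; relfilenode can diverge
from oid after operations like CLUSTER or TRUNCATE, which is exactly the
situation here):

```sql
-- Map the leftover file number back to a relation, checking both columns.
-- If this returns no rows in the database that owns the files, nothing
-- currently claims them.
SELECT oid, relname, relfilenode
FROM pg_class
WHERE relfilenode = 361716097 OR oid = 361716097;
```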
Re: [PERFORM] Cluster failure due to space
"Shea,Dan [CIS]" <[EMAIL PROTECTED]> writes:
>> The problem is that it did not clean itself up properly.

> Hm. It should have done so. What were the exact filenames and sizes of
> the not-deleted files?

361716097 to 361716097.39 are 1073741824 bytes. 361716097.40 is
186105856 bytes.

>> I can not find any reference to 361716097 in the pg_class table.

> You are looking at pg_class.relfilenode, I hope, not pg_class.oid.

Yes, I am looking through pg_class.relfilenode.

Dan.
Re: [PERFORM] compiling 7.4.1 on Solaris 9
On Tue, Mar 02, 2004 at 10:54:23AM +0000, teknokrat wrote:
> thanks, i remember a thread about problems with flags passed to gcc on
> solaris. I was wondering if there had been any resolution and if the
> defaults for 7.4 are considered Ok.

As near as I can tell, -O2 is used by default on Solaris now. Again,
this is on 8, not 9.

At work, we have been doing a number of tests on 7.4. The performance
is such an improvement over 7.2 that the QA folks thought there must be
something wrong. So I suppose the defaults are ok.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Re: [PERFORM] optimizing large query with IN (...)
Hmm... from the 'performance' point of view, since the data comes from
a quite complex select statement, isn't it better/quicker to have this
select replaced by a select into that creates a temporary table?

> The problem, as I understand it, is that 7.4 introduced massive
> improvements in handling moderately large in() clauses, as long as they
> can fit in sort_mem, and are provided by a subselect.
>
> So, creating a temp table with all the values in it and using in() on
> the temp table may be a win:
>
> begin;
> create temp table t_ids(id int);
> insert into t_ids(id) values (123); <- repeat a few hundred times
> select * from maintable where id in (select id from t_ids);
> ...
Re: [PERFORM] optimizing large query with IN (...)
On Wed, Mar 10, 2004 at 02:02:23PM -0300, Marcus Andree S. Magalhaes wrote:
> Hmm... from the 'performance' point of view, since the data comes from
> a quite complex select statement, isn't it better/quicker to have this
> select replaced by a select into that creates a temporary table?

Definitely - why loop the data into the application and back out again
if you don't need to?

> > The problem, as I understand it, is that 7.4 introduced massive
> > improvements in handling moderately large in() clauses, as long as
> > they can fit in sort_mem, and are provided by a subselect.
> >
> > So, creating a temp table with all the values in it and using in()
> > on the temp table may be a win:
> >
> > begin;
> > create temp table t_ids(id int);
> > insert into t_ids(id) values (123); <- repeat a few hundred times
> > select * from maintable where id in (select id from t_ids);
> > ...

Cheers,
Steve
Re: [PERFORM] optimizing large query with IN (...)
On Wed, 10 Mar 2004, Marcus Andree S. Magalhaes wrote:
> Guys,
>
> I got a Java program to tune. It connects to a 7.4.1 postgresql server
> running Linux using JDBC.
>
> The program needs to update a counter on a somewhat large number of
> rows, about 1200 on a ~130k rows table. The query is something like
> the following:
>
> UPDATE table SET table.par = table.par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200')
>
> This query runs in a transaction (by issuing a call to
> setAutoCommit(false)) and a commit() right after the query
> is sent to the backend.
>
> The process of committing and updating the values is painfully slow
> (no surprises here). Any ideas?

The problem, as I understand it, is that 7.4 introduced massive
improvements in handling moderately large in() clauses, as long as they
can fit in sort_mem and are provided by a subselect.

So, creating a temp table with all the values in it and using in() on
the temp table may be a win:

begin;
create temp table t_ids(id int);
insert into t_ids(id) values (123); <- repeat a few hundred times
select * from maintable where id in (select id from t_ids);
...
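The shape of the temp-table rewrite can be sketched outside PostgreSQL.
Below is a minimal, self-contained illustration using Python's sqlite3
module (table and column names are made up, and sqlite is standing in
for the 7.4 backend, so this only demonstrates the rewrite itself, not
the planner improvement being discussed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A stand-in for the ~130k-row table with a counter column.
cur.execute("CREATE TABLE maintable (key TEXT PRIMARY KEY, par INTEGER)")
cur.executemany(
    "INSERT INTO maintable VALUES (?, 0)",
    [(f"value{i}",) for i in range(1, 5001)],
)

# Instead of a literal IN ('value1', ..., 'value1200'), load the keys
# into a temp table and let the UPDATE use an IN (subselect).
cur.execute("CREATE TEMP TABLE t_ids (id TEXT)")
cur.executemany(
    "INSERT INTO t_ids VALUES (?)",
    [(f"value{i}",) for i in range(1, 1201)],
)
cur.execute(
    "UPDATE maintable SET par = par + 1 "
    "WHERE key IN (SELECT id FROM t_ids)"
)
conn.commit()

updated = cur.execute(
    "SELECT count(*) FROM maintable WHERE par = 1"
).fetchone()[0]
print(updated)  # 1200 rows got their counter bumped
```

With JDBC, the inserts into the temp table would be batched in the same
transaction as the UPDATE, so the round trip count stays small.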
Re: [PERFORM] syslog slowing the database?
> "GS" == Greg Spiegelberg <[EMAIL PROTECTED]> writes:

GS> I've been waiting all day for a pg_restore to finish on a test system
GS> identically configured as our production in hardware and software
GS> with the exception prod is 7.3.5 and test is 7.4.1.

GS> The file it's restoring from is about 8GB uncompressed from a
GS> "pg_dump -b -F t" and after 2 hours the directory the database is in
GS> contains only 1GB. iostat reported ~2000 blocks written every 2
GS> seconds to the DB file system.

Have you considered increasing the value of checkpoint_segments to
something like 50 or 100 during your restore? It made a *dramatic*
improvement on my system when I did the same migration.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]       Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
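For reference, the tweak Vivek describes is a one-line change in
postgresql.conf (the value 100 follows his suggestion; each WAL segment
is 16 MB, so budget disk space accordingly, and remember to lower the
setting again after the restore):

```
# postgresql.conf -- during a bulk restore, more segments between
# checkpoints means far less checkpoint I/O churn.
checkpoint_segments = 100    # 7.4 default is 3
```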
Re: [PERFORM] compiling 7.4.1 on Solaris 9
On Mar 2, 2004, at 5:54 AM, teknokrat wrote:
> Andrew Sullivan wrote:
>> On Thu, Feb 26, 2004 at 12:46:23PM +0000, teknokrat wrote:
>>> I've read about the place. Would using -O3 be an improvement?
>>
>> In my experience, it's not only not an improvement, it sometimes
>> breaks the code. That's on 8, though, not 9.
>>
>> A
>
> thanks, i remember a thread about problems with flags passed to gcc on
> solaris. I was wondering if there had been any resolution and if the
> defaults for 7.4 are considered Ok.

Yes. The compile flags on solaris were fixed in 7.4. Previously it
wasn't using any optimization flags.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
Re: [PERFORM] [ADMIN] syslog slowing the database?
Tom Lane wrote:
> Greg Spiegelberg <[EMAIL PROTECTED]> writes:
>> I turned syslog back on and the restore slowed down again. Turned it
>> off and it sped right back up.
>
> We have heard reports before of syslog being quite slow. What platform
> are you on exactly? Does Richard's suggestion of turning off syslog's
> fsync help?

RedHat 7.3 w/ 2.4.24 kernel on a dual Intel PIII 1.3GHz, 2GB memory,
U160 internal on integrated controller, 1Gbps SAN for database. The
database file being restored and the actual database are on different
disks and controllers than the syslog files.

With the ``-'' in front of the syslog file postgres logs to, I get
roughly 75% of the I/O performance as reported by iostat. So it helps,
though turning syslog off gives the optimum performance. If the log and
database were on the same disk I'd be okay with the current workaround.
If the ``-'' gave me near the same performance as turning syslog off
I'd be okay with that too. However, neither of these is the case, so
there has to be something else blocking between the two processes.

<2 hours and multiple tests later>

I've found that hardware interrupts are the culprit. Given my system
config, both the SCSI and fibre controllers were throttling the system
with the interrupts required to write the data (syslog & database) and
read the data from the restore. I'm okay with that.

In order of worst to best:

 * There were, on average, about 450 interrupts/sec with the default
   config of syslog on one disk, database on the SAN, and syslog using
   fsync.
 * Turning fsync off in syslog put interrupts around 105/sec.
 * Having fsync turned off in syslog AND moving the syslog file to a
   filesystem serviced by the same fibre controller put interrupts at
   around 92/sec. I decided to do this after watching the I/O on the
   SAN with syslog turned off and finding that it had bandwidth to
   spare.

FYI, the system when idle generated about 50 interrupts/sec.

I'm going with the latter for now on the test system, and after running
it through its paces with all our processes I'll make the change in
production. I'll post if I run into anything else.

Greg

BTW, I like what metalog has to offer, but I prefer using as many of
the default tools as possible and replacing them only when absolutely
necessary. What I've learned about syslog here is that it is still
viable but likely requires a minor tweak. If this tweak fails in
testing I'll look at metalog then.

--
Greg Spiegelberg
Sr. Product Development Engineer
Cranel, Incorporated.
Phone: 614.318.4314
Fax:   614.431.8388
Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.
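For readers unfamiliar with the ``-'' trick discussed above: in a
classic syslogd configuration, a leading dash on the file path disables
the fsync after each message. A sketch of the relevant /etc/syslog.conf
entry (the local0 facility and log path are assumptions; use whatever
syslog_facility and destination your postgresql.conf points at):

```
# /etc/syslog.conf -- the leading '-' tells syslogd not to fsync the
# file after every message, trading a little durability for throughput.
local0.*    -/var/log/postgresql.log
```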
Re: [PERFORM] [ADMIN] syslog slowing the database?
Greg Spiegelberg <[EMAIL PROTECTED]> writes:
> If the log and database were on the same disk I'd be okay with the
> current workaround. If the ``-'' gave me near the same performance as
> turning syslog off I'd be okay with that too. However, neither of
> these is the case, so there has to be something else blocking between
> the two processes.

You could also consider not using syslog at all: let the postmaster
output to its stderr, and pipe that into a log-rotation program. I
believe some people use Apache's log rotator for this with good results.

regards, tom lane
Re: [PERFORM] [ADMIN] syslog slowing the database?
Tom Lane wrote:
> You could also consider not using syslog at all: let the postmaster
> output to its stderr, and pipe that into a log-rotation program. I
> believe some people use Apache's log rotator for this with good
> results.

I do this... here's the relevant lines from my startup script:

ROTATE="/inst/apache/bin/rotatelogs $PGLOGS/postgresql 86400"
$PGBIN/pg_ctl start -s -D $PGDATA | $ROTATE &

Following is a patch to rotatelogs that does two things:

- makes a symbolic link 'foo.current' that points to the current
  output file.
- gzips the rotated logfile

If you have gnu tools installed, you can

tail --retry --follow=name foo.current

and it will automatically track the most recent log file.

HTH,
Mark

--
Mark Harrison
Pixar Animation Studios

*** rotatelogs.c-orig   2004-03-10 10:24:02.000000000 -0800
--- rotatelogs.c        2004-03-10 11:01:55.000000000 -0800
***************
*** 25,30 ****
--- 25,32 ----
  int main (int argc, char **argv)
  {
      char buf[BUFSIZE], buf2[MAX_PATH], errbuf[ERRMSGSZ];
+     char linkbuf[MAX_PATH];
+     char oldbuf2[MAX_PATH];
      time_t tLogEnd = 0, tRotation;
      int nLogFD = -1, nLogFDprev = -1, nMessCount = 0, nRead, nWrite;
      int utc_offset = 0;
***************
*** 75,80 ****
--- 77,84 ----
      setmode(0, O_BINARY);
  #endif

+     sprintf(linkbuf, "%s.current", szLogRoot);
+     sprintf(oldbuf2, "");
      use_strftime = (strstr(szLogRoot, "%") != NULL);
      for (;;) {
          nRead = read(0, buf, sizeof buf);
***************
*** 99,104 ****
--- 103,111 ----
              sprintf(buf2, "%s.%010d", szLogRoot, (int) tLogStart);
          }
          tLogEnd = tLogStart + tRotation;
+         printf("oldbuf2=%s\n", oldbuf2);
+         printf("buf2=%s\n", buf2);
+         printf("linkbuf=%s\n", linkbuf);
          nLogFD = open(buf2, O_WRONLY | O_CREAT | O_APPEND, 0666);
          if (nLogFD < 0) {
              /* Uh-oh. Failed to open the new log file. Try to clear */
***************
*** 125,130 ****
--- 132,146 ----
          }
          else {
              close(nLogFDprev);
+             /* use: tail --follow=name foo.current */
+             unlink(linkbuf);
+             symlink(buf2, linkbuf);
+             if (strlen(oldbuf2) > 0) {
+                 char cmd[MAX_PATH+100];
+                 sprintf(cmd, "gzip %s &", oldbuf2);
+                 system(cmd);
+             }
+             strcpy(oldbuf2, buf2);
          }
          nMessCount = 0;
      }
Re: [PERFORM] [ADMIN] syslog slowing the database?
> You could also consider not using syslog at all: let the postmaster
> output to its stderr, and pipe that into a log-rotation program. I
> believe some people use Apache's log rotator for this with good
> results.

Not an option I'm afraid. PostgreSQL just jams and stops logging after
the first rotation... I've read in the docs that syslog logging is the
only "production" solution...

Chris
Re: [PERFORM] [ADMIN] syslog slowing the database?
It might depend on how you're rotating it.

Try the copy/truncate method instead of moving the log file. If you
move the log file to another filename, you usually have to restart the
app doing the logging before it starts logging again.

Chris.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
Christopher Kings-Lynne
Sent: Thursday, March 11, 2004 12:35 PM
To: Tom Lane
Cc: Greg Spiegelberg; PgSQL Performance ML; Postgres Admin List
Subject: Re: [PERFORM] [ADMIN] syslog slowing the database?

> You could also consider not using syslog at all: let the postmaster
> output to its stderr, and pipe that into a log-rotation program. I
> believe some people use Apache's log rotator for this with good
> results.

Not an option I'm afraid. PostgreSQL just jams and stops logging after
the first rotation... I've read in the docs that syslog logging is the
only "production" solution...

Chris
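The copy/truncate idea can be sketched in a few lines of shell (the file
names are hypothetical; the point is that the live file is emptied in
place rather than renamed away, so a writer holding it open keeps
logging to the same inode):

```shell
#!/bin/sh
# A stand-in for an application log.
LOG=/tmp/demo.log
: > "$LOG"
echo "line 1" >> "$LOG"

# Copy/truncate rotation: archive the contents, then truncate the live
# file in place. A process with the file open never needs restarting,
# because its open descriptor still points at the same inode.
cp "$LOG" "$LOG.1"
: > "$LOG"

echo "line 2" >> "$LOG"

cat "$LOG.1"   # the archived copy holds the old contents
cat "$LOG"     # the live log now holds only new lines
```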
Re: [PERFORM] [ADMIN] syslog slowing the database?
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
>> You could also consider not using syslog at all: let the postmaster
>> output to its stderr, and pipe that into a log-rotation program.
>> I believe some people use Apache's log rotator for this with good
>> results.

> Not an option I'm afraid. PostgreSQL just jams and stops logging after
> the first rotation...

I know some people use this in production. Dunno what went wrong in
your test, but it can be made to work.

regards, tom lane