Re: [PERFORM] optimizing large query with IN (...)

2004-03-10 Thread Eric Jain
> UPDATE table SET table.par = table.par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200' )

How fast is the query alone, i.e. 

  SELECT * FROM table
  WHERE table.key IN ('value1', 'value2', ... , 'value1200' )




Re: [PERFORM] optimizing large query with IN (...)

2004-03-10 Thread Christopher Kings-Lynne
>> UPDATE table SET table.par = table.par + 1
>> WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
>
> How fast is the query alone, i.e.
>
>   SELECT * FROM table
>   WHERE table.key IN ('value1', 'value2', ... , 'value1200' )

Also, post the output of '\d table' and EXPLAIN ANALYZE UPDATE...

Chris



[PERFORM] Cluster failure due to space

2004-03-10 Thread Shea,Dan [CIS]
I have had a CLUSTER failure on a table.  It was most likely due to space.
I no longer have the error message, but it indicated that it was most
likely a space problem.  The partition was filled to 99%.  The table is
about 56 GB, and what I believe to be the new table it was writing looks
to be 40 files of 1 GB each.

The problem is that it did not clean itself up properly.  

The files for the OIDs that I believe it was writing to are still there.
There are 56 files of 102724113.* and 40 files of 361716097.*.
A vacuum had indicated that there was around 16 GB of free space.
I can not find any reference to 361716097 in the pg_class table.  
Am I going to have to manually delete the 361716097.* files myself?

Dan.



Re: [PERFORM] optimizing large query with IN (...)

2004-03-10 Thread Steve Atkins
On Wed, Mar 10, 2004 at 12:35:15AM -0300, Marcus Andree S. Magalhaes wrote:
> Guys,
> 
> I got a Java program to tune. It connects to a 7.4.1 postgresql server
> running Linux using JDBC.
> 
> The program needs to update a counter on a somewhat large number of
> rows, about 1200 on a ~130k rows table. The query is something like
> the following:
> 
> UPDATE table SET table.par = table.par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
> 
> This query runs in a transaction (by issuing a call to
> setAutoCommit(false)), with a commit() right after the query
> is sent to the backend.
> 
> The process of committing and updating the values is painfully slow
> (no surprises here). Any ideas?

I posted an analysis of use of IN () like this a few weeks ago on
pgsql-general.

The approach you're using is optimal for < 3 values.

For any more than that, insert value1 ... value1200 into a temporary
table, then do

 UPDATE table SET par = par + 1
 WHERE table.key IN (SELECT value FROM temp_table);

Indexing the temporary table marginally increases the speed, but not
significantly.
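
A minimal sketch of that approach, with purely illustrative names
(temp_keys and counters stand in for the real temp table and the table
being updated; the key values are assumed to be text):

 BEGIN;
 CREATE TEMP TABLE temp_keys (value text);
 -- load the ~1200 keys; with JDBC a batched PreparedStatement or COPY
 -- is usually quicker than sending each INSERT separately
 INSERT INTO temp_keys (value) VALUES ('value1');
 INSERT INTO temp_keys (value) VALUES ('value2');
 UPDATE counters SET par = par + 1
  WHERE key IN (SELECT value FROM temp_keys);
 COMMIT;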

Cheers,
  Steve





Re: [PERFORM] Cluster failure due to space

2004-03-10 Thread Tom Lane
"Shea,Dan [CIS]" <[EMAIL PROTECTED]> writes:
> The problem is that it did not clean itself up properly.  

Hm.  It should have done so.  What were the exact filenames and sizes of
the not-deleted files?

> I can not find any reference to 361716097 in the pg_class table.  

You are looking at pg_class.relfilenode, I hope, not pg_class.oid.
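
For instance, a quick check along these lines (a sketch only; 361716097 is
the number from Dan's report) shows whether anything in pg_class still
points at those files, by either relfilenode or OID:

 SELECT oid, relname, relfilenode
 FROM pg_class
 WHERE relfilenode = 361716097 OR oid = 361716097;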

regards, tom lane



Re: [PERFORM] Cluster failure due to space

2004-03-10 Thread Shea,Dan [CIS]



"Shea,Dan [CIS]" <[EMAIL PROTECTED]> writes:
>> The problem is that it did not clean itself up properly.  

>Hm.  It should have done so.  What were the exact filenames and sizes of
>the not-deleted files?
361716097 to 361716097.39 are 1073741824 bytes.
361716097.40 is 186105856 bytes.

> I can not find any reference to 361716097 in the pg_class table.  

>>You are looking at pg_class.relfilenode, I hope, not pg_class.oid.
Yes I am looking through pg_class.relfilenode.

Dan.



Re: [PERFORM] compiling 7.4.1 on Solaris 9

2004-03-10 Thread Andrew Sullivan
On Tue, Mar 02, 2004 at 10:54:23AM +, teknokrat wrote:
> thanks, i remember a thread about problems with flags passed to gcc on 
> solaris. I was wondering if there had been any resolution and if the 
> defaults for 7.4 are considered Ok.

As near as I can tell, -O2 is used by default on Solaris now.  Again,
this is on 8, not 9.  

At work, we have been doing a number of tests on 7.4.  The
performance is such an improvement over 7.2 that the QA folks thought
there must be something wrong.  So I suppose the defaults are ok.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]



Re: [PERFORM] optimizing large query with IN (...)

2004-03-10 Thread Marcus Andree S. Magalhaes

Hmm... from the 'performance' point of view, since the data comes from
a quite complex select statement, isn't it better/quicker to replace
this select with a SELECT INTO that creates a temporary table?



> The problem, as I understand it, is that 7.4 introduced massive
> improvements in handling moderately large in() clauses, as long as they
> can fit in sort_mem, and are provided by a subselect.
>
> So, creating a temp table with all the values in it and using in() on
> the  temp table may be a win:
>
> begin;
> create temp table t_ids(id int);
> insert into t_ids(id) values (123); -- repeat a few hundred times
> select * from maintable where id in (select id from t_ids);
> ...






Re: [PERFORM] optimizing large query with IN (...)

2004-03-10 Thread Steve Atkins
On Wed, Mar 10, 2004 at 02:02:23PM -0300, Marcus Andree S. Magalhaes wrote:

> Hmm... from the 'performance' point of view, since the data comes from
> a quite complex select statement, isn't it better/quicker to replace
> this select with a SELECT INTO that creates a temporary table?

Definitely - why loop the data into the application and back out again
if you don't need to?

> > The problem, as I understand it, is that 7.4 introduced massive
> > improvements in handling moderately large in() clauses, as long as they
> > can fit in sort_mem, and are provided by a subselect.
> >
> > So, creating a temp table with all the values in it and using in() on
> > the  temp table may be a win:
> >
> > begin;
> > create temp table t_ids(id int);
> > insert into t_ids(id) values (123); -- repeat a few hundred times
> > select * from maintable where id in (select id from t_ids);
> > ...
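
A minimal sketch of filling the temp table straight from the server-side
query, using the names from the quoted example plus the counter column
from the original post (all of them illustrative; the inner SELECT stands
in for Marcus' complex select):

 BEGIN;
 CREATE TEMP TABLE t_ids AS
   SELECT id FROM maintable WHERE id < 1000;  -- stand-in for the real select
 ANALYZE t_ids;  -- give the planner row estimates for the temp table
 UPDATE maintable SET par = par + 1
  WHERE id IN (SELECT id FROM t_ids);
 COMMIT;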

Cheers,
  Steve



Re: [PERFORM] optimizing large query with IN (...)

2004-03-10 Thread scott.marlowe
On Wed, 10 Mar 2004, Marcus Andree S. Magalhaes wrote:

> 
> Guys,
> 
> I got a Java program to tune. It connects to a 7.4.1 postgresql server
> running Linux using JDBC.
> 
> The program needs to update a counter on a somewhat large number of
> rows, about 1200 on a ~130k rows table. The query is something like
> the following:
> 
> UPDATE table SET table.par = table.par + 1
> WHERE table.key IN ('value1', 'value2', ... , 'value1200' )
> 
> This query runs in a transaction (by issuing a call to
> setAutoCommit(false)), with a commit() right after the query
> is sent to the backend.
> 
> The process of committing and updating the values is painfully slow
> (no surprises here). Any ideas?

The problem, as I understand it, is that 7.4 introduced massive 
improvements in handling moderately large in() clauses, as long as they 
can fit in sort_mem, and are provided by a subselect.

So, creating a temp table with all the values in it and using in() on the 
temp table may be a win:

begin;
create temp table t_ids(id int);
insert into t_ids(id) values (123); -- repeat a few hundred times
select * from maintable where id in (select id from t_ids);
...





Re: [PERFORM] syslog slowing the database?

2004-03-10 Thread Vivek Khera
> "GS" == Greg Spiegelberg <[EMAIL PROTECTED]> writes:

GS> I've been waiting all day for a pg_restore to finish on a test system
GS> identically configured as our production in hardware and software
GS> with the exception prod is 7.3.5 and test is 7.4.1.

GS> The file it's restoring from is about 8GB uncompressed from a
GS> "pg_dump -b -F t" and after 2 hours the directory the database is in
GS> contains only 1GB.  iostat reported ~2000 blocks written every 2
GS> seconds to the DB file system.

Have you considered increasing the value of checkpoint_segments to
something like 50 or 100 during your restore? It made a *dramatic*
improvement on my system when I did the same migration.
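
A hedged sketch of the kind of change being suggested (the 50 here is only
an example; the right value depends on your hardware and the restore size):

 -- check the current setting from psql
 SHOW checkpoint_segments;
 -- then, in postgresql.conf, raise it for the duration of the restore, e.g.
 --   checkpoint_segments = 50
 -- restart the postmaster, run the restore, and drop it back afterwards.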

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] compiling 7.4.1 on Solaris 9

2004-03-10 Thread Jeff
On Mar 2, 2004, at 5:54 AM, teknokrat wrote:

> Andrew Sullivan wrote:
>> On Thu, Feb 26, 2004 at 12:46:23PM +, teknokrat wrote:
>>> I've read about the place. Would using -O3 be an improvement?
>>
>> In my experience, it's not only not an improvement, it sometimes
>> breaks the code.  That's on 8, though, not 9.
>>
>> A
>
> thanks, i remember a thread about problems with flags passed to gcc on
> solaris. I was wondering if there had been any resolution and if the
> defaults for 7.4 are considered Ok.

Yes.  The compile flags on Solaris were fixed in 7.4.  Previously it
wasn't using any optimization flags.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Re: [PERFORM] [ADMIN] syslog slowing the database?

2004-03-10 Thread Greg Spiegelberg
Tom Lane wrote:
> Greg Spiegelberg <[EMAIL PROTECTED]> writes:
>> I turned syslog back on and the restore slowed down again.  Turned
>> it off and it sped right back up.
>
> We have heard reports before of syslog being quite slow.  What platform
> are you on exactly?  Does Richard's suggestion of turning off syslog's
> fsync help?

RedHat 7.3 w/ 2.4.24 kernel on a dual Intel PIII 1.3GHz, 2GB memory,
U160 internal on an integrated controller, 1Gbps SAN for the database.
The database file being restored and the actual database are on different
disks and controllers than the syslog files.

With the ``-'' in front of the syslog file postgres logs to, I get
roughly 75% of the I/O performance as reported by iostat.  So it helps,
though turning syslog off gives the optimum performance.

If the log and database were on the same disk I'd be okay with the
current workaround.  If the ``-'' gave me near the same performance as
turning syslog off I'd be okay with that too.  However, neither of these
are the case so there has to be something else blocking between the two
processes.
<2 hours and multiple tests later>

I've found that hardware interrupts are the culprit.  Given my system
config, both SCSI and fibre controllers were throttling the system with
the interrupts required to write the data (syslog & database) and read
the data from the restore.  I'm okay with that.

In order from worst to best:

* There were, on average, about 450 interrupts/sec with the default
  config of syslog on one disk, database on the SAN and syslog using
  fsync.

* Turning fsync off in syslog puts interrupts at around 105/sec.

* Having fsync turned off in syslog AND moving the syslog file
  to a filesystem serviced by the same fibre controller puts interrupts
  at around 92/sec.  I decided to do this after watching the I/O on
  the SAN with syslog turned off and found that it had bandwidth to
  spare.  FYI, the system when idle generated about 50 interrupts/sec.
I'm going with the latter for now on the test system and after running
it through its paces with all our processes I'll make the change in
production.  I'll post if I run into anything else.
Greg

BTW, I like what metalog has to offer but I prefer using as many of the
default tools as possible and replacing them only when absolutely
necessary.  What I've learned with syslog here is that it is still
viable but likely requires a minor tweak.  If this tweak fails in
testing I'll look at metalog then.
--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.




Re: [PERFORM] [ADMIN] syslog slowing the database?

2004-03-10 Thread Tom Lane
Greg Spiegelberg <[EMAIL PROTECTED]> writes:
> If the log and database were on the same disk I'd be okay with the
> current workaround.  If the ``-'' gave me near the same performance as
> turning syslog off I'd be okay with that too.  However, neither of these
> are the case so there has to be something else blocking between the two
> processes.

You could also consider not using syslog at all: let the postmaster
output to its stderr, and pipe that into a log-rotation program.
I believe some people use Apache's log rotator for this with good
results.

regards, tom lane



Re: [PERFORM] [ADMIN] syslog slowing the database?

2004-03-10 Thread Mark Harrison
Tom Lane wrote:
> Greg Spiegelberg <[EMAIL PROTECTED]> writes:
>> If the log and database were on the same disk I'd be okay with the
>> current workaround.  If the ``-'' gave me near the same performance as
>> turning syslog off I'd be okay with that too.  However, neither of these
>> are the case so there has to be something else blocking between the two
>> processes.
>
> You could also consider not using syslog at all: let the postmaster
> output to its stderr, and pipe that into a log-rotation program.
> I believe some people use Apache's log rotator for this with good
> results.

I do this... here are the relevant lines from my startup script:

ROTATE="/inst/apache/bin/rotatelogs $PGLOGS/postgresql 86400"
$PGBIN/pg_ctl start -s -D $PGDATA | $ROTATE &
Following is a patch to rotatelogs that does two things:

- makes a symbolic link 'foo.current' that points to the
  current output file.
- gzips the rotated logfile

If you have gnu tools installed, you can
tail --retry --follow=name foo.current
and it will automatically track the most recent
log file.
HTH,
Mark
--
Mark Harrison
Pixar Animation Studios
*** rotatelogs.c-orig   2004-03-10 10:24:02.0 -0800
--- rotatelogs.c    2004-03-10 11:01:55.0 -0800
***
*** 25,30 
--- 25,32 
  int main (int argc, char **argv)
  {
  char buf[BUFSIZE], buf2[MAX_PATH], errbuf[ERRMSGSZ];
+ char linkbuf[MAX_PATH];
+ char oldbuf2[MAX_PATH];
  time_t tLogEnd = 0, tRotation;
  int nLogFD = -1, nLogFDprev = -1, nMessCount = 0, nRead, nWrite;
  int utc_offset = 0;
***
*** 75,80 
--- 77,84 
  setmode(0, O_BINARY);
  #endif
+ sprintf(linkbuf, "%s.current", szLogRoot);
+ sprintf(oldbuf2, "");
  use_strftime = (strstr(szLogRoot, "%") != NULL);
  for (;;) {
  nRead = read(0, buf, sizeof buf);
***
*** 99,104 
--- 103,111 
  sprintf(buf2, "%s.%010d", szLogRoot, (int) tLogStart);
  }
  tLogEnd = tLogStart + tRotation;
+ printf("oldbuf2=%s\n",oldbuf2);
+ printf("buf2=%s\n",buf2);
+ printf("linkbuf=%s\n",linkbuf);
  nLogFD = open(buf2, O_WRONLY | O_CREAT | O_APPEND, 0666);
  if (nLogFD < 0) {
  /* Uh-oh. Failed to open the new log file. Try to clear
***
*** 125,130 
--- 132,146 
  }
  else {
  close(nLogFDprev);
+ /* use: tail --follow=name foo.current */
+ unlink(linkbuf);
+ symlink(buf2,linkbuf);
+ if (strlen(oldbuf2) > 0) {
+ char cmd[MAX_PATH+100];
+ sprintf(cmd, "gzip %s &", oldbuf2);
+ system(cmd);
+ }
+ strcpy(oldbuf2, buf2);
  }
  nMessCount = 0;
  }


Re: [PERFORM] [ADMIN] syslog slowing the database?

2004-03-10 Thread Christopher Kings-Lynne
> You could also consider not using syslog at all: let the postmaster
> output to its stderr, and pipe that into a log-rotation program.
> I believe some people use Apache's log rotator for this with good
> results.

Not an option I'm afraid.  PostgreSQL just jams and stops logging after 
the first rotation...

I've read in the docs that syslog logging is the only "production" 
solution...

Chris



Re: [PERFORM] [ADMIN] syslog slowing the database?

2004-03-10 Thread Chris Smith
It might depend on how you're rotating it.

Try the copy/truncate method instead of moving the log file. If you move
the log file to another filename you usually have to restart the app
doing the logging before it starts logging again.

Chris.






Re: [PERFORM] [ADMIN] syslog slowing the database?

2004-03-10 Thread Tom Lane
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
>> You could also consider not using syslog at all: let the postmaster
>> output to its stderr, and pipe that into a log-rotation program.
>> I believe some people use Apache's log rotator for this with good
>> results.

> Not an option I'm afraid.  PostgreSQL just jams and stops logging after 
> the first rotation...

I know some people use this in production.  Dunno what went wrong in
your test, but it can be made to work.

regards, tom lane
