[PERFORM] VERY slow after many updates

2005-11-19 Thread Alex Wang
I am using PostgreSQL in an embedded system which has only 32 or 64 MB of
RAM (running on a PPC 266 MHz or ARM 266 MHz CPU). I have a table to keep
download tasks. A daemon keeps polling the table and forks a new process
to download data from the internet.


Daemon:
   . Checks the table every 5 seconds
   . Forks a download process if there is a new task
Download process (there are 5 download processes max):
   . Updates the download rate and downloaded size every 3 seconds.

At the beginning everything is just fine and the speed is good. But after
24 hours, access to the database becomes very, very slow. Even if I stop
all processes, restart PostgreSQL and use psql to select data, it is
still very slow (a single SQL command takes more than 2 seconds). It is a
small table; there are only 8 records in it.


The only way I have found to solve it is to remove the whole database,
run initdb, create a new database and insert the records again. I tried
running vacuumdb, but it is still very slow.


Any ideas on how to make it faster?

Thanks,
Alex

--
Here is the table schema:
create table download_queue (
  task_id SERIAL,
  username varchar(128),
  pid int,
  url text,
  filename varchar(1024),
  status int,
  created_time int,
  started_time int,
  total_size int8,
  current_size int8,
  current_rate int,
  CONSTRAINT download_queue_pkey PRIMARY KEY(task_id)
);
CREATE INDEX download_queue_user_index ON download_queue USING BTREE (username);
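
For reference, the statements behind this workload look roughly like the
following (the status convention and the literal values are just
placeholders):

-- daemon, every 5 seconds: look for new tasks
SELECT task_id, url, filename
  FROM download_queue
 WHERE status = 0;               -- 0 = "queued" is an assumed convention

-- each download process, every 3 seconds: report progress
UPDATE download_queue
   SET current_size = 1048576,   -- bytes downloaded so far (placeholder)
       current_rate = 20480      -- bytes/second (placeholder)
 WHERE task_id = 42;             -- placeholder task id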








Re: [PERFORM] VERY slow after many updates

2005-11-19 Thread Csaba Nagy
Alex,

I suppose the table is a kind of 'queue' table, where you
insert/get/delete continuously and the life of the records is short.
Considering that in postgres a delete still leaves the record in the
table's file and in the indexes, just marking it as dead, your table's
actual size can grow quite a lot even if the number of live records
stays small (you will have a lot of dead tuples: the more tasks
processed, the more dead tuples). So I guess you should vacuum this
table very often, so that the dead tuples are reused. I'm not an expert
on this, but it might be good to vacuum after every n deletions, where n
is roughly half the average size of the queue you expect to have. From
time to time you might want to do a vacuum full on it and a reindex.

Right now I guess a vacuum full + reindex will help you. I think it's
best to do:

vacuum download_queue;
vacuum full download_queue;
reindex table download_queue;

I think the plain (non-full) vacuum, which is less obtrusive than the
full one, will do at least some of the work and will bring all the
needed pages into the FS cache, so that the full vacuum is as fast as
possible (vacuum full takes an exclusive lock on the table). At least I
do it this way, with good results, for small queue-like tables...

BTW, I wonder if the download_queue_user_index index is helping you at
all on that table. Do you expect it to grow bigger than 1000 rows?
Otherwise there is no point in indexing it.

HTH,
Csaba.



Re: [PERFORM] VERY slow after many updates

2005-11-19 Thread Alex Wang

Hi Csaba,

Thanks for your reply.

Yes, it's a "queue" table. But I did not perform many insert/delete before 
it becomes slow. After insert 10 records, I just do get/update continuously. 
After 24 hour, the whole database become very slow (not only the 
download_queue table but other tables, too).  But you are right. Full vacuum 
fixes the problem. Thank you very much!


I expect there will be fewer than 1000 records in the table. The index
gives an obvious improvement on "SELECT task_id, username FROM
download_queue WHERE username > '%s'", even when there are only 100
records.
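
For what it's worth, a quick EXPLAIN shows whether the planner actually
picks the index for that kind of query (the username literal here is
made up):

EXPLAIN SELECT task_id, username
  FROM download_queue
 WHERE username > 'alex';

If it reports a sequential scan even with the index present, the index
isn't buying anything at this table size.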


Thanks,
Alex



Re: [PERFORM] VERY slow after many updates

2005-11-19 Thread Csaba Nagy
Just for clarification, an update is actually equal to a delete plus an
insert in Postgres. So if you update rows, it's the same as if you
deleted the row and inserted a new version. So the table is bloating in
this situation too.
I think there is an added problem when you update: to get to a row,
postgres will traverse all the dead rows matching the criteria... so
even if you have an index, fetching a row that has been updated many
times will visit all of its dead versions only to find the one which is
still alive. So in this case vacuuming should happen even more often, to
eliminate the dead rows.
And the index was probably only helping because the table was really
bloated, so if you vacuum it often enough you will be better off without
the index if the row count stays low.
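
If you want to watch the bloat happen, a rough check is to compare the
table's page count with its live tuple estimate in pg_class (both are
only refreshed by VACUUM or ANALYZE):

SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname IN ('download_queue',
                   'download_queue_pkey',
                   'download_queue_user_index');

If the table is bloating, relpages keeps climbing between vacuums even
though reltuples stays small.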

Cheers,
Csaba.


Re: [PERFORM] VERY slow after many updates

2005-11-19 Thread Guido Neitzer

On 19.11.2005, at 13:05 Uhr, Alex Wang wrote:

Yes, it's a "queue" table. But I did not perform many insert/delete  
before it becomes slow. After insert 10 records, I just do get/ 
update continuously.


When PostgreSQL updates a row, it creates a new row with the updated
values. So you should be aware that the DB gets bigger and bigger even
when you only update your rows. Vacuum full reclaims that used space.


The concepts are described in detail in the manual in chapter 12.

cug

--
PharmaLine Essen, GERMANY and
Big Nerd Ranch Europe - PostgreSQL Training, Dec. 2005, Rome, Italy
http://www.bignerdranch.com/classes/postgresql.shtml






Re: [PERFORM] VERY slow after many updates

2005-11-19 Thread Alex Wang
Great information. I didn't know that an update is equal to a delete
plus an insert in Postgres. I will be more careful when designing the
database access patterns in this case.


Thanks,
Alex


Re: [PERFORM] What is the max number of database I can create in

2005-11-19 Thread John McCawley


> However, what is the max number of databases I can create before
> performance goes down?


I know I'm not directly answering your question, but you might want to
consider why you're splitting things up into different logical
databases.  If security is a big concern, you can create different
database users that own the different departments' tables, and each of
your apps can log in as the corresponding user.
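
A minimal sketch of that approach inside a single database (the user,
schema and password names here are made up):

CREATE USER sales_app PASSWORD 'secret';
CREATE USER hr_app    PASSWORD 'secret';

CREATE SCHEMA sales AUTHORIZATION sales_app;  -- sales_app owns these tables
CREATE SCHEMA hr    AUTHORIZATION hr_app;

REVOKE ALL ON SCHEMA sales FROM PUBLIC;       -- keep other departments out
REVOKE ALL ON SCHEMA hr    FROM PUBLIC;

Cross-department reports can then simply join across schemas in the same
database.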

Everyone loves reports.  Once you've got data in your database, people
will ask for a billion reports... Whether or not they know it now, most
likely they're going to want reports that cross the department
boundaries (gross revenue, employee listings, etc.) and that will be
very difficult if you have multiple databases.






Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-19 Thread Luke Lonergan
Mark,

On 11/18/05 6:27 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:

> That too, meaning the business of 1 executor random reading a given
> relation file whilst another is sequentially scanning (some other) part
> of it

I think it should actually improve things - each I/O will read 16MB into the
I/O cache, then the next scanner will seek for 10ms to get the next 16MB
into cache, etc.  It should minimize the seek/data ratio nicely.  As long as
the tables are contiguous it should rock and roll.
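
As a rough sanity check on that ratio (assuming, purely for
illustration, a 100MB/s sequential transfer rate):

   16MB / (100MB/s) = 160ms of transfer per I/O
   10ms seek / 160ms transfer = ~6% seek overhead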

- Luke





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-19 Thread Luke Lonergan
Alan,

On 11/18/05 11:39 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

> Yes and no.   The one cpu is clearly idle.   The second cpu is 40% busy
> and 60% idle (aka iowait in the above numbers).

The "aka iowait" is the problem here - iowait is not idle (otherwise it
would be in the "idle" column).

Iowait is time spent waiting on blocking io calls.  As another poster
pointed out, you have a two CPU system, and during your scan, as predicted,
one CPU went 100% busy on the seq scan.  During iowait periods, the CPU can
be context switched to other users, but as I pointed out earlier, that's not
useful for getting response on decision support queries.

Thanks for your data, it exemplifies many of the points brought up:
- Lots of disks and expensive I/O hardware does not help improve performance
on large table queries because I/O bandwidth does not scale beyond
110-120MB/s on the fastest CPUs
- OLTP performance optimizations are different than decision support

Regards,

- Luke





[PERFORM] Storage/Performance and splitting a table

2005-11-19 Thread Craig A. James

In a recent thread, several people pointed out that UPDATE = DELETE+INSERT.  
This got me to wondering.

I have a table that, roughly, looks like this:

 create table doc (
    id         integer primary key,
    document   text,
    keywords   tsvector
 );

where "keywords" has a GIST index.  There are about 10 million rows in the 
table, and an average of 20 keywords per document.  I have two questions.

First, I occasionally rebuild the keywords, after which the VACUUM FULL ANALYZE 
takes a LONG time - like 24 hours.  Given the UPDATE = DELETE+INSERT, it sounds 
like I'd be better off with something like this:

 create table doc (
    id         integer primary key,
    document   text
 );
 create table keywords (
    id         integer primary key,
    keywords   tsvector
 );

Then I could just drop the GIST index, truncate the keywords table,
rebuild the keywords, and reindex.  My suspicion is that VACUUM FULL
ANALYZE would be quick -- there would be no garbage to collect, so all
it would have to do is the ANALYZE part.
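
Roughly, the rebuild I have in mind would look like this (the index name
and the repopulation query are placeholders):

DROP INDEX keywords_gist_idx;              -- placeholder index name
TRUNCATE keywords;
-- repopulate here, e.g.:
-- INSERT INTO keywords (id, keywords) SELECT id, ... FROM doc;
CREATE INDEX keywords_gist_idx ON keywords USING gist (keywords);
VACUUM ANALYZE keywords;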

My second question: With the doc and keywords split into two tables, would the tsearch2/GIST performance be faster?  The second 
schema's "keywords" table has just pure keywords (no documents); does that translate to fewer blocks being read during 
a tsearch2/GIST query?  Or are the "document" and "keywords" columns of the first schema already stored 
separately on disk so that the size of the "document" data doesn't affect the "keywords" search performance?

Thanks,
Craig



Re: [PERFORM] Storage/Performance and splitting a table

2005-11-19 Thread Michael Stone

On Sat, Nov 19, 2005 at 09:54:23AM -0800, Craig A. James wrote:
> First, I occasionally rebuild the keywords, after which the VACUUM FULL
> ANALYZE takes a LONG time - like 24 hours.


You know you just need vacuum, not vacuum full, right? 


Mike Stone



Re: [PERFORM] Perl DBD and an alarming problem

2005-11-19 Thread Craig A. James



> > When I set statement_timeout in the config file, it just didn't
> > do anything - it never timed out (PG 8.0.3).
>
> ... but did you reload the server after you [changed statement_timeout]?

Mystery solved.  I have two servers; I was reconfiguring one and restarting the 
other.  Duh.
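
For anyone hitting the same thing, a quick sanity check is to ask each
server what it is actually running with:

SHOW statement_timeout;

A value of 0 means the timeout is disabled.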

Thanks,
Craig



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-19 Thread Mark Kirkwood

Luke Lonergan wrote:
> Mark,
>
> On 11/18/05 3:46 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:
>
>> If you alter this to involve more complex joins (e.g. a 4-way star) and
>> (maybe add a small number of concurrent executors too) - is it still the
>> case?
>
> 4-way star, same result, that's part of my point.  With Bizgres MPP, the
> 4-way star uses 4 concurrent scanners, though not all are active all the
> time.  And that's per segment instance - we normally use one segment
> instance per CPU, so our concurrency is NCPUs plus some.


Luke - I don't think I was clear enough about what I was asking, sorry.

I added the more "complex joins" comment because:

- I am happy that seqscan is CPU bound after ~110MB/s (it's CPU bound on
my old P3 system even earlier than that)
- I am curious whether the *other* access methods (indexscan, nested loop,
hash, merge, bitmap) also suffer the same fate.


I'm guessing from your comment that you have tested this too, but I
think it's worth clarifying!


With respect to Bizgres MPP, scan parallelism is a great addition... 
very nice! (BTW - is that in 0.8, or are we talking a new product variant?)


regards

Mark





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-19 Thread Alan Stange

Luke Lonergan wrote:
> Alan,
>
> On 11/18/05 11:39 AM, "Alan Stange" <[EMAIL PROTECTED]> wrote:
>
>> Yes and no.   The one cpu is clearly idle.   The second cpu is 40% busy
>> and 60% idle (aka iowait in the above numbers).
>
> The "aka iowait" is the problem here - iowait is not idle (otherwise it
> would be in the "idle" column).
>
> Iowait is time spent waiting on blocking io calls.  As another poster
> pointed out, you have a two CPU system, and during your scan, as predicted,
> one CPU went 100% busy on the seq scan.  During iowait periods, the CPU can
> be context switched to other users, but as I pointed out earlier, that's not
> useful for getting response on decision support queries.

iowait time is idle time. Period.   This point has been debated 
endlessly for Solaris and other OS's as well.


Here's the man page:

   %iowait
          Show the percentage of time that the CPU or CPUs were idle
          during which the system had an outstanding disk I/O request.

If the system had some other cpu bound work to perform you wouldn't ever 
see any iowait time.  Anyone claiming the cpu was 100% busy on the 
sequential scan using the one set of numbers I posted is 
misunderstanding the actual metrics.



> Thanks for your data, it exemplifies many of the points brought up:
> - Lots of disks and expensive I/O hardware does not help improve performance
> on large table queries because I/O bandwidth does not scale beyond
> 110-120MB/s on the fastest CPUs

I don't think that is the conclusion to draw from the anecdotal numbers
I posted.  This file subsystem doesn't perform as well as expected for
any tool: Bonnie, dd, star, etc., don't get a better data rate either.
In fact, the storage system wasn't built for performance; it was built
to reliably hold a big chunk of data.  Even so, postgresql is reading at
130MB/s on it, using about 30% of a single cpu, almost all of which was
system time.  I would get the same 130MB/s on a system with cpus that
were substantially slower; the limitation isn't the cpus, or postgresql.
It's the IO system that is poorly configured for this test, not
postgresql's ability to use it.


In fact, given the numbers I posted, it's clear this system could
handily generate more than 120MB/s using a single cpu given a better IO
subsystem; it has cpu time to spare.  A simple test can be done: build
the database in /dev/shm and time the scans.  It's the same read()
system call being used, and now one has made the IO system "infinitely
fast".  The claim being made is that standard postgresql is unable to
generate more than 120MB/s of IO on any IO system due to an inefficient
use of the kernel API and excessive memory copies, etc.  Having the
database on a RAM-based file system is an example of "expensive IO
hardware" and all else would be the same.  Hmmm, now that I think about
this, I could throw a medium-sized table onto /dev/shm using tablespaces
on one of our 8GB linux boxes.  So why is this experiment not valid, or
what is it about the above assertion that I am missing?
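
A rough sketch of that experiment (the tablespace location and the
source table name are made up; the directory must already exist and be
owned by the postgres user):

CREATE TABLESPACE ramspace LOCATION '/dev/shm/pgtest';
CREATE TABLE big_copy TABLESPACE ramspace
    AS SELECT * FROM big_table;            -- big_table is a placeholder
-- then time:  SELECT count(1) FROM big_copy;

If the scan rate jumps well past 120MB/s, the bottleneck was the IO
path, not postgresql.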



Anyway, if one cares about high speed sequential IO, then one should use 
a much larger block size to start.   Using 8KB IOs is inappropriate for 
such a configuration.  We happen to be using 32KB blocks on our largest 
database and it's been the best move for us.


-- Alan



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-19 Thread Alan Stange
Another data point. 

We had some down time on our system today to complete some maintenance
work.  I took the opportunity to rebuild the 700GB file system using
XFS instead of Reiser.


One iostat output for 30 seconds is

avg-cpu:  %user   %nice    %sys  %iowait   %idle
           1.58    0.00   19.69    31.94   46.78

Device:    tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdd     343.73    175035.73       277.55    5251072       8326

while doing a select count(1) on the same large table as before.
Subsequent iostat outputs all showed that this data rate was being
maintained.  The system was otherwise mostly idle during this
measurement.


The sequential read rate is 175MB/s.  The system is the same as earlier, 
one cpu is idle and the second is ~40% busy doing the scan and ~60% 
idle.   This is  postgresql 8.1rc1, 32KB block size.  No tuning except 
for using a 1024KB read ahead.


The peak speed of the attached storage is 200MB/s (a 2Gb/s fiber channel 
controller).  I see no reason why this configuration wouldn't generate 
higher IO rates if a faster IO connection were available.


Can you explain again why you think there's an IO ceiling of 120MB/s?
I really don't understand.


-- Alan


