[PERFORM] How to determine cause of performance problem?

2005-09-23 Thread Joost Kraaijeveld
Hi,

I must convert an old table into a new table. The conversion runs at
~100 records per second. Given that I must convert 40 million records,
that amounts to roughly 111 hours: far more than the 48-hour weekend I
have for the conversion ;-).

The tables are rather simple: both tables only have a primary key
constraint (of type text) and no other indexes. I only copy 3 columns. I
use Java for the conversion. For the exact code see below.

During the conversion my processor load is almost non-existent. The
hard disk throughput is ~6 megabyte/second at most (measured with iostat).

My platform is Debian Sarge AMD64. My hardware is a Tyan Thunder K8W
2885 motherboard, 2 Opteron 248 processors, 2 GB RAM, a SATA bootdisk
with / and swap, and a 3Ware 9500S-8 RAID-5 controller with 5 attached
SATA disks with /home and /var. /var contains *all* PostgreSQL log and
database files (default Debian installation).

Output of hdparm -Tt /dev/sdb (sdb is the RAID partition):

/dev/sdb:
 Timing cached reads:   1696 MB in  2.00 seconds = 846.86 MB/sec
 Timing buffered disk reads:  246 MB in  3.01 seconds =  81.79 MB/sec


I want to determine the cause of my performance problem (if it is one).

1. Is this the performance I can expect?
2. If not, how can I determine the cause?
3. Can I improve the performance somehow without replacing my hardware,
e.g. by tweaking the software?
4. Is there a Linux (Debian) tool that I can use to benchmark write
performance?



-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 


The Java code I use for the conversion ("connection" is an open
java.sql.Connection, "selectStatement" a Statement created from it):

ResultSet resultSet = selectStatement.executeQuery(
    "select ordernummer, orderdatum, klantnummer from odbc.orders" +
    " order by ordernummer");

connection.setAutoCommit(false);

PreparedStatement ordersInsertStatement = connection.prepareStatement(
    "insert into prototype.orders" +
    " (objectid, ordernumber, orderdate, customernumber) values (?,?,?,?)");

int record = 0;  // row counter

while (resultSet.next()) {
    if ((++record % 100) == 0) {
        System.err.println("handling record: " + record);
    }

    // the next line can do > 1,000,000 objectIds/sec
    String orderObjectId = ObjectIdGenerator.newObjectId();
    ordersInsertStatement.setString(1, orderObjectId);
    ordersInsertStatement.setInt(2, resultSet.getInt("ordernummer"));
    ordersInsertStatement.setDate(3, resultSet.getDate("orderdatum"));
    ordersInsertStatement.setInt(4, resultSet.getInt("klantnummer"));

    ordersInsertStatement.execute();
}

connection.commit();
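For what it's worth, a minimal sketch of how the per-row round trips
could be reduced with the standard JDBC batch API (the batch size of
1000 is an arbitrary choice, not something from the original code):

// inside the while loop, instead of ordersInsertStatement.execute():
ordersInsertStatement.addBatch();
if ((record % 1000) == 0) {
    ordersInsertStatement.executeBatch();  // one round trip per 1000 rows
}

// after the loop, flush the remaining rows before committing:
ordersInsertStatement.executeBatch();
connection.commit();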




Re: [PERFORM] How to determine cause of performance problem?

2005-09-23 Thread Joost Kraaijeveld
On Fri, 2005-09-23 at 05:55 -0400, Michael Stone wrote:
> It's not clear what your object id generator does. If it's just a
> sequence, it's not clear that you need this program at all--just use a
> SELECT INTO and make the object id a SERIAL.
It generates a GUID (and no, I do not want to turn this into a
discussion about GUIDs). As the comment in the Java code says,
generating the GUID is not the problem: I can generate millions of
them per second.

> If you do need to control the object id or do some other processing
> before putting the data into the new table, rewrite to use a COPY
> instead of an INSERT.
It is simply the shortest piece of code that shows the poor
performance. The real conversion problem is much, much larger and much
more complicated.

I suspect that either my hardware is too slow (but then again, see the
specs), or my Debian is too slow, or my PostgreSQL settings are wrong.

But I have no clue where to begin determining the bottleneck (it may
even be normal performance for all I know: I have no experience with
converting such a large database).

Any suggestions?




Re: [PERFORM] How to determine cause of performance problem?

2005-09-23 Thread Joost Kraaijeveld
On Fri, 2005-09-23 at 13:19 +0200, Arjen van der Meijden wrote:
> Another suggestion:
> How many indexes and constraints are on the new table?
As mentioned in the first mail: in this tables only primary key
constraints, no other indexes or constraints.

> Drop all of them and recreate them once the table is filled. Of course 
> that only works if you know your data will be ok (which is normal for 
> imports of already conforming data like database dumps of existing tables).
> This will give major performance improvements, if you have indexes and 
> such on the new table.
I will test this for a performance improvement, but still, I wonder
whether ~100 inserts/second is a reasonable performance for my
software/hardware combination.



Re: [PERFORM] How to determine cause of performance problem?

2005-09-23 Thread Joost Kraaijeveld
On Fri, 2005-09-23 at 07:05 -0400, Michael Stone wrote: 
> On Fri, Sep 23, 2005 at 12:21:15PM +0200, Joost Kraaijeveld wrote:
> >On Fri, 2005-09-23 at 05:55 -0400, Michael Stone wrote:
> I didn't say it was, did I? 
No, you did not. But only last week someone's head was (luckily for him,
only virtually) almost chopped off for suggesting the use of GUIDs ;-)


> Ok, that's great, but you didn't respond to the suggestion of using COPY
> INTO instead of INSERT.
Part of the code I left out does some data conversions (e.g. from
path-to-file to blob, and from text to date, which is not castable
because of the homebrew original format). I don't believe that I can do
these in a SQL statement, can I (my knowledge of SQL as a language is
not that good)? However, I will investigate whether I can do the
conversion in two steps and check if that is faster.
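For what it's worth, a sketch of the two-step idea for the columns that
are directly convertible; new_guid() is a placeholder for whatever GUID
generator would be available server-side, not a built-in function:

-- step 1: set-based copy of the castable columns in one statement
INSERT INTO prototype.orders (objectid, ordernumber, orderdate, customernumber)
SELECT new_guid(), ordernummer, orderdatum, klantnummer
FROM odbc.orders;

-- step 2: do the non-castable conversions (path-to-file -> blob,
-- homebrew text -> date) afterwards from application code, updating
-- only the affected columns.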

But still, I wonder if ~100 inserts/second is a reasonable performance
for my software/hardware combination.




Re: [PERFORM] How to determine cause of performance problem?

2005-09-23 Thread Joost Kraaijeveld
On Fri, 2005-09-23 at 10:33 -0400, Tom Lane wrote:
> Is the client code running on the same machine as the database server?
> If not, what's the network delay and latency between them?
Yes, it is running on the same machine.


> The major problem you're going to have here is at least one network
> round trip per row inserted --- possibly more, if the jdbc driver is
> doing "helpful" stuff behind your back like starting/committing
> transactions.
OK, I will look into that.

But do you maybe have a pointer to info, or know tools that can
measure, what my machine is doing during all the time it appears to be
doing nothing? Something like the Performance Monitor in Windows, but
then for Linux?
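The usual starting points on Linux are vmstat, iostat and top (iostat
is in Debian's sysstat package); a minimal sketch:

vmstat 1          # per-second CPU, memory, swap and block-I/O summary
iostat -x -k 1    # per-device utilization, queue length, service times
top               # per-process CPU and memory usage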



[PERFORM] pgbench results interpretation?

2005-11-01 Thread Joost Kraaijeveld
Hi,

I am trying to optimize my Debian Sarge AMD64 PostgreSQL 8.0
installation, based on the recommendations from "the Annotated
POSTGRESQL.CONF Guide for PostgreSQL"
(http://www.powerpostgresql.com/Downloads/annotated_conf_80.html).
To see the effect of the recommendations I use pgbench from
postgresql-contrib.

I have 3 questions about pgbench:

1. Is there a repository somewhere that shows results, using and
documenting different kinds of hard- and software setups so that I can
compare my results with someone elses?

2. Is there a reason for the difference in values from run-to-run of
pgbench:

The command I used (nothing else is done on the machine, not even mouse
movement):
[EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 1000 test

Results for 4 consecutive runs:

tps = 272.932982 (including connections establishing)
tps = 273.262622 (excluding connections establishing)

tps = 199.501426 (including connections establishing)
tps = 199.674937 (excluding connections establishing)

tps = 400.462117 (including connections establishing)
tps = 401.218291 (excluding connections establishing)

tps = 223.695331 (including connections establishing)
tps = 223.919031 (excluding connections establishing)

3. It appears that running more transactions with the same number of
clients leads to a drop in the transactions per second. I do not
understand why this is (a drop with more clients I do understand). Is
this because of the way pgbench works, the way PostgreSQL works, or
even Linux?

[EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 10 test
tps = 379.218809 (including connections establishing)
tps = 461.968448 (excluding connections establishing)

[EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 100 test
tps = 533.878031 (including connections establishing)
tps = 546.571141 (excluding connections establishing)

[EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 1000 test
tps = 204.30 (including connections establishing)
tps = 204.533627 (excluding connections establishing)

[EMAIL PROTECTED]:/usr/lib/postgresql/8.0/bin$ ./pgbench -c 10 -t 10000 test
tps = 121.486803 (including connections establishing)
tps = 121.493681 (excluding connections establishing)
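For reference, a pgbench database is initialized separately, and the
pgbench documentation advises a scaling factor (-s) of at least the
number of clients, since each client repeatedly updates one row of the
branches table; a sketch:

./pgbench -i -s 10 test       # initialize with scaling factor 10
./pgbench -c 10 -t 1000 test  # then benchmark with 10 clients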


TIA



Re: [PERFORM] pgbench results interpretation?

2005-11-01 Thread Joost Kraaijeveld
Hi Gavin,

Thanks for answering.

On Tue, 2005-11-01 at 20:16 +1100, Gavin Sherry wrote:
> On Tue, 1 Nov 2005, Joost Kraaijeveld wrote:
> > 1. Is there a repository somewhere that shows results, using and
> > documenting different kinds of hard- and software setups so that I can
> > compare my results with someone elses?
> 
> Other than the archives of this mailing list, no.
OK.

> >
> > 2. Is there a reason for the difference in values from run-to-run of
> > pgbench:
> Well, firstly: pgbench is not a good benchmarking tool. 
Is there a reason why that is the case? I would like to understand why.
Is it because the transactions are too small/large? Or the queries?
Or is it just experience?

> It is mostly used
> to generate load. Secondly, the numbers are suspicious: do you have fsync
> turned off? 
In the first trials I posted yes, in the second no.

> Do you have write caching enabled? If so, you'd want to make
> sure that cache is battery backed. 
I am aware of that, but for now, I am mostly interested in the effects
of the configuration parameters. I won't do this at home ;-)


> Thirdly, the effects of caching will be
> seen on subsequent runs.
In that case I would expect mostly rising values. I only copied and
pasted the 4 trials that were available in my xterm at the time of
writing my email, but I could expand the list ad infinitum: the
variance between the runs is very large. I would also expect that if
there is no shortage of memory for caching, the effect would be
negligible, but I may be wrong. Part of using pgbench is learning about
performance, not achieving it.

> > 3. It appears that running more transactions with the same amount of
> > clients leads to a drop in the transactions per second. I do not
> > understand why this is (a drop from more clients I do understand). Is
> > this because of the way pgbench works, the way PostgrSQL works or even
> > Linux?
> This degradation seems to suggest effects caused by the disk cache filling
> up (assuming write caching is enabled) and checkpointing.
Which disk cache are you referring to? The onboard hard disk or RAID5
controller caches, or the OS cache? The first two I can understand, but
if you refer to the OS cache I do not understand what I am seeing. I
have enough memory given the size of the database: during these tests
fsync was on, and the files could easily be cached in memory
(effective_cache_size = 32768 pages, which is ~256 MB; the complete
database directory is 228 MB).



Re: [PERFORM] pgbench results interpretation?

2005-11-02 Thread Joost Kraaijeveld
On Wed, 2005-11-02 at 21:16 +1100, Gavin Sherry wrote:
> connections are updating the branches table heavily. As an aside, did you
> initialise with a scaling factor of 10 to match your level of concurrency?
Yep, I did.


> that. The hackers list archive also contains links to the testing Mark
> Wong has been doing at OSDL with TPC-C and TPC-H. Taking a look at the
> configuration file he is using, along with the annotated postgresql.conf,
> would be useful, depending on the load you're antipating and your
> hardware.
I will look into that project.

> Well, two things may be at play. 1) If you are using write caching on
> your controller/disks, then at the point at which that cache fills up,
> performance will degrade to roughly what you could expect if a
> write-through cache were being used. 2) We checkpoint the system
> periodically to ensure that recovery won't be too long a job. Running
> pgbench for only a few seconds, you will not see the effect of
> checkpointing, which usually runs once every 5 minutes.
I still think it is strange. Simple tests with tar suggest that I could
easily do 600-700 tps at 50,000 KB/second (as measured by iostat), and
a test with bonnie++ measured throughput of more than 40,000 KB/sec
over very long periods, with 1723-2121 operations per second. These
numbers suggest that PostgreSQL is not using all it could from the
hardware. Processor load, however, is negligible during the pgbench
tests.

As written before, I will look into the OSDL benchmarks. Maybe they are
better suited to my needs: *understanding* the determinants of
performance.

> 
> Hope this helps.
You certainly did, thanks.



[PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware Raid5 / Debian??

2005-11-06 Thread Joost Kraaijeveld
Hi,

I am experiencing very long update queries and I want to know whether
it is reasonable to expect them to perform better.

The query below has been running for more than 1.5 hours (5500 seconds)
now, while the rest of the system does nothing (I don't even type or
move a mouse...).

- Is that to be expected?
- Is 180-200 tps with ~9000 KB/s (see the iostat output below) not low,
given the fact that fsync is off? (Note: with bonnie++ I get write
performance > 50 MB/sec and read performance > 70 MB/sec with > 2000
read/write ops/sec.)
- Does anyone else have experience with the 3Ware RAID controller
(which is my suspect)?
- Any good idea how to determine the real bottleneck if this is not the
performance I can expect?

My hard- and software:

- PostgreSQL 8.0.3
- Debian 3.1 (Sarge) AMD64   
- Dual Opteron 
- 4GB RAM
- 3ware Raid5 with 5 disks

Pieces of my postgresql.conf (All other is default):
shared_buffers = 7500
work_mem = 260096
fsync=false
effective_cache_size = 32768



The query with explain (amount and orderbedrag_valuta are float8,
ordernummer and ordernumber int4):

explain update prototype.orders
set amount = odbc.orders.orderbedrag_valuta
from odbc.orders
where ordernumber = odbc.orders.ordernummer;

                              QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=50994.74..230038.17 rows=1104379 width=466)
   Hash Cond: ("outer".ordernumber = "inner".ordernummer)
   ->  Seq Scan on orders  (cost=0.00..105360.68 rows=3991868 width=455)
   ->  Hash  (cost=48233.79..48233.79 rows=1104379 width=15)
         ->  Seq Scan on orders  (cost=0.00..48233.79 rows=1104379 width=15)


Sample output from iostat during the query (about average):

Device:     tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hdc        0.00         0.00         0.00          0          0
sda        0.00         0.00         0.00          0          0
sdb      187.13        23.76      8764.36         24       8852




Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-06 Thread Joost Kraaijeveld
On Sun, 2005-11-06 at 12:17 -0500, Tom Lane wrote:
> Does that table have any triggers that would fire on the update?
Alas: no triggers, constraints, foreign keys, or indexes (have I
forgotten something?).

All queries are slow. E.g. (after a vacuum):

select objectid from prototype.orders

Explain analyze (with PgAdmin):

Seq Scan on orders  (cost=0.00..58211.79 rows=1104379 width=40) (actual time=441.971..3252.698 rows=1104379 loops=1)
Total runtime: 5049.467 ms

Actual execution time: 82163 ms (without retrieving the data)

 


Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-06 Thread Joost Kraaijeveld
Hi Tom,

On Sun, 2005-11-06 at 15:26 -0500, Tom Lane wrote:
> I'm confused --- where's the 82sec figure coming from, exactly?
From actually executing the query.

From PgAdmin:

-- Executing query:
select objectid from prototype.orders

Total query runtime: 78918 ms.
Data retrieval runtime: 188822 ms.
1104379 rows retrieved.


> We've heard reports of performance issues in PgAdmin with large
> result sets ... if you do the same query in psql, what happens?
[EMAIL PROTECTED]:~/postgresql$ time psql muntdev -c "select objectid from
prototype.orders" > output.txt

real0m5.554s
user0m1.121s
sys 0m0.470s


Now *I* am confused. What does PgAdmin do more than giving the query to
the database?

(BTW: I have repeated both measurements and the numbers above were all
from the last measurement I did and are about average)



Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-06 Thread Joost Kraaijeveld
On Mon, 2005-11-07 at 12:37 +0800, Christopher Kings-Lynne wrote:
> > Now *I* am confused. What does PgAdmin do more than giving the query to
> > the database?
> 
> It builds it into the data grid GUI object.

Is that not the difference between the total query runtime and the data
retrieval runtime (see below)?

-- Executing query:
select objectid from prototype.orders

Total query runtime: 78918 ms.
Data retrieval runtime: 188822 ms.
1104379 rows retrieved.


Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-06 Thread Joost Kraaijeveld
Hi Christopher,

On Mon, 2005-11-07 at 12:37 +0800, Christopher Kings-Lynne wrote:
> > Now *I* am confused. What does PgAdmin do more than giving the query to
> > the database?
> 
> It builds it into the data grid GUI object.
But my initial question was about a query that does not produce data at
all (well, only a response from the server saying that it is finished).
I broke that query off after several hours.

I am now running the query from my initial question with psql (for more
than 1 hour now, in a transaction, fsync off).

Some statistics:

uptime:
06:35:55 up  9:47,  6 users,  load average: 7.08, 7.21, 6.08

iostat -x -k 1 (this output appears to be representative):

avg-cpu:  %user   %nice    %sys  %iowait   %idle
           1.00    0.00    0.50    98.51    0.00

Device:        sda        sdb
rrqm/s        0.00       0.00
wrqm/s       14.00     611.00
r/s           0.00       1.00
w/s           3.00     201.00
rsec/s        0.00      32.00
wsec/s      136.00    6680.00
rkB/s         0.00      16.00
wkB/s        68.00    3340.00
avgrq-sz     45.33      33.23
avgqu-sz      0.00     145.67
await         0.67     767.19
svctm         0.67       4.97
%util         0.20     100.30



Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-07 Thread Joost Kraaijeveld
Hi Dave,

On Mon, 2005-11-07 at 08:51 +, Dave Page wrote: 
> > On Sun, 2005-11-06 at 15:26 -0500, Tom Lane wrote:
> > > I'm confused --- where's the 82sec figure coming from, exactly?
> > From actually executing the query.
> > 
> > From PgAdmin:
> > 
> > -- Executing query:
> > select objectid from prototype.orders
> > 
> > Total query runtime: 78918 ms.
> > Data retrieval runtime: 188822 ms.
> > 1104379 rows retrieved.
> > 
> > 
> > > We've heard reports of performance issues in PgAdmin with large
> > > result sets ... if you do the same query in psql, what happens?
> > [EMAIL PROTECTED]:~/postgresql$ time psql muntdev -c "select objectid from
> > prototype.orders" > output.txt
> > 
> > real0m5.554s
> > user0m1.121s
> > sys 0m0.470s
> > 
> > 
> > Now *I* am confused. What does PgAdmin do more than giving 
> > the query to
> > the database?
> 
> Nothing - it just uses libpq's pqexec function. The speed issue in
> pgAdmin is rendering the results in the grid which can be slow on some
> OS's due to inefficiencies in some grid controls with large data sets.
> That's why we give 2 times - the first is the query runtime on the
> server, the second is data retrieval and rendering (iirc, it's been a
> while).
That is what I thought. But then what could explain the difference in
query runtime (78 seconds versus 5 seconds)?



Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-15 Thread Joost Kraaijeveld
Hi Dave,

On Mon, 2005-11-14 at 18:51 -0500, Dave Cramer wrote:
> Joost,
> 
> I've got experience with these controllers; which version do you have?
> I'd expect to see higher than 50MB/s, although I've never tried RAID 5.
> 
> I routinely see closer to 100MB/s with RAID 1+0 on their 9000 series
OK, then there must be hope.

> I would also suggest that shared buffers should be higher than 7500,
> closer to 30000, and effective cache should be up around 200k
In my current 8.1 situation I use shared_buffers = 40000 and
effective_cache_size = 131072.

> work_mem is awfully high, remember that this will be given to each  
> and every connection and can be more than 1x this number per  
> connection depending on the number of sorts
> done in the query.
I use such a high number because I am the only user querying and my
queries do sorted joins etc. 


> fsync=false ? I'm not even sure why we have this option, but I'd  
> never set it to false.
I want as much speed as possible for a database conversion that MUST be
handled in one weekend (at the current speed it would last almost 7
centuries; I may be off by a millennium). If it fails because of a
hardware problem (the only reason we want and need fsync?) we will try
again the next weekend, until it finally goes right.

What I can see is that only the *write* performance of *long updates*
(and not inserts) is slow, and they get slower over time: the first few
thousand go relatively fast, after that PostgreSQL crawls to a halt
(other "benchmarks" like bonnie++ or just dd'ing a big file don't show
this behavior).

I did notice that changing the I/O scheduler's nr_requests from the
default 128 to 1024 or even 4096 made a remarkable performance
improvement. I suspect that experimenting with other I/O schedulers
could improve performance further, but it is hard to find any useful
documentation about I/O schedulers.
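For reference, both knobs can be inspected and changed at runtime
through sysfs on a 2.6 kernel (sdb is the RAID array here; run as root):

cat /sys/block/sdb/queue/scheduler             # e.g. noop anticipatory [cfq]
echo deadline > /sys/block/sdb/queue/scheduler
echo 1024 > /sys/block/sdb/queue/nr_requests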





Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-15 Thread Joost Kraaijeveld
Hi Luke,

On Tue, 2005-11-15 at 10:42 -0800, Luke Lonergan wrote:
> With RAID5, it could matter a lot what block size you run your “dd
> bigfile” test with. You should run “dd if=/dev/zero of=bigfile bs=8k
> count=500000” for a 2GB main memory machine, and multiply the count by
> (your RAM / 2GB).
If I understand correctly (I have 4GB ram):

[EMAIL PROTECTED]:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)

Which looks suspicious: 26308 MB/sec???

> It is very important with the 3Ware cards to match the driver to the
> firmware revision.
OK, I am running one driver version behind the firmware.
   
> I did notice that changing the I/O scheduler's nr_request from the
> default 128 to 1024 or even 4096 made a remarkable performance
> improvement. I suspect that experimenting with other I/O schedulers
> could improve performance. But it is hard to find any useful
> documentation about I/O schedulers.
> 
> You could try deadline, there's no harm, but I've found that when you
> reach the point of experimenting with schedulers, you are probably not
> addressing the real problem.
It depends. I/O schedulers (I assume) have a purpose: some schedulers
should be more appropriate for some circumstances. And maybe my
specific circumstance (converting a database with *many* updates) is
such a case. I really don't know.

> On a 3Ware 9500 with HW RAID5 and 4 or more disks I think you should
> get 100MB/s write rate, which is double what Postgres can use.  We
> find that Postgres, even with fsync=false, will only run at a net COPY
> speed of about 8-12 MB/s, where 12 is the Bizgres number.  8.1 might
> do 10.  But to get the 10 or 12, the WAL writing and other writing is
> about 4-5X more than the net write speed, or the speed at which the
> input file is parsed and read into the database.
As I have an (almost) separate WAL disk: iostat does not show any
significant writing on the WAL disk.

> So, if you can get your “dd bigfile” test to write data at 50MB/s+
> with a blocksize of 8KB, you should be doing well enough.
See above.

> Incidentally, we also find that using the XFS filesystem and setting
> the readahead to 8MB or more is extremely beneficial for performance
> with the 3Ware cards (and with others, but especially for the older
> 3Ware cards).
I don't have problems with my read performance, but *only* with my
*update* performance (and not even insert performance). But then again,
I am not the only one with these problems:

http://www.issociate.de/board/goto/894541/3ware_+_RAID5_+_xfs_performance.html#msg_894541
http://lkml.org/lkml/2005/4/20/110
http://seclists.org/lists/linux-kernel/2005/Oct/1171.html

I am happy to share the tables against which I am running my checks.



Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-15 Thread Joost Kraaijeveld
On Tue, 2005-11-15 at 12:41 -0700, Steve Wampler wrote:
> Joost Kraaijeveld wrote:
> > If I understand correctly (I have 4GB ram):
> > 
> > [EMAIL PROTECTED]:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
> > 1000000+0 records in
> > 1000000+0 records out
> > 8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)
> > 
> > Which looks suspicious: 26308 MB/sec???
> 
> Eh?  That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.
Oooops. This calculation error is not typical for my testing (I think ;-)).



Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-15 Thread Joost Kraaijeveld
Hi Luke,

On Tue, 2005-11-15 at 22:07 -0800, Luke Lonergan wrote:

> You might update your driver, 
I will do that (though I admit that I am not looking forward to it.
When I was young and did not make money with my computer, I liked
challenges like compiling kernels and not being able to boot the
computer afterwards. Not any more :-)).


> 
> WAL on a separate disk, on a separate controller?  What is the write
> performance there?
WAL is on a separate disk and a separate controller, write performance:

[EMAIL PROTECTED]:/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 166.499230 seconds (49201429 bytes/sec)

The quest continues...




Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-16 Thread Joost Kraaijeveld
Hi Luke,


> It is very important with the 3Ware cards to match the driver to the
> firmware revision.
> So, if you can get your “dd bigfile” test to write data at 50MB/s+
> with a blocksize of 8KB, you should be doing well enough.

I recompiled my kernel, added the driver and:

[EMAIL PROTECTED]:~$ dmesg | grep 3w
3ware 9000 Storage Controller device driver for Linux v2.26.03.019fw.
scsi4 : 3ware 9000 Storage Controller
3w-9xxx: scsi4: Found a 3ware 9000 Storage Controller at 0xfd8ffc00,
IRQ: 28.
3w-9xxx: scsi4: Firmware FE9X 2.08.00.005, BIOS BE9X 2.03.01.052, Ports:
8.


[EMAIL PROTECTED]:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 200.982055 seconds (40759858 bytes/sec)

Which is a remarkable increase in speed (38.9 MB/sec vs. 25.7 MB/sec).

Thanks for your suggestions.




[PERFORM] Can this query go faster???

2005-12-06 Thread Joost Kraaijeveld
Hi,

Is it possible to make this query run faster than it does now, by
adding indexes or changing the query?

SELECT customers.objectid FROM prototype.customers, prototype.addresses
WHERE
customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745

Explain:

Limit  (cost=90956.71..90956.71 rows=1 width=55)
  ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
        Sort Key: addresses.zipcode, addresses.housenumber
        ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
              Hash Cond: ("outer".contactaddress = "inner".objectid)
              ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
              ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                    ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)

The customers table has an index on contactaddress and objectid.
The addresses table has an index on zipcode+housenumber and objectid.

TIA



Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Joost Kraaijeveld
On Tue, 2005-12-06 at 10:52 +0100, Csaba Nagy wrote:
> Joost,
> 
> Why do you use an offset here? I guess you're traversing the table
> somehow; in this case it would be better to remember the last zipcode +
> housenumber and add a condition to get the next one bigger than the
> last you've got. That would use the index on zipcode+housenumber and
> be very fast. The big offset forces postgres to traverse that many
> entries until it's able to pick the one row for the result.
I am forced to translate a sort-order-dependent record number to a
record in the database. The GUI (a Java JTable) works with record/row
numbers, which is handy if one has an ISAM database, but not if one
uses PostgreSQL.

I wonder if using a forward scrolling cursor would be faster.
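For reference, a sketch of the "remember the last key" approach Csaba
describes; the literal values stand in for the zipCode and housenumber
of the last row already shown:

SELECT customers.objectid
FROM prototype.customers, prototype.addresses
WHERE customers.contactaddress = addresses.objectid
  AND (zipCode > '6524NB'
       OR (zipCode = '6524NB' AND housenumber > 14))
ORDER BY zipCode asc, housenumber asc
LIMIT 1;

This can walk the zipcode+housenumber index from the remembered
position instead of recounting all the rows before the offset.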



Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Joost Kraaijeveld
Hi Tino,

On Tue, 2005-12-06 at 11:32 +0100, Tino Wildenhain wrote:
> You can have a row number in postgres easily too. For example if you
> just include a serial for the row number.
Not if the order of things is determined at runtime rather than at
insert time...

> A cursor would work too, but you would need a persistent connection.
I just tried it: a cursor is not faster (which does not surprise me at
all, as the amount of work looks the same to me).

I guess there is no solution.



Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Joost Kraaijeveld
On Tue, 2005-12-06 at 12:36 +0100, Tino Wildenhain wrote:
> > 
> > I just tried it: a cursor is not faster (what does not surprise me at
> > all, as the amount of work looks the same to me)
> 
> Actually no: if you scroll forward, you just ask the database for the
> next rows to materialize. So if you are ahead in your database and
> ask for the next rows, it should be faster than working with an
> offset from the start each time.
Ah, a misunderstanding: I only need to calculate an index if the user
wants a record that is not in or adjacent to the cache (when it is
adjacent, I can do a "select values > last value in the cache"). So in
the other cases I must materialize all rows below the wanted index.



[PERFORM] When to vacuum a table?

2006-11-26 Thread Joost Kraaijeveld
Hi,

Are there guidelines (or any empirical data) available on how to
determine how often a table should be vacuumed for optimum performance,
or is this an experience / trial-and-error thing?
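As a starting point, the rule the integrated autovacuum daemon uses to
decide when a table needs a vacuum can serve as a guideline (the
numbers below are placeholders, not recommended values):

# vacuum a table once its obsolete (updated/deleted) tuple count exceeds:
#   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * <rows>
# e.g. with threshold = 1000 and scale_factor = 0.4, a 1,000,000-row
# table would be vacuumed after ~401,000 dead tuples.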

TIA 



[PERFORM] Which query analiser tools are available?

2006-12-03 Thread Joost Kraaijeveld
Hi,

I am struggling with the performance of a JBoss-based J2EE application
with CMP 2.1 beans, using PostgreSQL as the database back-end.

Because JBoss is largely responsible for the SQL queries that are sent
to the back-end, I would like to see the queries that are actually
received by PostgreSQL (insert, select, update and delete), together
with the number of times they are called, the average execution time,
the total execution time, etc.

I have tried PQA (http://pgfoundry.org/projects/pqa/) but it does not
seem to work with PostgreSQL 8.1.5 on Debian Etch: I get output saying
"Continuation for no previous query" and no statistics. As I don't know
anything about Ruby, I am lost here.

Can I "repair" PQA somehow (without resorting to a crash course "Ruby
for *extreme* Ruby dummies") or are there any other, preferably open
source (or extremely cheap ;-)) and multi-platform (Linux and Windows
2000+), tools that can gather the statistics that I want?

One last question: can I get the select queries in PostgreSQL *without*
all the internal PostgreSQL selects that appear in the log files if I
set log_statement to "ddl" or "all", or should I try to catch them with
a judiciously chosen log_min_duration_statement?
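For what it's worth, a minimal postgresql.conf sketch of the
duration-based variant (the statistics would still have to be
aggregated afterwards by a log analyzer such as PQA):

log_min_duration_statement = 0   # log every statement with its duration
log_statement = 'none'           # no separate statement logging on top
log_line_prefix = '%t [%p] '     # timestamp + PID, so lines can be grouped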

TIA 



[PERFORM] How to move pg_xlog to another drive on Windows????

2006-12-04 Thread Joost Kraaijeveld
How can I move pg_xlog to another drive on Windows? On Linux I can use
a symlink, but how do I do that on Windows?
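One commonly suggested approach, sketched here assuming an NTFS volume
and the free Sysinternals "junction" tool (the service name and paths
are examples only):

net stop pgsql-8.1
move "C:\Program Files\PostgreSQL\8.1\data\pg_xlog" "D:\pg_xlog"
junction "C:\Program Files\PostgreSQL\8.1\data\pg_xlog" "D:\pg_xlog"
net start pgsql-8.1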



[PERFORM] Locking in PostgreSQL?

2006-12-05 Thread Joost Kraaijeveld
Does PostgreSQL lock the entire row in a table if I update only 1
column?




[PERFORM] is file size relevant in choosing index or table scan?

2007-05-22 Thread Joost Kraaijeveld
Hi,

I have a table with a file size of 400 MB and an index of 100 MB. Does
PostgreSQL take the file sizes of both the table and the index into
account when determining whether it should do a table scan or an index
scan?

TIA

Joost



[PERFORM] Vacuum takes forever

2007-05-29 Thread Joost Kraaijeveld
Hi,

I am currently running a vacuum analyze through PgAdmin on a PostgreSQL
8.1.9 database; it takes forever without doing anything: no
(noticeable) disk activity and no (noticeable) CPU activity.

The message tab in PgAdmin says:

...
Detail: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 568.16 sec


and lots of entries looking just like this (0% CPU, > 500 secs).

There are no other connections to the database, and the machine is not
doing anything else than me typing this e-mail and playing Metallica
MP3's.

Could this be because of my cost-based vacuum delay settings?

vacuum_cost_delay = 200
vacuum_cost_page_hit = 6
#vacuum_cost_page_miss = 10 # 0-1 credits
#vacuum_cost_page_dirty = 20# 0-1 credits
vacuum_cost_limit = 100
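A rough back-of-the-envelope of what these settings imply:

# vacuum sleeps vacuum_cost_delay (200 ms) each time it has accumulated
# vacuum_cost_limit (100) cost points; at vacuum_cost_page_hit = 6 that
# is at most ~16 cached pages per sleep, i.e. ~80 pages/sec, well under
# 1 MB/s of 8 kB pages, which is enough to make a vacuum crawl.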





Re: [PERFORM] Vacuum takes forever

2007-05-29 Thread Joost Kraaijeveld
On Tue, 2007-05-29 at 19:16 +0200, PFC wrote:
> 
> > Could this be because of my Cost-Based Vacuum Delay settings ?
> 
>   Yeah. It is supposed to slow down VACUUM so it doesn't kill your
> server, but it is not aware of the load. It will also slow it down if
> there is no load. That is its purpose after all ;)
>   If you want a fast vacuum, issue SET vacuum_cost_delay = 0; beforehand.
Thanks, I tried it and it worked. I did not know that changing this
setting could result in such a performance drop (I just followed advice
I read on http://www.powerpostgresql.com/PerfList/, which mentioned a
tripling of the execution time, not a change from 8201819 ms to
17729 ms).



Re: [PERFORM] Vacuum takes forever

2007-05-29 Thread Joost Kraaijeveld
On Tue, 2007-05-29 at 21:43 +0100, Dave Page wrote:
> Cliff, Jason or Rob era? Could be important...
Cliff and Jason.

Rob is in my Ozzy collection ;-)


Re: [PERFORM] 3WARE Card performance boost?

2006-01-18 Thread Joost Kraaijeveld
On Wed, 2006-01-18 at 10:26 -0800, Benjamin Arai wrote:
> My original plan was to buy a 3WARE card and put a 1GB of memory on it
> to improve writes but I am not sure if that is actually going to help
> the issue if fsync=off.
My experience with a 3Ware 9500S-8 card is rather disappointing,
especially the write performance, which is extremely poor. Reading
is OK.


-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] How to query and index for customer with lastname and city

2006-03-04 Thread Joost Kraaijeveld

Hi,

I have two tables:

Customer: objectid, lastname, fk_address
Address: objectid, city

I want to select all customers with a name >= some_name and living in a
city >= some_city, with all comparisons case-insensitive.

Below is what I actually have. Given the fact that it takes forever
(more than 6 seconds) to get a result, there must be something wrong
with my solution or my expectation. Can anyone tell me what I should do
to make this query go faster (or convince me to wait for the result
;-( )?


SELECT customers.objectid FROM prototype.customers,prototype.addresses 
WHERE
customers.contactAddress = addresses.objectId 
AND 
( 
 TRIM(UPPER(lastName)) >= TRIM(UPPER('some_name'))
 AND 
 TRIM(UPPER(city)) >= TRIM(UPPER('some_city'))
)  
order by TRIM(UPPER(lastname)), TRIM(UPPER(city)) 

Explain analyze after a full analyze and vacuum:

Sort  (cost=54710.68..54954.39 rows=97484 width=111) (actual time=7398.971..7680.405 rows=96041 loops=1)
  Sort Key: btrim(upper(customers.lastname)), btrim(upper(addresses.city))
  ->  Hash Join  (cost=14341.12..46632.73 rows=97484 width=111) (actual time=1068.862..5472.788 rows=96041 loops=1)
        Hash Cond: ("outer".contactaddress = "inner".objectid)
        ->  Seq Scan on customers  (cost=0.00..24094.01 rows=227197 width=116) (actual time=0.018..1902.646 rows=223990 loops=1)
              Filter: (btrim(upper(lastname)) >= 'JANSEN'::text)
        ->  Hash  (cost=13944.94..13944.94 rows=158473 width=75) (actual time=1068.467..1068.467 rows=158003 loops=1)
              ->  Bitmap Heap Scan on addresses  (cost=1189.66..13944.94 rows=158473 width=75) (actual time=71.259..530.986 rows=158003 loops=1)
                    Recheck Cond: (btrim(upper(city)) >= 'NIJMEGEN'::text)
                    ->  Bitmap Index Scan on prototype_addresses_trim_upper_city  (cost=0.00..1189.66 rows=158473 width=0) (actual time=68.290..68.290 rows=158003 loops=1)
                          Index Cond: (btrim(upper(city)) >= 'NIJMEGEN'::text)
Total runtime: 7941.095 ms


I have indexes on:
fki_customers_addresses
customers.lastname (both lastname and trim(upper(lastname)))
addresses.city (both city and trim(upper(city)))


Re: [PERFORM] How to query and index for customer with lastname

2006-03-04 Thread Joost Kraaijeveld
Hi Hubert,

On Sat, 2006-03-04 at 14:49 +0100, hubert depesz lubaczewski wrote:
> >  Sort  (cost=54710.68..54954.39 rows=97484 width=111) (actual
> > time=7398.971..7680.405 rows=96041 loops=1)
> >Sort Key: btrim(upper(customers.lastname)), btrim(upper(addresses.city))
> >->  Hash Join  (cost=14341.12..46632.73 rows=97484 width=111) (actual 
> > time=1068.862..5472.788 rows=96041 loops=1)
> >  Hash Cond: ("outer".contactaddress = "inner".objectid)
> >  ->  Seq Scan on customers  (cost=0.00..24094.01 rows=227197 
> > width=116) (actual time=0.018..1902.646 rows=223990 loops=1)
> >Filter: (btrim(upper(lastname)) >= 'JANSEN'::text)
> >  ->  Hash  (cost=13944.94..13944.94 rows=158473 width=75) (actual 
> > time=1068.467..1068.467 rows=158003 loops=1)
> >->  Bitmap Heap Scan on addresses  (cost=1189.66..13944.94 
> > rows=158473 width=75) (actual time=71.259..530.986 rows=158003 loops=1)
> >  Recheck Cond: (btrim(upper(city)) >= 'NIJMEGEN'::text)
> >  ->  Bitmap Index Scan on prototype_addresses_trim_upper_city  (cost=0.00..1189.66 rows=158473 width=0) (actual time=68.290..68.290 rows=158003 loops=1)
> >        Index Cond: (btrim(upper(city)) >= 'NIJMEGEN'::text)
> >  Total runtime: 7941.095 ms
> 
> explain clearly shows, that index is used for addresses scan, but it
Yes, but I do not understand why I have both a "Bitmap Index Scan" and
a "Bitmap Heap Scan" on (btrim(upper(city)) >= 'NIJMEGEN'::text).

> is not so for users.
> explain estimates that 227197 customers match the lastname criteria,
> which looks awfully high.
> how many record do you have in the customers table?
368915 of which 222465 actually meet the condition. 

From what I understand from the mailing list, PostgreSQL prefers a
table scan whenever it expects that the number of records in the result
set will be more than ~10% of the total number of records in the table.
That explains the table scan on customers, but then again, it does not
explain why it uses the index on addresses: it has 369337 addresses of
which 158003 meet the condition.

> I would try to create an index test on customers(contactAddress,
> trim(upper(lastname))), or with the other ordering of fields.
> 
> Try this: create the index, analyze the customers table, and recheck
> the explain. Then try the second index in the same manner.
Makes no difference.



[PERFORM] Can anyone explain this pgbench results?

2006-03-06 Thread Joost Kraaijeveld
Hi,

Below are some results of running pgbench on a machine that is doing
nothing else than running PostgreSQL with pgbench. The strange thing is
that the results *constantly alternate* between high (750-850 tps) and
low (50-80 tps), no matter how many tests I run. If I wait a long time
(more than 5 minutes) after running the tests, I always get a high
score, followed by a low one, followed by a high one, a low one, etc.

I was expecting a low(ish) score on the first run (because the tables
are not loaded in the cache yet), followed by continuously high(ish)
scores, but not an alternating pattern. I also did not expect so much
difference, given the hardware I have (dual Opteron, 4GB memory, 3Ware
SATA RAID5 with 5 disks, separate swap and pg_xlog disks).

Anyone any idea?

Results of pgbench:

[EMAIL PROTECTED]:/usr/lib/postgresql/8.1/bin$ ./pgbench  -c 10 -t 150 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 150
number of transactions actually processed: 1500/1500
tps = 50.651705 (including connections establishing)
tps = 50.736338 (excluding connections establishing)
[EMAIL PROTECTED]:/usr/lib/postgresql/8.1/bin$ ./pgbench  -c 10 -t 150 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 150
number of transactions actually processed: 1500/1500
tps = 816.972995 (including connections establishing)
tps = 836.951755 (excluding connections establishing)
[EMAIL PROTECTED]:/usr/lib/postgresql/8.1/bin$ ./pgbench  -c 10 -t 150 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 150
number of transactions actually processed: 1500/1500
tps = 42.924294 (including connections establishing)
tps = 42.986747 (excluding connections establishing)
[EMAIL PROTECTED]:/usr/lib/postgresql/8.1/bin$ ./pgbench  -c 10 -t 150 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 150
number of transactions actually processed: 1500/1500
tps = 730.651970 (including connections establishing)
tps = 748.538852 (excluding connections establishing)


TIA



Re: [PERFORM] Can anyone explain this pgbench results?

2006-03-06 Thread Joost Kraaijeveld
Hi Michael,

Thanks for your response.

Michael Fuhr wrote:
> On Mon, Mar 06, 2006 at 04:29:49PM +0100, Joost Kraaijeveld wrote:
>> Below are some results of running pgbench, run on a machine that
>> is doing nothing else than running PostgreSQL woth pgbench. The
>> strange thing is that the results are *constantly alternating* hight
>> (750-850 transactions)and low (50-80 transactions), no matter how
>> many test I run. If I wait a long time (> 5 minutes) after running
>> the test, I always get a hight score, followed by a low one, followed
>> by a high one, low one etc.
> 
> The default checkpoint_timeout is 300 seconds (5 minutes).  Is it
> coincidence that the "long time" between fast results is about the
> same? 
I have not measured the "long wait time", but I can run multiple tests
in 3 minutes: the fast test lasts 3 secs, the slow one 40 secs (see
below). During the tests there is not much activity on the partition
holding the log files (a different controller and disk than the
database and swap).

[EMAIL PROTECTED]:/usr/lib/postgresql/8.1/bin$ time ./pgbench  -c 10 -t 150 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 150
number of transactions actually processed: 1500/1500
tps = 531.067258 (including connections establishing)
tps = 541.694790 (excluding connections establishing)

real0m2.892s
user0m0.105s
sys 0m0.145s


[EMAIL PROTECTED]:/usr/lib/postgresql/8.1/bin$ time ./pgbench  -c 10 -t 150 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 150
number of transactions actually processed: 1500/1500
tps = 37.064000 (including connections establishing)
tps = 37.114023 (excluding connections establishing)

real0m40.531s
user0m0.088s
sys 0m0.132s

>What's your setting?  
Default.

> Are your test results more consistent
> if you execute CHECKPOINT between them?
Could you tell me how I could do that?
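For the archives: it turns out to be a plain SQL command that a
superuser can issue between runs, e.g. from psql:

CHECKPOINT;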





Re: [PERFORM] Can anyone explain this pgbench results?

2006-03-07 Thread Joost Kraaijeveld
Hi Michael,

Michael Fuhr wrote:
>>> Have you tweaked postgresql.conf at all?  If so, what non-default
>>> settings are you using?
>> 
>> Yes, I have tweaked the following settings:
>> 
>> shared_buffers = 40000
>> work_mem = 512000
>> maintenance_work_mem = 512000
>> max_fsm_pages = 40000
>> effective_cache_size = 131072
> 
> Are you sure you need work_mem that high?  How did you decide on
> that value? 
I have used
http://www.powerpostgresql.com/Downloads/annotated_conf_80.html,
expecting that the differences between 8.0 and 8.1 do not invalidate
the recommendations. I have checked with (some of) my (large) queries
and adjusted upward until I had no temp files in
PGDATA/base/DB_OID/pgsql_tmp.

> Are all other settings at their defaults?  
Yep.

> No changes to the write-ahead log (WAL) or background writer
> (bgwriter) settings?
No, because the aforementioned document explicitly states that it has
no recommendations on these subjects.

> What version of PostgreSQL are you running?  The paths in your
> original message suggest 8.1.x.
Debian Etch's 8.1.0-3.

> A checkpoint updates the database files with the data from the
> write-ahead log; you're seeing those writes to the database partition.
> The postmaster does checkpoints every checkpoint_timeout seconds
> (default 300) or every checkpoint_segment log segments (default 3);
> it also uses a background writer to trickle pages to the database
> files between checkpoints so the checkpoints don't have as much
> work to do.  I've been wondering if your pgbench runs are being
> affected by that background activity; the fact that you get
> consistently good performance after forcing a checkpoint suggests
> that that might be the case. 
OK, thanks. 

To be sure I understand it correctly:

1. Every update/insert is first written to a WAL log file in the
PGDATA/pg_xlog directory.
2. Routinely, the background writer then writes the changes to the
PGDATA/base/DB_OID/ directory.
3. The postmaster forces a checkpoint after 300 secs or when the log
segments are full (whichever comes first?), so that the WAL log files
are emptied (I assume those are the changes that the background writer
has not yet written since the last forced checkpoint?).

> If you run pgbench several times without intervening checkpoints,
> do your postmaster logs have any messages like "checkpoints are
> occurring too frequently"?  It might be useful to increase
> checkpoint_warning up to the value of checkpoint_timeout and then
> see if you get any such messages during pgbench runs.  If checkpoints
> are happening a lot more often than every checkpoint_timeout seconds
> then try increasing checkpoint_segments (assuming you have the disk
> space).  After doing so, restart the database and run pgbench several
> times without intervening checkpoints and see if performance is
> more consistent.
I will try that this day.

> Note that tuning PostgreSQL for pgbench performance might be
> irrelevant for your actual needs unless your usage patterns happen
> to resemble what pgbench does.

The advantage of using pgbench is that it is a short, repeatable command that 
reproduces behaviour I also see in actual real-world usage.

My problem is with the raw performance of my disk array (3Ware 9500S-8 SATA 
RAID5 controller with 5 disks). I am having *very* serious performance problems 
if I do large updates on my databases. E.g. an update of 1 (boolean) column in 
a table (update prototype.customers set deleted = false) that has 368915 
records lasts forever (> 3500 secs). The only noticeable disk activity during 
such an update is on the disk/partition that has the PGDATA/base/DB_OID/ 
directory (/dev/sdc, the 3Ware 9500S-8 RAID5 array). There is *no* noticeable 
disk activity on the disk/partition that has the PGDATA/pg_xlog directory 
(/dev/sdb, on a Sil 3114 on-board SATA controller). The throughput during the 
update is ~ 2 MB/sec, while the throughput during a large file copy or a run of 
bonnie (a benchmark) is > 40 MB/sec. My primary goal is to understand the 
difference (and then sue the guilty ones ;-)), and then maybe either learn to 
live with it or find a solution. The number of write operations/sec during the 
update is ~ 2000/sec; at ~ 2 MB/sec that is roughly 1 KB per write. I suspect 
that the RAID card cannot handle a lot of small write operations (with fsync?) 
in a short time without a performance penalty (and yes, the write cache on the 
controller is enabled).


Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Can anyone explain this pgbench results?

2006-03-07 Thread Joost Kraaijeveld
Hi Michael,

Michael Fuhr wrote:
> If you run pgbench several times without intervening checkpoints,
> do your postmaster logs have any messages like "checkpoints are
> occurring too frequently"?  It might be useful to increase
> checkpoint_warning up to the value of checkpoint_timeout and then
> see if you get any such messages during pgbench runs.  If checkpoints
> are happening a lot more often than every checkpoint_timeout seconds
> then try increasing checkpoint_segments (assuming you have the disk
> space).  After doing so, restart the database and run pgbench several
> times without intervening checkpoints and see if performance is
> more consistent.
I got the "checkpoints are occurring too frequently" message. Increasing 
checkpoint_segments from the default 3 to 10 resulted in more consecutive runs 
without a performance penalty (~ 5-6 runs), and the performance penalty itself 
is also a little smaller. It takes several minutes for the background writer to 
catch up.

This will solve my problems at the customer's site (they do not run so many 
sales transactions per second), but not my own problem while converting the old 
database to a new database :-(. Maybe I should invest in other hardware, or 
re-arrange my RAID5 into a RAID10 (or 50?).

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Can anyone explain this pgbench results?

2006-03-07 Thread Joost Kraaijeveld

Jim C. Nasby wrote:
 
> Speaking of 'disks', what's your exact layout? Do you have a 5 drive
> raid5 for the OS and the database, 1 drive for swap and 1 drive for
> pg_xlog?

On a Sil SATA 3114 controller:
/dev/sda OS + Swap
/dev/sdb /var with pg_xlog

On the 3Ware 9500S-8, 5 disk array:
/dev/sdc with the database (and very safe, my MP3 collection ;-))

As I wrote in one of my posts to Michael, I suspect that the card is not 
handling the amount of write operations as well as I expected. I wonder if 
anyone else sees the same characteristics with this kind of card.


Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Can anyone explain this pgbench results?

2006-03-07 Thread Joost Kraaijeveld
Jim C. Nasby wrote:
> Well, the problem is that you're using RAID5, which has a huge write
> overhead. You're unlikely to get good performance with it.
Apparently. But I had no idea that the performance hit would be that big. 

Running bonnie or copying a large file with dd shows that the card can do 30-50 
MB/sec. Running a large update on my PostgreSQL database, however, shows a 
throughput of ~ 2 MB/sec, doing between ~ 2300 and 2500 writes/second (average), 
with utilisation almost always at 100%, large await times (almost always > 700), 
and large iowait percentages (> 50%), all measured with iostat.
 
> Also, it sounds like sda and sdb are not mirrored. If that's the case,
> you have no protection from a drive failure taking out your entire
> database, because you'd lose pg_xlog.
> 
> If you want better performance your best bets are to either
> setup RAID10 or if you don't care about the data, just go to RAID0.
Because it is just my development machine I think I will opt for the latter 
option. More disk space left, too.


Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] x206-x225

2006-03-10 Thread Joost Kraaijeveld
On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:
> Your ATA disk is lying about disk caching being turned off. Assuming 
> each insert is in a separate transaction, then it's not going to do 
> 10,000 / 6 = 1667 transactions/sec - that's faster than its rotational 
> speed.
Could you explain the calculation? Why should the number of transactions
be related to the rotational speed of the disk, without saying anything
about the number of bytes per rotation?
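
My own back-of-the-envelope guess at the reasoning (please correct me if this
is wrong): if every committed transaction must be fsynced to the platter, and
the drive can complete at most one such synced write per revolution, then a
7200 RPM disk caps out at 7200 / 60 = 120 transactions per second no matter
how few bytes each transaction writes. 10,000 / 6 = 1667 transactions/sec
would then only be possible if the drive acknowledges writes from its cache.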

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] x206-x225

2006-03-11 Thread Joost Kraaijeveld
On Fri, 2006-03-10 at 23:57 -0800, David Lang wrote:
> On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:
> 
> > On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:
> >> Your ATA disk is lying about disk caching being turned off. Assuming
> >> each insert is in a separate transaction, then it's not going to do
> >> 10,000 / 6 = 1667 transactions/sec - that's faster than its rotational
> >> speed.
> > Could you explain the calculation? Why should the number of transactions
> > be related to the rotational speed of the disk, without saying anything
> > about the number of bytes per rotation?
> 
> each transaction requires a sync to the disk, a sync requires a real 
> write (which you then wait for), so you can only do one transaction per 
> rotation.
Not according to a conversation I had with Western Digital about the
write performance of my own SATA disks. From their explanation I understand
that their disks are limited by MB/sec and not by the number of
writes/second; e.g. I could write 50 MB/sec *in 1 bit/write* on my
disk. This would suggest that the maximum number of transactions on my disk
(overhead of OS and PostgreSQL ignored) would be 50 MB / (transaction
size in MB) per second. Or am I missing something (which would not
surprise me, as I do not understand the performance of my system at
all ;-))?

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] x206-x225

2006-03-11 Thread Joost Kraaijeveld
On Sat, 2006-03-11 at 12:33 +0100, PFC wrote:
> >> each transaction requires a sync to the disk, a sync requires a real
> >> write (which you then wait for), so you can only do one transaction per
> >> rotation.
> > Not according to a conversation I had with Western Digital about the
> 
> 
> It depends if you consider that "written to the disk" means "data is  
> somewhere between the OS cache and the platter" or "data is written on the  
> platter and will survive a power loss".
> 
> Postgres wants the second option, of course.

I assume that for PostgreSQL "written to disk" means "after fsync returned
successfully". In practice that could very well mean that the data is still
in a cache somewhere (controller or harddisk, not in the OS anymore; see
also the man page of fsync).


-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] x206-x225

2006-03-11 Thread Joost Kraaijeveld
On Sat, 2006-03-11 at 11:59 -0500, Tom Lane wrote:
> Joost Kraaijeveld <[EMAIL PROTECTED]> writes:
> > I assume that for PostgreSQL "written to disk" is after fsync returned
> > successfully. In practice that could very well mean that the data is
> > still in a cache somewhere (controller or harddisk, not in the OS
> > anymore, see also man page of fsync)
> 
> What it had better mean, if you want your database to be reliable,
> is that the data is stored someplace that will survive a system crash
> (power outage, kernel panic, etc).  A battery-backed RAM cache is OK,
> assuming that total failure of the RAID controller is not one of the
> events you consider likely enough to need protection against.

Maybe I should have expressed myself better. The parent post said: 

> It depends if you consider that "written to the disk" means "data is  
> somewhere between the OS cache and the platter" or "data is written on
> the platter and will survive a power loss".
>
> Postgres wants the second option, of course.

With my remark I meant that the only thing *PostgreSQL* can expect is
that the data is out of the OS: there is no stronger guarantee in the
fsync function. If the *database administrator* wants better guarantees,
he (or she) had better read your advice.

> The description of your SATA drive makes it sound like the drive
> does not put data on the platter before reporting "write complete",
> but only stores it in on-board RAM cache.  It is highly unlikely
> that there is any battery backing for that cache, and therefore that
> drive is not to be trusted.
Yep, the drives have a write cache, and indeed, they are not backed up
by a battery (neither is my RAID controller), but as this is a
test/development machine, I don't really care. 

You made me rethink my production machine, though. I will have to check
the drives and the state of their write cache on that machine. Thanks
for that.



-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Why the difference in plans ??

2006-09-15 Thread Joost Kraaijeveld
Hi,

I have two table: customers and salesorders. salesorders have a foreign
key to the customer

If I run this query:

SELECT 
salesOrders.objectid, 
salesOrders.ordernumber, 
salesOrders.orderdate, 
customers.objectid, 
customers.customernumber, 
customers.lastname 
FROM prototype.salesorders 
INNER JOIN prototype.customers ON ( 
customers.objectid = salesorders.customer 
) 
where 
lastname ilike 'Boonk' 
order by ordernumber asc LIMIT 1


WITHOUT "LIMIT 1" this query plan is executed (EXPLAIN ANALYZE):


Sort  (cost=41811.90..41812.78 rows=353 width=103) (actual 
time=623.855..623.867 rows=7 loops=1)
  Sort Key: salesorders.ordernumber
  ->  Nested Loop  (cost=2.15..41796.96 rows=353 width=103) (actual 
time=0.166..623.793 rows=7 loops=1)
->  Seq Scan on customers  (cost=0.00..21429.44 rows=118 width=55) 
(actual time=0.037..623.325 rows=5 loops=1)
  Filter: (lastname ~~* 'Boonk'::text)
->  Bitmap Heap Scan on salesorders  (cost=2.15..172.06 rows=44 
width=88) (actual time=0.075..0.079 rows=1 loops=5)
  Recheck Cond: ("outer".objectid = salesorders.customer)
  ->  Bitmap Index Scan on orders_customer  (cost=0.00..2.15 
rows=44 width=0) (actual time=0.066..0.066 rows=1 loops=5)
Index Cond: ("outer".objectid = salesorders.customer)
Total runtime: 624.051 ms



With the limit this query plan is used (EXPLAIN ANALYZE):

Limit  (cost=0.00..18963.24 rows=1 width=103) (actual time=18404.730..18404.732 
rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..6694025.41 rows=353 width=103) (actual 
time=18404.723..18404.723 rows=1 loops=1)
->  Index Scan using prototype_orders_ordernumber on salesorders  
(cost=0.00..37263.14 rows=1104381 width=88) (actual time=26.715..1862.408 
rows=607645 loops=1)
->  Index Scan using pk_prototype_customers on customers  
(cost=0.00..6.02 rows=1 width=55) (actual time=0.023..0.023 rows=0 loops=607645)
  Index Cond: (customers.objectid = "outer".customer)
  Filter: (lastname ~~* 'Boonk'::text)
Total runtime: 18404.883 ms


Both tables are freshly fully vacuumed analyzed.

Why the difference, and can I influence the result so that the first
query plan (which is the faster one) is actually used in both cases? (I
would expect the limit to be applied after the sort.)
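
Would, for instance, hiding the ordering column from the planner force the
first plan? A sketch of what I mean (the "+ 0" is only there to make the
ORDER BY expression unusable for an index scan, assuming ordernumber is a
numeric type):

SELECT salesOrders.objectid, salesOrders.ordernumber, salesOrders.orderdate,
       customers.objectid, customers.customernumber, customers.lastname
FROM prototype.salesorders
INNER JOIN prototype.customers ON (customers.objectid = salesorders.customer)
WHERE lastname ILIKE 'Boonk'
ORDER BY ordernumber + 0 ASC LIMIT 1;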

TIA

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Why the difference in plans ??

2006-09-15 Thread Joost Kraaijeveld
On Fri, 2006-09-15 at 10:08 -0400, Tom Lane wrote:
> but it seems there are only 7.  Try increasing your statistics target
> and re-analyzing.

Do you mean by "increasing my statistics target" changing the value of
"default_statistics_target = 10" to a bigger number? If so, changing it
to 900 did not make any difference (PostgreSQL restarted, vacuumed,
analysed, etc.).
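
In case it matters, there is also a per-column knob I could try instead of
the global setting; a sketch, assuming the relevant estimate is the one for
customers.lastname:

ALTER TABLE prototype.customers ALTER COLUMN lastname SET STATISTICS 900;
ANALYZE prototype.customers;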

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] What kind of performace can I expect and how to measure?

2004-08-02 Thread Joost Kraaijeveld
Hi all,

My system is PostgreSQL 7.4.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 
20020903 (Red Hat Linux 8.0 3.2-7). It has a Pentium III-733 MHz with 512 MB RAM. It 
is connected to my workstation (dual XEON 1700 with 1 GB RAM) over a 100 Mb switched 
network.

I have a table with 31 columns, all fixed-size datatypes. It contains 88393 rows. 
Doing a "select * from table" with PGAdmin III in its SQL window takes a total of 
9206 ms query runtime and 40638 ms data retrieval runtime.

Is this a reasonable time to get 88393 rows from the database?

If not, what can I do to find the bottleneck (and eventually make it faster)?

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] What kind of performace can I expect and how to measure?

2004-08-02 Thread Joost Kraaijeveld
Hi Merlin,

> The 9206 ms time is what the database actually spent 
> gathering the data and sending it to you.  This is non-negotiable unless you bump up
> hardware, etc, or fetch less data.  This time usually scales linearly
> (or close to it) with the size of the dataset you fetch.
>
> The 40638 ms time is pgAdmin putting the data in the grid.  This time
So it takes PostgreSQL 9206 ms to get the data AND send it to the client. It then 
takes PGAdmin 40638 ms to display the data?

> solution).  In the meantime, I would suggest using queries to refine
> your terms a little bit...(do you really need to view all 80k 
> records at once?).
The application is built in Clarion, a 4GL environment. We do not have any influence 
over the query it generates and executes.


Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Measuring server performance with psql and pgAdmin

2004-10-26 Thread Joost Kraaijeveld
Hi all,

I am (still) converting a database from a Clarion Topspeed database to PostgreSQL 
7.4.5 on Debian Linux 2.6.6-1. The program that uses the database uses a query like 
"select * from table" to show the user the contents of a table. This query cannot be 
changed (it is generated by Clarion and the person in charge of the program cannot 
alter that behaviour).

Now I have a big performance problem with reading a large table (96713 rows). The 
query that is sent to the database is "select * from table".

"explain" and "explain analyze", using psql on cygwin:

munt=# explain select * from klt_alg;
 QUERY PLAN
- 
Seq Scan on klt_alg  (cost=0.00..10675.13 rows=96713 width=729) 


munt=# explain analyze select * from klt_alg;
 QUERY PLAN 
---
Seq Scan on klt_alg  (cost=0.00..10675.13 rows=96713 width=729) (actual 
time=13.172..2553.328 rows=96713 loops=1)
Total runtime: 2889.109 ms
(2 rows)   
   

Running the query (with pgAdmin III):
-- Executing query:
select * from klt_alg;

Total query runtime: 21926 ms.
Data retrieval runtime: 72841 ms.
96713 rows retrieved.

QUESTIONS:

GENERAL:
1. The manual says about "explain analyze": "The ANALYZE option causes the statement 
to be actually executed, not only planned. The total elapsed time expended within each 
plan node (in milliseconds) and total number of rows it actually returned are added to 
the display." Does this time include data transfer, or just the time the database needs 
to collect the data, without any data transfer?
2. If the time is without data transfer to the client, is there a reliable way to 
measure the time needed to run the query and get the data (without the overhead of a 
program that does something with the data)?

PGADMIN:
1. What does the "Total query runtime" really mean? (It was my understanding that it 
was the time the database needs to collect the data, without any data transfer.)
2. What does the "Data retrieval runtime" really mean? (Does this include the filling 
of the data grid/GUI, or just the data transfer?)
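
What I would really like is a measurement that excludes any GUI at all. A
sketch of what I have in mind with psql, assuming \timing and \o behave the
way I think they do (the rows go to /dev/null, so terminal drawing does not
count):

\timing
\o /dev/null
select * from klt_alg;
\o
\timing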

TIA

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] Why the difference in query plan and performance pg 7.4.6?

2005-02-01 Thread Joost Kraaijeveld
Hi all,

I have a freshly vacuumed table with 1104379 records with an index on zipcode. 
Can anyone explain why the queries go as they go, and why the performance 
differs so much (1 second versus 64 seconds, or stated differently, 10000 
records per second versus 1562 records per second), and why the query plan of 
query 2 ignores the index?

For completeness' sake I also did a select of ordernumber without any ordering. 
That only took 98 seconds for 1104379 records (11222 records per second, 
comparable with the first query, as I would have expected). 

Query 1:
select a.ordernumer from orders a order by a.zipcode limit 10000
Explain: 
QUERY PLAN
Limit  (cost=0.00..39019.79 rows=10000 width=14)
  ->  Index Scan using orders_postcode on orders a  (cost=0.00..4309264.07 
rows=1104379 width=14)
Running time: 1 second

Query 2:
select a.ordernumer from orders a order by a.zipcode limit 100000
Explain:
QUERY PLAN
Limit  (cost=207589.75..207839.75 rows=100000 width=14)
  ->  Sort  (cost=207589.75..210350.70 rows=1104379 width=14)
Sort Key: postcode
->  Seq Scan on orders a  (cost=0.00..46808.79 rows=1104379 width=14)
Running time: 64 seconds

Query 3:
select a.ordernumer from orders a
QUERY PLAN
Seq Scan on orders a  (cost=0.00..46808.79 rows=1104379 width=4)
Running time: 98 seconds
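
To see what the index plan would have cost the planner for query 2, I suppose
one can temporarily forbid the sequential scan; a sketch (resetting the
setting afterwards):

set enable_seqscan = off;
explain analyze select a.ordernumer from orders a order by a.zipcode limit 100000;
set enable_seqscan = on;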

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Is this possible / slow performance?

2005-02-07 Thread Joost Kraaijeveld
Hi every one,


Why does this take forever (each query is sub-second when done separately)? 
Is it because I cannot open two cursors in the same transaction?

begin;

declare SQL_CUR01 cursor for 
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B 
ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;
fetch 100 in SQL_CUR01;

declare SQL_CUR02 cursor for 
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B 
ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer desc;
fetch 100 in SQL_CUR02;

commit;


TIA

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Retry: Is this possible / slow performance?

2005-02-07 Thread Joost Kraaijeveld
Hi all,

A retry of the question asked before. All tables freshly vacuumed and analyzed. 

Two queries: one with "set enable_seqscan = on", the other with "set 
enable_seqscan = off". The first query lasts 59403 ms, the second query 31 ms 
(the desc order variant shows the same large difference: 122494 ms vs. 1297 ms). 
(For the query plans see below.)

Can I, without changing the SQL (because it is generated by a tool) or 
explicitly setting "set enable_seqscan = off" for this query, trick PostgreSQL 
into taking the fast variant of the query plan?

TIA


Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl


--- Query 1

begin;
set enable_seqscan = on;
declare SQL_CUR01 cursor for 
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B 
ON A.Klantnummer=B.Klantnummer 
ORDER BY A.klantnummer;
fetch 100 in SQL_CUR01;
commit;

QUERY PLAN
Sort  (cost=259968.77..262729.72 rows=1104380 width=12)
  Sort Key: a.klantnummer, a.ordernummer
  ->  Hash Left Join  (cost=42818.43..126847.70 rows=1104380 width=12)
Hash Cond: ("outer".klantnummer = "inner".klantnummer)
->  Seq Scan on orders a  (cost=0.00..46530.79 rows=1104379 width=8)
->  Hash  (cost=40635.14..40635.14 rows=368914 width=4)
  ->  Seq Scan on klt_alg b  (cost=0.00..40635.14 rows=368914 
width=4)

Actual running time:  59403 ms.

--- Query 2

begin;
set enable_seqscan = off;
declare SQL_CUR01 cursor for 
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B 
ON A.Klantnummer=B.Klantnummer 
ORDER BY A.klantnummer;
fetch 100 in SQL_CUR01;
commit;

QUERY PLAN
Merge Left Join  (cost=0.00..2586604.86 rows=1104380 width=12)
  Merge Cond: ("outer".klantnummer = "inner".klantnummer)
  ->  Index Scan using orders_klantnummer on orders a  (cost=0.00..2435790.17 
rows=1104379 width=8)
  ->  Index Scan using klt_alg_klantnummer on klt_alg b  (cost=0.00..44909.11 
rows=368914 width=4)

Actual running time: 31 ms.



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Retry: Is this possible / slow performance?

2005-02-07 Thread Joost Kraaijeveld
 
>> The best solution is probably to put a LIMIT into the DECLARE CURSOR,
>> so that the planner can see how much you intend to fetch.
I assume that this limits the result set to at most LIMIT rows. That is not 
what I was hoping for: I was hoping for a way to scroll through the whole 
orders table.
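
To be concrete, I read the suggestion as something like this sketch, which
caps the cursor at the LIMIT:

begin;
declare SQL_CUR01 cursor for 
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B 
ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer LIMIT 100;
fetch 100 in SQL_CUR01;
commit;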

I have tested, and if one really wants the whole table, the query with "set 
enable_seqscan = on" lasts 137 secs and the query with "set enable_seqscan = 
off" lasts 473 secs, so (alas) the planner is right. 

I sure would like to have ISAM-like behaviour once in a while.


Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Joost Kraaijeveld
Hi all,

A question on how to read and interpret the explain analyse statement (and what 
to do)

I have a query "SELECT A.ordernummer, B.klantnummer FROM orders A LEFT OUTER 
JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;"

Both tables have an btree index on klantnummer (int4, the column the join is 
on). I have vacuumed and analyzed both tables. The explain analyse is:

QUERY PLAN
Sort  (cost=220539.32..223291.41 rows=1100836 width=12) (actual 
time=51834.128..56065.126 rows=1104380 loops=1)
  Sort Key: a.klantnummer
  ->  Hash Left Join  (cost=41557.43..110069.51 rows=1100836 width=12) (actual 
time=21263.858..42845.158 rows=1104380 loops=1)
Hash Cond: ("outer".klantnummer = "inner".klantnummer)
->  Seq Scan on orders a  (cost=0.00..46495.36 rows=1100836 width=8) 
(actual time=5.986..7378.488 rows=1104380 loops=1)
->  Hash  (cost=40635.14..40635.14 rows=368914 width=4) (actual 
time=21256.683..21256.683 rows=0 loops=1)
  ->  Seq Scan on klt_alg b  (cost=0.00..40635.14 rows=368914 
width=4) (actual time=8.880..18910.120 rows=368914 loops=1)
Total runtime: 61478.077 ms


Questions:
  ->  Hash Left Join  (cost=41557.43..110069.51 rows=1100836 width=12) (actual 
time=21263.858..42845.158 rows=1104380 loops=1)

0. What exactly are the numbers in "cost=41557.43..110069.51"? (I assume, for 
the other questions, that 41557.43 is the estimated number of ms the query will 
take; what are the others?)

1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is the 
estimated cost and (actual time=21263.858..42845.158 rows=1104380 loops=1) the 
actual cost. Is the difference acceptable?

2. If not, what can I do about it?

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Joost Kraaijeveld
Hi Tom,

Tom Lane schreef:
> On the surface this looks like a reasonable plan choice.  If you like
> you can try the other two basic types of join plan by turning off
> enable_hashjoin, which will likely drive the planner to use a merge
> join, and then also turn off enable_mergejoin to get a nested loop
> (or if it thinks nested loop is second best, turn off enable_nestloop
> to see the behavior with a merge join).

The problem is that the query logically requests all records (as in "select * 
from a join") from the database, but in practice it displays only the first 
1000 records 97% of the time, and at most the first 50.000 records 99.99% of 
the time, by scrolling (using "page down") in the GUI and an occasional "jump 
to record" through something called a locator (both percentages tested!).

If I do the same query with a "limit 60.000", or if I do a "set enable_seqscan 
= off", the query returns in 0.3 secs. Otherwise it lasts for 20 secs (which is 
too much for the user to wait for, given the circumstances).

I cannot change the query (it is generated by a tool called Clarion), but it is 
something like (from the psqlodbc_xxx.log):
"...
declare SQL_CUR01 cursor for 
SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B 
ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;
fetch 100 in SQL_CUR01;
..."

PostgreSQL plans (and then executes accordingly) the query itself and not the 
"fetch 100". Changing the query with a "limit whatever" prohibits scrolling 
past the size of the result set. If PostgreSQL delayed the planning of the 
actual query until the fetch, it could choose the quick solution. Another 
solution would be to "advise" PostgreSQL which index etc. (whatever "etc." 
means ;-)) to use (as in the mail from Silke Trissl on the performance list on 
09-02-05).

> What's important in comparing different plan alternatives is the ratios
> of estimated costs to actual elapsed times.  If the planner is doing its
> job well, those ratios should be similar across all the alternatives
> (which implies of course that the cheapest-estimate plan is also the
> cheapest in reality).  If not, it may be appropriate to fool with the
> planner's cost estimate parameters to try to line up estimates and
> reality a bit better. 
If I really do a "select *" and display the result, the planner is right (tested 
with "set enable_seqscan = off" and "set enable_seqscan = on").

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Joost Kraaijeveld
Hi Tom,

Tom Lane schreef:
> Well, the planner does put some emphasis on startup time when dealing
> with a DECLARE CURSOR plan; the problem you face is just that that
> correction isn't large enough.  (From memory, I think it optimizes on
> the assumption that 10% of the estimated rows will actually
> be fetched; you evidently want a setting of 1% or even less.)
I wish I had your memory ;-). The tables contain 1.100.000 records, by the way 
(that is not nearly 10%; my math is not that good).


> We once talked about setting up a GUC variable to control the
> percentage of a cursor that is estimated to be fetched:
> http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php
> It never got done but that seems like the most reasonable solution to
> me. 
If the proposal means that the cursor is not limited to the LIMIT in the query 
but to what is actually fetched, then I support the proposal. A bit late, I 
presume.

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] What is the number of rows in explain?

2005-03-11 Thread Joost Kraaijeveld
Hi all,

Is the number of rows in explain the number of rows that is expected to be 
visited, or the number expected to be retrieved?

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] PostgreSQL 9.1 : why is this query slow?

2011-11-28 Thread Joost Kraaijeveld
Hi All,

I have a table with 665605 rows (counted, vacuumed):
CREATE TABLE unique_words
( filename text NOT NULL,
  filetype text NOT NULL,
  word text NOT NULL,
  count integer);

The query is:
select f.word , count(f.word) from 
unique_words as f, 
unique_words as s ,
unique_words as n 
where
(f.word = s.word and s.word = n.word)
and
(f.filetype = 'f' and s.filetype = 's' and n.filetype = 'n')
group by f.word

Explain says:
"GroupAggregate  (cost=0.00..67237557.88 rows=1397 width=6)"
"  ->  Nested Loop  (cost=0.00..27856790.31 rows=7876150720 width=6)"
"->  Nested Loop  (cost=0.00..118722.04 rows=14770776 width=12)"
"  ->  Index Scan using idx_unique_words_filetype_word on 
unique_words f  (cost=0.00..19541.47 rows=92098 width=6)"
"Index Cond: (filetype = 'f'::text)"
"  ->  Index Scan using idx_unique_words_filetype_word on 
unique_words s  (cost=0.00..0.91 rows=13 width=6)"
"Index Cond: ((filetype = 's'::text) AND (word = f.word))"
"->  Index Scan using idx_unique_words_filetype_word on unique_words n  
(cost=0.00..1.33 rows=44 width=6)"
"  Index Cond: ((filetype = 'n'::text) AND (word = f.word))"


The right answer should be 3808 different words (according to a Java
program I wrote).

This query takes more than 1 hour (after which I cancelled the query).
My questions are:
- Is this to be expected?
- Especially as the query over just 1 join takes 32 secs (on f.word =
s.word, omitting everything for n)?
- Why does explain say it takes "7876150720 rows"? 
- Is there a way to rephrase the query that makes it faster?
- Could another table layout help (f,s,n are all possibilities for
filetype)?
- Anything else?

TIA

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 9.1 : why is this query slow?

2011-11-28 Thread Joost Kraaijeveld
On Mon, 2011-11-28 at 17:42 +0100, Joost Kraaijeveld wrote:
> - Is there a way to rephrase the query that makes it faster?
This query goes faster (6224 ms, but I am not sure it gives the correct
answer as the result differs from my Java program):

select word, count (word) from unique_words 
where
word in (select word from unique_words where 
 word in ( select word from unique_words where filetype = 'f')
 and
 filetype = 's')
and
filetype = 'n'
group by word


-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 9.1 : why is this query slow?

2011-11-28 Thread Joost Kraaijeveld
On Mon, 2011-11-28 at 11:05 -0600, Kevin Grittner wrote:
> Joost Kraaijeveld  wrote:
>  
> > This query goes faster (6224 ms, but I am not sure it gives the
> > correct answer as the result differs from my Java program):
>  
> It seems clear that you want to see words which appear with all
> three types of files, but it's not clear what you want the count to
> represent.  The number of times the word appears in filetype 'n'
> references (as specified in your second query)?  The number of
> permutations of documents which incorporate one 'f' document, one
> 's' document, and one 'n' document (as specified in your first
> query).  Something else, like the total number of times the word
> appears?
I would like the answer to be "the number of times the word appears in
all three of the queries": the intersection of the three queries. 

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL 9.1 : why is this query slow?

2011-11-29 Thread Joost Kraaijeveld
On Mon, 2011-11-28 at 11:36 -0600, Kevin Grittner wrote:
> "Kevin Grittner"  wrote:
>  
> > If you really want the intersection, perhaps:
>  
> Or maybe closer:
>  
> with x as
>   (
> select
> word,
> count(*) as countall,
> count(case when filetype = 'f' then 1 else null end)
>   as countf,
> count(case when filetype = 's' then 1 else null end)
>   as counts,
> count(case when filetype = 'n' then 1 else null end)
>   as countn
>   from unique_words
>   group by word
>   )
> select word, least(countf, counts, countn) from x
>   where countf > 0 and counts > 0 and countn > 0
>   order by word;
>  
> Cranked out rather quickly and untested.

I tested it and it worked as advertised. Takes ~ 3 secs to complete.
Thanks.

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance