[PERFORM] SubQuery Performance
Hi All,

I have a poorly performing SQL statement, shown below. The table has about 200M records, and each employee has on average 100 records. The query runs for about 3 hours. All I want is to update the flag on the highest-version record of each employee. Any suggestion is welcome!

Thanks,
Mike

SQL:

  update empTbl A
  set flag = 1
  where rec_ver = (select max(rec_ver)
                   from empTbl
                   where empNo = A.empNo);

Table:

  empTbl {
      int empNo;
      int flag;
      char[256] empDesc;
      int rec_ver;
  }
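A common rewrite for this pattern (not from the thread, just an illustrative sketch) is to compute the per-employee maximum once and join against it, instead of re-running the correlated MAX() subquery for every row; an index on (empNo, rec_ver) would help either form:

  update empTbl A
  set flag = 1
  from (select empNo, max(rec_ver) as max_ver
        from empTbl
        group by empNo) m
  where A.empNo = m.empNo
    and A.rec_ver = m.max_ver;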
[PERFORM] Vacuum and Memory Loss
Hello friends,

I am responsible for maintaining a high-volume website using PostgreSQL 8.1.4. Given the amount of reads and writes, I VACUUM FULL the server a few times a week around 1-2 AM, shutting down the site for a few minutes. The next morning, around 10-11 AM, the server slows to a crawl. It used to be that the error 'Too many clients' would be recorded, until I increased the number of clients it can handle; now it simply slows to a crawl, with lots and lots of postmaster processes running:

Tasks: 665 total, 10 running, 655 sleeping, 0 stopped, 0 zombie
Cpu(s): 14.9% us, 16.7% sy, 0.0% ni, 0.0% id, 68.4% wa, 0.0% hi, 0.0% si
Mem: 2074932k total, 2051572k used, 23360k free, 2736k buffers
Swap: 2096440k total, 188k used, 251992k free, 102968k cached

  PID USER     PR NI  VIRT RES SHR S %CPU %MEM  TIME+  COMMAND
 6420 postgres 15  0 26912 11m 10m R  3.6  0.6 0:00.11 postmaster
 6565 postgres 16  0 26912 11m 10m S  3.6  0.6 0:00.12 postmaster
 6707 postgres 15  0 26912 11m 10m S  3.3  0.6 0:00.10 postmaster
 6715 postgres 15  0 26912 11m 10m S  3.3  0.6 0:00.11 postmaster
 6765 postgres 15  0 26912 11m 10m S  3.3  0.6 0:00.11 postmaster
 6147 postgres 15  0 26912 11m 10m R  3.0  0.6 0:00.15 postmaster
 6311 postgres 15  0 26904 11m 10m R  3.0  0.6 0:00.10 postmaster
 6551 postgres 15  0 26912 11m 10m R  3.0  0.6 0:00.09 postmaster
 6803 postgres 16  0 26912 11m 10m R  3.0  0.6 0:00.09 postmaster
 6255 postgres 15  0 26904 11m 10m R  2.6  0.6 0:00.14 postmaster
 6357 postgres 15  0 26912 11m 10m R  2.6  0.6 0:00.11 postmaster
 6455 postgres 15  0 26912 11m 10m S  2.6  0.6 0:00.10 postmaster
 6457 postgres 15  0 26912 11m 10m S  2.6  0.6 0:00.11 postmaster
 6276 postgres 15  0 26912 11m 10m S  2.3  0.6 0:00.10 postmaster
 6475 postgres 15  0 26912 11m 10m R  2.3  0.6 0:00.11 postmaster
 6868 postgres 15  0 26912 11m 10m S  2.3  0.6 0:00.07 postmaster
 6891 postgres 15  0 26912 11m 10m S  1.3  0.6 0:00.19 postmaster

Thanks for your help in advance,
Mike
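For reference (general advice, not a diagnosis of this particular server): plain, lazy VACUUM does not take the exclusive lock that VACUUM FULL does, so it can usually run alongside normal traffic, provided the free space map is large enough to track the dead rows. A minimal check, assuming the slowdown is bloat-related:

  -- runs without an exclusive lock, so the site can stay up
  VACUUM ANALYZE;

  -- a database-wide verbose run reports, near the end, whether
  -- max_fsm_pages is large enough to remember all the free space found
  VACUUM VERBOSE;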
Re: [PERFORM] How would you store read/unread topic status?
So your update doesn't take long to run during off-peak times; basically your options are:

1. Optimize your postgresql.conf settings or upgrade to the latest version of PostgreSQL.
2. Redesign your forum code so it can scale better.
3. Upgrade your server's hardware, as it may be overloaded.

I would probably attack those in the order I described.

As far as redesigning your forum code, keep in mind that in PostgreSQL an update is basically a select, delete, and insert in a single statement. First it needs to find the rows to update, then it marks the old rows for deletion (which vacuum later cleans up) and inserts new rows. So updates can be quite expensive. In SOME situations, it can be faster to do inserts only and modify your select query to get just the data you need. For example, rather than an update like this:

  update <table> set LastReadAnswerID = <value> where UserID = <value> AND TopicID = <value>

you could do this instead:

  insert into <table> VALUES (<UserID>, <TopicID>, <LastReadAnswerID>)

Then just modify your select statement slightly to get the last inserted row:

  select * from <table> where user_id = <value> AND topic_id = <value> order by LastReadAnswerID DESC LIMIT 1

This makes your select statement slightly more expensive, but your insert statement pretty much as cheap as possible. Since it's much easier to cache select results, you could easily wrap some caching mechanism around your select query to reduce the load there too. Then, using a task scheduler like cron, simply clear out old rows from the table you insert into every minute, 5 minutes, hour, or day, whatever makes most sense to keep the select queries fast.

A memcached solution would probably be much better, but it's also likely much more involved to do.

On Tue, 23 Jun 2009 17:50:50 +0200 Mathieu Nebra wrote:

> Robert Haas a écrit :
> Which pg version are you using?
> >> I should have mentioned that before, sorry: PostgreSQL 8.2
>
> > I think there is an awful lot of speculation on this thread about
> > what your problem is without anywhere near enough investigation. A
> > couple of seconds for an update is a really long time, unless your
> > server is absolutely slammed, in which case probably everything is
> > taking a long time. We need to get some more information on what
> > is happening here.
>
> You're right, I'll give you the information you need.
>
> > Approximately how many requests per second are you servicing? Also,
>
> How can I extract this information from the database? I know how to
> use pg_stat_user_tables. My table has:
>
> seq_tup_read    133793491714
> idx_scan        12408612540
> idx_tup_fetch   41041660903
> n_tup_ins       14700038
> n_tup_upd       6698236
> n_tup_del       15990670
>
> > can you:
> >
> > 1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
> >    the exact query and the output.
>
> "Index Scan using prj_frm_flg_pkey on prj_frm_flg (cost=0.00..8.58 rows=1 width=18)"
> "  Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))"
>
> This time it only took 54 ms, but maybe it's already a lot.
>
> > 2. Run VACUUM VERBOSE on your database and send the last 10 lines or
> >    so of the output.
> It's not very long, I can give you the whole log:
>
> INFO: vacuuming "public.prj_frm_flg"
> INFO: scanned index "prj_frm_flg_pkey" to remove 74091 row versions
> DETAIL: CPU 0.15s/0.47u sec elapsed 53.10 sec.
> INFO: scanned index "flg_fav" to remove 74091 row versions
> DETAIL: CPU 0.28s/0.31u sec elapsed 91.82 sec.
> INFO: scanned index "flg_notif" to remove 74091 row versions
> DETAIL: CPU 0.36s/0.37u sec elapsed 80.75 sec.
> INFO: scanned index "flg_post" to remove 74091 row versions
> DETAIL: CPU 0.31s/0.37u sec elapsed 115.86 sec.
> INFO: scanned index "flg_no_inter" to remove 74091 row versions
> DETAIL: CPU 0.34s/0.33u sec elapsed 68.96 sec.
> INFO: "prj_frm_flg": removed 74091 row versions in 5979 pages
> DETAIL: CPU 0.29s/0.34u sec elapsed 100.37 sec.
> INFO: index "prj_frm_flg_pkey" now contains 1315895 row versions in 7716 pages
> DETAIL: 63153 index row versions were removed.
> 672 index pages have been deleted, 639 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "flg_fav" now contains 1315895 row versions in 18228 pages
> DETAIL: 73628 index row versions were removed.
> 21 index pages have been deleted, 16 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "flg_notif" now contains 1315895 row versions in 18179 pages
> DETAIL: 73468 index row versions were removed.
> 22 index pages have been deleted, 13 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "flg_post" now contains 1315895 row versions in 18194 pages
> DETAIL: 73628 index row versions were removed.
> 30 index pages have been deleted, 23 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: index "flg_no_inter" now contains 1315895 row versions in 8596 pages
> DETAIL: 73628 index row versions were removed.
> 13 index pages have been deleted, 8 are currently reusable.
> C
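To make the insert-only idea above concrete, here is a sketch using hypothetical table and column names (topic_reads, user_id, topic_id, last_read_answer_id); the real schema from the thread is not shown, so these names are illustrative only:

  -- instead of an UPDATE, append a new row each time
  insert into topic_reads (user_id, topic_id, last_read_answer_id)
  values (42, 1001, 987654);

  -- read the most recent row for that user/topic
  select *
  from topic_reads
  where user_id = 42 and topic_id = 1001
  order by last_read_answer_id desc
  limit 1;

  -- periodically purge superseded rows (e.g. from cron)
  delete from topic_reads t
  where exists (select 1 from topic_reads n
                where n.user_id = t.user_id
                  and n.topic_id = t.topic_id
                  and n.last_read_answer_id > t.last_read_answer_id);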
Re: [PERFORM] How to speed up word count in tsearch2?
On Apr 1, 2005 4:03 AM, Yudie Pg <[EMAIL PROTECTED]> wrote:
> > You need to look at what else is using RAM on that machine. And maybe buy
> > more.
>
> Ouch.. I had that feeling also. Then how can I know how much memory is
> needed for a certain number of words? And why is counting uncommon words
> faster than common ones?

Because the index is a tree. You fall off the end of a branch faster with uncommon words. Plus the executor goes back to the table for fewer real rows with uncommon words.

It sounds like you may just need a faster disk subsystem. That would shrink the time for the first query on any particular set of words, and it would make everything else faster as a nice side effect. What does your disk layout look like now?

-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org
Re: [PERFORM] Sustained inserts per sec ... ?
> Just curious, but does anyone have an idea of what we are capable of? I > realize that size of record would affect things, as well as hardware, but > if anyone has some ideas on max, with 'record size', that would be > appreciated ... Well, I just did an insert of 27,500 records with 9 fields, averaging around 118 bytes per record, each insert statement coming from a separate SQL statement fed to psql, and it took a bit over 4 minutes, or about 106 inserts per second. That seems consistent with what I get when I do a restore of a dump file that has insert statement instead of COPY. The hardware is a Dell dual Xeon system, the disks are mirrored SATA drives with write buffering turned off. -- Mike Nolan ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Sustained inserts per sec ... ?
> > Well, I just did an insert of 27,500 records with 9 fields, averaging
> > around 118 bytes per record, each insert statement coming from a separate
> > SQL statement fed to psql, and it took a bit over 4 minutes, or about
> > 106 inserts per second.
>
> Is that with a separate transaction for each insert command? I can get
> significantly higher rates on my devel machine if the inserts are
> bundled into transactions of reasonable length.

That's with autocommit on. If I do it as a single transaction block, it takes about 6.5 seconds, which is about 4,200 inserts per second.

-- Mike Nolan
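The gap between the two numbers above is largely per-transaction commit (fsync) overhead: with autocommit on, every INSERT is its own transaction. A generic sketch of the batched form (test_table and its columns are hypothetical, not from the thread):

  begin;
  insert into test_table (col1, col2) values ('a', 1);
  insert into test_table (col1, col2) values ('b', 2);
  -- ... thousands more ...
  commit;

  -- bulk loads are typically faster still with COPY:
  copy test_table from '/path/to/data.dat';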
Re: [PERFORM] Sustained inserts per sec ... ?
If I'm getting the point of this thread correctly, having a huge amount of data in one table degrades INSERT/COPY performance even with just a PKEY index. If that's about the size of it, read on. If not, ignore me because I missed something.

On Apr 4, 2005 10:44 PM, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Before I start to tunnel-vision on a particular coincidence...

Don't worry too much about tunnel vision. I see the same thing every day with multi-million row tables. The bigger the table gets (with only a pkey index), the slower the inserts go. If I start over (truncate, drop/create table), or if I point the initial load at a new table, everything gets speedy. I've always figured it was a function of table size and learned to live with it...

> How much memory have you got on the system?

On mine, 16G.

> How much of that have you allocated to various tasks?

shared_buffers: 15000

> What else is happening on your system?

Nothing on mine.

> Tell us more about disk set-up and other hardware related things.

6-disk RAID10 on a Compaq SmartArray 6404 with 256M BB cache, WAL on a 2-disk mirror on the built-in SmartArray5 controller.

-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org
Re: [PERFORM] Opteron vs Xeon (Was: What to do with 6 disks?)
On 4/20/05, Anjan Dave <[EMAIL PROTECTED]> wrote: > In terms of vendor specific models - > > Does anyone have any good/bad experiences/recommendations for a 4-way > Opteron from Sun (v40z, 6 internal drives) or HP (DL585 5 internal > drives) models? We are going with the 90nm HPs for production. They "feel" like beefier boxes than the Suns, but the Suns cost a LOT less, IIRC. We're only using the internal drives for the OS. PG gets access to a fibre-channel array, HP StorageWorks 3000. I _can't wait_ to get this in. Our dev box is a 130nm DL585 with 16G of RAM and an HP SCSI array, and I have absolutely zero complaints. :) -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Table Partitioning: Will it be supported in Future?
On 4/26/05, Mohan, Ross <[EMAIL PROTECTED]> wrote: > Maybe he needs to spend $7K on performance improvements? > > ;-) > AAARRRGGG! I will forever hate the number 7,000 from this day forth! Seriously, though, I've never seen a thread on any list wander on so aimlessly for so long. Please, mommy, make it stop! -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM]
On 5/3/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Steven Rosenstein <[EMAIL PROTECTED]> writes:
> > My question is, are there any advantages, drawbacks, or outright
> > restrictions to using multiple simultaneous COPY commands to load data into
> > the same table?
>
> It will work; not sure about whether there is any performance benefit.
> I vaguely recall someone having posted about doing this, so you might
> check the archives.

I may be one of Tom's vague "voices". ;) The only issue would be that you need to remove all your UNIQUE constraints before sending multiple COPYs to the server. This includes the PRIMARY KEY constraint. To the backend, COPY is just like INSERT and all constraints need to be checked, and this will block the commit of one of the COPY streams.

However, multiple COPYs may not be needed. I regularly load several tables totaling around 50M rows with a single COPY per table. I drop (actually, this is during DB reload, so I don't yet create...) all fkeys, constraints and indexes, and the data loads in a matter of 5 minutes or so.

Hope that helps!

-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org
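A sketch of the load pattern described above, with hypothetical table, column, and file names; drop the constraints and indexes, COPY, then rebuild:

  alter table orders drop constraint orders_pkey;
  drop index orders_customer_idx;

  copy orders from '/tmp/orders.dat';

  alter table orders add primary key (order_id);
  create index orders_customer_idx on orders (customer_id);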
[PERFORM] Table stats
Hello, I have a table collecting stats that shows 5 Index Tuples Fetched but no Index Scans. Should there not be at least one Index Scan showing in the stats? Mike ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
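For anyone wanting to reproduce the check, the counters in question come from pg_stat_user_tables; something like the following, with a hypothetical table name:

  select relname, seq_scan, idx_scan, idx_tup_fetch
  from pg_stat_user_tables
  where relname = 'my_table';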
Re: [PERFORM] ok you all win what is best opteron (I dont want a hosed system again)
> This can often be called for. I'm working on a 400GB data warehouse right > now, and almost *all* of our queries run from materialized aggregate tables. I thought that was pretty much the definition of data warehousing! :-) -- Mike Nolan ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Postgresql on an AMD64 machine
On 06 Jun 2005 12:53:40 -0500, Mark Rinaudo <[EMAIL PROTECTED]> wrote: > I'm not sure if this is the appropriate list to post this question to > but i'm starting with this one because it is related to the performance > of Postgresql server. I have a Penguin Computing dual AMD 64 bit > opteron machine with 8 Gigs of memory. In my attempt to increase the > number of shared_buffers from the default to 65000 i was running into a > semget error when trying to start Postgresql. After reading the > documentation I adjusted the semaphore settings in the kernel to allow > Postgresql to start successfully. With this configuration running if I > do a ipcs -u i get the following. On my HP-585, 4xOpteron, 16G RAM, Gentoo Linux (2.6.9): $ ipcs -u i -- Shared Memory Status segments allocated 1 pages allocated 34866 pages resident 31642 pages swapped 128 Swap performance: 0 attempts 0 successes -- Semaphore Status used arrays = 7 allocated semaphores = 119 -- Messages: Status allocated queues = 0 used headers = 0 used space = 0 bytes Did you perhaps disable spinlocks when compiling PG? -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Advice on RAID card
On 9/25/05, Dave Cramer <[EMAIL PROTECTED]> wrote: > I would think software raid would be quite inappropriate considering > postgres when it is working is taking a fair amount of CPU as would > software RAID. Does anyone know if this is really the case ? > I attempted to get some extra speed out of my Compaq/HP SA6404 card by using software RAID1 across to hardware RAID10 sets. It didn't help, but there was no noticeable load or drop in performance because of it. Granted, this was on a 4-way Opteron, but, anecdotally speaking, the linux software RAID has surprisingly low overhead. My $0.02, hope it helps. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] Table Partitions / Partial Indexes
Hello,

I've got a table with ~60 million rows and am having performance problems querying it. Disks are set up as 4x10K SCSI 76GB, RAID 1+0. The table is being inserted into multiple times every second of the day, with no updates, and every 2nd day we delete 1/60th of the data (as it becomes old). Vacuum analyze is scheduled to run 3 times a day.

Query:

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC
from PC_TRAFFIC
where FK_DEVICE = 996
and TRAFFIC_DATE >= '2005-10-14 00:00:00'
and TRAFFIC_DATE <= '2005-11-13 23:59:59'
group by SOURCE_MAC
order by 1 desc

Table:

CREATE TABLE PC_TRAFFIC (
    PK_PC_TRAFFIC INTEGER NOT NULL,
    TRAFFIC_DATE  TIMESTAMP NOT NULL,
    SOURCE_MAC    CHAR(20) NOT NULL,
    DEST_IP       CHAR(15),
    DEST_PORT     INTEGER,
    TOTAL_TO      DOUBLE PRECISION,
    TOTAL_FROM    DOUBLE PRECISION,
    FK_DEVICE     SMALLINT,
    PROTOCOL_TYPE SMALLINT
);

CREATE INDEX pc_traffic_pkidx ON pc_traffic (pk_pc_traffic);
CREATE INDEX pc_traffic_idx3 ON pc_traffic (fk_device, traffic_date);

Plan:

Sort  (cost=76650.58..76650.58 rows=2 width=40)
  Sort Key: sum(total_from)
  ->  HashAggregate  (cost=76650.54..76650.57 rows=2 width=40)
        ->  Bitmap Heap Scan on pc_traffic  (cost=534.64..76327.03 rows=43134 width=40)
              Recheck Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
              ->  Bitmap Index Scan on pc_traffic_idx3  (cost=0.00..534.64 rows=43134 width=0)
                    Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(7 rows)

CLUSTER on PC_TRAFFIC_IDX3 gives me significantly improved performance:

Sort  (cost=39886.65..39886.66 rows=2 width=40)
  Sort Key: sum(total_from)
  ->  HashAggregate  (cost=39886.61..39886.64 rows=2 width=40)
        ->  Index Scan using pc_traffic_idx3 on pc_traffic  (cost=0.00..39551.26 rows=44714 width=40)
              Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(5 rows)

However the clustering is only effective on the first shot. Because of the constant usage of the table we can't perform a VACUUM FULL nor take any other exclusive lock.

Would table partitioning/partial indexes help much? Partitioning on date range doesn't make much sense for this setup, where a typical one-month query spans both tables (as the billing month for the customer might start midway through a calendar month).

Noting that the index scan was quicker than the bitmap scan, I'm trying to make the indexes smaller/more likely to be index scanned. I have tried partitioning against fk_device, with 10 child tables. I'm using fk_device % 10 = 1, fk_device % 10 = 2, fk_device % 10 = 3, etc. as the check constraint:

CREATE TABLE pc_traffic_0 (CHECK(FK_DEVICE % 10 = 0)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_1 (CHECK(FK_DEVICE % 10 = 1)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_2 (CHECK(FK_DEVICE % 10 = 2)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_3 (CHECK(FK_DEVICE % 10 = 3)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_4 (CHECK(FK_DEVICE % 10 = 4)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_5 (CHECK(FK_DEVICE % 10 = 5)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_6 (CHECK(FK_DEVICE % 10 = 6)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_7 (CHECK(FK_DEVICE % 10 = 7)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_8 (CHECK(FK_DEVICE % 10 = 8)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_9 (CHECK(FK_DEVICE % 10 = 9)) INHERITS (pc_traffic);

The indexes now look like:

CREATE INDEX pc_traffic_6_idx3 ON pc_traffic_6 (fk_device, traffic_date);

To take advantage of the partitioning, my SQL now has to include the mod operation (so the query planner picks up the correct child tables):

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC
from PC_TRAFFIC
where FK_DEVICE = 996
and FK_DEVICE % 10 = 6
and TRAFFIC_DATE >= '2005-10-14 00:00:00'
and TRAFFIC_DATE <= '2005-11-13 23:59:59'
group by SOURCE_MAC
order by 1 desc

Sorry, I would show the plan but I'm rebuilding the dev database at the moment. It was faster though, and did pick up the correct child table. It was also a bitmap scan on the index, IIRC.

Would I be better off creating many partial indexes instead of multiple tables AND multiple indexes?

Am I using a horrid method for partitioning the data? (% 10)

Should there be that big of an improvement for multiple tables given that all the data is still stored on the same filesystem?

Any advice on table splitting much appreciated.

Cheers,
Mike C.
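One detail worth noting for this kind of inheritance-based partitioning: for the planner to skip non-matching child tables based on those CHECK constraints, constraint_exclusion has to be enabled (it is not on by default in releases of that era), and, as described above, the literal modulo expression has to appear in the query so it can be compared against the constraints. A sketch:

  set constraint_exclusion = on;

  -- then verify that only the matching child table is scanned:
  explain
  select sum(total_from), sum(total_to), source_mac
  from pc_traffic
  where fk_device = 996
    and fk_device % 10 = 6
    and traffic_date >= '2005-10-14 00:00:00'
    and traffic_date <= '2005-11-13 23:59:59'
  group by source_mac;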
Re: [PERFORM] Table Partitions / Partial Indexes
On 12/12/05, Tom Lane <[EMAIL PROTECTED]> wrote: Mike C <[EMAIL PROTECTED]> writes:> CLUSTER on PC_TRAFFIC_IDX3 gives me significantly improved performance:How can you tell? Neither of these are EXPLAIN ANALYZE output. regards, tom lane Sorry that's a result of my bad record keeping. I've been keeping records of the explain but not the analyze. IIRC the times dropped from ~25 seconds down to ~8 seconds (using analyze). Regards, Mike
Re: [PERFORM] Table Partitions / Partial Indexes
On 12/14/05, Simon Riggs <[EMAIL PROTECTED]> wrote:
> Maybe not for queries, but if you use a date range then you never need
> to run a DELETE and never need to VACUUM. You could split the data into
> two-day chunks.

That's an interesting idea, thanks.

> > Am I using a horrid method for partitioning the data? (% 10)
>
> No, but what benefit do you think it provides. I'm not sure I see...

I was trying to get both the indexes to be smaller without losing selectivity, and make any table scans/index scans faster from having to read less data.

> > Should there be that big of an improvement for multiple tables given
> > that all the data is still stored on the same filesystem?
>
> You could store partitions in separate tablespaces/filesystems.

Ideally that's what I would do, but to make the most of that I would have to have a dedicated RAID setup for each partition, right? (Which is a bit pricey for the budget.)

Cheers,
Mike
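A sketch of what Simon is describing, using illustrative two-day chunks (names and dates are made up): expiring old data becomes dropping a child table instead of a DELETE followed by VACUUM:

  create table pc_traffic_20051114
    (check (traffic_date >= '2005-11-14' and traffic_date < '2005-11-16'))
    inherits (pc_traffic);

  create index pc_traffic_20051114_idx3
    on pc_traffic_20051114 (fk_device, traffic_date);

  -- when the chunk ages out:
  drop table pc_traffic_20051114;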
[PERFORM] strange issue for certain queries
geqo                   | on        | Enables genetic query optimization.
geqo_effort            | 5         | GEQO: effort is used to set the default for other GEQO parameters.
geqo_generations       | 0         | GEQO: number of iterations of the algorithm.
geqo_pool_size         | 0         | GEQO: number of individuals in the population.
geqo_selection_bias    | 2         | GEQO: selective pressure within the population.
geqo_threshold         | 12        |
join_collapse_limit    | 8         |
server_encoding        | SQL_ASCII | Sets the server (database) character set encoding.
server_version         | 8.1.4     | Shows the server version.
shared_buffers         | 400       |
vacuum_cost_delay      | 0         | Vacuum cost delay in milliseconds.
vacuum_cost_limit      | 200       | Vacuum cost amount available before napping.
vacuum_cost_page_dirty | 20        | Vacuum cost for a page dirtied by vacuum.
vacuum_cost_page_hit   | 1         | Vacuum cost for a page found in the buffer cache.
vacuum_cost_page_miss  | 10        | Vacuum cost for a page not found in the buffer cache.
wal_buffers            | 8         | Sets the number of disk-page buffers in shared memory for WAL.
wal_sync_method        | fsync     | Selects the method used for forcing WAL updates out to disk.
work_mem               | 1024      | Sets the maximum memory to be used for query workspaces.
zero_damaged_pages     | off       |

Thanks in advance for any help you can offer on this problem.

-Mike
Fwd: [PERFORM] Not Picking Index
This is very similar to the problem I posted to this list yesterday. Apparently, if you append an empty string to the column data in your WHERE clause, it will force the planner to treat it as a filter and not an index cond. It's extremely ugly, but this method doesn't seem to be any more elegant.

-Mike

On Feb 16, 2007, at 9:46 AM, Alvaro Herrera wrote:

Gauri Kanekar escribió:
> I want the planner to ignore a specific index. I am testing some query
> output. For that purpose I don't want the index. Is it possible for the
> planner to ignore an index?

Sure:

BEGIN
DROP INDEX foo
SELECT
ROLLBACK

-- Alvaro Herrera   http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
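The trick Mike mentions looks roughly like this (illustrative table and column names); wrapping the indexed column in an expression keeps the planner from using a plain index condition on it:

  -- the planner can use an index on col:
  select * from t where col = 'x';

  -- appending an empty string turns it into a filter instead:
  select * from t where col || '' = 'x';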
Re: [PERFORM] Beginner Question
Yeah, I have a lot of similar problems where an index that I have to speed up one query is used in another query where it actually slows it down. Is there any way to ignore indexes for certain queries? We've been appending empty strings and adding zero's to the column data to force it into a filter, but it's a messy hack. I've tried ordering the joins in the the most efficent way with a join_collapse_limit of 1, but it still does uses this index in parallel with searching an index on another table (i guess the planner figures it's saving some time up front). -Mike On Apr 9, 2007, at 8:45 PM, s d wrote: Hi Jan, Adding this Index slowed down things by a factor of 4. Also, the performance is so horrible (example bellow) that i am certain i am doing something wrong. Does the following explain gives any ideas ? Thanks =# EXPLAIN ANALYZE select * from word_association where (word1 ='the' or word2='the') and count > 10; QUERY PLAN -- -- Bitmap Heap Scan on word_association (cost=250.86..7256.59 rows=4624 width=22) (actual time=13.461..211.568 rows=6601 loops=1) Recheck Cond: (((word1)::text = 'the'::text) OR ((word2)::text = 'the'::text)) Filter: (count > 10) -> BitmapOr (cost=250.86..250.86 rows=12243 width=0) (actual time=9.052..9.052 rows=0 loops=1) -> Bitmap Index Scan on word_association_index1_1 (cost=0.00..153.20 rows=7579 width=0) (actual time=5.786..5.786 rows=7232 loops=1) Index Cond: ((word1)::text = 'the'::text) -> Bitmap Index Scan on word_association_index2_1 (cost=0.00..95.34 rows=4664 width=0) (actual time=3.253..3.253 rows=4073 loops=1) Index Cond: ((word2)::text = 'the'::text) Total runtime: 219.987 ms (9 rows) On 4/9/07, Jan de Visser <[EMAIL PROTECTED]> wrote: On Monday 09 April 2007 05:09:53 s d wrote: > Hi, > I am trying to figure out how to debug a performance problem / use psql > explain. The table in question is: > # \d word_association; >Table "public.word_association" > Column | Type | Modifiers > ++ > word1 | character varying(128) | not null > word2 | character varying(128) | not null > count | integer| not null default 0 > Indexes: > "word1_word2_comb_unique" unique, btree (word1, word2) > "word1_hash_index" hash (word1) > "word2_hash_index" hash (word2) > "word_association_count_index" btree (count) > "word_association_index1_1" btree (word1) > "word_association_index2_1" btree (word2) > > It has multiple indices since i wanted to see which one the planner choses. > > > # explain select * FROM word_association WHERE (word1 = 'bdss' OR > word2 = 'bdss') AND count >= 10; >QUERY PLAN > - -- >- Bitmap Heap Scan on word_association > (cost=11.53..1192.09 rows=155 width=22) Recheck Cond: (((word1)::text = > 'bdss'::text) OR ((word2)::text = 'bdss'::text)) >Filter: (count >= 10) >-> BitmapOr (cost=11.53..11.53 rows=364 width=0) > -> Bitmap Index Scan on word_association_index1_1 > (cost=0.00..5.79 rows=190 width=0) >Index Cond: ((word1)::text = 'bdss'::text) > -> Bitmap Index Scan on word_association_index2_1 > (cost=0.00..5.67 rows=174 width=0) >Index Cond: ((word2)::text = 'bdss'::text) > (8 rows) > > The questions: > 1. i can undestand where the cost=11.53 came from but where did the > 1192.09 come form? The values are in milli right ? > 2. the query takes in reality much longer than 1 second. > > In short, it feels like something is very wrong here (i tried vacuum > analyze and it didn't do much diff). > any ideas ? You need an index on (word1, word2, count). In your current setup it will have to scan all rows that satisfy word1 and word2 to see if count >= 10. 
jan -- -- Jan de Visser [EMAIL PROTECTED] Baruk Khazad! Khazad ai-menu! -- ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
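Jan's suggestion, spelled out for the word_association table shown above; the second pair of indexes is an additional option worth testing (not from the thread) since the query uses an OR across word1 and word2:

  create index word_assoc_w1_w2_count_idx
    on word_association (word1, word2, count);

  -- alternative: two narrower indexes that each cover one arm of the OR
  create index word_assoc_w1_count_idx on word_association (word1, count);
  create index word_assoc_w2_count_idx on word_association (word2, count);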
Re: [PERFORM] Volunteer to build a configuration tool
It would be cool if someone started a generic configuration+benchmark utility that could be used with virtually any software. Something like this:

1. Create a configuration file parser for your specific application, be it PostgreSQL, MySQL, Apache, whatever.

2. Create a min/max or X,Y,Z configuration option file that determines which options to try. ie:
   shared_buffers = 1000-2[1000] //1000 is the increment by
   wal_buffers = 8,16,32
   ...

3. Create start/stop scripts for the specific application.

4. Create a benchmark script for the application that returns relevant metrics. In PGSQL's case, it would be tied in to pgbench probably. In Apache's case, AB. This utility would of course need to know how to read the metrics to determine what is "best".

5. Run the utility. Ideally it would use some sort of genetic algorithm to benchmark the application initially to get base numbers, then one-by-one apply the different configuration options and re-run the benchmark. It would output the metrics for each run and once it is done, pick the best run and let you know what those settings are.

I don't think something like this would be very difficult at all to write, and it would be modular enough to work for virtually any application. For a database it would take a while to run depending on the benchmark script, but even then you could have a "fast" and a "slow" benchmark script that could be easily run when you first install PostgreSQL. This way, too, you're not worrying about how much memory the system has, or how many disks it has, etc... The system will figure out the best possible settings for a specific benchmark. Not to mention people could easily take an SQL log of their own application running and use that as the benchmark to get "real world" numbers.

Any other sort of configuration "suggestion" utility will always have the question of: what do you recommend? How much data do you try to get, and what can be determined from that data to get the best settings? Is it really going to be that much better than the default, at least enough better to warrant the work and effort put into it?

On Mon, 2007-06-18 at 10:04 -0500, Campbell, Lance wrote:
> I am a Java Software architect, DBA, and project manager for the
> University of Illinois, Department of Web Services. We use PostgreSQL
> to serve about 2 million pages of dynamic content a month; everything
> from calendars, surveys, forms, discussion boards, RSS feeds, etc. I
> am really impressed with this tool.
>
> The only major problem area I have found where PostgreSQL is really
> lacking is in "what should my initial configuration settings be?" I
> realize that there are many elements that can impact a DBA's specific
> database settings but it would be nice to have a "configuration tool"
> that would get someone up and running better in the beginning.
>
> This is my idea:
>
> A JavaScript HTML page that would have some basic questions at the top:
> 1) How much memory do you have?
> 2) How many connections will be made to the database?
> 3) What operating system do you use?
> 4) Etc...
>
> Next the person would press a button, "generate", found below the
> questions. The JavaScript HTML page would then generate content for
> two Iframes at the bottom of the page. One Iframe would contain the
> contents of the postgresql.conf file. The postgresql.conf settings
> would be tailored more to the individual's needs than the standard
> default file. The second Iframe would contain the default settings
> one should consider using with their operating system.
>
> My web team would be very happy to develop this for the PostgreSQL
> project. It would have saved us a lot of time by having a
> configuration tool in the beginning. I am willing to make this a very
> high priority for my team.
>
> Thanks,
>
> Lance Campbell
> Project Manager/Software Architect
> Web Services at Public Affairs
> University of Illinois
> 217.333.0382
> http://webservices.uiuc.edu

-- Mike Benoit <[EMAIL PROTECTED]>
Re: [PERFORM] Help optimize view
Oops. Realized I posted the wrong SQL and EXPLAIN ANALYZE results. Also forgot to mention that my "server" has 1.5 GB memory. SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; QUERY PLAN HashAggregate (cost=519801.96..519898.00 rows=7683 width=70) (actual time=106219.710..106249.456 rows=14853 loops=1) -> Hash Join (cost=286101.76..519667.51 rows=7683 width=70) (actual time=50466.513..106111.635 rows=15123 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Hash Join (cost=286099.98..519260.45 rows=87588 width=61) (actual time=50466.417..106055.182 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=143.017..55178.583 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.012..0.027 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=142.986..54432.650 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Bitmap Index Scan on "PVParameterID_idx" (cost=0.00..7902.79 rows=608089 width=0) (actual time=109.178..109.178 rows=289724 loops=1) Index Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Hash (cost=259861.12..259861.12 rows=1454724 width=48) (actual time=50306.950..50306.950 rows=961097 loops=1) -> Hash Join (cost=8139.75..259861.12 rows=1454724 width=48) (actual time=971.910..48649.190 rows=961097 loops=1) Hash Cond: ("Measurements"."psaID" = "PrintSampleAnalyses"."psaID") -> Seq Scan on "Measurements" (cost=0.00..199469.09 rows=7541009 width=12) (actual time=0.047..35628.599 rows=7539838 loops=1) -> Hash (cost=7949.67..7949.67 rows=15206 width=44) (actual time=971.734..971.734 rows=18901 loops=1) -> Hash Join (cost=5069.24..7949.67 rows=15206 width=44) (actual time=590.003..938.744 rows=18901 loops=1) Hash Cond: ("PrintSampleAnalyses"."ImageID" = "DigitalImages"."ImageID") -> Seq Scan on "PrintSampleAnalyses" (cost=0.00..2334.25 rows=78825 width=8) (actual time=0.021..130.335 rows=78859 loops=1) -> Hash (cost=4879.10..4879.10 rows=15211 width=44) (actual 
time=589.940..589.940 rows=18901 loops=1) -> Hash Join (cost=2220.11..4879.10 rows=15211 width=44) (actual time=168.307..557.675 rows=18901 loops=1) Hash Cond: ("DigitalImages"."PrintSampleID" = "PrintSamples"."PrintSampleID") -> Seq Scan on "DigitalImages" (cost=0.00..1915.50 rows=78850 width=8) (actual time=16.126..194.911 rows=78859 loops=1) -> Hash (cost=2029.98..2029.98 rows=15211 width=44) (actual time=152.128..152.128 rows=18645 loops=1) -> Hash Join (cost=564.39..2029.98 rows=15211 width=44) (actual time=13.951..121.903 rows=18645 loops=1) Hash Cond: (("PrintSamples"."TestPatternName")::text = ("tblTPNamesAndColors"."TestPatternName")::text) -> Bitmap Heap Scan on "Print
Re: [PERFORM] Help optimize view
> -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: Friday, August 10, 2007 5:44 PM > To: Relyea, Mike > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Help optimize view > > Try increasing join_collapse_limit --- you have just enough > tables here that the planner isn't going to consider all > possible join orders. > And it sorta looks like it's picking a bad one. > > regards, tom lane > I tried increasing join_collapse_limit with no significant change in run time although a different plan was chosen. I've included a re-send of my original post, it looks like it didn't go through - it's not in the archives. I've also included an explain analyze before and after the join_collapse_limit change. I'm have the following view as part of a larger, aggregate query that is running slower than I'd like. There are 4 views total, each very similar to this one. Each of the views is then left joined with data from some other tables to give me the final result that I'm looking for. I'm hoping that if I can get some insight in to how to make this view execute faster, I can apply that learning to the other 3 views and thereby decrease the run time for my aggregate query. I'm running 8.2.4 on Windows XP with a single 10K rpm disk dedicated to the data directory and 1.5 GB memory. shared_buffers = 12288 work_mem = 262144 maintenance_work_mem = 131072 max_fsm_pages = 204800 random_page_cost = 2.0 effective_cache_size = 1 autovacuum = on EXPLAIN ANALYZE SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; HashAggregate (cost=519801.96..519898.00 rows=7683 width=70) (actual time=121101.027..121146.385 rows=14853 loops=1) -> Hash Join (cost=286101.76..519667.51 rows=7683 width=70) (actual time=52752.600..120989.713 rows=15123 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Hash Join (cost=286099.98..519260.45 rows=87588 width=61) (actual time=52752.502..120933.784 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=165.510..67811.086 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.012..0.026 rows=1 loops=1) Filter: (("ParameterName")::text = 
'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=165.481..67094.656 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID")
Re: [PERFORM] Help optimize view
> >>> On Mon, Aug 13, 2007 at 10:35 AM, in message > <[EMAIL PROTECTED] > .net>, "Relyea, Mike" <[EMAIL PROTECTED]> wrote: > > I'm running 8.2.4 on Windows XP with 1.5 GB memory. > > shared_buffers = 12288 > > effective_cache_size = 1 > > For starters, you might want to adjust one or both of these. > It looks to me like you're telling it that it only has 78.125 > MB cache space. That will make it tend to want to scan > entire tables, on the assumption that the cache hit ratio > will be poor for random reads. > > Since you're on 8.2.4, you can use units of measure to help > make this easier to read. You could, for example, say: > > shared_buffers = 96MB > effective_cache_size = 1200MB > > -Kevin I've increased shared_buffers to 128MB, and restarted the server. My total run time didn't really change. SELECT set_config('effective_cache_size', '1000MB', false); I have another app that uses about 500MB. SELECT set_config('join_collapse_limit', '20', false); explain analyze SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; HashAggregate (cost=489274.71..489372.94 rows=7858 width=70) (actual time=117632.844..117663.228 rows=14853 loops=1) -> Hash Join (cost=256774.03..489137.20 rows=7858 width=70) (actual time=50297.022..117530.665 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=172.341..66959.288 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.020..0.034 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=172.297..66241.380 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Bitmap Index Scan on "PVParameterID_idx" (cost=0.00..7902.79 rows=608089 width=0) (actual time=147.690..147.690 rows=289724 loops=1) Index Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Hash (cost=247087.84..247087.84 rows=130510 width=57) (actual time=50109.022..50109.022 rows=15123 loops=1) -> Hash Join (cost=8141.52..247087.84 rows=130510 width=57) (actual time=11095.022..50057.777 rows=15123 loops=1) Hash Cond: 
("Measurements"."psaID" = "PrintSampleAnalyses"."psaID") -> Hash Join (cost=1.77..234364.57 rows=661492 width=21) (actual time=31.457..48123.380 rows=289724 loop
Re: [PERFORM] Help optimize view
> >>> On Mon, Aug 13, 2007 at 1:48 PM, in message > <[EMAIL PROTECTED] > .net>, "Relyea, Mike" <[EMAIL PROTECTED]> wrote: > > I've increased shared_buffers to 128MB, and restarted the > server. My > > total run time didn't really change. > > Please forgive me if this guess doesn't help either, but > could you try eliminating the GROUP BY options which don't > echo values in the select value list, and move the HAVING > conditions to a WHERE clause? Something like: > > explain analyze > SELECT > "PrintSamples"."MachineID", > "PrintSamples"."PrintCopyID", > "tblColors"."ColorID", > avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" > FROM "AnalysisModules" > JOIN > ( > "tblColors" > JOIN > ( > "tblTPNamesAndColors" > JOIN "PrintSamples" > ON ("tblTPNamesAndColors"."TestPatternName"::text = > "PrintSamples"."TestPatternName"::text) > JOIN > ( > "DigitalImages" > JOIN "PrintSampleAnalyses" > ON ("DigitalImages"."ImageID" = > "PrintSampleAnalyses"."ImageID") > JOIN > ( > "ParameterNames" > JOIN > ( > "Measurements" > JOIN "ParameterValues" > ON "Measurements"."MeasurementID" = > "ParameterValues"."MeasurementID" > ) ON "ParameterNames"."ParameterID" = > "ParameterValues"."ParameterID" > ) ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID" > ) ON "PrintSamples"."PrintSampleID" = > "DigitalImages"."PrintSampleID" > ) ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID" > ) ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" > WHERE "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text > AND "ParameterNames"."ParameterName"::text = 'NMF'::text > AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text > AND "tblColors"."ColorID" <> 3 > GROUP BY > "PrintSamples"."MachineID", > "PrintSamples"."PrintCopyID", > "tblColors"."ColorID" > ; > > I'd also be inclined to simplify the FROM clause by > eliminating the parentheses and putting the ON conditions > closer to where they are used, but that would be more for > readability than any expectation that it would affect the plan. > > -Kevin Thanks for your help. Re-writing the view like this maybe bought me something. I've pasted the explain analyze results below. Tough to tell because I also increased some of the statistics. From what Tom says, it sounds like if I want the data returned faster I'm likely to have to get beefier hardware. ALTER TABLE "ParameterValues" ALTER "MeasurementID" SET STATISTICS 500; ALTER TABLE "ParameterValues" ALTER "ParameterID" SET STATISTICS 500; ANALYZE "ParameterValues"; ALTER TABLE "Measurements" ALTER COLUMN "MetricID" SET STATISTICS 500; ALTER TABLE "Measurements" ALTER COLUMN "psaID" SET STATISTICS 500; ANALYZE "Measurements"; Running the above SQL: HashAggregate (cost=461541.53..461634.88 rows=7468 width=16) (actual time=110002.041..110024.777 rows=14853 loops=1) -> Hash Join (cost=230789.57..461464.70 rows=7683 width=16) (actual time=56847.814..109936.722 rows=15123 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Hash Join (cost=230787.80..461057.64 rows=87588 width=20) (actual time=56847.697..109884.122 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=6353.15..234044.47 rows=454038 width=8) (actual time=179.154..52780.680 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=4) (actual time=0.012..0.027 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on
[PERFORM] Help optimize view
I have the following view as part of a larger, aggregate query that is running slower than I'd like. There are 4 views total, each very similar to this one. Each of the views is then left joined with data from some other tables to give me the final result that I'm looking for. I'm hoping that if I can get some insight into how to make this view execute faster, I can apply that learning to the other 3 views and thereby decrease the run time for my aggregate query. I'm running 8.2.4 on Windows XP with a single 10K rpm disk dedicated to the data directory.

shared_buffers = 12288
work_mem = 262144
maintenance_work_mem = 131072
max_fsm_pages = 204800
random_page_cost = 2.0
effective_cache_size = 1
autovacuum = on

SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID",
       avg("ParameterValues"."ParameterValue") AS "Mottle_NMF"
  FROM "PrintSamples", "DigitalImages", "PrintSampleAnalyses", "Measurements",
       "ParameterValues", "tblTPNamesAndColors", "tblColors", "AnalysisModules", "ParameterNames"
 WHERE "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID"
   AND "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID"
   AND "PrintSampleAnalyses"."psaID" = "Measurements"."psaID"
   AND "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID"
   AND "AnalysisModules"."MetricID" = "Measurements"."MetricID"
   AND "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID"
   AND "tblTPNamesAndColors"."TestPatternName" = "PrintSamples"."TestPatternName"
   AND "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID"
 GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID",
          "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName",
          "PrintSamples"."TestPatternName"
HAVING "PrintSamples"."MachineID" = 4741
    OR "PrintSamples"."MachineID" = 4745
   AND "AnalysisModules"."AnalysisModuleName" = 'NMF'
   AND "ParameterNames"."ParameterName" = 'NMF'
   AND "tblColors"."ColorID" <> 3
   AND "PrintSamples"."TestPatternName" LIKE 'IQAF-TP8%';

EXPLAIN ANALYZE

HashAggregate  (cost=6069.71..6069.82 rows=9 width=70) (actual time=3230.868..3230.923 rows=31 loops=1)
  ->  Nested Loop  (cost=1.77..6069.55 rows=9 width=70) (actual time=367.959..3230.476 rows=31 loops=1)
        Join Filter: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID")
        ->  Seq Scan on "ParameterNames"  (cost=0.00..1.94 rows=1 width=17) (actual time=0.020..0.032 rows=1 loops=1)
              Filter: (("ParameterName")::text = 'NMF'::text)
        ->  Nested Loop  (cost=1.77..6059.09 rows=682 width=61) (actual time=367.905..3230.154 rows=124 loops=1)
              ->  Hash Join  (cost=1.77..2889.96 rows=151 width=57) (actual time=119.748..1447.130 rows=31 loops=1)
                    Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID")
                    ->  Nested Loop  (cost=0.00..2880.22 rows=1722 width=48) (actual time=55.278..1444.801 rows=1656 loops=1)
                          ->  Nested Loop  (cost=0.00..226.25 rows=18 width=44) (actual time=10.080..13.951 rows=31 loops=1)
                                ->  Nested Loop  (cost=0.00..151.33 rows=18 width=44) (actual time=5.030..8.266 rows=31 loops=1)
                                      ->  Nested Loop  (cost=0.00..74.21 rows=18 width=44) (actual time=2.253..4.822 rows=31 loops=1)
                                            Join Filter: ("tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID")
                                            ->  Nested Loop  (cost=0.00..48.11 rows=24 width=44) (actual time=2.232..3.619 rows=43 loops=1)
                                                  ->  Index Scan using "PSMachineID_idx" on "PrintSamples"  (cost=0.00..7.99 rows=29 width=40) (actual time=2.204..2.515 rows=43 loops=1)
                                                        Index Cond: ("MachineID" = 4741)
                                                        Filter: (("TestPatternName")::text ~~ 'IQAF-TP8%'::text)
                                                  ->  Index Scan using "TPNTestPatternName" on "tblTPNamesAndColors"  (cost=0.00..1.37 rows=1 width=30) (actual time=0.011..0.015 rows=1 loops=43)
                                                        Index Cond: (("tblTPNamesAndColors"."TestPatternName")::text = ("PrintSamples"."TestPatternName")::text)
                                            ->  Seq Scan on "tblColors"  (cost=0.00..1.05 rows=3 width=4) (actual time=0.004..0.010 rows=3 loops=43)
                                                  Filter: ("ColorID" <> 3)
                                      ->  Index Scan using "DIPrintSampleID_idx" on "DigitalImages"  (cost=0.00..4.27 rows=1 width=8) (actual time=0.100..0.102 rows=1 loops=31)
                                            Index Cond: ("PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID")
                                ->  Index Scan using "PSAImageID_idx" on "PrintSampleAnalyses"  (cost=0.00..4.15 rows=1 width=8) (actual time=0.171..0.174 rows=1 loops=31)
                                      Index Cond: ("DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID")
                          ->  Index Scan using "MpsaID_idx" on "Measurements"  (cost=0.00..120.33 rows=2169 width=12) (actual time=19.381..46.0
Re: [PERFORM] Help optimize view
> From: Kevin Grittner [mailto:[EMAIL PROTECTED]
>
> First off, let's make sure we're optimizing the query you
> really want to run.
> AND binds tighter than OR, so as you have it written, it is
> the same as:
>
> HAVING "PrintSamples"."MachineID" = 4741
> OR ( "PrintSamples"."MachineID" = 4745
> AND "AnalysisModules"."AnalysisModuleName" = 'NMF'
> AND "ParameterNames"."ParameterName" = 'NMF'
> AND "tblColors"."ColorID" <> 3
> AND "PrintSamples"."TestPatternName" LIKE 'IQAF-TP8%'
> )
>
> I fear you may really want it to evaluate to:
>
> HAVING ("PrintSamples"."MachineID" = 4741 OR
> "PrintSamples"."MachineID" = 4745)
> AND "AnalysisModules"."AnalysisModuleName" = 'NMF'
> AND "ParameterNames"."ParameterName" = 'NMF'
> AND "tblColors"."ColorID" <> 3
> AND "PrintSamples"."TestPatternName" LIKE 'IQAF-TP8%';

The query I really want to run is several times larger than this. I didn't think people would want to wade through pages and pages worth of SQL and then explain analyze results - especially when I'm fairly certain that optimizing this smaller part of the overall aggregate query would provide me the help I was looking for. You're right about what I really want the query to evaluate to. I'll give your suggestion a try.

Thanks.
Mike

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
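For reference, a sketch of the posted query with Kevin's grouping applied and the non-aggregate filters moved from HAVING into WHERE; PostgreSQL generally treats non-aggregate HAVING conditions as WHERE conditions anyway, but writing them there makes the intent explicit. The table and column names come from the original post, and whether this actually improves the plan is something to verify with EXPLAIN ANALYZE:

SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID",
       avg("ParameterValues"."ParameterValue") AS "Mottle_NMF"
  FROM "PrintSamples", "DigitalImages", "PrintSampleAnalyses", "Measurements",
       "ParameterValues", "tblTPNamesAndColors", "tblColors", "AnalysisModules", "ParameterNames"
 WHERE "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID"
   AND "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID"
   AND "PrintSampleAnalyses"."psaID" = "Measurements"."psaID"
   AND "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID"
   AND "AnalysisModules"."MetricID" = "Measurements"."MetricID"
   AND "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID"
   AND "tblTPNamesAndColors"."TestPatternName" = "PrintSamples"."TestPatternName"
   AND "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID"
   -- filters moved out of HAVING; note the explicit parentheses around the OR
   AND ("PrintSamples"."MachineID" = 4741 OR "PrintSamples"."MachineID" = 4745)
   AND "AnalysisModules"."AnalysisModuleName" = 'NMF'
   AND "ParameterNames"."ParameterName" = 'NMF'
   AND "tblColors"."ColorID" <> 3
   AND "PrintSamples"."TestPatternName" LIKE 'IQAF-TP8%'
 GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID",
          "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName",
          "PrintSamples"."TestPatternName";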
[PERFORM] plan question - query with order by and limit not choosing index depends on size of limit, table
Hello performance, I need help explaining the performance of a particular query:

select * from messages where ((messages.topic = E'/x') AND (messages.processed = 'f')) ORDER BY messages.created_at ASC limit 10;

Table Structure:

   Column   |            Type             |                             Modifiers
------------+-----------------------------+---------------------------------------------------------------------
 id         | integer                     | not null default nextval('landing_page.messages_id_seq'::regclass)
 processed  | boolean                     |
 topic      | character varying(255)      |
 body       | text                        |
 created_at | timestamp without time zone |
 updated_at | timestamp without time zone |
Indexes:
    "messages_pkey" PRIMARY KEY, btree (id)
    "idx_landing_page_messages_created_at" btree (created_at)
    "idx_messages_topic_processed" btree (topic, processed)

Table row count ~ 1 million.

When I run the query with limit 10 it skips the idx_messages_topic_processed index. When I run the query with no limit, or with a limit above 20, it uses the desired index. On a different system with a much smaller data set (~200,000 rows) I have to use a limit of about 35 to get the desired index.

This is the good plan with no limit or the 'sweet spot' limit:

Limit  (cost=2050.29..2050.38 rows=35 width=1266)
  ->  Sort  (cost=2050.29..2052.13 rows=737 width=1266)
        Sort Key: created_at
        ->  Bitmap Heap Scan on messages  (cost=25.86..2027.70 rows=737 width=1266)
              Recheck Cond: ((topic)::text = 'x'::text)
              Filter: (NOT processed)
              ->  Bitmap Index Scan on idx_messages_topic_processed  (cost=0.00..25.68 rows=737 width=0)
                    Index Cond: (((topic)::text = '/x'::text) AND (processed = false))

This is the bad plan with limit 10:

Limit  (cost=0.00..1844.07 rows=30 width=1266)
  ->  Index Scan using idx_landing_page_messages_created_at on messages  (cost=0.00..45302.70 rows=737 width=1266)
        Filter: ((NOT processed) AND ((topic)::text = 'x'::text))

Not sure if cost has anything to do with it, but this is set in postgresql.conf. I am hesitant to change this as I have inherited the database from a previous dba and don't want to adversely affect things that caused this to be set in a non-default manner if possible.

#seq_page_cost = 1.0      # measured on an arbitrary scale
random_page_cost = 3.0    # same scale as above

Why does the smaller limit cause it to skip the index? Is there a way to help the planner choose the better plan?

Much appreciated,
Mike
Re: [PERFORM] plan question - query with order by and limit not choosing index depends on size of limit, table
Thanks for the assistance. Here is an explain analyze of the query with the problem limit:

production=# explain analyze select * from landing_page.messages where ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY messages.created_at ASC limit 10;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2891.06 rows=10 width=1340) (actual time=207922.586..207922.586 rows=0 loops=1)
   ->  Index Scan using idx_landing_page_messages_created_at on messages  (cost=0.00..449560.48 rows=1555 width=1340) (actual time=207922.581..207922.581 rows=0 loops=1)
         Filter: ((NOT processed) AND ((topic)::text = 'x'::text))
 Total runtime: 207949.413 ms
(4 rows)

and an explain analyze with a higher limit that hits the index:

production=# explain analyze select * from landing_page.messages where ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY messages.created_at ASC limit 25;

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5885.47..5885.54 rows=25 width=1340) (actual time=80.931..80.931 rows=0 loops=1)
   ->  Sort  (cost=5885.47..5889.36 rows=1555 width=1340) (actual time=80.926..80.926 rows=0 loops=1)
         Sort Key: created_at
         Sort Method:  quicksort  Memory: 17kB
         ->  Bitmap Heap Scan on messages  (cost=60.45..5841.59 rows=1555 width=1340) (actual time=64.404..64.404 rows=0 loops=1)
               Recheck Cond: ((topic)::text = 'x'::text)
               Filter: (NOT processed)
               ->  Bitmap Index Scan on idx_messages_topic_processed  (cost=0.00..60.06 rows=1550 width=0) (actual time=56.207..56.207 rows=0 loops=1)
                     Index Cond: (((topic)::text = 'x'::text) AND (processed = false))
 Total runtime: 88.051 ms
(10 rows)

overrides in postgresql.conf:

shared_buffers = 256MB
work_mem = 8MB
max_fsm_pages = 200
max_fsm_relations = 2000
checkpoint_segments = 10
archive_mode = on
random_page_cost = 3.0
effective_cache_size = 6GB
default_statistics_target = 250
logging_collector = on

Forgot to mention this is Postgres 8.3.8 with 6GB memory on the server. When you ask how big is the active portion of the database I am not sure how to answer. The whole database server is about 140GB, but there are other applications that use this database; this particular table is about 1.6GB and growing. Currently there are jobs that query from this table every minute.

Thanks again
Mike

On Wed, Jan 5, 2011 at 5:10 PM, Kevin Grittner wrote:
> Mike Broers wrote:
>
> > Hello performance, I need help explaining the performance of a
> > particular query
>
> You provided some of the information needed, but you should review
> this page and post a bit more:
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions
>
> In particular, post the result of EXPLAIN ANALYZE, not just EXPLAIN.
> Also, showing all overrides in your postgresql.conf file is
> important, and some information about your hardware. How big is the
> active portion of your database (the frequently read portion)?
>
> > Why does the smaller limit cause it to skip the index?
>
> Because the optimizer thinks the query will return rows sooner that
> way.
>
> > Is there a way to help the planner choose the better plan?
>
> You might get there by adjusting your memory settings and/or costing
> settings, but we need to see more information to know that.
>
> -Kevin
Re: [PERFORM] plan question - query with order by and limit not choosing index depends on size of limit, table
Thanks for the suggestion, created_at is a timestamp without time zone type column. When I add +0 to created at I get a cast error. I am able to get the query to use the desired index when increasing or removing the limit, and I am still looking for the reason why that is happening. Any advice or more information I can supply please let me know. ERROR: operator does not exist: timestamp without time zone + integer LINE 1: ...es.processed = 'f')) ORDER BY messages.created_at+0 ASC lim... ^ HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. From: "pasman pasmański" To: pgsql-performance@postgresql.org Date: Fri, 7 Jan 2011 15:00:22 +0100 Subject: Re: plan question - query with order by and limit not choosing index depends on size of limit, table Try order by created_at+0 On Thu, Jan 6, 2011 at 3:36 PM, Mike Broers wrote: > Thanks for the assistance. > > Here is an explain analyze of the query with the problem limit: > > production=# explain analyze select * from landing_page.messages where > ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY > messages.created_at ASC limit 10; > > > QUERY PLAN > > > -- > > Limit (cost=0.00..2891.06 rows=10 width=1340) (actual > time=207922.586..207922.586 rows=0 loops=1) >-> Index Scan using idx_landing_page_messages_created_at on messages > (cost=0.00..449560.48 rows=1555 widt > h=1340) (actual time=207922.581..207922.581 rows=0 loops=1) > Filter: ((NOT processed) AND ((topic)::text = 'x'::text)) > Total runtime: 207949.413 ms > (4 rows) > > > and an explain analyze with a higher limit that hits the index: > > > production=# explain analyze select * from landing_page.messages where > ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY > messages.created_at ASC limit 25; > QUERY > PLAN > > > -- > - > Limit (cost=5885.47..5885.54 rows=25 width=1340) (actual > time=80.931..80.931 rows=0 loops=1) >-> Sort (cost=5885.47..5889.36 rows=1555 width=1340) (actual > time=80.926..80.926 rows=0 loops=1) > Sort Key: created_at > Sort Method: quicksort Memory: 17kB > -> Bitmap Heap Scan on messages (cost=60.45..5841.59 rows=1555 > width=1340) (actual time=64.404..64. > 404 rows=0 loops=1) >Recheck Cond: ((topic)::text = 'x'::text) >Filter: (NOT processed) >-> Bitmap Index Scan on idx_messages_topic_processed > (cost=0.00..60.06 rows=1550 width=0) (ac > tual time=56.207..56.207 rows=0 loops=1) > Index Cond: (((topic)::text = 'x'::text) AND (p > rocessed = false)) > Total runtime: 88.051 ms > (10 rows) > > > overrides in postgresql.conf > > shared_buffers = 256MB > work_mem = 8MB > max_fsm_pages = 200 > max_fsm_relations = 2000 > checkpoint_segments = 10 > archive_mode = on > random_page_cost = 3.0 > effective_cache_size = 6GB > default_statistics_target = 250 > logging_collector = on > > > Forgot to mention this is Postgres 8.3.8 with 6GB memory on the server. > > When you ask how big is the active portion of the database I am not sure > how to answer. The whole database server is about 140GB, but there are > other applications that use this database, this particular table is about > 1.6GB and growing. Currently there are jobs that query from this table > every minute. 
> > Thanks again > Mike > > > > > > > On Wed, Jan 5, 2011 at 5:10 PM, Kevin Grittner < > kevin.gritt...@wicourts.gov> wrote: > >> Mike Broers wrote: >> >> > Hello performance, I need help explaining the performance of a >> > particular query >> >> You provided some of the information needed, but you should review >> this page and post a bit more: >> >> http://wiki.postgresql.org/wiki/SlowQueryQuestions >> >> In particular, post the result of EXPLAIN ANALYZE, not just EXPLAIN. >> Also, showing all overrides in your postgresql.conf file is >> important, and some information about your hardware. How big is the >> active portion of your database (the frequently read portion)? >> >> > Why does the smaller limit cause it to skip the index? >> >> Because the optimizer thinks the query will return rows sooner that >> way. >> >> > Is there a way to help the planner choose the better plan? >> >> You might get there by adjusting your memory settings and/or costing >> settings, but we need to see more information to know that. >> >> -Kevin >> > >
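The cast fails because there is no timestamp-plus-integer operator. If the goal is pasman's trick of hiding the ORDER BY column from the index on created_at, a variant that does type-check is to add a zero-length interval, which keeps the sort order but no longer matches idx_landing_page_messages_created_at, so the planner cannot walk that index to satisfy the ORDER BY. A hedged sketch against the posted query, worth confirming with EXPLAIN ANALYZE:

select * from landing_page.messages
 where ((messages.topic = E'x') AND (messages.processed = 'f'))
 ORDER BY messages.created_at + interval '0 seconds' ASC
 limit 10;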
Re: [PERFORM] plan question - query with order by and limit not choosing index depends on size of limit, table
Thanks Robert, this is what I was looking for. I will try these suggestions and follow up if any of them are the silver bullet. On Fri, Jan 14, 2011 at 7:11 AM, Robert Haas wrote: > On Thu, Jan 6, 2011 at 4:36 PM, Mike Broers wrote: > > Thanks for the assistance. > > Here is an explain analyze of the query with the problem limit: > > production=# explain analyze select * from landing_page.messages where > > ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY > > messages.created_at ASC limit 10; > > > >QUERY PLAN > > > > > -- > > > > Limit (cost=0.00..2891.06 rows=10 width=1340) (actual > > time=207922.586..207922.586 rows=0 loops=1) > >-> Index Scan using idx_landing_page_messages_created_at on messages > > (cost=0.00..449560.48 rows=1555 widt > > h=1340) (actual time=207922.581..207922.581 rows=0 loops=1) > > Filter: ((NOT processed) AND ((topic)::text = 'x'::text)) > > Total runtime: 207949.413 ms > > (4 rows) > > You're not the first person to have been bitten by this. The > optimizer thinks that rows WHERE NOT processed and topic = 'x' are > reasonably common, so it figures that it can just index scan until it > finds 10 of them. But when it turns out that there are none at all, > it ends up having to scan the entire index, which stinks big-time. > > The alternative plan is to use a different index to find ALL the > relevant rows, sort them, and then take the top 10. That would suck > if there actually were tons of rows like this, but there aren't. > > So the root of the problem, in some sense, is that the planner's > estimate of the selectivity of "NOT processed and topic = 'x'" is not > very good. Some things to try: > > - increase the statistics target for the "processed" and "topic" > columns even higher > - put the processed rows in one table and the not processed rows in > another table > - do something like SELECT * FROM (SELECT .. LIMIT 200 OFFSET 0) LIMIT > 10 to try to fool the planner into planning based on the higher, inner > limit > - create a partial index on messages (topic) WHERE NOT processed and > see if the planner will use it > > ...Robert >
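A sketch of what three of Robert's suggestions could look like for this table; the statistics target, the inner limit, and the new index name are placeholder values rather than tested settings (the "separate tables for processed and unprocessed rows" option is a schema change and is not shown):

-- increase the statistics target for the filtered columns, then re-analyze
ALTER TABLE landing_page.messages ALTER COLUMN processed SET STATISTICS 1000;
ALTER TABLE landing_page.messages ALTER COLUMN topic     SET STATISTICS 1000;
ANALYZE landing_page.messages;

-- fool the planner with a higher inner limit, keeping the real limit outside
SELECT *
  FROM (SELECT *
          FROM landing_page.messages
         WHERE topic = E'x' AND NOT processed
         ORDER BY created_at ASC
         LIMIT 200 OFFSET 0) AS unprocessed
 ORDER BY created_at ASC
 LIMIT 10;

-- partial index covering only the rows this query can ever return
CREATE INDEX idx_messages_topic_unprocessed
    ON landing_page.messages (topic)
 WHERE NOT processed;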
[PERFORM] Huge Data sets, simple queries
Does anyone have any experience with extremely large data sets? I mean hundreds of millions of rows. The queries I need to run on my 200 million transactions are relatively simple:

select month, count(distinct(cardnum)), count(*), sum(amount) from transactions group by month;

This query took 18 hours on PG 8.1 on a Dual Xeon, RHEL3, (2.4 Kernel) with RAID-10 (15K drives) and 12 GB Ram. I was expecting it to take about 4 hours - based on some experience with a similar dataset on a different machine (RH9, PG7.3 Dual Xeon, 4GB RAM, Raid-5 10K drives). This machine is COMPLETELY devoted to running these relatively simple queries one at a time. (No multi-user support needed!) I've been tooling with the various performance settings: effective_cache at 5GB, shared_buffers at 2 GB, workmem, sortmem at 1 GB each. (Shared buffers puzzles me a bit - my instinct says to set it as high as possible, but everything I read says that "too high" can hurt performance.) Any ideas for performance tweaking in this kind of application would be greatly appreciated. We've got indexes on the fields being grouped, and always vacuum analyze after building them. It's difficult to just "try" various ideas because each attempt takes a full day to test. Real experience is needed here! Thanks much, Mike ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
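One way to shorten the test cycle is to change the per-query memory settings for a single session instead of editing postgresql.conf between runs; a sketch, assuming the table and column names from the post (on 8.1 the per-sort knob is work_mem, in kilobytes, and effective_cache_size is counted in 8 kB pages):

SET work_mem = 1048576;             -- about 1 GB per sort/hash step, this session only
SET effective_cache_size = 655360;  -- about 5 GB expressed as 8 kB pages

EXPLAIN
SELECT month, count(distinct cardnum), count(*), sum(amount)
  FROM transactions
 GROUP BY month;

Plain EXPLAIN shows whether the plan changes without paying for the full run; EXPLAIN ANALYZE or the real query still costs the full execution time.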
Re: [PERFORM] Huge Data sets, simple queries
On 2/1/06, Luke Lonergan <[EMAIL PROTECTED]> wrote: [snip] > This is actually interesting overall - I think what this might be showing is > that the Linux SW RAID1 is alternating I/Os to the mirror disks from > different processes (LWP or HWP both maybe?), but not within one process. I can confirm this behavior after looking at my multipathed fibre channel SAN. To the best of my knowledge, the multipathing code uses the same underlying I/O code as the Linux SW RAID logic. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] The order of fields around the "=" in the WHERE conditions
version PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6 (1 row) -- The order of fields around the "=" in the WHERE conditions -- affects the query plan. I would rather not have to worry about -- that. It seems that it puts me back in the place of having to -- figure what join order is best. Here are two sql statements and -- the query plan that is generated for each. The worst of the two -- is first and the best one is second. -- Mike Quinn -- the worst way -- EXPLAIN ANALYZE SELECT Locts.id, Commtypes.name FROM Growers , Locts , Crops , Commtypes WHERE Growers.id = '0401606' AND -- Commtypes.number = Crops.Commtype Crops.Commtype = Commtypes.number AND Locts.number = Crops.Loct -- Crops.Loct = Locts.number AND Growers.number = Locts.Grower -- Locts.Grower = Growers.number ; QUERY PLAN - Nested Loop (cost=18934.81..647002.69 rows=1045 width=20) (actual time=525.267..4079.051 rows=69 loops=1) Join Filter: ("outer".commtype = "inner".number) -> Nested Loop (cost=18923.21..631988.31 rows=1310 width=18) (actual time=523.867..4036.005 rows=69 loops=1) Join Filter: ("inner".number = "outer".loct) -> Seq Scan on crops (cost=0.00..7599.46 rows=258746 width=24) (actual time=0.006..278.656 rows=258746 loops=1) -> Materialize (cost=18923.21..18924.25 rows=104 width=18) (actual time=0.001..0.007 rows=9 loops=258746) -> Nested Loop (cost=5503.02..18923.11 rows=104 width=18) (actual time=0.061..523.703 rows=9 loops=1) Join Filter: ("outer".number = "inner".grower) -> Index Scan using growers_id on growers (cost=0.00..3.05 rows=4 width=12) (actual time=0.016..0.024 rows=1 loops=1) Index Cond: ((id)::text = '0401606'::text) -> Materialize (cost=5503.02..7451.58 rows=112456 width=30) (actual time=0.007..433.970 rows=112456 loops=1) -> Seq Scan on locts (cost=0.00..4566.56 rows=112456 width=30) (actual time=0.003..176.771 rows=112456 loops=1) -> Materialize (cost=11.60..16.69 rows=509 width=26) (actual time=0.001..0.287 rows=509 loops=69) -> Seq Scan on commtypes (cost=0.00..11.09 rows=509 width=26) (actual time=0.021..0.672 rows=509 loops=1) Total runtime: 4081.766 ms (15 rows) -- the best way -- EXPLAIN ANALYZE SELECT Locts.id, Commtypes.name FROM Growers , Locts , Crops , Commtypes WHERE Growers.id = '0401606' AND Commtypes.number = Crops.Commtype -- Crops.Commtype = Commtypes.number AND -- Locts.number = Crops.Loct Crops.Loct = Locts.number AND -- Growers.number = Locts.Grower Locts.Grower = Growers.number ; QUERY PLAN Nested Loop (cost=0.00..11224.18 rows=1045 width=20) (actual time=0.259..1.172 rows=69 loops=1) -> Nested Loop (cost=0.00..5717.09 rows=1310 width=18) (actual time=0.205..0.466 rows=69 loops=1) -> Nested Loop (cost=0.00..31.90 rows=104 width=18) (actual time=0.141..0.171 rows=9 loops=1) -> Index Scan using growers_id on growers (cost=0.00..3.05 rows=4 width=12) (actual time=0.078..0.080 rows=1 loops=1) Index Cond: ((id)::text = '0401606'::text) -> Index Scan using locts_grower on locts (cost=0.00..6.15 rows=85 width=30) (actual time=0.058..0.070 rows=9 loops=1) Index Cond: (locts.grower = "outer".number) -> Index Scan using crops_loct on crops (cost=0.00..54.13 rows=43 width=24) (actual time=0.012..0.022 rows=8 loops=9) Index Cond: (crops.loct = "outer".number) -> Index Scan using commtypes_number_key on commtypes (cost=0.00..4.19 rows=1 width=26) (actual time=0.006..0.007 rows=1 loops=69) Index Cond: (commtypes.number = "outer".commtype) Total runtime: 1.308 ms (12 rows) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? 
http://www.postgresql.org/docs/faq
Re: [PERFORM] The order of fields around the "=" in the WHERE
The datatype of the join columns is a user defined type and there are no commutators defined. I will fix that and retest. Thanks for the insight. Mike Quinn ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
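For anyone hitting the same thing, the fix being described is to declare a COMMUTATOR when the equality operator for the user-defined type is created, so the planner knows it may rewrite "a = b" as "b = a" and consider indexes on either side of the join. A minimal sketch, where mytype and mytype_eq are hypothetical names standing in for the real type and its equality function:

CREATE OPERATOR = (
    LEFTARG    = mytype,
    RIGHTARG   = mytype,
    PROCEDURE  = mytype_eq,
    COMMUTATOR = =,          -- "x = y" may be flipped to "y = x"
    RESTRICT   = eqsel,
    JOIN       = eqjoinsel
);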
Re: [PERFORM] The order of fields around the "=" in the WHERE
version PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6 (1 row) -- After commutator added to operators of user defined type, -- the order of fields around the "=" in WHERE conditions -- no longer affect the query plan. -- previously the worst way -- EXPLAIN ANALYZE SELECT Locts.id, Commtypes.name FROM Growers , Locts , Crops , Commtypes WHERE Growers.id = '0401606' AND -- Commtypes.number = Crops.Commtype Crops.Commtype = Commtypes.number AND Locts.number = Crops.Loct -- Crops.Loct = Locts.number AND Growers.number = Locts.Grower -- Locts.Grower = Growers.number ; QUERY PLAN Nested Loop (cost=0.00..17467.00 rows=954 width=20) (actual time=0.253..1.155 rows=69 loops=1) -> Nested Loop (cost=0.00..12413.93 rows=1202 width=18) (actual time=0.191..0.472 rows=69 loops=1) -> Nested Loop (cost=0.00..32.51 rows=104 width=18) (actual time=0.142..0.171 rows=9 loops=1) -> Index Scan using growers_id on growers (cost=0.00..3.05 rows=4 width=12) (actual time=0.065..0.067 rows=1 loops=1) Index Cond: ((id)::text = '0401606'::text) -> Index Scan using locts_grower on locts (cost=0.00..6.23 rows=91 width=30) (actual time=0.070..0.085 rows=9 loops=1) Index Cond: ("outer".number = locts.grower) -> Index Scan using crops_loct on crops (cost=0.00..118.53 rows=42 width=24) (actual time=0.011..0.021 rows=8 loops=9) Index Cond: ("outer".number = crops.loct) -> Index Scan using commtypes_number_key on commtypes (cost=0.00..4.19 rows=1 width=26) (actual time=0.006..0.007 rows=1 loops=69) Index Cond: ("outer".commtype = commtypes.number) Total runtime: 1.299 ms (12 rows) -- previously the best way -- EXPLAIN ANALYZE SELECT Locts.id, Commtypes.name FROM Growers , Locts , Crops , Commtypes WHERE Growers.id = 0401606 AND Commtypes.number = Crops.Commtype -- Crops.Commtype = Commtypes.number AND -- Locts.number = Crops.Loct Crops.Loct = Locts.number AND -- Growers.number = Locts.Grower Locts.Grower = Growers.number ; QUERY PLAN Nested Loop (cost=0.00..17467.00 rows=954 width=20) (actual time=0.063..0.947 rows=69 loops=1) -> Nested Loop (cost=0.00..12413.93 rows=1202 width=18) (actual time=0.050..0.318 rows=69 loops=1) -> Nested Loop (cost=0.00..32.51 rows=104 width=18) (actual time=0.036..0.064 rows=9 loops=1) -> Index Scan using growers_id on growers (cost=0.00..3.05 rows=4 width=12) (actual time=0.018..0.020 rows=1 loops=1) Index Cond: ((id)::text = '0401606'::text) -> Index Scan using locts_grower on locts (cost=0.00..6.23 rows=91 width=30) (actual time=0.012..0.023 rows=9 loops=1) Index Cond: (locts.grower = "outer".number) -> Index Scan using crops_loct on crops (cost=0.00..118.53 rows=42 width=24) (actual time=0.007..0.018 rows=8 loops=9) Index Cond: (crops.loct = "outer".number) -> Index Scan using commtypes_number_key on commtypes (cost=0.00..4.19 rows=1 width=26) (actual time=0.005..0.006 rows=1 loops=69) Index Cond: (commtypes.number = "outer".commtype) Total runtime: 1.091 ms (12 rows) >>> "Mike Quinn" <[EMAIL PROTECTED]> 4/4/06 10:18:30 AM >>> The datatype of the join columns is a user defined type and there are no commutators defined. I will fix that and retest. Thanks for the insight. Mike Quinn ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] High CPU with 7.4.1 after running for about 2 weeks
I've been running Postgresql 7.4.1 for a couple weeks after upgrading from 7.2. I noticed today that the postmaster had been using 99% of the dual CPUs (on a PowerEdge 2650) non-stop for the last couple days. I stopped all the clients, and it didn't abate---even with no connections---so I restarted the postmaster. Now everything is running smoothly again. Is there anything that might accumulate after two weeks that might cause postgresql to thrash? I'm running pg_autovacuum, so the database itself should be nice and clean. It isn't connections, because I restarted the clients a few times without success. I've been running a long time on 7.2 with essentially the same configuration (minus pg_autovacuum) without any problems Thanks for any help, -Mike ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] High CPU with 7.4.1 after running for about 2 weeks
>Since the postmaster is a single unthreaded process, it's quite >impossible for it to take up 100% of two CPUs. Could you be more >precise about which processes were eating CPU, and what they were >doing according to the available state data? (ps auxww and >pg_stat_activity can be helpful tools.) > > regards, tom lane I shut down all our clients (all java except one in perl), and pg_stat_activity showed that there was still one query active. That's a good table to know about! Anyway, it didn't end until I sent it a TERM signal. I assume this means there's a runaway query somewhere, which I'll have to hunt down. But if the client dies, doesn't postgresql normally terminate the query that that client initiated? Or do I need to set statement_timeout? (As for the 100% CPU, I was confused by the fact that I was getting two lines in "top" (on Linux) with 99% utilization---I assume with two runaway queries.) Thanks for your help! -Mike ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
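Two small follow-ups on the knobs mentioned above, with placeholder values: statement_timeout (in milliseconds) will abort any statement that runs too long, and pg_stat_activity shows what each backend is doing as long as stats_command_string is enabled.

-- per session; the same setting can go in postgresql.conf to apply everywhere
SET statement_timeout = 1800000;   -- 30 minutes

-- spot backends that are still busy
SELECT procpid, usename, current_query FROM pg_stat_activity;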
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
> Almost any cross dbms migration shows a drop in performance. The engine > effectively trains developers and administrators in what works and what > doesn't. The initial migration thus compares a tuned to an untuned version. I think it is also possible that Microsoft has more programmers working on tuning issues for SQL Server than PostgreSQL has working on the whole project. -- Mike Nolan ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Long running queries degrade performance
> We have a web app with a postgres backend. Most queries have subsecond > response times through the web even with high usage. Every once in awhile > someone will run either an ad-hoc query or some other long running db > process. Are you sure it is postgres where the delay is occurring? I ask this because I also have a web-based front end to postgres, and while most of the time the queries respond in about a second every now and then I see one that takes much longer, sometimes 10-15 seconds. I've seen this behavior on both my development system and on the production server. The same query a while later might respond quickly again. I'm not sure where to look for the delay, either, and it is intermittent enough that I'm not even sure what monitoring techniques to use. -- Mike Nolan ---(end of broadcast)--- TIP 8: explain analyze is your friend
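One low-overhead way to catch those intermittent slow statements, rather than trying to catch them in top, is the slow-query log available from 7.4 on; a sketch of the postgresql.conf setting, with the threshold as a placeholder:

log_min_duration_statement = 5000   # log any statement that runs longer than 5000 ms; -1 disables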
Re: [PERFORM] Long running queries degrade performance
> Fairly sure, when it is happening, postgres usually is taking up the top slots > for cpu usage as reported by top. Perhaps there is a better way to monitor > this? Given the intermittent nature of the problem and its relative brevity (5-10 seconds), I don't know whether top offers the granularity needed to locate the bottleneck. > likely you have a situation where something else is happening which blocks > the current thread. It happens on my development system, and I'm the only one on it. I know I've seen it on the production server, but I think it is a bit more common on the development server, though that may be a case of which system I spend the most time on. (Also, the production server is 1300 miles away with a DSL connection, so I may just be seeing network delays some of the time there.) > Both of these were triggered by users double clicking links in our > web app and were fixed by a better implementation. Perhaps something like > that is causing what you are seeing. My web app traps double-clicks in javascript and ignores all but the first one. That's because some of the users have mice that give double-clicks even when they only want one click. -- Mike Nolan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Trigger & Function
> My problem is I defined the "before" and "after"
> fields in the audit table as TEXT and when I try to move NEW or OLD into
> these fields I get the error "NEW used in query that is not in a rule".

You're trying to insert record data into a text field, and that doesn't work. OLD and NEW can be used as either record identifiers (as in RETURN OLD) or column qualifiers (as in OLD.colname), but you can't intermingle them.

I don't think postgres (pl/pgsql) has row-to-variable and variable-to-row functions like serialize and unserialize; that's probably what you'd need. It would probably be necessary to write something like that in C, since at this point pl/perl cannot be used for trigger functions. I've not tried using pl/php yet; the announcement for it says it can be used for trigger functions.

My first thought is that even if there was a serialize/unserialize capability, you might be able to write something using it that creates the log entry, but not anything that allows you to query the log for specific column or row entries. It would probably require a MAJOR extension of SQL to add it to pg, as there would need to be qualifiers that can be mapped to specific tables and columns. Even if we had that, storing values coming from multiple tables into a single audit table would present huge challenges.

I've found only two ways to implement audit logs:

1. Have separate log tables that match the structure of the tables they are logging.

2. Write a trigger function that converts columns to something you can store in a common log table. (I've not found a way to do this without inserting one row for each column being logged, though.)
--
Mike Nolan

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
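To make option (1) concrete, here is a minimal sketch of a per-table audit setup: a log table mirroring the audited columns plus a trigger function that copies OLD or NEW into it. The accounts table and its columns are purely hypothetical, and the function body uses old-style quoting so it also works on pre-8.0 servers:

CREATE TABLE accounts (
    id      integer PRIMARY KEY,
    balance numeric NOT NULL
);

CREATE TABLE accounts_log (
    changed_at timestamp NOT NULL DEFAULT now(),
    operation  text      NOT NULL,
    id         integer,
    balance    numeric
);

CREATE OR REPLACE FUNCTION accounts_audit() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''DELETE'' THEN
        -- deletes only have an OLD row to record
        INSERT INTO accounts_log (operation, id, balance)
            VALUES (TG_OP, OLD.id, OLD.balance);
        RETURN OLD;
    ELSE
        -- inserts and updates record the new state of the row
        INSERT INTO accounts_log (operation, id, balance)
            VALUES (TG_OP, NEW.id, NEW.balance);
        RETURN NEW;
    END IF;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER accounts_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON accounts
    FOR EACH ROW EXECUTE PROCEDURE accounts_audit();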
Re: [PERFORM] vacuum full 100 mins plus?
Tom Lane wrote: > Christopher Browne <[EMAIL PROTECTED]> writes: >> A long time ago, in a galaxy far, farpliers [EMAIL PROTECTED] (Patrick >> Hatcher) wrote: >>> Answered my own question. I gave up the vacuum full after 150 mins. I >>> was able to export to a file, vacuum full the empty table, and reimport >>> in less >>> than 10 mins. I suspect the empty item pointers and the sheer number of >>> removable rows was causing an issue. > >> In that case, you'd be a little further better off if the steps were: >> - drop indices; >> - copy table to file (perhaps via pg_dump -t my_table); >> - truncate the table, or drop-and-recreate, both of which make >>it unnecessary to do _any_ vacuum of the result; >> - recreate indices, probably with SORT_MEM set high, to minimize >>paging to disk >> - analyze the table (no need to vacuum if you haven't created any >>dead tuples) >> - cut SORT_MEM back down to "normal" sizes > > Rather than doing all this manually, you can just CLUSTER on any handy > index. In 7.5, another possibility is to issue one of the forms of > ALTER TABLE that force a table rewrite. > > The range of usefulness of VACUUM FULL is really looking narrower and > narrower to me. I can foresee a day when we'll abandon it completely. I would love to see this 10lb sledge hammer go away when we have enough tiny screwdrivers and needlenose pliers to make it obsolete! > > regards, tom lane > > ---(end of broadcast)--- > TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faqs/FAQ.html ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
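For reference, the CLUSTER route Tom mentions looks like this in the 7.x/8.0 syntax; my_table and my_table_pkey are placeholder names:

-- rewrites the table in index order, discarding dead tuples, under an exclusive lock
CLUSTER my_table_pkey ON my_table;
ANALYZE my_table;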
Re: [PERFORM] Performance Bottleneck
On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote: > > The queries themselves are simple, normally drawing information from one > > table with few conditions or in the most complex cases using joins on > > two table or sub queries. These behave very well and always have, the > > problem is that these queries take place in rather large amounts due to > > the dumb nature of the scripts themselves. > > Show us the explain analyze on that queries, how many rows the tables are > containing, the table schema could be also usefull. > If the queries themselves are optimized as much as they can be, and as you say, its just the sheer amount of similar queries hitting the database, you could try using prepared queries for ones that are most often executed to eliminate some of the overhead. I've had relatively good success with this in the past, and it doesn't take very much code modification. -- Mike Benoit <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Performance Bottleneck
On Fri, 2004-08-06 at 23:18 +, Martin Foster wrote: > Mike Benoit wrote: > > > On Wed, 2004-08-04 at 17:25 +0200, Gaetano Mendola wrote: > > > > > >>>The queries themselves are simple, normally drawing information from one > >>>table with few conditions or in the most complex cases using joins on > >>>two table or sub queries. These behave very well and always have, the > >>>problem is that these queries take place in rather large amounts due to > >>>the dumb nature of the scripts themselves. > >> > >>Show us the explain analyze on that queries, how many rows the tables are > >>containing, the table schema could be also usefull. > >> > > > > > > If the queries themselves are optimized as much as they can be, and as > > you say, its just the sheer amount of similar queries hitting the > > database, you could try using prepared queries for ones that are most > > often executed to eliminate some of the overhead. > > > > I've had relatively good success with this in the past, and it doesn't > > take very much code modification. > > > > One of the biggest problems is most probably related to the indexes. > Since the performance penalty of logging the information needed to see > which queries are used and which are not is a slight problem, then I > cannot really make use of it for now. > > However, I am curious how one would go about preparing query? Is this > similar to the DBI::Prepare statement with placeholders and simply > changing the values passed on execute? Or is this something database > level such as a view et cetera? > Yes, always optimize your queries and GUC settings first and foremost. Thats where you are likely to gain the most performance. After that if you still want to push things even further I would try prepared queries. I'm not familiar with DBI::Prepare at all, but I don't think its what your looking for. This is what you want: http://www.postgresql.org/docs/current/static/sql-prepare.html -- Mike Benoit <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
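A minimal sketch of the server-side feature that page describes; the posts table and its columns are hypothetical stand-ins for whatever the scripts actually query:

-- planned once per connection...
PREPARE recent_posts (integer) AS
    SELECT *
      FROM posts
     WHERE author_id = $1
     ORDER BY posted_at DESC
     LIMIT 20;

-- ...then executed as often as needed on that connection
EXECUTE recent_posts(42);
EXECUTE recent_posts(17);

Because a prepared statement only lives for the life of its connection, this pays off most with persistent connections or a connection pool.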
Re: [PERFORM] Partitioning
On 15 Sep 2004 23:55:24 -0400, Greg Stark <[EMAIL PROTECTED]> wrote: > > "J. Andrew Rogers" <[EMAIL PROTECTED]> writes: > > > We do something very similar, also using table inheritance > > I have a suspicion postgres's table inheritance will end up serving as a good > base for a partitioned table feature. Is it currently possible to query which > subtable a record came from though? >From the docs on http://www.postgresql.org/docs/7.4/static/ddl-inherit.html : ... In some cases you may wish to know which table a particular row originated from. There is a system column called TABLEOID in each table which can tell you the originating table: SELECT c.tableoid, c.name, c.altitude FROM cities c WHERE c.altitude > 500; which returns: tableoid | name| altitude --+---+-- 139793 | Las Vegas | 2174 139793 | Mariposa | 1953 139798 | Madison | 845 (If you try to reproduce this example, you will probably get different numeric OIDs.) By doing a join with pg_class you can see the actual table names: SELECT p.relname, c.name, c.altitude FROM cities c, pg_class p WHERE c.altitude > 500 and c.tableoid = p.oid; which returns: relname | name| altitude --+---+-- cities | Las Vegas | 2174 cities | Mariposa | 1953 capitals | Madison | 845 --miker ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] COPY slows down?
I just ran a COPY of a million records several times, and each time I ran it, it ran apparently exponentially slower. If I do an insert of 10 million records, even with 2 indexes (same table) it doesn't appear to slow down at all. Any ideas?

- Mike H.

(I apologize for the ^Hs)

Script started on Wed Oct 6 08:37:32 2004
bash-3.00$ psql
Welcome to psql 7.4.5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

mvh=# \timing
Timing is on.
mvh=# reindex table bgtest;
mvh=# delete from bgtest;
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 69796.130 ms
mvh=# vacuum analyze;
VACUUM
Time: 19148.621 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 89189.939 ms
mvh=# vacuum analyze;
VACUUM
Time: 26814.670 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 131131.982 ms
mvh=# vacuum analyze;
VACUUM
Time: 64997.264 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 299977.697 ms
mvh=# vacuum analyze;
VACUUM
Time: 103541.716 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 455292.600 ms
mvh=# vacuum analyze;
VACUUM
Time: 138910.015 ms
mvh=# copy bgtest from '/home/mvh/databasestuff/dbdmp/bgdump';
COPY
Time: 612119.661 ms
mvh=# vacuum analyze;
VACUUM
Time: 151331.243 ms
mvh=# \q
bash-3.00$ exit
Script done on Wed Oct 6 10:43:04 2004

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Does PostgreSQL run with Oracle?
On Fri, 15 Oct 2004 11:54:44 -0500, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > My basic question to the community is "is PostgreSQL approximately as fast > as Oracle?" My personal experience comparing PG to Oracle is across platforms, Oracle on Sun/Solaris (2.7, quad-proc R440) and PG on Intel/Linux (2.6 kernel, dual P3/1GHz). When both were tuned for the specific app I saw a 45% speedup after switching to PG. This was with a customized CRM and System Monitoring application serving ~40,000 trouble tickets and monitoring 5,000 metric datapoints every 5-30 minutes. The hardware was definitely not comparable (the Intel boxes have more horsepower and faster disks), but dollar for dollar, including support costs, PG is the winner by a BIG margin. YMMV, of course, and my results are apparently above average. Another big plus I found was that PG is much easier to admin as long as you turn on pg_autovacuum. --miker ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] preloading indexes
If you're running Linux, and kernel 2.6.x, you can try playing with the /proc/sys/vm/swappiness setting. My understanding is that:

echo "0" > /proc/sys/vm/swappiness

will try to keep all in-use application memory from being swapped out when other processes query the disk a lot. Although, since PostgreSQL utilizes the disk cache quite a bit, this may not help you.

On Wed, 2004-11-03 at 15:53 -0500, Tom Lane wrote:
> <[EMAIL PROTECTED]> writes:
> > The caching appears to disappear overnight.
>
> You've probably got cron jobs that run late at night and blow out your
> kernel disk cache by accessing a whole lot of non-Postgres stuff.
> (A nightly disk backup is one obvious candidate.) The most likely
> solution is to run some cron job a little later to exercise your
> database and thereby repopulate the cache with Postgres files before
> you get to work ;-)
>
> regards, tom lane
>
> ---(end of broadcast)---
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
--
Mike Benoit <[EMAIL PROTECTED]>
[PERFORM] int4 in a GiST index
Hello all, I am using tsearch2 to (imagine this... :) index a text field. There is also a, for lack of a better name, "classification" field called 'field' that will be used to group certain rows together. CREATE TABLE biblio.metarecord_field_entry ( record BIGINT REFERENCES biblio.metarecord (id) ON UPDATE CASCADE ON DELETE SET NULL DEFERRABLE INITIALLY DEFERRED, field INT NOT NULL REFERENCES biblio.metarecord_field_map (id) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED, value TEXT, value_fti tsvector, source BIGINT NOT NULL REFERENCES biblio.record (id) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED ) WITHOUT OIDS; Because there will be "or" queries against the 'value_fti' I want to create a multi-column index across the tsvector and classification columns as that should help with selectivity. But because there is no GiST opclass for INT4 the index creation complains thusly: oils=# CREATE INDEX metarecord_field_entry_value_and_field_idx ON biblio.metarecord_field_entry USING GIST (field, value_fti); ERROR: data type integer has no default operator class for access method "gist" HINT: You must specify an operator class for the index or define a default operator class for the data type. I attempted to give it the 'int4_ops' class, but that also complains: oils=# CREATE INDEX metarecord_field_entry_value_and_field_idx ON biblio.metarecord_field_entry USING GIST (value_fti, field int4_ops); ERROR: operator class "int4_ops" does not exist for access method "gist" I couldn't find any info in the docs (7.4 and 8.0.0b4) for getting GiST to index standard integers. I'm sure this has been done before, but I've note found the magic spell. Of course, I may just be barking up the wrong tree altogether... Thanks in advance! -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] int4 in a GiST index
On Wed, 10 Nov 2004 18:50:28 -0800 (PST), George Essig <[EMAIL PROTECTED]> wrote: > Mike Rylander wrote: > > > I want to create a multi-column index across the tsvector and classification > > columns as that should help with selectivity. But because there is no > > GiST opclass for INT4 the index creation complains thusly: > > Install contrib/btree_gist along with contrib/tsearch2 to create a > multicolumn index on the in4 > and the tsvector columns. See the following for an example: > > http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/oscon_tsearch2/multi_column_index.html > > George Essig > Thanks a million. I had actually just found the answer after some more googling, but I hadn't seen that page and it happens to be exactly what I wanted. As a side note I'd like to thank everyone here (and especially George, in this case). I've been on these lists for quite a while and I'm always amazed at the speed, accuracy and precision of the answers on the PG mailing lists. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
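For the archives: once the btree_gist (and tsearch2) contrib scripts are loaded into the database, the original CREATE INDEX should go through; the int4 opclass can also be named explicitly. The opclass name below is the one shipped with contrib/btree_gist:

CREATE INDEX metarecord_field_entry_value_and_field_idx
    ON biblio.metarecord_field_entry
 USING GIST (field gist_int4_ops, value_fti);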
Re: [PERFORM] memcached and PostgreSQL
On 17 Nov 2004 03:08:20 -0500, Greg Stark <[EMAIL PROTECTED]> wrote: > Josh Berkus <[EMAIL PROTECTED]> writes: > > > So memcached becomes a very good place to stick data that's read often but > > not > > updated often, or alternately data that changes often but is disposable. > > An > > example of the former is a user+ACL list; and example of the latter is web > > session information ... or simple materialized views. > > I would like very much to use something like memcached for a materialized view > I have. The problem is that I have to join it against other tables. > > I've thought about providing a SRF in postgres to read records out of > memcached but I'm unclear it would it really help at all. > > Has anyone tried anything like this? I haven't tried it yet, but I plan too. An intersting case might be to use plperlu to interface with memcached and store hashes in the cache via some external process, like a CGI script. Then just define a TYPE for the perl SRF to return, and store the data as an array of hashes with keys matching the TYPE. A (perhaps useless) example could then be something like: CREATE TYPE user_info AS ( sessionid TEXT, userid INT, lastaccess TIMESTAMP, lastrequest TEXT); CREATE FUNCTION get_user_info_by_session ( TEXT) RETURNS SETOF user_info AS $$ use Cache::Memcached; my $session = shift; my $c = $_SHARED{memcached} || Cache::Memcached->new( {servers => '127.0.0.1:'} ); my $user_info = $m->get('web_access_list'); # $user_info looks like # [ {userid => 5, lastrequest => 'http://...', lastaccess => localtime(), #sessionid => '123456789'}, { ...} ] # and is stored by a CGI. @info = grep {$$_{sessionid} eq $session} @$user_info; return [EMAIL PROTECTED]; $$ LANGUAGE 'plperlu'; SELECT u.username, f.lastrequest FROM users u, get_user_info_by_session('123456789') WHERE f.userid = u.userid; Any thoughts? > > -- > greg > > ---(end of broadcast)--- > TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faqs/FAQ.html > -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Slow execution time when querying view with WHERE clause
loops=1) Filter: (webuser = getwebuser()) -> Subquery Scan "*SELECT* 2" (cost=1.08..3.33 rows=3 width=8) (actual time=0.295..0.381 rows=1 loops=1) -> Hash Join (cost=1.08..3.30 rows=3 width=8) (actual time=0.286..0.368 rows=1 loops=1) Hash Cond: ("outer".company = "inner".company) -> Seq Scan on _stores s (cost=0.00..2.13 rows=13 width=16) (actual time=0.008..0.080 rows=13 loops=1) -> Hash (cost=1.07..1.07 rows=1 width=8) (actual time=0.116..0.116 rows=0 loops=1) -> Seq Scan on _companies c (cost=0.00..1.07 rows=1 width=8) (actual time=0.062..0.083 rows=1 loops=1) Filter: ((companyid)::text = 'DEFAULT'::text) -> Index Scan using i_offers4 on _offers o (cost=0.00..1007.93 rows=15524 width=16) (actual time=0.023..67.183 rows=10049 loops=48) -> Index Scan using i_inventories1 on _inventories i (cost=0.00..5.07 rows=1 width=16) (actual time=0.025..0.029 rows=1 loops=163561) Index Cond: (i.inventory = "outer".inventory) -> Index Scan using i_bins1 on _bins b (cost=0.00..3.40 rows=1 width=16) (actual time=0.021..0.026 rows=1 loops=163561) Index Cond: (b.bin = "outer".bin) Total runtime: 20027.414 ms (36 rows) --- That's a slow-down on execution time by a factor of 50, even though the row count was the same: 34. In fact, it's MUCH faster to do: create temporary table foo as select * from p_areas; select * from foo where deactive is null; The database has been analyzed. Any tips would be greatly appreciated. Mike Mascari P.S.: I turned off word-wrap in my mail client for this post. Is that the right thing to do for analyze output? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Mike Mascari wrote: I have the following view: create or replace view market.p_areas as select a.* from _areas a where a.area in ( select b.area from _bins b, _inventories i, _offers o, _pricemembers p where b.bin = i.bin and i.inventory = o.inventory and o.pricegroup = p.pricegroup and p.buyer in ( select s.store from _stores s, _webusers w where w.webuser = getWebuser() and w.company = s.company union select s.store from _stores s, _companies c where s.company = c.company and c.companyid = 'DEFAULT' ) ); ... I failed to report the version: select version(); PostgreSQL 7.4.5 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.4.0 20040204 (prerelease) Sorry. Mike Mascari ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Tom Lane wrote: Mike Mascari <[EMAIL PROTECTED]> writes: When I query the view with a simple filter, I get: explain analyze select * from p_areas where deactive is null; The problem seems to be here: -> Seq Scan on _areas a (cost=0.00..2.48 rows=1 width=163) (actual time=0.037..0.804 rows=48 loops=1) Filter: (deactive IS NULL) Why is it so completely off about the selectivity of the IS NULL clause? Are you sure you ANALYZEd this table recently? Yes. I just did: [EMAIL PROTECTED] vacuum full analyze; VACUUM [EMAIL PROTECTED] explain analyze select * from p_areas where deactive is null; QUERY PLAN -- Nested Loop IN Join (cost=8.62..512.47 rows=1 width=162) (actual time=1143.969..21811.417 rows=37 loops=1) Join Filter: ("outer".area = "inner".area) -> Seq Scan on _areas a (cost=0.00..2.49 rows=1 width=162) (actual time=0.037..1.673 rows=49 loops=1) Filter: (deactive IS NULL) -> Nested Loop (cost=8.62..25740.20 rows=2681 width=8) (actual time=1.172..429.501 rows=3566 loops=49) -> Nested Loop (cost=8.62..16674.93 rows=2680 width=8) (actual time=1.125..281.570 rows=3566 loops=49) -> Merge Join (cost=8.62..3012.72 rows=2778 width=8) (actual time=0.876..128.908 rows=3566 loops=49) Merge Cond: ("outer".pricegroup = "inner".pricegroup) -> Nested Loop IN Join (cost=8.62..1929.41 rows=9 width=8) (actual time=0.613..5.504 rows=9 loops=49) Join Filter: ("outer".buyer = "inner".store) -> Index Scan using i_pricemembers3 on _pricemembers p (cost=0.00..11.13 rows=217 width=16) (actual time=0.403..1.476 rows=142 loops=49) -> Subquery Scan "IN_subquery" (cost=8.62..8.74 rows=8 width=8) (actual time=0.013..0.019 rows=1 loops=6950) -> Unique (cost=8.62..8.66 rows=8 width=8) (actual time=0.007..0.010 rows=1 loops=6950) -> Sort (cost=8.62..8.64 rows=8 width=8) (actual time=0.003..0.004 rows=1 loops=6950) Sort Key: store -> Append (cost=2.87..8.50 rows=8 width=8) (actual time=8.394..8.446 rows=1 loops=1) -> Subquery Scan "*SELECT* 1" (cost=2.87..5.17 rows=5 width=8) (actual time=8.112..8.112 rows=0 loops=1) -> Hash Join (cost=2.87..5.12 rows=5 width=8) (actual time=8.106..8.106 rows=0 loops=1) Hash Cond: ("outer".company = "inner".company) -> Seq Scan on _stores s (cost=0.00..2.13 rows=13 width=16) (actual time=0.014..0.052 rows=13 loops=1) -> Hash (cost=2.87..2.87 rows=1 width=8) (actual time=7.878..7.878 rows=0 loops=1) -> Seq Scan on _webusers w (cost=0.00..2.87 rows=1 width=8) (actual time=7.868..7.868 rows=0 loops=1) Filter: (webuser = getwebuser()) -> Subquery Scan "*SELECT* 2" (cost=1.08..3.33 rows=3 width=8) (actual time=0.273..0.322 rows=1 loops=1) -> Hash Join (cost=1.08..3.30 rows=3 width=8) (actual time=0.263..0.308 rows=1 loops=1) Hash Cond: ("outer".company = "inner".company) -> Seq Scan on _stores s (cost=0.00..2.13 rows=13 width=16) (actual time=0.008..0.042 rows=13 loops=1) -> Hash (cost=1.07..1.07 rows=1 width=8) (actual time=0.093..0.093 rows=0 loops=1) -> Seq Scan on _companies c (cost=0.00..1.07 rows=1 width=8) (actual time=0.061..0.081 rows=1 loops=1) Filter: ((companyid)::text = 'DEFAULT'::text) -> Index Scan using i_offers4 on _offers o (cost=0.00..1014.76 rows=16298 width=16) (actual time=0.244..72.742 rows=10433 loops=49)
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Tom Lane wrote: Um ... doh ... analyze.c about line 1550: /* We can only compute valid stats if we found some non-null values. */ if (nonnull_cnt > 0) ... There's a bit of an epistemological issue here: if we didn't actually find any nonnull values in our sample, is it legitimate to assume that the column is entirely null? On the other hand, if we find only "3" in our sample we will happily assume the column contains only "3", so I dunno why we are discriminating against null. This seems like a case that just hasn't come up before. Will this discriminatory policy toward null end for 8.0? Mike Mascari ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] VACUUM ANALYZE downgrades performance
On 30 Nov 2004 14:30:37 +0100, Dmitry Karasik <[EMAIL PROTECTED]> wrote: > > Hi all, > > On v7.4.5 I noticed downgrade in the planner, namely favoring > sequential scan over index scan. The proof: > >create table a ( a integer); >create index aidx on a(a); >explain analyze select * from a where a = 0; >-- Index Scan using aidx on a (cost=0.00..17.07 rows=5 width=4) (actual >-- time=0.029..0.029 rows=0 loops=1) >-- Index Cond: (a = 0) >vacuum analyze; >explain analyze select * from a where a = 0; >-- Seq Scan on a (cost=0.00..0.00 rows=1 width=4) (actual time=0.009..0.009 >-- rows=0 loops=1) >-- Filter: (a = 0) Looks to me like the seq scan is a better plan. The "actual time" went down. > > I do realize that there might be reasons why this happens over an empty > table, but what is way worse that when the table starts actually to fill, > the seq scan is still there, and the index is simply not used. How > that could be so ...mmm... shortsighted, and what is more important, > how to avoid this? I hope the answer is not 'run vacuum analyze each 5 > seconds'. > See this thread (http://archives.postgresql.org/pgsql-hackers/2004-11/msg00985.php and http://archives.postgresql.org/pgsql-hackers/2004-11/msg01080.php) for an ongoing discussion of the issue. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Alternatives to Dell?
On Wed, 1 Dec 2004 14:24:12 -0800, Josh Berkus <[EMAIL PROTECTED]> wrote: > Folks, > > A lot of people have been having a devilish time with Dell hardware lately. > It seems like the quality control just isn't there on the Dell servers. Which is a shame, because I *still* drool over a rack full of those front bevels with the bright blue LEDs. :) > > Thing is, some companies are required to use 1st-tier or at least 2nd-tier > vendors for hardware; they won't home-build. For those people, what vendors > do others on this list recommend? What have been your good/bad experiences? I'm using an HP DL585 quad Opteron with 16G RAM as a development box. It's working great. ;) Seriously though, I never really liked HP (or worse, Compaq) hardware before, but this box seems really well built, and I've yet to see a 'D' in the S column in top with the SA-6404/256 RAID card. If all goes as well as it has so far on this testbed I'll be deploying on a Slony-1 clustered set of 3 of these bad boys with 32G RAM each. Dollar-for-dollar, we're saving 90% (that's right, an order of magnitude) going this route, PG with linux-amd64 on HP/Opterons, as opposed to the E20K monster that was originally spec'd out. Mail me direct if you want the full spec list on this beast. And if there is a ready-made benchmark anyone would like me to run, just drop me a note. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Alternatives to Dell?
On Fri, 03 Dec 2004 06:38:50 -0800, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> That's true :) One of the reasons the compaq's are expensive
> is they supposedly use a quad board, even for the dual machine.
> Which means a different opteron chip as well.

I can confirm that. You have a choice of CPUs, but all the DL585s are expandable to 4 procs if you get the 800 series Opterons. Each CPU sits on its own daughter board that links up the HyperTransport busses between all the others. Each CPU card has (I think...) 8 slots for DIMMs, for a max of 64G.

> I don't know this for a fact, it is just what one of their
> "ahem" sales guys told me.

At least in that case they were being accurate. ;)
--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org

---(end of broadcast)---
TIP 8: explain analyze is your friend
[PERFORM] LIMIT causes SEQSCAN in subselect
ndex Cond: (source = 10) Total runtime: 0.101 ms (6 rows) -- query 3: if we were to drop the limit, since we're using a unique index oils4=# EXPLAIN ANALYZE select a.record, b.control from (select * from biblio.record where id = 10) b, (select * from biblio.metarecord_field_entry limit 1000) a where a.source = b.id; QUERY PLAN Nested Loop (cost=0.00..41.97 rows=5 width=22) (actual time=1.169..1.169 rows=0 loops=1) -> Index Scan using biblio_record_pkey on record (cost=0.00..3.67 rows=1 width=22) (actual time=0.036..0.038 rows=1 loops=1) Index Cond: (id = 10) -> Subquery Scan a (cost=0.00..38.25 rows=5 width=16) (actual time=1.126..1.126 rows=0 loops=1) Filter: (source = 10) -> Limit (cost=0.00..25.75 rows=1000 width=87) (actual time=0.005..0.673 rows=1000 loops=1) -> Seq Scan on metarecord_field_entry (cost=0.00..43379.75 rows=1684575 width=87) (actual time=0.004..0.424 rows=1000 loops=1) Total runtime: 1.243 ms (8 rows) -- query 4: what I would like the seqscan in query 3 to become... oils4=# EXPLAIN ANALYZE select * from biblio.metarecord_field_entry where source = 10 limit 1000; QUERY PLAN -- Limit (cost=0.00..16.19 rows=9 width=87) (actual time=0.026..0.035 rows=7 loops=1) -> Index Scan using metarecord_field_entry_source_idx on metarecord_field_entry (cost=0.00..16.19 rows=9 width=87) (actual time=0.025..0.032 rows=7 loops=1) Index Cond: (source = 10) Total runtime: 0.069 ms (4 rows) -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Which is more efficient?
Hi, I have data that I am taking from 2 tables, pulling out specific columns and inserting into one table. Is it more efficient to do: a) insert into x select z from y; insert into x select z from a; b) insert into x select z from y union all select z from a; I have run both through explain. a) 650ms b) 741.57ms According to the planner option a, select z from y takes 545.93 ms Under option b select z from y takes 553.34 ms Shouldn't the time predicted for the select z from y be the same? I would believe b would be more efficient as the inserts could be done in a batch rather than individual transactions but the planner doesn't recognize that. When I run option a through the planner I have to highlight each insert separately since the planner stops executing after the first ; it comes across. Mike ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
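For anyone wanting to reproduce the comparison, a sketch of both approaches using the table names from the post; note that EXPLAIN ANALYZE actually executes the statements, hence the ROLLBACKs:

    BEGIN;
    -- option a: two separate INSERT ... SELECT statements
    EXPLAIN ANALYZE INSERT INTO x SELECT z FROM y;
    EXPLAIN ANALYZE INSERT INTO x SELECT z FROM a;
    ROLLBACK;

    BEGIN;
    -- option b: a single INSERT fed by UNION ALL
    EXPLAIN ANALYZE INSERT INTO x SELECT z FROM y UNION ALL SELECT z FROM a;
    ROLLBACK;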
Re: [PERFORM] LIMIT causes SEQSCAN in subselect
On Fri, 10 Dec 2004 21:40:18 -0800, Josh Berkus <[EMAIL PROTECTED]> wrote: > Mike, > The fact that the estimator knows that the LIMIT is pointless because there > are less rows in the subselect than the LIMIT will return is not something we > want to count on; sometimes the estimator has inaccurate information. The > UNIQUE index makes this more certain, except that I'm not sure that the > planner distinguishes between actual UNIQUE indexes and columns which are > estimated unique (per the pg_stats). And I think you can see in your case > that there's quite a difference between a column we're CERTAIN is unique, > versus a column we THINK is unique. Absolutely. At first I was going to ask if perhaps using the stats to discard the LIMIT would be possible, but since the stats are only guidelines I dropped that. The stats are just so tempting! > > > I realize this is a rather specialized case and not really great form. > > Exactly. You've grasped the main issue: that this has not been optimized > because it's bizarre and not very sensible query writing. Someday we'll get > around to optimizing the really weird queries, but there's still a lot of > work to be done on the common ones (like count(*) ...). Absolutely. And if I can help out with the common cases to gain some Karmic currency I will. ;) After thinking about it some more, I don't think those queries were really all that wacky though. The problem with the example is that the generated query is very simple, and real-world queries that would be used in the subselect would be much more complex, and row estimation would be untrustworthy without a UNIQUE index. > > Keep in mind that the only reason we support LIMIT inside subqueries in the > first place is a workaround to slow aggregates, and a way to do RANK. It's > certainly not SQL-standard. > No it's not, but then nobody ever accused the authors of the SQL spec of being omniscient... I can't think of another way to get, say, a 'top 10' list from a subselect, or use a paging iterator (LIMIT .. OFFSET ..) as the seed for an outer query. Well, other than an SRF of course. > > Just a matter of > > defining result sets independently, and creating a simple wrapper to > > join them. > > Well, if you think so, you know where to submit patches ... > Well, I do, but I was talking about it being 'easy' in the middleware. Just let PG handle optimizing the subselects. For example, you have a pile of predefined SELECTS that don't know they are related and are used for simple lookups. You tell the SQL generator thingy that it should use two of those, queries A and B, that they are related on x, and that you want to see the 'id' from A and the 'value' from B. Instead of having to preplan every possible combination of JOINS the SQL generator will toss the preplanned ones into subselects and join them in the outer query instead of having to rip them apart and calculate the join syntax. And yes, I know that a view will take care of most of that for me... :) Thanks for all your comments. Pretty much what I expected, but I thought I'd raise a use case. I'll just have to give the query builder more smarts. > -- > Josh Berkus > Aglio Database Solutions > San Francisco > -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
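To make the pattern under discussion concrete, here is a minimal sketch of a LIMIT/OFFSET subselect used as the seed for an outer query; the table and column names are invented for illustration and do not come from the thread:

    -- hypothetical "top 10" / paging seed, joined out for detail
    SELECT o.id, o.placed_on, c.name
      FROM (SELECT id, customer_id, placed_on
              FROM orders
             ORDER BY placed_on DESC
             LIMIT 10 OFFSET 0) AS o
      JOIN customers c ON c.id = o.customer_id;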
Re: [PERFORM] Low Performance for big hospital server ..
William Yu wrote: [EMAIL PROTECTED] wrote: Yes , vacuumdb daily. Do you vacuum table by table or the entire DB? I find over time, the system tables can get very bloated and cause a lot of slowdowns just due to schema queries/updates. You might want to try a VACUUM FULL ANALYZE just on the system tables. A REINDEX of the system tables in stand-alone mode might also be in order, even for a 7.4.x database: http://www.postgresql.org/docs/7.4/interactive/sql-reindex.html If a dump-reload-analyze cycle yields significant performance improvements then we know it's due to dead-tuple bloat - either heap tuples or index tuples. Mike Mascari ---(end of broadcast)--- TIP 8: explain analyze is your friend
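A hedged sketch of the suggested maintenance, assuming a 7.4-era server as in the thread; the catalog names below are just the usual suspects, not a list taken from the original mail:

    -- as a superuser, inside the affected database
    VACUUM FULL ANALYZE pg_class;
    VACUUM FULL ANALYZE pg_attribute;
    VACUUM FULL ANALYZE pg_index;

    -- and, from a standalone backend as described on the REINDEX page linked above
    REINDEX TABLE pg_class;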
Re: [PERFORM] Postgres Optimizer is not smart enough?
Litao Wu wrote: Hi All, Here is my test comparison between Postgres (7.3.2) optimizer vs Oracle (10g) optimizer. It seems to me that Postgres optimizer is not smart enough. Did I miss anything? Yeah, 7.4. 7.3.2 is *ancient*. Here's output from 7.4: [EMAIL PROTECTED] explain analyze test-# SELECT module, sum(action_deny) test-# FROM test test-# WHERE created >= ('now'::timestamptz - '1 test'# day'::interval) AND customer_id='100' test-# AND domain='100' test-# GROUP BY module; QUERY PLAN --- HashAggregate (cost=5.69..5.69 rows=1 width=13) (actual time=715.058..715.060 rows=1 loops=1) -> Index Scan using test_id1 on test (cost=0.00..5.68 rows=1 width=13) (actual time=0.688..690.459 rows=1 loops=1) Index Cond: ((customer_id = 100) AND (created >= '2005-01-11 17:52:22.364145-05'::timestamp with time zone) AND (("domain")::text = '100'::text)) Total runtime: 717.546 ms (4 rows) [EMAIL PROTECTED] create index test_id2 on test(domain); CREATE INDEX [EMAIL PROTECTED] analyze test; ANALYZE [EMAIL PROTECTED] [EMAIL PROTECTED] explain analyze test-# SELECT module, sum(action_deny) test-# FROM test test-# WHERE created >= ('now'::timestamptz - '1 test'# day'::interval) AND customer_id='100' test-# AND domain='100' test-# GROUP BY module; QUERY PLAN HashAggregate (cost=5.68..5.69 rows=1 width=13) (actual time=10.778..10.780 rows=1 loops=1) -> Index Scan using test_id2 on test (cost=0.00..5.68 rows=1 width=13) (actual time=10.702..10.721 rows=1 loops=1) Index Cond: (("domain")::text = '100'::text) Filter: ((created >= '2005-01-11 17:53:16.720749-05'::timestamp with time zone) AND (customer_id = 100)) Total runtime: 11.039 ms (5 rows) [EMAIL PROTECTED] select version(); PostgreSQL 7.4.5 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.4.0 20040204 (prerelease) (1 row) Hope that helps, Mike Mascari ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Performance Tuning
On Wed, 9 Feb 2005 17:30:41 -0500, Chris Kratz <[EMAIL PROTECTED]> wrote: > The solutions appear to primarily be: > 1. Going to faster hardware of which probably Opterons would be about the only > choice. And even that probably won't be a huge difference. I'd beg to differ on that last part. The difference between a 3.6GHz Xeon and a 2.8GHz Opteron is a ~150% speed increase on the Opteron on my CPU bound app. This is because the memory bandwidth on the Opteron is ENORMOUS compared to the Xeon. Add to that the fact that you actually get to use more than about 2G of RAM directly and you've got the perfect platform for a high speed database on a budget. > 2. Moving to more materialized views and prepared statements where we can. Definitely worth investigating. I wish I could, but I can't get my customers to even consider slightly out of date stats :( > 3. Continue to tweak the sql behind our app. Short of an Opteron based system, this is by far your best bet. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Benchmark
I have never used Oracle myself, nor have I read its license agreement, but what if you didn't name Oracle directly? ie:

TPS  Database
---
112  MySQL
120  PgSQL
90   Sybase
95   "Other database that *may* start with a letter after N"
50   "Other database that *may* start with a letter after L"

As far as I know there are only a couple of databases that don't allow you to post benchmarks, but if they remain "unnamed" can legal action be taken? Just like all those commercials on TV where they advertise: "Cleans 10x better than the other leading brand". On Fri, 2005-02-11 at 00:22 -0500, Mitch Pirtle wrote: > On Thu, 10 Feb 2005 08:21:09 -0500, Jeff <[EMAIL PROTECTED]> wrote: > > > > If you plan on making your results public be very careful with the > > license agreements on the other db's. I know Oracle forbids the > > release of benchmark numbers without their approval. > > ...as all of the other commercial databases do. This may be off-topic, > but has anyone actually suffered any consequences of a published > benchmark without permission? > > For example, I am a developer of Mambo, a PHP-based CMS application, > and am porting the mysql functions to ADOdb so I can use grown-up > databases ;-) > > What is keeping me from running a copy of Mambo on a donated server > for testing and performance measures (including the commercial > databases) and then publishing the results based on Mambo's > performance on each? > > It would be really useful to know if anyone has ever been punished for > doing this, as IANAL but that restriction is going to be very, VERY > difficult to back up in court without precedent. Is this just a > deterrent, or is it real? > > -- Mitch > > ---(end of broadcast)--- > TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Mike Benoit <[EMAIL PROTECTED]> signature.asc Description: This is a digitally signed message part
Re: [PERFORM] queries on huge tables
The most recent version of this thread starts here: http://archives.postgresql.org/pgsql-general/2005-03/msg00321.php . Search the archives for "table partition", "union view" and "partition inherits" and you should find the most relevant discussions. Hope that helps! On Thu, 17 Mar 2005 15:01:43 +0100, Lending, Rune <[EMAIL PROTECTED]> wrote: > > Hello all. > > I have a couple of tables with a couple of hundred million records in > them. The tables contain a timestamp column. > I am almost always interested in getting data from a specific day or month. > Each day contains approx. 400.000 entries. > > When I do such queries as " select ... from archive where m_date between > '2005-01-01' and '2005-02-01' group by ... " and so on, > it takes very long. I have indexes that kick in, but still it takes > some time. > > I have split the archive table into smaller monthly tables; it then goes a > lot faster, but not fast enough. > > I know similar systems that use Oracle and gain a lot of performance > because of the partitioning. That kind of annoys me a bit :) > > Do any of you have some good ideas on how to speed up such queries on > huge tables? > > regards > rune > > > > -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
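For readers who don't want to dig through the archives, a minimal sketch of the inheritance-style monthly partitioning those threads describe, reusing the table and column names from the question (the payload column is a stand-in for the real ones; on servers of that era the planner did not prune children automatically, so the monthly tables were typically queried directly or through a UNION ALL view):

    -- parent table holds no rows itself
    CREATE TABLE archive (m_date timestamp NOT NULL, payload text);

    -- one child per month, with a CHECK constraint describing its slice
    CREATE TABLE archive_2005_01 (
        CHECK (m_date >= '2005-01-01' AND m_date < '2005-02-01')
    ) INHERITS (archive);
    CREATE INDEX archive_2005_01_m_date_idx ON archive_2005_01 (m_date);

    -- querying the parent sees every child; naming archive_2005_01 directly
    -- touches only that month's rows
    SELECT count(*) FROM archive_2005_01
     WHERE m_date >= '2005-01-10' AND m_date < '2005-01-11';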
Re: [PERFORM] Tsearch2 performance on big database
On Tue, 22 Mar 2005 15:36:11 +0300 (MSK), Oleg Bartunov wrote: > On Tue, 22 Mar 2005, Rick Jansen wrote: > > > Hi, > > > > I'm looking for a *fast* solution to search thru ~ 4 million records of book > > descriptions. I've installed PostgreSQL 8.0.1 on a dual opteron server with > > 8G of memory, running Linux 2.6. I haven't done a lot of tuning on > > PostgreSQL > > itself, but here's the settings I have changed so far: > > > > shared_buffers = 2000 (anything much bigger says the kernel doesnt allow > > it, > > still have to look into that) > > use something like > echo "15000" > /proc/sys/kernel/shmmax > to increase shared memory. In your case you could dedicate much more > memory. > > Regards, > Oleg And Oleg should know. Unless I'm mistaken, he (co)wrote tsearch2. Other than shared buffers, I can't imagine what could be causing that kind of slowness. EXPLAIN ANALYZE, please? As an example of what I think you *should* be seeing, I have a similar box (4 procs, but that doesn't matter for one query) and I can search a column with tens of millions of rows in around a second. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Tsearch2 performance on big database
On Tue, 22 Mar 2005 14:25:19 +0100, Rick Jansen <[EMAIL PROTECTED]> wrote: > > ilab=# explain analyze select count(titel) from books where idxfti @@ > to_tsquery('default', 'buckingham | palace'); > QUERY PLAN > > Aggregate (cost=35547.99..35547.99 rows=1 width=56) (actual > time=125968.119..125968.120 rows=1 loops=1) > -> Index Scan using idxfti_idx on books (cost=0.00..35525.81 > rows=8869 width=56) (actual time=0.394..125958.245 rows=3080 loops=1) > Index Cond: (idxfti @@ '\'buckingham\' | \'palac\''::tsquery) > Total runtime: 125968.212 ms > (4 rows) > > Time: 125969.264 ms > ilab=# Ahh... I should have qualified my claim. I am creating a google-esque search interface and almost every query uses '&' as the term joiner. 'AND' queries and one-term queries are orders of magnitude faster than 'OR' queries, and fortunately are the expected default for most users. (Think, "I typed in these words, therefore I want to match these words"...) An interesting test may be to time multiple queries independently, one for each search term, and see if the combined cost is less than a single 'OR' search. If so, you could use UNION to join the results. However, the example you originally gave ('terminology') should be very fast. On a comparable query ("select count(value) from metabib.full_rec where index_vector @@ to_tsquery('default','jane');") I get 12ms. Oleg, do you see anything else on the surface here? Try: EXPLAIN ANALYZE SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'buckingham') UNION SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'palace'); and see if using '&' instead of '|' where you can helps out. I imagine you'd be surprised by the speed of: SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'buckingham&palace'); > > > As an example of what I think you *should* be seeing, I have a similar > > box (4 procs, but that doesn't matter for one query) and I can search > > a column with tens of millions of rows in around a second. > > > That sounds very promising, I'd love to get those results.. could you > tell me what your settings are, how much memory you have and such? 16G of RAM on a dedicated machine.
shared_buffers = 15000 # min 16, at least max_connections*2, 8KB each
work_mem = 10240 # min 64, size in KB
maintenance_work_mem = 100 # min 1024, size in KB # big m_w_m for loading data...
random_page_cost = 2.5 # units are one sequential page fetch cost # fast drives, and tons of RAM
-- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] 8x2.5" or 6x3.5" disks
You don't mention the capacity of the disks you are looking at. Here is something you might want to consider. I've seen a few performance posts on using different hardware technologies to gain improvements. Most of those comments are on RAID, interface and rotation speed. One area that doesn't seem to have been mentioned is to run your disks empty. One of the key roadblocks in disk performance is the time for the disk heads to seek, settle and find the start of the data. Another is the time to transfer from disk to interface. Everyone may instinctively know this but it's often ignored. Hard disks are CRV (constant rotational velocity) = they spin at the same speed all the time. Hard disk drives use a technology called ZBR = Zone Bit Recording = a lot more data on the outside tracks than the inner ones. Hard disks fill up from outside track to inside track generally, unless you've done some weird partitioning. On the outside of the disk you get a lot more data per seek than on the inside. Double whammy: you get it faster. Performance can vary more than 100% between the outer and inner tracks of the disk. So running a slower disk twice as big may give you more benefit than running a small capacity 15K disk full. The slower disks are also generally more reliable and mostly much cheaper. The other issue for full disks, especially with lots of random small transactions, is that the heads are seeking and settling across the whole disk, but typically with most of those seeks being on the latest transactions, which are placed nicely towards the middle of the disk. I know of a major bank that has a rule of thumb of 25% of the disk partitioned as a target maximum for high performance disk systems in a key application. They also only pay for used capacity from their disk vendor. This is not very green as you need to buy more disks for the same amount of data, and it's liable to upset your purchasing department, who won't understand why you don't want to fill your disks up. Mike
Re: [PERFORM] 8x2.5" or 6x3.5" disks
[presumably the empty-disk effect could also be achieved by partitioning, say 25% of the drive for the database, and 75% empty partition. But in fact, you could use that "low performance 75%" for rarely-used or static data, such as the output from pg_dump, that is written during non-peak times] Larry Ellison financed a company called Pillar Data Systems which was founded on the principle that you can tier the disk according to the value and performance requirements of the data. They planned to put the most valuable data, in performance terms, on the outside of SATA disks and use the empty space in the middle for slower stuff. (This is not an advert. I like the idea but I don't know if it works well, and I don't have anything to do with Pillar other than EnterpriseDB competes against Larry's other little company). Probably the way to go is flash drives for primary performance data. EMC and others have announced Enterprise Flash Drives (they claim 30 times the performance of 15K disks, although at 30 times the cost of standard disk today). Flash should also have pretty much consistent high performance across the whole capacity. Within a couple of years EFD should be affordable for mainstream use.
[PERFORM] Implications of having large number of users
Hi there, Please help me to make a decision on how to manage users. For some reason it is easier in the project I'm working on to split data by schemas and assign them to Postgres users (I mean those created with CREATE USER) rather than support 'owner' fields referring to a global users table. The question is what could be the consequences of having a large number of them (tens of thousands)? Context: - it is a web app - thousands of concurrent requests from different users - amount of user's data in the db is relatively small Concerns: - how big is the performance/memory penalty on switching users in the same connection (connections are reused of course)? - will it hurt the cache? - are prepared statements kept per user or per connection? - is the query planner global or somehow tied to users? I'd be glad to hear any opinions/suggestions. Best regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
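To picture the layout being described, a minimal sketch; the user and schema names are invented, and SET SESSION AUTHORIZATION needs a superuser connection:

    CREATE USER app_user_1 PASSWORD 'secret';
    CREATE SCHEMA AUTHORIZATION app_user_1;   -- schema named after, and owned by, the user
    -- with the default search_path of "$user", public, each user's unqualified
    -- table names resolve to their own schema
    SET SESSION AUTHORIZATION app_user_1;     -- what a reused connection would do per request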
Re: [PERFORM] Implications of having large number of users
I'd be glad to hear any opinions/suggestions. Many thanks to everyone who responded! Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Hi Sean, Sean Ma wrote: One month ago, this type of slow query happened about a few times per day. But recently, the slow query happens more frequently, at the rate of once every 10 minutes or less. There seems no relation to th What is your hardware (memory, CPU type and such)? This seems like a cache issue to me, but I can't tell for sure without some additional information on your system: 1) What is the amount of a) available memory b) free memory and c) memory available to i/o buffers? 2) What is the swap usage if any? 3) What is the CPU load? Any noticeable patterns in CPU load? You can use /usr/bin/top to obtain most of this information. Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Hi Sean, Well, the overall impression is your machine is badly overloaded. Look: top - 10:18:58 up 224 days, 15:10, 2 users, load average: 6.27, 7.33, 6 The load average of 6.5 means there are six and a half processes competing for the same CPU (and this system apparently has only one). That is approximately a 500% overload. Recommendation: either add more CPUs or eliminate process competition by moving them to other boxes. Tasks: 239 total, 1 running, 238 sleeping, 0 stopped, 0 zombie This supports what I said above. There are only 92 processes running on my laptop and I think that is too much. Do you have Apache running on the same machine? Cpu(s): 5.0%us, 0.7%sy, 0.0%ni, 61.5%id, 32.7%wa, 0.0%hi, 0.1%si, 0 Waiting time (wa) is rather high, which means processes wait on locks or for IO, another clue for concurrency issues on this machine. Mem: 32962804k total, 32802612k used, 160192k free, 325360k buffers Buffers are about 10% of all the memory which is OK, but I tend to give buffers some more room. Recommendation: eliminate unneeded processes, decrease (yes, decrease) the Postgres cache buffers if they are set too high. Swap: 8193140k total, 224916k used, 7968224k free, 30829456k cached 200M paged out. It should be zero except in an emergency. 3G of cached swap is a sign of some crazy paging activity in the past. Those unexplainable slowdowns are very likely caused by that. Didn't really see the pattern, typically the cpu load is only about 40% 40% is too much, really. I start worrying when it is above 10%. Conclusion: - the system bears more load than it can handle - the machine needs an upgrade - Postgres is competing with something (presumably Apache) - separate them. That should help. Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Sean, Yes, besides another mysql server running on the same server, Which is a really bad idea :-) The postgres shared cache is at 4G, is that too big? OK, I misread the total memory amount, which was 32G, and I thought it was 3G. Thanks to Scott Marlowe who pointed that out. In this case 4G for shared buffers is good. Actually, I take back my words on swap, too. 200M swapped is less important when you have plenty of memory. Regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Marlowe wrote: The postgres shared cache is at 4G, is that too big? Not for a machine with 32Gig of ram. He could even add some more. Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Hi Scott, Well, we can't be sure OP's only got one core. In fact, we can, Sean posted what top -b -n 1 says. There was only one CPU line. the number of cores, it's the IO subsystem is too slow for the load. More cores wouldn't fix that. While I agree on the IO, more cores would definitely help to improve ~6.5 load average. My production PG server that runs ONLY pg has 222 processes on it. It's no big deal. Unless they're all trying to get cpu time, which generally isn't the case. 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239. More likely just a slow IO subsystem. Like a single drive or something. adding drives in a RAID-1 or RAID-10 etc usually helps. Absolutely. This is kernel buffers, not pg buffers. It's set by the OS semi-automagically. In this case it's 325M out of 32 Gig, so it's well under 10%, which is typical. You can control the FS buffers indirectly by not allowing running processes to take too much memory. If you have like 40% free, there are good chances the system will use that memory for buffers. If you let them eat up 90% and swap out some more, there is no room for buffers and the system will have to swap out something when it really needs it. Not true. Linux will happily swap out seldom used processes to make room in memory for more kernel cache etc. You can adjust this tendency by setting swappiness. This is fine until one of those processes wakes up. Then your FS cache is dumped. It's 30G btw, Yeah, I couldn't believe my eyes :-) > 3G of cached swap and it's not swap that's cached, it's the kernel using extra memory to cache data to / from the hard drives. Oh please.. it *is*: http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314 It's normal, and shouldn't worry anybody. In fact it's a good sign that you're not using way too much memory for any one process. It says exactly the opposite. Really? I have eight cores on my production servers and many batch jobs I run put all 8 cores at 90% for extended periods. Since that machine is normally doing a lot of smaller cached queries, it hardly even notices. The OP's machine is doing a lot of write ops, which is different. Yes, more hard drives / better caching RAID controller. +1 BTW, nearly full file system can be another source of problems. Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Marlowe wrote: Also think about it, the OP has 8G of swap and 30Gig cached. How / why would you be caching 30Gigs worth of data when there's only 8G to cache anyway? You're right, I have misread it again :-) Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Marlowe wrote: Close, but it'll use that memory for cache. Large buffers are not typical in linux, large kernel caches are. OK, we're talking about different things. You're right. If that tutorial says that, then that tutorial is wrong. I'm guessing what that tutorial is talking about, and what top is saying are two very different things though. Then it is an amazingly common misconception. I guess it first appeared in some book and was then reproduced by a zillion blogs. Essentially this is what Google brings you for a 'swap cache' query. Thanks for clearing that up. It's normal, and shouldn't worry anybody. In fact it's a good sign that you're not using way too much memory for any one process It says exactly the opposite. This time I agree :-) Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Carey wrote: 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239 That's not ridiculous at all. Modern OS's handle thousands of idle processes just fine. I meant that 27 was a ridiculously small number. Or you can control the behavior with the following kernel params: vm.swappiness vm.dirty_ratio vm.dirty_background_ratio Thanks for pointing that out! Actually, no. When a process wakes up only the pages that are needed are accessed. For most idle processes that wake up from time to time, a small bit of work is done, then they go back to sleep. This initial allocation does NOT come from the page cache, but from the "buffers" line in top. The os tries to keep some amount of free buffers not allocated to processes or pages available, so that allocation demands can be met without having to synchronously decide which buffers from page cache to eject. Wait a second, I'm trying to understand that :-) Did you mean that FS cache pages are first allocated from the buffer pages or that process memory being paged out to swap is first written to buffers? Could you clarify please? If queries are intermittently causing problems, it might be due to checkpoints. Make sure that the kernel parameters for dirty_background_ratio is 5 or less, and dirty_ratio is 10 or less. Scott, isn't dirty_ratio supposed to be less than dirty_background_ratio? I've heard that the system would automatically set dirty_ratio = dirty_background_ratio / 2 if that's not the case. Also, how could dirty_ratio be less than 5 if 5 is the minimal value? Regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] random slow query
Scott Carey wrote: the OS can either quickly allocate to the process or the page cache from the free buffers, or more slowly take from the page cache, or even more slowly page out a process page. Aha, now it all makes sense. I like to use the '5 second rule'. dirty_background_ratio should be sized so that it takes about 5 seconds to flush to disk in optimal conditions. dirty_ratio should be 2x to 5x this depending on your application's needs -- for a system with well tuned postgres checkpoints, smaller tends to be better to limit stalls while waiting for the checkpoint fsync to finish. Thanks a lot, this is invaluable information. Regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] - Slow Query
Hi Rui, I have this query (I think it is a simple one) Could you EXPLAIN ANALYZE the query and show the results please? Thanks, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] - Slow Query
> Merge Join (cost=111885.70..319492.88 rows=13016048 width=620) The outermost merge join has to go through 13 million rows. If you remove "distinct on (bien.uid)", you'll see that. > LEFT outer JOIN ville ON ville.uid = bien.ref_ville > LEFT outer JOIN freguesia_ville ON freguesia_ville.ref_ville =ville.uid This is not enough. You have to add this condition as well: AND bien.ref_ville = freguesia_ville.ref_ville In other words, when you link three tables by a common field, all three relationships should be explicitly expressed, otherwise you'll have this type of explosive row multiplication. Although I don't quite understand the purpose of the query, I don't think you need all those OUTER joins. Regards, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
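A cut-down sketch of the fix being described, using only the three tables involved (the real query selects many more columns); the final AND line is the condition being added:

    SELECT bien.uid
      FROM bien
      LEFT OUTER JOIN ville
             ON ville.uid = bien.ref_ville
      LEFT OUTER JOIN freguesia_ville
             ON freguesia_ville.ref_ville = ville.uid
            AND freguesia_ville.ref_ville = bien.ref_ville;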
Re: [PERFORM] - Slow Query
Rui Carvalho wrote: hum, thanks a lot for the quick answer. If it is not an abuse of your patience, what is the best alternative to the LEFT OUTER JOINs? I meant I wasn't sure whether you really meant *outer* joins. Too many of them looked kinda suspicious :-) If you *do* need them, then there is no alternative, as Scott said. Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1
ning wrote: The log is really long, Which usually signals a problem with the query. but I compared the result of "explain analyze" for first and later executions, except for 3 "time=XXX" numbers, they are identical. They are supposed to be identical unless something is really badly broken. I agree with you that PostgreSQL is doing a different level of caching, I just wonder if there is any way to speed up PostgreSQL in this scenario, This is what EXPLAIN ANALYZE is for. Could you post the results please? Cheers, Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Repeated Query is much slower in PostgreSQL8.2.4 than DB2 9.1
Interesting. It's quite a hairy plan even though all the branches are cut off by conditions ("never executed") so the query yields 0 rows. 0.018 is not a bad timing for that. However, if you run this query with different parameters, the result could be quite sad. There are some deeply nested loops with joins filtered by inner seq scans; this can be extremely expensive. Also, note that Left Merge Join with 16243 rows being reduced into just 1. With a database like DB2, the results you had are quite predictable: slow first time execution (because of the ineffective query) and then fast consequent responses because the tiny resultset produced by the query can be stored in the memory. Now, with Postgres the picture is different: all this complex stuff has to be executed each time the query is sent. I would rather rewrite the query without inner selects, using straight joins instead. Also, try to filter things before joining, not after. Correct me if I'm wrong, but in this particular case this seems pretty much possible. Cheers, Mike ning wrote: Hi Mike, Thank you for your explanation. The "explain analyze" command used is as follows, several integers are bound to '?'. - SELECT oid_,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval FROM (SELECT attributeOf,void,nameId,tag,intval,lowerbound,upperbound,crossfeeddir,feeddir,units,opqval,bigval,strval FROM DenormAttributePerf WHERE attributeof IN (SELECT oid_ FROM JobView WHERE JobView.JobId=? and JobView.assignedTo_=?) AND nameId in (?)) x RIGHT OUTER JOIN (SELECT oid_ FROM JobView WHERE JobView.JobId=? and JobView.assignedTo_=?) y ON attributeof = oid_ FOR READ ONLY - The result of the command is - QUERY PLAN --- Nested Loop Left Join (cost=575.60..1273.15 rows=81 width=568) (actual time=0.018..0.018 rows=0 loops=1) Join Filter: (x.attributeof = j1.oid_) -> Index Scan using job_tc1 on job j1 (cost=0.00..8.27 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=1) Index Cond: ((assignedto_ = 888) AND (jobid = 0)) -> Merge Left Join (cost=575.60..899.41 rows=16243 width=564) (never executed) Merge Cond: (v.void = b.void) -> Merge Left Join (cost=470.77..504.87 rows=2152 width=556) (never executed) Merge Cond: (v.void = res.void) -> Sort (cost=373.61..374.39 rows=310 width=544) (never executed) Sort Key: v.void -> Hash Left Join (cost=112.07..360.78 rows=310 width=544) (never executed) Hash Cond: (v.void = i.void) -> Hash Left Join (cost=65.40..303.17 rows=38 width=540) (never executed) Hash Cond: (v.void = r.void) -> Hash Left Join (cost=21.42..257.86 rows=5 width=532) (never executed) Hash Cond: (v.void = s.void) -> Nested Loop Left Join (cost=8.27..244.65 rows=5 width=16) (never executed) Join Filter: (v.containedin = a.id) -> Nested Loop (cost=8.27..16.57 rows=1 width=12) (never executed) -> HashAggregate (cost=8.27..8.28 rows=1 width=4) (never executed) -> Index Scan using job_tc1 on job j1 (cost=0.00..8.27 rows=1 width=4) (never executed) Index Cond: ((assignedto_ = 888) AND (jobid = 0)) -> Index Scan using attribute_tc1 on attribute a (cost=0.00..8.27 rows=1 width=12) (never executed) Index Cond: ((a.attributeof = j1.oid_) AND (a.nameid = 6)) -> Append (cost=0.00..137.60 rows=7239 width=12) (never executed) -> Index Scan using attribute_value_i on attribute_value v (cost=0.00..5.30 rows=9 width=12) (never executed) Index Cond: (v.containedin = a.id) -> Seq Scan on string_value v (cost=0.00..11.40 rows=140 width=12) (never executed) -> Seq Scan on integer_value v (cost=0.00..26.30 rows=1630 
width=12) (never executed) -> Seq Scan on bigint_value v (cost=0.00..25.10 rows=1510 width=12) (never executed)
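To illustrate the suggestion, a sketch of the same query written with straight joins instead of inner selects; the column list is trimmed and the bound parameters are kept as in the original, so treat this as a starting point rather than a drop-in replacement:

    SELECT j.oid_, a.void, a.nameid, a.tag, a.intval
      FROM JobView j
      LEFT OUTER JOIN DenormAttributePerf a
             ON a.attributeof = j.oid_
            AND a.nameid IN (?)
     WHERE j.jobid = ? AND j.assignedto_ = ?;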
Re: [PERFORM] select query performance question
Hi Thomas, How is 'messungen_v_dat_2009_04_13_gtyp_minute_tag_idx' defined? What is the row count for the table? Mike Hi, subject is the following type of query needed in a function to select data: SELECT ' 13.04.2009 12:00:00 ' AS zeit, 'M' AS ganglinientyp, m.zs_nr AS zs, j_ges, de_mw_abh_j_lkw(mw_abh) AS j_lkw, de_mw_abh_v_pkw(mw_abh) AS v_pkw, de_mw_abh_v_lkw(mw_abh) AS v_lkw, de_mw_abh_p_bel(mw_abh) AS p_bel FROM messungen_v_dat_2009_04_13 m INNER JOIN de_mw w ON w.nr = m.mw_nr WHERE m.ganglinientyp = 'M' AND ' 890 ' = m.minute_tag; explain analyse brings up Nested Loop (cost=0.00..66344.47 rows=4750 width=10) (actual time=134.160..19574.228 rows=4148 loops=1) -> Index Scan using messungen_v_dat_2009_04_13_gtyp_minute_tag_idx on messungen_v_dat_2009_04_13 m (cost=0.00..10749.14 rows=4750 width=8) (actual time=64.681..284.732 rows=4148 loops=1) Index Cond: ((ganglinientyp = 'M'::bpchar) AND (891::smallint = minute_tag)) -> Index Scan using de_nw_nr_idx on de_mw w (cost=0.00..10.69 rows=1 width=10) (actual time=4.545..4.549 rows=1 loops=4148) Index Cond: (w.nr = m.mw_nr) Total runtime: 19590.078 ms Seems quite slow to me. Is this query plan near to optimal or are there any serious flaws? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
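In case it helps gather the requested details, two quick catalog queries (a sketch; the names are taken from the plan above):

    -- index definition
    SELECT indexdef FROM pg_indexes
     WHERE indexname = 'messungen_v_dat_2009_04_13_gtyp_minute_tag_idx';

    -- approximate row count without scanning the table
    SELECT reltuples::bigint FROM pg_class
     WHERE relname = 'messungen_v_dat_2009_04_13';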
Re: [PERFORM] Number of tables
Fabio La Farcioli wrote: Hi to all, I am developing a web app for thousands of users (1,000/2,000). Each user has 2 tables of work... so I finally have 2,000 (users) x 2 tables = 4,000 tables! As someone with a ~50K-table database, I can tell you it's definitely possible to survive with such a layout :-) However, expect very slow (hours) pg_dump, \dt and everything else that requires reading schema information for the whole db. Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Number of tables
Greg Stark wrote: You may not expect to need to run queries which combine multiple users' data now but you will eventually. We store cross-user data in a separate schema, which solves all *our* problems. This doesn't work so great when each user is going to be specifying their own custom schema on the fly This works fine, at least we didn't encounter any issues with that. but that's not really what relational databases were designed for. Sometimes you have to do... you know, unusual things to meet some specific requirements, like independent user schemas. It's not a conventional web app we run :-) I'm not denying it's a bit of an extreme approach, but if one is forced to go down this path, it's quite walkable ;-) Mike -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What's faster?
> Because Postgres requires VACUUM ANALYZE more frequently on updated tables, > should I break this single field out into its own table, and if so what kind > of a speed up can I expect to achieve. I would be appreciative of any > guidance offered. Unless that field is part of the key, I wouldn't think that a vacuum analyze would be needed, as the key distribution isn't changing. I don't know if that is still true if that field is indexed. Tom? Even then, as I understand things vacuum analyze doesn't rebuild indexes, so I could see a need to drop and rebuild indexes on a regular basis, even if you move that field into a separate table. -- Mike Nolan ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] Very slow update + not using clustered index
)::text -> Seq Scan on inventory i (cost=0.00..16173.92 rows=882192 width=14) (actual time=0.088..1942.173 rows=882192 loops=1) Total runtime: 174926.115 ms So, my first question is: why is the planner still sorting on price when isbn seems (considerably) quicker, and how can I force it to sort by isbn (if I even should)? The second question is: why, oh why does the update take such an obscenely long time to complete? The 175s (and even 216s) for the select seems reasonable given the size of the tables, but not 3000s to update the same rows. The processor (AMD 1.3GHz) is at 90%+ utilization for most of the execution time. I can post more information if it would be helpful, but this post is long enough already. TIA, and happy new year. -mike -- Mike Glover Key ID BFD19F2C <[EMAIL PROTECTED]> pgp0.pgp Description: PGP signature
Re: [PERFORM] Very slow update + not using clustered index
Tom- Thanks for the quick response. More details are inline. -mike On Thu, 01 Jan 2004 23:06:11 -0500 Tom Lane <[EMAIL PROTECTED]> wrote: > Mike Glover <[EMAIL PROTECTED]> writes: > AFAICS these plans are identical, and therefore the difference in > runtime must be ascribed to the time spent actually doing the updates. > It seems unlikely that the raw row inserts and updating the single > index could be quite that slow --- perhaps you have a foreign key > or trigger performance problem? There are no foreign keys or triggers for either of the tables. > Is this PG 7.4? Yes, PG 7.4 > > A quick experiment shows that if the planner does not have any reason > to prefer one ordering over another, the current coding will put the > last WHERE clause first: [snip]> > and so you could probably improve matters just by switching the order > of your WHERE clauses. Of course this answer will break as soon as > anyone touches any part of the related code, so I'd like to try to fix > it so that there is actually a principled choice made. Could you send > along the pg_stats rows for these columns? > It looks like the planner is already making a principled choice: bookshelf=> explain select s.* from summary s, inventory i where s.isbn = i.isbn and s.price_min = i.price; QUERY PLAN --- Merge Join (cost=491180.66..512965.72 rows=9237 width=58) Merge Cond: (("outer".price_min = "inner".price) AND ("outer"."?column8?" = "inner"."?column3?")) -> Sort (cost=361887.05..367000.05 rows=2045201 width=58) Sort Key: s.price_min, (s.isbn)::text -> Seq Scan on summary s (cost=0.00..44651.01 rows=2045201 width=58) -> Sort (cost=129293.61..131499.09 rows=882192 width=25) Sort Key: i.price, (i.isbn)::text -> Seq Scan on inventory i (cost=0.00..16173.92 rows=882192 width=25) (8 rows) bookshelf=> explain select s.* from summary s, inventory i where s.price_min = i.price and s.isbn = i.isbn; QUERY PLAN --- Merge Join (cost=491180.66..512965.72 rows=9237 width=58) Merge Cond: (("outer".price_min = "inner".price) AND ("outer"."?column8?" ="inner"."?column3?")) -> Sort (cost=361887.05..367000.05 rows=2045201 width=58) Sort Key: s.price_min, (s.isbn)::text -> Seq Scan on summary s (cost=0.00..44651.01 rows=2045201 width=58) -> Sort(cost=129293.61..131499.09 rows=882192 width=25) Sort Key: i.price, (i.isbn)::text -> Seq Scan on inventory i (cost=0.00..16173.92 rows=882192 width=25) (8 rows) Here are the pg_stats rows: bookshelf=> select * from pg_stats where schemaname='de' and tablename='inventory' and attname='isbn'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation +---+-+---+---+ +--+---+--- --- --+- de | inventory | isbn| 0 |14 | -1 | | | {0002551543,0198268211,0375507299,0486231305,0673395197,0767901576,0810 304430,0865738890,0931595029,1574160052,9971504014} | 1(1 row) bookshelf=> select * from pg_stats where schemaname='de' and tablename='inventory' and attname='price'; schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation +---+-+---+---+ +--+--- --- ---+--- --+- de | inventory | price | 0 |11 | 1628 | {59.95,0.00,54.88,53.30,60.50,64.25,73.63,49.39,50.02,53.37} | {0.259667,0.0063,0.0053,0.0047,0.0047,0.0047,0.0046 6667,0.0043,0.004,0.004
Re: [PERFORM] Slow query problem
On Thu, 08 Jan 2004 16:52:05 +1100 Bradley Tate <[EMAIL PROTECTED]> wrote: > Am I correct in interpreting that most time was spent doing the > sorting? Looks so. Your table is about 70MB total size, and it's getting loaded completely into memory (you have 12000 * 8k = 96M available). 26s to load 70MB from disk seems reasonable. The rest of the time is used for sorting. > Explain confuses the heck out of me and any help on how I could make > this run faster would be gratefully received. > You should bump sort_mem as high as you can stand. With only 8MB of sort memory available, you're swapping intermediate sort pages to disk -- a lot. Try the query with sort_mem set to 75MB (to do the entire sort in memory). -mike > Cheers, > > Bradley. > > > ---(end of > broadcast)--- TIP 8: explain analyze is your > friend -- Mike Glover Key ID BFD19F2C <[EMAIL PROTECTED]> pgp0.pgp Description: PGP signature
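For anyone trying this from the archive, the setting can be raised for a single session instead of editing postgresql.conf; a sketch, assuming a 7.x/8.0-era server where the parameter is still called sort_mem (value in kilobytes):

    SET sort_mem = 75000;   -- roughly 75MB, for this session only
    -- ... re-run the query, or its EXPLAIN ANALYZE, here ...
    RESET sort_mem;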
Re: [PERFORM] [pgsql-advocacy] MySQL+InnoDB vs. PostgreSQL test?
> Seriously, I am tired of this kind of question. You gotta get bold > enough to stand up in a "meeting" like that, say "guys, you can ask me > how this compares to Oracle ... but if you're seriously asking me how > this compares to MySQL, call me again when you've done your homework". Can they call you at the unemployment office? -- Mike Nolan ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] RAID or manual split?
It seems that if I know the type and frequency of the queries a database will be seeing, I could split the database by hand over multiple disks and get better performance than I would with a RAID array with similar hardware. Most of the data is volatile and easily replaceable (and the rest is backed up independently), so redundancy isn't important, and I'm willing to do some ongoing maintenance if I can get a decent speed boost. Am I misguided, or might this work? Details of my setup are below: Six large (3-7 Mrow) 'summary' tables, each being updated continuously by 5-20 processes with about 0.5 transactions/second/process. Periodically (currently every two weeks), join queries are performed between one of the 'summary' tables (same one each time) and each of the other five. Each join touches most rows of both tables; indexes aren't used. Results are written into a separate group of 'inventory' tables (about 500 Krow each), one for each join. There are frequent (100-1000/day) queries of both the inventory and summary tables using the primary key -- always using the index and returning < 10 rows. We're currently getting (barely) acceptable performance from a single 15k U160 SCSI disk, but db size and activity are growing quickly. I've got more disks and a battery-backed LSI card on order. -mike -- Mike Glover GPG Key ID BFD19F2C <[EMAIL PROTECTED]> pgp0.pgp Description: PGP signature
Re: [PERFORM] Fixed width rows faster?
> Frankly, the only reason to use anything other than TEXT is compatibility with > other databases and applications. You don't consider a requirement that a field be no longer than a certain length a reason not to use TEXT? -- Mike Nolan ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Fixed width rows faster?
> >>You don't consider a requirement that a field be no longer than a > >>certain length a reason not to use TEXT? > > Can't you just create a TEXT(255) field same as you can just create > VARCHAR (with no length) field? I think they're basically synonyms for > each other these days. I'll defer to the SQL standard gurus on this, as well as to the internals guys, but I suspect there is a difference between the standard itself and implementor details, such as how char, varchar, varchar2 and text are implemented. As long as things work as specified, I don't think the standard cares much about what's happening behind the curtain. -- Mike Nolan ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Fixed width rows faster?
> Actually, I don't. Good reason to have a check constraint on it though > (hint, check constraints can be changed while column types cannot be, at > this moment). Is there a way to copy a table INCLUDING the check constraints? If not, then that information is lost, unlike varchar(n). -- Mike Nolan ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
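A small sketch of the kind of length check being discussed (table and column names are hypothetical); the point is that the limit lives in a constraint that can be swapped out, rather than in the column's type:

    ALTER TABLE customers
      ADD CONSTRAINT customers_name_len CHECK (char_length(name) <= 255);

    -- raising the limit later means replacing the constraint, not altering the column type
    ALTER TABLE customers DROP CONSTRAINT customers_name_len;
    ALTER TABLE customers
      ADD CONSTRAINT customers_name_len CHECK (char_length(name) <= 500);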