[PERFORM] Autocommit

2005-02-14 Thread Hasnul Fadhly bin Hasan
Hi,
I am just wondering: by default, autocommit is enabled for every client
connection.  The documentation states that we should use BEGIN and END (or
COMMIT) to increase performance by not using autocommit.  My question is,
when we use the BEGIN and END statements, is autocommit disabled
automatically, or do we have to disable it manually?

Hasnul



Re: [PERFORM] Autocommit

2005-02-14 Thread Michael Fuhr
On Mon, Feb 14, 2005 at 04:01:20PM +0800, Hasnul Fadhly bin Hasan wrote:
> 
> I am just wondering: by default, autocommit is enabled for every client
> connection.  The documentation states that we should use BEGIN and END
> (or COMMIT) to increase performance by not using autocommit.  My question
> is, when we use the BEGIN and END statements, is autocommit disabled
> automatically, or do we have to disable it manually?

What version of PostgreSQL is your server running and what client
software are you using?  PostgreSQL 7.3 had a server-side autocommit
setting, but it caused problems with some clients so 7.4 got rid
of it and left autocommit up to the client.  How to enable or disable
client-side autocommit depends on the client software, but if you're
able to execute a BEGIN (or START TRANSACTION) statement then you
should be inside a transaction until you execute COMMIT (or END)
or ROLLBACK.  That is, unless your client intercepts these statements
and does whatever it wants.
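
As an illustration (a minimal sketch; the table is made up), wrapping a
batch of statements in one explicit transaction is what the performance
advice amounts to: one commit instead of one per statement:

  BEGIN;
  INSERT INTO items (name) VALUES ('first');
  INSERT INTO items (name) VALUES ('second');
  -- ...and so on; everything above becomes durable at the single COMMIT
  COMMIT;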

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] Autocommit

2005-02-14 Thread Hasnul Fadhly bin Hasan
Hi Michael,

Thanks for the reply.  I am using the postgres 7.4.5 client.  There's one
that is using the 7.4.1 client; I'm not sure if there would be any
difference.  When I use psql and check the status of autocommit, it is set
to enabled.  I'm not sure if libpq and psql use the same defaults.

Thanks,

Hasnul

Michael Fuhr wrote:
> What version of PostgreSQL is your server running and what client
> software are you using?


Re: [PERFORM] Autocommit

2005-02-14 Thread Michael Fuhr
On Mon, Feb 14, 2005 at 04:58:31PM +0800, Hasnul Fadhly bin Hasan wrote:

> Thanks for the reply.  I am using the postgres 7.4.5 client.  There's one
> that is using the 7.4.1 client; I'm not sure if there would be any difference.
> When I use psql and check the status of autocommit, it is set to
> enabled.  I'm not sure if libpq and psql use the same defaults.

As far as I can tell, libpq doesn't have an autocommit setting --
it just sends statements on behalf of the application.  Clients
that allow the user to disable autocommit presumably do so by
implicitly sending BEGIN statements to start new transactions.
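
psql, for example, does exactly that (a sketch of the 7.4 behavior; the
UPDATE is made up):

  \set AUTOCOMMIT off
  -- psql now sends an implicit BEGIN before the first statement and
  -- keeps the transaction open until an explicit COMMIT or ROLLBACK
  UPDATE items SET name = 'renamed' WHERE id = 1;
  COMMIT;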

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] Benchmark

2005-02-14 Thread Harald Fuchs
In article <[EMAIL PROTECTED]>,
Greg Stark <[EMAIL PROTECTED]> writes:

> Christopher Browne <[EMAIL PROTECTED]> writes:
>> After all, the cost of a computer system to run the transactions is
>> likely to be comprised of some combination of software licenses and
>> hardware costs.  Even if the software is free, the hardware isn't.

> And labour costs.

Except that working with PostgreSQL is fun, not labour :-)




Re: [PERFORM] estimated rows vs. actual rows

2005-02-14 Thread Christopher Browne
After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Jaime Casanova) 
belched out:
> On Sun, 13 Feb 2005 13:41:09 -0800, Josh Berkus  wrote:
>> Jaime,
>> 
>> > Why is this query using a seq scan rather than a index scan?
>> 
>> Because it thinks a seq scan will be faster.
>> 
> I will suggest that he try it with seq scans disabled.
>
> But, IMHO, if the table has 143902 rows and it thinks it will retrieve
> 2610 (almost 1.81% of the total), wouldn't an index be faster?

If the 2610 rows are scattered widely enough, it may be cheaper to do
a seq scan.

After all, with a seq scan, you read each of the table's pages exactly
once.

With an index scan, you read index pages _and_ table pages, and may read
and re-read some of the pages.

It sounds as though it's worth forcing the matter and trying it both
ways and comparing them.  Don't be surprised if the seq scan is in
fact faster...
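
A sketch of that comparison (the query here stands in for the real one):

  SET enable_seqscan = off;  -- per-session planner toggle, for testing only
  EXPLAIN ANALYZE SELECT * FROM t WHERE col = 42;
  SET enable_seqscan = on;
  EXPLAIN ANALYZE SELECT * FROM t WHERE col = 42;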
-- 
select 'cbbrowne' || '@' || 'gmail.com';
http://cbbrowne.com/info/emacs.html
When aiming for the common denominator, be prepared for the occasional
division by zero.



[PERFORM] String matching

2005-02-14 Thread Ivan Voras
Is there a way to use indexes for queries like:
select field from table where field like 'abc%'
(i.e. filter for string fields that begin with something) ?


Re: [PERFORM] String matching

2005-02-14 Thread PFC
normally you shouldn't have to do anything, it should just work:

  select field from table where field like 'abc%'
  CREATE INDEX ... ON table( field );

that's all. If it does not use the index, I saw on the mailing list that the
locale could be an issue.



Re: [PERFORM] String matching

2005-02-14 Thread Ivan Voras
PFC wrote:
> normally you shouldn't have to do anything, it should just work:
>   select field from table where field like 'abc%'
> If it does not use the index, I saw on the mailing list that the
> locale could be an issue.

Oh yes, I forgot about that :( I do have LC_COLLATE (on latin2)...
It's a shame PostgreSQL doesn't allow collation rules on specific fields
- this field I'm using here will always be 7bit ASCII :(



Re: [PERFORM] String matching

2005-02-14 Thread Ivan Voras
Stephan Szabo wrote:
> You can also create an index using a _pattern_ops operator
> class which should be usable even with other collations.

Could you give me an example for this, or point me to the relevant
documentation?



Re: [PERFORM] String matching

2005-02-14 Thread Stephan Szabo
On Mon, 14 Feb 2005, Ivan Voras wrote:

> PFC wrote:
> >
> > normally you shouldn't have to do anything, it should just work :
> >
> >> select field from table where field like 'abc%'
>
> > If it does not use the index, I saw on the mailing list that the
> > locale  could be an issue.
>
> Oh yes, I forgot about that :( I do have LC_COLLATE (on latin2)...
>
> It's a shame PostgreSQL doesn't allow collation rules on specific fields
> - this field I'm using here will always be 7bit ASCII :(

You can also create an index using a _pattern_ops operator
class which should be usable even with other collations.




Re: [PERFORM] String matching

2005-02-14 Thread Stephan Szabo

On Mon, 14 Feb 2005, Ivan Voras wrote:

> Stephan Szabo wrote:
>
> > You can also create an index using a _pattern_ops operator
> > class which should be usable even with other collations.
>
> Could you give me an example for this, or point me to the relevant
> documentation?

Basically, you could have something like:

create table test_table(a text);
create index test_index on test_table(a text_pattern_ops);

--

http://www.postgresql.org/docs/8.0/interactive/indexes-opclass.html
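
A quick way to verify the planner can use it (a sketch; sample data
assumed):

  EXPLAIN SELECT a FROM test_table WHERE a LIKE 'abc%';
  -- on a large enough table this should show an index scan on test_index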




Re: [PERFORM] String matching

2005-02-14 Thread Ivan Voras
Stephan Szabo wrote:
> On Mon, 14 Feb 2005, Ivan Voras wrote:
>> Could you give me an example for this, or point me to the relevant
>> documentation?
>
> http://www.postgresql.org/docs/8.0/interactive/indexes-opclass.html

Thanks! I didn't know this and I certainly didn't think it would be that
easy :)



Re: [PERFORM] String matching

2005-02-14 Thread Stephan Szabo
On Mon, 14 Feb 2005, Ivan Voras wrote:

> Stephan Szabo wrote:
> > On Mon, 14 Feb 2005, Ivan Voras wrote:
>
> >>Could you give me an example for this, or point me to the relevant
> >>documentation?
>
> >
> > http://www.postgresql.org/docs/8.0/interactive/indexes-opclass.html
>
> Thanks! I didn't know this and I certainly didn't think it would be that
> easy :)

Well, it's not perfect. It requires a separate index from one for normal
comparisons, so it's trading modification speed for LIKE lookup speed.



[PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Mark Aufflick
Hi All,
I have boiled my situation down to the following simple case: (postgres 
version 7.3)

* Query 1 is doing a sequential scan over a table (courtesy of field 
ILIKE 'foo%') and index joins to a few others
* Query 2 is doing a functional index scan over the same table 
(lower(field) LIKE 'foo%') and index joins to a few others
* neither query has an order by clause
* for the purpose of testing, both queries are designed to return the 
same result set
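
In sketch form, the two query shapes look like this (table and column
names here are invented):

  -- Query 1's filter: ILIKE defeats a plain btree index, so it seq scans
  SELECT * FROM t WHERE field ILIKE 'foo%';
  -- Query 2's filter: served by a functional index such as
  --   CREATE INDEX t_lower_field_idx ON t (lower(field));
  SELECT * FROM t WHERE lower(field) LIKE 'foo%';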

Obviously Q2 is faster than Q1, but if I ever run them both at the same
time (let's say I run two of Q1 and one of Q2 at the same time) then Q2
consistently returns WORSE times than Q1 (EXPLAIN ANALYZE confirms that
it is using the index).

My assumption is that the sequential scan is blowing the index from any 
cache it might live in, and simultaneously stealing all the disk IO 
that is needed to access the index on disk (the table has 200,000 
rows).

If I simplify the case to not do the index joins (i.e. operate on the
one table only), the situation is not as dramatic, but similar.

My thoughts are:
1) kill the sequential scan - but unfortunately I don't have direct 
control over that code
2) change the way the server allocates/prioritizes different caches - I
don't know enough about how postgres caches work to do this (if it's
possible)
3) try it on postgres 7.4 - possible, but migrating the system to 7.4 
in production will be hard because the above code that I am not 
responsible for has a lot of (slightly wacky) implicit date casts
4) ask the fine people on the mailing list for other suggestions!
--
Mark Aufflick
  e  [EMAIL PROTECTED]
  w  www.pumptheory.com (work)
  w  mark.aufflick.com (personal)
  p  +61 438 700 647
  f  +61 2 9436 4737





Re: [PERFORM] How to interpret this explain analyse?

2005-02-14 Thread Kevin Brown
Tom Lane wrote:
> "Joost Kraaijeveld" <[EMAIL PROTECTED]> writes:
> > I cannot change the query (it is generated by a tool called Clarion), but it is
> > something like (from the psqlodbc_xxx.log):
> > "...
> > declare SQL_CUR01 cursor for 
> > SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN 
> > "klt_alg" B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;
> > fetch 100 in SQL_CUR01;
> > ..."
> 
> Well, the planner does put some emphasis on startup time when dealing
> with a DECLARE CURSOR plan; the problem you face is just that that
> correction isn't large enough.  (From memory, I think it optimizes on
> the assumption that 10% of the estimated rows will actually be fetched;
> you evidently want a setting of 1% or even less.)

Ouch.  Is this really a reasonable assumption?  I figured the primary
use of a cursor was to fetch small amounts of data at a time from a
large table, so 10% seems extremely high as an average fetch size.  Or
is the optimization based on the number of rows that will be fetched
by the cursor during the cursor's lifetime (as opposed to in a single
fetch)?

Also, one has to ask what the consequences are of assuming a value too
low versus too high.  Which ends up being worse?

> We once talked about setting up a GUC variable to control the percentage
> of a cursor that is estimated to be fetched:
> http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php
> It never got done but that seems like the most reasonable solution to
> me.

Or keep some statistics on cursor usage, and adjust the value
dynamically based on actual cursor queries (this might be easier said
than done, of course).


-- 
Kevin Brown   [EMAIL PROTECTED]



[PERFORM] VACCUM FULL ANALYZE PROBLEM

2005-02-14 Thread Michael Ryan S. Puncia
Hi,

I have 3 tables in the database with 80G of data; one of them is almost 40G
and the remaining two have 20G each.

We use this database mainly for queries; updating is done only quarterly,
and the database performs well. My problem is that after updating, running
VACUUM FULL ANALYZE on the tables takes days to complete. I hope someone
can help me solve my problem.

Thanks

Re: [PERFORM] VACCUM FULL ANALYZE PROBLEM

2005-02-14 Thread Michael Ryan S. Puncia
But I need to do a full vacuum because I deleted some fields that are not
used anymore and I also added some new fields. Is there another way to
speed up a full vacuum?

Iain wrote:
> just make sure that your freespace map is big enough and then do a
> vacuum analyse without the full option.
>
> I can imagine that database performance might not be as good as it would
> be after a vacuum full, though I expect that it wouldn't make much
> difference.

Re: [PERFORM] VACCUM FULL ANALYZE PROBLEM

2005-02-14 Thread Iain
Hi,

just make sure that your freespace map is big enough and then do a vacuum
analyse without the full option.
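
The freespace map settings live in postgresql.conf and take effect on
restart (the values below are illustrative assumptions, not
recommendations):

  max_fsm_pages = 2000000     # free-space slots; must cover pages freed between vacuums
  max_fsm_relations = 1000    # number of tables and indexes tracked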
 
I can imagine that database performance might not be as good as it would be
after a vacuum full, though I expect that it wouldn't make much difference.
 
regards
Iain 



Re: [PERFORM] VACCUM FULL ANALYZE PROBLEM

2005-02-14 Thread Rod Taylor
On Tue, 2005-02-15 at 09:34 +0800, Michael Ryan S. Puncia wrote:
> Hi,
>
> I have 3 tables in the database with 80G of data; one of them is
> almost 40G and the remaining two have 20G each.
>
> We use this database mainly for queries; updating is done only
> quarterly, and the database performs well. My problem is that after
> updating, running VACUUM FULL ANALYZE on the tables takes days to
> complete. I hope someone

I suspect the VACUUM FULL is the painful part. Try running CLUSTER on
the table or changing a column type (in 8.0) instead.
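
A sketch of the CLUSTER route, using the 7.x/8.0 syntax (table and index
names assumed):

  CLUSTER big_table_pkey ON big_table;  -- rewrites the table in index order
  ANALYZE big_table;                    -- CLUSTER does not refresh planner statistics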


Re: [PERFORM] VACCUM FULL ANALYZE PROBLEM

2005-02-14 Thread Iain
>> But I need to do a full vacuum because I deleted some fields that are
>> not used anymore and I also added some new fields. Is there another way
>> to speed up a full vacuum?

Hmmm... a full vacuum may help to re-organize the structure of modified
tables, but whether this is significant or not is another matter. I don't
know enough of the internals to comment on that; maybe someone else who
knows more can.

The obvious thing is the vacuum memory setting (in postgresql.conf).
Presumably, you could set this quite high, at least just for the duration
of the vacuum.
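
For instance (a sketch; the value and table name are assumptions, and note
that 8.0 renames this setting maintenance_work_mem):

  SET vacuum_mem = 262144;        -- in KB; raised only for this session
  VACUUM FULL ANALYZE big_table;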
 
Would the total time be reduced by dropping the indexes, then vacuuming,
and then rebuilding the indexes? I haven't tried anything like this so I
can't say.
 
You should probably say what version of the db you are using and describe
your system a little.
 
Regards
Iain

Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Iain
Hi,
I think there was some discussion about seq scans messing up the cache, and
talk about doing something about it, but I don't think it has been
addressed yet. It may be worth a trawl through the archives.

It is certainly true that in many situations, a seq scan is preferable to
using an index. I have been testing a situation here on two versions of the
same database; one of the databases is much bigger than the other
(artificially bloated for testing purposes). Some of the query plans change
to use seq scans on the big database where they used indexes on the little
database - but either way, in *single user* testing the performance is fine.
My concern is that this kind of testing has very little relevance to the
real world of multiuser processing, where contention for the cache becomes
an issue. It may be that, at least in the current situation, postgres is
giving too much weight to seq scans based on single-user, straight-line
performance comparisons. If your assumption is correct, then addressing
that might help, though it's bound to have its compromises too...

regards
Iain



Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Rod Taylor
> My concern is that this kind of testing has very little relevance to the 
> real world of multiuser processing where contention for the cache becomes an 
> issue.  It may be that, at least in the current situation, postgres is 
> giving too much weight to seq scans based on single user, straight line 

To be fair, a large index scan can easily throw the buffers out of whack
as well. An index scan on 0.1% of a table with 1 billion tuples will
have a similar impact to buffers as a sequential scan of a table with 1
million tuples.

Any solution fixing buffers should probably not take into consideration
the method being performed (do you really want to skip caching a
sequential scan of a 2 tuple table because it didn't use an index) but
the volume of data involved as compared to the size of the cache.

I've often wondered if a single 1GB toasted tuple could wipe out the
buffers. I would suppose that toast doesn't bypass them.
-- 
Rod Taylor <[EMAIL PROTECTED]>




Re: [PERFORM] VACCUM FULL ANALYZE PROBLEM

2005-02-14 Thread Tom Lane
"Iain" <[EMAIL PROTECTED]> writes:
>> another  way  to speed up full vacuum?

> Hmmm... a full vacuum may help to re-organize the structure of modified 
> tables, but whether this is significant or not is another matter.

Actually, VACUUM FULL is designed to work nicely for the situation where
a table has say 10% wasted space and you want the wasted space all
compressed out.  When there is a lot of wasted space, so that nearly all
the rows have to be moved to complete the compaction operation, VACUUM
FULL is not a very good choice.  And it simply moves rows around, it
doesn't modify the rows internally; so it does nothing at all to reclaim
space that would have been freed up by DROP COLUMN operations.

CLUSTER is actually a better bet if you want to repack a table that's
suffered a lot of updates or deletions.  In PG 8.0 you might also
consider one of the rewriting variants of ALTER TABLE.
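
For example, in 8.0 re-declaring a column's existing type forces a complete
rewrite of the table, compacting it as a side effect (a sketch; table and
column names assumed):

  ALTER TABLE big_table ALTER COLUMN id TYPE integer;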

regards, tom lane



Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Christopher Browne
The world rejoiced as [EMAIL PROTECTED] (Mark Aufflick) wrote:
> Hi All,
>
> I have boiled my situation down to the following simple case:
> (postgres version 7.3)
>
> * Query 1 is doing a sequential scan over a table (courtesy of field
> ILIKE 'foo%') and index joins to a few others
> * Query 2 is doing a functional index scan over the same table
> (lower(field) LIKE 'foo%') and index joins to a few others
> * neither query has an order by clause
> * for the purpose of testing, both queries are designed to return the
> same result set
>
> Obviously Q2 is faster than Q1, but if I ever run them both at the
> same time (lets say I run two of Q1 and one of Q2 at the same time)
> then Q2 consistently returns WORSE times than Q1 (explain analyze
> confirms that it is using the index).
>
> My assumption is that the sequential scan is blowing the index from
> any cache it might live in, and simultaneously stealing all the disk
> IO that is needed to access the index on disk (the table has 200,000
> rows).

There's something to be said for that...

> If I simplify the case to not do the index joins (ie. operate on the
> one table only) the situation is not as dramatic, but similar.
>
> My thoughts are:
>
> 1) kill the sequential scan - but unfortunately I don't have direct
> control over that code

This is a good choice, if plausible...

> 2) change the way the server allocates/prioritizes different caches -
> i don't know enough about how postgres caches work to do this (if it's
> possible)

That's what the 8.0 cache changes did...  Patent claim issues are
leading to some changes to the prioritization, which is liable to
change in 8.0.something and 8.1.

> 3) try it on postgres 7.4 - possible, but migrating the system to 7.4
> in production will be hard because the above code that I am not
> responsible for has a lot of (slightly wacky) implicit date casts

Moving to 7.4 wouldn't materially change the situation; you'd have to
go all the way to version 8.
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "gmail.com")
http://linuxdatabases.info/~cbbrowne/postgresql.html
Rules of the Evil Overlord #32. "I will not fly into a rage and kill a
messenger who brings me bad news  just to illustrate how evil I really
am. Good messengers are hard to come by."




Re: [PERFORM] VACCUM FULL ANALYZE PROBLEM

2005-02-14 Thread Iain
OK, that's interesting. So the original assumption that vacuum full was
needed was completely wrong anyway.

If table re-organisation isn't required, a plain vacuum would be fastest. I
will take a guess that the next best alternative is to do the "create table
newtable as select ... order by ..." thing and then create the indexes and
stuff. This would reorganize the table completely. After that you have the
cluster command, and coming in last place is vacuum full. Sound about right?
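
A sketch of that reorganization (names assumed; constraints, triggers, and
grants must be re-created by hand):

  BEGIN;
  CREATE TABLE newtable AS SELECT * FROM big_table ORDER BY id;
  CREATE INDEX newtable_id_idx ON newtable (id);
  DROP TABLE big_table;
  ALTER TABLE newtable RENAME TO big_table;
  COMMIT;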

Michael, you said that a vacuum that runs for 3 days is too long, but you
haven't given any specific requirements or limitations. Hopefully you can
find something suitable in the alternatives listed above.

regards
Iain

Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Iain
Hi Rod,

> Any solution fixing buffers should probably not take into consideration
> the method being performed (do you really want to skip caching a
> sequential scan of a 2 tuple table because it didn't use an index) but
> the volume of data involved as compared to the size of the cache.

Yes, in fact indexes aren't so different to tables really in that regard.

It sounds like version 8 may help out anyway.

regards
Iain

Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Greg Stark
Mark Aufflick <[EMAIL PROTECTED]> writes:

> Obviously Q2 is faster than Q1, 

That's not really obvious at all. If there are lots of records being returned
the index might not be faster than a sequential scan.

> My assumption is that the sequential scan is blowing the index from any cache
> it might live in, and simultaneously stealing all the disk IO that is needed
> to access the index on disk (the table has 200,000 rows).

It kind of sounds to me like you've lowered random_page_cost to reflect the
fact that your indexes are nearly always completely cached. But when they're
not this unrealistic random_page_cost causes indexes to be used when they're
no longer faster.

Perhaps you should post an EXPLAIN ANALYZE of your Q1 and Q2 (the latter
preferably with and without enable_indexscan, though since it's a join you
may not be able to get precisely the comparable plan without just that one
index scan).

> 2) change the way the server allocates/prioritizes different caches - i don't
> know enough about how postgres caches work to do this (if it's possible)

Postgres keeps one set of shared buffers, not separate pools. Normally you
only allocate a small amount of your memory for Postgres and let the OS
handle disk caching.

What is your shared_buffers set to and how much memory do you have?
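
Both are easy to check from any session (a sketch):

  SHOW shared_buffers;    -- in 7.x this is a count of 8KB buffers, not bytes
  SHOW random_page_cost;  -- the default is 4; much lower values assume well-cached indexes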

> 3) try it on postgres 7.4 - possible, but migrating the system to 7.4 in
> production will be hard because the above code that I am not responsible for
> has a lot of (slightly wacky) implicit date casts

I can't think of any 7.4 changes that would affect this directly, but there
were certainly plenty of changes that had broad effects. You never know.

8.0, on the other hand, has a new algorithm that specifically tries to protect
against the shared buffers being blown out by a sequential scan. But that will
only help if it's the shared buffers being thrashed that's hurting you, not
the entire OS file system cache.

-- 
greg




Re: [PERFORM] seq scan cache vs. index cache smackdown

2005-02-14 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> 8.0, on the other hand, has a new algorithm that specifically tries to
> protect against the shared buffers being blown out by a sequential
> scan. But that will only help if it's the shared buffers being
> thrashed that's hurting you, not the entire OS file system cache.

Something we ought to think about sometime: what are the performance
implications of the real-world situation that we have another level of
caching sitting underneath us?  AFAIK all the theoretical studies we've
looked at consider only a single level of caching.  But for example,
if our buffer management algorithm recognizes an index page as being
heavily hit and therefore keeps it in cache for a long time, then when
it does fall out of cache you can be sure it's going to need to be read
from disk when it's next used, because the OS-level buffer cache has not
seen a call for that page in a long time.  Contrariwise a page that we
think is only on the fringe of usefulness is going to stay in the OS
cache because we repeatedly drop it and then have to ask for it again.

I have no idea how to model this situation, but it seems like it needs
some careful thought.

regards, tom lane



Re: [PERFORM] How to interpret this explain analyse?

2005-02-14 Thread Greg Stark

Kevin Brown <[EMAIL PROTECTED]> writes:

> Ouch.  Is this really a reasonable assumption?  I figured the primary
> use of a cursor was to fetch small amounts of data at a time from a
> large table, so 10% seems extremely high as an average fetch size.  Or
> is the optimization based on the number of rows that will be fetched
> by the cursor during the cursor's lifetime (as opposed to in a single
> fetch)?
> 
> Also, one has to ask what the consequences are of assuming a value too
> low versus too high.  Which ends up being worse?

This is one of the things the planner really cannot know. Ultimately it's the
kind of thing for which hints really are necessary. Oracle distinguishes
between the "minimize total time" versus "minimize startup time" with
/*+ ALL_ROWS */ and /*+ FIRST_ROWS */ hints, for example.

I would also find it reasonable to have hints to specify a selectivity for
expressions the optimizer has no hope of possibly being able to estimate.
Things like "WHERE myfunction(col1,col2,?) /*+ 10% */"


-- 
greg

