[PERFORM] sequential scan performance

2005-05-29 Thread Michael Engelhart

Hi -

I have a table of about 3 million rows of city "aliases" that I need  
to query using LIKE - for example:


select * from city_alias where city_name like '%FRANCISCO'


When I do an EXPLAIN ANALYZE on the above query, the result is:

 Seq Scan on city_alias  (cost=0.00..59282.31 rows=2 width=42) (actual time=73.369..3330.281 rows=407 loops=1)

   Filter: ((name)::text ~~ '%FRANCISCO'::text)
Total runtime: 3330.524 ms
(3 rows)


this is a query that our system needs to do a LOT.  Is there any way to
improve the performance on this, either with changes to our query or by
configuring the database deployment?  We have an index on city_name, but
when the % operator is at the front of the query string PostgreSQL can't
use the index.


Thanks for any help.

Mike

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] sequential scan performance

2005-05-29 Thread Steinar H. Gunderson
On Sun, May 29, 2005 at 08:27:26AM -0500, Michael Engelhart wrote:
> this is a query that our system needs to do a LOT.  Is there any way to
> improve the performance on this, either with changes to our query or by
> configuring the database deployment?  We have an index on city_name, but
> when the % operator is at the front of the query string PostgreSQL can't
> use the index.

Try tsearch2 from contrib; it might help you.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] sequential scan performance

2005-05-29 Thread Christopher Kings-Lynne

> When I do an EXPLAIN ANALYZE on the above query, the result is:
>
>  Seq Scan on city_alias  (cost=0.00..59282.31 rows=2 width=42) (actual time=73.369..3330.281 rows=407 loops=1)
>    Filter: ((name)::text ~~ '%FRANCISCO'::text)
> Total runtime: 3330.524 ms
> (3 rows)
>
> this is a query that our system needs to do a LOT.  Is there any way to
> improve the performance on this, either with changes to our query or by
> configuring the database deployment?  We have an index on city_name, but
> when the % operator is at the front of the query string PostgreSQL can't
> use the index.


Of course not.  There really is no way to make your literal query above
fast.  You could try making a functional index on the reverse() of the
string and querying for the reverse() of 'francisco'.
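A minimal sketch of that idea.  Assumptions: there is no reverse()
function already available (the PL/pgSQL implementation below is just one
hypothetical way to provide it), and the index name is illustrative:

```sql
-- Hypothetical reverse() implementation; any equivalent works.
-- Must be IMMUTABLE to be usable in an index expression.
CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
    result text := '''';
    i integer;
BEGIN
    -- Build the reversed string by prepending each character.
    FOR i IN 1..length($1) LOOP
        result := substr($1, i, 1) || result;
    END LOOP;
    RETURN result;
END;
' LANGUAGE plpgsql IMMUTABLE;

-- Functional (expression) index on the reversed name:
CREATE INDEX city_alias_rev_idx ON city_alias (reverse(city_name));

-- The suffix search becomes an indexable prefix search:
SELECT * FROM city_alias
WHERE reverse(city_name) LIKE reverse('FRANCISCO') || '%';
```

Note that in a non-C locale the planner may still not use a plain index
for LIKE prefix matching; in that case the index would likely need the
text_pattern_ops operator class.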


Or, if you want a general full text index, you should absolutely be 
using contrib/tsearch2.
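For the full-text route, a rough tsearch2 sketch.  Assumptions: tsearch2
is installed in the database, the 'default' configuration is suitable,
and the column/index names are made up; exact function signatures vary
between tsearch2 versions:

```sql
-- Add and populate a tsvector column for the city names:
ALTER TABLE city_alias ADD COLUMN city_ts tsvector;
UPDATE city_alias SET city_ts = to_tsvector('default', city_name);

-- GiST index over the tsvector column:
CREATE INDEX city_alias_ts_idx ON city_alias USING gist (city_ts);

-- Word search (finds 'francisco' anywhere in the name):
SELECT * FROM city_alias
WHERE city_ts @@ to_tsquery('default', 'francisco');
```

A trigger would also be needed to keep city_ts current when rows are
inserted or updated.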


Chris



Re: [PERFORM] Select performance vs. mssql

2005-05-29 Thread Jim C. Nasby
On Wed, May 25, 2005 at 09:29:36AM +0800, Christopher Kings-Lynne wrote:
> >--MSSQL's ability to hit the index only and not having
> >to go to the table itself results in a _big_
> >performance/efficiency gain. If someone who's in
> >development wants to pass this along, it would be a
> >nice addition to PostgreSQL sometime in the future.
> >I'd suspect that as well as making one query faster,
> >it would make everything else faster/more scalable as
> >the server load is so much less.
> 
> This is well-known and many databases do it.  However, due to MVCC 
> considerations in PostgreSQL, it's not feasible for us to implement it...

Wasn't there a plan to store some visibility info in indexes? IIRC the
idea was that a bit would be set in the index tuple indicating that all
transactions that wouldn't be able to see that index value were
complete, meaning that there was no reason to hit the heap for that
tuple.

I looked at the TODO but didn't see this; maybe it fell through the
cracks?
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] sequential scan performance

2005-05-29 Thread Oleg Bartunov

Michael,

I'd recommend our contrib/pg_trgm module, which provides trigram-based
fuzzy search and returns results ordered by similarity to your query.
Read http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
for more details.
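Something like the following (a sketch only; the index name is made up,
and the operator and function names are as described in the pg_trgm
README):

```sql
-- GiST trigram index on the city name:
CREATE INDEX city_alias_trgm_idx ON city_alias
    USING gist (city_name gist_trgm_ops);

-- Fuzzy match, ordered by similarity to the search term:
SELECT city_name, similarity(city_name, 'FRANCISCO') AS sml
FROM city_alias
WHERE city_name % 'FRANCISCO'
ORDER BY sml DESC;
```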

Oleg
On Sun, 29 May 2005, Michael Engelhart wrote:


> Hi -
>
> I have a table of about 3 million rows of city "aliases" that I need to
> query using LIKE - for example:
>
> select * from city_alias where city_name like '%FRANCISCO'
>
> this is a query that our system needs to do a LOT.  Is there any way to
> improve the performance on this, either with changes to our query or by
> configuring the database deployment?  We have an index on city_name, but
> when the % operator is at the front of the query string PostgreSQL can't
> use the index.
>
> Thanks for any help.
>
> Mike



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



[PERFORM] OID vs overall system performances on high load

2005-05-29 Thread Eric Lauzon
I am still in the dark due to my lack of knowledge of internal OID
management, but I would presume that a table with OIDs enabled under high
load would require more work from PostgreSQL internals to maintain the
OID index for the database.

So OIDs can be beneficial on static tables, or tables that you want to
manipulate with pgAdmin, but can a table without OIDs improve performance
on INSERT, DELETE, UPDATE, and COPY?

I am not really worried about the disk space an OID column takes, but I
was wondering if an insert into a table of 20 million rows or more that
has OIDs would slow the insert process.  Since OIDs seem to act as a
global index, maybe maintaining that index can become costly under high
table load.

-Eric Lauzon



Re: [PERFORM] OID vs overall system performances on high load

2005-05-29 Thread Tom Lane
"Eric Lauzon" <[EMAIL PROTECTED]> writes:
> I am still in the dark due to my lack of knowledge of internal OID
> management, but I would presume that a table with OIDs enabled under
> high load would require more work from PostgreSQL internals to maintain
> the OID index for the database.

There is no "OID index"; at least not unless you choose to create one
for a given table.  The only thing particularly special about OID is
that there is an internal database-wide sequence generator for assigning
new values.  Otherwise it works a whole lot like a serial column.
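For illustration, OIDs can be turned off explicitly when a table is
created, or dropped afterwards (the table and column names here are made
up):

```sql
-- Create a table without the hidden OID column:
CREATE TABLE log_entries (
    id      serial PRIMARY KEY,
    message text
) WITHOUT OIDS;

-- An existing table's OID column can also be removed:
ALTER TABLE log_entries SET WITHOUT OIDS;
```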

regards, tom lane



Re: [PERFORM] OID vs overall system performances on high load

2005-05-29 Thread Andrew McMillan
On Sun, 2005-05-29 at 16:17 -0400, Eric Lauzon wrote:
> I am still in the dark due to my lack of knowledge of internal OID
> management, but I would presume that a table with OIDs enabled under
> high load would require more work from PostgreSQL internals to maintain
> the OID index for the database.
>
> So OIDs can be beneficial on static tables, or tables that you want to
> manipulate with pgAdmin, but can a table without OIDs improve
> performance on INSERT, DELETE, UPDATE, and COPY?
>
> I am not really worried about the disk space an OID column takes, but I
> was wondering if an insert into a table of 20 million rows or more that
> has OIDs would slow the insert process.  Since OIDs seem to act as a
> global index, maybe maintaining that index can become costly under high
> table load.

There is no OID index, unless you create one.

The disk space that an OID column can take has an effect on performance:
reducing the amount of physical disk reads will mean that more of your
real data is cached, and so forth.  How much effect it will have will
depend on the relative size of the OID column and the other columns in
your data.

Regards,
Andrew McMillan.

-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/          PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
   http://survey.net.nz/ - any more questions?
-





Re: [PERFORM] OID vs overall system performances on high load

2005-05-29 Thread Neil Conway
On Sun, 2005-05-29 at 16:17 -0400, Eric Lauzon wrote:
> So OID can be beneficial on static tables

OIDs aren't beneficial on "static tables"; unless you have unusual
requirements[1], there is no benefit to having OIDs on user-created
tables (see the default_with_oids GUC var, which will default to "false"
in 8.1).

-Neil

[1] Such as a column that references a column in the system catalogs.




Re: [PERFORM] slow queries, possibly disk io

2005-05-29 Thread Rudi Starcevic

Hi,

I had some disk I/O issues recently with NFS; I found the command
'iostat -x 5' to be a great help when using Linux.


For example here is the output when I do a 10GB file transfer onto hdc
Device:  rrqm/s  wrqm/s   r/s    w/s  rsec/s   wsec/s  rkB/s    wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
hdc        0.00  875.95  0.00  29.66    0.00  7244.89   0.00  3622.44    244.27      3.07  103.52   1.78   5.27


The last field shows the disk is 5.27% busy.

I have seen this value at 100%; adding more servers brought it back under
100%.  It seems that if you hit 100%, problems sort of cascade all over
the place.  For example, Apache connections went right up and hit their
max.


I am not sure how accurate the % is, but it has worked pretty well for me.

Perhaps run this command in another window while you run your SQL and see
what it shows.


HTH.
Kind regards,
Rudi.
