[PERFORM] expression (functional) index use in joins
I just installed v7.4 and restored a database from v7.3.4. I have an index based on a function that the planner is using on the old version, but doing seq scans on left joins in the new version. I have run ANALYZE on the table post restore. The query returns in less than 1 second on version 7.3.4 and takes over 10 seconds on version 7.4. Any help will be appreciated.

Roger Ging

Query:

SELECT L.row_id FROM music.logfile L
LEFT JOIN music.program P
   ON music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
WHERE L.station = UPPER('kabc')::VARCHAR
AND L.air_date = '04/12/2002'::TIMESTAMP
AND P.cutoff_date IS NULL
ORDER BY L.chron_start,L.chron_end;

planner results on 7.4:

Sort  (cost=17595.99..17608.23 rows=4894 width=12)
  Sort Key: l.chron_start, l.chron_end
  ->  Merge Left Join  (cost=17135.92..17296.07 rows=4894 width=12)
        Merge Cond: ("outer"."?column5?" = "inner"."?column3?")
        Filter: ("inner".cutoff_date IS NULL)
        ->  Sort  (cost=1681.69..1682.73 rows=414 width=21)
              Sort Key: (l.program_id)::text
              ->  Index Scan using idx_logfile_station_air_date on logfile l  (cost=0.00..1663.70 rows=414 width=21)
                    Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2002-04-12 00:00:00'::timestamp without time zone))
        ->  Sort  (cost=15454.22..15465.06 rows=4335 width=20)
              Sort Key: (music.fn_mri_id_no_program(p.mri_id_no))::text
              ->  Seq Scan on program p  (cost=0.00..15192.35 rows=4335 width=20)

planner results on 7.3.4:

Sort  (cost=55765.51..55768.33 rows=1127 width=41)
  Sort Key: l.chron_start, l.chron_end
  ->  Nested Loop  (cost=0.00..55708.36 rows=1127 width=41)
        Filter: ("inner".cutoff_date IS NULL)
        ->  Index Scan using idx_logfile_station_air_date on logfile l  (cost=0.00..71.34 rows=17 width=21)
              Index Cond: ((station = 'KABC'::character varying) AND (air_date = '2002-04-12 00:00:00'::timestamp without time zone))
        ->  Index Scan using idx_program_mri_id_no_program on program p  (cost=0.00..3209.16 rows=870 width=20)
              Index Cond: (music.fn_mri_id_no_program(p.mri_id_no) = "outer".program_id)
table "Program" details:

     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 record_id      | integer                     |
 title          | character varying(40)       |
 mri_id_no      | character varying(8)        |
 ascap_cat      | character varying(1)        |
 ascap_mult     | numeric(5,3)                |
 ascap_prod     | character varying(10)       |
 npa_ind        | character varying(3)        |
 non_inc_in     | character varying(1)        |
 as_pr_su       | character varying(1)        |
 as_1st_run     | character varying(1)        |
 as_cue_st      | character varying(1)        |
 bmi_cat        | character varying(2)        |
 bmi_mult       | numeric(6,2)                |
 bmi_prod       | character varying(7)        |
 year           | integer                     |
 prog_type      | character varying(1)        |
 total_ep       | integer                     |
 last_epis      | character varying(3)        |
 syndicator     | character varying(6)        |
 station        | character varying(4)        |
 syn_loc        | character varying(1)        |
 spdb_ver       | character varying(4)        |
 as_filed       | character varying(4)        |
 bmidb_ver      | character varying(4)        |
 cutoff_date    | timestamp without time zone |
 effective_date | timestamp without time zone |
 program_id     | character varying(5)        |
Indexes:
    "idx_program_mri_id_no" btree (mri_id_no)
    "idx_program_mri_id_no_program" btree (music.fn_mri_id_no_program(mri_id_no))
    "idx_program_program_id" btree (program_id)
    "program_mri_id_no" btree (mri_id_no)
    "program_oid" btree (oid)

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Maximum Possible Insert Performance?
Josh Berkus wrote:
> William,
>
> > When my current job batch is done, I'll save a copy of the dir and give
> > the WAL on ramdrive a test. And perhaps even buy a Sandisk at the local
> > store and run that through the hooper.
>
> We'll be interested in the results. The Sandisk won't be much of a
> performance test; last I checked, their access speed was about 1/2 that
> of a fast SCSI drive. But it could be a feasibility test for the more
> expensive RAMdrive approach.

Some initial numbers. I simulated a CPU increase by underclocking the processors. Most of the time, performance does not scale linearly with clock speed, but since I also underclocked the FSB and memory bandwidth with the CPU, it's nearly an exact match.

1.15GHz    6.14
1.53GHz    6.97

+33% CPU = +13.5% performance

I then simulated adding a heapload of extra memory by running my job a second time. Unfortunately, to keep my 25GB DB mostly cached in memory, the word heapload is too accurate.

Run 1    6.97
Run 2    7.99    +14%

I popped in an extra IDE hard drive to store the WAL files and that boosted the numbers by a little. From looking at iostat, the ratio looked like 300K/s WAL for 1MB/s data.

WAL+Data on same disk    6.97
WAL+Data separated       7.26    +4%

I then tried to put the WAL directory onto a ramdisk. I turned off swapping, created a tmpfs mount point and copied the pg_xlog directory over. Everything looked fine as far as I could tell, but Postgres just panic'd with a "file permissions" error. Anybody have thoughts on why tmpfs would not work?
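The percentages in the post above can be checked with a line of arithmetic each; this is a reader's sanity check of the reported numbers, not part of the original message.

```python
# Sanity-check the scaling figures quoted in the post above.
def pct_gain(before, after):
    """Percentage improvement going from 'before' to 'after'."""
    return (after - before) / before * 100.0

print(f"CPU clock:    +{pct_gain(1.15, 1.53):.1f}%")  # underclocked -> full clock
print(f"throughput:   +{pct_gain(6.14, 6.97):.1f}%")  # job speedup at higher clock
print(f"cached run:   +{pct_gain(6.97, 7.99):.1f}%")  # second (cached) run
print(f"separate WAL: +{pct_gain(6.97, 7.26):.1f}%")  # WAL moved to its own disk
```

The first two lines reproduce the "+33% CPU = +13.5% performance" claim.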
Re: [PERFORM] Impossibly slow DELETEs
Is it possible another connection has updated the record and not committed, and it takes a minute for the connection to time out and commit or roll back?

-- greg
Re: [PERFORM] why index scan not working when using 'like'?
Tom Lane wrote on Tue, 25.11.2003 at 23:29:
> Josh Berkus <[EMAIL PROTECTED]> writes:
> > In regular text fields containing words, your problem is solvable with
> > full text indexing (FTI). Unfortunately, FTI is not designed for
> > arbitrary non-language strings. It could be adapted, but would require
> > a lot of hacking.
>
> I'm not sure why you say that FTI isn't a usable solution. As long as
> the gene symbols are separated by whitespace or some other non-letters
> (eg, "foo mif bar" not "foomifbar"), I'd think FTI would work.

If he wants to search on an arbitrary substring, he could change the tokeniser in FTI to produce trigrams, so that "foomifbar" would be indexed as if it were the text "foo oom omi mif ifb fba bar". A search for something like %mifb% should then first do an FTI search for "mif" AND "ifb", and then a simple LIKE %mifb% to weed out false matches like "mififb".

There are ways to use trigrams for 1 and 2 letter matches as well.

-
Hannu
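Hannu's trigram scheme can be sketched in a few lines. This is an illustrative toy, not the FTI tokeniser itself; the function names are hypothetical, and a real implementation would store the trigrams in an index rather than recompute them per row.

```python
# Sketch of trigram substring search: index every 3-letter window of each
# string, AND the trigrams of the query, then do a final substring check.
def trigrams(text):
    """All 3-character windows: 'foomifbar' -> foo oom omi mif ifb fba bar."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def candidates(substring, rows):
    """Rows whose trigram sets contain every trigram of the substring."""
    needed = trigrams(substring)
    return [row for row in rows if needed <= trigrams(row)]

def search(substring, rows):
    # The trigram AND can produce false positives ('mififb' contains both
    # 'mif' and 'ifb'); the plain substring test weeds those out.
    return [r for r in candidates(substring, rows) if substring in r]

print(search("mifb", ["foomifbar", "mififb", "barfoo"]))  # → ['foomifbar']
```

Here "mififb" passes the trigram filter but fails the final check, exactly the case Hannu describes.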
Re: [PERFORM] Maximum Possible Insert Performance?
William Yu <[EMAIL PROTECTED]> writes:
> I then tried to put the WAL directory onto a ramdisk. I turned off
> swapping, created a tmpfs mount point and copied the pg_xlog directory
> over. Everything looked fine as far as I could tell but Postgres just
> panic'd with a "file permissions" error. Anybody have thoughts to why
> tmpfs would not work?

I'd say you got the file or directory ownership or permissions wrong.

regards, tom lane
Re: [PERFORM] expression (functional) index use in joins
On Wednesday 26 November 2003 16:38, Roger Ging wrote:
> I just installed v7.4 and restored a database from v7.3.4.
[snip]

Hmm - you seem to be getting different row estimates in the plan. Can you re-analyse both versions and post EXPLAIN ANALYSE rather than just EXPLAIN?

> -> Seq Scan on program p (cost=0.00..15192.35 rows=4335 width=20)
>
> planner results on 7.3.4:
>
> -> Index Scan using idx_program_mri_id_no_program on program p
>    (cost=0.00..3209.16 rows=870 width=20)

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Maximum Possible Insert Performance?
Tom Lane wrote:
> William Yu <[EMAIL PROTECTED]> writes:
> > I then tried to put the WAL directory onto a ramdisk. I turned off
> > swapping, created a tmpfs mount point and copied the pg_xlog directory
> > over. Everything looked fine as far as I could tell but Postgres just
> > panic'd with a "file permissions" error. Anybody have thoughts to why
> > tmpfs would not work?
>
> I'd say you got the file or directory ownership or permissions wrong.

I did a mv instead of a cp, which duplicates ownership & permissions exactly.
[PERFORM] Followup - expression (functional) index use in joins
version 7.4 results:

explain analyse SELECT L.row_id FROM music.logfile L
LEFT JOIN music.program P
   ON music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
WHERE L.station = UPPER('kabc')::VARCHAR
AND L.air_date = '04/12/2002'::TIMESTAMP
AND P.cutoff_date IS NULL
ORDER BY L.chron_start,L.chron_end;

QUERY PLAN
----------
Sort  (cost=17595.99..17608.23 rows=4894 width=12) (actual time=8083.719..8083.738 rows=30 loops=1)
  Sort Key: l.chron_start, l.chron_end
  ->  Merge Left Join  (cost=17135.92..17296.07 rows=4894 width=12) (actual time=7727.590..8083.349 rows=30 loops=1)
        Merge Cond: ("outer"."?column5?" = "inner"."?column3?")
        Filter: ("inner".cutoff_date IS NULL)
        ->  Sort  (cost=1681.69..1682.73 rows=414 width=21) (actual time=1.414..1.437 rows=30 loops=1)
              Sort Key: (l.program_id)::text
              ->  Index Scan using idx_logfile_station_air_date on logfile l  (cost=0.00..1663.70 rows=414 width=21) (actual time=0.509..1.228 rows=30 loops=1)
                    Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2002-04-12 00:00:00'::timestamp without time zone))
        ->  Sort  (cost=15454.22..15465.06 rows=4335 width=20) (actual time=7718.612..7869.874 rows=152779 loops=1)
              Sort Key: (music.fn_mri_id_no_program(p.mri_id_no))::text
              ->  Seq Scan on program p  (cost=0.00..15192.35 rows=4335 width=20) (actual time=109.045..1955.882 rows=173998 loops=1)
Total runtime: 8194.290 ms
(13 rows)

version 7.3 results:

explain analyse SELECT L.row_id FROM music.logfile L
LEFT JOIN music.program P
   ON music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
WHERE L.station = UPPER('kabc')::VARCHAR
AND L.air_date = '04/12/2002'::TIMESTAMP
AND P.cutoff_date IS NULL
ORDER BY L.chron_start,L.chron_end;

QUERY PLAN
----------
Sort  (cost=55765.51..55768.33 rows=1127 width=41) (actual time=7.74..7.75 rows=30 loops=1)
  Sort Key: l.chron_start, l.chron_end
  ->  Nested Loop  (cost=0.00..55708.36 rows=1127 width=41) (actual time=0.21..7.62 rows=30 loops=1)
        Filter: ("inner".cutoff_date IS NULL)
        ->  Index Scan using idx_logfile_station_air_date on logfile l  (cost=0.00..71.34 rows=17 width=21) (actual time=0.14..0.74 rows=30 loops=1)
              Index Cond: ((station = 'KABC'::character varying) AND (air_date = '2002-04-12 00:00:00'::timestamp without time zone))
        ->  Index Scan using idx_program_mri_id_no_program on program p  (cost=0.00..3209.16 rows=870 width=20) (actual time=0.05..0.22 rows=9 loops=30)
              Index Cond: (music.fn_mri_id_no_program(p.mri_id_no) = "outer".program_id)
Total runtime: 7.86 msec
Re: [PERFORM] Maximum Possible Insert Performance?
But the permissions of the base ramdisk might be wrong. I'd su to the user that you run postgres as (probably postgres), go to the directory where the log and the database files are, and make sure you can see the files.

On Wed, Nov 26, 2003 at 10:03:47AM -0800, William Yu wrote:
> Tom Lane wrote:
> > William Yu <[EMAIL PROTECTED]> writes:
> > > I then tried to put the WAL directory onto a ramdisk. I turned off
> > > swapping, created a tmpfs mount point and copied the pg_xlog directory
> > > over. Everything looked fine as far as I could tell but Postgres just
> > > panic'd with a "file permissions" error. Anybody have thoughts to why
> > > tmpfs would not work?
>
> I did a mv instead of a cp which duplicates ownership & permissions exactly.

-- 
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
Re: [PERFORM] Followup - expression (functional) index use in joins
On Wednesday 26 November 2003 18:39, Roger Ging wrote:
> version 7.4 results:
>
> explain analyse SELECT L.row_id FROM music.logfile L LEFT JOIN
> music.program P ON
> music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
> WHERE L.station = UPPER('kabc')::VARCHAR
> AND L.air_date = '04/12/2002'::TIMESTAMP
> AND P.cutoff_date IS NULL
> ORDER BY L.chron_start,L.chron_end;
>
> -> Seq Scan on program p (cost=0.00..15192.35 rows=4335 width=20)
>    (actual time=109.045..1955.882 rows=173998 loops=1)

The estimated number of rows here (4335) is *way* off (173998 actually). If you only had 4335 rows, then this might be a more sensible plan.

First step is to run:

  VACUUM ANALYSE program;

Then, check the definition of your function fn_mri_id_no_program() and make sure it is marked immutable/stable (depending on what it does) and that it's returning a varchar.

-- 
Richard Huxton
Archonet Ltd
[PERFORM] For full text indexing, which is better, tsearch2 or fulltextindex
Hi all,

Which one is better (performance/easier to use), tsearch2 or fulltextindex? There is an example of how to use fulltextindex in the techdocs, but I checked the contrib/fulltextindex package, and there is a WARNING that fulltextindex is much slower than tsearch2. But tsearch2 seems complex to use, and I cannot find a good example. Which one should I use? Any suggestions?

thanks and Regards,
William

- Original Message -
From: Hannu Krosing <[EMAIL PROTECTED]>
Date: Wednesday, November 26, 2003 5:33 pm
Subject: Re: [PERFORM] why index scan not working when using 'like'?

> Tom Lane wrote on Tue, 25.11.2003 at 23:29:
> > Josh Berkus <[EMAIL PROTECTED]> writes:
> > > In regular text fields containing words, your problem is solvable
> > > with full text indexing (FTI). Unfortunately, FTI is not designed
> > > for arbitrary non-language strings. It could be adapted, but would
> > > require a lot of hacking.
> >
> > I'm not sure why you say that FTI isn't a usable solution. As long as
> > the gene symbols are separated by whitespace or some other non-letters
> > (eg, "foo mif bar" not "foomifbar"), I'd think FTI would work.
>
> If he wants to search on arbitrary substring, he could change the
> tokeniser in FTI to produce trigrams, so that "foomifbar" would be
> indexed as if it were text "foo oom omi mif ifb fba bar" and search
> for things like %mifb% should first do a FTI search for "mif" AND
> "ifb" and then simple LIKE %mifb% to weed out something like "mififb".
>
> There are ways to use trigrams for 1 and 2 letter matches as well.
>
> - Hannu
[PERFORM] very large db performance question
Hello All,

We will have a very large database to store microarray data (it may exceed 80-100GB some day). Right now we have 1GB RAM, a 2GHz Pentium 4, 1 CPU, and enough hard disk. I have never touched such a large database before. I asked several DBAs whether the hardware is OK; some said it is OK for the query, but I am not so convinced, because I checked the mailing list and learned that it is not unreasonable for a query to take several minutes. But I want the query to be as fast as possible. Could anybody tell me whether our hardware is an issue or not? Do we really need better hardware to make a real difference?

Regards,
William
[PERFORM] cross table indexes or something?
I was wondering if there is something I can do that would act similar to an index over more than one table.

I have about 3 million people in my DB at the moment; they all have roles, and many of them have more than one name. For example, a Judge will only have one name, but a Litigant could have multiple aliases. Things go far too slow when I do a query on a judge named smith. Does anyone know a possible way to speed this up?

I would think that in a perfect world there would be a way to create an index on commonly used joins, or something of that nature. I've tried partial indexes, but the optimizer feels that it would be quicker to do an index scan for smith% then join using the pkey of the person to get their role. For litigants, this makes sense; for non-litigants, it doesn't.

thanx for any insight,
-jj-

the basic schema:

actor
  actor_id PK
  role_class_code

identity
  actor_id FK
  identity_id PK
  full_name

event
  event_date_time
  event_id PK

event_actor
  event_id FK
  actor_id FK

explain select distinct actor.actor_id, court.id, court.name, role_class_code, full_name
from actor, identity, court, event, event_actor
where role_class_code = 'Judge'
  and full_name like 'SMITH%'
  and identity.actor_id = actor.actor_id
  and identity.court_ori = actor.court_ori
  and actor.court_ori = court.id
  and actor.actor_id = event_actor.actor_id
  and event_actor.event_id = event.event_id
  and event_date_time > '20021126'
order by full_name;

QUERY PLAN
----------
Unique  (cost=726.57..726.58 rows=1 width=92)
  ->  Sort  (cost=726.57..726.57 rows=1 width=92)
        Sort Key: identity.full_name, actor.actor_id, court.id, court.name, actor.role_class_code
        ->  Nested Loop  (cost=3.02..726.56 rows=1 width=92)
              ->  Nested Loop  (cost=3.02..720.72 rows=1 width=144)
                    ->  Nested Loop  (cost=3.02..9.62 rows=1 width=117)
                          Join Filter: (("outer".court_ori)::text = ("inner".court_ori)::text)
                          ->  Hash Join  (cost=3.02..4.18 rows=1 width=93)
                                Hash Cond: (("outer".id)::text = ("inner".court_ori)::text)
                                ->  Seq Scan on court  (cost=0.00..1.10 rows=10 width=34)
                                ->  Hash  (cost=3.01..3.01 rows=1 width=59)
                                      ->  Index Scan using name_speed on identity  (cost=0.00..3.01 rows=1 width=59)
                                            Index Cond: (((full_name)::text >= 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character varying))
                                            Filter: ((full_name)::text ~~ 'SMITH%'::text)
                          ->  Index Scan using actor_speed on actor  (cost=0.00..5.43 rows=1 width=50)
                                Index Cond: (("outer".actor_id)::text = (actor.actor_id)::text)
                                Filter: ((role_class_code)::text = 'Judge'::text)
                    ->  Index Scan using event_actor_speed on event_actor  (cost=0.00..695.15 rows=1275 width=73)
                          Index Cond: ((event_actor.actor_id)::text = ("outer".actor_id)::text)
              ->  Index Scan using event_pkey on event  (cost=0.00..5.83 rows=1 width=52)
                    Index Cond: (("outer".event_id)::text = (event.event_id)::text)
                    Filter: (event_date_time > '20021126'::bpchar)

-- 
"You can't make a program without broken egos."
--
Jeremiah Jahn <[EMAIL PROTECTED]>
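The name_speed index scan in the plan above shows the planner rewriting the anchored pattern `full_name LIKE 'SMITH%'` into a btree range: `>= 'SMITH' AND < 'SMITI'`. A rough sketch of that rewrite, assuming plain ASCII ordering (real PostgreSQL also has to worry about locale collation):

```python
# Sketch of the LIKE-prefix-to-range rewrite visible in the plan above:
# full_name LIKE 'SMITH%' becomes full_name >= 'SMITH' AND full_name < 'SMITI'.
def like_prefix_bounds(prefix):
    """Lower/upper bounds for a btree range scan matching prefix%."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # bump the last character
    return prefix, upper

lo, hi = like_prefix_bounds("SMITH")
names = ["SMITH", "SMITHE", "SMITI", "SMYTHE", "SMITHSON"]
print([n for n in names if lo <= n < hi])  # → ['SMITH', 'SMITHE', 'SMITHSON']
```

The residual `Filter: (full_name ~~ 'SMITH%')` in the plan then re-checks the pattern, since the range is only an over-approximation in general.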
Re: [PERFORM] Followup - expression (functional) index use in joins
Ran vacuum analyse on both program and logfile tables. Estimates are more in line with reality now, but the query still takes 10 seconds on v7.4 and 10 ms on v7.3. The function is marked as immutable and returns varchar(5). I am wondering why the planner would choose a merge join (v7.4) as opposed to a nested loop (v7.3), given the small number of rows in the top level table (logfile) based upon the where clause (L.air_date = '04/12/2002'::TIMESTAMP); there are typically only 30 rows per station/air_date. What am I missing here?

Richard Huxton wrote:
> On Wednesday 26 November 2003 18:39, Roger Ging wrote:
> > version 7.4 results:
> >
> > explain analyse SELECT L.row_id FROM music.logfile L LEFT JOIN
> > music.program P ON
> > music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
> > WHERE L.station = UPPER('kabc')::VARCHAR
> > AND L.air_date = '04/12/2002'::TIMESTAMP
> > AND P.cutoff_date IS NULL
> > ORDER BY L.chron_start,L.chron_end;
> >
> > -> Seq Scan on program p (cost=0.00..15192.35 rows=4335 width=20)
> >    (actual time=109.045..1955.882 rows=173998 loops=1)
>
> The estimated number of rows here (4335) is *way* off (173998 actually).
> If you only had 4335 rows, then this might be a more sensible plan.
>
> First step is to run: VACUUM ANALYSE program;
>
> Then, check the definition of your function fn_mri_id_no_program() and
> make sure it is marked immutable/stable (depending on what it does) and
> that it's returning a varchar.
Re: [PERFORM] very large db performance question
LIANHE SHAO <[EMAIL PROTECTED]> writes:
> We will have a very large database to store microarray data (may
> exceed 80-100G some day). now we have 1G RAM, 2G Hz Pentium 4, 1
> CPU. and enough hard disk.
>
> Could anybody tell me that our hardware is an issue or not?

IMHO the size of the DB is less relevant than the query workload. For example, if you're storing 100GB of data but only doing a single index scan on it every 10 seconds, any modern machine with enough HD space should be fine.

If you give us an idea of the # of queries you expect per second, the approximate mix of reads and writes, and some idea of how complex the queries are, we might be able to give you some better advice.

-Neil
Re: [PERFORM] cross table indexes or something?
Sybase IQ lets you build "joined indexsets". This is amazing but pricey and really intended more for Data Warehousing than OLTP, although they did release a version which permitted writes on-the-fly. (This was implemented using a multi-concurrency solution much like PostgreSQL uses.) It essentially pre-joined the data.

Marc A. Leith
redboxdata inc.
E-mail: [EMAIL PROTECTED]

Quoting Jeremiah Jahn <[EMAIL PROTECTED]>:

> I was wondering if there is something I can do that would act similar to
> a index over more than one table.
>
> I have about 3 million people in my DB at the moment, they all have
> roles, and many of them have more than one name.
>
> for example, a Judge will only have one name, but a Litigant could have
> multiple aliases. Things go far to slow when I do a query on a judge
> named smith. Does any one know a possible way to speed this up?
>
> I would think that In a perfect world there would be a way to create an
> index on commonly used joins, or something of that nature. I've tried
> partial indexes, but the optimizer feels that it would be quicker to do
> an index scan for smith% then join using the pkey of the person to get
> their role. For litigants, this makes since, for non-litigants, this
> doesn't.
>
> [snip: schema and EXPLAIN output]
Re: [PERFORM] cross table indexes or something?
Jeremiah Jahn wrote on Wed, 26.11.2003 at 22:14:
> I was wondering if there is something I can do that would act similar to
> a index over more than one table.
>
> I have about 3 million people in my DB at the moment, they all have
> roles, and many of them have more than one name.
>
> for example, a Judge will only have one name, but a Litigant could have
> multiple aliases. Things go far to slow when I do a query on a judge
> named smith.

If you don't need all the judges named Smith, you could try to use LIMIT.

Have you run ANALYZE? Why does the DB think that there is only one judge with a name like SMITH%?

- Hannu

P.S. Always send EXPLAIN ANALYZE output if asking for advice on [PERFORM]
Re: [PERFORM] very large db performance question
Thanks for the reply. Actually our database will only be used by some scientists (we predict that), so there is no workload problem: there are only very infrequent updates, and the query is not complex. The problem is, we have one table that stores most of the data (with 200 million rows). In this table, there is a text column on which we need to do full text search for each row. The result will then be joined with data from another table which has 30,000 rows. Now the query runs almost forever.

I tried a small table with 2 million rows using the following simple command, and it takes about 6 seconds to get the result back. So I am confused. That is why I ask: is it a hardware problem or something else? (I just vacuumed the whole database yesterday.)

PGA=> select count (*) from expressiondata ;
  count
---------
 2197497
(1 row)

PGA=> explain select count (*) from expressiondata ;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Aggregate  (cost=46731.71..46731.71 rows=1 width=0)
   ->  Seq Scan on expressiondata  (cost=0.00..41237.97 rows=2197497 width=0)
(2 rows)

Regards,
William

- Original Message -
From: Neil Conway <[EMAIL PROTECTED]>
Date: Wednesday, November 26, 2003 10:03 pm
Subject: Re: [PERFORM] very large db performance question

> LIANHE SHAO <[EMAIL PROTECTED]> writes:
> > We will have a very large database to store microarray data (may
> > exceed 80-100G some day). now we have 1G RAM, 2G Hz Pentium 4, 1
> > CPU. and enough hard disk.
> >
> > Could anybody tell me that our hardware is an issue or not?
>
> IMHO the size of the DB is less relevant than the query workload. For
> example, if you're storying 100GB of data but only doing a single
> index scan on it every 10 seconds, any modern machine with enough HD
> space should be fine.
>
> If you give us an idea of the # of queries you expect per second, the
> approximate mix of reads and writes, and some idea of how complex the
> queries are, we might be able to give you some better advice.
>
> -Neil
Re: [PERFORM] Followup - expression (functional) index use in joins
Roger Ging <[EMAIL PROTECTED]> writes:
> Ran vacuum analyse on both program and logfile tables. Estimates are
> more in line with reality now,

And they are what now? You really can't expect to get useful help here when you're being so miserly with the details ...

FWIW, I suspect you could force 7.4 to generate 7.3's plan by setting enable_mergejoin to off (might have to also set enable_hashjoin to off, if it then tries for a hash join). 7.3 could not even consider those join types in this example, while 7.4 can. The interesting question from my perspective is why the planner is guessing wrong about the relative costs of the plans. EXPLAIN ANALYZE results with each type of join forced would be useful to look at.

regards, tom lane
Re: [PERFORM] For full text indexing, which is better, tsearch2 or
> Which one is better (performance/easier to use), tsearch2 or
> fulltextindex? there is an example how to use fulltextindex in the
> techdocs, but I checked the contrib/fulltextindex package, there is a
> WARNING that fulltextindex is much slower than tsearch2. but tsearch2
> seems complex to use, and I can not find a good example. Which one I
> should use? Any suggestions?

I believe I wrote that warning :)

Tsearch2 is what you should use. Yes, it's more complicated, but it's HEAPS faster and seriously powerful. Just read the README file.

You could also try out the original tsearch (V1), but that will probably be superseded soon, now that tsearch2 is around.

Chris
Re: [PERFORM] very large db performance question
> Thanks for reply. Actually our database only supply some scientists to
> use (we predict that). so there is no workload problem. there is only
> very infrequent updates. the query is not complex. the problem is, we
> have one table that store most of the data (with 200 million rows). In
> this table, there is a text column which we need to do full text search
> for each row. The result will then join the data from another table
> which has 30,000 rows. Now the query runs almost forever.

Use TSearch2.

> I tried a small table with 2 million rows using the following simple
> command, it takes me about 6 seconds to get the result back. So, I get
> confused. That is why I ask: Is it the hardware problem or something
> else. (I just vacuumed the whole database yesterday).
>
> PGA=> select count (*) from expressiondata ;
>   count
> ---------
>  2197497
> (1 row)

select count(*) on a postgres table ALWAYS does a sequential scan. Just don't do it. There are technical reasons (MVCC) why this is so. It's a bad "test".

Chris
Re: [PERFORM] For full text indexing, which is better, tsearch2 or
On Thu, Nov 27, 2003 at 08:51:14AM +0800, Christopher Kings-Lynne wrote:
> > Which one is better (performance/easier to use), tsearch2 or
> > fulltextindex? there is an example how to use fulltextindex in the
> > techdocs, but I checked the contrib/fulltextindex package, there is a
> > WARNING that fulltextindex is much slower than tsearch2. but tsearch2
> > seems complex to use, and I can not find a good example. Which one I
> > should use? Any suggestions?
>
> I believe I wrote that warning :)
>
> Tsearch2 is what you should use. Yes, it's more complicated but it's
> HEAPS faster and seriously powerful.

Can you provide some numbers please, both for creating full text indexes as well as for searching them? I tried to use tsearch, and it seemed like just creating a full text index on a million+ records took forever.

> Just read the README file.
>
> You could also try out the original tsearch (V1), but that will probably
> be superceded soon, now that tsearch2 is around.
>
> Chris

-- 
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
Re: [PERFORM] For full text indexing, which is better, tsearch2 or
On Thu, Nov 27, 2003 at 08:51:14AM +0800, Christopher Kings-Lynne wrote:
> > Which one is better (performance/easier to use), tsearch2 or
> > fulltextindex? There is an example of how to use fulltextindex in the
> > techdocs, but I checked the contrib/fulltextindex package and there is a
> > WARNING that fulltextindex is much slower than tsearch2. tsearch2 seems
> > complex to use, though, and I cannot find a good example. Which one
> > should I use? Any suggestions?
>
> I believe I wrote that warning :)
>
> Tsearch2 is what you should use. Yes, it's more complicated but it's
> HEAPS faster and seriously powerful.

Does anyone have any metrics on how fast tsearch2 actually is? I tried it
on a synthetic dataset of a million documents of a hundred words each, and
while insertions were impressively fast, I gave up on the search after 10
minutes. Broken? Unusably slow? This was on the last 7.4 release candidate.

Cheers,
Steve
Re: [PERFORM] For full text indexing, which is better, tsearch2 or
> Does anyone have any metrics on how fast tsearch2 actually is? I tried it
> on a synthetic dataset of a million documents of a hundred words each, and
> while insertions were impressively fast, I gave up on the search after 10
> minutes. Broken? Unusably slow? This was on the last 7.4 release
> candidate.

I just created a 1.1 million row dataset by copying one of our 3-row
production tables and just taking out the txtidx column. Then I inserted it
into itself until it had 1.1 million rows.

Then I created the GiST index - THAT took forever - seriously, like 20
minutes or half an hour.

Now, to find a word:

select * from tsearchtest where ftiidx ## 'curry';
Time: 9760.75 ms

The AND of two words:

Time: 103.61 ms

The AND of three words:

select * from tsearchtest where ftiidx ## 'curry&green&thai';
Time: 61.86 ms

And now a one-word query, now that the buffers are cached:

select * from tsearchtest where ftiidx ## 'curry';
Time: 444.89 ms

So, I have no idea why you think it's slow. Perhaps you forgot the 'create
index using gist' step? Also, if you use the NOT (!) operand, you can get
yourself into a really slow situation.

Chris
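One quick way to confirm whether the index is being used at all is to look at the plan; a sketch along the lines of the query above (table and column names as in Chris's test):

```sql
-- If the GiST index is missing or unusable, the plan will show a Seq Scan
-- over the whole table, which would explain multi-minute searches.
EXPLAIN SELECT * FROM tsearchtest WHERE ftiidx ## 'curry';
```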
Re: [PERFORM] For full text indexing, which is better, tsearch2 or
On Thu, Nov 27, 2003 at 12:41:59PM +0800, Christopher Kings-Lynne wrote:
> > Does anyone have any metrics on how fast tsearch2 actually is?
> >
> > I tried it on a synthetic dataset of a million documents of a hundred
> > words each, and while insertions were impressively fast, I gave up on
> > the search after 10 minutes.
> >
> > Broken? Unusably slow? This was on the last 7.4 release candidate.
>
> I just created a 1.1 million row dataset by copying one of our 3-row
> production tables and just taking out the txtidx column. Then I inserted
> it into itself until it had 1.1 million rows.
>
> Then I created the GiST index - THAT took forever - seriously, like 20
> minutes or half an hour.
>
> Now, to find a word:
>
> select * from tsearchtest where ftiidx ## 'curry';
> Time: 9760.75 ms
>
> So, I have no idea why you think it's slow. Perhaps you forgot the
> 'create index using gist' step?

No, it was indexed. Thanks, that was the datapoint I was looking for. It
_can_ run fast, so I just need to work out what's going on. (It's hard to
diagnose a slow query when you have no idea whether it's really "slow".)

Cheers,
Steve