[SQL] Performance Problem with sub-select using array
Hello all,

I'm running the following query on about 6,000 records worth of data, and it takes about 8 seconds to complete. Can anyone provide any suggestions to improve performance? I have an index on two columns in the transacts table (program_id, customer_id). If I specify a number for customers.id in the sub-select, query time is reduced to about 2 seconds, which still seems like a million years for only 6,000 records, but I'm guessing that the sub-select can't resolve the id since it's done before the outer query, so it scans the entire recordset for every row? Transacts is a many-to-many table for customers and programs. I know this query doesn't even reference any columns from programs; however, I dynamically insert where clauses to constrain the result set.

SELECT DISTINCT customers.id, first_name, last_name, address1, contact_city,
       contact_state, primary_phone, email,
       array(SELECT programs.program_name
             FROM transacts, programs
             WHERE customer_id = customers.id
               AND programs.id = transacts.program_id
               AND submit_status = 'success') AS partners
FROM customers, transacts, programs
WHERE transacts.customer_id = customers.id
  AND transacts.program_id = programs.id
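One way to avoid re-running the correlated sub-select for every joined row is to aggregate in a single pass with GROUP BY. This is only a rough, untested sketch, assuming submit_status lives on transacts and a PostgreSQL version recent enough to have array_agg and the FILTER clause:

-- Sketch only: aggregate once per customer instead of evaluating the
-- sub-select for each output row.
-- Note: customers with no successful transacts get NULL here rather than
-- the empty array the original query produces.
SELECT customers.id, first_name, last_name, address1, contact_city,
       contact_state, primary_phone, email,
       array_agg(programs.program_name)
         FILTER (WHERE transacts.submit_status = 'success') AS partners
FROM customers
JOIN transacts ON transacts.customer_id = customers.id
JOIN programs  ON programs.id = transacts.program_id
GROUP BY customers.id, first_name, last_name, address1, contact_city,
         contact_state, primary_phone, email;

The existing (program_id, customer_id) index on transacts should still be usable for the join, and the dynamically inserted WHERE clauses against programs can be added before the GROUP BY.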
[SQL] Wildcard LIKE and Sub-select
Hi everybody,

I have two tables of the following structure:

Table "keywords"
 column  | type
---------+--------------
 id      | integer
 keyword | varchar(255)

and

Table "badwords"
 column  | type
---------+--------------
 badword | varchar(255)

I need to delete all the rows from the keywords table where badword partially matches the keyword field. I know I can do an exact match with a sub-select, but I'm not sure how to structure a wildcard match / LIKE clause with a sub-select. Also, is that the best way to do it, or should I be looking into full-text? I have roughly 10 million keywords and 1 million badwords.

Thanks,
Travis
Re: [SQL] Wildcard LIKE and Sub-select
I took off the USING clause like so, and it worked like a charm!

DELETE FROM keywords
WHERE keyword ILIKE ANY (SELECT '%' || badword || '%' FROM badwords)

Thanks so much,
Travis

On 11/10/06, Erik Jones <[EMAIL PROTECTED]> wrote:
> Travis Whitton wrote:
> > [original question snipped]
>
> Hmm... Maybe (this is untested):
>
> DELETE FROM keywords
> USING badwords
> WHERE keyword ILIKE ANY (SELECT '%' || badword || '%' FROM badwords)
>
> --
> erik jones <[EMAIL PROTECTED]>
> software development
> emma(r)
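An equivalent way to spell the same delete, in case the ANY (sub-select) form becomes a bottleneck at the stated 10 million keyword / 1 million badword sizes, is a correlated EXISTS, which gives the planner an explicit semi-join to work with. Untested sketch using the same table and column names:

-- Sketch: same effect as the ILIKE ANY version above.
DELETE FROM keywords
WHERE EXISTS (
    SELECT 1
    FROM badwords
    WHERE keywords.keyword ILIKE '%' || badwords.badword || '%'
);

Either way, substring matching against every badword cannot use an ordinary index, so at these row counts the run time will be dominated by the pattern comparisons themselves.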
[SQL] Regular Expressions, LIKE, and indexes
From everything I've been able to find, it seems that the only way to get front-anchored regular expressions or LIKE patterns beginning with constants to use an index is to have previously initialized your database using the C locale. Is this still true? I'm trying to do something like:

SELECT keyword FROM keywords WHERE keyword ~ '^foo';

or

SELECT keyword FROM keywords WHERE keyword LIKE 'foo%';

Are there any other functions that can provide equivalent results while using an index, or am I going to have to drop UTF-8 and recreate my database?

Thanks,
Travis
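One commonly suggested alternative to re-initializing the cluster in the C locale is an index built with a pattern operator class, which PostgreSQL can use for left-anchored LIKE and regex patterns even in a UTF-8 database. A sketch, with a made-up index name, using varchar_pattern_ops since the column is varchar (text_pattern_ops is the equivalent for text columns):

-- Sketch: a pattern-ops index lets left-anchored LIKE / ~ '^...' patterns
-- use an index scan even when the database locale is not C.
CREATE INDEX keywords_keyword_pattern_idx
    ON keywords (keyword varchar_pattern_ops);

SELECT keyword FROM keywords WHERE keyword LIKE 'foo%';
SELECT keyword FROM keywords WHERE keyword ~ '^foo';

Note that such an index only supports pattern matching; ordinary <, <=, >, >= comparisons still want a regular btree index on the column.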
[SQL] pg_xlog on separate drive
Hey guys, sorry if this is slightly OT for this list, but I figure it's a simple question. If I'm storing pg_xlog on a second non-redundant drive using the symlink method and the journal drive were to crash, how difficult is recovery? Will PostgreSQL simply be able to reinitialize the journal on a new drive and carry on, or is there more to it than that? I realize any pending transactions would be lost, but that's not a huge concern for me because everything I'm importing comes from raw data.

Thanks,
Travis
Re: [SQL] pg_xlog on separate drive
Thanks for the replies guys. I think I may be OK in my case though, because I'll be importing data as a single daily batch from raw data. I'll be taking nightly backups, and in the event of a crash, I can simply restore from a recent backup and then reimport the raw data. I can now see why losing pg_xlog would be a big problem if I were inserting and updating data continuously throughout the day though.

Thanks,
Travis

On 12/4/06, Markus Schaber <[EMAIL PROTECTED]> wrote:
> Hi, Travis,
>
> Travis Whitton wrote:
> > Hey guys, sorry if this is slightly OT for this list, but I figure it's
> > a simple question. If I'm storing pg_xlog on a second non-redundant
> > drive using the symlink method and the journal drive were to crash, how
> > difficult is recovery? Will PostgreSQL simply be able to reinitialize
> > the journal on a new drive and carry on, or is there more to it than
> > that? I realize any pending transactions would be lost, but that's not a
> > huge concern for me because everything I'm importing comes from raw data.
>
> The problem is that you risk inconsistency at the data and structural level.
>
> When the server crashes, it might happen that some pages in the data files
> are written only partially (because most disks have a much smaller block
> size than the PostgreSQL page size, which is 8k by default). Now, when the
> server cannot replay the WAL log, those half-written pages will not be
> repaired, and your data may be inconsistent at a very low semantic level
> (duplicate rows, missing rows, broken rows, backend crashes etc.) with no
> way to repair it.
>
> HTH,
> Markus
>
> --
> Markus Schaber | Logical Tracking&Tracing International AG
> Dipl. Inf.     | Software Development GIS
>
> Fight against software patents in Europe! www.ffii.org
> www.nosoftwarepatents.org
Re: [SQL] How to reduce a database
I don't remember where I read it, but I saw something online a while back comparing vacuum strategies vs. dumping with pg_dump and then reloading. The pg_dump and restore ended up compacting the database significantly more. I don't know if that still applies with 8.2, but it might be worth a try. I can find the article if you're interested.

Travis

On 12/29/06, Mario Behring <[EMAIL PROTECTED]> wrote:

Hi list,

I have an openNMS server that uses a Postgres database. For those who are not familiar, openNMS is an open source network management product.

Anyway, the openNMS database is very large now, more than 25GB (considering all tables) and I am starting to have disk space issues. The openNMS product has a vacuumdb procedure that runs every 24 hours and reads a vacuumd-configuration.xml file for parameters on what to do. The problem is that this process is not reducing the database size.

What I need to do is to delete some records based on timestamp fields or something like that. I don't know how to do it though. Can you guys help me with some command line examples?

There is this table, called EVENTS, that has the following structure:

eventid                 | integer                     | not null
eventuei                | character varying(256)      | not null
nodeid                  | integer                     |
eventtime               | timestamp without time zone | not null
eventhost               | character varying(256)      |
eventsource             | character varying(128)      | not null
ipaddr                  | character varying(16)       |
eventdpname             | character varying(12)       | not null
eventsnmphost           | character varying(256)      |
serviceid               | integer                     |
eventsnmp               | character varying(256)      |
eventparms              | text                        |
eventcreatetime         | timestamp without time zone | not null
eventdescr              | character varying(4000)     |
eventloggroup           | character varying(32)       |
eventlogmsg             | character varying(256)      |
eventseverity           | integer                     | not null
eventpathoutage         | character varying(1024)     |
eventcorrelation        | character varying(1024)     |
eventsuppressedcount    | integer                     |
eventoperinstruct       | character varying(1024)     |
eventautoaction         | character varying(256)      |
eventoperaction         | character varying(256)      |
eventoperactionmenutext | character varying(64)       |
eventnotification       | character varying(128)      |
eventtticket            | character varying(128)      |
eventtticketstate       | integer                     |
eventforward            | character varying(256)      |
eventmouseovertext      | character varying(64)       |
eventlog                | character(1)                | not null
eventdisplay            | character(1)                | not null
eventackuser            | character varying(256)      |
eventacktime            | timestamp without time zone |

I was thinking about using a DELETE FROM events WHERE eventtime = ... but I am kind of worried about what this could cause on other tables, if there are relations between them or something.

Here is the vacuumd-configuration.xml file:

DELETE FROM node WHERE node.nodeType = 'D';
DELETE FROM ipInterface WHERE ipInterface.isManaged = 'D';
DELETE FROM ifServices WHERE ifServices.status = 'D';
DELETE FROM events WHERE NOT EXISTS
  (SELECT svclosteventid FROM outages WHERE svclosteventid = events.eventid
   UNION SELECT svcregainedeventid FROM outages WHERE svcregainedeventid = events.eventid
   UNION SELECT eventid FROM notifications WHERE eventid = events.eventid)
  AND eventtime < now() - interval '6 weeks';

Any help is appreciated.

Thank you.

Mario
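If the goal is simply to trim old rows and then actually give the disk space back, the rough shape would be something like the sketch below. This is untested; the 6-week cutoff mirrors the vacuumd configuration quoted above, and any foreign keys or application tables referencing events should be checked before deleting.

-- Sketch: remove events older than the cutoff, then compact the table.
DELETE FROM events
WHERE eventtime < now() - interval '6 weeks';

-- VACUUM FULL rewrites the table and returns space to the operating system,
-- but it takes an exclusive lock on events for the duration.
VACUUM FULL events;

A plain VACUUM only marks the freed space as reusable inside the files; it is VACUUM FULL (or a dump and reload, as mentioned above) that actually shrinks them.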
[SQL] Grabbing Newest Records From Duplicates
Given the following test table, I want to grab only the newest record and disregard any older duplicates based on name. Is this the most efficient way to do it? Will the indices even make a difference? The table below demonstrates a simple proof of concept. My final table will have millions of records; however, the test I'm doing now does not seem to use the indices (probably because there are too few records?).

optin=# \d test
     Table "public.test"
 Column |  Type   | Modifiers
--------+---------+-----------
 name   | text    |
 time   | date    |
 id     | integer |
Indexes:
    "idx_name" btree (name)
    "idx_time" btree ("time")

optin=# explain select * from test t1 where not exists (select 1 from test t2 where t2.name = t1.name and t2.time > t1.time);

Thanks,
Travis
Re: [SQL] Grabbing Newest Records From Duplicates
One last question: using the "weather report" example, is it going to improve performance for the DISTINCT ON query to add an index to the location and time columns?

Thanks a lot,
Travis

On 3/16/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Travis Whitton" <[EMAIL PROTECTED]> writes:
> > Given the following test table, I want to grab only the newest record and
> > disregard any older duplicates based on name. Is this the most efficient
> > way to do it?
>
> No, it's gonna be pretty awful. The best way I know of involves DISTINCT ON
> (see the "weather reports" example in the SELECT reference page).
> Unfortunately that's a Postgres-only construct. If you want something
> portable then you'll need something messy with subqueries...
>
> regards, tom lane
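For reference, the DISTINCT ON form Tom is pointing at would look roughly like this against the test table above. Untested sketch; the index name is made up, and the DESC index column assumes a PostgreSQL version that supports descending index columns:

-- Sketch: one row per name, keeping the newest row by "time".
-- A matching (name, "time" DESC) index can let the planner satisfy the
-- ORDER BY without a separate sort on large tables.
CREATE INDEX test_name_time_idx ON test (name, "time" DESC);

SELECT DISTINCT ON (name) name, "time", id
FROM test
ORDER BY name, "time" DESC;

DISTINCT ON keeps the first row it sees for each name, so the ORDER BY must lead with name and then the column that decides which duplicate wins.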
