[SQL] Performance Problem with sub-select using array

2006-08-24 Thread Travis Whitton
Hello all, I'm running the following query on about 6,000 records worth
of data, and it takes about 8 seconds to complete. Can anyone provide
any suggestions to improve performance? I have an index on two columns
in the transacts table (program_id, customer_id). If I specify a number
for customers.id in the sub-select, query time is reduced to about 2
seconds, which still seems like a million years for only 6,000 records,
but I'm guessing that the sub-select can't resolve the id since it's done
before the outer query, so it scans the entire recordset for every row?
Transacts is a many-to-many table for customers and programs. I know this
query doesn't even reference any columns from programs; however, I
dynamically insert WHERE clauses to constrain the result set.
SELECT distinct customers.id,
       first_name, last_name, address1, contact_city, contact_state,
       primary_phone, email,
       array(SELECT programs.program_name
               FROM transacts, programs
              WHERE customer_id = customers.id
                AND programs.id = transacts.program_id
                AND submit_status = 'success') AS partners
  FROM customers, transacts, programs
 WHERE transacts.customer_id = customers.id
   AND transacts.program_id = programs.id
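
A possible direction, sketched here rather than taken from the thread: the
existing index leads with program_id, so the correlated sub-select's lookup by
customer_id alone cannot use it efficiently; an index that leads with
customer_id, plus dropping the outer joins that DISTINCT only exists to
compensate for, may help. Untested, index name is made up, and the dynamically
added WHERE clauses may still require the joins:

-- Index whose leading column matches the sub-select's lookup:
CREATE INDEX idx_transacts_customer_program
    ON transacts (customer_id, program_id);

-- Keep the array() sub-select, but replace the outer joins (which only create
-- the duplicates that DISTINCT then removes) with an EXISTS test:
SELECT customers.id,
       first_name, last_name, address1, contact_city, contact_state,
       primary_phone, email,
       array(SELECT programs.program_name
               FROM transacts, programs
              WHERE transacts.customer_id = customers.id
                AND programs.id = transacts.program_id
                AND submit_status = 'success') AS partners
  FROM customers
 WHERE EXISTS (SELECT 1
                 FROM transacts
                WHERE transacts.customer_id = customers.id);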


[SQL] Wildcard LIKE and Sub-select

2006-11-10 Thread Travis Whitton
Hi everybody,

I have two tables of the following structure:

Table "keywords"
 column  | type
---------+--------------
 id      | integer
 keyword | varchar(255)

and

Table "badwords"
 column  | type
---------+--------------
 badword | varchar(255)

I need to delete all the rows from the keywords table where badword
partially matches the keyword field. I know I can do an exact match with
a sub-select, but I'm not sure how to structure a wildcard match / LIKE
clause with a sub-select. Also, is that the best way to do it, or should
I be looking into full-text? I have roughly 10 million keywords and
1 million badwords.

Thanks,
Travis


Re: [SQL] Wildcard LIKE and Sub-select

2006-11-10 Thread Travis Whitton
I took off the USING clause like so, and it worked like a charm!

DELETE FROM keywords
 WHERE keyword ILIKE ANY (SELECT '%' || badword || '%'
                            FROM badwords)

Thanks so much,
Travis

On 11/10/06, Erik Jones <[EMAIL PROTECTED]> wrote:

Travis Whitton wrote:
> Hi everybody,
> I have two tables of the following structure:
>
> Table "keywords"
>
> column  | type
> --------+--------------
> id      | integer
> keyword | varchar(255)
>
> and
>
> Table "badwords"
>
> column  | type
> --------+--------------
> badword | varchar(255)
>
> I need to delete all the rows from the keywords table where badword
> partially matches the keyword field. I know I can do an exact match
> with a sub-select, but I'm not sure how to structure a wildcard match
> / like clause with a sub-select. Also, is that the best way to do it,
> or should I be looking into full-text? I have roughly 10 million
> keywords and 1 million badwords.
>
> Thanks,
> Travis

Hmm...  Maybe (this is untested):

DELETE FROM keywords
 USING badwords
 WHERE keyword ILIKE ANY (SELECT '%' || badword || '%'
                            FROM badwords)

--
erik jones <[EMAIL PROTECTED]>
software development
emma(r)
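
For reference, a sketch (untested) of the same delete written with EXISTS,
which makes the per-keyword scan over badwords explicit. With roughly 10
million keywords against 1 million badwords, either form is essentially a
nested loop of pattern matches, so a trigram or full-text approach may still
be worth investigating:

DELETE FROM keywords
 WHERE EXISTS (SELECT 1
                 FROM badwords
                WHERE keywords.keyword ILIKE '%' || badwords.badword || '%');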



[SQL] Regular Expressions, LIKE, and indexes

2006-11-22 Thread Travis Whitton

From everything I've been able to find, it seems that the only way to get
front-anchored regular expressions or LIKE patterns beginning with constants
to use an index is to have previously initialized your database using the C
locale. Is this still true? I'm trying to do something like:

SELECT keyword FROM keywords WHERE keyword ~ '^foo';

or

SELECT keyword FROM keywords WHERE keyword like 'foo%';

Are there any other functions that can provide equivalent results while
using an index, or am I going to have to drop UTF-8 and recreate my database?

Thanks,
Travis
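
One workaround, sketched here rather than taken from the thread: in a non-C
locale, an index declared with a pattern-ops operator class can support
left-anchored LIKE and regex patterns (varchar_pattern_ops for a varchar
column, text_pattern_ops for text). The index name below is made up:

CREATE INDEX idx_keywords_pattern ON keywords (keyword varchar_pattern_ops);

-- With the pattern-ops index in place, both of these can use an index scan
-- (it will not, however, help ordinary <, > or ORDER BY on the column):
SELECT keyword FROM keywords WHERE keyword LIKE 'foo%';
SELECT keyword FROM keywords WHERE keyword ~ '^foo';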


[SQL] pg_xlog on separate drive

2006-11-30 Thread Travis Whitton

Hey guys, sorry if this is slightly OT for this list, but I figure it's a
simple question. If I'm storing pg_xlog on a second non-redundant drive
using the symlink method and the journal drive were to crash, how difficult
is recovery? Will Postgresql simply be able to reinitialize the journal on a
new drive and carry on, or is there more to it than that? I realize any
pending transactions would be lost, but that's not a huge concern for me
because everything I'm importing comes from raw data.

Thanks,
Travis


Re: [SQL] pg_xlog on separate drive

2006-12-04 Thread Travis Whitton

Thanks for the replies guys. I think I may be ok in my case though because
I'll be importing data as a single daily batch from raw data. I'll be taking
nightly backups, and in the event of a crash, I can simply restore from a
recent backup and then reimport the raw data. I can now see why losing
pg_xlog would be a big problem if I were inserting and updating data
continuously throughout the day though.

Thanks,
Travis

On 12/4/06, Markus Schaber <[EMAIL PROTECTED]> wrote:


Hi, Travis,

Travis Whitton wrote:
> Hey guys, sorry if this is slightly OT for this list, but I figure it's
> a simple question. If I'm storing pg_xlog on a second non-redundant
> drive using the symlink method and the journal drive were to crash, how
> difficult is recovery? Will Postgresql simply be able to reinitialize
> the journal on a new drive and carry on, or is there more to it than
> that? I realize any pending transactions would be lost, but that's not a
> huge concern for me because everything I'm importing comes from raw
> data.

The problem is that you risk inconsistency at data and structural level.

When the server crashes, it might happen that some pages in the data
files are written only partially, because most disks have a much smaller
block size than the PostgreSQL page size (8k by default).

Now, when the server cannot replay the WAL log, those half-written pages
will not be repaired, and your data may be inconsistent at a very low
semantic level (duplicate rows, missing rows, broken rows, backend
crashes, etc.) with no way to repair.

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
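
A side note, not from the thread: the half-written pages Markus describes are
normally repaired during WAL replay from the full-page images PostgreSQL
writes after each checkpoint, controlled by the setting below (available from
8.1). Losing pg_xlog removes exactly that repair path.

SHOW full_page_writes;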



Re: [SQL] How to reduce a database

2007-01-05 Thread Travis Whitton

I don't remember where I read it, but I saw something online a while back
comparing vacuum strategies vs. dumping with pg_dump and then reloading. The
pg_dump and restore ended up compacting the database significantly more. I
don't know if that still applies with 8.2, but it might be worth a try. I
can find the article if you're interested.

Travis
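
For concreteness, a rough sketch of the two approaches being compared; the
database name is assumed, and both need a maintenance window:

-- In-place compaction (takes an exclusive lock; on older releases the
-- indexes can stay bloated afterwards, hence the REINDEX):
VACUUM FULL VERBOSE events;
REINDEX TABLE events;

-- The dump-and-reload alternative is driven from the shell, roughly:
--   pg_dump opennms > opennms.sql
--   dropdb opennms && createdb opennms
--   psql -f opennms.sql opennms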

On 12/29/06, Mario Behring <[EMAIL PROTECTED]> wrote:


Hi list,

I have an openNMS server that uses a Postgres database. For those who are
not familiar, openNMS is an open source network management product.

Anyway, the openNMS database is very large now, more than 25GB
(considering all tables) and I am starting to have disk space issues. The
openNMS product has a vacuumdb procedure that runs every 24 hours and reads
a vacuumd-configuration.xml file for parameters on what to do.

The problem is that this process is not reducing the database size. What I
need to do is to delete some records based on timestamp fields or something
like that. I don't know how to do it, though.

Can you guys help me with some command line examples?

There is this table, called EVENTS, that has the following structure:

 eventid | integer | not null
 eventuei| character varying(256)  | not null
 nodeid  | integer |
 eventtime   | timestamp without time zone | not null
 eventhost   | character varying(256)  |
 eventsource | character varying(128)  | not null
 ipaddr  | character varying(16)   |
 eventdpname | character varying(12)   | not null
 eventsnmphost   | character varying(256)  |
 serviceid   | integer |
 eventsnmp   | character varying(256)  |
 eventparms  | text|
 eventcreatetime | timestamp without time zone | not null
 eventdescr  | character varying(4000) |
 eventloggroup   | character varying(32)   |
 eventlogmsg | character varying(256)  |
 eventseverity   | integer | not null
 eventpathoutage | character varying(1024) |
 eventcorrelation| character varying(1024) |
 eventsuppressedcount| integer |
 eventoperinstruct   | character varying(1024) |
 eventautoaction | character varying(256)  |
 eventoperaction | character varying(256)  |
 eventoperactionmenutext | character varying(64)   |
 eventnotification   | character varying(128)  |
 eventtticket| character varying(128)  |
 eventtticketstate   | integer |
 eventforward| character varying(256)  |
 eventmouseovertext  | character varying(64)   |
 eventlog| character(1)| not null
 eventdisplay| character(1)| not null
 eventackuser| character varying(256)  |
 eventacktime| timestamp without time zone |

I was thinking about using a DELETE FROM EVENTS WHERE eventtime < ... but I
am kind of worried about what this could cause on other tables, if there are
relations between them or something.

Here is the vacuumd-configuration.xml file:


DELETE FROM node WHERE node.nodeType = 'D';

DELETE FROM ipInterface WHERE ipInterface.isManaged = 'D';

DELETE FROM ifServices WHERE ifServices.status = 'D';

DELETE FROM events
 WHERE NOT EXISTS (SELECT svclosteventid FROM outages
                    WHERE svclosteventid = events.eventid
                   UNION
                   SELECT svcregainedeventid FROM outages
                    WHERE svcregainedeventid = events.eventid
                   UNION
                   SELECT eventid FROM notifications
                    WHERE eventid = events.eventid)
   AND eventtime < now() - interval '6 weeks';


Any help is appreciated.

Thank you.

Mario
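
On the worry about relations between tables, a sketch (untested) that lists
any foreign-key constraints pointing at the events table before deleting
from it:

SELECT conrelid::regclass AS referencing_table, conname
  FROM pg_constraint
 WHERE contype = 'f'
   AND confrelid = 'events'::regclass;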




[SQL] Grabbing Newest Records From Duplicates

2007-03-16 Thread Travis Whitton

Given the following test table, I want to grab only the newest record and
disregard any older duplicates based on name. Is this the most efficient way
to do it? Will the indexes even make a difference? The table below
demonstrates a simple proof of concept. My final table will have millions of
records; however, the test I'm doing now does not seem to use the indexes
(probably because there are too few records?).

optin=# \d test
Table "public.test"
Column |  Type   | Modifiers
+-+---
name   | text|
time   | date|
id | integer |
Indexes:
   "idx_name" btree (name)
   "idx_time" btree ("time")

optin=# explain select * from test t1 where not exists (select 1 from test
t2 where t2.name = t1.name and t2.time > t1.time);

Thanks,
Travis


Re: [SQL] Grabbing Newest Records From Duplicates

2007-03-16 Thread Travis Whitton

One last question, using the "weather report" example, is it going to
improve performance for the DISTINCT ON query to add an index to the
location and time columns?

Thanks a lot,
Travis

On 3/16/07, Tom Lane <[EMAIL PROTECTED]> wrote:


"Travis Whitton" <[EMAIL PROTECTED]> writes:
> Given the following test table, I want to grab only the newest record
and
> disregard any older duplicates based on name. Is this the most efficient
way
> to do it?

No, it's gonna be pretty awful.  The best way I know of involves
DISTINCT ON (see the "weather reports" example in the SELECT reference
page).  Unfortunately that's a Postgres-only construct.  If you want
something portable then you'll need something messy with subqueries...

regards, tom lane
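
For reference, a sketch of the DISTINCT ON form being discussed, with column
names taken from the test table above; whether the index actually gets used
is version- and data-dependent, so check with EXPLAIN ANALYZE rather than
assuming:

-- Newest row per name: DISTINCT ON keeps the first row of each group in
-- ORDER BY order, so sort time descending within each name.
SELECT DISTINCT ON (name) name, "time", id
  FROM test
 ORDER BY name, "time" DESC;

-- A matching composite index (DESC index columns require 8.3 or later):
CREATE INDEX idx_name_time_desc ON test (name, "time" DESC);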