[PERFORM] Postgres ignoring index when using left outer join.

2007-11-20 Thread Matthew Schumacher
Anyone know what is up with this? I have two queries here that return the same results: one uses a left outer join to get some data from a table which may not match a constraint, and the other uses a union to get the data from each constraint and put them together. The second one isn't nearly as

Re: [PERFORM] SAN vs Internal Disks

2007-09-07 Thread Matthew Schumacher
I'm getting a SAN together to consolidate my disk space usage for my servers. It's iSCSI based and I'll be PXE booting my servers from it. The idea is to keep spares on hand for one system (the SAN) and not have to worry about spares for each specific storage system on each server. This also makes

Re: [PERFORM] PostgreSQL to host e-mail?

2007-01-04 Thread Matthew Schumacher
Frank Wiles wrote: > On Thu, 4 Jan 2007 15:00:05 -0300 > "Charles A. Landemaine" <[EMAIL PROTECTED]> wrote: > >> I'm building an e-mail service that has two requirements: It should >> index messages on the fly to have lightning search results, and it >> should be able to handle large amounts of s

[PERFORM] Disk storage and san questions (was File Systems Compared)

2006-12-06 Thread Matthew Schumacher
Joshua D. Drake wrote: > I agree. I have many people that want to purchase a SAN because someone > told them that is what they need... Yet they can spend 20% of the cost > on two external arrays and get incredible performance... > > We are seeing great numbers from the following config: > > (2) HP

Re: [PERFORM] Problems with inconsistant query performance.

2006-09-28 Thread Matthew Schumacher
Marcin Mank wrote: >> So the question is why on a relatively simple proc am I getting a query >> performance delta between 3549ms and 7ms? > > What version of PG is it? > > I had such problems in a pseudo-realtime app I use here with Postgres, and > they went away when I moved to 8.1 (from 7.4).

Re: [PERFORM] Problems with inconsistant query performance.

2006-09-27 Thread Matthew Schumacher
Jim C. Nasby wrote: > > It can cause a race if another process could be performing those same > inserts or updates at the same time. There are inserts and updates running all of the time, but never the same data. I'm not sure how I can get around this since the queries are coming from my radius

Re: [PERFORM] Problems with inconsistant query performance.

2006-09-27 Thread Matthew Schumacher
Jim, Thanks for the help. I went and looked at that example and I don't see how it's different than the "INSERT into radutmp_tab" I'm already doing. Both raise an exception, the only difference is that I'm not doing anything with it. Perhaps you are talking about the "IF (NOT FOUND)" I put afte

[PERFORM] Problems with inconsistant query performance.

2006-09-27 Thread Matthew Schumacher
List, I posted a little about this a while back to the general list, but never really got anywhere with it, so I'll try again, this time with a little more detail, and hopefully someone can send me in the right direction. Here is the problem: I have a procedure that is called 100k times a day. Mo

Re: [PERFORM] SAN/NAS options

2005-12-19 Thread Matthew Schumacher
Jim C. Nasby wrote: > On Wed, Dec 14, 2005 at 01:56:10AM -0500, Charles Sprickman wrote: > You'll note that I'm being somewhat driven by my OS of choice, FreeBSD. > >>Unlike Solaris or other commercial offerings, there is no nice volume >>management available. While I'd love to keep managing a

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-08-04 Thread Matthew Schumacher
John A Meinel wrote: > Surely this isn't what you have. You have *no* loop here, and you have > stuff like: > AND > (bayes_token_tmp) NOT IN (SELECT token FROM bayes_token); > > I'm guessing this isn't your last version of the function. > > As far as putting the CREATE TEMP TABLE inside th

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-08-04 Thread Matthew Schumacher
Matthew Schumacher wrote: > Tom Lane wrote: > > >>I don't really see why you think that this path is going to lead to >>better performance than where you were before. Manipulation of the >>temp table is never going to be free, and IN (sub-select) is always >>

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-08-04 Thread Matthew Schumacher
Tom Lane wrote: > I don't really see why you think that this path is going to lead to > better performance than where you were before. Manipulation of the > temp table is never going to be free, and IN (sub-select) is always > inherently not fast, and NOT IN (sub-select) is always inherently > aw

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-08-04 Thread Matthew Schumacher
John A Meinel wrote: > Matthew Schumacher wrote: > > I recommend that you drop and re-create the temp table. There is no > reason to have it around, considering you delete and re-add everything. > That means you never have to vacuum it, since it always only contains > the late

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-08-04 Thread Matthew Schumacher
Okay, Here is the status of the SA updates and a question: Michael got SA changed to pass an array of tokens to the proc so right there we gained a ton of performance due to connections and transactions being grouped into one per email instead of one per token. Now I am working on making the pro

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-08-01 Thread Matthew Schumacher
PFC wrote: > > >> select put_tokens2(1, '{"\\246\\323\\061\\332\\277"}', 1, 1, 1); > > > Try adding more backslashes until it works (seems that you need > or something). > Don't DBI convert the language types to postgres quoted forms on its > own ? > You're right, I am find

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-08-01 Thread Matthew Schumacher
Tom Lane wrote: > > Revised insertion procedure: > > > CREATE or replace FUNCTION put_tokens (_id INTEGER, > _tokens BYTEA[], > _spam_count INTEGER, > _ham_count INTEGER, > _at

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-08-01 Thread Matthew Schumacher
Tom Lane wrote: > Michael Parker <[EMAIL PROTECTED]> writes: > >>sub bytea_esc { >> my ($str) = @_; >> my $buf = ""; >> foreach my $char (split(//,$str)) { >>if (ord($char) == 0) { $buf .= "000"; } >>elsif (ord($char) == 39) { $buf .= "047"; } >>elsif (ord($char) == 92) { $b

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-31 Thread Matthew Schumacher
Ok, here is the current plan. Change the spamassassin API to pass a hash of tokens into the storage module, pass the tokens to the proc as an array, start a transaction, load the tokens into a temp table using copy, select the tokens distinct into the token table for new tokens, update the token t
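The plan above (stage one message's tokens in a temp table, insert the distinct new tokens, then update counts, all in one transaction) can be sketched roughly as follows. This is only an illustration, not the procedure from the thread: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, plain INSERTs instead of COPY, and an assumed bayes_token layout. The table names bayes_token and bayes_token_tmp and the spam_count, ham_count and atime fields come from the thread; the primary key, the make_db helper and the put_tokens signature here are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an assumed bayes_token layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token ("
        " id INTEGER NOT NULL, token BLOB NOT NULL,"
        " spam_count INTEGER NOT NULL, ham_count INTEGER NOT NULL,"
        " atime INTEGER NOT NULL, PRIMARY KEY (id, token))"
    )
    # Staging table for one message's tokens (sqlite stand-in for COPY into a temp table).
    conn.execute("CREATE TEMP TABLE bayes_token_tmp (token BLOB NOT NULL)")
    return conn


def put_tokens(conn, user_id, tokens, spam_count, ham_count, atime):
    """Apply all tokens of one message with set-based SQL in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM bayes_token_tmp")
        conn.executemany(
            "INSERT INTO bayes_token_tmp (token) VALUES (?)",
            [(t,) for t in tokens],
        )
        # Step 1: add tokens not seen before, with zero counts.
        conn.execute(
            "INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)"
            " SELECT DISTINCT ?, t.token, 0, 0, ?"
            " FROM bayes_token_tmp AS t"
            " WHERE NOT EXISTS (SELECT 1 FROM bayes_token AS b"
            "                   WHERE b.id = ? AND b.token = t.token)",
            (user_id, atime, user_id),
        )
        # Step 2: bump counts once per distinct token in this message.
        conn.execute(
            "UPDATE bayes_token"
            " SET spam_count = spam_count + ?,"
            "     ham_count = ham_count + ?,"
            "     atime = max(atime, ?)"
            " WHERE id = ? AND token IN (SELECT token FROM bayes_token_tmp)",
            (spam_count, ham_count, atime, user_id),
        )
```

Calling put_tokens once per message issues a fixed handful of statements instead of one call per token, which is the point of the plan; whether it is actually faster on PostgreSQL depends on the real schema and indexes.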

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-30 Thread Matthew Schumacher
Tom Lane wrote: > I looked into this a bit. It seems that the problem when you wrap the > entire insertion series into one transaction is associated with the fact > that the test does so many successive updates of the single row in > bayes_vars. (VACUUM VERBOSE at the end of the test shows it cl

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-30 Thread Matthew Schumacher
Karim Nassar wrote: > > [EMAIL PROTECTED]:~/k-bayesBenchmark$ time ./test.pl > <-- snip db creation stuff --> > 17:18:44 -- START > 17:19:37 -- AFTER TEMP LOAD : loaded 120596 records > 17:19:46 -- AFTER bayes_token INSERT : inserted 49359 new records into > bayes_token > 17:19:50 -- AFTER bayes_

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Matthew Schumacher
Ok, here is where I'm at, I reduced the proc down to this: CREATE FUNCTION update_token (_id INTEGER, _token BYTEA, _spam_count INTEGER, _ham_count INTEGER, _atime INTEGER) RETU

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Matthew Schumacher
Andrew McMillan wrote: > > For the data in question (i.e. bayes scoring) it would seem that not > much would be lost if you did have to restore your data from a day old > backup, so perhaps fsync=false is OK for this particular application. > > Regards, > And

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-29 Thread Matthew Schumacher
Ok, Here is something new, when I take my data.sql file and add a begin and commit at the top and bottom, the benchmark is a LOT slower? My understanding is that it should be much faster because fsync isn't called until the commit instead of on every sql command. I must be missing something here

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Matthew Schumacher
Andrew McMillan wrote: > On Thu, 2005-07-28 at 16:13 -0800, Matthew Schumacher wrote: > >>Ok, I finally got some test data together so that others can test >>without installing SA. >> >>The schema and test dataset is over at >>http://www.aptalaska.net/~matt.

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-28 Thread Matthew Schumacher
Gavin Sherry wrote: > > I had a look at your data -- thanks. > > I have a question though: put_token() is invoked 120596 times in your > benchmark... for 616 messages. That's nearly 200 queries (not even > counting the 1-8 (??) inside the function itself) per message. Something > doesn't seem ri

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-28 Thread Matthew Schumacher
Karim Nassar wrote: > On Wed, 2005-07-27 at 14:35 -0800, Matthew Schumacher wrote: > > >>I put the rest of the schema up at >>http://www.aptalaska.net/~matt.s/bayes/bayes_pg.sql in case someone >>needs to see it too. > > > Do you have sample data too?

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-27 Thread Matthew Schumacher
Josh Berkus wrote: > Matt, > > Well, it might be because we don't have a built-in GREATEST or LEAST prior to > 8.1. However, it's pretty darned easy to construct one. I was more talking about min() and max() but yea, I think you knew where I was going with it... > > Well, there's the genera

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-27 Thread Matthew Schumacher
Josh Berkus wrote: > Matt, > > >>After playing with various indexes and what not I simply am unable to >>make this procedure perform any better. Perhaps someone on the list can >>spot the bottleneck and reveal why this procedure isn't performing that >>well or ways to make it better. > > > Wel

[PERFORM] Performance problems testing with Spamassassin 3.1.0 Bayes module.

2005-07-27 Thread Matthew Schumacher
I'm not sure how much this has been discussed on the list, but I wasn't able to find anything relevant in the archives. The new Spamassassin is due out pretty soon. They are currently testing 3.1.0pre4. One of the things I hope to get out of this release is bayes word stats moved to a real RDBMS.