Re: [PERFORM] Prefetch

2005-05-10 Thread Christopher Kings-Lynne
Another trick you can use with large data sets like this when you want results back in seconds is to have regularly updated tables that aggregate the data along each column normally aggregated against the main data set. Maybe some bright person will prove me wrong by posting some working informat

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Josh Berkus
Neil, > Sure, but that hardly makes it not "usable". Considering the price of > RAM these days, having enough RAM to hold the database (distributed over > the entire cluster) is perfectly acceptable for quite a few people. The other problem, as I was told it at OSCON, was that these were not hig

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Neil Conway
Joshua D. Drake wrote: Neil Conway wrote: Oh? What's wrong with MySQL's clustering implementation? Ram only tables :) Sure, but that hardly makes it not "usable". Considering the price of RAM these days, having enough RAM to hold the database (distributed over the entire cluster) is perfectly acc

Re: [PERFORM] Prefetch

2005-05-10 Thread Sam Vilain
Matt Olson wrote: Other databases like Oracle and DB2 implement some sort of row prefetch. Has there been serious consideration of implementing something like a prefetch subsystem? Does anyone have any opinions as to why this would be a bad idea for postgres? Postges is great for a multiuser envi

Re: [PERFORM] Configing 8 gig box.

2005-05-10 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > On Mon, May 09, 2005 at 04:55:53PM -0400, Joel Fradkin wrote: >> Seems to be only using like 360 meg out of 7 gig free (odd thing is I did >> see some used swap 4k out of 1.9) with a bunch of users (this may be normal, >> but it is not going overly fast

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Bruno Wolff III
On Tue, May 10, 2005 at 08:02:50 -0700, Adam Haberlach <[EMAIL PROTECTED]> wrote: > > > With all the Opteron v. Xeon around here, and talk of $30,000 machines, > perhaps it would be worth exploring the option of buying 10 cheapass > machines for $300 each. At the moment, that $300 buys you, fr

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Joshua D. Drake
Neil Conway wrote: Josh Berkus wrote: Don't hold your breath. MySQL, to judge by their first "clustering" implementation, has a *long* way to go before they have anything usable. Oh? What's wrong with MySQL's clustering implementation? Ram only tables :) -Neil ---(end of

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Neil Conway
Josh Berkus wrote: Don't hold your breath. MySQL, to judge by their first "clustering" implementation, has a *long* way to go before they have anything usable. Oh? What's wrong with MySQL's clustering implementation? -Neil ---(end of broadcast)---

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> No, not at all, because searching such an index will require a tree >> descent, thus negating the one true advantage of hash indexes. > The hash index still has to do a tree descent, it just has a larger branching >

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>: > To the best of my knowledge no such work has been done. There is a > project (who's name escapes me) that lets you run queries against a > remote postgresql server from a postgresql connection to a different > server, which could serve as the basis for

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>: > > >>*laff* > >>Yeah, like they've been working on views for the last 5 years, and > >>still haven't released them :D :D :D > > > > ? > > http://dev.mysql.com/doc/mysql/en/create-view.html > > ...for MySQL 5.0.1+ ? > > Give me a call when i

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Christopher Kings-Lynne
*laff* Yeah, like they've been working on views for the last 5 years, and still haven't released them :D :D :D ? http://dev.mysql.com/doc/mysql/en/create-view.html ...for MySQL 5.0.1+ ? Give me a call when it's RELEASED. Chris ---(end of broadcast)

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Joshua D. Drake
Mischa Sandberg wrote: Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>: This is why I mention partitioning. It solves this issue by storing different data sets on different machines under the same schema. These seperate chunks of the table can then be replicated as well for data redundancy a

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>: > > This is why I mention partitioning. It solves this issue by storing > > different data sets on different machines under the same schema. > > These seperate chunks of the table can then be replicated as well for > > data redundancy and so o

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Christopher Kings-Lynne
This is why I mention partitioning. It solves this issue by storing different data sets on different machines under the same schema. These seperate chunks of the table can then be replicated as well for data redundancy and so on. MySQL are working on these things *laff* Yeah, like they've bee

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > No, not at all, because searching such an index will require a tree > descent, thus negating the one true advantage of hash indexes. The hash index still has to do a tree descent, it just has a larger branching factor than the btree index. btree indexes

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Mischa Sandberg
Quoting Tom Lane <[EMAIL PROTECTED]>: > Mischa Sandberg <[EMAIL PROTECTED]> writes: > > The PG hash join is the simplest possible: build a hash table in > memory, and match an input stream against it. > > [ raised eyebrow... ] Apparently you've not read the code. It's > been hybrid hashjoin si

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Tom Lane
Mischa Sandberg <[EMAIL PROTECTED]> writes: > The PG hash join is the simplest possible: build a hash table in memory, > and match an input stream against it. > *Hybrid hash* is where you spill the hash to disk in a well-designed > way. Instead of thinking of it as building a hash table in memory,

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Mark Lewis
If the original paper was published in 1984, then it's been more than 20 years. Any potential patents would already have expired, no? -- Mark Lewis On Tue, 2005-05-10 at 14:35, Mischa Sandberg wrote: > Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>: > > > Well, in a hash-join right now you normally

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Jim C. Nasby
On Tue, May 10, 2005 at 02:55:55PM -0700, Mischa Sandberg wrote: > just beyond belief, for both updates and queries. At Acxiom, the > datasets are so large, even after partitioning, that they just > constantly cycle them through memory, and commands are executes in > convoys --- sort of like riding

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Alex Stapleton <[EMAIL PROTECTED]>: > This is why I mention partitioning. It solves this issue by storing > different data sets on different machines under the same schema. > These seperate chunks of the table can then be replicated as well for > data redundancy and so on. MySQL are wor

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Jim C. Nasby
On Tue, May 10, 2005 at 07:29:59PM +0200, PFC wrote: > I wonder how Oracle does it ;) Oracle *clustering* demands shared storage. So you've shifted your money from big-iron CPUs to big-iron disk arrays. Oracle replication works similar to Slony, though it supports a lot more modes (ie: sync

Re: [PERFORM] Configing 8 gig box.

2005-05-10 Thread Jim C. Nasby
On Mon, May 09, 2005 at 04:55:53PM -0400, Joel Fradkin wrote: > Seems to be only using like 360 meg out of 7 gig free (odd thing is I did > see some used swap 4k out of 1.9) with a bunch of users (this may be normal, > but it is not going overly fast so thought I would ask). This is perfectly norm

Re: [PERFORM] PGSQL Capacity

2005-05-10 Thread Jim C. Nasby
http://stats.distributed.net has a table that's 130M rows. http://stats.distributed.net/participant/phistory.php?project_id=8&id=39622 is a page that hits that table, and as you can see it's quite fast. This is on a dual opteron with 4G of memory. Unless you're looking for sub millisecond response

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Mischa Sandberg
Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>: > Well, in a hash-join right now you normally end up feeding at least > one > side of the join with a seqscan. Wouldn't it speed things up > considerably if you could look up hashes in the hash index instead? You might want to google on "grace hash" and

Re: [PERFORM] Prefetch - OffTopic

2005-05-10 Thread Mohan, Ross
Yes, that would be a sufficient (although not necessary) condition for being well and fine with kdB. Last time I used APL was.pre-Gregorian, so yea, that's scary to me, too. ( Of course, one can use C/ODBC or Java/JDBC to reach kdB; once there, one uses SQL92, or proprietary kSQL. ) ---

Re: [PERFORM] Prefetch - OffTopic

2005-05-10 Thread Chris Browne
[EMAIL PROTECTED] ("Mohan, Ross") writes: > for time-series and "insane fast", nothing beats kdB, I believe > > www.kx.com ... Which is well and fine if you're prepared to require that all of the staff that interact with data are skilled APL hackers. Skilled enough that they're all ready to leap

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting [EMAIL PROTECTED]: > > exploring the option of buying 10 cheapass > > machines for $300 each. At the moment, that $300 buys you, from > Dell, a > > 2.5Ghz Pentium 4 > > Buy cheaper ass Dells with an AMD 64 3000+. Beats the crap out of > the 2.5 > GHz Pentium, especially for PostgreSQL.

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: >>> What if the hash index stored *only* the hash code? That could be useful for >>> indexing large datatypes that would otherwise create large indexes. >> >> Hmm, that could be a thought. > Hm, if you go this route of having hash indexes store tuples ordere

[PERFORM] Prefetch - OffTopic

2005-05-10 Thread Mohan, Ross
for time-series and "insane fast", nothing beats kdB, I believe www.kx.com Not trying to Quisling-out PG here, just hoping to respond to Mr. Olson -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane Sent: Tuesday, May 10, 2005 2:54 PM To: Gr

Re: [PERFORM] Prefetch

2005-05-10 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Actually forcing things to use indexes is the wrong direction to go if you're > trying to process lots of data and want to stream it off disk as rapidly as > possible. I would think about whether you can structure your data such that > you can use sequential

Re: [PERFORM] Prefetch

2005-05-10 Thread Matt Olson
My postgres binaries and WAL are on a separate disk from the raid array. The table I'm doing the selects from is probably about 4GB in size and 18-20 million records. No concurrent or dependent inserts or deletes are going on. Tom's point and your points about optimizing the application are we

RE: [PERFORM] Partitioning / Clustering

2005-05-10 Thread tdrayton
Hi Alex, Actually, our product can partition data among several clustered nodes running PostgreSQL, if that is what you are looking for. Data is distributed based on a designated column. Other tables can be replicated to all nodes. For SELECTs, it also knows when it can join locally or it needs

Re: [PERFORM] Prefetch

2005-05-10 Thread Greg Stark
Matt Olson <[EMAIL PROTECTED]> writes: > I've done other things that make sense, like using indexes, playing with the > planner constants and turning up the postgres cache buffers. > > Even playing with extream hdparm read-ahead numbers (i.e. 64738) yields no > apparent difference in database pe

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Mischa Sandberg
Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>: > I'm not really familiar enough with hash indexes to know if this > would > work, but if the maximum bucket size was known you could use that to > determine a maximum range of buckets to look at. In some cases, that > range would include only one bucket

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > Well, in a hash-join right now you normally end up feeding at least one > side of the join with a seqscan. Wouldn't it speed things up > considerably if you could look up hashes in the hash index instead? That's called a "nestloop with inner index scan"

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Josh Berkus
Alex, > This is why I mention partitioning. It solves this issue by storing   > different data sets on different machines under the same schema.   That's clustering, actually. Partitioning is simply dividing up a table into chunks and using the chunks intelligently. Putting those chunks on se

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > > What if the hash index stored *only* the hash code? That could be useful for > > indexing large datatypes that would otherwise create large indexes. > > Hmm, that could be a thought. Hm, if you go this route of having hash indexes store tuples ordered by

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread PFC
SELECT row1, row2 FROM table1_on_machine_a NATURAL JOIN table2_on_machine_b WHERE restrict_table_1 AND restrict_table_2 AND restrict_1_based_on_2; I don't think that's ever going to be efficient... What would be efficient would be, for instance, a Join of a part of a table against another pa

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Jim C. Nasby
On Tue, May 10, 2005 at 11:49:50AM -0400, Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > What's the challange to making it adaptive, comming up with an algorithm > > that gives you the optimal bucket size (which I would think there's > > research on...) or allowing the index to ac

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > What's the challange to making it adaptive, comming up with an algorithm > that gives you the optimal bucket size (which I would think there's > research on...) or allowing the index to accommodate different bucket > sizes existing in the index at once?

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread John A Meinel
Adam Haberlach wrote: I think that perhaps he was trying to avoid having to buy "Big Iron" at all. With all the Opteron v. Xeon around here, and talk of $30,000 machines, perhaps it would be worth exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you,

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Richard_D_Levine
> exploring the option of buying 10 cheapass > machines for $300 each. At the moment, that $300 buys you, from Dell, a > 2.5Ghz Pentium 4 Buy cheaper ass Dells with an AMD 64 3000+. Beats the crap out of the 2.5 GHz Pentium, especially for PostgreSQL. See the thread "Whence the Opterons" for mo

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Jim C. Nasby
On Tue, May 10, 2005 at 12:10:57AM -0400, Tom Lane wrote: > be responsive to your search.) (This also brings up the thought that > it might be interesting to support hash buckets smaller than a page ... > but I don't know how to make that work in an adaptive fashion.) IIRC, other databases that s

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Jim C. Nasby
On Tue, May 10, 2005 at 10:14:11AM +1000, Neil Conway wrote: > Jim C. Nasby wrote: > >> No, hash joins and hash indexes are unrelated. > >I know they are now, but does that have to be the case? > > I mean, the algorithms are fundamentally unrelated. They share a bit of > code such as the hash fun

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Alex Stapleton
On 10 May 2005, at 16:02, Adam Haberlach wrote: I think that perhaps he was trying to avoid having to buy "Big Iron" at all. You would be right. Although we are not against paying a bit more than $300 for a server ;) With all the Opteron v. Xeon around here, and talk of $30,000 machines, per

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Alex Stapleton
On 10 May 2005, at 15:41, John A Meinel wrote: Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for

Re: [PERFORM] Prefetch

2005-05-10 Thread Rod Taylor
> I've done other things that make sense, like using indexes, playing with the > planner constants and turning up the postgres cache buffers. After you load the new days data try running CLUSTER on the structure using a key of (stockID, date) -- probably your primary key. This should significantl

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Adam Haberlach
I think that perhaps he was trying to avoid having to buy "Big Iron" at all. With all the Opteron v. Xeon around here, and talk of $30,000 machines, perhaps it would be worth exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you, from Dell, a 2.5Ghz

Re: [PERFORM] full outer performance problem

2005-05-10 Thread Tom Lane
Kim Bisgaard <[EMAIL PROTECTED]> writes: > I have two BIG tables (virtually identical) with 3 NOT NULL columns > Station_id, TimeObs, Temp_, with indexes on (Station_id, TimeObs) > and valid ANALYSE (set statistics=100). I want to join the two tables > with a FULL OUTER JOIN. I'm confused.

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread John A Meinel
Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I

Re: [PERFORM] full outer performance problem

2005-05-10 Thread John A Meinel
Kim Bisgaard wrote: Hi, I'm having problems with the query optimizer and FULL OUTER JOIN on PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. I might be naive, but I think that it should be possible? I have two BIG tables (virtually identical) with 3 NOT NULL columns Station_

Re: [PERFORM] Prefetch

2005-05-10 Thread Tom Lane
Matt Olson <[EMAIL PROTECTED]> writes: > Other databases like Oracle and DB2 implement some sort of row prefetch. Has > there been serious consideration of implementing something like a prefetch > subsystem? No. > Does anyone have any opinions as to why this would be a bad idea for > postgres?

[PERFORM] Prefetch

2005-05-10 Thread Matt Olson
I wanted to get some opinions about row prefetching. AFAIK, there is no prefetching done by PostgreSQL; all prefetching is delegated to the operating system. The hardware (can't say enough good things about it): Athlon 64, dual channel 4GB ram 240GB usable 4 disk raid5 (ATA133) Fedora Core 3 Pos

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > Tom Lane <[EMAIL PROTECTED]> writes: >> I think that efficient implementation of this would require explicitly >> storing the hash code for each index entry, > It seems that means doubling the size of the hash index. That's a pretty big > i/o to cpu tradeof

[PERFORM] Partitioning / Clustering

2005-05-10 Thread Alex Stapleton
What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be

[PERFORM] full outer performance problem

2005-05-10 Thread Kim Bisgaard
Hi, I'm having problems with the query optimizer and FULL OUTER JOIN on PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. I might be naive, but I think that it should be possible? I have two BIG tables (virtually identical) with 3 NOT NULL columns Station_id, TimeObs, T