Another trick you can use with large data sets like this, when you want
results back in seconds, is to keep regularly updated tables that aggregate
the data along each column normally aggregated against the main data set.
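For illustration, one such rollup along a single commonly aggregated column
might look like this (the table and column names are made up, not from any
particular schema):

    -- Hypothetical summary table, refreshed periodically (e.g. from cron)
    -- so that reports hit the small rollup instead of the big fact table.
    CREATE TABLE sales_by_region (
        region    text PRIMARY KEY,
        total_amt numeric,
        row_count bigint
    );

    BEGIN;
    DELETE FROM sales_by_region;
    INSERT INTO sales_by_region (region, total_amt, row_count)
    SELECT region, sum(amount), count(*)
    FROM sales
    GROUP BY region;
    COMMIT;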
Maybe some bright person will prove me wrong by posting some working
informat
Neil,
> Sure, but that hardly makes it not "usable". Considering the price of
> RAM these days, having enough RAM to hold the database (distributed over
> the entire cluster) is perfectly acceptable for quite a few people.
The other problem, as I was told at OSCON, was that these were not
hig
Joshua D. Drake wrote:
Neil Conway wrote:
Oh? What's wrong with MySQL's clustering implementation?
Ram only tables :)
Sure, but that hardly makes it not "usable". Considering the price of
RAM these days, having enough RAM to hold the database (distributed over
the entire cluster) is perfectly acc
Matt Olson wrote:
Other databases like Oracle and DB2 implement some sort of row prefetch. Has
there been serious consideration of implementing something like a prefetch
subsystem? Does anyone have any opinions as to why this would be a bad idea
for postgres?
Postgres is great for a multiuser envi
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> On Mon, May 09, 2005 at 04:55:53PM -0400, Joel Fradkin wrote:
>> Seems to be only using like 360 meg out of 7 gig free (odd thing is I did
>> see some used swap 4k out of 1.9) with a bunch of users (this may be normal,
>> but it is not going overly fast
On Tue, May 10, 2005 at 08:02:50 -0700,
Adam Haberlach <[EMAIL PROTECTED]> wrote:
>
>
> With all the Opteron v. Xeon around here, and talk of $30,000 machines,
> perhaps it would be worth exploring the option of buying 10 cheapass
> machines for $300 each. At the moment, that $300 buys you, fr
Neil Conway wrote:
Josh Berkus wrote:
Don't hold your breath. MySQL, to judge by their first "clustering"
implementation, has a *long* way to go before they have anything usable.
Oh? What's wrong with MySQL's clustering implementation?
Ram only tables :)
-Neil
Josh Berkus wrote:
Don't hold your breath. MySQL, to judge by their first "clustering"
implementation, has a *long* way to go before they have anything usable.
Oh? What's wrong with MySQL's clustering implementation?
-Neil
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> No, not at all, because searching such an index will require a tree
>> descent, thus negating the one true advantage of hash indexes.
> The hash index still has to do a tree descent; it just has a larger branching
>
Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>:
> To the best of my knowledge no such work has been done. There is a
> project (whose name escapes me) that lets you run queries against a
> remote postgresql server from a postgresql connection to a different
> server, which could serve as the basis for
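(The project whose name escapes Jim might be contrib/dblink; that is my guess,
not his. Assuming so, a minimal sketch of a cross-server query through it,
with the connection string, remote query, and result column list all invented
for illustration:)

    SELECT *
    FROM dblink('host=otherserver dbname=stats',
                'SELECT id, total FROM participant_summary')
         AS remote(id integer, total bigint);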
Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>:
>
> >>*laff*
> >>Yeah, like they've been working on views for the last 5 years, and
> >>still haven't released them :D :D :D
> >
> > ?
> > http://dev.mysql.com/doc/mysql/en/create-view.html
> > ...for MySQL 5.0.1+ ?
>
> Give me a call when i
*laff*
Yeah, like they've been working on views for the last 5 years, and
still haven't released them :D :D :D
?
http://dev.mysql.com/doc/mysql/en/create-view.html
...for MySQL 5.0.1+ ?
Give me a call when it's RELEASED.
Chris
Mischa Sandberg wrote:
Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>:
This is why I mention partitioning. It solves this issue by storing
different data sets on different machines under the same schema.
These separate chunks of the table can then be replicated as well for
data redundancy a
Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>:
> > This is why I mention partitioning. It solves this issue by storing
> > different data sets on different machines under the same schema.
> > These separate chunks of the table can then be replicated as well for
> > data redundancy and so o
This is why I mention partitioning. It solves this issue by storing
different data sets on different machines under the same schema. These
separate chunks of the table can then be replicated as well for data
redundancy and so on. MySQL are working on these things
*laff*
Yeah, like they've bee
Tom Lane <[EMAIL PROTECTED]> writes:
> No, not at all, because searching such an index will require a tree
> descent, thus negating the one true advantage of hash indexes.
The hash index still has to do a tree descent; it just has a larger branching
factor than the btree index.
btree indexes
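(For reference, the two index types being compared are created like this; a
sketch only, with made-up names, and bearing in mind that hash indexes in
PostgreSQL of this era were not WAL-logged:)

    -- A hash index and a btree index on the same column, for comparing
    -- equality-lookup behaviour. Table and column names are invented.
    CREATE INDEX big_lookup_hash_idx  ON big_lookup USING hash  (key_col);
    CREATE INDEX big_lookup_btree_idx ON big_lookup USING btree (key_col);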
Quoting Tom Lane <[EMAIL PROTECTED]>:
> Mischa Sandberg <[EMAIL PROTECTED]> writes:
> > The PG hash join is the simplest possible: build a hash table in
> > memory, and match an input stream against it.
>
> [ raised eyebrow... ] Apparently you've not read the code. It's
> been hybrid hashjoin si
Mischa Sandberg <[EMAIL PROTECTED]> writes:
> The PG hash join is the simplest possible: build a hash table in memory,
> and match an input stream against it.
> *Hybrid hash* is where you spill the hash to disk in a well-designed
> way. Instead of thinking of it as building a hash table in memory,
If the original paper was published in 1984, then it's been more than 20
years. Any potential patents would already have expired, no?
-- Mark Lewis
On Tue, 2005-05-10 at 14:35, Mischa Sandberg wrote:
> Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>:
>
> > Well, in a hash-join right now you normally
On Tue, May 10, 2005 at 02:55:55PM -0700, Mischa Sandberg wrote:
> just beyond belief, for both updates and queries. At Acxiom, the
> datasets are so large, even after partitioning, that they just
> constantly cycle them through memory, and commands are executed in
> convoys --- sort of like riding
Quoting Alex Stapleton <[EMAIL PROTECTED]>:
> This is why I mention partitioning. It solves this issue by storing
> different data sets on different machines under the same schema.
> These seperate chunks of the table can then be replicated as well for
> data redundancy and so on. MySQL are wor
On Tue, May 10, 2005 at 07:29:59PM +0200, PFC wrote:
> I wonder how Oracle does it ;)
Oracle *clustering* demands shared storage. So you've shifted your money
from big-iron CPUs to big-iron disk arrays.
Oracle replication works similarly to Slony, though it supports a lot more
modes (i.e., sync
On Mon, May 09, 2005 at 04:55:53PM -0400, Joel Fradkin wrote:
> Seems to be only using like 360 meg out of 7 gig free (odd thing is I did
> see some used swap 4k out of 1.9) with a bunch of users (this may be normal,
> but it is not going overly fast so thought I would ask).
This is perfectly norm
http://stats.distributed.net has a table that's 130M rows.
http://stats.distributed.net/participant/phistory.php?project_id=8&id=39622
is a page that hits that table, and as you can see it's quite fast. This
is on a dual opteron with 4G of memory.
Unless you're looking for sub millisecond response
Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>:
> Well, in a hash-join right now you normally end up feeding at least
> one
> side of the join with a seqscan. Wouldn't it speed things up
> considerably if you could look up hashes in the hash index instead?
You might want to google on "grace hash" and
Yes, that would be a sufficient (although not necessary) condition for being
well and fine with kdB.
Last time I used APL was pre-Gregorian, so yeah, that's scary to me, too.
( Of course, one can use C/ODBC or Java/JDBC to reach kdB; once there, one uses
SQL92, or
proprietary kSQL. )
[EMAIL PROTECTED] ("Mohan, Ross") writes:
> for time-series and "insane fast", nothing beats kdB, I believe
>
> www.kx.com
... Which is well and fine if you're prepared to require that all of
the staff that interact with data are skilled APL hackers. Skilled
enough that they're all ready to leap
Quoting [EMAIL PROTECTED]:
> > exploring the option of buying 10 cheapass
> > machines for $300 each. At the moment, that $300 buys you, from
> Dell, a
> > 2.5Ghz Pentium 4
>
> Buy cheaper ass Dells with an AMD 64 3000+. Beats the crap out of
> the 2.5
> GHz Pentium, especially for PostgreSQL.
Greg Stark <[EMAIL PROTECTED]> writes:
>>> What if the hash index stored *only* the hash code? That could be useful for
>>> indexing large datatypes that would otherwise create large indexes.
>>
>> Hmm, that could be a thought.
> Hm, if you go this route of having hash indexes store tuples ordere
for time-series and "insane fast", nothing beats kdB, I believe
www.kx.com
Not trying to Quisling-out PG here, just hoping to respond to Mr. Olson
Greg Stark <[EMAIL PROTECTED]> writes:
> Actually forcing things to use indexes is the wrong direction to go if you're
> trying to process lots of data and want to stream it off disk as rapidly as
> possible. I would think about whether you can structure your data such that
> you can use sequential
My postgres binaries and WAL are on a separate disk from the raid array. The
table I'm doing the selects from is probably about 4GB in size and 18-20
million records. No concurrent or dependent inserts or deletes are going on.
Tom's point and your points about optimizing the application are we
Hi Alex,
Actually, our product can partition data among several clustered nodes
running PostgreSQL, if that is what you are looking for. Data is
distributed based on a designated column. Other tables can be
replicated to all nodes.
For SELECTs, it also knows when it can join locally or it needs
Matt Olson <[EMAIL PROTECTED]> writes:
> I've done other things that make sense, like using indexes, playing with the
> planner constants and turning up the postgres cache buffers.
>
> Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields no
> apparent difference in database pe
Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>:
> I'm not really familiar enough with hash indexes to know if this
> would
> work, but if the maximum bucket size was known you could use that to
> determine a maximum range of buckets to look at. In some cases, that
> range would include only one bucket
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> Well, in a hash-join right now you normally end up feeding at least one
> side of the join with a seqscan. Wouldn't it speed things up
> considerably if you could look up hashes in the hash index instead?
That's called a "nestloop with inner index scan"
Alex,
> This is why I mention partitioning. It solves this issue by storing
> different data sets on different machines under the same schema.
That's clustering, actually. Partitioning is simply dividing up a table into
chunks and using the chunks intelligently. Putting those chunks on se
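(A minimal sketch of partitioning in that sense, using table inheritance with
CHECK constraints; all names and date ranges are invented:)

    -- Parent table plus per-month child tables. Queries against "measurements"
    -- see every chunk; loads and maintenance can target a single chunk.
    CREATE TABLE measurements (
        station_id integer   NOT NULL,
        obs_time   timestamp NOT NULL,
        value      numeric
    );

    CREATE TABLE measurements_2005_05 (
        CHECK (obs_time >= '2005-05-01' AND obs_time < '2005-06-01')
    ) INHERITS (measurements);

    CREATE TABLE measurements_2005_06 (
        CHECK (obs_time >= '2005-06-01' AND obs_time < '2005-07-01')
    ) INHERITS (measurements);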
Tom Lane <[EMAIL PROTECTED]> writes:
> > What if the hash index stored *only* the hash code? That could be useful for
> > indexing large datatypes that would otherwise create large indexes.
>
> Hmm, that could be a thought.
Hm, if you go this route of having hash indexes store tuples ordered by
SELECT row1, row2 FROM table1_on_machine_a NATURAL JOIN
table2_on_machine_b
WHERE restrict_table_1 AND restrict_table_2
AND restrict_1_based_on_2;
I don't think that's ever going to be efficient...
What would be efficient would be, for instance, a Join of a part of a
table against another pa
On Tue, May 10, 2005 at 11:49:50AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > What's the challenge to making it adaptive, coming up with an algorithm
> > that gives you the optimal bucket size (which I would think there's
> > research on...) or allowing the index to ac
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> What's the challenge to making it adaptive, coming up with an algorithm
> that gives you the optimal bucket size (which I would think there's
> research on...) or allowing the index to accommodate different bucket
> sizes existing in the index at once?
Adam Haberlach wrote:
I think that perhaps he was trying to avoid having to buy "Big Iron" at all.
With all the Opteron v. Xeon around here, and talk of $30,000 machines,
perhaps it would be worth exploring the option of buying 10 cheapass
machines for $300 each. At the moment, that $300 buys you,
> exploring the option of buying 10 cheapass
> machines for $300 each. At the moment, that $300 buys you, from Dell, a
> 2.5Ghz Pentium 4
Buy cheaper ass Dells with an AMD 64 3000+. Beats the crap out of the 2.5
GHz Pentium, especially for PostgreSQL.
See the thread "Whence the Opterons" for mo
On Tue, May 10, 2005 at 12:10:57AM -0400, Tom Lane wrote:
> be responsive to your search.) (This also brings up the thought that
> it might be interesting to support hash buckets smaller than a page ...
> but I don't know how to make that work in an adaptive fashion.)
IIRC, other databases that s
On Tue, May 10, 2005 at 10:14:11AM +1000, Neil Conway wrote:
> Jim C. Nasby wrote:
> >> No, hash joins and hash indexes are unrelated.
> >I know they are now, but does that have to be the case?
>
> I mean, the algorithms are fundamentally unrelated. They share a bit of
> code such as the hash fun
On 10 May 2005, at 16:02, Adam Haberlach wrote:
I think that perhaps he was trying to avoid having to buy "Big
Iron" at all.
You would be right. Although we are not against paying a bit more
than $300 for a server ;)
With all the Opteron v. Xeon around here, and talk of $30,000
machines,
per
On 10 May 2005, at 15:41, John A Meinel wrote:
Alex Stapleton wrote:
What is the status of Postgres support for any sort of multi-machine
scaling? What are you meant to do once you've upgraded your box and tuned
the conf files as much as you can, but your query load is just too high for
> I've done other things that make sense, like using indexes, playing with the
> planner constants and turning up the postgres cache buffers.
After you load the new day's data try running CLUSTER on the structure
using a key of (stockID, date) -- probably your primary key.
This should significantl
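(Concretely, a sketch of that suggestion with made-up table, column, and index
names, using the CLUSTER syntax of this era:)

    -- Pre-8.3 syntax is "CLUSTER indexname ON tablename".
    CREATE INDEX price_history_stock_date_idx ON price_history (stockid, date);
    CLUSTER price_history_stock_date_idx ON price_history;
    ANALYZE price_history;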
I think that perhaps he was trying to avoid having to buy "Big Iron" at all.
With all the Opteron v. Xeon around here, and talk of $30,000 machines,
perhaps it would be worth exploring the option of buying 10 cheapass
machines for $300 each. At the moment, that $300 buys you, from Dell, a
2.5Ghz
Kim Bisgaard <[EMAIL PROTECTED]> writes:
> I have two BIG tables (virtually identical) with 3 NOT NULL columns
> Station_id, TimeObs, Temp_, with indexes on (Station_id, TimeObs)
> and valid ANALYSE (set statistics=100). I want to join the two tables
> with a FULL OUTER JOIN.
I'm confused.
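(For reference, a sketch of the kind of query being described; the table names
are invented, and the column names follow the message as given:)

    SELECT COALESCE(a.station_id, b.station_id) AS station_id,
           COALESCE(a.timeobs,    b.timeobs)    AS timeobs,
           a.temp_ AS temp_a,
           b.temp_ AS temp_b
    FROM   table_a a
    FULL OUTER JOIN table_b b
      ON  a.station_id = b.station_id
      AND a.timeobs    = b.timeobs;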
Alex Stapleton wrote:
What is the status of Postgres support for any sort of multi-machine
scaling? What are you meant to do once you've upgraded your box and tuned
the conf files as much as you can, but your query load is just too high for
a single machine?
Upgrading stock Dell boxes (I
Kim Bisgaard wrote:
Hi,
I'm having problems with the query optimizer and FULL OUTER JOIN on
PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins.
I might be naive, but I think that it should be possible?
I have two BIG tables (virtually identical) with 3 NOT NULL columns
Station_
Matt Olson <[EMAIL PROTECTED]> writes:
> Other databases like Oracle and DB2 implement some sort of row prefetch. Has
> there been serious consideration of implementing something like a prefetch
> subsystem?
No.
> Does anyone have any opinions as to why this would be a bad idea for
> postgres?
I wanted to get some opinions about row prefetching. AFAIK, there is no
prefetching done by PostgreSQL; all prefetching is delegated to the operating
system.
The hardware (can't say enough good things about it):
Athlon 64, dual channel
4GB ram
240GB usable 4 disk raid5 (ATA133)
Fedora Core 3
Pos
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> I think that efficient implementation of this would require explicitly
>> storing the hash code for each index entry,
> It seems that means doubling the size of the hash index. That's a pretty big
> i/o to cpu tradeof
What is the status of Postgres support for any sort of multi-machine
scaling? What are you meant to do once you've upgraded your box and tuned
the conf files as much as you can, but your query load is just too high for
a single machine?
Upgrading stock Dell boxes (I know we could be
Hi,
I'm having problems with the query optimizer and FULL OUTER JOIN on
PostgreSQL 7.4. I cannot get it to use my indexes with full outer
joins. I might be naive, but I think that it should be possible?
I have two BIG tables (virtually identical) with 3 NOT NULL columns
Station_id, TimeObs, T