Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-10 Thread Zeugswetter Andreas DCP SD
> > > Two pass will create the count of subfiles proportional to: > > > Subfile_count = original_stream_size/sort_memory_buffer_size > > > > > > The merge pass requires (sizeof record * subfile_count) memory. > > > > That is true from an algorithmic perspective. But to make the merge > > effici

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-10 Thread Martijn van Oosterhout
On Fri, Mar 10, 2006 at 09:57:28AM +0100, Zeugswetter Andreas DCP SD wrote: > > > Two pass will create the count of subfiles proportional to: > > Subfile_count = original_stream_size/sort_memory_buffer_size > > > > The merge pass requires (sizeof record * subfile_count) memory. > > That is true

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-10 Thread Zeugswetter Andreas DCP SD
> Two pass will create the count of subfiles proportional to: > Subfile_count = original_stream_size/sort_memory_buffer_size > > The merge pass requires (sizeof record * subfile_count) memory. That is true from an algorithmic perspective. But to make the merge efficient you would need to have en
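The arithmetic quoted in this exchange can be sketched numerically. The following is an illustrative back-of-the-envelope, not PostgreSQL code; the function name and the 8 KB block size are assumptions made for the example:

```python
# Back-of-the-envelope arithmetic for a two-pass external sort,
# following the formulas quoted in the thread (illustrative only).

def two_pass_feasible(stream_bytes, work_mem_bytes, block_bytes=8192):
    # Pass 1 produces one sorted run ("subfile") per memory-load:
    # subfile_count = original_stream_size / sort_memory_buffer_size
    subfile_count = -(-stream_bytes // work_mem_bytes)  # ceiling division
    # The merge pass needs at least one input buffer (here: one disk
    # block) per run in order to merge all runs in a single pass.
    merge_mem_needed = subfile_count * block_bytes
    return subfile_count, merge_mem_needed, merge_mem_needed <= work_mem_bytes

# Example: a 100 GB stream sorted with 256 MB of sort memory.
runs, need, ok = two_pass_feasible(100 * 2**30, 256 * 2**20)
# 400 runs; the single merge pass needs only ~3.1 MB of buffers.
```

This also illustrates the point made later in the thread: the merge-pass memory requirement grows with the run count, which is why two passes suffice only while work_mem stays above roughly the square root of the data size.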

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Luke Lonergan
Dann, On 3/9/06 3:56 PM, "Dann Corbit" <[EMAIL PROTECTED]> wrote: > Two pass does not require sqrt(total) memory. This figure is clearly > wrong. Clearly you haven't read the paper I posted previously in this thread from 1986 written by Jim Gray at Tandem. - Luke --

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Luke Lonergan
Tom, On 3/9/06 3:59 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > Possibly nothing. However, from an algorithmic point of view the > CVS-tip code *is* two-pass-sort, given adequate work_mem and no > requirement for random access. Further, the available profile data > doesn't show any indication t

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Luke Lonergan
Stephen, On 3/9/06 3:48 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote: > So, if we get a huge performance increase, what's wrong with: > if [ sqrt(est(total)) <= work_mem ]; then > two-pass-sort(); > else > tape-sort(); > fi I have something similar but less complex in mind. One of the obse
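Stephen Frost's shell-style pseudocode amounts to a strategy dispatcher like the sketch below. The names are hypothetical, and the real threshold would also have to account for per-run buffer sizes, which this simplification ignores:

```python
import math

def choose_sort_strategy(est_total_bytes, work_mem_bytes):
    # Hypothetical dispatcher mirroring the pseudocode in the thread:
    # a two-pass merge is feasible roughly when work_mem is at least
    # the square root of the estimated total data size.
    if math.sqrt(est_total_bytes) <= work_mem_bytes:
        return "two-pass-sort"
    return "tape-sort"
```

For example, sorting 1 TB with 2 MB of work_mem would pass the sqrt test (sqrt(2^40) = 1 MB), while 512 KB of work_mem would fall back to the tape sort.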

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes: > So, if we get a huge performance increase, what's wrong with: > if [ sqrt(est(total)) <=3D work_mem ]; then > two-pass-sort(); > else > tape-sort(); > fi > ? Possibly nothing. However, from an algorithmic point of view the CVS-tip code *is* two-pass

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Dann Corbit
> -Original Message- > From: Stephen Frost [mailto:[EMAIL PROTECTED] > Sent: Thursday, March 09, 2006 3:49 PM > To: Tom Lane > Cc: Luke Lonergan; Jim C. Nasby; Greg Stark; Dann Corbit; Simon Riggs; > pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Merge algorit

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote: > "Luke Lonergan" <[EMAIL PROTECTED]> writes: > > I would only suggest that we replace the existing algorithm with one that > > will work regardless of (reasonable) memory requirements. Perhaps we can > > agree that at least 1MB of RAM for external sorting wil

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > I would only suggest that we replace the existing algorithm with one that > will work regardless of (reasonable) memory requirements. Perhaps we can > agree that at least 1MB of RAM for external sorting will always be available > and proceed from there

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Luke Lonergan
Tom, On 3/9/06 9:44 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > I think this argumentation hinges on some irrational aversion to the > word "tape". Given adequate work_mem, the CVS-tip behavior is exactly > what you propose already (at least for the cases where we don't need > random access to t

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > Consider that a popular commercial database, running on a 6-disk RAID5 with > one filesystem, performs external sorting 4 times faster (1/4 of the time) > than Postgres using a two pass sort. There is no special optimization of > the I/O path involved,

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Luke Lonergan
Jim, On 3/9/06 8:35 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote: > Well, the reality remains though; most folks are unlikely to setup > enough dedicated temp areas so that we can do one tape per disk, so it > would be really good to have a sort method that didn't rely on that. Agreed - however

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 10:20:08PM -0500, Greg Stark wrote: > > For that to be of any use, wouldn't you need to use only as many tapes > > as spindles/2? Otherwise you're still trying to read and write from the > > same set of drives, which means you're probably doing a lot of seeking. > > Or do th

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Florian Weimer
* Greg Stark: > That's one thing that gives me pause about the current approach of > using more tapes. It seems like ideally the user would create a > temporary work space on each spindle and the database would arrange > to use no more than that number of tapes. Then each merge operation > would i

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Hannu Krosing
On Wed, 2006-03-08 at 20:08, Jim C. Nasby wrote: > But it will take a whole lot of those rewinds to equal the amount of > time required by an additional pass through the data. I guess that missing a sector read also implies a "rewind", i.e. if you don't process the data read f

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-09 Thread Zeugswetter Andreas DCP SD
> > This amounts to an assumption that you have infinite work_mem, in which > > case you hardly need an external sort at all. If your work_mem is in > > fact finite, then at some point you need more than two passes. I'm not > > really interested in ripping out support for sort operati

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Greg Stark
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > On Wed, Mar 08, 2006 at 06:55:59PM -0500, Greg Stark wrote: > > > > "Luke Lonergan" <[EMAIL PROTECTED]> writes: > > > > > > I am pretty sure from this thread that PostgreSQL is not doing #1, and I > > > > have no idea if it is doing #2. > > > > > >

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Andrew Dunstan
Dann Corbit wrote: I do not clearly understand the sorting code in PostgreSQL. If I did have a good grasp of it, I would take a go at improving it. "Show me the code" (and the benchmarks). Seriously. We see regular discussions on this and similar topics, but I haven't seen a patch that

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 06:55:59PM -0500, Greg Stark wrote: > > "Luke Lonergan" <[EMAIL PROTECTED]> writes: > > > > I am pretty sure from this thread that PostgreSQL is not doing #1, and I > > > have no idea if it is doing #2. > > > > Yep. Even Knuth says that the tape goo is only interesting f

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Dann Corbit
> -Original Message- > From: Jim C. Nasby [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 08, 2006 5:44 PM > To: Dann Corbit > Cc: Tom Lane; Luke Lonergan; Simon Riggs; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Merge algorithms for large numbers of "

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 03:35:53PM -0800, Dann Corbit wrote: > I think I did not explain it clearly enough. Suppose that you have a > set of rows you need to sort. Instead of loading the whole row into > memory, just load the columns (or parts of columns) that are being > sorted. I hope that it

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Jonah H. Harris
Tom Lane; Jim C. Nasby; Simon Riggs; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes" > > "Luke Lonergan" <[EMAIL PROTECTED]> writes: > > > I am pretty sure from this thread that PostgreSQL is not doing

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 10:49:16AM -0800, Luke Lonergan wrote: > Jim, > > On 3/8/06 9:49 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote: > > > On Wed, Mar 08, 2006 at 11:20:50AM -0500, Tom Lane wrote: > > >> Not sure that follows. In particular, the entire point of the recent > >> changes has bee

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Dann Corbit
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 08, 2006 3:56 PM > To: Luke Lonergan > Cc: Dann Corbit; Tom Lane; Jim C. Nasby; Simon Riggs; pgsql- > [EMAIL PROTECTED] > Subject: Re: [HACKERS] Merge algorithms for la

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Greg Stark
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > > I am pretty sure from this thread that PostgreSQL is not doing #1, and I > > have no idea if it is doing #2. > > Yep. Even Knuth says that the tape goo is only interesting from a > historical perspective and may not be relevant in an era of disk d

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Dann Corbit
> -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 08, 2006 3:17 PM > To: Dann Corbit > Cc: Jim C. Nasby; Luke Lonergan; Simon Riggs; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Merge algorithms for large numbers of

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Tom Lane
"Dann Corbit" <[EMAIL PROTECTED]> writes: > Here are some suggestions of things that I know work really, really > well: > #1. Two pass merge (none of that silly poly-tape merge goo) This amounts to an assumption that you have infinite work_mem, in which case you hardly need an external sort at al

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Dann Corbit
2006 1:59 PM > To: Luke Lonergan; Tom Lane; Jim C. Nasby > Cc: Simon Riggs; pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes" > > > -Original Message- > > From: Luke Lonergan [mailto:[EMAIL PROTECTED] > >

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Dann Corbit
> -Original Message- > From: Luke Lonergan [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 08, 2006 1:52 PM > To: Dann Corbit; Tom Lane; Jim C. Nasby > Cc: Simon Riggs; pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] Merge algorithms for large numbers of "ta

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Luke Lonergan
Dann, On 3/8/06 12:39 PM, "Dann Corbit" <[EMAIL PROTECTED]> wrote: > Here are some suggestions of things that I know work really, really > well: Can you point to an example? That might help move the discussion along. The reason to interject about the tape goo in this discussion is that we seem

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Dann Corbit
Jim C. Nasby > Cc: Luke Lonergan; Simon Riggs; pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes" > > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > But do fewer/longer sorted runs translate into not merging

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > But do fewer/longer sorted runs translate into not merging back to disk? > I thought that was controlled by if we had to be able to rewind the > result set. A plain SELECT ... ORDER BY doesn't assume that anymore. It is still required for some cases su

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Luke Lonergan
Jim, On 3/8/06 9:49 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote: > On Wed, Mar 08, 2006 at 11:20:50AM -0500, Tom Lane wrote: >> Not sure that follows. In particular, the entire point of the recent >> changes has been to extend the range in which we can use a single merge >> pass --- that is, w

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Simon Riggs
On Wed, 2006-03-08 at 10:21 -0500, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > 1. Earlier we had some results that showed that the heapsorts got slower > > when work_mem was higher and that concerns me most of all right now. > > Fair enough, but that's completely independent of t

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 11:20:50AM -0500, Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > If we do have to fail to disk, cut back to 128MB, because having 8x that > > certainly won't make the sort run anywhere close to 8x faster. > > Not sure that follows. In particular, the enti

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > If we do have to fail to disk, cut back to 128MB, because having 8x that > certainly won't make the sort run anywhere close to 8x faster. Not sure that follows. In particular, the entire point of the recent changes has been to extend the range in which

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 07:28:16AM -0800, Luke Lonergan wrote: > > I think this would be extremely dangerous, as it would encourage > > processes to take more than their fair share of available resources. > > I agree - in fact, we currently have no structured concept of "fair share of > available

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Luke Lonergan
Tom, On 3/8/06 7:21 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: >> 1. Earlier we had some results that showed that the heapsorts got slower >> when work_mem was higher and that concerns me most of all right now. > > Fair enough, but that's completely indep

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > 1. Earlier we had some results that showed that the heapsorts got slower > when work_mem was higher and that concerns me most of all right now. Fair enough, but that's completely independent of the merge algorithm. (I don't think the Nyberg results necessa

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-08 Thread Simon Riggs
On Tue, 2006-03-07 at 18:14 -0500, Tom Lane wrote: > BTW, I was just looking over Knuth's discussion of sorting again, and > realized that there is still something more that could be done within > the existing sort framework. We currently use standard polyphase merge > (his Algorithm 5.4.2D), whic
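As background for the Algorithm 5.4.2D discussion, the ideal ("perfect") run distribution for polyphase merge can be sketched as follows. This is an illustrative sketch after Knuth, not PostgreSQL code; the function name and parameters are hypothetical:

```python
def polyphase_levels(tapes, nlevels):
    """Ideal run counts per input tape, level by level, for polyphase
    merge on the given number of tapes (T-1 input tapes, one output
    tape), after Knuth Vol. 3, Algorithm 5.4.2D.  Sketch only."""
    dist = [1] + [0] * (tapes - 2)  # level 1: a single initial run
    levels = [tuple(dist)]
    for _ in range(nlevels - 1):
        # At each level the fullest tape is merged off and refilled;
        # the recurrence is a'[k] = a[0] + a[k+1], with a[T-1] = 0.
        dist = [dist[0] + (dist[k + 1] if k + 1 < len(dist) else 0)
                for k in range(len(dist))]
        levels.append(tuple(dist))
    return levels
```

For three tapes this reproduces the Fibonacci distribution (1,0), (1,1), (2,1), (3,2), (5,3), ..., which is why polyphase merge is classically tied to Fibonacci numbers; with more tapes the generalized Fibonacci recurrence takes over.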

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-07 Thread Luke Lonergan
Tom, On 3/7/06 8:03 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > "Luke Lonergan" <[EMAIL PROTECTED]> writes: >> Two passes is the state-of-the-practice on external disk sorts. > > There is no such thing as a fixed number of passes regardless of > available memory and size of the data. While tech

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-07 Thread Greg Stark
"Jonah H. Harris" <[EMAIL PROTECTED]> writes: > On 3/7/06, Tom Lane <[EMAIL PROTECTED]> wrote: > > > > However, now that we've changed the code to prefer large numbers of tapes, > > it's not at all clear that Algorithm D is still the right one to use. In > > particular I'm looking at cascade merge

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-07 Thread Luke Lonergan
Yes – all of the current best practice external sorts use two passes. A first to produce the runs, which results in "S" number of "files", then a single merge pass across the runs. At most 1 pass across the S runs i
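The two-pass scheme described here (run formation followed by a single merge across all S runs) can be sketched with in-memory lists standing in for spill files. This is illustrative only; a real implementation would write each run to disk and stream the merge:

```python
import heapq

def external_sort(items, run_size):
    # Pass 1: cut the input into sorted runs (the "S files").
    runs, buf = [], []
    for x in items:
        buf.append(x)
        if len(buf) >= run_size:
            runs.append(sorted(buf))
            buf = []
    if buf:
        runs.append(sorted(buf))
    # Pass 2: one k-way merge across all S runs.  This assumes we can
    # afford one input buffer per run, the very point debated upthread.
    return list(heapq.merge(*runs))
```

With `run_size` playing the role of the sort memory buffer, the number of runs is exactly the subfile_count from the formula quoted earlier in the thread.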

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-07 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > Two passes is the state-of-the-practice on external disk sorts. There is no such thing as a fixed number of passes regardless of available memory and size of the data. regards, tom lane ---(end of broad

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-07 Thread Luke Lonergan
Tom, > fewer passes when T is large. Do you want to try that? Two passes is the state-of-the-practice on external disk sorts. If we're looking to replace the tape sort approach, I would hope for a two pass approach, with the merge pass avoided in the case of unidirectional access. - Luke --

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-07 Thread Dann Corbit
sub-files, it still only takes a total of two read-write passes. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jonah H. Harris Sent: Tuesday, March 07, 2006 5:28 PM To: Tom Lane Cc: Simon Riggs; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Merge algorithms for

Re: [HACKERS] Merge algorithms for large numbers of "tapes"

2006-03-07 Thread Jonah H. Harris
On 3/7/06, Tom Lane <[EMAIL PROTECTED]> wrote: BTW, I was just looking over Knuth's discussion of sorting again, and realized that there is still something more that could be done within the existing sort framework. We currently use standard polyphase merge (his Algorithm 5.4.2D), which IIRC I chose