Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Gary Doades
Tom Lane wrote:
> I increased the size of the test case by 10x (basically s/10/100/)
> which is enough to push it into the external-sort regime.  I get
> amazingly stable runtimes now --- I didn't have the patience to run 100
> trials, but in 30 trials I have slowest 11538 msec and fastest 11144 msec.
> So this code path is definitely not very sensitive to this data
> distribution.
>
> While these numbers aren't glittering in comparison to the best-case
> qsort times (~450 msec to sort 10% as much data), they are sure a lot
> better than the worst-case times.  So maybe a workaround for you is
> to decrease maintenance_work_mem, counterintuitive though that be.
> (Now, if you *weren't* using maintenance_work_mem of 100MB or more
> for your problem restore, then I'm not sure I know what's going on...)
>

Good call. I basically reversed your test by keeping the number of rows
the same (20), but reducing maintenance_work_mem. Reducing to 8192
made no real difference. Reducing to 4096 flattened out all the times
nicely. Slower overall, but at least predictable. Hopefully only a
temporary solution until qsort is fixed.

My restore now takes 22 minutes :)

I think the reason I wasn't seeing performance issues with normal sort
operations is because they use work_mem not maintenance_work_mem which was
only set to 2048 anyway. Does that sound right?

Regards,
Gary.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Steinar H. Gunderson
On Wed, Feb 15, 2006 at 11:30:54PM -0500, Ron wrote:
> Even better (and more easily scaled as the number of GPR's in the CPU 
> changes) is to use
> the set {L; L+1; L+2; t>>1; R-2; R-1; R}
> This means that instead of 7 random memory accesses, we have 3; two 
> of which result in a
> burst access for three elements each.

Isn't that improvement going to disappear completely if you choose a bad
pivot?

> SIDE NOTE: IIRC glibc's qsort is actually merge sort.  Merge sort 
> performance is insensitive to all inputs, and there are ways to 
> optimize it as well.

glibc-2.3.5/stdlib/qsort.c:

  /* Order size using quicksort.  This implementation incorporates
 four optimizations discussed in Sedgewick:

I can't see any references to merge sort in there at all.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [HACKERS] qsort again

2006-02-16 Thread Florian Weimer
* Neil Conway:

> On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote:
>> It seems clear that our qsort.c is doing a pretty awful job of picking
>> qsort pivots, while glibc is mostly managing not to make that mistake.
>> I haven't looked at the glibc code yet to see what they are doing
>> differently.
>
> glibc qsort is actually merge sort, so I'm not surprised it avoids this
> problem.

qsort also performs twice as many key comparisons as the theoretical
minimum.  If key comparison is not very cheap, other schemes (like
heapsort, for example) are more attractive.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [HACKERS] qsort again

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 01:10:48PM +0100, Florian Weimer wrote:
> * Neil Conway:
> 
> > On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote:
> >> It seems clear that our qsort.c is doing a pretty awful job of picking
> >> qsort pivots, while glibc is mostly managing not to make that mistake.
> >> I haven't looked at the glibc code yet to see what they are doing
> >> differently.
> >
> > glibc qsort is actually merge sort, so I'm not surprised it avoids this
> > problem.
> 
> qsort also performs twice as many key comparisons as the theoretical
> minimum.  If key comparison is not very cheap, other schemes (like
> heapsort, for example) are more attractive.

Last time around there were a number of different algorithms tested.
Did anyone run those tests while getting it to count the number of
actual comparisons (which could easily swamp the time taken to do the
actual sort in some cases)?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.




Re: [PERFORM] [HACKERS] qsort again

2006-02-16 Thread Sven Geisler

Martijn van Oosterhout schrieb:


Last time around there were a number of different algorithms tested.
Did anyone run those tests while getting it to count the number of
actual comparisons (which could easily swamp the time taken to do the
actual sort in some cases)?



The last time I did such tests was almost 10 years ago. I was using 
Metrowerks CodeWarrior C/C++, whose libc used quicksort.
Anyhow, I tested a few algorithms including merge sort and heapsort. I 
ended up with heapsort because it was the fastest algorithm for our problem: 
we joined two arrays, each of which was already sorted, and ran qsort to 
sort the new array.


Sven.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Ron

At 06:35 AM 2/16/2006, Steinar H. Gunderson wrote:

On Wed, Feb 15, 2006 at 11:30:54PM -0500, Ron wrote:
> Even better (and more easily scaled as the number of GPR's in the CPU
> changes) is to use
> the set {L; L+1; L+2; t>>1; R-2; R-1; R}
> This means that instead of 7 random memory accesses, we have 3; two
> of which result in a burst access for three elements each.

Isn't that improvement going to disappear completely if you choose a bad
pivot?

Only if you _consistently_ (read: "the vast majority of the time"; 
quicksort is actually darn robust) choose a _pessimal_, not just 
"bad", pivot will quicksort degenerate to the O(N^2) behavior 
everyone worries about.  See Cormen & Rivest for a proof of this.
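
For illustration, here is a minimal, self-contained C sketch of the "median 
of a small fixed sample" style of pivot selection being discussed. It has 
nothing to do with pg's qsort.c; the element type and sample positions (the 
ends plus the midpoint, roughly like the set above) are placeholders, and 
unlike the scheme described in this thread it throws away the ordering work 
done on the sample instead of reusing it during partitioning.

#include <stdio.h>
#include <stddef.h>

/* Pick a pivot as the median of 7 sampled positions of a[lo..hi].
 * Assumes hi - lo >= 6 so all sampled positions are in range.
 * Purely illustrative sketch. */
static int median_of_sample(const int *a, size_t lo, size_t hi)
{
    size_t mid = lo + (hi - lo) / 2;
    size_t pos[7] = { lo, lo + 1, lo + 2, mid, hi - 2, hi - 1, hi };
    int s[7];
    size_t i, j;

    for (i = 0; i < 7; i++)
        s[i] = a[pos[i]];

    /* insertion-sort the 7 sampled values: tiny, register friendly */
    for (i = 1; i < 7; i++)
    {
        int v = s[i];
        for (j = i; j > 0 && s[j - 1] > v; j--)
            s[j] = s[j - 1];
        s[j] = v;
    }
    return s[3];                /* the median of the sample */
}

int main(void)
{
    int a[] = { 9, 4, 7, 1, 8, 3, 6, 2, 5, 0 };

    printf("pivot = %d\n", median_of_sample(a, 0, 9));
    return 0;
}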


Even then, doing things as above has benefits:
1= The worst case is less bad since the guaranteed O(lgs!) pivot 
choosing algorithm puts s elements into final position.

Worst case becomes better than O(N^2/(s-1)).

2=  The overhead of pivot choosing can overshadow its benefits over 
more traditional methods for even moderate values of s.  See 
discussions on the quicksort variant known as "samplesort" and 
Sedgewick's PhD thesis for details.  Using a pivot choosing algorithm 
that actually does some of the partitioning (and does it more 
efficiently than the "usual" partitioning algorithm does) plus using 
partition-in-place (rather than Lomuto's method) reduces overhead 
very effectively (at the "cost" of more complicated / delicate to get 
right partitioning code).  The above reduces the number of moves used 
in a quicksort pass considerably regardless of the number of compares used.


3= Especially in modern systems where the gap between internal CPU 
bandwidth and memory bandwidth is so great, the overhead of memory 
accesses for comparisons and moves is the majority of the overhead 
for both the pivot choosing and the partitioning algorithms within 
quicksort.  Particularly random memory accesses.  The reason (#GPRs - 
1) is a magic constant is that it's the most you can compare and move 
using only register-to-register operations.


In addition, replacing as many of the memory accesses you must do 
with sequential rather than random memory accesses is a big deal: 
sequential memory access is measured in 10's of CPU cycles while 
random memory access is measured in hundreds of CPU cycles.  It's no 
accident that the advances in Gray's sorting contest have involved 
algorithms that are both register and cache friendly, minimizing 
overall memory access and using sequential memory access as much as 
possible when said access can not be avoided.  As caches grow larger 
and memory accesses more expensive, it's often worth it to use a 
BucketSort+QuickSort hybrid rather than just QuickSort.


...and of course if you know enough about the data to be sorted so as 
to constrain it appropriately, one should use a non comparison based 
O(N) sorting algorithm rather than any of the general comparison 
based O(NlgN) methods.




> SIDE NOTE: IIRC glibc's qsort is actually merge sort.  Merge sort
> performance is insensitive to all inputs, and there are ways to
> optimize it as well.

glibc-2.3.5/stdlib/qsort.c:

  /* Order size using quicksort.  This implementation incorporates
 four optimizations discussed in Sedgewick:

I can't see any references to merge sort in there at all.

Well, then I'm not the only person on the lists whose memory is faulty ;-)

The up side of MergeSort is that its performance is always O(NlgN).
The down sides are that it is far more memory hungry than QuickSort and slower.


Ron




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Postgres slower than MS ACCESS

2006-02-16 Thread Peter Childs
On 15/02/06, Jay Greenfield <[EMAIL PROTECTED]> wrote:
I've been vacuuming between each test run. Not vacuuming results in times all the way up to 121 minutes. For a direct comparison with Access, the vacuuming time with Postgres should really be included as this is not required with Access.


Hmm, but then you would have to include Access's vacuum too. I think you
will find "Tools -> Database Utils -> Compact Database" performs
a similar purpose and is just as important, as I've seen many Access
databases bloat in my time.

Peter Childs 



Re: [PERFORM] [HACKERS] qsort again

2006-02-16 Thread Ron

At 07:10 AM 2/16/2006, Florian Weimer wrote:

* Neil Conway:

> On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote:
>> It seems clear that our qsort.c is doing a pretty awful job of picking
>> qsort pivots, while glibc is mostly managing not to make that mistake.
>> I haven't looked at the glibc code yet to see what they are doing
>> differently.
>
> glibc qsort is actually merge sort, so I'm not surprised it avoids this
> problem.

qsort also performs twice as many key comparisons as the theoretical
minimum.


The theoretical minimum number of comparisons for a general purpose 
comparison based sort is O(lgN!).
QuickSort uses 2NlnN ~= 1.38NlgN or ~1.38x the optimum without tuning 
(see Knuth, Sedgewick, Cormen, ... etc)
OTOH, QuickSort uses ~2x as many =moves= as the theoretical minimum 
unless tuned, and moves are more expensive than compares in modern systems.
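
For reference, the arithmetic behind those two figures (standard results, 
nothing pg-specific): by Stirling's approximation and the change of 
logarithm base,

    \lg N! \approx N \lg N - N \lg e
    2 N \ln N = (2 \ln 2)\, N \lg N \approx 1.386\, N \lg N

so the "~1.38x" is just the ratio of the two leading terms.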


See my other posts for QuickSort tuning methods that attempt to 
directly address both issues.



Ron 




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Markus Schaber
Hi, Ron,

Ron wrote:

> ...and of course if you know enough about the data to be sorted so as to
> constrain it appropriately, one should use a non comparison based O(N)
> sorting algorithm rather than any of the general comparison based
> O(NlgN) methods.

Sounds interesting, could you give us some pointers (names, URLs,
papers) to such algorithms?

Thanks a lot,
Markus



-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote:
> 3= Especially in modern systems where the gap between internal CPU 
> bandwidth and memory bandwidth is so great, the overhead of memory 
> accesses for comparisons and moves is the majority of the overhead 
> for both the pivot choosing and the partitioning algorithms within 
> quicksort.  Particularly random memory accesses.  The reason (#GPRs - 
> 1) is a magic constant is that it's the most you can compare and move 
> using only register-to-register operations.

But how much of this applies to us? We're not sorting arrays of
integers, we're sorting pointers to tuples. So while moves cost very
little, a comparison costs hundreds, maybe thousands of cycles. A tuple
can easily be two or three cachelines and you're probably going to
access all of it, not to mention the Fmgr structures and the Datums
themselves.

None of this is cache friendly. The actual tuples themselves could be
spread all over memory (I don't think any particular effort is expended
trying to minimize fragmentation).

Do these algorithms discuss the case where a comparison is more than
1000 times the cost of a move?

Where this does become interesting is where we can convert a datum to
an integer such that if f(A) > f(B) then A > B. Then we can sort on
f(X) first with just integer comparisons and then do a full tuple
comparison only if f(A) = f(B). This would be much more cache-coherent
and make these algorithms much more applicable in my mind.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.




Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Gary Doades
> "Gary Doades" <[EMAIL PROTECTED]> writes:
>> I think the reason I wasn't seeing performance issues with normal sort
>> operations is because they use work_mem not maintenance_work_mem which
>> was
>> only set to 2048 anyway. Does that sound right?
>
> Very probable.  Do you want to test the theory by jacking that up?  ;-)

Hmm, played around a bit. I have managed to get it to do a sort on one of
the "bad" columns using a select of two whole tables that results in a
sequential scan, sort and merge join. I also tried a simple select column
order by column for a bad column.

I tried varying maintenance_work_mem and work_mem up and down between 2048
and 65536 but I always get similar results. The sort phase always takes 4
to 5 seconds which seems about right for 900,000 rows.

This was on a column that took 12 minutes to create an index on.

I've no idea why it should behave this way, but probably explains why I
(and others) may not have noticed it before.

Regards,
Gary.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Ron

At 09:48 AM 2/16/2006, Martijn van Oosterhout wrote:

On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote:
> 3= Especially in modern systems where the gap between internal CPU
> bandwidth and memory bandwidth is so great, the overhead of memory
> accesses for comparisons and moves is the majority of the overhead
> for both the pivot choosing and the partitioning algorithms within
> quicksort.  Particularly random memory accesses.  The reason (#GPRs -
> 1) is a magic constant is that it's the most you can compare and move
> using only register-to-register operations.

But how much of this applies to us? We're not sorting arrays of
integers, we're sorting pointers to tuples. So while moves cost very
little, a comparison costs hundreds, maybe thousands of cycles. A tuple
can easily be two or three cachelines and you're probably going to
access all of it, not to mention the Fmgr structures and the Datums
themselves.
Pointers are simply fixed size 32b or 64b quantities.  They are 
essentially integers.  Comparing and moving pointers or fixed size 
keys to those pointers is exactly the same problem as comparing and 
moving integers.


Comparing =or= moving the actual data structures is a much more 
expensive and variable cost proposition.  I'm sure that pg's sort 
functionality is written intelligently enough that the only real data 
moves are done in a final pass after the exact desired order has been 
found using pointer compares and (re)assignments during the sorting 
process.  That's a standard technique for sorting data whose "key" or 
pointer is much smaller than a datum.


Your cost comment basically agrees with mine regarding the cost of 
random memory accesses.  The good news is that the number of datums 
to be examined during the pivot choosing process is small enough that 
the datums can fit into CPU cache while the pointers to them can be 
assigned to registers: making pivot choosing +very+ fast when done correctly.


As you've noted, actual partitioning is going to be more expensive 
since it involves accessing enough actual datums that they can't all 
fit into CPU cache.  The good news is that QuickSort has a very 
sequential access pattern within its inner loop.  So while we must go 
to memory for compares, we are at least keeping that cost down to 
a minimum.  In addition, said access is nice enough to be very 
prefetch and CPU cache hierarchy friendly.




None of this is cache friendly. The actual tuples themselves could be
spread all over memory (I don't think any particular effort is expended
trying to minimize fragmentation).
It probably would be worth it to spend some effort on memory layout 
just as we do for HD layout.




Do these algorithms discuss the case where a comparison is more than
1000 times the cost of a move?
A move is always more expensive than a compare when the datum is 
larger than its pointer or key.  A move is always more expensive than 
a compare when it involves memory to memory movement rather than CPU 
location to CPU location movement.  A move is especially more 
expensive than a compare when it involves both factors.  Most moves 
do involve both.


What I suspect you meant is that a key comparison that involves 
accessing the data in memory is more expensive than reassigning the 
pointers associated with those keys.   That is certainly true.


Yes.  The problem has been extensively studied. ;-)



Where this does become interesting is where we can convert a datum to
an integer such that if f(A) > f(B) then A > B. Then we can sort on
f(X) first with just integer comparisons and then do a full tuple
comparison only if f(A) = f(B). This would be much more cache-coherent
and make these algorithms much more applicable in my mind.

In fact we can do better.
Using hash codes or what-not to map datums to keys and then sorting 
just the keys and the pointers to those datums followed by an 
optional final pass where we do the actual data movement is also a 
standard technique for handling large data structures.



Regardless of what tweaks beyond the basic algorithms we use, the 
algorithms themselves have been well studied and their performance 
well established.  QuickSort is the best performing of the O(nlgn) 
comparison based sorts and it uses fewer resources than HeapSort or MergeSort.


Ron



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Tom Lane
Ron <[EMAIL PROTECTED]> writes:
> Your cost comment basically agrees with mine regarding the cost of 
> random memory accesses.  The good news is that the number of datums 
> to be examined during the pivot choosing process is small enough that 
> the datums can fit into CPU cache while the pointers to them can be 
> assigned to registers: making pivot choosing +very+ fast when done correctly.

This is more or less irrelevant given that comparing the pointers is not
the operation we need to do.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Craig A. James

Markus Schaber wrote:

Ron wrote:

...and of course if you know enough about the data to be sorted so as to
constrain it appropriately, one should use a non comparison based O(N)
sorting algorithm rather than any of the general comparison based
O(NlgN) methods.


Sounds interesting, could you give us some pointers (names, URLs,
papers) to such algorithms?


Most of these techniques boil down to good ol' "bucket sort".  A simple example: suppose 
you have a column of integer percentages, range zero to 100.  You know there are only 101 distinct 
values.  So create 101 "buckets" (e.g. linked lists), make a single pass through your 
data and drop each row's ID into the right bucket, then make a second pass through the buckets, and 
write the row ID's out in bucket order.  This is an O(N) sort technique.

Any time you have a restricted data range, you can do this.  Say you have 100 
million rows of scientific results known to be good to only three digits -- it 
can have at most 1,000 distinct values (regardless of the magnitude of the 
values), so you can do this with 1,000 buckets and just two passes through the 
data.
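
To make the two-pass idea concrete, here is a small self-contained C sketch 
of the percentage example (a counting variant of the linked-list buckets 
described above; illustrative only, nothing to do with pg's sort code):

#include <stdio.h>

/* Order row IDs by an integer key known to lie in 0..100.
 * Two passes over the data and no comparisons: O(N).
 * 'pct' holds each row's percentage, 'out' receives row IDs in sorted order. */
static void bucket_sort_rowids(const int *pct, size_t n, size_t *out)
{
    size_t counts[101] = { 0 };
    size_t start[101];
    size_t i, running = 0;

    for (i = 0; i < n; i++)             /* pass 1: bucket sizes */
        counts[pct[i]]++;
    for (i = 0; i < 101; i++)           /* where each bucket begins */
    {
        start[i] = running;
        running += counts[i];
    }
    for (i = 0; i < n; i++)             /* pass 2: drop each row ID in place */
        out[start[pct[i]]++] = i;
}

int main(void)
{
    int    pct[] = { 55, 10, 99, 10, 0, 55 };
    size_t n = sizeof(pct) / sizeof(pct[0]);
    size_t out[6];
    size_t i;

    bucket_sort_rowids(pct, n, out);
    for (i = 0; i < n; i++)
        printf("row %zu (pct=%d)\n", out[i], pct[out[i]]);
    return 0;
}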

You can also use this trick when the optimizer is asked for "fastest first result."  Say you have a cursor on a column of numbers with good distribution.  If you do a bucket sort on the first two or three digits only, you know the first "page" of results will be in the first bucket.  So you only need to apply qsort to that first bucket (which is very fast, since it's small), and you can deliver the first page of data to the application.  This can be particularly effective in interactive situations, where the user typically looks at a few pages of data and then abandons the search.  


I doubt this is very relevant to Postgres.  A relational database has to be general 
purpose, and it's hard to give it "hints" that would tell it when to use this 
particular optimization.

Craig

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Ron

At 10:52 AM 2/16/2006, Ron wrote:

At 09:48 AM 2/16/2006, Martijn van Oosterhout wrote:


Where this does become interesting is where we can convert a datum to
an integer such that if f(A) > f(B) then A > B. Then we can sort on
f(X) first with just integer comparisons and then do a full tuple
comparison only if f(A) = f(B). This would be much more cache-coherent
and make these algorithms much more applicable in my mind.

In fact we can do better.
Using hash codes or what-not to map datums to keys and then sorting 
just the keys and the pointers to those datums followed by an 
optional final pass where we do the actual data movement is also a 
standard technique for handling large data structures.

I thought some follow up might be in order here.

Let's pretend that we have the typical DB table where rows are ~2-4KB 
apiece.  1TB of storage will let us have 256M-512M rows in such a table.


A 32b hash code can be assigned to each row value such that only 
exactly equal rows will have the same hash code.

A 32b pointer can locate any of the 256M-512M rows.

Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b+32b= 
64b*(2^28 to 2^29)=  2-4GB of pointers+keys followed by an optional 
pass to rearrange the actual rows if we so wish.


We get the same result while only examining and manipulating 1/50 to 
1/25 as much data during the sort.


If we want to spend more CPU time in order to save more space, we can 
compress the key+pointer representation.  That usually reduces the 
amount of data to be manipulated to ~1/4 the original key+pointer 
representation, reducing things to ~512M-1GB worth of compressed 
pointers+keys.  Or ~1/200 - ~1/100 the original amount of data we 
were discussing.


Either representation is small enough to fit within RAM rather than 
requiring HD IO, so we solve the HD IO bottleneck in the best 
possible way: we avoid ever doing it.


Ron   




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 11:32:55AM -0500, Ron wrote:
> At 10:52 AM 2/16/2006, Ron wrote:
> >In fact we can do better.
> >Using hash codes or what-not to map datums to keys and then sorting 
> >just the keys and the pointers to those datums followed by an 
> >optional final pass where we do the actual data movement is also a 
> >standard technique for handling large data structures.

Or in fact required if the Datums are not all the same size, which is
the case in PostgreSQL.

> I thought some follow up might be in order here.
> 
> Let's pretend that we have the typical DB table where rows are ~2-4KB 
> apiece.  1TB of storage will let us have 256M-512M rows in such a table.
> 
> A 32b hash code can be assigned to each row value such that only 
> exactly equal rows will have the same hash code.
> A 32b pointer can locate any of the 256M-512M rows.

That hash code is impossible the way you state it, since the set of
strings is not mappable to a 32bit integer. You probably meant that a
hash code can be assigned such that equal rows have equal hashes (drop
the only).

> Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b+32b= 
> 64b*(2^28 to 2^29)=  2-4GB of pointers+keys followed by an optional 
> pass to rearrange the actual rows if we so wish.
> 
> We get the same result while only examining and manipulating 1/50 to 
> 1/25 as much data during the sort.

But this is what we do now. The tuples are loaded, we sort an array of
pointers, then we write the output. Except we don't have the hash, so
we require access to the 1TB of data to do the actual comparisons. Even
if we did have the hash, we'd *still* need access to the data to handle
tie-breaks.

That's why your comment about moves always being more expensive than
compares makes no sense. A move can be achieved simply by swapping two
pointers in the array. A compare actually needs to call all sorts of
functions. If and only if we have functions for every data type to
produce an ordered hash, we can optimise sorts based on single
integers.

For reference, look at comparetup_heap(). It's just 20 lines, but each
function call there expands to maybe a dozen lines of code. And it has
a loop. I don't think we're anywhere near the stage where locality of
reference makes much difference.

We very rarely need the tuples actualised in memory in the required
order; just the pointers are enough.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.




[PERFORM] Why does not perform index combination

2006-02-16 Thread Adnan DURSUN



Hi all,
 
  I have a query for a report. The explain analyze 
result is below. The execution plan shows that it uses the 
"t_koltuk_islem_pkey" index to scan table "t_koltuk_islem". However, 
there is another index on "t_koltuk_islem", on column "islem_tarihi", 
that could be combined in the plan. Why doesn't the optimizer choose 
it? It prefers to perform a filter on column "islem_tarihi" instead. 
Why?
 
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
"Nested Loop  (cost=0.00..2411.48 rows=14 width=797) (actual time=117.427..4059.351 rows=55885 loops=1)"
"  ->  Nested Loop  (cost=0.00..35.69 rows=1 width=168) (actual time=0.124..8.714 rows=94 loops=1)"
"        Join Filter: (("outer".sefer_tip_kod = "inner".kod) AND (("outer".firma_no)::text = ("inner".firma_no)::text))"
"        ->  Nested Loop  (cost=0.00..34.64 rows=1 width=154) (actual time=0.114..7.555 rows=94 loops=1)"
"              ->  Nested Loop  (cost=0.00..30.18 rows=1 width=144) (actual time=0.106..6.654 rows=94 loops=1)"
"                    ->  Nested Loop  (cost=0.00..25.71 rows=1 width=134) (actual time=0.089..5.445 rows=94 loops=1)"
"                          Join Filter: ((("inner"."no")::text = ("outer".hat_no)::text) AND (("inner".firma_no)::text = ("outer".firma_no)::text))"
"                          ->  Nested Loop  (cost=0.00..24.21 rows=1 width=116) (actual time=0.063..1.632 rows=94 loops=1)"
"                                Join Filter: (("outer".kod)::text = ("inner".durumu)::text)"
"                                ->  Seq Scan on t_domains d2  (cost=0.00..2.21 rows=2 width=18) (actual time=0.029..0.056 rows=2 loops=1)"
"                                      Filter: ((name)::text = 'SFR_DURUMU'::text)"
"                                ->  Nested Loop  (cost=0.00..10.91 rows=7 width=103) (actual time=0.028..0.649 rows=94 loops=2)"
"                                      Join Filter: (("outer".kod)::text = ("inner".ek_dev)::text)"
"                                      ->  Seq Scan on t_domains d1  (cost=0.00..2.21 rows=2 width=18) (actual time=0.017..0.046 rows=2 loops=2)"
"                                            Filter: ((name)::text = 'EKDEV'::text)"
"                                      ->  Seq Scan on t_seferler s  (cost=0.00..3.17 rows=94 width=90) (actual time=0.003..0.160 rows=94 loops=4)"
"                                            Filter: ((iptal)::text = 'H'::text)"
"                          ->  Seq Scan on t_hatlar h  (cost=0.00..1.20 rows=20 width=18) (actual time=0.002..0.020 rows=20 loops=94)"
"                    ->  Index Scan using t_yer_pkey on t_yer y2  (cost=0.00..4.45 rows=1 width=14) (actual time=0.008..0.009 rows=1 loops=94)"
"                          Index Cond: ("outer".varis_yer_kod = y2.kod)"
"                          Filter: ((iptal)::text = 'H'::text)"
"              ->  Index Scan using t_yer_pkey on t_yer y1  (cost=0.00..4.45 rows=1 width=14) (actual time=0.004..0.006 rows=1 loops=94)"
"                    Index Cond: ("outer".kalkis_yer_kod = y1.kod)"
"                    Filter: ((iptal)::text = 'H'::text)"
"        ->  Seq Scan on t_sefer_tip t  (cost=0.00..1.02 rows=2 width=18) (actual time=0.002..0.006 rows=2 loops=94)"
"              Filter: ((iptal)::text = 'H'::text)"
"  ->  Index Scan using t_koltuk_islem_pkey on t_koltuk_islem i  (cost=0.00..2375.10 rows=39 width=644) (actual time=38.151..41.881 rows=595 loops=94)"
"        Index Cond: ((("outer".firma_no)::text = (i.firma_no)::text) AND (("outer".hat_no)::text = (i.hat_no)::text) AND ("outer".kod = i.sefer_kod))"
"        Filter: ((islem_tarihi >= '2006-01-17'::date) AND (islem_tarihi <= '2006-02-16'::date))"
"Total runtime: 4091.242 ms"
Best Regards
 

Adnan DURSUN
ASRIN Bilişim Ltd.Şti
Ankara / TURKEY


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Tom Lane
"Craig A. James" <[EMAIL PROTECTED]> writes:
> You can also use this trick when the optimizer is asked for "fastest first 
> result."  Say you have a cursor on a column of numbers with good 
> distribution.  If you do a bucket sort on the first two or three digits only, 
> you know the first "page" of results will be in the first bucket.  So you 
> only need to apply qsort to that first bucket (which is very fast, since it's 
> small), and you can deliver the first page of data to the application.  This 
> can be particularly effective in interactive situations, where the user 
> typically looks at a few pages of data and then abandons the search.  

> I doubt this is very relevant to Postgres.  A relational database has to be 
> general purpose, and it's hard to give it "hints" that would tell it when to 
> use this particular optimization.

Actually, LIMIT does nicely for that hint; the PG planner has definitely
got a concept of preferring fast-start plans for limited queries.  The
real problem in applying bucket-sort ideas is the lack of any
datatype-independent way of setting up the buckets.

Once or twice we've kicked around the idea of having some
datatype-specific sorting code paths alongside the general-purpose one,
but I can't honestly see this as being workable from a code maintenance
standpoint.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Why does not perform index combination

2006-02-16 Thread Tom Lane
"Adnan DURSUN" <[EMAIL PROTECTED]> writes:
>   I have a query for a report. The explain analyze result is below. The
> execution plan shows that it uses the "t_koltuk_islem_pkey" index to scan
> table "t_koltuk_islem". However, there is another index on "t_koltuk_islem",
> on column "islem_tarihi", that could be combined in the plan. Why doesn't
> the optimizer choose it? It prefers to perform a filter on column
> "islem_tarihi" instead. Why?

Probably thinks that the extra index doesn't add enough selectivity to
be worth scanning.  It's probably right, too --- maybe with a narrower
date range the answer would be different.

I think the main problem in this plan is the poor estimation of the size
of the d1/s join.  Are your stats up to date on those tables?  Maybe
boosting the statistics target for one or both would help.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)

2006-02-16 Thread Tom Lane
"Gary Doades" <[EMAIL PROTECTED]> writes:
> I think the reason I wasn't seeing performance issues with normal sort
> operations is because they use work_mem not maintenance_work_mem which was
> only set to 2048 anyway. Does that sound right?

Very probable.  Do you want to test the theory by jacking that up?  ;-)

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Scott Lamb

On Feb 16, 2006, at 8:32 AM, Ron wrote:
Let's pretend that we have the typical DB table where rows are  
~2-4KB apiece.  1TB of storage will let us have 256M-512M rows in  
such a table.


A 32b hash code can be assigned to each row value such that only  
exactly equal rows will have the same hash code.

A 32b pointer can locate any of the 256M-512M rows.

Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b 
+32b= 64b*(2^28 to 2^29)=  2-4GB of pointers+keys followed by an  
optional pass to rearrange the actual rows if we so wish.


I don't understand this.

This is a true statement: (H(x) != H(y)) => (x != y)
This is not: (H(x) < H(y)) => (x < y)

Hash keys can tell you there's an inequality, but they can't tell you  
how the values compare. If you want 32-bit keys that compare in the  
same order as the original values, here's how you have to get them:


(1) sort the values into an array
(2) use each value's array index as its key

It reduces to the problem you're trying to use it to solve.


--
Scott Lamb 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Ron

At 12:19 PM 2/16/2006, Scott Lamb wrote:

On Feb 16, 2006, at 8:32 AM, Ron wrote:

Let's pretend that we have the typical DB table where rows are
~2-4KB apiece.  1TB of storage will let us have 256M-512M rows in
such a table.

A 32b hash code can be assigned to each row value such that only
exactly equal rows will have the same hash code.
A 32b pointer can locate any of the 256M-512M rows.

Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b 
+32b= 64b*(2^28 to 2^29)=  2-4GB of pointers+keys followed by an

optional pass to rearrange the actual rows if we so wish.


I don't understand this.

This is a true statement: (H(x) != H(y)) => (x != y)
This is not: (H(x) < H(y)) => (x < y)

Hash keys can tell you there's an inequality, but they can't tell you
how the values compare. If you want 32-bit keys that compare in the
same order as the original values, here's how you have to get them:
For most hash codes, you are correct.  There is a class of hash or 
hash-like codes that maintains the mapping to support that second statement.


More later when I can get more time.
Ron 




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Neil Conway
On Thu, 2006-02-16 at 12:35 +0100, Steinar H. Gunderson wrote:
> glibc-2.3.5/stdlib/qsort.c:
> 
>   /* Order size using quicksort.  This implementation incorporates
>  four optimizations discussed in Sedgewick:
> 
> I can't see any references to merge sort in there at all.

stdlib/qsort.c defines _quicksort(), not qsort(), which is defined by
msort.c. On looking closer, it seems glibc actually tries to determine
the physical memory in the machine -- if it is sorting a single array
that exceeds 1/4 of the machine's physical memory, it uses quick sort,
otherwise it uses merge sort.

-Neil



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Why does not perform index combination

2006-02-16 Thread Adnan DURSUN





>From: Tom Lane
>Date: 02/16/06 19:29:21
>To: Adnan DURSUN
>Cc: pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] Why does not perform index combination
 
>"Adnan DURSUN" <[EMAIL PROTECTED]> writes:
>>   I have a query for a report. The explain analyze result is below. The
>> execution plan shows that it uses the "t_koltuk_islem_pkey" index to scan
>> table "t_koltuk_islem". However, there is another index on "t_koltuk_islem",
>> on column "islem_tarihi", that could be combined in the plan. Why doesn't
>> the optimizer choose it? It prefers to perform a filter on column
>> "islem_tarihi" instead. Why?
 
>Probably thinks that the extra index doesn't add enough selectivity to
>be worth scanning.  It's probably right, too --- maybe with a narrower
>date range the answer would be different.
 
    Yes, a narrower date range solves that. Thanks for your suggestions...
 
>I think the main problem in this plan is the poor estimation of the size
>of the d1/s join.  Are your stats up to date on those tables?  Maybe
>boosting the statistics target for one or both would help.
 
    The database was vacuumed and analyzed before the plan was taken.
 
Regards
Adnan DURSUN
 


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Mark Lewis
On Thu, 2006-02-16 at 12:15 -0500, Tom Lane wrote:
> Once or twice we've kicked around the idea of having some
> datatype-specific sorting code paths alongside the general-purpose one,
> but I can't honestly see this as being workable from a code maintenance
> standpoint.
> 
>   regards, tom lane


It seems that instead of maintaining a different sorting code path for
each data type, you could get away with one generic path and one
(hopefully faster) path if you allowed data types to optionally support
a 'sortKey' interface by providing a function f which maps inputs to 32-
bit int outputs, such that the following two properties hold:

f(a)>=f(b) iff a>=b
if a==b then f(a)==f(b)

So if a data type supports the sortKey interface you could perform the
sort on f(value) and only refer back to the actual element comparison
functions when two sortKeys have the same value.

Data types which could probably provide a useful function for f would be
int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).

Depending on the overhead, you might not even need to maintain 2
independent search code paths, since you could always use f(x)=0 as the
default sortKey function which would degenerate to the exact same sort
behavior in use today.
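
A rough sketch of what such a path could look like, as standalone C rather
than anything resembling pg's tuplesort code (all names here are invented;
int8 stands in for an arbitrary datum). It uses a weaker, collision-tolerant
variant of the properties above (a < b implies f(a) <= f(b), and equal values
get equal keys) and falls back to the full comparison only on key ties:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct
{
    uint32_t key;               /* the cheap 32-bit sortKey */
    int64_t  datum;             /* stand-in for a pointer to the real tuple */
} sort_item;

/* Example sortKey for int8: flip the sign bit so unsigned order matches
 * signed order, then keep the top 32 bits.  Collisions are possible, so
 * ties must be re-checked with the real comparison. */
static uint32_t int8_sortkey(int64_t v)
{
    return (uint32_t) (((uint64_t) v ^ (UINT64_C(1) << 63)) >> 32);
}

static int full_compare(int64_t a, int64_t b)       /* the "slow" comparator */
{
    return (a > b) - (a < b);
}

static int cmp_item(const void *pa, const void *pb)
{
    const sort_item *a = pa, *b = pb;

    if (a->key != b->key)
        return (a->key > b->key) ? 1 : -1;
    return full_compare(a->datum, b->datum);        /* rare tie-break */
}

int main(void)
{
    int64_t   vals[] = { 5, -7, 1LL << 40, -(1LL << 40), 5, 0 };
    size_t    n = sizeof(vals) / sizeof(vals[0]), i;
    sort_item items[6];

    for (i = 0; i < n; i++)
    {
        items[i].key = int8_sortkey(vals[i]);
        items[i].datum = vals[i];
    }
    qsort(items, n, sizeof(items[0]), cmp_item);
    for (i = 0; i < n; i++)
        printf("%lld\n", (long long) items[i].datum);
    return 0;
}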

-- Mark Lewis

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Markus Schaber
Hi, Mark,

Mark Lewis schrieb:

> It seems that instead of maintaining a different sorting code path for
> each data type, you could get away with one generic path and one
> (hopefully faster) path if you allowed data types to optionally support
> a 'sortKey' interface by providing a function f which maps inputs to 32-
> bit int outputs, such that the following two properties hold:
> 
> f(a)>=f(b) iff a>=b
> if a==b then f(a)==f(b)

Hmm, to remove redundancy, I'd change the <= to a < and define:

if a==b then f(a)==f(b)
if a<b then f(a)<=f(b)

> Data types which could probably provide a useful function for f would be
> int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).

With int2 or some restricted ranges of oid and int4, we could even
implement a bucket sort.

Markus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 02:17:36PM -0800, Mark Lewis wrote:
> It seems that instead of maintaining a different sorting code path for
> each data type, you could get away with one generic path and one
> (hopefully faster) path if you allowed data types to optionally support
> a 'sortKey' interface by providing a function f which maps inputs to 32-
> bit int outputs, such that the following two properties hold:
> 
> f(a)>=f(b) iff a>=b
> if a==b then f(a)==f(b)

Note this is a property of the collation, not the type. For example
strings can be sorted in many ways and the sortKey must reflect that.
So in postgres terms it's a property of the btree operator class.

It's something I'd like to do if I get A Round Tuit. :)

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.




Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Greg Stark
Markus Schaber <[EMAIL PROTECTED]> writes:

> Hmm, to remove redundancy, I'd change the <= to a < and define:
> 
> if a==b then f(a)==f(b)
> if a<b then f(a)<=f(b)
> 
> > Data types which could probably provide a useful function for f would be
> > int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).

How exactly do you imagine doing this for text?

I could see doing it for char(n)/varchar(n) where n<=4 in SQL_ASCII though.

-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread PFC




It seems that instead of maintaining a different sorting code path for
each data type, you could get away with one generic path and one
(hopefully faster) path if you allowed data types to optionally support
a 'sortKey' interface by providing a function f which maps inputs to 32-
bit int outputs, such that the following two properties hold:


	Looks like the decorate-sort-undecorate pattern, which works quite well.  
Good idea.
	I would have said a 64 bit int, but it's the same idea. However it won't  
work for floats, which is a pity, because floats fit in 64 bits. Unless  
more types creep into the code path (which would not necessarily make it  
that much slower).
	As for text, the worst case is when all strings start with the same 8  
letters, but a good case pops up when a few-letter code is used as a key  
in a table. Think about a zipcode, for instance. If a merge join needs to  
sort on zipcodes, it might as well sort on 64-bit integers...


	By the way, I'd like to declare my zipcode columns as SQL_ASCII while the  
rest of my database is in UNICODE, so they are faster to index and sort.  
Come on, MySQL does it...


Keep up !

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Mark Lewis
On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote:
> > > Data types which could probably provide a useful function for f would be
> > > int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).
> 
> How exactly do you imagine doing this for text?
> 
> I could see doing it for char(n)/varchar(n) where n<=4 in SQL_ASCII though.


In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit
sortKey as elsewhere suggested).  The sorting key doesn't need to be a
one-to-one mapping.
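
A sketch of that mapping in C (illustrative only; the function name is made
up).  Shorter strings are zero padded, so for plain ASCII a < b never gives
key(a) > key(b), and equal keys simply fall through to a full strcmp():

#include <stdio.h>
#include <stdint.h>

/* Pack the first 4 bytes of a NUL-terminated ASCII string into a
 * big-endian 32-bit key.  Not one-to-one: "abcd" and "abcdef" collide. */
static uint32_t text_sortkey(const char *s)
{
    uint32_t k = 0;
    int      i;

    for (i = 0; i < 4 && s[i] != '\0'; i++)
        k |= (uint32_t) (unsigned char) s[i] << (8 * (3 - i));
    return k;
}

int main(void)
{
    printf("%08x %08x %08x\n",
           text_sortkey("abc"), text_sortkey("abcd"), text_sortkey("abcdef"));
    return 0;
}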

-- Mark Lewis

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Markus Schaber
Hi, PFC,

PFC schrieb:

> By the way, I'd like to declare my zipcode columns as SQL_ASCII
> while the  rest of my database is in UNICODE, so they are faster to
> index and sort.  Come on, MySQL does it...

Another use case for parametric column definitions - charset definitions
- and the first one that cannot be emulated via constraints.

Other use cases I remember were range definitions for numbers or PostGIS
dimension, subtype and SRID, but those cann all be emulated via checks /
constraints.

Markus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Steinar H. Gunderson
On Fri, Feb 17, 2006 at 12:05:23AM +0100, PFC wrote:
>   I would have said a 64 bit int, but it's the same idea. However it 
>   won't  work for floats, which is a pity, because floats fit in 64 bits. 

Actually, you can compare IEEE floats directly as ints, as long as they're
positive. (If they can be both positive and negative, you need to take
special care of the sign bit first, but it's still doable.)
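
For the record, that bit trick looks roughly like this (assumes IEEE-754
single precision and ignores NaNs; note that -0.0 and +0.0 get different
keys even though they compare equal as floats, so a tie-break on the real
values is still wanted):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Map a float to a uint32 whose unsigned order matches the float order.
 * Positive floats: set the sign bit, which lifts them above all negatives
 * while keeping their relative order.  Negative floats: flip all bits,
 * which both clears the top bit and reverses their order, as required. */
static uint32_t float_sortkey(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof(u));              /* raw bit pattern */
    return (u & UINT32_C(0x80000000)) ? ~u : (u | UINT32_C(0x80000000));
}

int main(void)
{
    float v[] = { -2.5f, -0.0f, 0.0f, 1.0f, 3.5f };
    int   i;

    for (i = 0; i < 5; i++)
        printf("%6.2f -> %08x\n", v[i], float_sortkey(v[i]));
    return 0;
}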

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread David Lang

On Thu, 16 Feb 2006, Mark Lewis wrote:


On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote:

Data types which could probably provide a useful function for f would be
int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).


How exactly do you imagine doing this for text?

I could see doing it for char(n)/varchar(n) where n<=4 in SQL_ASCII though.



In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit
sortKey as elsewhere suggested).  The sorting key doesn't need to be a
one-to-one mapping.


that would violate your second constraint ( f(a)==f(b) iff (a==b) )

if you could drop that constraint (the cost of which would be extra 'real' 
compares within a bucket) then a helper function per datatype could work 
as you are describing.


David Lang

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Ron

At 01:47 PM 2/16/2006, Ron wrote:

At 12:19 PM 2/16/2006, Scott Lamb wrote:

On Feb 16, 2006, at 8:32 AM, Ron wrote:

Let's pretend that we have the typical DB table where rows are
~2-4KB apiece.  1TB of storage will let us have 256M-512M rows in
such a table.

A 32b hash code can be assigned to each row value such that only
exactly equal rows will have the same hash code.
A 32b pointer can locate any of the 256M-512M rows.

Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b 
+32b= 64b*(2^28 to 2^29)=  2-4GB of pointers+keys followed by an

optional pass to rearrange the actual rows if we so wish.


I don't understand this.

This is a true statement: (H(x) != H(y)) => (x != y)
This is not: (H(x) < H(y)) => (x < y)

Hash keys can tell you there's an inequality, but they can't tell you
how the values compare. If you want 32-bit keys that compare in the
same order as the original values, here's how you have to get them:
For most hash codes, you are correct.  There is a class of hash or 
hash-like codes that maintains the mapping to support that second statement.


More later when I can get more time.
Ron


OK, so here's _a_ way (there are others) to obtain a mapping such that
 if a < b then f(a) < f(b) and
 if a == b then f(a) == f(b)

Pretend each row is an integer of row size (so a 2KB row becomes a 
16Kb integer; a 4KB row becomes a 32Kb integer; etc)
Since even a 1TB table made of such rows can only have 256M - 512M 
possible values even if each row is unique, a 28b or 29b key is large 
enough to represent each row's value and relative rank compared to 
all of the others even if all row values are unique.


By scanning the table once, we can map say 001h (Hex used to ease 
typing) to the row with the minimum value and 111h to the row 
with the maximum value as well as mapping everything in between to 
their appropriate keys.  That same scan can be used to assign a 
pointer to each record's location.


We can now sort the key+pointer pairs instead of the actual data and 
use an optional final pass to rearrange the actual rows if we wish.


That initial scan to set up the keys is expensive, but if we wish 
that cost can be amortized over the life of the table so we don't 
have to pay it all at once.  In addition, once we have created those 
keys, they can be saved for later searches and sorts.


Further space savings can be obtained whenever there are duplicate 
keys and/or when compression methods are used on the Key+pointer pairs.
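
A toy C sketch of the rank-key scheme above, with short strings standing in 
for rows (everything here is invented for illustration; a real scan over a 
table and persisting the keys are of course far more involved):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { unsigned key; size_t row; } keyptr;

static const char **g_rows;                 /* for the qsort comparator */

static int cmp_rowptr(const void *a, const void *b)
{
    size_t ra = *(const size_t *) a, rb = *(const size_t *) b;

    return strcmp(g_rows[ra], g_rows[rb]);
}

static int cmp_key(const void *a, const void *b)
{
    const keyptr *ka = a, *kb = b;

    return (ka->key > kb->key) - (ka->key < kb->key);
}

int main(void)
{
    static const char *rows[] = { "pear", "apple", "fig", "apple", "date" };
    size_t   n = sizeof(rows) / sizeof(rows[0]);
    size_t   idx[5];
    unsigned rank[5];           /* the order-preserving keys */
    keyptr   kp[5];
    size_t   i;

    /* The one expensive pass: order the row pointers once and record each
     * row's rank.  Equal rows get equal ranks, so a == b => key(a) == key(b),
     * and a < b => key(a) < key(b). */
    g_rows = rows;
    for (i = 0; i < n; i++)
        idx[i] = i;
    qsort(idx, n, sizeof(idx[0]), cmp_rowptr);
    for (i = 0; i < n; i++)
        rank[idx[i]] = (i > 0 && strcmp(rows[idx[i]], rows[idx[i - 1]]) == 0)
                       ? rank[idx[i - 1]] : (unsigned) i;

    /* Later sorts touch only the small key+pointer pairs, never the rows. */
    for (i = 0; i < n; i++)
    {
        kp[i].key = rank[i];
        kp[i].row = i;
    }
    qsort(kp, n, sizeof(kp[0]), cmp_key);
    for (i = 0; i < n; i++)
        printf("%s\n", rows[kp[i].row]);
    return 0;
}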


Ron







---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org