Re: [HACKERS] Using quicksort for every external sort run

2015-12-09 Thread Jeremy Harris
On 09/12/15 00:02, Jeff Janes wrote: > The second one consumes that giant tape run along with 232 small tape > runs. In terms of number of comparisons, binary merge works best when the inputs are of similar length. I'd assume the same goes for n-ary merge, but I don't know if comparison count is

Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-08 Thread Jeremy Harris
On 07/12/15 16:44, Simon Riggs wrote: > There are many optimizations we might adopt, yet planning time is a factor. > It seems simple enough to ignore more complex optimizations if we have > already achieved a threshold cost (say 10). Such a test would add nearly > zero time for the common case. We

Re: [HACKERS] Getting sorted data from foreign server

2015-10-08 Thread Jeremy Harris
On 08/10/15 17:09, Robert Haas wrote: > - You consider pushing down ORDER BY if any prefix of the query > pathkeys satisfy is_foreign_expr(), but that doesn't seem right to me. > If the user says SELECT * FROM remotetab ORDER BY a, unsafe(a), > ordering the result set by a doesn't help us much. We

Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-29 Thread Jeremy Harris
On 27/09/15 21:58, Gavin Wahl wrote: > Somewhat harder but still possible would be using BRIN indexes to > accelerate ORDER BY. This would require a sorting algorithm that can take > advantage of mostly-sorted inputs. You would sort the page ranges by their > minimum or maximum value, then feed the

Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

2015-08-01 Thread Jeremy Harris
On 31/07/15 18:31, Robert Haas wrote: > On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris wrote: >> Heapification is O(n) already, whether siftup (existing) or down. > > That's not my impression, or what Wikipedia says. Source? Measurements done last year: http://www.postg

[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"

2015-07-31 Thread Jeremy Harris
On 30/07/15 02:05, Peter Geoghegan wrote: > Since heapification is now a big fraction of the total cost of a sort > sometimes, even where the heap invariant need not be maintained for > any length of time afterwards, it might be worth revisiting the patch > to make that an O(n) rather than a O(log

Re: [HACKERS] Asynchronous DRAM Self-Refresh

2015-05-23 Thread Jeremy Harris
On 22/05/15 22:30, Josh Berkus wrote: > At CoreOS Fest, Intel presented about a technology which they used to > improve write times for the nonrelational data store Etcd. It's called > Asynchronous DRAM Self-Refresh, or ADR. This is supposedly a feature of > all of their chips since E5 which allo

Re: [HACKERS] Asynchronous DRAM Self-Refresh

2015-05-23 Thread Jeremy Harris
On 22/05/15 22:30, Josh Berkus wrote: > At CoreOS Fest, Intel presented about a technology which they used to > improve write times for the nonrelational data store Etcd. It's called > Asynchronous DRAM Self-Refresh, or ADR. This is supposedly a feature of > all of their chips since E5 which allo

Re: [HACKERS] Abbreviated keys for text cost model fix

2015-03-03 Thread Jeremy Harris
On 03/03/15 03:08, Arthur Silva wrote: > Does it always perform mergesort instead of quicksort when enabled? Yes; there seemed no advantage to any additional complexity. The merge consistently performs fewer comparisons than the quicksort, on random input - and many fewer if there are any sorted (

Re: [HACKERS] Abbreviated keys for text cost model fix

2015-03-02 Thread Jeremy Harris
On 25/02/15 00:32, Jeremy Harris wrote: > On 23/02/15 16:40, Tomas Vondra wrote: >> On 22.2.2015 22:30, Peter Geoghegan wrote: >>> You should try it with the data fully sorted like this, but with one >>> tiny difference: The very last tuple is out of order. How does t

Re: [HACKERS] Abbreviated keys for text cost model fix

2015-02-25 Thread Jeremy Harris
On 25/02/15 00:42, Peter Geoghegan wrote: > On Tue, Feb 24, 2015 at 4:32 PM, Jeremy Harris wrote: >> On 23/02/15 16:40, Tomas Vondra wrote: >>> On 22.2.2015 22:30, Peter Geoghegan wrote: >>>> You should try it with the data fully sorted like this, but with one >&

Re: [HACKERS] Abbreviated keys for text cost model fix

2015-02-24 Thread Jeremy Harris
On 23/02/15 16:40, Tomas Vondra wrote: > On 22.2.2015 22:30, Peter Geoghegan wrote: >> You should try it with the data fully sorted like this, but with one >> tiny difference: The very last tuple is out of order. How does that >> look? If this case is actually important, a merge-sort can take si

Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 14:54, Tom Lane wrote: > Jeremy Harris writes: >> On 14/11/14 00:46, Simon Riggs wrote: >>> Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) >>> -> Sort (cost= rows=568733 width=175) (actual time= >>> rows=20 l

Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 00:46, Simon Riggs wrote: > Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) >-> Sort (cost= rows=568733 width=175) (actual time= > rows=20 loops=1) > Sort Method: top-N heapsort Going off on a tangent, when I was playing with a merge-sort

Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions

2014-10-29 Thread Jeremy Harris
On 29/10/14 16:11, Andres Freund wrote: > I do think checking the link count to > be 1 is safe though. You will fail against certain styles of online-backup. -- Cheers, Jeremy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http:/

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-26 Thread Jeremy Harris
On 26/02/14 03:08, Robert Haas wrote: On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris wrote: On 24/02/14 17:38, Robert Haas wrote: On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris wrote: Run under cachegrind, it takes about N/10 last-level cache misses, all for the new item being introduced

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-25 Thread Jeremy Harris
On 24/02/14 17:38, Robert Haas wrote: On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris wrote: Run under cachegrind, it takes about N/10 last-level cache misses, all for the new item being introduced to the heap. The existing code takes none at all. Can you explain this further? This seems

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-20 Thread Jeremy Harris
On 09/02/14 17:11, Jeremy Harris wrote: On 06/02/14 18:21, Jeff Janes wrote: Did you try sorting already-sorted, reverse sorted, or pipe-organ shaped data sets? We will also need to test it on strings. I usually use md5(random()::text) to generate strings for such purposes, at least for a

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-07 Thread Jeremy Harris
On 06/02/14 22:12, Jeremy Harris wrote: Did you try sorting already-sorted, reverse sorted, or pipe-organ shaped data sets? Summary (low numbers better): Random ints: 83% compares, level on time. Sorted ints: level compares, 70% time. Reverse-sorted ints: 10% compares, 15

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris
On 06/02/14 14:22, Robert Haas wrote: On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris wrote: On 05/02/14 21:56, Robert Haas wrote: On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris wrote: The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris
On 06/02/14 18:21, Jeff Janes wrote: How big of sets were you sorting in each case? Big enough to go external. The timings and compare-counts given are purely for the heapify stage not the total for the sort, so are constrained by the work_mem not by the sort size per se. I'm limited to wor

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris
On 05/02/14 21:56, Robert Haas wrote: On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris wrote: The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify, over the

[HACKERS] Minor performance improvement in transition to external sort

2014-02-04 Thread Jeremy Harris
The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify, over the work_mem range 1024 to 1048576. Both algorithms appear to be O(n) (contradicting Wikipedia's claim th

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-28 Thread Jeremy Harris
On 27/01/14 18:00, Simon Riggs wrote: On 27 January 2014 17:44, Pavel Stehule wrote: This topic is interesting - we found very bad performance with hashing large tables with high work_mem. MergeJoin with quicksort was significantly faster. I've seen this also. I didn't deeper research - th

[HACKERS] Questionable coding in orderedsetaggs.c

2014-01-25 Thread Jeremy Harris
In ordered_set_startup() sorts are initialised in non-randomAccess mode (tuplesort_begin_heap() and ~datum(), last argument). The use of tuplesort_skip_tuples() feels very like a random access to me. I think it doesn't fail because the only use (and implementation) is to skip forwards; if backwa

Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT

2014-01-22 Thread Jeremy Harris
On 22/01/14 03:53, Tom Lane wrote: Jon Nelson writes: A rough summary of the patch follows: - a GUC variable enables or disables this capability - in nodeAgg.c, eliding duplicate tuples is enabled if the number of distinct columns is equal to the number of sort columns (and both are gr

Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT

2014-01-22 Thread Jeremy Harris
On 22/01/14 03:16, Jon Nelson wrote: Greetings -hackers: I have worked up a patch to PostgreSQL which elides tuples during an external sort. The primary use case is when sorted input is being used to feed a DISTINCT operation. The idea is to throw out tuples that compare as identical whenever it

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Jeremy Harris
On 31/12/13 01:41, Andreas Karlsson wrote: On 12/29/2013 08:24 AM, David Rowley wrote: If it was possible to devise some way to reuse any previous tuplesortstate perhaps just inventing a reset method which clears out tuples, then we could see performance exceed the standard seqscan -> sort. The

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Jeremy Harris
On 13/01/14 18:01, Alexander Korotkov wrote: Thanks. It's included into attached version of patch. As wall as estimation improvements, more comments and regression tests fix. Would it be possible to totally separate the two sets of sort-keys, only giving the non-index set to the tuplesort? At

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeremy Harris
On 14/01/14 22:23, Dave Chinner wrote: On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote: To quantify that, in a production setting we were seeing pauses of up to two minutes with shared_buffers set to 8GB and default dirty ^^

Re: [HACKERS] PoC: Partial sort

2014-01-16 Thread Jeremy Harris
On 22/12/13 20:26, Alexander Korotkov wrote: On Sat, Dec 14, 2013 at 6:30 PM, Jeremy Harris wrote: On 14/12/13 12:54, Andres Freund wrote: Is that actually all that beneficial when sorting with a bog standard qsort() since that doesn't generally benefit from data being pre-sorted? I thi

Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Jeremy Harris
On 14/12/13 12:54, Andres Freund wrote: On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote: Currently when we need to get ordered result from table we have to choose one of two approaches: get results from index in exact order we need or do sort of tuples. However, it could be useful to mix

Re: [HACKERS] Dynamic Shared Memory stuff

2013-11-23 Thread Jeremy Harris
On 20/11/13 19:58, Robert Haas wrote: Parallel sort, and then parallel other stuff. Eventually general parallel query. I have recently updated https://wiki.postgresql.org/wiki/Parallel_Sort and you may find that interesting/helpful as a statement of intent. I've been playing with an internal

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Jeremy Harris
On 13/11/13 09:07, Leonardo Francalanci wrote: Problem? having 4 btree indexes on random values (imsi+imei * 2, since we have calling and caller) kills the performance in insertion after a while. Surely there's good correlation between IMSI & IMEI, so have a separate table to translate one to (

Re: [HACKERS] regression tests

2013-09-06 Thread Jeremy Harris
On 06/09/13 15:44, Robert Haas wrote: On Fri, Sep 6, 2013 at 3:34 AM, Jeremy Harris wrote: I don't see the regression tests running any index-hash operations. What am I missing? What's an index-hash operation? Ones that hit tuplesort_begin_index_hash() -- Cheers, Jeremy

[HACKERS] regression tests

2013-09-06 Thread Jeremy Harris
Hi, I don't see the regression tests running any index-hash operations. What am I missing? -- Cheers, Jeremy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] how to pass data (tuples) to worker processes?

2013-08-07 Thread Jeremy Harris
On 06/08/13 13:59, Robert Haas wrote: My thought is to create a queue abstraction that sits on top of the dynamic shared memory infrastructure, so that you can set aside a portion of your dynamic shared memory segment to use as a ring buffer and send messages back and forth with using some kind

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-10-06 Thread Jeremy Harris
On 10/05/2012 09:03 PM, Tom Lane wrote: Note that allowing subsequent requests to jump the queue would not be a good fix for this; if you do that, it's likely the ex-lock will never be granted, at least not till the next system idle time. Offering that option to the admin sounds like a good thi

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Jeremy Harris
On 2012-03-18 15:25, Tom Lane wrote: Jeff Janes writes: On Wed, Mar 7, 2012 at 11:55 AM, Robert Haas wrote: The problem there is that none of the files can be deleted until it was entirely read, so you end up with all the data on disk twice. I don't know how often people run their databases s

Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Jeremy Harris
On 2012-01-05 10:04, Benedikt Grundmann wrote: I have a question of how to benchmark hardware to determine the appropriate ratio of seq_page_cost vs random_page_cost. It'd be really nice if the DBMS measured actual experienced values.. -- Jeremy -- Sent via pgsql-hackers mailing list (pgs

Re: [HACKERS] spinlocks on powerpc

2012-01-03 Thread Jeremy Harris
On 2012-01-03 04:44, Robert Haas wrote: On read-only workloads, you get spinlock contention, because everyone who wants a snapshot has to take the LWLock mutex to increment the shared lock count and again (just a moment later) to decrement it. Does the LWLock protect anything but the shared loc