On 11/12/14, 1:54 AM, David Rowley wrote:
On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs <si...@2ndquadrant.com> wrote:


    This plan type is widely used in reporting queries, so will hit the
    mainline of BI applications and many Mat View creations.
    This will allow SELECT count(*) FROM foo to go faster also.

We'd also need to add some infrastructure to merge aggregate states together 
for this to work properly. That would also let it work for avg(), stddev(), 
etc. For max() and min(), the merge function would likely just be the same as 
the transition function.

Sanity check: what % of the time in a large aggregate query fed by a seqscan is 
actually spent in the aggregate functions? Even if you look strictly at CPU 
cost, isn't there more code involved in getting data to the aggregate function 
than in the aggregation itself, except maybe for numeric?

In other words, I suspect that just having a dirt-simple parallel SeqScan could 
be a win for CPU. It should certainly be a win IO-wise; in my experience we're 
not very good at maxing out IO systems.

(I was curious and came up with the list below covering just the page-level 
stuff, ignoring IO. I don't see much code involved in per-tuple work, but I 
also never came across the detoasting code, so I suspect I'm missing 
something...)

ExecScanFetch, heapgettup_pagemode, ReadBuffer, BufferAlloc, 
heap_page_prune_opt, LWLockAcquire... then you can finally do the per-tuple 
work, starting with HeapTupleSatisfiesVisibility.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

