Re: [HACKERS] Parallel Aggregate

David Rowley Thu, 21 Jan 2016 12:45:57 -0800

On 21 January 2016 at 18:26, Haribabu Kommi <[email protected]> wrote:
> Here I attached update parallel aggregate patch on top of recent commits
> of combine aggregate and parallel join patch. It still lacks of cost 
> comparison
> code to compare both parallel and normal aggregates.


Thanks for the updated patch.

I'm just starting to look over this now.

# create table t1 as select x from generate_series(1,1000000) x(x);
# vacuum ANALYZE t1;
# set max_parallel_degree =8;
# explain select sum(x) from t1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Aggregate  (cost=9633.33..9633.34 rows=1 width=4)
   ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667 width=4)
(2 rows)

I'm not quite sure what's happening here yet as I've not ran it
through my debugger, but how can we have a Parallel Seq Scan without a
Gather node? It appears to give correct results, so I can only assume
it's not actually a parallel scan at all.

Let's check:

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
 relname | seq_scan
---------+----------
 t1      |        0
(1 row)

# explain analyze select sum(x) from t1;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9633.33..9633.34 rows=1 width=4) (actual
time=161.820..161.821 rows=1 loops=1)
   ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667
width=4) (actual time=0.051..85.348 rows=1000000 loops=1)
 Planning time: 0.040 ms
 Execution time: 161.861 ms
(4 rows)

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
 relname | seq_scan
---------+----------
 t1      |        1
(1 row)

Only 1 scan.


# explain analyze select * from t1 where x=1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..10633.43 rows=1 width=4) (actual
time=0.231..49.105 rows=1 loops=1)
   Number of Workers: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..9633.33 rows=0 width=4)
(actual time=29.060..45.302 rows=0 loops=3)
         Filter: (x = 1)
         Rows Removed by Filter: 333333
 Planning time: 0.049 ms
 Execution time: 51.438 ms
(7 rows)

# select relname,seq_scan from pg_stat_user_tables where relname ='t1';
 relname | seq_scan
---------+----------
 t1      |        4
(1 row)

3 more scans. This one seems to actually be parallel, and makes sense
based on "Number of Workers: 2"


Also looking at the patch:

+bool
+aggregates_allow_partial(Node *clause)
+{

In the latest patch that I sent on the combine aggregates thread:
http://www.postgresql.org/message-id/CAKJS1f_in9J_ru4gPfygCQLUeB3=rzq3kg6rnpn-fzzhddi...@mail.gmail.com
I made it so there's 3 possible return values from this function. As
your patch stands now, if I create an aggregate function with an
INTERNAL state with a combine function set, then this patch might try
to parallel aggregate that and pass around the pointer to the internal
state in the Tuple going from the worker to the main process, when the
main process dereferences this pointer we'll get a segmentation
violation. So I'd say you should maybe use a modified version of my
latest aggregates_allow_partial() and check for PAT_ANY, and only
parallelise the aggregate if you get that value.  If the use of
partial aggregate was within a single process then you could be quite
content with PAT_INTERNAL_ONLY. You'll just need to pull out the logic
that checks for serial and deserial functions, since that's not in
yet, and just have it return PAT_INTERNAL_ONLY if INTERNAL aggregates
are found which have combine functions set.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Aggregate

Reply via email to