Re: Re: parallel distinct union and aggregate support patch

bu...@sohu.com Mon, 30 Nov 2020 09:27:51 -0800

> 1.
> +#define BATCH_SORT_MAX_BATCHES 512
>  
> Did you decide this number based on some experiment or is there some
> analysis behind selecting this number?
When there are too few batches, if a certain process works too slowly, it will 
cause unbalanced load.
When there are too many batches, FD will open and close files frequently.


> 2.
> +BatchSortState* ExecInitBatchSort(BatchSort *node, EState *estate, int 
> eflags)
> +{
> + BatchSortState *state;
> + TypeCacheEntry *typentry;
> ....
> + for (i=0;i<node->numGroupCols;++i)
> + {
> ...
> + InitFunctionCallInfoData(*fcinfo, flinfo, 1, attr->attcollation, NULL, 
> NULL);
> + fcinfo->args[0].isnull = false;
> + state->groupFuns = lappend(state->groupFuns, fcinfo);
> + }
>  
> From the variable naming, it appeared like the batch sort is dependent
> upon the grouping node.  I think instead of using the name
> numGroupCols and groupFuns we need to use names that are more relevant
> to the batch sort something like numSortKey.
Not all data types support both sorting and hashing calculations, such as 
user-defined data types.
We do not need all columns to support hash calculation when we batch, so I used 
two variables.

> 3.
> + if (eflags & (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK))
> + {
> + /* for now, we only using in group aggregate */
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("not support execute flag(s) %d for group sort", eflags)));
> + }
>  
> Instead of ereport, you should just put an Assert for the unsupported
> flag or elog.
In fact, this is an unfinished feature, BatchSort should also support these 
features, welcome to supplement.

> 4.
> + state = makeNode(BatchSortState);
> + state->ps.plan = (Plan*) node;
> + state->ps.state = estate;
> + state->ps.ExecProcNode = ExecBatchSortPrepare;
>  
> I think the main executor entry function should be named ExecBatchSort
> instead of ExecBatchSortPrepare, it will look more consistent with the
> other executor machinery.
The job of the ExecBatchSortPrepare function is to preprocess the data (batch 
and pre-sort),
and when its work ends, it will call "ExecSetExecProcNode(pstate, 
ExecBatchSort)" to return the data to the ExecBatchSort function.
There is another advantage of dividing into two functions, 
It is not necessary to judge whether tuplesort is now available every time the 
function is processed to improve the subtle performance.
And I think this code is clearer.

Re: Re: parallel distinct union and aggregate support patch

Reply via email to