Re: Parallel copy

Amit Kapila Mon, 19 Oct 2020 02:11:03 -0700

On Sun, Oct 18, 2020 at 7:47 AM Hou, Zhijie <[email protected]> wrote:
>
> Hi Vignesh,
>
> After having a look over the patch,
> I have some suggestions for
> 0003-Allow-copy-from-command-to-process-data-from-file.patch.
>
> 1.
>
> +static uint32
> +EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List 
> *attnamelist,
> +                                  char **whereClauseStr, char 
> **rangeTableStr,
> +                                  char **attnameListStr, char 
> **notnullListStr,
> +                                  char **nullListStr, char **convertListStr)
> +{
> +       uint32          strsize = 
> MAXALIGN(sizeof(SerializedParallelCopyState));
> +
> +       strsize += EstimateStringSize(cstate->null_print);
> +       strsize += EstimateStringSize(cstate->delim);
> +       strsize += EstimateStringSize(cstate->quote);
> +       strsize += EstimateStringSize(cstate->escape);
>
>
> It use function EstimateStringSize to get the strlen of null_print, delim, 
> quote and escape.
> But the length of null_print seems has been stored in null_print_len.
> And delim/quote/escape must be 1 byte, so I think call strlen again seems 
> unnecessary.
>
> How about  " strsize += sizeof(uint32) + cstate->null_print_len + 1"
>


+1. This seems like a good suggestion but add comments for
delim/quote/escape to indicate that we are considering one-byte for
each. I think this will obviate the need of function
EstimateStringSize. Another thing in this regard is that we normally
use add_size function to compute the size but I don't see that being
used in this and nearby computation. That helps us to detect overflow
of addition if any.

EstimateCstateSize()
{
..
+
+ strsize++;
..
}

Why do we need this additional one-byte increment? Does it make sense
to add a small comment for the same?

> 2.
> +       strsize += EstimateNodeSize(cstate->whereClause, whereClauseStr);
>
> +       copiedsize += CopyStringToSharedMemory(cstate, whereClauseStr,
> +                                                                             
>      shmptr + copiedsize);
>
> Some string length is counted for two times.
> The ' whereClauseStr ' has call strlen in EstimateNodeSize once and call 
> strlen in CopyStringToSharedMemory again.
> I don't know wheather it's worth to refacor the code to avoid duplicate 
> strlen . what do you think ?
>

It doesn't seem worth to me. We probably need to use additional
variables to save those lengths. I think it will add more
code/complexity than we will save. See EstimateParamListSpace and
SerializeParamList where we get the typeLen each time, that way code
looks neat to me and we are don't going to save much by not following
a similar thing here.

-- 
With Regards,
Amit Kapila.

Re: Parallel copy

Reply via email to