Re: Parallel copy

vignesh C Wed, 14 Oct 2020 06:43:03 -0700

On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow <gregn4...@gmail.com> wrote:
>
> On Thu, Oct 8, 2020 at 5:44 AM vignesh C <vignes...@gmail.com> wrote:
>
> > Attached v6 patch with the fixes.
> >
>
> Hi Vignesh,
>
> I noticed a couple of issues when scanning the code in the following
patch:
>
>     v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
>
> In the following code, it will put a junk uint16 value into *destptr
> (and thus may well cause a crash) on a Big Endian architecture
> (Solaris Sparc, s390x, etc.):
> You're storing a (uint16) string length in a uint32 and then pulling
> out the lower two bytes of the uint32 and copying them into the
> location pointed to by destptr.
>
>
> static void
> +CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
> + uint32 *copiedsize)
> +{
> + uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
> +
> + memcpy(destptr, (uint16 *) &len, sizeof(uint16));
> + *copiedsize += sizeof(uint16);
> + if (len)
> + {
> + memcpy(destptr + sizeof(uint16), srcPtr, len);
> + *copiedsize += len;
> + }
> +}
>
> I suggest you change the code to:
>
>     uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
>     memcpy(destptr, &len, sizeof(uint16));
>
> [I assume string length here can't ever exceed (65535 - 1), right?]
>
> Looking a bit deeper into this, I'm wondering if in fact your
> EstimateStringSize() and EstimateNodeSize() functions should be using
> BUFFERALIGN() for EACH stored string/node (rather than just calling
> shm_toc_estimate_chunk() once at the end, after the length of packed
> strings and nodes has been estimated), to ensure alignment of start of
> each string/node. Other Postgres code appears to be aligning each
> stored chunk using shm_toc_estimate_chunk(). See the definition of
> that macro and its current usages.
>


I'm not handling this, this is similar to how it is handled in other places.

> Then you could safely use:
>
>     uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
>     *(uint16 *)destptr = len;
>     *copiedsize += sizeof(uint16);
>     if (len)
>     {
>         memcpy(destptr + sizeof(uint16), srcPtr, len);
>         *copiedsize += len;
>     }
>
> and in the CopyStringFromSharedMemory() function, then could safely use:
>
>     len = *(uint16 *)srcPtr;
>
> The compiler may be smart enough to optimize-away the memcpy() in this
> case anyway, but there are issues in doing this for architectures that
> take a performance hit for unaligned access, or don't support
> unaligned access.

Changed it to uin32, so that there are no issues in case if length exceeds
65535 & also to avoid problems in Big Endian architecture.

> Also, in CopyXXXXFromSharedMemory() functions, you should use palloc()
> instead of palloc0(), as you're filling the entire palloc'd buffer
> anyway, so no need to ask for additional MemSet() of all buffer bytes
> to 0 prior to memcpy().
>

I have changed palloc0 to palloc.

Thanks Greg for reviewing & providing your comments. These changes are
fixed in one of my earlier mail [1] that I sent.
[1]
https://www.postgresql.org/message-id/CALDaNm1n1xW43neXSGs%3Dc7zt-mj%2BJHHbubWBVDYT9NfCoF8TuQ%40mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel copy

Reply via email to