Re: Parallel copy

2022-03-06 Thread Bharath Rupireddy
On Mon, Dec 28, 2020 at 3:14 PM vignesh C wrote: > > Attached is a patch that was used for the same. The patch is written > on top of the parallel copy patch. > The design Amit, Andres & myself voted for that is the leader > identifying the line bound design and sharing it in shared memory is > pe

Re: Parallel copy

2020-12-28 Thread vignesh C
On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila wrote: > > On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas wrote: > > > > On 02/11/2020 08:14, Amit Kapila wrote: > > > On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas > > > wrote: > > >> > > >> In this design, you don't need to keep line bounda

Re: Parallel copy

2020-12-26 Thread vignesh C
On Wed, Dec 23, 2020 at 3:05 PM Hou, Zhijie wrote: > > Hi > > > Yes this optimization can be done, I will handle this in the next patch > > set. > > > > I have a suggestion for the parallel safety-check. > > As designed, The leader does not participate in the insertion of data. > If User use (PARA

RE: Parallel copy

2020-12-23 Thread Hou, Zhijie
Hi > Yes this optimization can be done, I will handle this in the next patch > set. > I have a suggestion for the parallel safety-check. As designed, The leader does not participate in the insertion of data. If User use (PARALLEL 1), there is only one worker process which will do the insertion

Re: Parallel copy

2020-12-09 Thread vignesh C
On Mon, Dec 7, 2020 at 3:00 PM Hou, Zhijie wrote: > > > Attached v11 patch has the fix for this, it also includes the changes to > > rebase on top of head. > > Thanks for the explanation. > > I think there is still chances we can know the size. > > +* line_size will be set. Read th

RE: Parallel copy

2020-12-07 Thread Hou, Zhijie
> > 4. > > A suggestion for CacheLineInfo. > > > > It use appendBinaryStringXXX to store the line in memory. > > appendBinaryStringXXX will double the str memory when there is no enough > spaces. > > > > How about call enlargeStringInfo in advance, if we already know the whole > line size? > > It c

RE: Parallel copy

2020-11-19 Thread Hou, Zhijie
Hi Vignesh, I took a look at the v10 patch set. Here are some comments: 1. +/* + * CheckExprParallelSafety + * + * Determine if where cluase and default expressions are parallel safe & do not + * have volatile expressions, return true if condition satisfies else return + * false. + */ 'cluase'

Re: Parallel copy

2020-11-18 Thread vignesh C
On Sat, Nov 7, 2020 at 7:01 PM vignesh C wrote: > > On Thu, Nov 5, 2020 at 6:33 PM Hou, Zhijie wrote: > > > > Hi > > > > > > > > my $bytes = $ARGV[0]; > > > for(my $i = 0; $i < $bytes; $i+=8){ > > > print "longdata"; > > > } > > > print "\n"; > > > > > > > > > postgres=# copy longda

Re: Parallel copy

2020-11-18 Thread vignesh C
On Sat, Oct 31, 2020 at 2:07 AM Tomas Vondra wrote: > > Hi, > > I've done a bit more testing today, and I think the parsing is busted in > some way. Consider this: > > test=# create extension random; > CREATE EXTENSION > > test=# create table t (a text); > CREATE TABLE > >

Re: Parallel copy

2020-11-18 Thread vignesh C
On Fri, Nov 13, 2020 at 2:25 PM Amit Kapila wrote: > > On Wed, Nov 11, 2020 at 10:42 PM vignesh C wrote: > > > > On Tue, Nov 10, 2020 at 7:27 PM Amit Kapila wrote: > > > > > > On Tue, Nov 10, 2020 at 7:12 PM vignesh C wrote: > > > > > > > > On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila > > > > w

Re: Parallel copy

2020-11-18 Thread vignesh C
On Thu, Oct 29, 2020 at 2:26 PM Daniel Westermann (DWE) wrote: > > On 27/10/2020 15:36, vignesh C wrote: > >> Attached v9 patches have the fixes for the above comments. > > >I did some testing: > > I did some testing as well and have a cosmetic remark: > > postgres=# copy t1 from '/var/tmp/aa.txt'

Re: Parallel copy

2020-11-18 Thread vignesh C
On Wed, Oct 28, 2020 at 5:36 PM Hou, Zhijie wrote: > > Hi > > I found some issue in v9-0002 > > 1. > + > + elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d", > +write_pos, lineInfo->first_block, > + pg_atomic_read_

Re: Parallel copy

2020-11-18 Thread vignesh C
On Thu, Oct 29, 2020 at 2:20 PM Heikki Linnakangas wrote: > > On 27/10/2020 15:36, vignesh C wrote: > > Attached v9 patches have the fixes for the above comments. > > I did some testing: > > /tmp/longdata.pl: > > #!/usr/bin/perl > # > # Generate three rows: > # foo > # longdatalongdatalon

Re: Parallel copy

2020-11-17 Thread Bharath Rupireddy
On Thu, Oct 29, 2020 at 2:54 PM Amit Kapila wrote: > > 4) Worker has to hop through all the processed chunks before getting > the chunk which it can process. > > One more point, I have noticed that some time back [1], I have given > one suggestion related to the way workers process the set of line

Re: Parallel copy

2020-11-13 Thread Amit Kapila
On Wed, Nov 11, 2020 at 10:42 PM vignesh C wrote: > > On Tue, Nov 10, 2020 at 7:27 PM Amit Kapila wrote: > > > > On Tue, Nov 10, 2020 at 7:12 PM vignesh C wrote: > > > > > > On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila > > > wrote: > > > > > > > > > > I have worked to provide a patch for the par

Re: Parallel copy

2020-11-11 Thread vignesh C
On Tue, Nov 10, 2020 at 7:27 PM Amit Kapila wrote: > > On Tue, Nov 10, 2020 at 7:12 PM vignesh C wrote: > > > > On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila wrote: > > > > > > > I have worked to provide a patch for the parallel safety checks. It > > checks if parallely copy can be performed, Paral

Re: Parallel copy

2020-11-10 Thread Amit Kapila
On Tue, Nov 10, 2020 at 7:12 PM vignesh C wrote: > > On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila wrote: > > > > I have worked to provide a patch for the parallel safety checks. It > checks if parallely copy can be performed, Parallel copy cannot be > performed for the following a) If relation is t

Re: Parallel copy

2020-11-10 Thread vignesh C
On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila wrote: > > On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas wrote: > > > > On 02/11/2020 08:14, Amit Kapila wrote: > > > On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas > > > wrote: > > >> > > >> In this design, you don't need to keep line bounda

Re: Parallel copy

2020-11-07 Thread vignesh C
On Thu, Nov 5, 2020 at 6:33 PM Hou, Zhijie wrote: > > Hi > > > > > my $bytes = $ARGV[0]; > > for(my $i = 0; $i < $bytes; $i+=8){ > > print "longdata"; > > } > > print "\n"; > > > > > > postgres=# copy longdata from program 'perl /tmp/longdata.pl 1' > > with (parallel 2); > >

RE: Parallel copy

2020-11-05 Thread Hou, Zhijie
Hi > > my $bytes = $ARGV[0]; > for(my $i = 0; $i < $bytes; $i+=8){ > print "longdata"; > } > print "\n"; > > > postgres=# copy longdata from program 'perl /tmp/longdata.pl 1' > with (parallel 2); > > This gets stuck forever (or at least I didn't have the patience to wait >

Re: Parallel copy

2020-11-03 Thread Heikki Linnakangas
On 03/11/2020 10:59, Amit Kapila wrote: On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas wrote: However, the point of parallel copy is to maximize bandwidth. Okay, but this first-phase (finding the line boundaries) can anyway be not done in parallel and we have seen in some of the initial

Re: Parallel copy

2020-11-03 Thread Amit Kapila
On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas wrote: > > On 02/11/2020 08:14, Amit Kapila wrote: > > On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas wrote: > >> > >> In this design, you don't need to keep line boundaries in shared memory, > >> because each worker process is responsible f

Re: Parallel copy

2020-11-01 Thread Heikki Linnakangas
On 02/11/2020 09:10, Heikki Linnakangas wrote: On 02/11/2020 08:14, Amit Kapila wrote: We have discussed both these approaches (a) single producer multiple consumer, and (b) all workers doing the processing as you are saying in the beginning and concluded that (a) is better, see some of the rele

Re: Parallel copy

2020-11-01 Thread Heikki Linnakangas
On 02/11/2020 08:14, Amit Kapila wrote: On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas wrote: Leader process: The leader process is simple. It picks the next FREE buffer, fills it with raw data from the file, and marks it as FILLED. If no buffers are FREE, wait. Worker process: 1. Clai

Re: Parallel copy

2020-11-01 Thread Amit Kapila
On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas wrote: > > Leader process: > > The leader process is simple. It picks the next FREE buffer, fills it > with raw data from the file, and marks it as FILLED. If no buffers are > FREE, wait. > > Worker process: > > 1. Claim next READY block from que

Re: Parallel copy

2020-10-31 Thread Tomas Vondra
On Sat, Oct 31, 2020 at 12:09:32AM +0200, Heikki Linnakangas wrote: On 30/10/2020 22:56, Tomas Vondra wrote: I agree this design looks simpler. I'm a bit worried about serializing the parsing like this, though. It's true the current approach (where the first phase of parsing happens in the leade

Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas
On 30/10/2020 22:56, Tomas Vondra wrote: I agree this design looks simpler. I'm a bit worried about serializing the parsing like this, though. It's true the current approach (where the first phase of parsing happens in the leader) has a similar issue, but I think it would be easier to improve tha

Re: Parallel copy

2020-10-30 Thread Tomas Vondra
On Fri, Oct 30, 2020 at 06:41:41PM +0200, Heikki Linnakangas wrote: On 30/10/2020 18:36, Heikki Linnakangas wrote: I find this design to be very complicated. Why does the line-boundary information need to be in shared memory? I think this would be much simpler if each worker grabbed a fixed-size

Re: Parallel copy

2020-10-30 Thread Tomas Vondra
Hi, I've done a bit more testing today, and I think the parsing is busted in some way. Consider this: test=# create extension random; CREATE EXTENSION test=# create table t (a text); CREATE TABLE test=# insert into t select random_string(random_int(10, 256*1024)) f

Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas
On 30/10/2020 18:36, Heikki Linnakangas wrote: Whether the leader process finds the EOLs or the worker processes, it's pretty clear that it needs to be done ASAP, for a chunk at a time, because that cannot be done in parallel. I think some refactoring in CopyReadLine() and friends would be in ord

Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas
On 30/10/2020 18:36, Heikki Linnakangas wrote: I find this design to be very complicated. Why does the line-boundary information need to be in shared memory? I think this would be much simpler if each worker grabbed a fixed-size block of raw data, and processed that. In your patch, the leader pr

Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas
On 27/10/2020 15:36, vignesh C wrote: Attached v9 patches have the fixes for the above comments. I find this design to be very complicated. Why does the line-boundary information need to be in shared memory? I think this would be much simpler if each worker grabbed a fixed-size block of raw d

Re: Parallel copy

2020-10-29 Thread Amit Kapila
On Thu, Oct 29, 2020 at 11:45 AM Amit Kapila wrote: > > On Tue, Oct 27, 2020 at 7:06 PM vignesh C wrote: > > > [latest version] > > I think the parallel-safety checks in this patch > (v9-0002-Allow-copy-from-command-to-process-data-from-file) are > incomplete and wrong. > One more point, I have

Re: Parallel copy

2020-10-29 Thread Daniel Westermann (DWE)
On 27/10/2020 15:36, vignesh C wrote: >> Attached v9 patches have the fixes for the above comments. >I did some testing: I did some testing as well and have a cosmetic remark: postgres=# copy t1 from '/var/tmp/aa.txt' with (parallel 10); ERROR: value 10 out of bounds for option

Re: Parallel copy

2020-10-29 Thread Heikki Linnakangas
On 27/10/2020 15:36, vignesh C wrote: Attached v9 patches have the fixes for the above comments. I did some testing: /tmp/longdata.pl: #!/usr/bin/perl # # Generate three rows: # foo # longdatalongdatalongdata... # bar # # The length of the middle row is given as command line arg. # m

Re: Parallel copy

2020-10-28 Thread Amit Kapila
On Tue, Oct 27, 2020 at 7:06 PM vignesh C wrote: > [latest version] I think the parallel-safety checks in this patch (v9-0002-Allow-copy-from-command-to-process-data-from-file) are incomplete and wrong. See below comments. 1. +static pg_attribute_always_inline bool +CheckExprParallelSafety(CopySt

RE: Parallel copy

2020-10-28 Thread Hou, Zhijie
Hi I found some issue in v9-0002 1. + + elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d", +write_pos, lineInfo->first_block, +pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts), +

Re: Parallel copy

2020-10-27 Thread vignesh C
On Fri, Oct 23, 2020 at 6:58 PM Ashutosh Sharma wrote: > > > > > I think, if possible, all these if-else checks in CopyFrom() can be > > moved to a single function which can probably be named as > > IdentifyCopyInsertMethod() and this function can be called in > > IsParallelCopyAllowed(). This wil

Re: Parallel copy

2020-10-27 Thread vignesh C
Thanks Ashutosh for reviewing and providing your comments. On Fri, Oct 23, 2020 at 5:43 PM Ashutosh Sharma wrote: > > Hi Vignesh, > > Thanks for the updated patches. Here are some more comments that I can > find after reviewing your latest patches: > > +/* > + * This structure helps in storing th

Re: Parallel copy

2020-10-27 Thread vignesh C
On Wed, Oct 21, 2020 at 3:50 PM Amit Kapila wrote: > > On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy > wrote: > > > > > > 9. Instead of calling CopyStringToSharedMemory() for each string > > variable, can't we just create a linked list of all the strings that > > need to be copied into shm an

Re: Parallel copy

2020-10-27 Thread vignesh C
Thanks Heikki for reviewing and providing your comments. Please find my thoughts below. On Fri, Oct 23, 2020 at 2:01 PM Heikki Linnakangas wrote: > > I had a brief look at at this patch. Important work! A couple of first > impressions: > > 1. The split between patches > 0002-Framework-for-leader-

Re: Parallel copy

2020-10-27 Thread vignesh C
On Wed, Oct 21, 2020 at 4:20 PM Bharath Rupireddy wrote: > > On Wed, Oct 21, 2020 at 3:18 PM Bharath Rupireddy > wrote: > > > > 17. Remove extra lines after #define IsHeaderLine() > > (cstate->header_line && cstate->cur_lineno == 1) in copy.h > > > > I missed one comment: > > 18. I think we nee

Re: Parallel copy

2020-10-23 Thread Ashutosh Sharma
On Fri, Oct 23, 2020 at 5:42 PM Ashutosh Sharma wrote: > > Hi Vignesh, > > Thanks for the updated patches. Here are some more comments that I can > find after reviewing your latest patches: > > +/* > + * This structure helps in storing the common data from CopyStateData that > are > + * required

Re: Parallel copy

2020-10-23 Thread Ashutosh Sharma
Hi Vignesh, Thanks for the updated patches. Here are some more comments that I can find after reviewing your latest patches: +/* + * This structure helps in storing the common data from CopyStateData that are + * required by the workers. This information will then be allocated and stored + * into

Re: Parallel copy

2020-10-23 Thread Heikki Linnakangas
I had a brief look at at this patch. Important work! A couple of first impressions: 1. The split between patches 0002-Framework-for-leader-worker-in-parallel-copy.patch and 0003-Allow-copy-from-command-to-process-data-from-file.patch is quite artificial. All the stuff introduced in the first

Re: Parallel copy

2020-10-21 Thread Bharath Rupireddy
On Wed, Oct 21, 2020 at 3:18 PM Bharath Rupireddy wrote: > > 17. Remove extra lines after #define IsHeaderLine() > (cstate->header_line && cstate->cur_lineno == 1) in copy.h > I missed one comment: 18. I think we need to treat the number of parallel workers as an integer similar to the paralle

Re: Parallel copy

2020-10-21 Thread Amit Kapila
On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy wrote: > > > 9. Instead of calling CopyStringToSharedMemory() for each string > variable, can't we just create a linked list of all the strings that > need to be copied into shm and call CopyStringToSharedMemory() only > once? We could avoid 5 func

Re: Parallel copy

2020-10-21 Thread Bharath Rupireddy
Hi Vignesh, I took a look at the v8 patch set. Here are some comments: 1. PopulateCommonCstateInfo() -- can we use PopulateCommonCStateInfo() or PopulateCopyStateInfo()? And also EstimateCstateSize() -- EstimateCStateSize(), PopulateCstateCatalogInfo() -- PopulateCStateCatalogInfo()? 2. Instead

Re: Parallel copy

2020-10-21 Thread vignesh C
On Thu, Oct 8, 2020 at 11:15 AM vignesh C wrote: > > I'm summarizing the pending open points so that I don't miss anything: > 1) Performance test on latest patch set. It is tested and results are shared by bharath at [1] > 2) Testing points suggested. Tests are added as suggested and details sh

Re: Parallel copy

2020-10-20 Thread Bharath Rupireddy
On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > > On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote: > > > > 2. Do we have tests for toast tables? I think if you implement the > > previous point some existing tests might cover it but I feel we sh

Re: Parallel copy

2020-10-19 Thread Amit Kapila
On Sun, Oct 18, 2020 at 7:47 AM Hou, Zhijie wrote: > > Hi Vignesh, > > After having a look over the patch, > I have some suggestions for > 0003-Allow-copy-from-command-to-process-data-from-file.patch. > > 1. > > +static uint32 > +EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List >

RE: Parallel copy

2020-10-17 Thread Hou, Zhijie
Hi Vignesh, After having a look over the patch, I have some suggestions for 0003-Allow-copy-from-command-to-process-data-from-file.patch. 1. +static uint32 +EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist, + char **whereClauseStr, c

Re: Parallel copy

2020-10-15 Thread Amit Kapila
On Wed, Oct 14, 2020 at 6:51 PM vignesh C wrote: > > On Fri, Oct 9, 2020 at 11:01 AM Amit Kapila wrote: > > > > I am not able to properly parse the data but If understand the wal > > data for non-parallel (1116 | 0 | 3587203) and parallel (1119 > > | 6 | 3624405) case doesn't seem

Re: Parallel copy

2020-10-14 Thread vignesh C
On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow wrote: > > On Thu, Oct 8, 2020 at 5:44 AM vignesh C wrote: > > > Attached v6 patch with the fixes. > > > > Hi Vignesh, > > I noticed a couple of issues when scanning the code in the following patch: > > v6-0003-Allow-copy-from-command-to-process-d

Re: Parallel copy

2020-10-14 Thread vignesh C
On Sat, Oct 3, 2020 at 6:20 AM Tomas Vondra wrote: > > Hello Vignesh, > > I've done some basic benchmarking on the v4 version of the patches (but > AFAIKC the v5 should perform about the same), and some initial review. > > For the benchmarking, I used the lineitem table from TPC-H - for 75GB > dat

Re: Parallel copy

2020-10-14 Thread vignesh C
On Fri, Oct 9, 2020 at 11:01 AM Amit Kapila wrote: > > On Thu, Oct 8, 2020 at 12:14 AM vignesh C wrote: > > > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila > > wrote: > > > > > + */ > > > > > +typedef struct ParallelCopyLineBoundary > > > > > > > > > > Are we doing all this state management to

Re: Parallel copy

2020-10-14 Thread vignesh C
On Fri, Oct 9, 2020 at 10:42 AM Amit Kapila wrote: > > On Thu, Oct 8, 2020 at 12:14 AM vignesh C wrote: > > > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > > > > > > I am convinced by the reason given by Kyotaro-San in that another > > > thread [1] and performance data shown by P

Re: Parallel copy

2020-10-14 Thread Bharath Rupireddy
I did performance testing on v7 patch set[1] with custom postgresql.conf[2]. The results are of the triplet form (exec time in sec, number of workers, gain) Use case 1: 10million rows, 5.2GB data, 2 indexes on integer columns, 1 index on text column, binary file (1104.898, 0, 1X), (1112.221, 1, 1X

Re: Parallel copy

2020-10-09 Thread Amit Kapila
On Fri, Oct 9, 2020 at 3:50 PM Bharath Rupireddy wrote: > > On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila wrote: > > > > On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy > > wrote: > > > > > > On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila > > > wrote: > > > > > > > > From the testing perspective, >

Re: Parallel copy

2020-10-09 Thread Bharath Rupireddy
On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila wrote: > > On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy > wrote: > > > > On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote: > > > > > > From the testing perspective, > > > 1. Test by having something force_parallel_mode = regress which means > > > t

Re: Parallel copy

2020-10-09 Thread Amit Kapila
On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy wrote: > > On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote: > > > > From the testing perspective, > > 1. Test by having something force_parallel_mode = regress which means > > that all existing Copy tests in the regression will be executed via >

Re: Parallel copy

2020-10-09 Thread Greg Nancarrow
On Fri, Oct 9, 2020 at 5:40 PM Amit Kapila wrote: > > > Looking a bit deeper into this, I'm wondering if in fact your > > EstimateStringSize() and EstimateNodeSize() functions should be using > > BUFFERALIGN() for EACH stored string/node (rather than just calling > > shm_toc_estimate_chunk() once

Re: Parallel copy

2020-10-08 Thread Amit Kapila
On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow wrote: > > On Thu, Oct 8, 2020 at 5:44 AM vignesh C wrote: > > > Attached v6 patch with the fixes. > > > > Hi Vignesh, > > I noticed a couple of issues when scanning the code in the following patch: > > v6-0003-Allow-copy-from-command-to-process-d

Re: Parallel copy

2020-10-08 Thread Amit Kapila
On Thu, Oct 8, 2020 at 12:14 AM vignesh C wrote: > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > + */ > > > > +typedef struct ParallelCopyLineBoundary > > > > > > > > Are we doing all this state management to avoid using locks while > > > > processing lines? If so, I think we can

Re: Parallel copy

2020-10-08 Thread Amit Kapila
On Thu, Oct 8, 2020 at 12:14 AM vignesh C wrote: > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > > > I am convinced by the reason given by Kyotaro-San in that another > > thread [1] and performance data shown by Peter that this can't be an > > independent improvement and rather in

Re: Parallel copy

2020-10-07 Thread vignesh C
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote: > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > Few additional comments: > > == > > Some more comments: > > v5-0002-Framework-for-leader-worker-in-parallel-copy > === > 1

Re: Parallel copy

2020-10-07 Thread vignesh C
On Mon, Sep 28, 2020 at 6:37 PM Ashutosh Sharma wrote: > > On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila wrote: > > > > On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote: > > > > > > Thanks Ashutosh for your comments. > > > > > > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma > > > wrote: > > > >

Re: Parallel copy

2020-10-07 Thread Greg Nancarrow
On Thu, Oct 8, 2020 at 5:44 AM vignesh C wrote: > Attached v6 patch with the fixes. > Hi Vignesh, I noticed a couple of issues when scanning the code in the following patch: v6-0003-Allow-copy-from-command-to-process-data-from-file.patch In the following code, it will put a junk uint16 va

Re: Parallel copy

2020-10-07 Thread vignesh C
On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila wrote: > > On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote: > > > > Thanks Ashutosh for your comments. > > > > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma > > wrote: > > > > > > Hi Vignesh, > > > > > > I've spent some time today looking at your ne

Re: Parallel copy

2020-10-07 Thread vignesh C
On Tue, Sep 29, 2020 at 3:16 PM Greg Nancarrow wrote: > > Hi Vignesh and Bharath, > > Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as > parallel-unsafe. > Can you explain why this is? Yes we don't need to restrict parallelism for RI_TRIGGER_PK cases as we don't do any command cou

Re: Parallel copy

2020-10-03 Thread Amit Kapila
On Sat, Oct 3, 2020 at 6:20 AM Tomas Vondra wrote: > > Hello Vignesh, > > I've done some basic benchmarking on the v4 version of the patches (but > AFAIKC the v5 should perform about the same), and some initial review. > > For the benchmarking, I used the lineitem table from TPC-H - for 75GB > dat

Re: Parallel copy

2020-10-02 Thread Tomas Vondra
Hello Vignesh, I've done some basic benchmarking on the v4 version of the patches (but AFAIKC the v5 should perform about the same), and some initial review. For the benchmarking, I used the lineitem table from TPC-H - for 75GB data set, this largest table is about 64GB once loaded, with another

Re: Parallel copy

2020-09-30 Thread Amit Kapila
On Tue, Sep 29, 2020 at 3:16 PM Greg Nancarrow wrote: > > Hi Vignesh and Bharath, > > Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as > parallel-unsafe. > Can you explain why this is? > I don't think we need to restrict this case and even if there is some reason to do so then pro

Re: Parallel copy

2020-09-29 Thread vignesh C
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila wrote: > > On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > > > Few additional comments: > > == > > Some more comments: > Thanks Amit for the comments, I will work on the comments and provide a patch in the next few days. Re

Re: Parallel copy

2020-09-29 Thread Amit Kapila
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila wrote: > > Few additional comments: > == Some more comments: v5-0002-Framework-for-leader-worker-in-parallel-copy === 1. These values + * help in handover of multiple records with significant

Re: Parallel copy

2020-09-29 Thread Greg Nancarrow
Hi Vignesh and Bharath, Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as parallel-unsafe. Can you explain why this is? Regards, Greg Nancarrow Fujitsu Australia

Re: Parallel copy

2020-09-28 Thread Ashutosh Sharma
On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila wrote: > > On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote: > > > > Thanks Ashutosh for your comments. > > > > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma > > wrote: > > > > > > Hi Vignesh, > > > > > > I've spent some time today looking at your ne

Re: Parallel copy

2020-09-28 Thread Amit Kapila
On Tue, Sep 22, 2020 at 2:44 PM vignesh C wrote: > > Thanks Ashutosh for your comments. > > On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma wrote: > > > > Hi Vignesh, > > > > I've spent some time today looking at your new set of patches and I've > > some thoughts and queries which I would like to

Re: Parallel copy

2020-09-27 Thread Amit Kapila
On Wed, Jul 22, 2020 at 7:48 PM vignesh C wrote: > > On Tue, Jul 21, 2020 at 3:54 PM Amit Kapila wrote: > > > > > Review comments: > > === > > > > 0001-Copy-code-readjustment-to-support-parallel-copy > > 1. > > @@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate) > > else > >

Re: Parallel copy

2020-09-24 Thread Ashutosh Sharma
On Thu, Sep 24, 2020 at 3:00 PM Bharath Rupireddy wrote: > > > > > > Have you tested your patch when encoding conversion is needed? If so, > > > could you please point out the email that has the test results. > > > > > > > We have not yet done encoding testing, we will do and post the results > >

Re: Parallel copy

2020-09-24 Thread Bharath Rupireddy
> > > Have you tested your patch when encoding conversion is needed? If so, > > could you please point out the email that has the test results. > > > > We have not yet done encoding testing, we will do and post the results > separately in the coming days. > Hi Ashutosh, I ran the tests ensuring p

Re: Parallel copy

2020-09-24 Thread Bharath Rupireddy
Thanks Greg for the testing. On Thu, Sep 24, 2020 at 8:27 AM Greg Nancarrow wrote: > > > 3. Could you please run the test case 3 times at least? Just to ensure the consistency of the issue. > > Yes, have run 4 times. Seems to be a performance hit (whether normal > copy or parallel-1 copy) on the

Re: Parallel copy

2020-09-23 Thread Greg Nancarrow
Hi Bharath, > Few things to follow before testing: > 1. Is the table being dropped/truncated after the test with 0 workers and > before running with 1 worker? If not, then the index insertion time would > increase.[1](for me it is increasing by 10 sec). This is obvious because the > 1st time in

Re: Parallel copy

2020-09-22 Thread Bharath Rupireddy
On Thu, Sep 17, 2020 at 11:06 AM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > > On Wed, Sep 16, 2020 at 1:20 PM Greg Nancarrow wrote: > > > > Fortunately I have been given permission to share the exact table > > definition and data I used, so you can check the behaviour and

Re: Parallel copy

2020-09-16 Thread Bharath Rupireddy
On Wed, Sep 16, 2020 at 1:20 PM Greg Nancarrow wrote: > > Fortunately I have been given permission to share the exact table > definition and data I used, so you can check the behaviour and timings > on your own test machine. > Thanks Greg for the script. I ran your test case and I didn't observe

Re: Parallel copy

2020-09-16 Thread Ashutosh Sharma
Hi Vignesh, I've spent some time today looking at your new set of patches and I've some thoughts and queries which I would like to put here: Why are these not part of the shared cstate structure? SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print); SerializeString(pcx

Re: Parallel copy

2020-09-16 Thread Greg Nancarrow
Hi Bharath, On Tue, Sep 15, 2020 at 11:49 PM Bharath Rupireddy wrote: > > Few questions: > 1. Was the run performed with default postgresql.conf file? If not, > what are the changed configurations? Yes, just default settings. > 2. Are the readings for normal copy(190.891sec, mentioned by you >

Re: Parallel copy

2020-09-15 Thread Bharath Rupireddy
On Fri, Sep 11, 2020 at 3:49 AM Greg Nancarrow wrote: > > I couldn't use the original machine from which I obtained the previous > results, but ended up using a 4-core CentOS7 VM, which showed a > similar pattern in the performance results for this test case. > I obtained the following results fro

Re: Parallel copy

2020-09-07 Thread vignesh C
On Tue, Sep 1, 2020 at 3:39 PM Greg Nancarrow wrote: > > Hi Vignesh, > > >Can you share with me the script you used to generate the data & the ddl of > >the table, so that it will help me check that >scenario you faced the > >>problem. > > Unfortunately I can't directly share it (considered comp

Re: Parallel copy

2020-09-02 Thread Greg Nancarrow
>On Wed, Sep 2, 2020 at 3:40 PM vignesh C wrote: > I have attached the scripts that I used for the test results I > mentioned in my previous mail. create.sql file has the table that I > used, insert_data_gen.txt has the insert data generation scripts. I > varied the count in insert_data_gen to gen

Re: Parallel copy

2020-09-01 Thread vignesh C
On Tue, Sep 1, 2020 at 3:39 PM Greg Nancarrow wrote: > > Hi Vignesh, > > >Can you share with me the script you used to generate the data & the ddl of > >the table, so that it will help me check that >scenario you faced the > >>problem. > > Unfortunately I can't directly share it (considered comp

Re: Parallel copy

2020-09-01 Thread Greg Nancarrow
Hi Vignesh, >Can you share with me the script you used to generate the data & the ddl of >the table, so that it will help me check that >scenario you faced the >problem. Unfortunately I can't directly share it (considered company IP), though having said that it's only doing something that is rel

Re: Parallel copy

2020-08-31 Thread vignesh C
On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow wrote: > - Parallel Copy with 1 worker ran slower than normal Copy in a couple > of cases (I did question if allowing 1 worker was useful in my patch > review). Thanks Greg for your review & testing. I had executed various tests with 1GB, 2GB & 5GB w

Re: Parallel copy

2020-08-27 Thread Amit Kapila
On Thu, Aug 27, 2020 at 4:56 PM vignesh C wrote: > > On Thu, Aug 27, 2020 at 8:24 AM Amit Kapila wrote: > > > > On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow wrote: > > > > > > > I have attached new set of patches with the fixes. > > > > Thoughts? > > > > > > Hi Vignesh, > > > > > > I don't rea

Re: Parallel copy

2020-08-27 Thread vignesh C
On Thu, Aug 27, 2020 at 8:24 AM Amit Kapila wrote: > > On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow wrote: > > > > > I have attached new set of patches with the fixes. > > > Thoughts? > > > > Hi Vignesh, > > > > I don't really have any further comments on the code, but would like > > to share s

Re: Parallel copy

2020-08-26 Thread Amit Kapila
On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow wrote: > > > I have attached new set of patches with the fixes. > > Thoughts? > > Hi Vignesh, > > I don't really have any further comments on the code, but would like > to share some results of some Parallel Copy performance tests I ran > (attached).

Re: Parallel copy

2020-08-26 Thread Greg Nancarrow
> I have attached new set of patches with the fixes. > Thoughts? Hi Vignesh, I don't really have any further comments on the code, but would like to share some results of some Parallel Copy performance tests I ran (attached). The tests loaded a 5GB CSV data file into a 100 column table (of diffe

Re: Parallel copy

2020-08-16 Thread Greg Nancarrow
Hi Vignesh, Some further comments: (1) v3-0002-Framework-for-leader-worker-in-parallel-copy.patch +/* + * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data + * block to process to avoid lock contention. This value should be divisible by + * RINGSIZE, as wrap around cases

Re: Parallel copy

2020-08-11 Thread Greg Nancarrow
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation:tested, failed Hi, I don't claim to yet understand all of the Postgres inte

Re: Parallel copy

2020-08-05 Thread vignesh C
On Tue, Aug 4, 2020 at 9:51 PM Tomas Vondra wrote: > > On Mon, Aug 03, 2020 at 12:33:48PM +0530, Bharath Rupireddy wrote: > >On Sat, Aug 1, 2020 at 9:55 AM vignesh C wrote: > >> > >> The patches were not applying because of the recent commits. > >> I have rebased the patch over head & attached. >

  1   2   3   >