Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-26 Thread Amit Langote
On Sun, Jul 26, 2020 at 6:06 AM Tom Lane wrote: > Amit Langote writes: > > [ v7-0001-Improve-performance-of-binary-COPY-FROM-with-buff.patch ] > > Pushed with cosmetic changes. Thanks for that. > I'd always supposed that stdio does enough internal buffering that short > fread()s shouldn't be mu

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-25 Thread Tom Lane
Amit Langote writes: > [ v7-0001-Improve-performance-of-binary-COPY-FROM-with-buff.patch ] Pushed with cosmetic changes. I'd always supposed that stdio does enough internal buffering that short fread()s shouldn't be much worse than memcpy(). But I reproduced your result of ~30% speedup for data

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-17 Thread Bharath Rupireddy
> > > > I will change the status to "ready for committer" in commitfest > > tomorrow. Hope that's fine. > > I agree, a committer can have a look at this. > I changed the status in the commit fest to "Ready for Committer". https://commitfest.postgresql.org/28/2632/ With Regards, Bharath Rupireddy

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-16 Thread vignesh C
On Thu, Jul 16, 2020 at 8:52 PM Bharath Rupireddy wrote: > > On Thu, Jul 16, 2020 at 7:44 PM Amit Langote wrote: > > > > On Wed, Jul 15, 2020 at 1:06 PM vignesh C wrote: > > > On Wed, Jul 15, 2020 at 8:03 AM Amit Langote > > > wrote: > > > > On Tue, Jul 14, 2020 at 10:23 PM vignesh C wrote: >

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-16 Thread Bharath Rupireddy
On Thu, Jul 16, 2020 at 7:44 PM Amit Langote wrote: > > On Wed, Jul 15, 2020 at 1:06 PM vignesh C wrote: > > On Wed, Jul 15, 2020 at 8:03 AM Amit Langote > > wrote: > > > On Tue, Jul 14, 2020 at 10:23 PM vignesh C wrote: > > > > Rest of the changes looked fine to me. Also I noticed that > > >

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-16 Thread Amit Langote
On Wed, Jul 15, 2020 at 1:06 PM vignesh C wrote: > On Wed, Jul 15, 2020 at 8:03 AM Amit Langote wrote: > > On Tue, Jul 14, 2020 at 10:23 PM vignesh C wrote: > > > Rest of the changes looked fine to me. Also I noticed that > > > commit message was missing in the patch. >> > > Please see the attac

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-14 Thread vignesh C
On Wed, Jul 15, 2020 at 8:03 AM Amit Langote wrote: > > Hi Vignesh, > > On Tue, Jul 14, 2020 at 10:23 PM vignesh C wrote: > > On Tue, Jul 14, 2020 at 11:19 AM Amit Langote > > wrote: > > > Sounds fine to me. Although CopyLoadRawBuf() does not seem to a > > > candidate for rigorous code optimiz

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-14 Thread Amit Langote
Hi Vignesh, On Tue, Jul 14, 2020 at 10:23 PM vignesh C wrote: > On Tue, Jul 14, 2020 at 11:19 AM Amit Langote wrote: > > Sounds fine to me. Although CopyLoadRawBuf() does not seem to a > > candidate for rigorous code optimization as it does not get called > > that often. > > I thought we could

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-14 Thread vignesh C
On Tue, Jul 14, 2020 at 11:19 AM Amit Langote wrote: > > > Sounds fine to me. Although CopyLoadRawBuf() does not seem to a > candidate for rigorous code optimization as it does not get called > that often. > I thought we could include that change as we are making changes around that code. Rest o

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread Amit Langote
On Tue, Jul 14, 2020 at 2:02 PM vignesh C wrote: > On Tue, Jul 14, 2020 at 7:26 AM Amit Langote wrote: > > In CopyLoadRawBuf(), we could also change the condition if > > (cstate->raw_buf_index < cstate->raw_buf_len) to if (BUF_BYTES > 0), > > which looks clearer. > > > > Also, if we are going to

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread vignesh C
On Tue, Jul 14, 2020 at 7:26 AM Amit Langote wrote: > > Good idea, thanks. > > In CopyLoadRawBuf(), we could also change the condition if > (cstate->raw_buf_index < cstate->raw_buf_len) to if (BUF_BYTES > 0), > which looks clearer. > > Also, if we are going to use the macro more generally, let's m

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread Amit Langote
On Mon, Jul 13, 2020 at 11:58 PM Bharath Rupireddy wrote: > > I had one small comment: > > +{ > > + int copied_bytes = 0; > > + > > +#define BUF_BYTES (cstate->raw_buf_len - cstate->raw_buf_index) > > +#define DRAIN_COPY_RAW_BUF(cstate, dest, nbytes)\ > > + do {\ > > +

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread Bharath Rupireddy
> > I had one small comment: > +{ > + int copied_bytes = 0; > + > +#define BUF_BYTES (cstate->raw_buf_len - cstate->raw_buf_index) > +#define DRAIN_COPY_RAW_BUF(cstate, dest, nbytes)\ > + do {\ > + memcpy((dest), (cstate)->raw_buf + > (cstate)->raw_buf_ind

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread vignesh C
On Mon, Jul 13, 2020 at 8:02 AM Amit Langote wrote: > By the way, considering the rebase over cd22d3cdb9b, it seemed to me > that we needed to update the comments in CopyStateData struct > definition a bit more. While doing that, I realized > CopyReadFromRawBuf as a name for the new function migh

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread Bharath Rupireddy
> > > > CopyReadFromRawBuf as a name for the new function might be misleading > > > as long as we are only using it for binary data. Maybe > > > CopyReadBinaryData is more appropriate? See attached v4 with these > > > and a few other cosmetic changes. > > > > > > > CopyReadBinaryData() looks mean

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-13 Thread Amit Langote
On Mon, Jul 13, 2020 at 12:17 PM Bharath Rupireddy wrote: > > CopyReadFromRawBuf as a name for the new function might be misleading > > as long as we are only using it for binary data. Maybe > > CopyReadBinaryData is more appropriate? See attached v4 with these > > and a few other cosmetic chang

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-12 Thread Bharath Rupireddy
> > > This is due to the recent commit > > cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb, in which we restricted the > > raw_buf and line_buf allocations for binary files. Since we are using > > raw_buf for this performance improvement feature, now, it's enough to > > restrict only line_buf for binary f

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-12 Thread Amit Langote
On Mon, Jul 13, 2020 at 10:19 AM Bharath Rupireddy wrote: > > On Mon, Jul 13, 2020 at 4:06 AM Thomas Munro wrote: > > > > This error showed up when cfbot tried it: > > > > COPY BINARY stud_emp FROM > > '/home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/stud_emp.data'; > > +

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-12 Thread Bharath Rupireddy
Thanks Thomas for checking this feature. > On Mon, Jul 13, 2020 at 4:06 AM Thomas Munro wrote: > > This error showed up when cfbot tried it: > > COPY BINARY stud_emp FROM > '/home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/stud_emp.data'; > +ERROR: could not read from COP

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-12 Thread Thomas Munro
On Mon, Jul 13, 2020 at 1:13 AM vignesh C wrote: > On Fri, Jul 10, 2020 at 8:51 AM Amit Langote wrote: > > Here the numbers with the updated patch: > > > > HEADpatched (v2) > > foo5 8.56.1 > > foo10 149.4 > > foo20 25 16.7 > > > > Patc

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-12 Thread vignesh C
On Fri, Jul 10, 2020 at 8:51 AM Amit Langote wrote: > > Hi Bharath, > Here the numbers with the updated patch: > > HEADpatched (v2) > foo5 8.56.1 > foo10 149.4 > foo20 25 16.7 > Patch applies cleanly, make check & make check-world pass

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-09 Thread Amit Langote
Hi Bharath, On Thu, Jul 9, 2020 at 7:33 PM Bharath Rupireddy wrote: > Thanks Amit for buying into the idea. I agree that your patch looks > clean and simple compared to mine and I'm okay with your patch. > > I reviewed and tested your patch, below are few comments: Thanks for checking it out. >

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-09 Thread Bharath Rupireddy
> > Considering these points, I came up with the attached patch with a > much smaller footprint. As far as I can see, it implements the same > basic idea as your patch. With it, I was able to see an improvement > in loading time consistent with your numbers. I measured the time of > loading 10 m

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-07 Thread Amit Langote
On Tue, Jul 7, 2020 at 4:28 PM Amit Langote wrote: > The median times for the COPY FROM commands above, with and without > the patch, are as follows: > > HEADpatched > foo58.5 6.5 > foo10 14 10 > foo20 25 18 Sorry, I forgot to mention that these t

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-07 Thread Amit Langote
Hi Bharath, On Mon, Jun 29, 2020 at 2:21 PM Bharath Rupireddy wrote: > For Copy From Binary files, there exists below information for each tuple/row. > 1. field count(number of columns) > 2. for every field, field size(column data length) > 3. field data of field size(actual column data) > > Curr

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-01 Thread Bharath Rupireddy
Hi, Added this to commitfest incase this is useful - https://commitfest.postgresql.org/28/ With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com On Mon, Jun 29, 2020 at 10:50 AM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > Hi Hackers, > > For Copy Fr

[PATCH] Performance Improvement For Copy From Binary Files

2020-06-28 Thread Bharath Rupireddy
Hi Hackers, For Copy From Binary files, there exists below information for each tuple/row. 1. field count(number of columns) 2. for every field, field size(column data length) 3. field data of field size(actual column data) Currently, all the above data required at each step is read directly from