Hi,

I have added this to the commitfest in case it is useful: https://commitfest.postgresql.org/28/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

On Mon, Jun 29, 2020 at 10:50 AM Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> wrote:
> Hi Hackers,
>
> For COPY FROM with binary files, the following information exists for each
> tuple/row:
> 1. field count (number of columns)
> 2. for every field, field size (column data length)
> 3. field data of field size (actual column data)
>
> Currently, each of the above items is read directly from the file using
> fread() at every step, and this happens for all the tuples/rows.
>
> One observation is that, in the total execution time of a copy from a
> binary file, the fread() calls take up to 20% of the time, and the fread()
> call count is also very high.
>
> For instance, with a dataset of size 5.3GB, 10 million tuples with 10
> columns:
>
> total exec time (sec) | total time taken for fread() (sec) | fread() call count
> 101.193               | *21.33*                            | 210000005
> 101.345               | *21.436*                           | 210000005
>
> The total time taken for fread() and the corresponding call count may
> increase further with more columns, for instance 1000.
>
> One solution to this problem is to read data from the binary file in
> RAW_BUF_SIZE (64KB) chunks to avoid repeatedly calling fread() (thus
> possibly avoiding a few disk I/Os). This is similar to the approach
> followed for csv/text files.
>
> Attaching a patch implementing the above solution for binary format files.
>
> Below is the improvement gained:
>
> total exec time (sec) | total time taken for fread() (sec) | fread() call count
> 75.757                | *2.73*                             | 160884
> 75.351                | *2.742*                            | 160884
>
> *Execution is 1.36X faster, fread() time is reduced by 87%, fread()
> call count is reduced by 99%.*
>
> I request the community to take this patch for review if this approach
> and improvement seem beneficial.
>
> Any suggestions to improve it further are most welcome.
>
> Also attached is the config file used for testing the above use case.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
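
For anyone skimming the thread, the chunked-read idea from the quoted mail can be sketched roughly as below. This is a standalone illustration and not code from the attached patch: BufReader, buf_init() and buf_read() are hypothetical names made up for this example, and only RAW_BUF_SIZE (64KB) is taken from the mail. The reader refills a 64KB buffer with a single fread() and then serves the small per-field requests (field count, field size, field data) from memory.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

#define RAW_BUF_SIZE 65536      /* 64KB chunk size, as mentioned in the mail */

/* Hypothetical reader state for illustration; not a PostgreSQL structure. */
typedef struct BufReader
{
    FILE   *fp;                 /* underlying binary file */
    char    buf[RAW_BUF_SIZE];  /* chunk most recently read from fp */
    size_t  len;                /* number of valid bytes in buf */
    size_t  pos;                /* offset of the next unread byte in buf */
    long    nfread;             /* number of fread() calls issued so far */
} BufReader;

static void
buf_init(BufReader *r, FILE *fp)
{
    r->fp = fp;
    r->len = 0;
    r->pos = 0;
    r->nfread = 0;
}

/*
 * Copy up to n bytes into dest. fread() is called only when the buffer
 * has been fully consumed. Returns the number of bytes actually copied,
 * which is less than n only when end of file (or a read error) is hit.
 */
static size_t
buf_read(BufReader *r, void *dest, size_t n)
{
    size_t  copied = 0;

    while (copied < n)
    {
        size_t  avail;
        size_t  take;

        if (r->pos == r->len)
        {
            /* Buffer exhausted: fetch the next chunk in one call. */
            r->len = fread(r->buf, 1, RAW_BUF_SIZE, r->fp);
            r->pos = 0;
            r->nfread++;
            if (r->len == 0)
                break;          /* EOF or error, nothing more to serve */
        }

        avail = r->len - r->pos;
        take = (n - copied < avail) ? (n - copied) : avail;
        memcpy((char *) dest + copied, r->buf + r->pos, take);
        r->pos += take;
        copied += take;
    }

    return copied;
}
```

A caller would replace each per-field fread(dest, 1, n, fp) with buf_read(&reader, dest, n). The unbuffered call count in the first table is consistent with one read per item: 10,000,000 tuples x (1 field count + 10 field sizes + 10 field data reads) = 210,000,000 calls, and the remaining 5 presumably come from reading the file header and trailer. With buffering, the count depends on the file size divided by the chunk size rather than on the number of fields.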