Hi, Attached is the patch supporting parallel copy for binary format files.
The performance improvement achieved with different workers is as shown below. Dataset used has 10million tuples and is of 5.3GB size. parallel workers test case 1(exec time in sec): copy from binary file, 2 indexes on integer columns and 1 index on text column test case 2(exec time in sec): copy from binary file, 1 gist index on text column test case 3(exec time in sec): copy from binary file, 3 indexes on integer columns 0 1106.899(1X) 772.758(1X) 171.338(1X) 1 1094.165(1.01X) 757.365(1.02X) 163.018(1.05X) 2 618.397(1.79X) 428.304(1.8X) 117.508(1.46X) 4 320.511(3.45X) 231.938(3.33X) 80.297(2.13X) 8 172.462(6.42X) 150.212(5.14X) *71.518(2.39X)* 16 110.460(10.02X) *124.929(6.18X)* 91.308(1.88X) 20 *98.470(11.24X)* 137.313(5.63X) 95.289(1.79X) 30 109.229(10.13X) 173.54(4.45X) 95.799(1.78X) Design followed for developing this patch: Leader reads data from the file into the DSM data blocks each of 64K size. It also identifies each tuple data block id, start offset, end offset, tuple size and updates this information in the ring data structure. Workers parallely read the tuple information from the ring data structure, the actual tuple data from the data blocks and parallely insert the tuples into the table. Please note that this patch can be applied on the series of patches that were posted previously[1] for parallel copy for csv/text files. The correct order to apply all the patches is - 0001-Copy-code-readjustment-to-support-parallel-copy.patch <https://www.postgresql.org/message-id/attachment/111463/0001-Copy-code-readjustment-to-support-parallel-copy.patch> 0002-Framework-for-leader-worker-in-parallel-copy.patch <https://www.postgresql.org/message-id/attachment/111465/0002-Framework-for-leader-worker-in-parallel-copy.patch> 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch <https://www.postgresql.org/message-id/attachment/111464/0003-Allow-copy-from-command-to-process-data-from-file-ST.patch> 0004-Documentation-for-parallel-copy.patch <https://www.postgresql.org/message-id/attachment/111466/0004-Documentation-for-parallel-copy.patch> and 0005-Parallel-Copy-For-Binary-Format-Files.patch The above tests were run with the configuration attached config.txt, which is the same used for performance tests of csv/text files posted earlier in this mail chain. Request the community to take this patch up for review along with the parallel copy for csv/text file patches and provide feedback. [1] - https://www.postgresql.org/message-id/CALDaNm3uyHpD9sKoFtB0EnMO8DLuD6H9pReFm%3Dtm%3D9ccEWuUVQ%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Postgres configuration used for above testing: shared_buffers = 40GB max_worker_processes = 32 max_parallel_maintenance_workers = 24 max_parallel_workers = 32 synchronous_commit = off checkpoint_timeout = 1d max_wal_size = 24GB min_wal_size = 15GB autovacuum = off System Configuration: RAM: 503GB Disk Type: SSD Disk Size: 1.6TB Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 8 NUMA node(s): 8 Vendor ID: GenuineIntel CPU family: 6 Model: 47 Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz Stepping: 2 CPU MHz: 1064.000 CPU max MHz: 2129.0000 CPU min MHz: 1064.0000 BogoMIPS: 4266.62 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 24576K
0005-Parallel-Copy-For-Binary-Format-Files.patch
Description: Binary data