> I have attached new set of patches with the fixes.
> Thoughts?

Hi Vignesh,

I don't really have any further comments on the code, but would like to share the results of some Parallel Copy performance tests I ran (attached).

The tests loaded a 5GB CSV data file into a 100-column table (of different data types). The following were varied as part of the test:
- Number of workers (1 – 10)
- No indexes / 4 indexes
- Default settings / increased resources (shared_buffers, work_mem, etc.)

(I did not do any partition-related tests, as I believe those types of tests were previously performed.)

I built Postgres (latest OSS code) with the latest Parallel Copy patches (v4). The test system was a 32-core Intel Xeon E5-4650 server with 378GB of RAM.

I observed the following trends:
- For the data file size used, Parallel Copy achieved best performance using about 9 – 10 workers. Larger data files may benefit from using more workers. However, I couldn’t really see any better performance, for example, from using 16 workers on a 10GB CSV data file compared to using 8 workers. Results may also vary depending on machine characteristics.
- Parallel Copy with 1 worker ran slower than normal Copy in a couple of cases (I did question whether allowing 1 worker was useful in my patch review).
- Typical load time improvement (load factor) for Parallel Copy was between 2x and 3x. Better load factors can be obtained by using larger data files and/or more indexes.
- Increasing Postgres resources made little or no difference to Parallel Copy performance when the target table had no indexes. Increasing Postgres resources improved Parallel Copy performance when the target table had indexes.

Regards,
Greg Nancarrow
Fujitsu Australia

(1) Postgres default settings, 5GB CSV (5100000 rows), no indexes on table:

Copy Type          Duration (s)   Load factor
===============================================
Normal Copy        132.838        -

Parallel Copy
(#workers)
1                  97.537         1.36
2                  61.700         2.15
3                  52.788         2.52
4                  46.607         2.85
5                  45.524         2.92
6                  43.799         3.03
7                  42.970         3.09
8                  42.974         3.09
9                  43.698         3.04
10                 43.362         3.06


(2) Postgres default settings, 5GB CSV (5100000 rows), 4 indexes on table:

Copy Type          Duration (s)   Load factor
===============================================
Normal Copy        221.111        -

Parallel Copy
(#workers)
1                  331.609        0.66
2                  99.085         2.23
3                  89.751         2.46
4                  81.137         2.73
5                  79.138         2.79
6                  77.155         2.87
7                  75.813         2.92
8                  74.961         2.95
9                  77.803         2.84
10                 75.399         2.93


(3) Postgres increased resources, 5GB CSV (5100000 rows), no indexes on table:

shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size = 2GB

Copy Type          Duration (s)   Load factor
===============================================
Normal Copy        78.138         -

Parallel Copy
(#workers)
1                  95.203         0.82
2                  62.596         1.24
3                  52.318         1.49
4                  48.246         1.62
5                  42.832         1.82
6                  42.921         1.82
7                  43.146         1.81
8                  41.557         1.88
9                  43.489         1.80
10                 43.362         1.80


(4) Postgres increased resources, 5GB CSV (5100000 rows), 4 indexes on table:

Copy Type          Duration (s)   Load factor
===============================================
Normal Copy        151.364        -

Parallel Copy
(#workers)
1                  120.058        1.26
2                  87.465         1.73
3                  76.871         1.97
4                  69.805         2.17
5                  64.100         2.36
6                  60.667         2.49
7                  59.202         2.56
8                  57.417         2.67
9                  61.143         2.48
10                 57.500         2.63
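
For anyone wanting to re-derive the "Load factor" column, here is a minimal Python sketch. It assumes the load factor is simply the normal Copy duration divided by the Parallel Copy duration; that definition is inferred from the figures and is not stated explicitly in the results. The durations are copied from result set (1); the function and variable names are purely illustrative and are not part of the patch or the test scripts.

```python
# Durations in seconds, copied from result set (1):
# Postgres default settings, 5GB CSV, no indexes on table.
NORMAL_COPY_S = 132.838
PARALLEL_COPY_S = {
    1: 97.537, 2: 61.700, 3: 52.788, 4: 46.607, 5: 45.524,
    6: 43.799, 7: 42.970, 8: 42.974, 9: 43.698, 10: 43.362,
}


def load_factor(normal_s, parallel_s):
    """Speedup of Parallel Copy over normal Copy.

    Assumed definition: normal duration / parallel duration.
    A value below 1.0 means the parallel run was slower.
    """
    return normal_s / parallel_s


def fastest_worker_count(parallel_by_workers):
    """Return the worker count with the shortest measured duration."""
    return min(parallel_by_workers, key=parallel_by_workers.get)


# Load factor per worker count, rounded to two decimals as in the tables.
factors = {
    workers: round(load_factor(NORMAL_COPY_S, seconds), 2)
    for workers, seconds in PARALLEL_COPY_S.items()
}

if __name__ == "__main__":
    for workers, factor in sorted(factors.items()):
        print(f"{workers:>2} workers: load factor {factor:.2f}")
    print("fastest run used", fastest_worker_count(PARALLEL_COPY_S), "workers")
```

The same two helpers can be pointed at the other three result sets by swapping in their durations. Differences in the last decimal place against the posted figures may simply come down to rounding versus truncation.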