Hi Ashutosh.

On 2017/12/19 19:12, Ashutosh Bapat wrote:
> On Tue, Dec 19, 2017 at 3:36 PM, Amit Langote
> <langote_amit...@lab.ntt.co.jp> wrote:
>>
>> * Bulk-inserting 100,000 rows using COPY:
>>
>> copy t1 from '/tmp/t1.csv' csv;
>>
>> * Times in milliseconds:
>>
>>  #parts       HEAD    Patched
>>
>>       8    458.301    450.875
>>      16    409.271    510.723
>>      32    500.960    612.003
>>      64    430.687    795.046
>>     128    449.314    565.786
>>     256    493.171    490.187
>
> While the earlier numbers were monotonically increasing with the number
> of partitions, these numbers aren't. For example, the number on HEAD
> with 8 partitions is higher than the one with 128 partitions. That's
> kind of weird. Maybe something is wrong with the measurement.

In the bulk-insert case, we initialize partitions only once, because the
COPY that loads those 100,000 rows is executed just once. In the non-bulk
case, by contrast, we initialize partitions (lock them, allocate various
objects) 100,000 times, because that's how many times the INSERT is
executed, once for each of the 100,000 rows. Without the patch, that
object initialization covers all N partitions, where N is the number of
partitions; with the patch, it covers just one -- the partition to which
the row is routed. The time required, although smaller with the patch,
still increases monotonically with the number of partitions, because the
patch does nothing about locking all partitions. Does that make sense?
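To make that concrete, here is a minimal sketch in C of the two
behaviors. The names here (PartitionInfo, init_partition,
route_row_to_partition) are hypothetical stand-ins chosen for
illustration, not the actual functions the patch touches:

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the per-partition state set up for tuple routing. */
typedef struct PartitionInfo
{
    bool        initialized;
    /* ... relation descriptor, tuple-conversion map, etc. ... */
} PartitionInfo;

/* Stand-in for the expensive per-partition setup (allocations etc.). */
static void
init_partition(PartitionInfo *p)
{
    p->initialized = true;
}

/* Unpatched behavior: every statement pays for all nparts partitions. */
static void
setup_all_partitions(PartitionInfo *parts, int nparts)
{
    for (int i = 0; i < nparts; i++)
        init_partition(&parts[i]);
}

/* Patched behavior: a partition is set up only when a row is routed
 * to it, so the work happens at most once per partition actually used. */
static PartitionInfo *
route_row_to_partition(PartitionInfo *parts, int idx)
{
    if (!parts[idx].initialized)
        init_partition(&parts[idx]);
    return &parts[idx];
}

int
main(void)
{
    int            nparts = 256;
    PartitionInfo *parts = calloc(nparts, sizeof(PartitionInfo));

    /* Unpatched: O(nparts) setup before the first row is even routed. */
    setup_all_partitions(parts, nparts);

    /* Patched: reset, then pay the cost only for the routed partition. */
    memset(parts, 0, nparts * sizeof(PartitionInfo));
    route_row_to_partition(parts, 42);

    free(parts);
    return 0;
}

Locking is deliberately left out of the sketch: the patch does not change
the fact that all partitions are locked per statement, which is why the
per-row INSERT times still grow with the partition count.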
> Do we see similar instability when bulk inserting into an unpartitioned
> table? Also, the numbers against 64 partitions are really bad. That's
> almost 2x slower.

Sorry, as I said, the numbers I initially posted were a bit noisy. I just
re-ran that COPY against the patched version and got the following
numbers:

 #parts    Patched

      8    441.852
     16    417.510
     32    435.276
     64    486.497
    128    436.473
    256    446.312

Thanks,
Amit