Below are the results for the posted methods. I tested on a local instance
and saw no difference in timing between method-2 and method-3; somehow it
failed to run in dbfiddle.
Also, I was initially worried that adding a trigger to our target table
would worsen performance, because it would turn all the execution into
"row by row" processing rather than a true batch insert (method-3 as
posted), with a larger number of context switches. But it seems it will
still be doing the batch commits (the way method-2 does), so going by that
we won't lose any performance as such. Is this understanding correct? (A
minimal sketch of the trigger scenario I have in mind is at the bottom of
this mail.)

*Method-1- 00:01:44.48*
*Method-2- 00:00:02.67*
*Method-3- 00:00:02.39*

https://gist.github.com/databasetech0073/8e9106757d751358c0c0c65a2374dbc6

On Thu, Sep 19, 2024 at 6:42 PM Lok P <loknath...@gmail.com> wrote:

>
> On Thu, Sep 19, 2024 at 5:40 PM Ron Johnson <ronljohnso...@gmail.com>
> wrote:
>
>> On Thu, Sep 19, 2024 at 5:24 AM Lok P <loknath...@gmail.com> wrote:
>>
>> [snip]
>>
>>> DO $$
>>> DECLARE
>>>     num_inserts INTEGER := 100000;
>>>     batch_size INTEGER := 50;
>>>     start_time TIMESTAMP;
>>>     end_time TIMESTAMP;
>>>     elapsed_time INTERVAL;
>>>     i INTEGER;
>>> BEGIN
>>>     -- Method 1: Individual inserts with a commit after every row
>>>     start_time := clock_timestamp();
>>>
>>>     FOR i IN 1..num_inserts LOOP
>>>         INSERT INTO parent_table VALUES (i, 'a');
>>>         COMMIT;
>>>     END LOOP;
>>>
>>>     end_time := clock_timestamp();
>>>     elapsed_time := end_time - start_time;
>>>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>>     VALUES ('Method 1: Individual Inserts with Commit after every Row',
>>>             start_time, end_time, elapsed_time);
>>>
>>>     -- Method 2: Individual inserts with a commit after every 50 rows
>>>     start_time := clock_timestamp();
>>>
>>>     FOR i IN 1..num_inserts LOOP
>>>         INSERT INTO parent_table2 VALUES (i, 'a');
>>>         -- Commit after every batch_size (50) rows
>>>         IF i % batch_size = 0 THEN
>>>             COMMIT;
>>>         END IF;
>>>     END LOOP;
>>>
>>>     -- Final commit if not already committed
>>>     COMMIT;
>>>
>>>     end_time := clock_timestamp();
>>>     elapsed_time := end_time - start_time;
>>>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>>     VALUES ('Method 2: Individual Inserts with Commit after 50 Rows',
>>>             start_time, end_time, elapsed_time);
>>>
>>>     -- Method 3: Multi-row (batch) inserts with a commit per batch
>>>     start_time := clock_timestamp();
>>>
>>>     FOR i IN 1..(num_inserts / batch_size) LOOP
>>>         INSERT INTO parent_table3 VALUES
>>>             (1 + (i - 1) * batch_size, 'a'),
>>
>> [snip]
>>
>>>             (49 + (i - 1) * batch_size, 'a'),
>>>             (50 + (i - 1) * batch_size, 'a');
>>>         COMMIT;
>>>     END LOOP;
>>>
>>>     COMMIT; -- Final commit for all
>>>     end_time := clock_timestamp();
>>>     elapsed_time := end_time - start_time;
>>>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>>>     VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
>>>             end_time, elapsed_time);
>>>
>>> END $$;
>>>
>>
>> Reproduce what behavior?
>>
>> Anyway, plpgsql functions (including anonymous DO statements) are -- to
>> PostgreSQL -- single statements. Thus, they'll be faster than individual
>> calls.
>>
>> An untrusted language like plpython3u might speed things up even more,
>> if you have to read a heterogeneous external file and insert all the
>> records into the db.
>>
>
> Here, if you see my script, method-1 is doing a commit after each row
> insert, method-2 is doing a batch commit i.e. a commit after every "50"
> rows, and method-3 is doing a true batch insert i.e. combining all the 50
> values in one insert statement, submitting it to the database in one shot
> and then COMMITting it, so the context switching will be a lot less.
> So I was expecting Method-3 to be the fastest way to insert the rows
> here, but the response times show Method-2 and Method-3 taking the same
> time. Method-1 is the slowest, though.
>
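
To make the trigger question concrete, here is a minimal sketch of the kind
of row-level trigger I have in mind on the method-3 target table. The
trigger, function and audit-table names (parent_table3_log, trg_log_insert,
audit_log) are made up for illustration, and it assumes the first column of
parent_table3 is named "id" and the inserted id values are unused; the only
point is that the multi-row INSERT stays a single statement and a single
commit even though the trigger body runs once per inserted row.

-- Hypothetical audit table and row-level trigger (illustrative names only;
-- NEW.id assumes the first column of parent_table3 is called "id").
CREATE TABLE IF NOT EXISTS audit_log (id INT, note TEXT, logged_at TIMESTAMPTZ);

CREATE OR REPLACE FUNCTION trg_log_insert() RETURNS trigger AS $f$
BEGIN
    -- runs once for every row inserted into parent_table3
    INSERT INTO audit_log VALUES (NEW.id, 'inserted', clock_timestamp());
    RETURN NEW;
END;
$f$ LANGUAGE plpgsql;

CREATE TRIGGER parent_table3_log
AFTER INSERT ON parent_table3
FOR EACH ROW EXECUTE FUNCTION trg_log_insert();

-- A method-3 style multi-row insert is still one statement and one commit;
-- the trigger simply fires once per row inside that single statement.
BEGIN;
INSERT INTO parent_table3 VALUES (100001, 'a'), (100002, 'a'), (100003, 'a');
COMMIT;

If the per-row firing itself ever does show up as an overhead, a
statement-level trigger (FOR EACH STATEMENT with a REFERENCING NEW TABLE AS
... transition table) can see the whole batch at once, but that is a
separate tuning step from the commit-batching question above.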