On Thu, Sep 19, 2024 at 5:24 AM Lok P <loknath...@gmail.com> wrote:
> On Thu, Sep 19, 2024 at 11:31 AM Ron Johnson <ronljohnso...@gmail.com>
> wrote:
>>
>> [snip]
>>
>>> Method-4
>>>
>>> INSERT INTO parent_table VALUES (1, 'a'), (2, 'a');
>>> INSERT INTO child_table VALUES (1,1, 'a'), (1,2, 'a');
>>> commit;
>>
>> If I knew that I had to load a structured input data file (even if it had
>> parent and child records), this is how I'd do it (but probably first try
>> and see if "in-memory COPY INTO" is such a thing).
>
> I was trying to reproduce this behaviour (row-by-row commit vs. batch
> commit vs. true batch insert, as you mentioned), but I am not able to see
> any difference in response time between "batch commit" and "true batch
> insert". Am I missing anything?
>
> [snip]
>
> DO $$
> DECLARE
>     num_inserts INTEGER := 100000;
>     batch_size INTEGER := 50;
>     start_time TIMESTAMP;
>     end_time TIMESTAMP;
>     elapsed_time INTERVAL;
>     i INTEGER;
> BEGIN
>     -- Method 1: Individual Inserts with Commit after every Row
>     start_time := clock_timestamp();
>
>     FOR i IN 1..num_inserts LOOP
>         INSERT INTO parent_table VALUES (i, 'a');
>         COMMIT;
>     END LOOP;
>
>     end_time := clock_timestamp();
>     elapsed_time := end_time - start_time;
>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>     VALUES ('Method 1: Individual Inserts with Commit after every Row',
>             start_time, end_time, elapsed_time);
>
>     -- Method 2: Individual Inserts with Commit after every 50 Rows
>     start_time := clock_timestamp();
>
>     FOR i IN 1..num_inserts LOOP
>         INSERT INTO parent_table2 VALUES (i, 'a');
>         -- Commit after every batch_size (50) rows
>         IF i % batch_size = 0 THEN
>             COMMIT;
>         END IF;
>     END LOOP;
>
>     -- Final commit if not already committed
>     COMMIT;
>
>     end_time := clock_timestamp();
>     elapsed_time := end_time - start_time;
>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>     VALUES ('Method 2: Individual Inserts with Commit after every 50 Rows',
>             start_time, end_time, elapsed_time);
>
>     -- Method 3: Multi-row (batch) Inserts with Commit after each batch
>     start_time := clock_timestamp();
>
>     FOR i IN 1..(num_inserts / batch_size) LOOP
>         INSERT INTO parent_table3 VALUES
>             (1 + (i - 1) * batch_size, 'a'),
>             [snip]
>             (49 + (i - 1) * batch_size, 'a'),
>             (50 + (i - 1) * batch_size, 'a');
>         COMMIT;
>     END LOOP;
>
>     COMMIT; -- Final commit for all
>
>     end_time := clock_timestamp();
>     elapsed_time := end_time - start_time;
>     INSERT INTO debug_log (method1, start_time, end_time, elapsed_time)
>     VALUES ('Method 3: Batch Inserts with Commit after All', start_time,
>             end_time, elapsed_time);
>
> END $$;

Reproduce what behavior?

Anyway, PL/pgSQL functions (including anonymous DO statements) are -- to
PostgreSQL -- single statements. Thus, they'll be faster than individual
client-side calls.

An untrusted language like plpython3u might speed things up even more, if
you have to read a heterogeneous external file and insert all the records
into the db.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive. <Redacted> crustacean!
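
A minimal sketch of the plpython3u idea above. Assumptions: parent_table has
an integer key and a text column (as in the test), the input file is CSV, and
the server process can read the given path; the function name and path are
hypothetical, and installing the extension needs superuser.

CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION load_parent_from_file(path text)
RETURNS integer
LANGUAGE plpython3u
AS $func$
import csv

# Prepare the INSERT once, then execute it for each row read from the file.
plan = plpy.prepare("INSERT INTO parent_table VALUES ($1, $2)",
                    ["integer", "text"])

n = 0
with open(path) as f:          # read on the server, as the server's OS user
    for row in csv.reader(f):
        plpy.execute(plan, [int(row[0]), row[1]])
        n += 1
return n
$func$;

-- The whole load is then one statement, i.e. one implicit transaction:
-- SELECT load_parent_from_file('/tmp/parent.csv');

For a file that is already well structured, plain COPY (server-side
COPY parent_table FROM '/path/file.csv' WITH (FORMAT csv), or COPY ... FROM
STDIN driven from the client) is usually faster still, since it avoids
per-row INSERT statements.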