Hi All,

I have a big table (200G with 728 million records), the table slows down lots 
of things. It's time to clean the data up. The requirement is when I delete I 
must summarise few columns to a new table for backup purpose. So that mean I 
need to go through each row and add the columns' value to the summary table 
(the corresponding category).

The table has indexes (multicoumn indexes) before, I removed the indexes 
because I read on forum says the indexes heavily reduce the speed of deleting. 
That's true if I only delete from the table, but my case is I first SELECT this 
row, then add few values to the summary table and then DELETE the row according 
to its multiple primary key. This seems to be a bad idea now as it takes very 
long time to DELETE the row (as delete needs to find the row first).

Here are the two version of the delete functions, please help to point out how 
can I speed it up.
1.

CREATE OR REPLACE FUNCTION summary_delete_table()
RETURNS integer AS
$BODY$
DECLARE
                                                                rec             
        RECORD;
                                                                subrec          
        RECORD;

BEGIN
                                                                FOR rec IN 
SELECT * FROM tableA limit 100 LOOP
                                                                                
                                                BEGIN
                                                                                
                                                                UPDATE 
summaryTable set count1 = rec.count1 + rec.count1, count2 =... where category 
match;

                                                                                
                                                                delete from 
tableA where tableA.primaryKeyA=rec.primaryKeyA and 
tableA.primaryKeyB=rec.primaryKeyB;
                                                                                
                                                END;
                                                                END LOOP;

                                                                return 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

And then I have a .bat script to loop the above function million times.

2.

CREATE OR REPLACE FUNCTION summary_delete_table()
RETURNS integer AS
$BODY$
DECLARE
                                                                rec             
        RECORD;
                                                                td_cursor       
     CURSOR FOR SELECT * FROM tableA limit 100;

BEGIN
                                                                FOR rec IN 
td_cursor LOOP
                                                                                
                BEGIN
                                                                                
                                UPDATE summaryTable set count1 = rec.count1 + 
rec.count1, count2 =... where category match;

                                                                                
                                delete from tableA WHERE CURRENT OF td_cursor;
                                                                                
                END;
                                                                END LOOP;

                                                                return 1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;


Method 2 is bit faster but not much, the delete speed is 2478 rows/s for method 
2 and 2008 rows/s for method 1.
Any suggestions are welcomed.

BTW, I guess if reindex, it may take few days to finish.
Also, I tried change delete 100 rows at a time and 1000, 2000. The result 
showed 1000 is faster than 100 and 2000 a time.



Thanks and Regards,
Haiming





[cid:logo_new_96ac6d31-abb9-4f10-a3b7-74542f8c4a5f1111111111111111111111111111111.png]


Haiming Zhang | Engineer | Redflex Group
T +61 3 9674 1868 | F +61 3 9699 3566 | E haiming.zh...@redflex.com.au
31 Market Street, (PO Box 720), South Melbourne VIC 3205, Australia


If you are not an authorised recipient of this e-mail, please contact me at 
Redflex immediately by return phone call or by email. In this case, you should 
not read, print, retransmit, store or act in reliance on this e-mail or any 
attachments, and should destroy all copies of them. This e-mail and any 
attachments are confidential and may contain privileged information and/or 
copyright material of Redflex or third parties. You should only retransmit, 
distribute or commercialise the material if you are authorised to do so. This 
notice should not be removed.






Reply via email to