Thanks. I have fixed the problem. And now the query can run in 4 mins for 1.5 million records in TABLE1 and 0.5 million records in TABLE2.
The solution is I created a function that gets all event_id from TABLE2 then travel through each event_id and do the update for TABLE1. This removes the side effects of using IN predicate. Also I have 1. set FILLFACTOR = 70 2. remove trigger and index before the function, create index based on event_id 3. add trigger and index back after run function. Regards, Haiming From: BladeOfLight16 [mailto:bladeofligh...@gmail.com] Sent: Monday, 15 July 2013 5:26 AM To: pgsql-general@postgresql.org; Haiming Zhang Subject: Re: [GENERAL] Update big table I don't believe you can use JOIN explicitly in this situation because it's an UPDATE, but I believe you can accomplish the same effect with FROM and WHERE. UPDATE table SET column1 = TRUE FROM table2 WHERE table1.event_id = table2.event_id; I would make sure there's an index on table2.event_id if it's not the PK. If it's part of a composite key, either make sure that table2.event_id is the first column in the index (PostgreSQL will use an index like that; right?) or create an index where it is. (Sorry. Forgot to hit Reply All.) On Sun, Jul 14, 2013 at 7:03 AM, Vincenzo Romano <vincenzo.rom...@notorand.it<mailto:vincenzo.rom...@notorand.it>> wrote: 2013/7/14 Haiming Zhang <haiming.zh...@redflex.com.au<mailto:haiming.zh...@redflex.com.au>>: > Hi All, > > > > I am using postgres 9.1, I have a question about updating big table. Here is > the basic information of this table. > > 1. This table has one primary key and other 11 columns. > > 2. It also has a trigger that before update records, > another table got updated first. > > 3. The has millions of records now. > 4. I am not allowed to delete records in this table when > UPDATE > > The problem is when I do a "Update" query it takes a long time to execute. > Eg. when I run query like this " update TABLE set column1 = true where > EVENT_ID in (select EVENT_ID from TABLE2 );" , it took hours to update the > whole table. In order to optimize the update speed. I tried the following > strategies: > > 1. create index based on primary key, column1 and > combination of primary key and column1. > > 2. Alter FILLFACTOR = 70, vacuum all and then reindex > > 3. drop trigger before update > > Then I use "EXPLAIN" to estimate query plan, all of the above strategies do > not improve the UPDATE speed dramatically. > > > > Please comments on my three strategies (eg, does I index too many columns in > 1?) and please advise me how to improve the update speed. Any advice is > welcomed. I appreciate all you help. > > > > Thanks, > > > > Regards, > > Haiming A JOIN would solve your speed problem. The IN() predicate is the cause. AFAIK. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org<mailto:pgsql-general@postgresql.org>) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general ________________________________ If you are not an authorised recipient of this e-mail, please contact me at Redflex immediately by return phone call or by email. In this case, you should not read, print, retransmit, store or act in reliance on this e-mail or any attachments, and should destroy all copies of them. This e-mail and any attachments are confidential and may contain privileged information and/or copyright material of Redflex or third parties. You should only retransmit, distribute or commercialise the material if you are authorised to do so. This notice should not be removed.