Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Jasen Betts
On 2013-04-27, Yang Zhang wrote: > On Sat, Apr 27, 2013 at 1:55 AM, Misa Simic wrote: >> Optionally you can run vacuum analyze after the bulk operation... > > But wouldn't a bulk UPDATE touch many existing pages (say, 20% > scattered around) to mark rows as dead (per MVCC)? I guess it comes > down to ...
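
A bulk UPDATE does touch those pages: under MVCC each updated row leaves its old version behind as a dead tuple, which only vacuum reclaims. A minimal sketch of the suggested follow-up (mytable is a placeholder name, not from the thread):

    -- Reclaim the dead tuples left behind by the updated rows and
    -- refresh the planner's statistics in one pass.
    VACUUM ANALYZE mytable;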

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Tom Lane
Yang Zhang writes: > You're right, we're only sequentially issuing (unprepared) UPDATEs. You definitely want to fix both parts of that, then. > If we ship many UPDATE statements per call to our DB API's execution > function (we're using Python's psycopg2 if that matters, but I think > that just ...
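
What fixing "both parts" could look like at the SQL level - prepare the parameterized UPDATE once so each execution skips parse/plan overhead, and batch many executions into a single transaction so there is one commit (one WAL flush) per batch rather than per row. A hedged sketch; table and column names are invented:

    -- Plan the statement once; each EXECUTE reuses the plan.
    PREPARE upd(int, text, text) AS
      UPDATE mytable SET field1 = $2, field2 = $3 WHERE id = $1;

    -- Run many updates under one transaction, hence one commit.
    BEGIN;
    EXECUTE upd(1, 'a', 'x');
    EXECUTE upd(2, 'b', 'y');
    -- ... many more ...
    COMMIT;

    DEALLOCATE upd;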

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Misa Simic
Well, about the best approach with large datasets - rarely is there an "always true" best principle... You will always see there are a few ways - which one is best, only a test confirms - it depends on many things like hardware, OS, etc... Sometimes it even depends on the dataset for the update... "CREATE TEMP TABLE tmp AS SELECT ...
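
One plausible reading of the truncated command (all names hypothetical): stage the new values in a temp table, then apply them with a single set-based UPDATE instead of row-at-a-time statements from the client:

    -- Stage the new values (could equally be filled by COPY).
    CREATE TEMP TABLE tmp AS
      SELECT id, new_field1, new_field2 FROM staging_source;

    -- One join-driven UPDATE applies the whole batch.
    UPDATE mytable AS t
    SET    field1 = tmp.new_field1,
           field2 = tmp.new_field2
    FROM   tmp
    WHERE  t.id = tmp.id;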

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 3:06 AM, Misa Simic wrote: > I don't know - you can test :) I probably will, but I do have a huge stack of such experiments to run by now, and it always takes care to get benchmarks right, avoid disk caches, etc. Certainly I think it would be helpful (or at least ...

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Misa Simic
I don't know - you can test :) In the whole solution it is just one command different - so easy to test and compare... To me it doesn't sound faster... Sounds like more operations needed than what should be done... And it produces more problems... i.e. what about table foo? What if another table references foo, etc. ...
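
The alternative being questioned is rebuilding the table rather than updating it in place. A rough sketch of that idea (hypothetical names), which makes the foreign-key worry concrete - the swap fails or gets complicated as soon as other tables reference foo:

    BEGIN;
    -- Build a replacement table with the updated values merged in.
    CREATE TABLE foo_new AS
      SELECT f.id,
             COALESCE(u.new_field1, f.field1) AS field1,
             COALESCE(u.new_field2, f.field2) AS field2
      FROM   foo f
      LEFT JOIN updates u ON u.id = f.id;

    DROP TABLE foo;  -- errors out here if foreign keys reference foo
    ALTER TABLE foo_new RENAME TO foo;
    -- Indexes, constraints, and incoming foreign keys must then be
    -- recreated by hand.
    COMMIT;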

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 2:54 AM, Yang Zhang wrote: > On Sat, Apr 27, 2013 at 1:55 AM, Misa Simic wrote: >> Hi, >> >> If the dataset for update is large... >> >> Maybe best would be: >> >> From the client machine, instead of sending update statements with data - export >> data to a file ready for the copy command ...

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 1:55 AM, Misa Simic wrote: > Hi, > > If the dataset for update is large... > > Maybe best would be: > > From the client machine, instead of sending update statements with data - export > data to a file ready for the copy command > Transfer the file to the server where pg is running > Make a pgsql function which ...

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Misa Simic
Hi, If the dataset for update is large... Maybe best would be: From the client machine, instead of sending update statements with data - export the data to a file ready for the copy command. Transfer the file to the server where pg is running. Make a pgsql function which: Create temp table; Copy to temp from the file; Up...
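
The server-side half of this recipe, sketched with assumed file path, table, and column names - COPY loads the exported file into a temp staging table, and one join UPDATE applies it:

    CREATE TEMP TABLE tmp (id int, field1 text, field2 text);

    -- COPY reads the file on the server; \copy in psql would stream
    -- it from the client instead.
    COPY tmp FROM '/tmp/updates.csv' WITH (FORMAT csv);

    UPDATE mytable AS t
    SET    field1 = tmp.field1,
           field2 = tmp.field2
    FROM   tmp
    WHERE  t.id = tmp.id;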

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 12:24 AM, Yang Zhang wrote: > On Fri, Apr 26, 2013 at 9:41 PM, Tom Lane wrote: >> Yang Zhang writes: >>> It currently takes up to 24h for us to run a large set of UPDATE >>> statements on a database, which are of the form: >> >>> UPDATE table SET field1 = constant1, field2 = constant2, ... WHERE >>> id = constid ...

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Fri, Apr 26, 2013 at 9:41 PM, Tom Lane wrote: > Yang Zhang writes: >> It currently takes up to 24h for us to run a large set of UPDATE >> statements on a database, which are of the form: > >> UPDATE table SET field1 = constant1, field2 = constant2, ... WHERE >> id = constid > >> (We're just overwriting fields of objects identified by ID.) ...

Re: [GENERAL] Optimizing bulk update performance

2013-04-26 Thread Tom Lane
Yang Zhang writes: > It currently takes up to 24h for us to run a large set of UPDATE > statements on a database, which are of the form: > UPDATE table SET field1 = constant1, field2 = constant2, ... WHERE > id = constid > (We're just overwriting fields of objects identified by ID.) ...

Re: [GENERAL] Optimizing bulk update performance

2013-04-26 Thread Yang Zhang
On Fri, Apr 26, 2013 at 7:01 PM, Gavin Flower wrote: > Please do not top post; the convention on these lists is to add stuff at the > end, apart from comments interspersed to make use of appropriate context! Noted, thanks. Anyway, any performance hints are greatly appreciated. > > On 27/04/13 ...

Re: [GENERAL] Optimizing bulk update performance

2013-04-26 Thread Gavin Flower
Please do not top post; the convention on these lists is to add stuff at the end, apart from comments interspersed to make use of appropriate context! On 27/04/13 13:35, Yang Zhang wrote: We're using Postgresql 9.1.9 on Ubuntu 12.04 on EBS volumes on m1.xlarge instances, which have: 15 GiB memory ...

Re: [GENERAL] Optimizing bulk update performance

2013-04-26 Thread Yang Zhang
We're using Postgresql 9.1.9 on Ubuntu 12.04 on EBS volumes on m1.xlarge instances, which have: 15 GiB memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 64-bit platform. (Yes, we're moving to EBS Optimized instances + Provisioned IOPS volumes, but prelim. benchmarks suggest ...

Re: [GENERAL] Optimizing bulk update performance

2013-04-26 Thread Gavin Flower
On 27/04/13 12:14, Yang Zhang wrote: It currently takes up to 24h for us to run a large set of UPDATE statements on a database, which are of the form: UPDATE table SET field1 = constant1, field2 = constant2, ... WHERE id = constid (We're just overwriting fields of objects identified by ID.) ...