Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Fri, Apr 26, 2013 at 9:41 PM, Tom Lane wrote: > Yang Zhang writes: >> It currently takes up to 24h for us to run a large set of UPDATE >> statements on a database, which are of the form: > >> UPDATE table SET field1 = constant1, field2 = constant2, ... WHERE >> id = constid > >> (We'r

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 12:24 AM, Yang Zhang wrote: > On Fri, Apr 26, 2013 at 9:41 PM, Tom Lane wrote: >> Yang Zhang writes: >>> It currently takes up to 24h for us to run a large set of UPDATE >>> statements on a database, which are of the form: >> >>> UPDATE table SET field1 = constant1, f

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Misa Simic
Hi, If dataset for update is large... Maybe best would be: >From client machine, instead of sending update statements with data - export data to file ready for copy command Transfer file to the server where pg is running Make pgsql function which Create temp table Copy to temp from the file Up

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 1:55 AM, Misa Simic wrote: > Hi, > > If dataset for update is large... > > Maybe best would be: > > From client machine, instead of sending update statements with data - export > data to file ready for copy command > Transfer file to the server where pg is running > Make pg

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 2:54 AM, Yang Zhang wrote: > On Sat, Apr 27, 2013 at 1:55 AM, Misa Simic wrote: >> Hi, >> >> If dataset for update is large... >> >> Maybe best would be: >> >> From client machine, instead of sending update statements with data - export >> data to file ready for copy comma

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Misa Simic
I dont know - u can test :) In whole solution it is just one command different - so easy to test and compare... To me it doesnt sound as faster... Sounds as more operation needed what should be done... And produce more problems...i.e what with table foo? What if another table refference foo etc.

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 3:06 AM, Misa Simic wrote: > I dont know - u can test :) I probably will, but I do have a huge stack of such experiments to run by now, and it's always tricky / takes care to get benchmarks right, avoid disk caches, etc. Certainly I think it would be helpful (or at least

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Misa Simic
Well About best approach with large datasets - rarely there is "always true" best principle... You will always see there are a few ways - best one just test confirms - depends on many things like hardware os etc... Sometimes even depends on dataset for update... " CREATE TEMP TABLE tmp AS SELECT

[GENERAL] outgoing TCP from custom background worker?

2013-04-27 Thread Tobias Oberstein
Hi, is it possible to create and process an outgoing TCP connection from within a custom background worker process? Something like: // upon worker startup, enter loop .. conn = connectTCP("myhost"); while (data = conn.read()) { parse(data); // do SQL stuff via SPI and using "data", prod

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Tom Lane
Yang Zhang writes: > You're right, we're only sequentially issuing (unprepared) UPDATEs. You definitely want to fix both parts of that, then. > If we ship many UPDATE statements per call to our DB API's execution > function (we're using Python's psycopg2 if that matters, but I think > that just

Re: [GENERAL] Basic question on recovery and disk snapshotting

2013-04-27 Thread Tom Lane
Yang Zhang writes: > My question really boils down to: if we're interested in using COW > snapshotting (a common feature of modern filesystems and hosting > environments), would we necessarily need to ensure the data and > pg_xlog are on the same snapshotted volume? Yeah, I think so. It's possib

Re: [GENERAL] Basic question on recovery and disk snapshotting

2013-04-27 Thread Jeff Janes
On Sat, Apr 27, 2013 at 10:40 AM, Yang Zhang wrote: > On Sat, Apr 27, 2013 at 4:25 AM, Jov wrote: > > Are you sure the EBS snapshot is consistent? if the snapshot is not > > consistent,enven on the same volume,you will have prolbems with your > backup. > > I think so. EBS gives you "point-in-ti

Re: [GENERAL] Basic question on recovery and disk snapshotting

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 11:55 AM, Jeff Janes wrote: > On Sat, Apr 27, 2013 at 10:40 AM, Yang Zhang wrote: >> My question really boils down to: if we're interested in using COW >> snapshotting (a common feature of modern filesystems and hosting >> environments), would we necessarily need to ensure

[GENERAL] DISTINCT ON changes sort order on its own it seems

2013-04-27 Thread Alexander Reichstadt
Hi, following a query: SELECT DISTINCT ON (msgid) msgid FROM (SELECT refid_messages as msgid FROM messagehistorywithcontent WHERE 1=1 AND (lower(ARRAY_TO_STRING(ARRAY[login,firstname,lastname,content,msgtitle], ' ')) LIKE '%gg%') ORDER BY messagekind DESC) as foo This query rearranges the sor

Re: [GENERAL] DISTINCT ON changes sort order on its own it seems

2013-04-27 Thread Tom Lane
Alexander Reichstadt writes: > following a query: > SELECT DISTINCT ON (msgid) msgid FROM (SELECT refid_messages as msgid FROM > messagehistorywithcontent WHERE 1=1 AND > (lower(ARRAY_TO_STRING(ARRAY[login,firstname,lastname,content,msgtitle], ' > ')) LIKE '%gg%') ORDER BY messagekind DESC) as

Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Jasen Betts
On 2013-04-27, Yang Zhang wrote: > On Sat, Apr 27, 2013 at 1:55 AM, Misa Simic wrote: >> Optionaly you can run vacuum analyze after bulk operation... > > But wouldn't a bulk UPDATE touch many existing pages (say, 20% > scattered around) to mark rows as dead (per MVCC)? I guess it comes > down t

Re: [GENERAL] regex help wanted

2013-04-27 Thread Jasen Betts
On 2013-04-25, Karsten Hilbert wrote: > On Thu, Apr 25, 2013 at 10:32:26AM -0400, Tom Lane wrote: > >> Karsten Hilbert writes: >> > What I don't understand is: Why does the following return a >> > substring ? >> >> >select substring ('junk $$ junk' from >> > '\$<[^<]+?::[^:]+?>\$'); >> >>

[GENERAL] pgAdmin shows two servers with the identical data

2013-04-27 Thread Bob Futrelle
I have two PG servers with the same data. I know the data is the same, because if I change a value in a table on one server, it changes the value in a table with the same name in the other server. in pgAdmin III: Properties for server Local (localhost:5432): Name: Local Host: localhost Por

Re: [GENERAL] Table containing only valid table names

2013-04-27 Thread Jasen Betts
On 2013-04-26, Michael Graham wrote: > Hi all, > > I'm trying to create a table that contains only valid table names. could you get by with a view off pg_catalog.pg_tables or information_schema.tables -- ⚂⚃ 100% natural -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org)

Re: [GENERAL] pgAdmin shows two servers with the identical data

2013-04-27 Thread Jasen Betts
On 2013-04-28, Bob Futrelle wrote: > --001a11c2f448244d3504db64b5d7 > Content-Type: text/plain; charset=ISO-8859-1 > > I have two PG servers with the same data. > > I know the data is the same, because if I change a value in a table > > on one server, it changes the value in a table with the same

Re: [GENERAL] pgAdmin shows two servers with the identical data

2013-04-27 Thread Ian Lawrence Barwick
013/4/28 Bob Futrelle : > I have two PG servers with the same data. > > I know the data is the same, because if I change a value in a table > on one server, it changes the value in a table with the same > name in the other server. > > in pgAdmin III: > > Properties for server Local (localhost:5432)