[BUGS] pg_dump is O(N) in DB table count N even if dumping only one table
Hi, pg_dump takes O(N) time dumping just one table (or a few) explicitly specified with a -t parameter. It thus becomes painfully slow on a database with very many tables. (The use case is copying a few tables over to a test DB, from a large production data warehouse.) The three queries taking O(N) time are listed below. AFAICT each of these queries could be filtered by table name/OID, at least when the number of tables matching the -t parameters is small, allowing pg_dump to complete in seconds rather than minutes. SELECT c.tableoid, c.oid, c.relname, c.relacl, c.relkind, c.relnamespace, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = c.relowner) AS rolname, c.relchecks, c.relhastriggers, c.relhasindex, c.relhasrules, c.relhasoids, c.relfrozenxid, tc.oid AS toid, tc.relfrozenxid AS tfrozenxid, c.relpersistence, CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, d.refobjid AS owning_tab, d.refobjsubid AS owning_col, (SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, array_to_string(c.reloptions, ', ') AS reloptions, array_to_string(array(SELECT 'toast.' || x FROM unnest(tc.reloptions) x), ', ') AS toast_reloptions FROM pg_class c LEFT JOIN pg_depend d ON (c.relkind = 'S' AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND d.refclassid = c.tableoid AND d.deptype = 'a') LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid) WHERE c.relkind in ('r', 'S', 'v', 'c', 'f') ORDER BY c.oid SELECT tableoid, oid, typname, typnamespace, '{=U}' AS typacl, (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = typowner) AS rolname, typinput::oid AS typinput, typoutput::oid AS typoutput, typelem, typrelid, CASE WHEN typrelid = 0 THEN ' '::"char" ELSE (SELECT relkind FROM pg_class WHERE oid = typrelid) END AS typrelkind, typtype, typisdefined, typname[0] = '_' AND typelem != 0 AND (SELECT typarray FROM pg_type te WHERE oid = pg_type.typelem) = oid AS isarray FROM pg_type SELECT classid, objid, refclassid, refobjid, deptype FROM pg_depend WHERE deptype != 'p' AND deptype != 'e' ORDER BY 1,2 Cheers, Gulli
Re: [BUGS] pg_dump is O(N) in DB table count N even if dumping only one table
Gunnlaugur Thor Briem writes: > pg_dump takes O(N) time dumping just one table (or a few) explicitly > specified with a -t parameter. It thus becomes painfully slow on a database > with very many tables. This is not a bug. It needs information about all the tables anyway to deal with dependencies (possible inheritance and similar situations). Having said that, it does look like getTables is pulling back a lot of info that we don't need *yet*, and would never need if we conclude we don't need to dump the table. Possibly some of this work could usefully be postponed to, say, getTableAttrs. OTOH, if that makes the normal dump-everything case noticeably slower, it's unlikely such a patch would get accepted. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] pg_dump is O(N) in DB table count N even if dumping only one table
On 2013-06-10 13:28:32 +, Gunnlaugur Thor Briem wrote: > Hi, > > pg_dump takes O(N) time dumping just one table (or a few) explicitly > specified with a -t parameter. It thus becomes painfully slow on a database > with very many tables. > > (The use case is copying a few tables over to a test DB, from a large > production data warehouse.) > > The three queries taking O(N) time are listed below. AFAICT each of these > queries could be filtered by table name/OID, at least when the number of > tables matching the -t parameters is small, allowing pg_dump to complete in > seconds rather than minutes. Which of those queries take how long in your case? E.g. I could imagine that the first two can be sped up a bit without changing the result. E.g. not calling pg_tablespace_location() for every relation and such. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] pg_dump is O(N) in DB table count N even if dumping only one table
OK, that was what I thought at first, but then I read this note in `man pg_dump`: When -t is specified, pg_dump makes no attempt to dump any other database objects that the selected table(s) might depend upon. so I supposed that that dependency information was *not* required. So I posted the bug. Is that note then outdated/incorrect? Or am I mistaken in finding it at odds with your explanation? Regards, Gulli On Mon, Jun 10, 2013 at 2:04 PM, Tom Lane wrote: > Gunnlaugur Thor Briem writes: > > pg_dump takes O(N) time dumping just one table (or a few) explicitly > > specified with a -t parameter. It thus becomes painfully slow on a > database > > with very many tables. > > This is not a bug. It needs information about all the tables anyway > to deal with dependencies (possible inheritance and similar situations). > > Having said that, it does look like getTables is pulling back a lot of > info that we don't need *yet*, and would never need if we conclude we > don't need to dump the table. Possibly some of this work could usefully > be postponed to, say, getTableAttrs. OTOH, if that makes the normal > dump-everything case noticeably slower, it's unlikely such a patch would > get accepted. > > regards, tom lane >
Re: [BUGS] pg_dump is O(N) in DB table count N even if dumping only one table
On Mon, Jun 10, 2013 at 2:08 PM, Andres Freund wrote: > On 2013-06-10 13:28:32 +, Gunnlaugur Thor Briem wrote: > > The three queries taking O(N) time are listed below. AFAICT each of these > > Which of those queries take how long in your case? > They were as follows: duration: 27770.917 ms statement: SELECT c.tableoid, c.oid, c.relname, ... duration: 28133.407 ms statement: SELECT tableoid, oid, typname, ... duration: 55751.996 ms statement: SELECT classid, objid, refclassid, ... However, that was running pg_dump remotely over a slow network, and that may exaggerate durations since it's a bunch of data. So I ran this again locally on the production server: duration: 16012.518 ms statement: SELECT c.tableoid, c.oid, c.relname, ... duration: 16516.708 ms statement: SELECT tableoid, oid, typname, ... duration: 13400.694 ms statement: SELECT classid, objid, refclassid, ... Duration decrease is probably *not* significantly due to a warmer cache, because I had two runs over the slow network and their durations were similar (actually longer the second time). Gulli
Re: [BUGS] pg_dump is O(N) in DB table count N even if dumping only one table
Hi, On 2013-06-10 14:24:14 +, Gunnlaugur Thor Briem wrote: > So I ran this again locally on the production server: > > duration: 16012.518 ms statement: SELECT c.tableoid, c.oid, c.relname, ... > duration: 16516.708 ms statement: SELECT tableoid, oid, typname, ... > duration: 13400.694 ms statement: SELECT classid, objid, refclassid, ... > > Duration decrease is probably *not* significantly due to a warmer cache, > because I had two runs over the slow network and their durations were > similar (actually longer the second time). Hm. 13s for pg_depend locally vs 55s remotely. You need to have a tremendous amount of dependencies. Could you do a count(*) of pg_depend, pg_type and pg_class? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] pg_dump is O(N) in DB table count N even if dumping only one table
On Mon, Jun 10, 2013 at 2:38 PM, Andres Freund wrote: > Hm. 13s for pg_depend locally vs 55s remotely. You need to have a > tremendous amount of dependencies. > Could you do a count(*) of pg_depend, pg_type and pg_class? > Yep, there's rather a lot: select count(*) from pg_depend; 7692365 select count(*) from pg_type; 130 select count(*) from pg_class; 1788727 It's a very-broad-scope heterogeneous data warehouse, hence the unusual table count. PostgreSQL has displayed tremendous grace under this fire, the only trouble we've encountered is small niggles like tab completion in psql and pg_dump performance (and the latter was helped a lot by the pg_dump perf improvements in 9.1.4 and 9.2, for larger dumps). All the same we will undoubtedly end up spreading all this to more instances/clusters eventually (the dependency structure allows for that). Gulli
Re: [BUGS] bug in Prepared statement with DELETE RETURNING and rule on view
[ got around to looking at this thread finally ] Amit Kapila writes: > What happens when you change ON DELETE rule of the view that really deletes > elements is that command type after applying rule remains same which means > Delete, so it can set the Tag. The point here is that in extended-query mode, we've defined that only the same statement that sets the command tag can return RETURNING rows. In the case at hand, the original DELETE isn't executed at all, being replaced by an UPDATE according to the rule. But we don't change the returned command tag to UPDATE, and we don't let the UPDATE's RETURNING clause return anything to the client. Both of these rules are meant to ensure unsurprising behavior as seen from the client side. We could debate changing them, but I'd be pretty worried about breaking user applications if we did. At the same time, things don't look terribly consistent because in psql (which uses simple query protocol) you *do* see the RETURNING results. That's because simple query protocol doesn't have a restriction that only one resultset can be returned from a single query. So it's a lot more wild-west as to what will really happen, and application code is expected to just deal with that. psql doesn't have a problem with multiple query results because it doesn't particularly care what they are; it's just going to print each one. Apps that are supposed to actually make sense of the data have more of an issue with that. The extended query protocol was explicitly designed to lock things down better so that interactions would be more predictable. The main thing I'm noticing in looking at this is that the documentation doesn't seem to explain anywhere the restriction to getting RETURNING results back from only the primary query. We ought to fix that. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
Hi, On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev wrote: > On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund wrote: >> The attached patch fixes this although I don't like the way it knowledge of >> the >> point up to which StartupSUBTRANS zeroes pages is handled. > > One month has passed since the patched version was installed in our > production environment and can confirm that everything works perfect. > Thank you very much for your prompt help, Andres. Are there any plans to commit this patch and what version it is going to be done to? Thank you. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA Profile: http://www.linkedin.com/in/grayhemp Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979 Skype: gray-hemp Jabber: gray...@gmail.com -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 11 June 2013 04:36, Sergey Konoplev wrote: > Hi, > > On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev wrote: >> On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund >> wrote: >>> The attached patch fixes this although I don't like the way it knowledge of >>> the >>> point up to which StartupSUBTRANS zeroes pages is handled. >> >> One month has passed since the patched version was installed in our >> production environment and can confirm that everything works perfect. >> Thank you very much for your prompt help, Andres. > > Are there any plans to commit this patch and what version it is going > to be done to? > > Thank you. I'll be committing this soon, since we're likely coming up to the next point release soon. Thanks for the reminder. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs