Don't know if everybody on this list has been paying attention to the pgsql-bugs thread about bug #6699. The shortest example of the problem is
create table t1 (f1 int primary key, f2 text); create view v1 as select f1, f2 from t1 group by f1; The view's query is legal only because of the primary key on f1, else the reference to f2 would be considered inadequately grouped. So when we create the view we mark it as dependent on that primary key (or more accurately, the view's _RETURN rule is so marked). This all works fine so far as the backend is concerned: you can't drop the PK without dropping or redefining the view. But it gives pg_dump indigestion. If you do "pg_dump -Fc | pg_restore -l -v" you see (relevant entries only): 5; 2615 2200 SCHEMA - public postgres PRE_DATA 168; 1259 41968 TABLE public t1 postgres PRE_DATA ; depends on: 5 1921; 2606 41975 CONSTRAINT public t1_pkey postgres POST_DATA ; depends on: 168 168 169; 1259 41976 VIEW public v1 postgres PRE_DATA ; depends on: 1919 5 1922; 0 41968 TABLE DATA public t1 postgres DATA ; depends on: 168 Now, there are two not-nice things about this. First, the view is shown as depending on "object 1919", which is pg_dump's internal DumpId for the view's _RETURN rule --- but that's nowhere to be seen in the dump, so the fact that it's dependent on the t1_pkey constraint is not visible in the dump. This puts parallel pg_restore at risk of doing the wrong thing. Second, because view definitions are preferentially dumped in the PRE_DATA section, the t1_pkey constraint has been hoisted up to come before the table data. That means that in an ordinary serial restore, the index will be created before the table's data is loaded, which is undesirable for efficiency reasons. It gets worse though. I've labeled the above archive items with their "SECTION" codes (which for some reason pg_restore -l -v doesn't print) so that you can see that we've got a POST_DATA item that must come before a PRE_DATA item. This wreaks havoc with the entire concept of splitting dump files into three sections. When I first realized that, I was thinking that we would have to revert the --section flag we added to pg_dump/pg_restore in 9.2, for lack of any way to guarantee that the items can actually be restored if they are split according to sections. I think that we can fix it though. There are provisions in pg_dump already for splitting a view apart from its _RETURN rule --- and if the rule comes out as a separate object, it's POST_DATA. So at least in principle, we could fix things by dumping this scenario this way: SCHEMA public PRE_DATA TABLE t1 PRE_DATA TABLE v1 PRE_DATA (at this point it's a table not a view) TABLE DATA t1 DATA CONSTRAINT t1_pkey POST_DATA RULE for v1 POST_DATA (adding the rule turns v1 into a view) The problem is how to persuade pg_dump to do that; as the code stands, it will only break a view apart from its rule if that's necessary to break a circular dependency loop, and there is none in this example. Another point worth thinking about is that if --section is to be trusted at all, we need some way of guaranteeing that the dependency-sorting code won't happily create other similar cases. There is nothing in there right now that would think anything is wrong with an ordering that breaks the section division. I believe the right fix for both of these issues is to add knowledge of the section concept to the topological sort logic, so that an ordering that puts POST_DATA before DATA or PRE_DATA after DATA is considered to be a dependency-ordering violation. One way to do that is to add dummy "fencepost" objects to the sort, representing the start and end of the DATA section. However, these objects would need explicit dependency links to every other DumpableObject, so that doesn't sound very good from a performance standpoint. What I'm going to go look at is whether we can mark DumpableObjects with their SECTION codes at creation time (rather than adding that information at ArchiveEntry() time) and then have the topo sort logic take that marking into account in addition to the explicit dependency links. Assuming we can make that work, how far should it be back-patched? The --section issue is new in 9.2, but this would also take care of the restore problems for views with constraint dependencies, which are allowed as of 9.1, so I'm thinking we'd better put it into 9.1. The other problem is the bogus dependency IDs that pg_dump emits by virtue of not caring whether the dependency links to an object it's actually dumped. I posted a patch for that in the pgsql-bugs thread but pointed out that it would not work before 9.2 without additional surgery. If we fix the view vs constraint issue by adding section knowledge to the sort, I think we can maybe get away with not fixing the dependency IDs in back branches. parallel pg_restore is definitely at risk without the fix, but in the absence of any other known problem cases I'm inclined to not change the back branches more than we have to. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers