* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 7 May 2013 01:18, Stephen Frost <sfr...@snowman.net> wrote:
> > * Simon Riggs (si...@2ndquadrant.com) wrote:
> >> If anybody really wanted to fix pg_dump, they could do. If that was so
> >> important, why block this patch, but allow parallel pg_dump to be
> >> committed without it?
> >
> > Because parallel pg_dump didn't make the problem any *worse*..?  This
> > does.
>
> Sorry, not accurate. Patch makes nothing *worse*.
I really don't understand that claim.  It clearly increases the time
between transaction start and when the locks are all acquired.  If this
is used programmatically (instead of someone hand-typing commands into
psql), perhaps it doesn't increase it very much, but increasing it at
all is clearly going in the "worse" direction.

> The existing API *can* be misused in the way you say, and so also
> could pg_dump if the patch is allowed.

And if the patch isn't allowed, then pg_dump isn't any worse off than
it's always been.  The corollary is that pg_dump *is* worse off if the
patch goes in.  The patch is absolutely *encouraging* such misuse of
this API because there's almost no other way to use this new pg_dump
option except as an abuse of the API.

> However, there is no reason to suppose that such misuse would be
> common;

That is exactly the concern.  Were this patch applied and the option
used, such misuse wouldn't simply be common, it would be the
predominant case, and tools would be built which either completely
ignore the problem (most likely) or which attempt to duplicate what
pg_dump already does, likely very poorly.

> no reason why a timing gap would *necessarily* occur in the
> way your previous example showed, or if it did why it would
> necessarily present a problem for the user. Especially if we put
> something in the docs.

I don't believe papering over this with documentation is really a
solution.  I did suggest a few other options which don't seem to be
getting much traction from anyone, so perhaps they're not workable,
but I'll reiterate them for fun anyway:

Provide a way for users (perhaps even pg_dump) to acquire locks on
essentially everything in the DB.  This could be a plpgsql function or
perhaps even a new lock type.  Grander ideas might also support the
filtering options which pg_dump supports.

Have pg_dump output the snapshot which it has acquired and shared, for
others to then connect and use.  It would do this after acquiring all
of the locks, of course.

> Snapshots have no WARNING on them.

The documentation is fallible, though this is certainly a problem
which all applications need to address in some way.  We address it as
best we can in pg_dump today and the goal is to continue to do so.

> There is no guarantee in any
> transaction that the table you want will not be dropped before you try
> to access it.

Hence why we do the best we can in pg_dump by immediately locking the
objects, once we've identified what they are.

> > However,
> > there really isn't a way for users to use this new option correctly-
>
> Not accurate.

Then please articulate how they would use it correctly?  Would they
write a function which goes and acquires all of the locks which
pg_dump would (assuming they've figured out what locks pg_dump needs),
prior to exporting the snapshot and calling pg_dump?  If so, perhaps
we could simplify their lives by providing such a function for them to
use instead of asking each user to write it.  Something along the
lines of the sketch below, perhaps.
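To make that concrete, here is a rough, untested sketch of the kind of
helper I have in mind.  The function name and the filtering are made
up for illustration (a real version would have to understand pg_dump's
-n/-t selection and handle other relation kinds), and the exact
spelling of the pg_dump option comes from the patch under discussion
(shown here as --snapshot); pg_export_snapshot() and ACCESS SHARE
locks are the existing pieces:

    -- Hypothetical helper: take ACCESS SHARE locks (what pg_dump
    -- takes) on every ordinary user table.
    CREATE OR REPLACE FUNCTION lock_all_tables() RETURNS void AS $$
    DECLARE
        rel regclass;
    BEGIN
        FOR rel IN
            SELECT c.oid::regclass
              FROM pg_class c
              JOIN pg_namespace n ON n.oid = c.relnamespace
             WHERE c.relkind = 'r'
               AND n.nspname NOT IN ('pg_catalog', 'information_schema')
        LOOP
            EXECUTE 'LOCK TABLE ' || rel || ' IN ACCESS SHARE MODE';
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    -- The user would then do something like the following, keeping
    -- this transaction open for the duration of the dump:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT lock_all_tables();
    SELECT pg_export_snapshot();   -- returns e.g. '000004F9-1'
    -- ...and from the shell, hand that snapshot to pg_dump via the
    -- patch's option, e.g.:
    --   pg_dump --snapshot='000004F9-1' mydb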
> The argument against this is essentially that we don't trust the user
> to use it well, so we won't let them have it at all. Which makes no
> sense since they already have this API and don't need our permission
> to use it. All that blocking this patch does is to remove any chance
> the user has of coordinating pg_dump with other actions; preventing
> that causes more issues for the user and so doing nothing is not a
> safe or correct either. A balanced viewpoint needs to include the same
> level of analysis on both sides, not just a deep look at the worst
> case on one side and claim everything is rosy with the current
> situation.

I'm not sure where the claim that everything is rosy was made; I've
certainly not seen it, nor made any claim that it is.  This whole
discussion started with Tom pointing out that pg_dump is essentially
already busted and that this change would end up making that situation
worse.  Perhaps in very select cases users will do the right thing and
acquire all the locks they need before exporting the snapshot and
calling pg_dump, but that description of how to use this option
correctly certainly wasn't articulated anywhere in the initial
description or contents of the patch.  There's also nothing done to
make that any easier to do, nor any way to move users of this approach
to a new methodology once we figure out how to make pg_dump do this
better, which puts us in the position of *always* having this issue,
even if we fix it for 'normal' pg_dump.

	Thanks,

		Stephen