On 7/13/21 3:40 PM, Stephen Frost wrote:
Greetings,
* Tom Lane (t...@sss.pgh.pa.us) wrote:
Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
[1] your proposal of "[+-] OBJTYPE OBJIDENT" plus empty lines allowed
plus lines starting with # are comments, seems plenty. Any line not
following that format would cause an error to be thrown.
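To make the proposal concrete, here is a minimal sketch of a reader for the "[+-] OBJTYPE OBJIDENT" format with blank lines, # comments, and an error on anything else. The function name parse_filter and the set of accepted object types are assumptions for illustration, not taken from the patch.

```python
# Sketch only: the accepted object types are an assumption, not from the patch.
KNOWN_TYPES = {"table", "schema"}


def parse_filter(text):
    """Return a list of (include, objtype, objident) rules."""
    rules = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Empty lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        # Split into sign, object type, and the rest of the line.
        parts = line.split(None, 2)
        if len(parts) != 3 or parts[0] not in ("+", "-"):
            raise ValueError(f"line {lineno}: expected '[+-] OBJTYPE OBJIDENT'")
        sign, objtype, objident = parts
        if objtype not in KNOWN_TYPES:
            raise ValueError(f"line {lineno}: unknown object type {objtype!r}")
        rules.append((sign == "+", objtype, objident))
    return rules
```

Any line that is not blank, a comment, or a well-formed rule raises ValueError, which matches the "would cause an error to be thrown" part of the proposal.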
I'd like to see some kind of keyword on each line, so that we could extend
the command set by adding new keywords. As this stands, I fear we'd end
up using random punctuation characters in place of [+-], which seems
pretty horrid from a readability standpoint.
I agree that it'd end up being bad with single characters.
The [+-] format is based on what rsync does, so there's at least some
precedent for that, and IMHO it's fairly readable. I agree the rest of
the rule (object type, ...) may be a bit more verbose.
I think that this file format should be designed with an eye to allowing
every, or at least most, pg_dump options to be written in the file rather
than on the command line. I don't say we have to *implement* that right
now; but if the format spec is incapable of being extended to meet
requests like that one, I think we'll regret it. This line of thought
suggests that the initial commands ought to match the existing
include/exclude switches, at least approximately.
I agree that we want to have an actual config file that allows just
about every pg_dump option. I'm also fine with saying that we don't
have to implement that initially but the format should be one which can
be extended to allow that.
I understand the desire to have a config file that may contain all
pg_dump options, but I really don't see why we'd want to mix that with
the file containing filter rules.
I think those should be separate, one of the reasons being that I find
it desirable to be able to "include" the filter rules into different
pg_dump configs. That also means the format for the filter rules can be
much simpler.
It's also not clear to me whether the single-file approach would allow
filtering not supported by an actual pg_dump option, for example.
Hence I suggest
include table PATTERN
exclude table PATTERN
which ends up being the above but with words not [+-].
Works for me.
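As an illustration of how the keyword form could track the existing include/exclude switches, here is a rough sketch that translates rules into pg_dump command-line arguments. The function name rules_to_args and the mapping table are assumptions; the schema rows go beyond the table examples above and are included only to show how the keyword set might be extended.

```python
# Assumed mapping from (keyword, object type) to an existing pg_dump switch.
# Only the "table" rows correspond to the examples in this thread.
SWITCHES = {
    ("include", "table"): "--table",
    ("exclude", "table"): "--exclude-table",
    ("include", "schema"): "--schema",
    ("exclude", "schema"): "--exclude-schema",
}


def rules_to_args(text):
    """Translate 'include|exclude OBJTYPE PATTERN' lines into pg_dump arguments."""
    args = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'KEYWORD OBJTYPE PATTERN'")
        keyword, objtype, pattern = parts
        switch = SWITCHES.get((keyword, objtype))
        if switch is None:
            raise ValueError(f"line {lineno}: unsupported rule {keyword} {objtype}")
        args.append(f"{switch}={pattern}")
    return args
```

Supporting a new command in this sketch means adding a row to the table, not picking another punctuation character.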
Which ends up inventing yet-another-file-format which people will end up
writing generators and parsers for. Which is exactly what I was arguing
we really should be trying to avoid doing.
People will have to write generators *in any case* because how else
would you use this? Unless we also provide tools to manipulate that file
(which seems rather futile), they'll have to do that. Even if we used
JSON/YAML/TOML/... they'd still need to deal with the semantics of the
file format.
FWIW I don't understand why they would need to write parsers. That's
something we'd need to do to process the file. I think the case where the
filter file needs to be modified is rather rare - it certainly is not
something the original use case Pavel tried to address requires. (I know
that customer, and the filter would be generated and used for a single dump.)
My opinion is that the best solution (to make both generators and
parsers simple) is to keep the format itself as simple as possible.
Which is exactly why I'm arguing for only addressing the filtering, not
trying to invent a "universal" pg_dump config file format.
I definitely feel that we should have a way to allow anything that can
be created as an object in the database to be explicitly included in the
file and that means whatever we do needs to be able to handle objects
that have names that span multiple lines, etc. It's not clear how the
above would. As I recall, the proposed patch didn't have anything for
handling that, which was one of the issues I had with it and is why I
bring it up again.
I really don't understand why you think the current format can't do
escaping/quoting or handle names spanning multiple lines. The fact that
the original patch did not handle that correctly is a bug, but it does
not mean the format can't handle that.
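To illustrate that a line-oriented format can still carry such names, here is a minimal sketch of a reader that accepts double-quoted names, where a doubled quote stands for a literal quote and a newline inside quotes is part of the name. The quoting rule and the function name parse_rules are assumptions for illustration; the thread does not define the actual syntax.

```python
def parse_rules(text):
    """Parse 'include|exclude OBJTYPE NAME' rules; NAME may be double-quoted."""
    rules, pos, n = [], 0, len(text)

    def skip_blank(p):
        # Skip spaces and tabs, but stop at a newline.
        while p < n and text[p] in " \t\r":
            p += 1
        return p

    def read_word(p):
        start = p
        while p < n and not text[p].isspace():
            p += 1
        return text[start:p], p

    def read_name(p):
        if p < n and text[p] == '"':
            p += 1
            out = []
            while True:
                if p >= n:
                    raise ValueError("unterminated quoted name")
                if text[p] == '"':
                    if p + 1 < n and text[p + 1] == '"':
                        out.append('"')  # doubled quote is a literal quote
                        p += 2
                        continue
                    return "".join(out), p + 1
                out.append(text[p])  # newlines inside quotes are kept
                p += 1
        return read_word(p)

    while pos < n:
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            break
        if text[pos] == "#":  # comment runs to end of line
            while pos < n and text[pos] != "\n":
                pos += 1
            continue
        command, pos = read_word(pos)
        if command not in ("include", "exclude"):
            raise ValueError(f"unknown command {command!r}")
        objtype, pos = read_word(skip_blank(pos))
        name, pos = read_name(skip_blank(pos))
        if not objtype or not name:
            raise ValueError("missing object type or name")
        pos = skip_blank(pos)
        if pos < n and text[pos] != "\n":
            raise ValueError("unexpected text after name")
        rules.append((command, objtype, name))
    return rules
```

Unquoted names keep the simple one-rule-per-line shape, so a generator only needs to quote when a name contains whitespace, a quote, or a newline.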
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company