Ășt 17. 11. 2020 v 22:53 odesĂ­latel Justin Pryzby <pry...@telsasoft.com>
napsal:

> On Wed, Nov 11, 2020 at 06:49:43AM +0100, Pavel Stehule wrote:
> > Perhaps this feature could co-exist with a full blown configuration for
> > >> pg_dump, but even then there's certainly issues with what's proposed-
> > >> how would you handle explicitly asking for a table which is named
> > >> "  mytable" to be included or excluded?  Or a table which has a
> newline
> > >> in it?  Using a standardized format which supports the full range of
> > >> what we do in a table name, explicitly and clearly, would address
> these
> > >> issues and also give us the flexibility to extend the options which
> > >> could be used through the configuration file beyond just the filters
> in
> > >> the future.
>
> I think it's a reasonable question - why would a new configuration file
> option
> include support for only a handful of existing arguments but not the rest.
>

I don't see a strong technical problem - enhancing parsing is not hard
work, but I miss a use case for this. The option "--filter" tries to solve
a problem with limited command line size. This is a clean use case and
there and supported options are options that can be used repeatedly on the
command line. Nothing less, nothing more. The format that is used is
designed just for this purpose.

When we would implement an alternative configuration to command line and
system environments, then the use case should be defined first. When the
use case is defined, we can talk about implementation and about good
format. There are a lot of interesting formats, but I miss a reason why the
usage of this alternative configuration can be helpful for pg_dump. Using
external libraries for richer formats means a new dependency, necessity to
solve portability issues, and maybe other issues, and for this there should
be a good use case. Passing a list of tables for dumping doesn't need a
rich format.

I cannot imagine using a config file with generated object names and some
other options together. Maybe if these configurations will not be too long
(then handy written) configuration can be usable. But when I think about
using pg_dump from some bash scripts, then much more practical is using
usual command line options and passing a list of objects by pipe. I really
miss the use case for special pg_dump's config file, and if there is, then
it is very different from a use case for "--filter" option.


> > > This is the correct argument - I will check a possibility to use
> strange
> > > names, but there is the same possibility and functionality like we
> allow
> > > from the command line. So you can use double quoted names. I'll check
> it.
> >
> > I checked
> > echo "+t \"bad Name\"" | /usr/local/pgsql/master/bin/pg_dump
> --filter=/dev/stdin
> > It is working without any problem
>
> I think it couldn't possibly work with newlines, since you call
> pg_get_line().
> I realize that entering a newline into the shell would also be a PITA, but
> that
> could be one *more* reason to support a config file - to allow terrible
> table
> names to be in a file and avoid writing dash tee quote something enter else
> quote in a pg_dump command, or shell script.
>

New patch is working with names that contains multilines

[pavel@localhost postgresql.master]$ psql -At -X -c "select '+t ' ||
quote_ident(table_name) from information_schema.tables where table_name
like 'foo%'"|  /usr/local/pgsql/master/bin/pg_dump --filter=/dev/stdin
--
-- PostgreSQL database dump
--

-- Dumped from database version 14devel
-- Dumped by pg_dump version 14devel

-
-- Name: foo boo; Type: TABLE; Schema: public; Owner: pavel
--

CREATE TABLE public."foo
boo" (
    a integer
);


ALTER TABLE public."foo
boo" OWNER TO pavel;

--
-- Data for Name: foo boo; Type: TABLE DATA; Schema: public; Owner: pavel
--

COPY public."foo
boo" (a) FROM stdin;
\.


--
-- PostgreSQL database dump complete
--


> I fooled with argument parsing to handle reading from a file in the
> quickest
> way.  As written, this fails to handle multiple config files, and special
> table
> names, which need to support arbitrary, logical lines, with quotes
> surrounding
> newlines or other special chars.  As written, the --config file is parsed
> *after* all other arguments, so it could override previous args (like
> --no-blobs --no-blogs, --file, --format, --compress, --lock-wait), which I
> guess is bad, so the config file should be processed *during* argument
> parsing.
> Unfortunately, I think that suggests duplicating parsing of all/most the
> argument parsing for config file support - I'd be happy if someone
> suggested a
> better way.
>
> BTW, in your most recent patch:
> s/empty rows/empty lines/
> unbalanced parens: "invalid option type (use [+-]"
>

should be fixed now, thank you for check

Regards

Pavel



> @cfbot: I renamed the patch so please ignore it.
>
> --
> Justin
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 0aa35cf0c3..068821583f 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -751,6 +751,56 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
+      <listitem>
+       <para>
+        Read objects filters from the specified file.
+        If you use "-" as a filename, the filters are read from stdin.
+        The lines of this file must have the following format:
+<synopsis>
+(+|-)[tnfd] <replaceable class="parameter">objectname</replaceable>
+</synopsis>
+       </para>
+
+       <para>
+        The first character specifies whether the object is to be included
+        (<literal>+</literal>) or excluded (<literal>-</literal>), and the
+        second character specifies the type of object to be filtered:
+        <literal>t</literal> (table),
+        <literal>n</literal> (schema),
+        <literal>f</literal> (foreign server),
+        <literal>d</literal> (table data).
+       </para>
+
+       <para>
+        With the following filter file, the dump would include table
+        <literal>mytable1</literal> and data from foreign tables of
+        <literal>some_foreign_server</literal> foreign server, but exclude data
+        from table <literal>mytable2</literal>.
+<programlisting>
++t mytable1
++f some_foreign_server
+-d mytable2
+</programlisting>
+       </para>
+
+       <para>
+        The lines starting with symbol <literal>#</literal> are ignored.
+        Previous white chars (spaces, tabs) are not allowed. These
+        lines can be used for comments, notes. Empty lines are ignored too.
+       </para>
+
+       <para>
+        The <option>--filter</option> option works just like the other
+        options to include or exclude tables, schemas, table data, or foreign
+        tables, and both forms may be combined.  Note that there are no options
+        to exclude a specific foreign table or to include a specific table's
+        data.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>--if-exists</option></term>
       <listitem>
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index dc1d41dd8d..074d57ad3c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -54,9 +54,11 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/string.h"
 #include "dumputils.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/libpq-fs.h"
 #include "parallel.h"
 #include "pg_backup_db.h"
@@ -294,6 +296,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(TableInfo *tbinfo);
+static void read_patterns_from_file(char *filename, DumpOptions *dopt);
 
 
 int
@@ -367,6 +370,7 @@ main(int argc, char **argv)
 		{"enable-row-security", no_argument, &dopt.enable_row_security, 1},
 		{"exclude-table-data", required_argument, NULL, 4},
 		{"extra-float-digits", required_argument, NULL, 8},
+		{"filter", required_argument, NULL, 12},
 		{"if-exists", no_argument, &dopt.if_exists, 1},
 		{"inserts", no_argument, NULL, 9},
 		{"lock-wait-timeout", required_argument, NULL, 2},
@@ -390,6 +394,7 @@ main(int argc, char **argv)
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
 		{"index-collation-versions-unknown", no_argument, &dopt.coll_unknown, 1},
+		{"include-foreign-data-file", required_argument, NULL, 17},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -607,6 +612,10 @@ main(int argc, char **argv)
 										  optarg);
 				break;
 
+			case 12:			/* filter implementation */
+				read_patterns_from_file(optarg, &dopt);
+				break;
+
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit_nicely(1);
@@ -1035,6 +1044,8 @@ help(const char *progname)
 			 "                               access to)\n"));
 	printf(_("  --exclude-table-data=PATTERN do NOT dump data for the specified table(s)\n"));
 	printf(_("  --extra-float-digits=NUM     override default setting for extra_float_digits\n"));
+	printf(_("  --filter=FILENAME            dump objects and data based on the filter expressions\n"
+			 "                               from the filter file\n"));
 	printf(_("  --if-exists                  use IF EXISTS when dropping objects\n"));
 	printf(_("  --include-foreign-data=PATTERN\n"
 			 "                               include data of foreign tables on foreign\n"
@@ -18635,3 +18646,219 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 	if (!res)
 		pg_log_warning("could not parse reloptions array");
 }
+
+/*
+ * Print error message and exit.
+ */
+static void
+exit_invalid_filter_format(FILE *fp, char *filename, char *message, char *line, int lineno)
+{
+	pg_log_error("invalid format of filter file \"%s\": %s",
+				 filename,
+				 message);
+
+	fprintf(stderr, "%d: %s\n", lineno, line);
+
+	if (fp != stdin)
+		fclose(fp);
+
+	exit_nicely(-1);
+}
+
+/*
+ * Read dumped object specification from file
+ */
+static void
+read_patterns_from_file(char *filename, DumpOptions *dopt)
+{
+	FILE	   *fp;
+	int			lineno = 0;
+	StringInfoData line;
+	PQExpBuffer quoted_name = NULL;
+
+	/* use "-" as symbol for stdin */
+	if (strcmp(filename, "-") != 0)
+	{
+		fp = fopen(filename, "r");
+		if (!fp)
+			fatal("could not open the input file \"%s\": %m",
+				  filename);
+	}
+	else
+		fp = stdin;
+
+	initStringInfo(&line);
+
+	while (pg_get_line_buf(fp, &line))
+	{
+		bool		is_include;
+		char		objecttype;
+		char	   *objectname;
+		char	   *str = line.data;
+
+		lineno += 1;
+
+		(void) pg_strip_crlf(str);
+
+		/* ignore empty lines */
+		if (*str == '\0')
+			continue;
+
+		/* when first char is hash, ignore whole line */
+		if (*str == '#')
+			continue;
+
+		if (str[1] == '\0')
+			exit_invalid_filter_format(fp,
+									   filename,
+									   "line too short",
+									   str,
+									   lineno);
+
+		if (str[0] == '+')
+			is_include = true;
+		else if (str[0] == '-')
+			is_include = false;
+		else
+			exit_invalid_filter_format(fp,
+									   filename,
+									   "invalid option type (use [+-])",
+									   str,
+									   lineno);
+
+		objecttype = str[1];
+		objectname = &str[2];
+
+		/* skip initial spaces */
+		while (isspace(*objectname))
+			objectname++;
+
+		if (*objectname == '\0')
+			exit_invalid_filter_format(fp,
+									   filename,
+									   "missing object name",
+									   str,
+									   lineno);
+
+		if (*objectname == '"')
+		{
+			PQExpBuffer		quoted_name;
+			char	   *ptr = objectname + 1;
+
+			quoted_name = createPQExpBuffer();
+
+			appendPQExpBufferChar(quoted_name, '"');
+
+			while (1)
+			{
+				if (*ptr == '\0')
+				{
+					if (!pg_get_line_buf(fp, &line))
+						exit_invalid_filter_format(fp,
+												   filename,
+												   "unexpected end of file",
+												   "",
+												   lineno);
+
+					if (ferror(fp))
+						fatal("could not read from file \"%s\": %m", filename);
+
+					appendPQExpBufferChar(quoted_name, '\n');
+					ptr = line.data;
+					lineno += 1;
+				}
+
+				appendPQExpBufferChar(quoted_name, *ptr);
+				if (*ptr++ == '"')
+				{
+					if (*ptr == '"')
+						ptr++;
+					else
+						break;
+				}
+			}
+
+			/* check garbage after identifier */
+			while (*ptr != '\0')
+			{
+				if (!isspace(*ptr))
+					exit_invalid_filter_format(fp,
+											   filename,
+											   "unexpected chars after object name",
+											   ptr,
+											   lineno);
+				ptr++;
+			}
+
+			objectname = quoted_name->data;
+		}
+
+		if (objecttype == 't')
+		{
+			if (is_include)
+			{
+				simple_string_list_append(&table_include_patterns,
+										  objectname);
+				dopt->include_everything = false;
+			}
+			else
+				simple_string_list_append(&table_exclude_patterns,
+										  objectname);
+		}
+		else if (objecttype == 'n')
+		{
+			if (is_include)
+			{
+				simple_string_list_append(&schema_include_patterns,
+										  objectname);
+				dopt->include_everything = false;
+			}
+			else
+				simple_string_list_append(&schema_exclude_patterns,
+										  objectname);
+		}
+		else if (objecttype == 'd')
+		{
+			if (is_include)
+				exit_invalid_filter_format(fp,
+										   filename,
+										   "include filter is not supported for this type of object",
+										   str,
+										   lineno);
+			else
+				simple_string_list_append(&tabledata_exclude_patterns,
+										  objectname);
+		}
+		else if (objecttype == 'f')
+		{
+			if (is_include)
+				simple_string_list_append(&foreign_servers_include_patterns,
+										  objectname);
+			else
+				exit_invalid_filter_format(fp,
+										   filename,
+										   "exclude filter is not supported for this type of object",
+										   str,
+										   lineno);
+		}
+		else
+			exit_invalid_filter_format(fp,
+									   filename,
+									   "invalid object type (use [tndf])",
+									   str,
+									   lineno);
+		if (quoted_name)
+		{
+			destroyPQExpBuffer(quoted_name);
+			quoted_name = NULL;
+		}
+	}
+
+	pfree(line.data);
+
+	if (ferror(fp))
+		fatal("could not read from file \"%s\": %m", filename);
+
+	if (fp != stdin)
+		fclose(fp);
+}
diff --git a/src/bin/pg_dump/t/004_pg_dump_filter.pl b/src/bin/pg_dump/t/004_pg_dump_filter.pl
new file mode 100644
index 0000000000..957442c5e4
--- /dev/null
+++ b/src/bin/pg_dump/t/004_pg_dump_filter.pl
@@ -0,0 +1,129 @@
+use strict;
+use warnings;
+
+use Config;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 22;
+
+my $tempdir       = TestLib::tempdir;
+my $inputfile;
+
+
+my $node = get_new_node('main');
+my $port = $node->port;
+my $backupdir = $node->backup_dir;
+my $plainfile = "$backupdir/plain.sql";
+
+$node->init;
+$node->start;
+
+$node->safe_psql('postgres', "CREATE TABLE table_one(a varchar)");
+$node->safe_psql('postgres', "CREATE TABLE table_two(a varchar)");
+$node->safe_psql('postgres', "CREATE TABLE table_three(a varchar)");
+$node->safe_psql('postgres', "INSERT INTO table_one VALUES('*** TABLE ONE ***')");
+$node->safe_psql('postgres', "INSERT INTO table_two VALUES('*** TABLE TWO ***')");
+$node->safe_psql('postgres', "INSERT INTO table_three VALUES('*** TABLE THREE ***')");
+
+open $inputfile, '>', "$tempdir/inputfile.txt";
+
+print $inputfile "+t table_one\n";
+print $inputfile "+t table_two\n";
+print $inputfile "# skip this line\n";
+print $inputfile "\n";
+print $inputfile "-d table_one\n";
+close $inputfile;
+
+my ($cmd, $stdout, $stderr, $result);
+command_ok(
+	[ "pg_dump", '-p', $port, "-f", $plainfile, "--filter=$tempdir/inputfile.txt", 'postgres' ],
+	"dump tables with filter");
+
+my $dump = slurp_file($plainfile);
+
+ok($dump =~ qr/^CREATE TABLE public.table_one/m, "dumped table one");
+ok($dump =~ qr/^CREATE TABLE public.table_two/m, "dumped table two");
+ok($dump !~ qr/^CREATE TABLE public.table_three/m, "table three not dumped");
+ok($dump !~ qr/^COPY public.table_one/m, "content of table one is not included");
+ok($dump =~ qr/^COPY public.table_two/m, "content of table two is included");
+
+open $inputfile, '>', "$tempdir/inputfile.txt";
+
+print $inputfile "-t table_one\n";
+close $inputfile;
+
+command_ok(
+	[ "pg_dump", '-p', $port, "-f", $plainfile, "--filter=$tempdir/inputfile.txt", 'postgres' ],
+	"dump tables with filter");
+
+$dump = slurp_file($plainfile);
+
+ok($dump !~ qr/^CREATE TABLE public.table_one/m, "table one not dumped");
+ok($dump =~ qr/^CREATE TABLE public.table_two/m, "dumped table two");
+ok($dump =~ qr/^CREATE TABLE public.table_three/m, "dumped table three");
+
+open $inputfile, '>', "$tempdir/inputfile.txt";
+
+print $inputfile "-n public\n";
+close $inputfile;
+
+command_ok(
+	[ "pg_dump", '-p', $port, "-f", $plainfile, "--filter=$tempdir/inputfile.txt", 'postgres' ],
+	"dump tables with filter");
+
+$dump = slurp_file($plainfile);
+
+ok($dump !~ qr/^CREATE TABLE/m, "no table dumped");
+
+#########################################
+# For test of +f option we need created foreign server or accept
+# fail and check error
+
+open $inputfile, '>', "$tempdir/inputfile.txt";
+
+print $inputfile "+f doesnt_exists\n";
+close $inputfile;
+
+command_fails_like(
+	[ "pg_dump", '-p', $port, "-f", $plainfile, "--filter=$tempdir/inputfile.txt", 'postgres' ],
+	qr/pg_dump: error: no matching foreign servers were found for pattern/,
+	"dump foreign server");
+
+#########################################
+# Test broken input format
+
+open $inputfile, '>', "$tempdir/inputfile.txt";
+print $inputfile "k";
+close $inputfile;
+
+command_fails_like(
+	[ "pg_dump", '-p', $port, "-f", $plainfile, "--filter=$tempdir/inputfile.txt", 'postgres' ],
+	qr/pg_dump: error: invalid format of filter file/,
+	"broken format check");
+
+open $inputfile, '>', "$tempdir/inputfile.txt";
+print $inputfile "+";
+close $inputfile;
+
+command_fails_like(
+	[ "pg_dump", '-p', $port, "-f", $plainfile, "--filter=$tempdir/inputfile.txt", 'postgres' ],
+	qr/pg_dump: error: invalid format of filter file/,
+	"broken format check");
+
+open $inputfile, '>', "$tempdir/inputfile.txt";
+print $inputfile "+d sometable";
+close $inputfile;
+
+command_fails_like(
+	[ "pg_dump", '-p', $port, "-f", $plainfile, "--filter=$tempdir/inputfile.txt", 'postgres' ],
+	qr/include filter is not supported for this type of object/,
+	"broken format check");
+
+open $inputfile, '>', "$tempdir/inputfile.txt";
+print $inputfile "-f someforeignserver";
+close $inputfile;
+
+command_fails_like(
+	[ "pg_dump", '-p', $port, "-f", $plainfile, "--filter=$tempdir/inputfile.txt", 'postgres' ],
+	qr/exclude filter is not supported for this type of object/,
+	"broken format check");

Reply via email to