po 8. 2. 2021 v 19:35 odesílatel Matthias van de Meent <boekewurm+postg...@gmail.com> napsal: > > Hi, > > With [0] we got COPY progress reporting. Before the column names of > this newly added view are effectively set in stone with the release of > pg14, I propose the following set of relatively small patches. These > are v2, because it is a patchset that is based on a set of patches > that I previously posted in [0]. > > 0001 Adds a column to pg_stat_progress_copy which details the amount > of tuples that were excluded from insertion by the WHERE clause of the > COPY FROM command. > > 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead > of 'line'-terminology. 'Line' doesn't make sense in the binary copy > case, and only for the 'text' copy format there can be a guarantee > that the source / output file actually contains the reported amount of > lines, whereas the amount of data tuples (which is also what it's > called internally) is guaranteed to equal for all data types. > > There was some discussion about this in [0] where the author thought > 'line' is more consistent with the CSV documentation, and where I > argued that 'tuple' is both more consistent with the rest of the > progress reporting tables and more consistent with the actual counted > items: these are the tuples serialized / inserted (as noted in the CSV > docs; "Thus the files are not strictly one line per table row like > text-format files."). > > Patch 0003 adds backlinks to the progress reporting docs from the docs > of the commands that have progress reporting (re/index, cluster, > vacuum, etc.) such that progress reporting is better discoverable from > the relevant commands, and removes the datname column from the > progress_copy view (that column was never committed). This too should > be fairly trivial and uncontroversial. > > 0004 adds the 'command' column to the progress_copy view; which > distinguishes between COPY FROM and COPY TO. The two commands are (in > my opinion) significantly different enough to warrant this column; > similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY] > which also report that information. I believe that this change is > appropriate; as the semantics of the columns change depending on the > command being executed. > > Lastly, 0005 adds 'io_target' to the reported information, that is, > FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily > be determined based on the commands in pg_stat_activity, it is > reasonably something that a user would want to query on, as the > origin/target of COPY has security and performance implications, > whereas other options (e.g. format) are less interesting for clients > that are not executing that specific COPY command.
I took a little deeper look and I'm not sure if I understand FILE and STDIO. I have finally tried to finalize some initial regress testing of COPY command progress using triggers. I have attached the initial patch applicable to your changes. As you can see COPY FROM STDIN is reported as FILE. That's probably expected, but it is a little confusing for me since STDIN and STDIO sound similar. What is the purpose of STDIO? When is the COPY command reported with io_target of STDIO? > Of special interest in 0005 is that it reports the io_target for the > logical replications' initial tablesyncs' internal COPY. This would > otherwise be measured, but no knowledge about the type of copy (or its > origin) would be available on the worker's side. I'm not married to > this patch 0005, but I believe it could be useful, and therefore > included it in the patchset. > > > With regards, > > Matthias van de Meent. > > > [0] > https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com
From d91fbe3a80c1858e9cbc126e4ade44945cea9ae9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.sima...@gmail.com> Date: Tue, 9 Feb 2021 09:27:21 +0100 Subject: [PATCH] Add initial regress test for COPY progress reporting. --- src/test/regress/expected/progress.out | 43 ++++++++++++++++++++++++++ src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/progress.sql | 39 +++++++++++++++++++++++ 3 files changed, 83 insertions(+), 1 deletion(-) create mode 100644 src/test/regress/expected/progress.out create mode 100644 src/test/regress/sql/progress.sql diff --git a/src/test/regress/expected/progress.out b/src/test/regress/expected/progress.out new file mode 100644 index 000000000000..2c745b946366 --- /dev/null +++ b/src/test/regress/expected/progress.out @@ -0,0 +1,43 @@ +-- setup for COPY progress testing +CREATE TEMP TABLE test_progress_with_trigger ( + a int, + b text +) ; +CREATE OR REPLACE function notice_after_progress_reporting() RETURNS trigger AS +$$ +DECLARE report record; +BEGIN + SELECT INTO report * FROM pg_stat_progress_copy report WHERE pid = pg_backend_pid(); + raise info 'progress datname: %', report.datname; + raise info 'progress command: %', report.command; + raise info 'progress io_target: %', report.io_target; + raise info 'progress bytes_processed: %', report.bytes_processed; + raise info 'progress bytes_total: %', report.bytes_total; + raise info 'progress tuples_processed: %', report.tuples_processed; + raise info 'progress tuples_excluded: %', report.tuples_excluded; + + RETURN NEW; +END; +$$ LANGUAGE plpgsql; +CREATE OR REPLACE TRIGGER check_after_progress_reporting + AFTER INSERT ON test_progress_with_trigger + FOR EACH ROW + EXECUTE FUNCTION notice_after_progress_reporting(); +-- simple COPY from STDIN +COPY test_progress_with_trigger (a, b) FROM STDIN; +INFO: progress datname: regression +INFO: progress command: COPY FROM +INFO: progress io_target: FILE +INFO: progress bytes_processed: 9 +INFO: progress bytes_total: 0 +INFO: progress tuples_processed: 1 +INFO: progress tuples_excluded: 0 +-- COPY from STDIN with WHERE skipping lines +COPY test_progress_with_trigger (a, b) FROM STDIN WHERE a > 1; +INFO: progress datname: regression +INFO: progress command: COPY FROM +INFO: progress io_target: FILE +INFO: progress bytes_processed: 18 +INFO: progress bytes_total: 0 +INFO: progress tuples_processed: 1 +INFO: progress tuples_excluded: 1 diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 12bb67e49118..e445ad24f9ed 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -97,7 +97,7 @@ test: publication subscription # ---------- # Another group of parallel tests # ---------- -test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock indirect_toast equivclass +test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock indirect_toast equivclass progress # ---------- # Another group of parallel tests (JSON related) diff --git a/src/test/regress/sql/progress.sql b/src/test/regress/sql/progress.sql new file mode 100644 index 000000000000..ef5d95056197 --- /dev/null +++ b/src/test/regress/sql/progress.sql @@ -0,0 +1,39 @@ +-- setup for COPY progress testing +CREATE TEMP TABLE test_progress_with_trigger ( + a int, + b text +) ; + +CREATE OR REPLACE function notice_after_progress_reporting() RETURNS trigger AS +$$ +DECLARE report record; +BEGIN + SELECT INTO report * FROM pg_stat_progress_copy report WHERE pid = pg_backend_pid(); + raise info 'progress datname: %', report.datname; + raise info 'progress command: %', report.command; + raise info 'progress io_target: %', report.io_target; + raise info 'progress bytes_processed: %', report.bytes_processed; + raise info 'progress bytes_total: %', report.bytes_total; + raise info 'progress tuples_processed: %', report.tuples_processed; + raise info 'progress tuples_excluded: %', report.tuples_excluded; + + RETURN NEW; +END; +$$ LANGUAGE plpgsql; + +CREATE OR REPLACE TRIGGER check_after_progress_reporting + AFTER INSERT ON test_progress_with_trigger + FOR EACH ROW + EXECUTE FUNCTION notice_after_progress_reporting(); + +-- simple COPY from STDIN +COPY test_progress_with_trigger (a, b) FROM STDIN; +1 test_1 +\. + +-- COPY from STDIN with WHERE skipping lines +COPY test_progress_with_trigger (a, b) FROM STDIN WHERE a > 1; +1 test_1 +2 test_2 +\. +