út 5. 1. 2021 v 0:46 odesílatel Tomas Vondra <tomas.von...@enterprisedb.com> napsal: > > Hi, > > I did take a quick look today, and I have a couple minor comments:
Hi! Thanks for your time. > 1) The catalog sgml docs seem to mention bytes_processed twice (one of > that should be bytes_total), and line_processed (should be "lines_"). Fixed in attached patch. > > 2) I'm not quite sure about not including any info about the command. > For example pg_stat_progress_create_index includes info about the > command, although it's not very detailed. Not sure how useful would it > be to show just COPY FROM / COPY TO, without more details. > > It's probably possible to extract this from pg_stat_activity, but that > may be rather cumbersome (parsing arbitrary SQL and all that). Also, > what if the COPY is called from a function etc.? Any idea where to discuss this? My usecase is really simple. I would like to be able to see progress of COPY command by pid. There is a lot of COPY info inside CopyToStateData and CopyFromStateData structs. The only limitation I see is support of only int64 values for progress reporting columns. I'm not sure if it is easily possible to expose for example filename this way. > > 3) I wonder if we should have something like lines_estimated. For COPY > TO it's pretty simple to calculate it as > > bytes_total / (bytes_processed / lines_processed) > > but what about > > COPY (query) TO file > > In that case we don't know the total amount of bytes / rows, so we can't > calculate any estimate. So maybe we could peek into the query plan? But > I agree this is something we can add later. If I remember well one of the original ideas was to be able to pass custom bytes_total (or lines_total) by COPY options to be stored in copy progress. I can add this in some following patch if still welcomed. For estimates I would prefer to add an additional column to not mix those two together (or at least boolean estimated = true/false and reuse bytes_total column). If query estimates are welcomed, I can take a look at how to reach the query plan and expose those numbers when the query is used to estimated_lines and potentially estimated_bytes columns. It would be probably a little tricky to calculate estimated_bytes for some column types. Also currently only COPY FROM file supports bytes_total (it is 0 for all other scenarios). I think it should be possible for PostgreSQL to know the actual amount of lines query returns for some kind of queries, but I have no idea where to look at this. If that's possible to get, it would be one of the next steps to introduce additional column lines_total. > > 4) This comment is a bit confusing, as it mixes "total" and "processed". > I'd just say "number of bytes processed so far" instead. > Fixed in attached patch. > > Other than that, it seems fine. I'm sure we could add more features, but > it seems like a good start - I plan to get this committed once I get a > patch fixing the docs issues. Patch is attached, it should be applied to the top of the previous patch. Overall patch (having both patches merged together) could be found at https://patch-diff.githubusercontent.com/raw/simi/postgres/pull/6.patch. > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
From 6d2ee68b227c05ce4d1eb95a4c4a9c4f7dd6fbfa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Josef=20=C5=A0im=C3=A1nek?= <josef.sima...@gmail.com> Date: Tue, 5 Jan 2021 02:07:03 +0100 Subject: [PATCH] Fix docs and comment. --- doc/src/sgml/monitoring.sgml | 4 ++-- src/include/commands/copyfrom_internal.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 51d261defd94..875133303e19 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -6477,7 +6477,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, <row> <entry role="catalog_table_entry"><para role="column_definition"> - <structfield>bytes_processed</structfield> <type>bigint</type> + <structfield>bytes_total</structfield> <type>bigint</type> </para> <para> Size of source file for <command>COPY FROM</command> command in bytes. It is set to 0 if not available. @@ -6486,7 +6486,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, <row> <entry role="catalog_table_entry"><para role="column_definition"> - <structfield>line_processed</structfield> <type>bigint</type> + <structfield>lines_processed</structfield> <type>bigint</type> </para> <para> Number of lines already processed by <command>COPY</command> command. diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index ae76be295a9b..80fac1e58a12 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -154,7 +154,7 @@ typedef struct CopyFromStateData char *raw_buf; int raw_buf_index; /* next byte to process */ int raw_buf_len; /* total # of bytes stored */ - uint64 bytes_processed;/* total # of bytes processed, used for progress reporting */ + uint64 bytes_processed;/* number of bytes processed so far */ /* Shorthand for number of unconsumed bytes available in raw_buf */ #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) } CopyFromStateData;