On Tue, Apr 1, 2025 at 1:38 PM vignesh C <vignes...@gmail.com> wrote: > > On Tue, 1 Apr 2025 at 06:31, jian he <jian.universal...@gmail.com> wrote: > > > > On Mon, Mar 31, 2025 at 4:05 PM Kirill Reshke <reshkekir...@gmail.com> > > wrote: > > > > Thanks for doing the benchmark. > > Few comments to improve the comments, error message and remove > redundant assignment: > 1) How about we change below: > /* > * partition's rowtype might differ from the root table's. We must > * convert it back to the root table's rowtype as we are export > * partitioned table data here. > */ > To: > /* > * A partition's row type might differ from the root table's. > * Since we're exporting partitioned table data, we must > * convert it back to the root table's row type. > */ > i changed it to /* * A partition's rowtype might differ from the root table's. * Since we are export partitioned table data here, * we must convert it back to the root table's rowtype. */ Since many places use "rowtype", using "rowtype" instead of "row type" should be fine.
> 2) How about we change below: > + if (relkind == RELKIND_FOREIGN_TABLE) > + ereport(ERROR, > + > errcode(ERRCODE_WRONG_OBJECT_TYPE), > + errmsg("cannot > copy from foreign table \"%s\"", > + > RelationGetRelationName(rel)), > + > errdetail("partition \"%s\" is a foreign table", > RelationGetRelationName(rel)), > + errhint("Try > the COPY (SELECT ...) TO variant.")); > > To: > + if (relkind == RELKIND_FOREIGN_TABLE) > + ereport(ERROR, > + > errcode(ERRCODE_WRONG_OBJECT_TYPE), > + errmsg("cannot > copy from a partitioned table having foreign table partition \"%s\"", > + > RelationGetRelationName(rel)), > + > errdetail("partition \"%s\" is a foreign table", > RelationGetRelationName(rel)), > + errhint("Try > the COPY (SELECT ...) TO variant.")); > i am not so sure. since the surrounding code we have else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot copy from foreign table \"%s\"", RelationGetRelationName(rel)), errhint("Try the COPY (SELECT ...) TO variant."))); let's see what others think. > 3) How about we change below: > /* > * rel: the relation to be copied to. > * root_rel: if not NULL, then the COPY partitioned relation to destination. > * processed: number of tuples processed. > */ > To: > /* > * rel: the relation from which data will be copied. > * root_rel: If not NULL, indicates that rel's row type must be > * converted to root_rel's row type. > * processed: number of tuples processed. > */ > i changed it to /* * rel: the relation from which the actual data will be copied. * root_rel: if not NULL, it indicates that we are copying partitioned relation * data to the destination, and "rel" is the partition of "root_rel". * processed: number of tuples processed. */ > 4) You can initialize processed to 0 along with declaration in > DoCopyTo function and remove the below: > + if (cstate->rel && cstate->rel->rd_rel->relkind == > RELKIND_PARTITIONED_TABLE) > { > ... > ... > processed = 0; > - while (table_scan_getnextslot(scandesc, > ForwardScanDirection, slot)) > ... > ... > - > - ExecDropSingleTupleTableSlot(slot); > - table_endscan(scandesc); > + } > + else if (cstate->rel) > + { > + processed = 0; > + CopyThisRelTo(cstate, cstate->rel, NULL, &processed); > } ok.
From 374f38d7187e92882e5fa4fe6e8b9dd7d7785ed9 Mon Sep 17 00:00:00 2001 From: jian he <jian.universality@gmail.com> Date: Fri, 4 Apr 2025 11:09:04 +0800 Subject: [PATCH v9 1/1] support COPY partitioned_table TO CREATE TABLE pp (id INT, val int ) PARTITION BY RANGE (id); create table pp_1 (val int, id int); create table pp_2 (val int, id int); ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5); ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10); insert into pp select g, 10 + g from generate_series(1,9) g; copy pp to stdout(header); the above case is much slower (around 25% in some case) than ``COPY (select * from pp) to stdout(header);``, because of column remaping. but this is still a new feature, since master does not support ``COPY (partitioned_table)``. reivewed by: vignesh C <vignesh21@gmail.com> reivewed by: David Rowley <dgrowleyml@gmail.com> reivewed by: Melih Mutlu <m.melihmutlu@gmail.com> reivewed by: Kirill Reshke <reshkekirill@gmail.com> discussion: https://postgr.es/m/CACJufxEZt+G19Ors3bQUq-42-61__C=y5k2wk=sHEFRusu7=iQ@mail.gmail.com commitfest entry: https://commitfest.postgresql.org/patch/5467/ --- doc/src/sgml/ref/copy.sgml | 8 +- src/backend/commands/copyto.c | 144 ++++++++++++++++++++++------ src/test/regress/expected/copy2.out | 20 ++++ src/test/regress/sql/copy2.sql | 17 ++++ 4 files changed, 156 insertions(+), 33 deletions(-) diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index df093da97c5..b7d24f5d271 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -521,15 +521,15 @@ COPY <replaceable class="parameter">count</replaceable> <para> <command>COPY TO</command> can be used only with plain - tables, not views, and does not copy rows from child tables - or child partitions. For example, <literal>COPY <replaceable + tables, not views, and does not copy rows from child tables, + however <command>COPY TO</command> can be used with partitioned tables. + For example, in a table inheritance hierarchy, <literal>COPY <replaceable class="parameter">table</replaceable> TO</literal> copies the same rows as <literal>SELECT * FROM ONLY <replaceable class="parameter">table</replaceable></literal>. The syntax <literal>COPY (SELECT * FROM <replaceable class="parameter">table</replaceable>) TO ...</literal> can be used to - dump all of the rows in an inheritance hierarchy, partitioned table, - or view. + dump all of the rows in a table inheritance hierarchy, or view. </para> <para> diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 84a3f3879a8..d0aab894c36 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -19,6 +19,8 @@ #include <sys/stat.h> #include "access/tableam.h" +#include "access/table.h" +#include "catalog/pg_inherits.h" #include "commands/copyapi.h" #include "commands/progress.h" #include "executor/execdesc.h" @@ -82,6 +84,7 @@ typedef struct CopyToStateData List *attnumlist; /* integer list of attnums to copy */ char *filename; /* filename, or NULL for STDOUT */ bool is_program; /* is 'filename' a program to popen? */ + List *partitions; /* oid list of partition oid for copy to */ copy_data_dest_cb data_dest_cb; /* function for writing data */ CopyFormatOptions opts; @@ -116,6 +119,8 @@ static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot); static void CopyAttributeOutText(CopyToState cstate, const char *string); static void CopyAttributeOutCSV(CopyToState cstate, const char *string, bool use_quote); +static void CopyThisRelTo(CopyToState cstate, Relation rel, + Relation root_rel, uint64 *processed); /* built-in format-specific routines */ static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc); @@ -643,6 +648,8 @@ BeginCopyTo(ParseState *pstate, PROGRESS_COPY_COMMAND_TO, 0 }; + List *children = NIL; + List *scan_oids = NIL; if (rel != NULL && rel->rd_rel->relkind != RELKIND_RELATION) { @@ -670,11 +677,35 @@ BeginCopyTo(ParseState *pstate, errmsg("cannot copy from sequence \"%s\"", RelationGetRelationName(rel)))); else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) - ereport(ERROR, - (errcode(ERRCODE_WRONG_OBJECT_TYPE), - errmsg("cannot copy from partitioned table \"%s\"", - RelationGetRelationName(rel)), - errhint("Try the COPY (SELECT ...) TO variant."))); + { + children = find_all_inheritors(RelationGetRelid(rel), + AccessShareLock, + NULL); + + foreach_oid(childreloid, children) + { + char relkind = get_rel_relkind(childreloid); + + if (relkind == RELKIND_FOREIGN_TABLE) + { + char *relation_name; + + relation_name = get_rel_name(childreloid); + ereport(ERROR, + errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("cannot copy from foreign table \"%s\"", relation_name), + errdetail("Partition \"%s\" is a foreign table in the partitioned table \"%s.%s\"", + relation_name, RelationGetRelationName(rel), + get_namespace_name(rel->rd_rel->relnamespace)), + errhint("Try the COPY (SELECT ...) TO variant.")); + } + + if (RELKIND_HAS_PARTITIONS(relkind)) + continue; + + scan_oids = lappend_oid(scan_oids, childreloid); + } + } else ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), @@ -710,6 +741,7 @@ BeginCopyTo(ParseState *pstate, cstate->rel = rel; tupDesc = RelationGetDescr(cstate->rel); + cstate->partitions = list_copy(scan_oids); } else { @@ -1028,7 +1060,7 @@ DoCopyTo(CopyToState cstate) TupleDesc tupDesc; int num_phys_attrs; ListCell *cur; - uint64 processed; + uint64 processed = 0; if (fe_copy) SendCopyBegin(cstate); @@ -1066,36 +1098,25 @@ DoCopyTo(CopyToState cstate) cstate->routine->CopyToStart(cstate, tupDesc); - if (cstate->rel) + /* + * if COPY TO source table is a partitioned table, then open each + * partition and process each individual partition. + */ + if (cstate->rel && cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { - TupleTableSlot *slot; - TableScanDesc scandesc; - - scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL); - slot = table_slot_create(cstate->rel, NULL); - - processed = 0; - while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot)) + foreach_oid(scan_oid, cstate->partitions) { - CHECK_FOR_INTERRUPTS(); + Relation scan_rel; - /* Deconstruct the tuple ... */ - slot_getallattrs(slot); + scan_rel = table_open(scan_oid, AccessShareLock); - /* Format and send the data */ - CopyOneRowTo(cstate, slot); + CopyThisRelTo(cstate, scan_rel, cstate->rel, &processed); - /* - * Increment the number of processed tuples, and report the - * progress. - */ - pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED, - ++processed); + table_close(scan_rel, AccessShareLock); } - - ExecDropSingleTupleTableSlot(slot); - table_endscan(scandesc); } + else if (cstate->rel) + CopyThisRelTo(cstate, cstate->rel, NULL, &processed); else { /* run the plan --- the dest receiver will send tuples */ @@ -1113,6 +1134,71 @@ DoCopyTo(CopyToState cstate) return processed; } +/* + * rel: the relation from which the actual data will be copied. + * root_rel: if not NULL, it indicates that we are copying partitioned relation + * data to the destination, and "rel" is the partition of "root_rel". + * processed: number of tuples processed. +*/ +static void +CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel, + uint64 *processed) +{ + TupleTableSlot *slot; + TableScanDesc scandesc; + AttrMap *map = NULL; + TupleTableSlot *root_slot = NULL; + TupleTableSlot *original_slot = NULL; + TupleDesc scan_tupdesc; + TupleDesc rootdesc = NULL; + + scan_tupdesc = RelationGetDescr(rel); + scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL); + slot = table_slot_create(rel, NULL); + + /* + * A partition's rowtype might differ from the root table's. + * Since we are export partitioned table data here, + * we must convert it back to the root table's rowtype. + */ + if (root_rel != NULL) + { + rootdesc = RelationGetDescr(root_rel); + map = build_attrmap_by_name_if_req(rootdesc, scan_tupdesc, false); + } + + while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot)) + { + CHECK_FOR_INTERRUPTS(); + + /* Deconstruct the tuple ... */ + if (map != NULL) + { + original_slot = slot; + root_slot = MakeSingleTupleTableSlot(rootdesc, &TTSOpsBufferHeapTuple); + slot = execute_attr_map_slot(map, slot, root_slot); + } + else + slot_getallattrs(slot); + + /* Format and send the data */ + CopyOneRowTo(cstate, slot); + + /* + * Increment the number of processed tuples, and report the + * progress. + */ + pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED, + ++(*processed)); + + if (original_slot != NULL) + ExecDropSingleTupleTableSlot(original_slot); + } + + ExecDropSingleTupleTableSlot(slot); + table_endscan(scandesc); +} + /* * Emit one row during DoCopyTo(). */ diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out index 64ea33aeae8..b01389df44c 100644 --- a/src/test/regress/expected/copy2.out +++ b/src/test/regress/expected/copy2.out @@ -929,3 +929,23 @@ truncate copy_default; -- DEFAULT cannot be used in COPY TO copy (select 1 as test) TO stdout with (default '\D'); ERROR: COPY DEFAULT cannot be used with COPY TO +-- COPY TO with partitioned table +CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id); +CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id); +CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id); +ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5); +ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10); +CREATE TABLE pp_15 (val int, id int); +CREATE TABLE pp_510 (val int,id int); +ALTER TABLE pp_1 ATTACH PARTITION pp_15 FOR VALUES FROM (1) TO (5); +ALTER TABLE pp_2 ATTACH PARTITION pp_510 FOR VALUES FROM (5) TO (10); +INSERT INTO pp SELECT g, 10 + g FROM generate_series(1, 6) g; +COPY pp TO stdout(header); +id val +1 11 +2 12 +3 13 +4 14 +5 15 +6 16 +DROP TABLE PP; diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql index 45273557ce0..3730f8af4e5 100644 --- a/src/test/regress/sql/copy2.sql +++ b/src/test/regress/sql/copy2.sql @@ -707,3 +707,20 @@ truncate copy_default; -- DEFAULT cannot be used in COPY TO copy (select 1 as test) TO stdout with (default '\D'); + +-- COPY TO with partitioned table +CREATE TABLE pp (id int,val int) PARTITION BY RANGE (id); +CREATE TABLE pp_1 (val int, id int) PARTITION BY RANGE (id); +CREATE TABLE pp_2 (val int, id int) PARTITION BY RANGE (id); +ALTER TABLE pp ATTACH PARTITION pp_1 FOR VALUES FROM (1) TO (5); +ALTER TABLE pp ATTACH PARTITION pp_2 FOR VALUES FROM (5) TO (10); + +CREATE TABLE pp_15 (val int, id int); +CREATE TABLE pp_510 (val int,id int); +ALTER TABLE pp_1 ATTACH PARTITION pp_15 FOR VALUES FROM (1) TO (5); +ALTER TABLE pp_2 ATTACH PARTITION pp_510 FOR VALUES FROM (5) TO (10); + +INSERT INTO pp SELECT g, 10 + g FROM generate_series(1, 6) g; + +COPY pp TO stdout(header); +DROP TABLE PP; -- 2.34.1