On Mon, Jan 27, 2025 at 4:17 PM jian he <jian.universal...@gmail.com> wrote: > > hi. > > There are two ways we can use to represent the new copy format: json. > 1. > typedef struct CopyFormatOptions > { > bool binary; /* binary format? */ > bool freeze; /* freeze rows on loading? */ > bool csv_mode; /* Comma Separated Value format? */ > bool json_mode; /* JSON format? */ > ... > } > > 2. > typedef struct CopyFormatOptions > { > CopyFormat format; /* format of the COPY operation */ > ..... > } > > typedef enum CopyFormat > { > COPY_FORMAT_TEXT = 0, > COPY_FORMAT_BINARY, > COPY_FORMAT_CSV, > COPY_FORMAT_JSON, > } CopyFormat; > > both the sizeof(cstate->opts) (CopyToStateData.CopyFormatOptions) is 184. > so the struct size will not influence the performance. > > I also did some benchmarks when using CopyFormat. > the following are the benchmarks info: > ------------------------------------------------------------------------------------------------------- > create unlogged table t as select g from generate_series(1, 1_000_000) g; > > build_type=release patch: > copy t to '/dev/null' json \watch i=0.1 c=10 > last execution Time: 108.741 ms > > copy t to '/dev/null' (format text) \watch i=0.1 c=10 > last execution Time: 42.600 ms > > build_type=release master: > copy t to '/dev/null' (format text) \watch i=0.1 c=10 > last execution Time Time: 42.948 ms > > --------------------------------------------------------- > so a new version is attached, using the struct CopyFormatOptions. > > changes mainly in CopyOneRowTo. > now it is: > """" > if(!cstate->rel) > { > memcpy(TupleDescAttr(slot->tts_tupleDescriptor, 0), > TupleDescAttr(cstate->queryDesc->tupDesc, 0), > cstate->queryDesc->tupDesc->natts * > sizeof(FormData_pg_attribute)); > for (int i = 0; i < cstate->queryDesc->tupDesc->natts; i++) > populate_compact_attribute(slot->tts_tupleDescriptor, i); > BlessTupleDesc(slot->tts_tupleDescriptor); > } > """" > reasoning for change: > for composite_to_json to construct json key, we only need > FormData_pg_attribute.attname > but code path > composite_to_json->fastgetattr->TupleDescCompactAttr->verify_compact_attribute > means we also need to call populate_compact_attribute to populate > other attributes. > > v14-0001-Introduce-CopyFormat-and-replace-csv_mode-and-bi.patch, > author is by Joel Jacobson. > As I mentioned in above, > replacing 3 bool fields by an enum didn't change the struct CopyFormatOptions. > but consolidated 3 bool fields into one enum to make code more lean. > I think the refactoring (v14-0001) is worth it.
I've refactored the patch to adapt the newly introduced CopyToRoutine struct, see 2e4127b6d2. v15-0001 is the merged one of v14-0001 and v14-0002 There are some other ongoing *copy to/from* refactors[1] which we can benefit to make the code cleaner, especially the checks done in ProcessCopyOptions. [1]: https://www.postgresql.org/message-id/20250301.115009.424844407736647598.kou%40clear-code.com -- Regards Junwang Zhao
v15-0002-Add-option-force_array-for-COPY-JSON-FORMAT.patch
Description: Binary data
v15-0001-Introduce-json-format-for-COPY-TO.patch
Description: Binary data