Re: New "raw" COPY format

2025-02-27 Thread newtglobal postgresql_contributors
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation: tested, failed Hi Joel, After testing the patch, I observed that for single-

Re: New "raw" COPY format

2024-11-07 Thread Joel Jacobson
Thread renamed to: New "single" COPY format [1] [1] https://postgr.es/m/1db18e33-f1cf-4f2c-9d52-b6d7ff242...@app.fastmail.com /Joel

Re: New "raw" COPY format

2024-11-04 Thread Masahiko Sawada
On Mon, Nov 4, 2024 at 7:22 PM Joel Jacobson wrote: > > On Mon, Nov 4, 2024, at 19:34, Masahiko Sawada wrote: > > On Sat, Nov 2, 2024 at 4:08 AM Joel Jacobson wrote: > >> > >> On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote: > >> > As I mentioned in a separate email, if we use the OS default

Re: New "raw" COPY format

2024-11-04 Thread Joel Jacobson
On Mon, Nov 4, 2024, at 19:34, Masahiko Sawada wrote: > On Sat, Nov 2, 2024 at 4:08 AM Joel Jacobson wrote: >> >> On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote: >> > As I mentioned in a separate email, if we use the OS default EOL as >> > the default EOL in raw format, it would not be neces

Re: New "raw" COPY format

2024-11-04 Thread Masahiko Sawada
On Sat, Nov 2, 2024 at 4:08 AM Joel Jacobson wrote: > > On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote: > > As I mentioned in a separate email, if we use the OS default EOL as > > the default EOL in raw format, it would not be necessary to allow it > > to be multi characters. I think it's wo

Re: New "raw" COPY format

2024-11-02 Thread Joel Jacobson
On Fri, Nov 1, 2024, at 22:28, Masahiko Sawada wrote: > As I mentioned in a separate email, if we use the OS default EOL as > the default EOL in raw format, it would not be necessary to allow it > to be multi characters. I think it's worth considering it. I like the idea, but not sure I understand

Re: New "raw" COPY format

2024-11-01 Thread Masahiko Sawada
On Wed, Oct 30, 2024 at 4:54 AM Joel Jacobson wrote: > > On Wed, Oct 30, 2024, at 09:14, Joel Jacobson wrote: > > $ psql -f bench_result.sql > > Oops, I realized I benchmarked a debug build, > reran the benchmark with `meson setup build --buildtype=release`, > and also added benchmarking of HEAD: >

Re: New "raw" COPY format

2024-10-30 Thread Masahiko Sawada
On Tue, Oct 29, 2024 at 9:48 AM Joel Jacobson wrote: > > > --- > > It's a bit odd to me to use the delimiter as an EOL marker in raw > > format, but probably it's okay. > > > > --- > > - if (cstate->opts.format != COPY_FORMAT_BINARY) > > + if (cstate->opts.format == COPY_FORMAT_

Re: New "raw" COPY format

2024-10-29 Thread Joel Jacobson
On Mon, Oct 28, 2024, at 18:50, Masahiko Sawada wrote: > Thank you for updating the patch. Here are review comments on the v15 > 0002 patch: Thanks for review. > When testing the patch with an empty delimiter, I got the following failure: > > postgres(1:903898)=# copy hoge from '/tmp/tmp.raw' wit

Re: New "raw" COPY format

2024-10-28 Thread Masahiko Sawada
On Mon, Oct 28, 2024 at 3:21 AM Joel Jacobson wrote: > > On Mon, Oct 28, 2024, at 10:30, Joel Jacobson wrote: > > On Mon, Oct 28, 2024, at 08:56, jian he wrote: > >> /* Check force_quote */ > >> - if (!opts_out->csv_mode && (opts_out->force_quote || > >> opts_out->force_quote_all)) > >> + if (op

Re: New "raw" COPY format

2024-10-28 Thread Joel Jacobson
On Mon, Oct 28, 2024, at 10:30, Joel Jacobson wrote: > On Mon, Oct 28, 2024, at 08:56, jian he wrote: >> /* Check force_quote */ >> - if (!opts_out->csv_mode && (opts_out->force_quote || >> opts_out->force_quote_all)) >> + if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || >> +

Re: New "raw" COPY format

2024-10-28 Thread Joel Jacobson
On Mon, Oct 28, 2024, at 08:56, jian he wrote: > /* Check force_quote */ > - if (!opts_out->csv_mode && (opts_out->force_quote || > opts_out->force_quote_all)) > + if (opts_out->format != COPY_FORMAT_CSV && (opts_out->force_quote || > + opts_out->force_quote_all)) > ereport(ERROR, > (errcode(

Re: New "raw" COPY format

2024-10-27 Thread jian he
On Thu, Oct 24, 2024 at 2:30 PM Joel Jacobson wrote: > > On Thu, Oct 24, 2024, at 03:54, Masahiko Sawada wrote: > > I have one question: > > > > From the 0001 patch's commit message: > > > > No behavioral changes are intended; this is a pure refactoring to improve > > code > > clarity and maintai

Re: New "raw" COPY format

2024-10-23 Thread Joel Jacobson
On Thu, Oct 24, 2024, at 03:54, Masahiko Sawada wrote: > I have one question: > > From the 0001 patch's commit message: > > No behavioral changes are intended; this is a pure refactoring to improve code > clarity and maintainability. > > Does the reorganization of the option validation done by this

Re: New "raw" COPY format

2024-10-23 Thread Masahiko Sawada
Hi, On Sat, Oct 19, 2024 at 8:33 AM Joel Jacobson wrote: > > On Sat, Oct 19, 2024, at 12:13, jian he wrote: > > We already make RAW and can only have one column. > > if RAW has no default delimiter, then COPY FROM a text file will > > become one datum value; > > which makes it look like importin

Re: New "raw" COPY format

2024-10-21 Thread Joel Jacobson
On Mon, Oct 21, 2024, at 16:35, jian he wrote: > make the ProcessCopyOptions process in following order: > 1. Extract options from the statement node tree > 2. checking each option, if not there set default value. > 3. checking for interdependent options > > I still think > making step2 aligned wit

Re: New "raw" COPY format

2024-10-21 Thread jian he
On Sat, Oct 19, 2024 at 11:33 PM Joel Jacobson wrote: > > > ProcessCopyOptions > > /* Extract options from the statement node tree */ > > foreach(option, options) > > { > > } > > /* --- DELIMITER option --- */ > > /* --- NULL option --- */ > > /* --- QUOTE option --- */ > > Currently the regress t

Re: New "raw" COPY format

2024-10-19 Thread Joel Jacobson
On Sat, Oct 19, 2024, at 12:13, jian he wrote: > We already make RAW and can only have one column. > if RAW has no default delimiter, then COPY FROM a text file will > become one datum value; > which makes it look like importing a Large Object. > (https://www.postgresql.org/docs/17/lo-funcs.html)

Re: New "raw" COPY format

2024-10-19 Thread jian he
On Sat, Oct 19, 2024 at 1:24 AM Joel Jacobson wrote: >> > Handling of e.g. JSON and other structured text files that could contain > newlines, in a seamless way seems important, so therefore the default is > no delimiter for the raw format, so that the entire input is read as one data > value for

Re: New "raw" COPY format

2024-10-19 Thread Joel Jacobson
On Fri, Oct 18, 2024, at 19:24, Joel Jacobson wrote: > Attachments: > * v11-0001-Refactor-ProcessCopyOptions-introduce-CopyFormat-enu.patch > * v11-0002-Add-raw-format-to-COPY-command.patch Here is a demo of importing a decently sized real text file that can't currently be imported without the

Re: New "raw" COPY format

2024-10-18 Thread Joel Jacobson
On Fri, Oct 18, 2024, at 15:52, jian he wrote: > Raw Format is duplicated > Raw Format didn't mention the special handling of > end-of-data marker. Thanks for reviewing, above fixed. Here is a summary of the changes since v10, thanks to the feedback: Handling of e.g. JSON and other structured t

Re: New "raw" COPY format

2024-10-18 Thread jian he
On Wed, Oct 16, 2024 at 2:37 PM Joel Jacobson wrote: > > On Wed, Oct 16, 2024, at 05:31, jian he wrote: > > Hi. > > I only checked 0001, 0002, 0003. > > the raw format patch is v9-0016. > > 0003-0016 is a lot of small patches, maybe you can consolidate them to > > make the review easier. > > Tha

Re: New "raw" COPY format

2024-10-17 Thread Joel Jacobson
On Wed, Oct 16, 2024, at 21:13, Joel Jacobson wrote: > Therefore, maybe DELIMITER NONE would be a better default > for RAW? Especially since it's then also more honest in being "raw". > > If needing to import an unstructured text file that is just newline > delimited, and not wanting the entire fil

Re: New "raw" COPY format

2024-10-16 Thread Joel Jacobson
On Wed, Oct 16, 2024, at 20:30, Joel Jacobson wrote: > A final thought is to maybe consider just skipping > the automagical newline detection for RAW? > > Instead of the automagical detection, > the default newline delimiter could be the OS default, > similar to how COPY TO works. > > That way, it

Re: New "raw" COPY format

2024-10-16 Thread Joel Jacobson
On Wed, Oct 16, 2024, at 18:34, Daniel Verite wrote: > Joel Jacobson wrote: > >> However, I think rejecting such column data seems like the >> better alternative, to ensure data exported with COPY TO >> can always be imported back using COPY FROM, >> for the same format. > > On the other hand,

Re: New "raw" COPY format

2024-10-16 Thread Joel Jacobson
On Wed, Oct 16, 2024, at 18:04, Jacob Champion wrote: > A hypothetical type whose text representation can contain '\r' but not > '\n' still can't be unambiguously round-tripped under this scheme: > COPY FROM will see the "mixed" line endings and complain, even though > there's no ambiguity. Yeah,

Re: New "raw" COPY format

2024-10-16 Thread Daniel Verite
Joel Jacobson wrote: > However, I think rejecting such column data seems like the > better alternative, to ensure data exported with COPY TO > can always be imported back using COPY FROM, > for the same format. On the other hand, that might prevent cases where we want to export, for i

Re: New "raw" COPY format

2024-10-16 Thread Jacob Champion
On Tue, Oct 15, 2024 at 1:38 PM Joel Jacobson wrote: > > However, I think rejecting such column data seems like the > better alternative, to ensure data exported with COPY TO > can always be imported back using COPY FROM, > for the same format. If text column data contains newlines, > users pro

Re: New "raw" COPY format

2024-10-15 Thread Joel Jacobson
On Wed, Oct 16, 2024, at 05:31, jian he wrote: > Hi. > I only checked 0001, 0002, 0003. > the raw format patch is v9-0016. > 0003-0016 is a lot of small patches, maybe you can consolidate them to > make the review easier. Thanks for reviewing. OK, I've consolidated the v9 0003-0016 into a singl

Re: New "raw" COPY format

2024-10-15 Thread jian he
On Tue, Oct 15, 2024 at 8:50 PM Joel Jacobson wrote: > Hi. I only checked 0001, 0002, 0003. the raw format patch is v9-0016. 0003-0016 is a lot of small patches, maybe you can consolidate them to make the review easier. -COPY x to stdin (format TEXT, force_quote(a)); +COPY x to stdout (format

Re: New "raw" COPY format

2024-10-15 Thread Joel Jacobson
On Tue, Oct 15, 2024, at 19:30, Jacob Champion wrote: > Hi, > > Idle thoughts from a design perspective -- feel free to ignore, since > I'm not the target audience for the feature: Many thanks for looking at this! > - If the column data stored in Postgres contains newlines, it seems > like COPY T

Re: New "raw" COPY format

2024-10-15 Thread Jacob Champion
Hi, Idle thoughts from a design perspective -- feel free to ignore, since I'm not the target audience for the feature: - If the column data stored in Postgres contains newlines, it seems like COPY TO won't work "correctly". Is that acceptable? - RAW seems like an okay-ish label, but for something

Re: New "raw" COPY format

2024-10-14 Thread Joel Jacobson
On Mon, Oct 14, 2024, at 10:51, Joel Jacobson wrote: > On Mon, Oct 14, 2024, at 10:07, Joel Jacobson wrote: >> Attached is a first draft implementation of the new proposed COPY "raw" >> format. >> >> The first two patches are just the bug fix in HEAD, reported separately: >> https://commitfest.pos

Re: New "raw" COPY format

2024-10-14 Thread Joel Jacobson
On Mon, Oct 14, 2024, at 10:07, Joel Jacobson wrote: > Attached is a first draft implementation of the new proposed COPY "raw" > format. > > The first two patches are just the bug fix in HEAD, reported separately: > https://commitfest.postgresql.org/50/5297/ I forgot about adding support for the

Re: New "raw" COPY format

2024-10-14 Thread Joel Jacobson
opyFormat, with options for the three current formats. * v4-0004-Reorganize-ProcessCopyOptions-for-clarity-and-consis.patch The fourth patch reorganizes ProcessCopyOptions for clarity and consistent option handling. * v4-0005-Add-raw-COPY-format-support-for-unstructured-text-da.patch Finally, the fi

Re: New "raw" COPY format

2024-10-13 Thread Joel Jacobson
On Sun, Oct 13, 2024, at 11:52, Tatsuo Ishii wrote: > After COPY has imported the "unstructured text file" in "raw" COPY format, > what is the column type? text? or bytea? If it's text, how do you > handle encoding conversion if the "unstructured text file" is encoded > in server side unsafe encoding

Re: New "raw" COPY format

2024-10-13 Thread Tatsuo Ishii
> Hi hackers, > > This thread is about implementing a new "raw" COPY format. > > This idea came up in a different thread [1], moved here. > > [1] > https://postgr.es/m/47b5c6a7-5c0e-40aa-8ea2-c7b95ccf296f%40app.fastmail.com > > The main use-case for t

Re: New "raw" COPY format

2024-10-11 Thread Joel Jacobson
On Sat, Oct 12, 2024, at 02:48, jian he wrote: > git version 2.34.1 > cannot do `git apply` Sorry about that, fixed. > typedef enum CopyFormat > { > COPY_FORMAT_TEXT, > COPY_FORMAT_BINARY, > COPY_FORMAT_CSV > } CopyFormat; Thanks, fixed. > CopyFormat should add to > src/tools/pginde

Re: New "raw" COPY format

2024-10-11 Thread jian he
On Sat, Oct 12, 2024 at 5:02 AM Joel Jacobson wrote: > > On Fri, Oct 11, 2024, at 22:29, Joel Jacobson wrote: > > Hi hackers, > > > > This thread is about implementing a new "raw" COPY format. > ... > > The attached patch implements the above ideas. >

Re: New "raw" COPY format

2024-10-11 Thread Joel Jacobson
On Fri, Oct 11, 2024, at 22:29, Joel Jacobson wrote: > Hi hackers, > > This thread is about implementing a new "raw" COPY format. ... > The attached patch implements the above ideas. > > I think with these changes, it would be easier to hack on new and existin

New "raw" COPY format

2024-10-11 Thread Joel Jacobson
Hi hackers, This thread is about implementing a new "raw" COPY format. This idea came up in a different thread [1], moved here. [1] https://postgr.es/m/47b5c6a7-5c0e-40aa-8ea2-c7b95ccf296f%40app.fastmail.com The main use-case for the raw format, is when needing to import arbitrary un