Re: Should CSV parsing be stricter about mid-field quotes?

Joel Jacobson Fri, 11 Oct 2024 12:54:00 -0700

On Fri, Oct 11, 2024, at 15:04, Joel Jacobson wrote:
> On Thu, Oct 10, 2024, at 10:37, Daniel Verite wrote:
>> Joel Jacobson wrote:
>>
>>> - No Headers or Metadata:
>>
>> It's not clear why it's necessary to disable the HEADER option
>> for this format?
>
> It's not necessary, no; I just couldn't see a use-case,
> since I was only thinking about the COPY FROM case,
> where one would be dealing with unstructured, undelimited
> text files, such as log files coming from some other system,
> which I've never seen have header rows.
>
> However, thanks to your question, I see how a user
> might want to use the raw format to export a text
> column "as is" using COPY TO with HEADER, in which case
> it would be useful to read it back using HEADER MATCH
> with COPY FROM.
>
> I therefore think the HEADER option should be supported
> for the new raw format.
>
>>>  The format does not support header rows or end-of-data markers;
>>>  every line is treated as data.
>>
>> With COPY FROM STDIN followed by inline data in a script,
>> an end-of-data marker is required.  That's also a problem
>> for CSV except it's mitigated by the possibility of quoting
>> (using "\." instead of \.)
>
> Right. As long as \. won't have any special meaning for the raw format
> except in the STDIN case, that seems fine.
>
> I haven't looked at that part of the code in detail yet though.
>
> As a preparatory step, I think we should replace the two
> "binary" and "csv_mode" bool fields in CopyFormatOptions,
> with a single "format" field of a new CopyFormat enum type.
>
> If we instead introduced yet another bool field, I think the code
> would become too cluttered.
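
A minimal sketch of the refactoring proposed above, in C like the PostgreSQL sources. Only the names CopyFormat, CopyFormatOptions and "format" come from the text; the enum member names, the trimmed-down struct and the two helper functions are hypothetical illustrations, not PostgreSQL's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum replacing the two bools "binary" and "csv_mode". */
typedef enum CopyFormat
{
    COPY_FORMAT_TEXT = 0,       /* default text format */
    COPY_FORMAT_BINARY,
    COPY_FORMAT_CSV,
    COPY_FORMAT_RAW             /* the proposed new raw format */
} CopyFormat;

/* Trimmed-down stand-in for CopyFormatOptions; other fields omitted. */
typedef struct CopyFormatOptions
{
    CopyFormat  format;         /* replaces: bool binary; bool csv_mode; */
} CopyFormatOptions;

/* Hypothetical helper: map a FORMAT option string to the enum. */
static bool
parse_copy_format(const char *name, CopyFormat *out)
{
    assert(name != NULL && out != NULL);

    if (strcmp(name, "text") == 0)
        *out = COPY_FORMAT_TEXT;
    else if (strcmp(name, "csv") == 0)
        *out = COPY_FORMAT_CSV;
    else if (strcmp(name, "binary") == 0)
        *out = COPY_FORMAT_BINARY;
    else if (strcmp(name, "raw") == 0)
        *out = COPY_FORMAT_RAW;
    else
        return false;           /* unknown format name */
    return true;
}

/* Hypothetical helper: no default case, so -Wswitch flags unhandled formats. */
static const char *
copy_format_name(CopyFormat format)
{
    switch (format)
    {
        case COPY_FORMAT_TEXT:
            return "text";
        case COPY_FORMAT_CSV:
            return "csv";
        case COPY_FORMAT_BINARY:
            return "binary";
        case COPY_FORMAT_RAW:
            return "raw";
    }
    return "unknown";           /* not reached for valid enum values */
}
```

With two bools, adding a third format means another bool plus checks against meaningless combinations such as binary && csv_mode. An enum cannot represent those combinations, and a switch without a default case lets GCC and Clang warn (-Wswitch, part of -Wall) wherever a new format is not yet handled.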

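The raw-format behaviour discussed in the quoted exchange can be sketched the same way. Each input line becomes one value verbatim, with no quoting or escaping. HEADER skips the first line, HEADER MATCH additionally checks it, and \. ends the data only for COPY FROM STDIN. This is a hypothetical in-memory illustration, not PostgreSQL's parser: RawCopyOptions and raw_parse are invented names, lines are assumed to end in \n (no \r\n handling), and HEADER MATCH is assumed to compare against a single expected column name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options for the proposed raw format (not PostgreSQL's structs). */
typedef struct RawCopyOptions
{
    bool        header;         /* HEADER: first line is a header, not data */
    const char *header_match;   /* HEADER MATCH: expected header text, or NULL */
    bool        from_stdin;     /* COPY FROM STDIN: a "\." line ends the data */
} RawCopyOptions;

/*
 * Split buf (modified in place) on '\n'. Each line is stored verbatim as one
 * value in rows, up to max_rows. Returns the number of data rows, or -1 if
 * HEADER MATCH is requested and the header line differs.
 */
static int
raw_parse(char *buf, const RawCopyOptions *opts, char **rows, int max_rows)
{
    int         nrows = 0;
    bool        first = true;
    char       *line = buf;

    assert(buf != NULL && opts != NULL && rows != NULL);

    while (line != NULL && *line != '\0' && nrows < max_rows)
    {
        char       *nl = strchr(line, '\n');
        char       *next = NULL;

        if (nl != NULL)
        {
            *nl = '\0';         /* terminate the current line */
            next = nl + 1;
        }

        if (first && opts->header)
        {
            /* Header line: never data; optionally checked against the expected name */
            if (opts->header_match != NULL &&
                strcmp(line, opts->header_match) != 0)
                return -1;
        }
        else if (opts->from_stdin && strcmp(line, "\\.") == 0)
            break;              /* end-of-data marker, only special on STDIN */
        else
            rows[nrows++] = line;   /* the whole line is one value */

        first = false;
        line = next;
    }
    return nrows;
}
```

A line consisting of \. read from a file is kept as an ordinary value, while quotes and commas inside a line are never interpreted.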

I'm starting a new thread for this with a more suitable subject.

/Joel
