Re: New "single" COPY format

Andrew Dunstan Thu, 19 Dec 2024 05:40:26 -0800


On 2024-12-16 Mo 10:09 AM, Joel Jacobson wrote:

> Hi hackers,
>
> After further consideration, I'm withdrawing the patch.
> Some fundamental questions remain unresolved:
>
> - Should round-trip fidelity be a strict goal? By "round-trip fidelity",
>    I mean that data exported and then re-imported should yield exactly
>    the original values, including the distinction between NULL and empty
>    strings.
> - If round-trip fidelity is a requirement, how do we distinguish NULL from empty
>    strings without delimiters or escapes?
> - Is automatic newline detection (as in "csv" and "text") more valuable than
>    the ability to embed \r (CR) characters?
> - Would it be better to extend the existing COPY options rather than introducing
>    a new format?
> - Or should we consider a JSONL format instead, one that avoids the NULL/empty
>    string problem entirely?
>
> No clear solution or consensus has emerged. For now, I'll step back from the
> proposal. If someone wants to revisit this later, I'd be happy to contribute.
>
> Thanks again for all the feedback and consideration.

We seem to have got seriously into the weeds here. I'd be sorry to see this dropped. After all, it's not something new, and while we have a sort of workaround for "one json doc per line", it's far from obvious and, except in a few blog posts, undocumented.

I think we're trying to be far too general here, in the absence of any more general use cases. The ones I recall having encountered in the wild are:


  . one json datum per line

  . one json document per file

  . a sequence of json documents per file

The last one is hard to deal with, and I think I've only seen it once or twice, so I suggest leaving it aside for now.
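To illustrate why the third case is harder, here is a rough Python sketch (not anything proposed for COPY; the function name read_sequence is made up for the example). With concatenated documents there is no line structure to rely on, so the reader has to parse each document just to find where it ends. It uses the standard library's json.JSONDecoder.raw_decode, which returns the decoded value and the index just past it.

```python
import json

def read_sequence(text: str) -> list:
    """Read a sequence of concatenated JSON documents (hypothetical helper)."""
    decoder = json.JSONDecoder()
    docs, idx = [], 0
    while True:
        # Skip JSON whitespace (space, tab, CR, LF) between documents;
        # raw_decode does not skip leading whitespace itself.
        while idx < len(text) and text[idx] in " \t\r\n":
            idx += 1
        if idx == len(text):
            return docs
        # The document boundary is only known after parsing the document.
        obj, idx = decoder.raw_decode(text, idx)
        docs.append(obj)
```

A document may span several lines and the next one may start on the same line, e.g. read_sequence('{"a":\n 1}{"b": 2}'), which is exactly what simple line splitting cannot handle.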

Notice that these are all JSON. I could imagine XML having similar requirements, but I encounter it extremely rarely.

Regarding NULL, an empty string is not a valid JSON literal, so there should be no confusion there. It is valid for XML, though.
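A minimal Python sketch of that point, assuming a hypothetical JSONL mode in which an empty line is read as NULL (the function name and that convention are illustrative only, not part of any patch). An empty JSON string is spelled "" on the line, so it never collides with an empty line.

```python
import json
from typing import Optional

def parse_jsonl_line(line: str) -> Optional[str]:
    """Turn one input line of a hypothetical JSONL COPY into a column value."""
    if line == "":
        # Zero-length text is not valid JSON, so it is free to mean SQL NULL
        # (an assumed convention for this sketch).
        return None
    json.loads(line)  # validate only; raises json.JSONDecodeError if invalid
    return line       # keep the JSON text itself, as a json column would

# parse_jsonl_line("")   -> None   (NULL)
# parse_jsonl_line('""') -> '""'   (an empty JSON string, not NULL)
```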

Given all that, I think restricting ourselves to just the JSON cases, and possibly just to JSONL, would be perfectly reasonable.

Regarding CR, a raw CR is not a valid character in a JSON string item, although it is valid as JSON whitespace. I would not treat it as magical unless it immediately precedes an NL. That gives rise to a very slight ambiguity, but I think it's one we could live with.
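A rough Python sketch of that rule (the function name split_records is hypothetical): input is split into records on LF, and a CR is dropped only when it sits directly before the LF; a CR anywhere else, including at the very end of unterminated input, stays in the record. Since a raw CR in valid JSON can only be whitespace, dropping one that precedes an LF does not change the parsed value.

```python
def split_records(data: str) -> list[str]:
    """Split on LF; a CR is part of the terminator only if it precedes the LF."""
    records = []
    start = 0
    while True:
        nl = data.find("\n", start)
        if nl == -1:
            break
        end = nl
        if end > start and data[end - 1] == "\r":
            end -= 1  # CR immediately before LF: treat CRLF as the terminator
        records.append(data[start:end])
        start = nl + 1
    if start < len(data):
        # Final line with no LF: keep it as-is, including any trailing CR.
        records.append(data[start:])
    return records
```

For example, "a\r\nb\rc\n" yields the records "a" and "b\rc": the bare CR inside the second line is left alone.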

As for what the format is called, I don't like the "LIST" proposal much, even for the general case. It seems too close to an array.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
