On 2024-12-16 Mo 10:09 AM, Joel Jacobson wrote:
> Hi hackers,
>
> After further consideration, I'm withdrawing the patch.
> Some fundamental questions remain unresolved:
>
> - Should round-trip fidelity be a strict goal? By "round-trip fidelity",
>    I mean that data exported and then re-imported should yield exactly
>    the original values, including the distinction between NULL and empty
>    strings.
> - If round-trip fidelity is a requirement, how do we distinguish NULL from empty
>    strings without delimiters or escapes?
> - Is automatic newline detection (as in "csv" and "text") more valuable than
>    the ability to embed \r (CR) characters?
> - Would it be better to extend the existing COPY options rather than introducing
>    a new format?
> - Or should we consider a JSONL format instead, one that avoids the NULL/empty
>    string problem entirely?
>
> No clear solution or consensus has emerged. For now, I'll step back from the
> proposal. If someone wants to revisit this later, I'd be happy to contribute.
>
> Thanks again for all the feedback and consideration.


We seem to have got seriously into the weeds here. I'd be sorry to see this dropped. After all, it's not something new, and while we have a sort of workaround for "one json doc per line", it's far from obvious and, apart from a few blog posts, undocumented.

I think we're trying to be far too general here, in the absence of any more general use cases. The ones I recall having encountered in the wild are:

  . one json datum per line

  . one json document per file

  . a sequence of json documents per file

The last one is hard to deal with, and I think I've only seen it once or twice, so I suggest leaving it aside for now.

Notice these are all JSON. I could imagine XML might have similar requirements, but I encounter it extremely rarely.

Regarding NULL, an empty string is not a valid JSON literal, so there should be no confusion there. It is valid for XML, though.
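To make the NULL point concrete, here is a minimal Python sketch, assuming a hypothetical JSONL mode in which each input line carries exactly one column value. An empty line maps to SQL NULL (returned as None), and any other line must parse as JSON. The function name parse_jsonl_line is made up for illustration; it is not an existing PostgreSQL or COPY interface.

```python
import json
from typing import Optional


def parse_jsonl_line(line: str) -> Optional[str]:
    """Hypothetical mapping of one JSONL line to a column value.

    An empty line can never be a valid JSON document, so it can stand
    for SQL NULL without any escape or delimiter. Every other line must
    be valid JSON; its original text is returned unchanged.
    """
    if line == "":
        return None  # empty line -> SQL NULL
    # Raises json.JSONDecodeError (a subclass of ValueError) on bad input.
    json.loads(line)
    return line
```

Note that the JSON empty string is written `""` (two characters), so it stays distinct from the empty line used for NULL, and JSON `null` also remains a distinct, ordinary JSON value.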

Given all that, I think restricting ourselves to just the JSON cases, and possibly just to JSONL, would be perfectly reasonable.

Regarding CR, it's not a valid character in a JSON string item, although it is valid in JSON whitespace. I would not treat it as magical unless it immediately precedes an NL. That gives rise to a very slight ambiguity, but I think it's one we could live with.
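The CR rule above could be sketched roughly like this in Python, purely as an illustration. split_jsonl is a hypothetical helper, and treating a final line that lacks a trailing NL as a complete line is an assumption, not something settled in this thread.

```python
def split_jsonl(data: str) -> list[str]:
    """Hypothetical line splitter for a JSONL input stream.

    NL always ends a line. A CR is treated as part of the line
    terminator only when it immediately precedes the NL; a CR anywhere
    else is left in place (in valid JSON it can only be whitespace,
    since a raw CR is not allowed inside a JSON string).
    """
    lines = []
    start = 0
    while True:
        nl = data.find("\n", start)
        if nl == -1:
            break
        end = nl
        if end > start and data[end - 1] == "\r":
            end -= 1  # CR directly before NL belongs to the terminator
        lines.append(data[start:end])
        start = nl + 1
    if start < len(data):
        # Assumption: a final line without a trailing NL is kept verbatim.
        lines.append(data[start:])
    return lines
```

Under this rule, a CR meant as trailing JSON whitespace just before the NL gets dropped along with it. Because whitespace between JSON tokens carries no meaning, the parsed value is the same either way.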

As for what the format is called, I don't like the "LIST" proposal much, even for the general case. Seems too close to an array.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
