On 2024-10-08 Tu 3:25 AM, Joel Jacobson wrote:
On Sun, Oct 6, 2024, at 15:12, Andrew Dunstan wrote:
On 2024-10-04 Fr 12:19 PM, Joel Jacobson wrote:
2. Avoid needing hacks like using E'\x01' as quoting char.
Introduce QUOTE NONE and DELIMITER NONE,
to allow raw lines to be imported "as is" into a single text column.
As I think I previously indicated, I'm perfectly happy about 2, because
it replaces a far from obvious hack, but I am at best dubious about 1.
I've looked at how to implement this, and there is quite a lot of complexity
having to do with quoting and escaping.
Need guidance on what you think would be best to do:
2a) Should we aim to support all NONE combinations, at the cost of increasing
the complexity of all the code having to do with quoting, escaping and
delimiters?
2b) Should we aim to only support the QUOTE NONE DELIMITER NONE ESCAPE NONE
case, useful to the real-life scenario we've identified, that is, importing
raw log lines into a single column, which could then be handled by a much
simpler and probably faster version of CopyReadAttributesCSV(),
e.g. named CopyReadAttributesUnquotedUnDelimited() or
maybe CopyReadAttributesRaw()?
(We also need to modify CopyReadLineText(), but it seems we only need a
quote_none bool to skip over the quoting code there, so I don't think a
separate function is warranted there.)
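
To make the 2b idea concrete, here is a minimal standalone sketch of what "one raw line becomes one field" could look like. It is not PostgreSQL source and does not use any PostgreSQL API: the names RawField and read_attribute_raw are hypothetical, and the line-terminator handling is an assumption made only so the example is self-contained (in the real code path, end-of-line detection belongs to the line reader rather than the attribute reader).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration only; not a PostgreSQL structure. */
typedef struct RawField
{
    const char *data;   /* points into the caller's line buffer */
    size_t      len;    /* length of the field, excluding the terminator */
} RawField;

/*
 * Treat one input line as exactly one field.
 *
 * No delimiter search, no quote tracking and no escape processing is done:
 * commas, double quotes and backslashes are all ordinary data. The only
 * bytes dropped are a trailing "\n", "\r\n" or "\r" (an assumption of this
 * sketch, see the note above).
 */
static RawField
read_attribute_raw(const char *line, size_t len)
{
    RawField    field;

    /* Strip a trailing newline, then a trailing carriage return. */
    if (len > 0 && line[len - 1] == '\n')
        len--;
    if (len > 0 && line[len - 1] == '\r')
        len--;

    field.data = line;
    field.len = len;
    return field;
}
```

Called on a log line such as `a,"b" \N c` followed by a line terminator, the function returns a single field covering the whole line, with the comma, quotes and backslash left untouched. That is the behaviour the E'\x01' hack is currently used to approximate.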
I think ESCAPE NONE should be implied from QUOTE NONE, since the default escape
character is the same as the quote character, so if there isn't any
quote character, then I think that would imply no escape character either.
Can we think of any other valid, useful, realistic, and safe combinations of
QUOTE NONE, DELIMITER NONE and ESCAPE NONE, that would be interesting
to support?
If not, then I think 2b looks more interesting: it reduces the risk of
accidental misuse, is simpler to implement, and should also allow importing
raw log files faster, thanks to the reduced complexity.
Off hand I can't think of a case other than 2b that would apply in the
real world, although others might like to chime in here. If we're going
to do that, let's find a shorter way to spell it. In fact, we should do
that even if we go with 2a.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com