On Sun, Oct 6, 2024, at 15:12, Andrew Dunstan wrote:
> On 2024-10-04 Fr 12:19 PM, Joel Jacobson wrote:
>> 2. Avoid needing hacks like using E'\x01' as quoting char.
>>
>> Introduce QUOTE NONE and DELIMITER NONE,
>> to allow raw lines to be imported "as is" into a single text column.
>
> As I think I previously indicated, I'm perfectly happy about 2, because
> it replaces a far from obvious hack, but I am at best dubious about 1.

I've looked at how to implement this, and there is quite a lot of
complexity having to do with quoting and escaping.

I need guidance on what you think would be best to do:

2a) Should we aim to support all NONE combinations, at the cost of
increasing the complexity of all code that deals with quoting, escaping
and delimiters?

2b) Should we aim to support only the QUOTE NONE DELIMITER NONE ESCAPE NONE
case, which covers the real-life scenario we've identified, that is,
importing raw log lines into a single column? That case could then be
handled by a much simpler and probably faster version of
CopyReadAttributesCSV(), e.g. named CopyReadAttributesUnquotedUnDelimited()
or maybe CopyReadAttributesRaw().

(We also need to modify CopyReadLineText(), but it seems we only need a
quote_none bool to skip over the quoting code there, so I don't think a
separate function is warranted.)

I think ESCAPE NONE should be implied by QUOTE NONE: the default escape
character is the same as the quote character, so if there is no quote
character, there should be no escape character either.

Can we think of any other valid, useful, realistic, and safe combinations
of QUOTE NONE, DELIMITER NONE and ESCAPE NONE that would be interesting
to support?

If not, then I think 2b looks more interesting: it reduces the risk of
accidental misuse, is simpler to implement, and should also allow
importing raw log files faster, thanks to the reduced complexity.

Best regards,
Joel
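
P.S. To make option 2b concrete, here is a rough, standalone sketch of
what a raw attribute reader could look like. It is not PostgreSQL code:
RawCopyState and copy_read_attributes_raw() are hypothetical names, and
the sketch assumes the line reader has already stripped the end-of-line
marker and NUL-terminated the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the per-row COPY state. The real state
 * struct in PostgreSQL is different; this only carries what the
 * sketch needs.
 */
typedef struct RawCopyState
{
    char   *line_buf;       /* current line, NUL-terminated, EOL removed */
    char   *raw_fields[1];  /* a raw line always yields exactly one field */
} RawCopyState;

/*
 * Sketch of a "raw" attribute reader (assumption: QUOTE NONE,
 * DELIMITER NONE and ESCAPE NONE are all in effect).
 *
 * There is no delimiter, quote, or escape to look for, so the line is
 * not scanned at all: the whole buffer becomes the single field.
 * Returns the number of fields found, which is always 1.
 */
int
copy_read_attributes_raw(RawCopyState *state)
{
    assert(state != NULL && state->line_buf != NULL);

    state->raw_fields[0] = state->line_buf;
    return 1;
}
```

The point of the sketch is that commas, double quotes and backslashes in
a log line pass through untouched, because nothing in the function
inspects individual characters.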