On Tue, Oct 15, 2024 at 1:38 PM Joel Jacobson <j...@compiler.org> wrote:
> However, I thinking rejecting such column data seems like the
> better alternative, to ensure data exported with COPY TO
> can always be imported back using COPY FROM,
> for the same format. If text column data contains newlines,
> users probably ought to be using the text or csv format instead.

Yeah. I think _someone's_ going to have strong opinions one way or the
other, but that person is not me. And I assume a contents check during
COPY TO is going to have a noticeable performance impact...

> > - RAW seems like an okay-ish label, but for something that's doing as
> > much magic end-of-line detection as this patch is, I'd personally
> > prefer SINGLE (as in, "single column").
>
> It's actually the same end-of-line detection as the text format
> in copyfromparse.c's CopyReadLineText(), except the code
> is simpler thanks to not having to deal with quotes or escapes.

Right, sorry, I hadn't meant to imply that you made it up. :D Just
that a "raw" format that is actually automagically detecting things
doesn't seem very "raw" to me, so I prefer the other name.

> It basically just learns the newline sequence based on the first
> occurrence, and then require it to be the same throughout the file.

A hypothetical type whose text representation can contain '\r' but not
'\n' still can't be unambiguously round-tripped under this scheme:
COPY FROM will see the "mixed" line endings and complain, even though
there's no ambiguity. Maybe no one will run into that problem in
practice? But if they did, I think that'd be a pretty frustrating
limitation. It'd be nice to override the behavior, to change it from
"do what you think I mean" to "do what I say".

> > - Speaking of magic end-of-line detection, can there be a way to turn
> > that off? Say, via DELIMITER?
> > - Generic DELIMITER support, for any single-byte separator at all,
> > might make a "single-column" format more generally applicable. But I
> > might be over-architecting. And it would make the COPY TO issue even
> > worse...
>
> That's an interesting idea that would provide more flexibility,
> though, at the cost of complicating things by overloading the meaning
> of DELIMITER.

I think that'd be a docs issue rather than a conceptual one, though...
it's still a delimiter.
I wouldn't really expect end-user confusion.

> If aiming to make this more generally applicable,
> then at least DELIMITER would need to be multi-byte,
> since otherwise the Windows case \r\n couldn't be specified.

True.

> What I found appealing with the idea of a new COPY format,
> was that instead of overloading the existing options
> with more complexity, a new format wouldn't need to affect
> the existing options, and the new format could be explained
> separately, without making things worse for users not
> using this format.

I agree that we should not touch the existing formats. If
RAW/SINGLE/whatever needed a multibyte line delimiter, I'm not
proposing that the other formats should change.

--Jacob
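
To make the '\r' round-trip concern above a bit more concrete, here is a
rough Python sketch. It is only a toy model of "learn the line ending from
its first occurrence, then require the same one throughout"; it is not the
code in copyfromparse.c, and the split_lines() name and its eol parameter
are made up for illustration. The eol argument stands in for a hypothetical
"do what I say" override and is not an option the patch is known to have.

```python
import re

# Toy model only: NOT the logic from copyfromparse.c.
# Matches any of the three candidate line terminators.
_EOL = re.compile(r"\r\n|\r|\n")


def split_lines(data, eol=None):
    """Split data into rows.

    eol=None   -> learn the terminator from its first occurrence and
                  reject any different terminator later in the input.
    eol="..."  -> hypothetical explicit override: split on exactly that
                  string and leave every other byte alone.
    """
    if eol is not None:
        rows = data.split(eol)
        # Drop the empty piece that follows the final terminator.
        return rows[:-1] if rows and rows[-1] == "" else rows

    rows, start, learned = [], 0, None
    for m in _EOL.finditer(data):
        if learned is None:
            learned = m.group()          # first terminator seen wins
        elif m.group() != learned:
            raise ValueError("mixed line endings")
        rows.append(data[start:m.start()])
        start = m.end()
    if start < len(data):                # unterminated last row
        rows.append(data[start:])
    return rows


# Two rows written with '\n' terminators; the first value contains a '\r'.
exported = "a\rb\nc\n"

try:
    split_lines(exported)                # detection sees '\r', then '\n'
except ValueError as exc:
    print("detected mode:", exc)

print("explicit mode:", split_lines(exported, eol="\n"))
```

With detection, the embedded '\r' is taken as the line ending and the later
'\n' is then rejected as "mixed", even though the writer only ever used '\n'
as a terminator. With an explicit terminator the same input splits into the
two original values.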