> On Jan 9, 2015, at 11:37 AM, Andrew Dunstan <and...@dunslane.net> wrote:
>
> On 01/08/2015 08:42 PM, Aaron Botsis wrote:
>>
>>> On Jan 8, 2015, at 3:44 PM, Andrew Dunstan <and...@dunslane.net> wrote:
>>>
>>> On 01/08/2015 03:05 PM, Aaron Botsis wrote:
>>>>
>>>>> It's also unnecessary. CSV format, while not designed for this, is
>>>>> nevertheless sufficiently flexible to allow successful import of json
>>>>> data meeting certain criteria (essentially no newlines), like this:
>>>>>
>>>>>     copy the_table(jsonfield)
>>>>>     from '/path/to/jsondata'
>>>>>     csv quote e'\x01' delimiter e'\x02';
>>>>
>>>> While perhaps unnecessary, given the size and simplicity of the patch, IMO
>>>> it's a no-brainer to merge (it actually makes the code smaller by 3
>>>> lines). It also enables non-json use cases anytime one might want to
>>>> preserve embedded escapes, or use different ones entirely. Do you see
>>>> other reasons not to commit it?
>>>
>>> Well, for one thing it's seriously incomplete. You need to be able to
>>> change the delimiter as well. Otherwise, any embedded tab in the json will
>>> cause you major grief.
>>>
>>> Currently the delimiter and the escape MUST be a single-byte non-nul
>>> character, and there is a check for this in csv mode. Your patch would
>>> allow any arbitrary string (including one of zero length) for the escape in
>>> text mode, and would then silently ignore all but the first byte. That's
>>> not the way we like to do things.
>>>
>>> And, frankly, I would need to spend quite a lot more time thinking about
>>> other implications than I have given it so far. This is an area where I
>>> tend to be VERY cautious about making changes. This is a fairly fragile
>>> ecosystem.
>>
>> So I'm going to do a bit more testing with another patch tomorrow with
>> delimiters removed. If you can think of any specific cases you think will
>> break it, let me know and I'll make sure to add regression tests for them as
>> well.
>
> Well, I still need convincing that this is the best solution to the problem.
> As I said, I need to spend more time thinking about it.
I offered an alternative (RAW mode w/record terminator) that you ignored. So in lieu of a productive discussion about "the best solution to the problem", I went ahead and continued to complete the patch, since I believe it's a useful feature that could help this and other use cases. There have been multiple times (not even w/json) I've wished COPY would stop being so smart.

FWIW, if anyone's interested, I also hacked up some Python that reads a json file and outputs a binary file suitable for use with COPY BINARY, which gets around all this stuff. Obviously this only works for json (not jsonb) columns (though you could SELECT INTO a jsonb column). Happy to pass it along.

Aaron
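[Editor's note: the Python script mentioned above isn't attached to this message. For reference, here is a minimal sketch of what such a converter might look like, based on the documented PGCOPY binary file format (signature, flags field, header extension, per-tuple field count and lengths, -1 trailer). The function name and the one-JSON-document-per-line input format are assumptions; the send format for a json column is simply the document's text, which is why this works for json but not jsonb.]

```python
import json
import struct

def write_pgcopy_binary(in_path, out_path):
    """Convert a file of JSON documents (one per line) into the PGCOPY
    binary format, suitable for:
        COPY the_table(jsonfield) FROM '/path/to/out.bin' (FORMAT binary);
    """
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        # File header: 11-byte signature, int32 flags, int32 extension length
        dst.write(b"PGCOPY\n\377\r\n\0")
        dst.write(struct.pack("!ii", 0, 0))
        for line in src:
            doc = line.strip()
            if not doc:
                continue
            json.loads(doc)  # fail early on invalid JSON
            # One tuple: int16 field count, then int32 length + raw bytes
            # per field. A json column's binary representation is its text.
            dst.write(struct.pack("!hi", 1, len(doc)))
            dst.write(doc)
        # File trailer: int16 -1
        dst.write(struct.pack("!h", -1))
```

Because every byte of the document travels as an opaque length-prefixed blob, embedded newlines, tabs, and backslashes need no escaping at all, which is the point.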