> On Jan 9, 2015, at 11:37 AM, Andrew Dunstan <and...@dunslane.net> wrote:
>
> On 01/08/2015 08:42 PM, Aaron Botsis wrote:
>>
>>> On Jan 8, 2015, at 3:44 PM, Andrew Dunstan <and...@dunslane.net> wrote:
>>>
>>> On 01/08/2015 03:05 PM, Aaron Botsis wrote:
>>>>
>>>>> It's also unnecessary. CSV format, while not designed for this, is
>>>>> nevertheless sufficiently flexible to allow successful import of json
>>>>> data meeting certain criteria (essentially no newlines), like this:
>>>>>
>>>>>     copy the_table(jsonfield)
>>>>>     from '/path/to/jsondata'
>>>>>     csv quote e'\x01' delimiter e'\x02';
>>>>
>>>> While perhaps unnecessary, given the size and simplicity of the patch, IMO
>>>> it's a no-brainer to merge (it actually makes the code smaller by 3
>>>> lines). It also enables non-json use cases anytime one might want to
>>>> preserve embedded escapes, or use different ones entirely. Do you see
>>>> other reasons not to commit it?
>>>
>>> Well, for one thing it's seriously incomplete. You need to be able to
>>> change the delimiter as well. Otherwise, any embedded tab in the json will
>>> cause you major grief.
>>>
>>> Currently the delimiter and the escape MUST be a single-byte non-nul
>>> character, and there is a check for this in csv mode. Your patch would
>>> allow any arbitrary string (including one of zero length) for the escape in
>>> text mode, and would then silently ignore all but the first byte. That's
>>> not the way we like to do things.
>>>
>>> And, frankly, I would need to spend quite a lot more time thinking about
>>> other implications than I have given it so far. This is an area where I
>>> tend to be VERY cautious about making changes. This is a fairly fragile
>>> ecosystem.
>>
>> So I'm going to do a bit more testing with another patch tomorrow with
>> delimiters removed. If you can think of any specific cases you think will
>> break it, let me know and I'll make sure to add regression tests for them as
>> well.
>
> Well, I still need convincing that this is the best solution to the problem.
> As I said, I need to spend more time thinking about it.
I offered an alternative (RAW mode w/record terminator) that you ignored. So in lieu of a productive discussion about "the best solution to the problem", I went ahead and continued to complete the patch, since I believe it's a useful feature that could help this and other use cases. There have been multiple times (not even w/json) I've wished COPY would stop being so smart.

FWIW, if anyone's interested, I also hacked up some Python that reads a json file and outputs a binary file suitable for use with COPY BINARY, which gets around all this stuff. Obviously this only works for json (not jsonb) columns (though you could SELECT INTO a jsonb column). Happy to pass it along.

Aaron
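[Editor's note: the Python script mentioned above isn't attached to this message. For reference, here is a minimal sketch of what such a converter might look like, based on the documented PGCOPY binary file format (signature, flags field, header extension, per-tuple field count and lengths, -1 trailer). The function name and the one-JSON-document-per-line input format are assumptions; the send format for a json column is simply the document's text, which is why this works for json but not jsonb.]

```python
import json
import struct

def write_pgcopy_binary(in_path, out_path):
    """Convert a file of JSON documents (one per line) into the PGCOPY
    binary format, suitable for:
        COPY the_table(jsonfield) FROM '/path/to/out.bin' (FORMAT binary);
    """
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        # File header: 11-byte signature, int32 flags, int32 extension length
        dst.write(b"PGCOPY\n\377\r\n\0")
        dst.write(struct.pack("!ii", 0, 0))
        for line in src:
            doc = line.strip()
            if not doc:
                continue
            json.loads(doc)  # fail early on invalid JSON
            # One tuple: int16 field count, then int32 length + raw bytes
            # per field. A json column's binary representation is its text.
            dst.write(struct.pack("!hi", 1, len(doc)))
            dst.write(doc)
        # File trailer: int16 -1
        dst.write(struct.pack("!h", -1))
```

Because every byte of the document travels as an opaque length-prefixed blob, embedded newlines, tabs, and backslashes need no escaping at all, which is the point.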