Hi,

On Tue, Nov 14, 2017 at 7:40 AM, David Sommerseth
<open...@sf.lists.topphemmelig.net> wrote:
>
> On 14/11/17 12:02, Gert Doering wrote:
>> JSON is very trivial to produce (unlike XML, or netlink).  The escaping
>> rules on producing are also very easy - basically, encode things in double
>> quotes, and escape the set of { BS, FF, NL, CR, Tab, Backslash, Quote }
>> with a single backslash before the corresponding character.

My reading of the RFC [1] (maybe not the only RFC for JSON?) is that
all control characters must be escaped (all characters < 0x20). And
the quotation mark that must be escaped is the double-quote (0x22),
not the single-quote (0x27).

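Also, control characters that have no two-character escape need to be
written in the form \uXXXX, where XXXX is the UTF-16 code unit for the
character in hexadecimal. The RFC allows that form for the other
characters as well, so a single form can cover everything. That's easy
for these characters: I think it's printf("\\u%04x", ch), since all the
characters we are escaping are in the Basic Multilingual Plane (U+0000
through U+FFFF).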
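
To make the format string concrete, here is a minimal sketch of a
per-byte escaper. The name json_escape_byte is hypothetical (it is not
an existing OpenVPN function), and the sketch assumes we always use the
six-character hexadecimal form rather than the short escapes. Note that
the digits must be hexadecimal: with a decimal conversion, 0x1F would
come out as \u0031, which is the digit "1".

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * If ch must be escaped inside a JSON string (control characters below
 * 0x20, the double quote, the backslash), write its six-character
 * hexadecimal escape into out and return the number of characters
 * written (not counting the terminating NUL).
 * Return 0 if ch can be copied as-is.
 */
static int
json_escape_byte(unsigned char ch, char *out, size_t outlen)
{
    if (ch < 0x20 || ch == '"' || ch == '\\')
    {
        /* %04x: four hexadecimal digits, zero-padded */
        return snprintf(out, outlen, "\\u%04x", (unsigned int) ch);
    }
    return 0;
}
```

The caller needs a buffer of at least 7 bytes (six characters plus the
terminating NUL) for the escape to fit.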


> <https://stackoverflow.com/questions/19176024/how-to-escape-special-characters-in-building-a-json-string>
>
> Right, all those are single byte characters - and that's fairly simple
> ... but that ignores various quirks which easily appear with multi-byte
> characters - especially when "looping" through a value, byte by byte.
> We support UTF-8 in certificate subject fields.  So this escaper needs
> to handle those classes the stackoverflow mentions, plus beware of
> multi-byte strings (so we need to use the plethora of mb* related
> functions).
>
> In a clean 8-bit ASCII only-world, things are less complicated.
>
> Heiko and I have looked into the "simple" world of revamping the argv
> parser (to avoid our own "homebrewed" printf-like processing and base it
> on what is in the C library) and even this pre-parsing we need to do
> has popped up with surprises.  The argv caller scope mostly covers
> parsing strings which are defined by us developers so the variations are
> not as broad, and luckily format strings are not expected to contain
> UTF-8 chars.  But I do not for a second think processing certificate
> subject strings will be as easy, as those values we need to parse
> (typically CN) are not generated by us but by a broad range of users.
> Who knows what
> kind of funny tricks they'll throw at our code?

How does OpenVPN currently handle escaping when generating CSV? If we
don't do it correctly for multi-byte strings for CSV, do we need to
for JSON?

From what I can tell, the only problematic character that needs to be
escaped is the backslash (0x5C). All the other characters that need to
be escaped are less than 0x40, so they will never appear as part of a
multi-byte sequence. That one character makes this difficult, but IMO
the problem is simple enough that it could be done within OpenVPN.
That would be my preference, instead of adding another dependency.
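
As a rough sketch of how small such an in-tree escaper could be, here
is a byte-by-byte version. It assumes the input is UTF-8, where every
byte of a multi-byte sequence is 0x80 or above, so none of the bytes we
test for (including 0x5C) can occur inside one. That assumption does not
hold for some other multi-byte encodings (Shift_JIS, for example, allows
0x5C as a trailing byte). The name json_escape_utf8 is hypothetical, not
existing OpenVPN code.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Copy src into dst as the body of a JSON string, escaping control
 * characters, the double quote and the backslash in six-character
 * hexadecimal form.  Bytes >= 0x80 (UTF-8 lead and continuation bytes)
 * are copied through unchanged.
 * Returns 0 on success, -1 if dst is too small.
 */
static int
json_escape_utf8(const char *src, char *dst, size_t dstlen)
{
    size_t o = 0;

    for (const unsigned char *p = (const unsigned char *) src; *p; p++)
    {
        if (*p < 0x20 || *p == '"' || *p == '\\')
        {
            if (dstlen - o < 7)     /* six characters plus NUL */
            {
                return -1;
            }
            o += (size_t) snprintf(dst + o, dstlen - o, "\\u%04x",
                                   (unsigned int) *p);
        }
        else
        {
            if (dstlen - o < 2)     /* one byte plus NUL */
            {
                return -1;
            }
            dst[o++] = (char) *p;
        }
    }
    if (o >= dstlen)
    {
        return -1;
    }
    dst[o] = '\0';
    return 0;
}
```

This does not validate the input, so a malformed UTF-8 subject would be
passed through as-is; whether that is acceptable is a separate question.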

Best regards,

Jon Bullard

[1] https://www.ietf.org/rfc/rfc4627.txt

_______________________________________________
Openvpn-devel mailing list
Openvpn-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openvpn-devel