> From: Bruce Richardson [mailto:bruce.richard...@intel.com] > Sent: Friday, 24 June 2022 10.14 > > On Thu, Jun 23, 2022 at 09:04:31PM +0200, Morten Brørup wrote: > > > From: Bruce Richardson [mailto:bruce.richard...@intel.com] > > > Sent: Thursday, 23 June 2022 18.43 > > > > > > This RFC shows one possible approach for escaping strings for the > json > > > output of telemetry library. For now this RFC supports escaping > strings > > > for the cases of returning a single string, or returning an array > of > > > strings. Not done is escaping of strings in objs/dicts [see more > below > > > on TODO] > > > > Very good initiative. > > > > > > > > As well as telemetry lib changes, this patchset includes unit tests > for > > > the above and also little bit of cleanup to the json tests. > > > > > > TODO: > > > Beyond what is here in this RFC: > > > > > > 1. we need to decide what to do about name/value pairs. Personally, > I > > > think we should add the restriction to the > "rte_tel_data_add_obj_*" > > > APIs > > > to only allow a defined subset of characters in names: e.g. > > > alphanumeric > > > chars, underscore and dash. That means that we only need to > escape > > > the data part in the case of string returns. > > > > I agree about only allowing a subset of characters in names, so JSON > (and other) encoding is not required. > > > > However, I think we should be less restrictive, and also allow > characters commonly used for separation, indexing and wildcard, such as > '/', '[', ']', and '*', '?' or '%'. > > > > Obviously, we should disallow characters requiring escaping in not > just JSON, but also other foreseeable encodings and protocols. So > please bring your crystal ball to the discussion. ;-) > > > Exactly why I am looking for feedback - and why I'm looking to have an > explicit allowed list of characters rather than trying to just block > the > known-bad in json ones. > > For your suggestions: +1 to separators and indexing, i.e. '[', ']' and > '/', > though I would probably also add ',' and maybe '.' (unless it's likely > to > cause issues with some protocol we are likely to want to use).
After having slept on it, I think we should also allow characters that could appear in IP and MAC addresses, i.e. '.' and ':' (and '/' for subnetting). > For the wildcarding, I find it hard to see why we would want those? Initially, I thought a wildcard might be useful as a placeholder in templates. But it might also be useful for partial IP or MAC addresses. E.g.: - The SmartShare Systems OUI could be represented by the MAC address "00:1F:B4:??:??:??". - A default gateway address in a template configuration could be "192.168.*.1". On the other hand, wildcard characters could be disallowed or require escaping in other (non-JSON) protocols. So I'm just being a bit creative here, throwing out ideas in our search for the right balance in the restrictions. > > The other advantage of using an allowlist of characters is that it > makes it > possible to expand over time, compared to a blocklist which always runs > the > risk of breaking something if you expand it. Therefore I suggest we > keep > the list as small as we need right now, and expand it only as we need. +1 > > > > 2. once agreed, need to implement a patch to escape strings in > > > dicts/objs > > > > Yes. > > > > > > > > 3. need to add a patch to escape the input command if it contains > > > invalid chars > > > > What do you mean here? You mean unescape JSON encoded input (arriving > on the JSON telemetry socket) to a proper binary string? > > > > The thing with the telemetry socket interface right now is that the > input > requests are not-json. The reasons for that is that they be kept as > simple > as possible, and to avoid needing a full json parser inside DPDK. > Therefore, the input sent by the user could contain invalid characters > for > json output so we need to: > 1. Guarantee that no command registered with the telemetry library > contains > invalid json characters (though why someone would do so, I don't > know!) > 2. When we return the command back in the reply, properly escape any > invalid characters in the error case. > > #1 is very important for sanity checking, but now that I think about it > #2 > is probably optional, since if any user does start sending invalid > garbage > input that breaks their json parser on return, they are only hurting > themselves and not affecting anything else on the system. > > > > 4. some small refactoring of the main telemetry.c json-encoding > > > function may be possible. > > > > Perhaps. > > > I saw some options for cleanup when I was working on the code, so > including > this as a note-to-self as much as anything else for feedback. :-) > > /Bruce