On Sep 5, 2020, at 9:51 AM, John Thacker <johnthac...@gmail.com> wrote:
> On Sat, Sep 5, 2020 at 2:49 AM Guy Harris <ghar...@sonic.net> wrote: > >> OK, here's the types of strings that there can be in protocols. >> >> 1. Counted strings. A counted string has a count of the number of octets >> (it might be half the number of octets if it's UTF-16 or UCS-2, but it's >> really a count of octets). There is nothing special about null characters, >> other than "if some software treats the string as a C string, it won't work >> correctly if the string happens to include a null character, so maybe that >> software shouldn't treat it as a C string". >> >> 2. Null-terminated, non-counted strings. A null-terminated, non-counted, >> string has no count of the number of octets; it runs until a null terminator >> (8-bit, 16-bit, whatever) is seen. Obviously, that string can't contain a >> null terminator. >> >> 3. Padded strings in a fixed-size buffer. If the string is the length of >> the buffer, there is no null terminator; otherwise, there's a null >> terminator; the protocol may specify that all octets after the null >> terminator, if the string isn't the length of the buffer - 1, must be null, >> or may say there is no such requirement. >> >> 4. Strings that are counted *and* null-terminated. This is redundant, but >> maybe somebody wants to make sure that there's a null at the end, to make >> life easier for implementations that use C strings internally. This doesn't >> *really* make things easier, unless you're in a 100% controlled environment, >> because, in any environment in which the software has to deal with packets >> from buggy or malicious senders, and uses C strings internally, they have to >> make sure there's a null terminator *anyway* - they *can't* just copy the >> string and blithely assume that a terminating null is part of the counted >> blob. >> >> In the case of a counted string, if the string is a 7-octet value of >> "foo\0bar", it should show up as exactly that in the packet tree item; if >> not, we should fix it to do so. > > So this does not happen, at least not when proto_tree_add_item() is used on a > FT_STRING. Then it should be fixed. > Things that call tvb_format_text() will end up producing "foo\000bar" > (because \0 is not one of the special cases handled, it defaults to using > octal in format_text() in epan/strutil.c, which is used for non-printable but > valid ASCII characters. That's acceptable - if there's a control-A, it'll show up as \001, not \1. > However, proto_tree_add_item() does not do that. > > I tested this by hexediting a pcap (from here: > https://gitlab.com/wireshark/wireshark/-/issues/16208) to put a NUL in the > middle of a counted FT_STRING, and it looked like this: > > I had applied the patch to allow SSIDs to be UTF-8 in that bug, it looks like > this: > > Tag: SSID parameter set: admini\000tración > Tag Number: SSID parameter set (0) > Tag length: 15 > SSID: admini > [Expert Info (Warning/Undecoded): Trailing stray characters] > > (admini\000tración also showed up in the FT_COLUMN). In the first entry, > where \000 shows up, tvb_format_text() is used explicitly and then that text > is appended. In the one where it is cut off, proto_tree_add_item() is used. > So that is a bug that should be fixed, then, we want to display the trailing > characters as well as warn about them? Display them. yes. Warn about them, possibly, although, once we display them, the warning might not be necessary. > Currently, we use FT_STRING for the first of those, FT_STRINGZ for the second > of those, FT_STRINGZPAD for the third of those, and FT_STRINGZ for the fourth > of those; the difference between the two FT_STRINGZ cases is whether a length > was specified or not. > That helps, but isn't there also FT_UINT_STRING for the common case of > counted strings where the count of the number of octets appears immediately > before the string? It seems like mostly a helper combining "add a > FT_UINT[...] and return its value, then add a FT_STRING with that length." So > ideally detect_trailing_stray_characters() should also operate on the string > part of FT_UINT_STRING, and that just hasn't been added, correct? It hasn't been added yet. A lot of this is a work in progress. I need to do some regression testing to discover fields that have the incorrect FT_STRING... type; when that's done, I'll update epan/proto.c to fix various issues. ___________________________________________________________________________ Sent via: Wireshark-dev mailing list <wireshark-dev@wireshark.org> Archives: https://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe