On Sep 5, 2020, at 9:51 AM, John Thacker <johnthac...@gmail.com> wrote:

> On Sat, Sep 5, 2020 at 2:49 AM Guy Harris <ghar...@sonic.net> wrote:
> 
>> OK, here's the types of strings that there can be in protocols.
>> 
>> 1. Counted strings.  A counted string has a count of the number of octets 
>> (it might be half the number of octets if it's UTF-16 or UCS-2, but it's 
>> really a count of octets).  There is nothing special about null characters, 
>> other than "if some software treats the string as a C string, it won't work 
>> correctly if the string happens to include a null character, so maybe that 
>> software shouldn't treat it as a C string".
>> 
>> 2. Null-terminated, non-counted strings.  A null-terminated, non-counted, 
>> string has no count of the number of octets; it runs until a null terminator 
>> (8-bit, 16-bit, whatever) is seen.  Obviously, that string can't contain a 
>> null terminator.
>> 
>> 3. Padded strings in a fixed-size buffer.  If the string is the length of 
>> the buffer, there is no null terminator; otherwise, there's a null 
>> terminator; the protocol may specify that all octets after the null 
>> terminator, if the string isn't the length of the buffer - 1, must be null, 
>> or may say there is no such requirement.
>> 
>> 4. Strings that are counted *and* null-terminated.  This is redundant, but 
>> maybe somebody wants to make sure that there's a null at the end, to make 
>> life easier for implementations that use C strings internally.  This doesn't 
>> *really* make things easier, unless you're in a 100% controlled environment, 
>> because, in any environment in which the software has to deal with packets 
>> from buggy or malicious senders, and uses C strings internally, they have to 
>> make sure there's a null terminator *anyway* - they *can't* just copy the 
>> string and blithely assume that a terminating null is part of the counted 
>> blob.
>> 
>> In the case of a counted string, if the string is a 7-octet value of 
>> "foo\0bar", it should show up as exactly that in the packet tree item; if 
>> not, we should fix it to do so.
> 
> So this does not happen, at least not when proto_tree_add_item() is used on a 
> FT_STRING.

Then it should be fixed.

> Things that call tvb_format_text() will end up producing "foo\000bar" 
> (because \0 is not one of the special cases handled, it defaults to using 
> octal in format_text() in epan/strutil.c, which is used for non-printable but 
> valid ASCII characters.

That's acceptable - if there's a control-A, it'll show up as \001, not \1.

> However, proto_tree_add_item() does not do that.
> 
> I tested this by hexediting a pcap (from here: 
> https://gitlab.com/wireshark/wireshark/-/issues/16208) to put a NUL in the 
> middle of a counted FT_STRING, and it looked like this:
> 
> I had applied the patch to allow SSIDs to be UTF-8 in that bug, it looks like 
> this:
> 
> Tag: SSID parameter set: admini\000tración
>     Tag Number: SSID parameter set (0)
>     Tag length: 15
>     SSID: admini
>         [Expert Info (Warning/Undecoded): Trailing stray characters]
> 
> (admini\000tración also showed up in the FT_COLUMN). In the first entry, 
> where \000 shows up, tvb_format_text() is used explicitly and then that text 
> is appended. In the one where it is cut off, proto_tree_add_item() is used. 
> So that is a bug that should be fixed, then, we want to display the trailing 
> characters as well as warn about them?

Display them. yes.  Warn about them, possibly, although, once we display them, 
the warning might not be necessary.

> Currently, we use FT_STRING for the first of those, FT_STRINGZ for the second 
> of those, FT_STRINGZPAD for the third of those, and FT_STRINGZ for the fourth 
> of those; the difference between the two FT_STRINGZ cases is whether a length 
> was specified or not.
> That helps, but isn't there also FT_UINT_STRING for the common case of 
> counted strings where the count of the number of octets appears immediately 
> before the string? It seems like mostly a helper combining "add a 
> FT_UINT[...] and return its value, then add a FT_STRING with that length." So 
> ideally detect_trailing_stray_characters() should also operate on the string 
> part of FT_UINT_STRING, and that just hasn't been added, correct?

It hasn't been added yet.

A lot of this is a work in progress.  I need to do some regression testing to 
discover fields that have the incorrect FT_STRING... type; when that's done, 
I'll update epan/proto.c to fix various issues.
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
Archives:    https://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe

Reply via email to