Re: [Wireshark-dev] How to print out string encoded data that contains nul characters?

Guy Harris Wed, 09 Apr 2014 14:27:38 -0700

On Apr 9, 2014, at 2:06 PM, "John Dill" <john.d...@greenfieldeng.com> wrote:


> I have several character data fields that happen to contain sections of 
> non-ascii binary data including nul characters.  I'd like to get a string 
> display that shows all of the characters according to the length of the 
> field, i.e.
> 
> 20 20 20 20 20 20 01 00 01 00 48 31 20 20 20 20
> 
> produces
> 
> "      \001\000\001\000H1    "
> 
> In proto.c, I see that all of the format_text calls use strlen(bytes) as the 
> length.
> 
> case FT_STRING:
> case FT_STRINGZ:
> case FT_UINT_STRING:
>         bytes = (guint8 *)fvalue_get(&fi->value);
>         label_fill(label_str, hfinfo, format_text(bytes, strlen(bytes)));
> 
> What is the recommended way of creating a text string that uses the octal 
> encoding '\xxx' for non-ASCII data including nul characters that uses the 
> 'length' field of 'proto_tree_add_item'?

The right short-term way would be to use proto_tree_add_string_format_value() 
to add the field, and format the string's value yourself, using format_text() 
with a byte count rather than strlen().
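
As a rough illustration of formatting by byte count instead of strlen(), here
is a minimal standalone sketch. escape_counted() is a hypothetical stand-in
written for this example so that it compiles outside Wireshark; it is not
format_text() itself, and its escaping rules (printable ASCII kept as-is,
everything else, including the backslash, written as a three-digit octal
escape) are an assumption. In a dissector, the resulting text would be what
you hand to proto_tree_add_string_format_value() as the formatted value.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not a Wireshark function): escape exactly `len`
 * octets, so embedded NULs do not end the string early.
 * Returns a malloc()ed, NUL-terminated string; the caller frees it.
 */
char *escape_counted(const unsigned char *bytes, size_t len)
{
    /* Worst case: every octet becomes a 4-character "\ooo" escape. */
    char *out = malloc(len * 4 + 1);
    char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = bytes[i];

        if (c >= 0x20 && c <= 0x7e && c != '\\') {
            /* Printable ASCII other than backslash: copy unchanged. */
            *p++ = (char)c;
        } else {
            /* Anything else, including NUL: three-digit octal escape. */
            p += sprintf(p, "\\%03o", (unsigned)c);
        }
    }
    *p = '\0';
    return out;
}
```

Called with the 16 octets from the question and a length of 16, this yields
the six blanks, the four escapes, "H1" and the four trailing blanks, rather
than stopping at the first nul.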

The right long-term way is to modify Wireshark so that this works.  The way we 
handle strings should probably be changed so that we:

        store the raw string octets as a counted array, along with the string 
encoding;

        convert the octets from the encoding to UTF-8 *with invalid octets and 
sequences shown as escapes* when displaying the strings;

        convert the octets from the encoding to UTF-8 with invalid octets and 
sequences shown as Unicode REPLACEMENT CHARACTERS when making the string 
available for processing by other software (e.g., "-T fields", etc.) (or 
somehow saying "this isn't a valid string in this encoding");

        somehow arrange that strings with invalid octets or sequences are 
*always* unequal to any character string in packet-matching expressions 
(display/read filters, color "filters", etc.), and perhaps allow strings to be 
compared against octet sequences (e.g. "foobar.name == 
20:20:20:20:20:20:01:00:01:00:48:31:20:20:20:20" matches the raw octets of the 
string), and use that with "Prepare As Filter" etc.
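
To make the points above a little more concrete, here is a small standalone
sketch. None of it is existing Wireshark code: the counted_str type and all
three functions are hypothetical, the encoding is fixed to ASCII, and the
sketch assumes that any octet outside printable ASCII counts as "invalid".
It shows the export conversion (invalid octets become U+FFFD REPLACEMENT
CHARACTER), a text comparison that is always unequal when the string has
invalid octets, and a comparison against raw octets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: raw octets plus a count (encoding fixed to ASCII). */
typedef struct {
    const unsigned char *octets;
    size_t len;
} counted_str;

/* Assumption for this sketch: only printable ASCII is "valid". */
static bool octet_is_valid(unsigned char c)
{
    return c >= 0x20 && c <= 0x7e;
}

/*
 * Convert to UTF-8 for other software, writing U+FFFD (EF BF BD) for each
 * invalid octet.  Output is truncated if `out` is too small.  Returns the
 * number of bytes written, not counting the terminating NUL.
 */
size_t to_utf8_replaced(const counted_str *s, char *out, size_t out_size)
{
    size_t i, n = 0;

    for (i = 0; i < s->len; i++) {
        if (octet_is_valid(s->octets[i])) {
            if (n + 1 >= out_size)
                break;
            out[n++] = (char)s->octets[i];
        } else {
            if (n + 3 >= out_size)
                break;
            memcpy(out + n, "\xEF\xBF\xBD", 3);
            n += 3;
        }
    }
    if (out_size > 0)
        out[n] = '\0';
    return n;
}

/* A string with any invalid octet never equals a character string. */
bool counted_str_equals_text(const counted_str *s, const char *text)
{
    size_t i;

    for (i = 0; i < s->len; i++) {
        if (!octet_is_valid(s->octets[i]))
            return false;
    }
    return strlen(text) == s->len && memcmp(s->octets, text, s->len) == 0;
}

/* Comparison against raw octets works whether or not the string is valid. */
bool counted_str_equals_octets(const counted_str *s,
                               const unsigned char *octets, size_t len)
{
    return s->len == len && memcmp(s->octets, octets, len) == 0;
}
```

The display conversion (invalid octets shown as escapes) would be a third
function of the same shape as to_utf8_replaced(), writing an escape instead
of U+FFFD.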

Alternatively, if they're *not* really character strings, display them as a set 
of subfields, with the text part shown as strings and the binary data shown as 
whatever it is, e.g.

        Frobozz text 1: {blanks}
        Frobozz count 1: 1
        Frobozz count 2: 1
        Frobozz text 2: H1{and more blanks}

or whatever it is.
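
A minimal sketch of that kind of split, outside of any dissector. The layout
is purely an assumption made to match the sample octets: 6 octets of text,
two 16-bit little-endian counts, then 6 more octets of text. The
frobozz_record type and frobozz_parse() are hypothetical names for this
example only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout, assumed from the 16 sample octets. */
typedef struct {
    char     text1[7];   /* 6 octets of text plus a terminating NUL */
    uint16_t count1;     /* assumed 16-bit little-endian */
    uint16_t count2;     /* assumed 16-bit little-endian */
    char     text2[7];   /* 6 octets of text plus a terminating NUL */
} frobozz_record;

/* Read a 16-bit little-endian value from two octets. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Split the field into its parts; returns 0 on success, -1 if too short. */
int frobozz_parse(const unsigned char *buf, size_t len, frobozz_record *out)
{
    if (len < 16)
        return -1;

    memcpy(out->text1, buf, 6);
    out->text1[6] = '\0';
    out->count1 = read_le16(buf + 6);
    out->count2 = read_le16(buf + 8);
    memcpy(out->text2, buf + 10, 6);
    out->text2[6] = '\0';
    return 0;
}
```

In a dissector each part would become its own field (two string fields and
two integer fields), so none of the string fields would contain binary data.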
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
