[Wireshark-dev] Re: Inquiry Regarding Protocol Identification Process in Wireshark

Jaap Keuter Mon, 31 Mar 2025 14:33:41 -0700

Hi Yoon-Seong Jang,

Thank you for taking an interest in Wireshark and its internals. Let me try to
address your inquiry.

As you may be aware, Wireshark is designed to handle all kinds of protocols, at
any layer of the OSI model (apart from the Physical layer, that is). To that end
it supports various methods of determining the protocol stack of each packet.
Please note that I refer to a protocol stack here, not just a single protocol.
In that respect, Wireshark's handling of a packet very much mimics the layered
concepts of the OSI model.

Starting at the lowest level, the frame has an encapsulation type associated
with it. This encapsulation, as determined by the packet capture mechanism,
defines the first level of protocol dissection Wireshark uses to look at the
packet data. In many cases this is the Ethernet encapsulation, therefore the
first protocol is often “Ethernet II”.

The Ethernet protocol itself has a Type field, which indicates what the
protocol in the payload of this Ethernet packet is. To forward the payload to
the appropriate dissector for this Type, the Ethernet dissector exposes a
‘dissector table’. Other dissectors can register at this table with the value
for their particular protocol, indicating their capability to dissect this
protocol.
For example, the IPv4 dissector registers with value 0x0800 at the Ethernet 
Type dissector table, and the UDP dissector registers with value 17 at the IPv4 
Protocol dissector table. 
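
To make the registration idea concrete, here is a minimal sketch in Python. It
is a toy model, not Wireshark's actual C code: the DissectorTable class, the
dissect_* functions and the table names are all made up for illustration. Only
the values 0x0800 and 17 come from the example above.

```python
from typing import Callable, Dict, List

# A toy dissector takes the remaining bytes and returns the protocol names found.
Dissector = Callable[[bytes], List[str]]


class DissectorTable:
    """Hypothetical value -> dissector mapping exposed by a parent protocol."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: Dict[int, Dissector] = {}

    def register(self, value: int, dissector: Dissector) -> None:
        # A child dissector announces: "I handle payloads tagged with this value."
        self._entries[value] = dissector

    def try_dissect(self, value: int, payload: bytes) -> List[str]:
        dissector = self._entries.get(value)
        if dissector is None:
            return ["data"]  # nobody registered for this value
        return dissector(payload)


ethertype_table = DissectorTable("ethertype")  # exposed by Ethernet
ip_proto_table = DissectorTable("ip.proto")    # exposed by IPv4


def dissect_udp(data: bytes) -> List[str]:
    return ["udp"]


def dissect_ipv4(data: bytes) -> List[str]:
    if len(data) < 20:
        return ["ip"]
    header_len = (data[0] & 0x0F) * 4  # IHL field, in 32-bit words
    protocol = data[9]                 # IPv4 Protocol field
    return ["ip"] + ip_proto_table.try_dissect(protocol, data[header_len:])


def dissect_ethernet(frame: bytes) -> List[str]:
    if len(frame) < 14:
        return ["data"]
    ethertype = int.from_bytes(frame[12:14], "big")  # Ethernet Type field
    return ["eth"] + ethertype_table.try_dissect(ethertype, frame[14:])


# Registration, as described in the text above.
ethertype_table.register(0x0800, dissect_ipv4)
ip_proto_table.register(17, dissect_udp)


def build_sample_frame() -> bytes:
    """Build a zero-filled Ethernet/IPv4/UDP frame with only the type fields set."""
    eth = bytes(12) + (0x0800).to_bytes(2, "big")
    ip = bytearray(20)
    ip[0] = 0x45  # version 4, IHL 5
    ip[9] = 17    # protocol = UDP
    return eth + bytes(ip) + bytes(8)


print(dissect_ethernet(build_sample_frame()))
```

Each layer only knows its own type field and its own table; which dissector
comes next is decided entirely by who registered for the value found there.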

This same principle, based around a dissector table, is used whenever a
protocol has a field that identifies the payload protocol, for example IPv4
(protocol), IPv6 (next header), TCP (port number), UDP (port number), LLC
(DSAP), etc.

Whenever a protocol has an explicit indication of the payload protocol, and
assuming this value is correct, Wireshark can unambiguously determine the type
of the next protocol layer. However, not all protocols have such a field. For
example, MPLS only has a label stack. Once it indicates ‘bottom of stack’, what
follows is a packet of some unknown protocol.
This is where two other options for protocol identification come into play. The
first is heuristics, the second is user configuration.

Heuristics is a method of looking at a sample of the payload and making an
attempt to guess the right protocol. Some protocols have a distinct signature
to them, while others might be harder to identify. When heuristics have to be
used, the dissector exposes a heuristic dissector table where dissectors can
register their interest in attempting to identify the payload as their
protocol. Whenever a payload comes by, it is presented to these registered
dissectors. If the first dissector indicates that it does not recognise it, the
payload is handed to the next, until a dissector indicates that it has
recognised the protocol and takes care of its dissection.
Heuristics are only as good as the identification of the payload can be. This
is not without flaw, so Wireshark cannot unambiguously determine the type of
the next protocol layer. It can only make a best effort attempt.
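
The "hand it to the next one until somebody claims it" loop can be sketched as
follows. Again this is a toy Python model with hypothetical names
(HeuristicTable, looks_like_stun, looks_like_rtp), not Wireshark's API. The
two checks are picked only to show a strong signature next to a weak one.

```python
from typing import Callable, List, Optional, Tuple

# A heuristic check returns True when it believes the payload is its protocol.
HeuristicCheck = Callable[[bytes], bool]


def looks_like_stun(payload: bytes) -> bool:
    # Fairly distinct signature: top two bits zero plus the STUN magic cookie.
    return (
        len(payload) >= 20
        and (payload[0] & 0xC0) == 0
        and payload[4:8] == bytes.fromhex("2112a442")
    )


def looks_like_rtp(payload: bytes) -> bool:
    # Weak signature: only the version bits are checked, so false positives
    # are easy. This is why heuristics are a best effort only.
    return len(payload) >= 12 and (payload[0] >> 6) == 2


class HeuristicTable:
    """Hypothetical ordered list of (name, check) pairs."""

    def __init__(self) -> None:
        self._checks: List[Tuple[str, HeuristicCheck]] = []

    def register(self, name: str, check: HeuristicCheck) -> None:
        self._checks.append((name, check))

    def identify(self, payload: bytes) -> Optional[str]:
        # Present the payload to each registered check in turn;
        # the first one that claims it wins.
        for name, check in self._checks:
            if check(payload):
                return name
        return None  # nobody recognised the payload


udp_heuristics = HeuristicTable()
udp_heuristics.register("stun", looks_like_stun)
udp_heuristics.register("rtp", looks_like_rtp)

stun_payload = bytes([0x00, 0x01, 0x00, 0x00]) + bytes.fromhex("2112a442") + bytes(12)
rtp_payload = bytes([0x80]) + bytes(11)

print(udp_heuristics.identify(stun_payload))
print(udp_heuristics.identify(rtp_payload))
print(udp_heuristics.identify(bytes(4)))
```

Note that any 12-byte payload whose first two bits happen to be binary 10
would be claimed as RTP by this sketch, which illustrates the ambiguity
mentioned above.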

When heuristics are of too poor quality (as in, it is not possible to determine
the protocol from the payload with enough certainty) the dissector can also
expose a user configurable dissector table. In this table some characteristic
of the protocol content is used to determine the next protocol layer. This has
to be defined by the user.
For instance, the MPLS dissector can attempt to identify the next protocol
layer heuristically, or the user can set the next protocol layer based on the
last label on the stack. Even though the relationship between an MPLS label and
the protocol contained in the MPLS payload is ambiguous in general, a fixed
relationship can exist within a particular capture file. Needless to say, these
mappings often need adjustment from capture to capture.
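
As a sketch of how a user-configured table and a heuristic fallback could work
together for MPLS, consider the toy Python model below. The function names and
the user_table dictionary are hypothetical, and the first-nibble guess is a
simplifying assumption for illustration rather than a description of
Wireshark's exact logic. The label stack entry layout (20-bit label, 3-bit
traffic class, 1-bit bottom-of-stack flag, 8-bit TTL) is the standard one.

```python
from typing import Dict, List, Tuple


def make_entry(label: int, bottom: bool, ttl: int = 64) -> bytes:
    """Encode one 32-bit MPLS label stack entry (traffic class left at 0)."""
    return ((label << 12) | (int(bottom) << 8) | ttl).to_bytes(4, "big")


def parse_label_stack(data: bytes) -> Tuple[List[int], bytes]:
    """Return the labels up to the bottom-of-stack entry and the bytes after it."""
    labels: List[int] = []
    offset = 0
    while offset + 4 <= len(data):
        entry = int.from_bytes(data[offset:offset + 4], "big")
        labels.append(entry >> 12)
        offset += 4
        if (entry >> 8) & 1:  # bottom-of-stack flag set
            break
    return labels, data[offset:]


def next_protocol(data: bytes, user_table: Dict[int, str]) -> str:
    labels, payload = parse_label_stack(data)
    # 1. User configuration: the last label decides, if the user mapped it.
    if labels and labels[-1] in user_table:
        return user_table[labels[-1]]
    # 2. Heuristic fallback (assumed here): guess from the first nibble.
    if payload:
        version = payload[0] >> 4
        if version == 4:
            return "ip"
        if version == 6:
            return "ipv6"
    return "data"


stack = make_entry(100, bottom=False) + make_entry(200, bottom=True)

# The user states that, in this capture, label 200 carries Ethernet.
print(next_protocol(stack + bytes(14), {200: "eth"}))
# Without user configuration the sketch falls back to guessing.
print(next_protocol(stack + b"\x45" + bytes(19), {}))
```

The user_table is only valid for the capture it was written for, which matches
the remark above that these mappings need adjustment from capture to capture.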

One final method is the use of signalling protocols. These protocols are used
to communicate the mapping between certain opaque identifiers in protocols and
the interpretation of the protocol data. For instance, RTP packets contain
sound data, but its encoding is identified by a single 7-bit payload type
value. For some values the encoding is defined (e.g. 0 = PCM u-law), but other
values are dynamic. For this the SDP protocol is used, which defines the
mapping between RTP payload type and codec used. This is an example of
out-of-band signalling: a separate protocol is used to define the relationship
between identifier and protocol. Wireshark can store these mappings learned
from the signalling protocol and apply them in other protocols.
Whenever a protocol uses in-band signalling the solution is the same: Wireshark
can store these mappings and use them with subsequent packets in the capture
file.
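
The RTP/SDP case could be modelled roughly like this. It is a toy Python
sketch: the RtpPayloadTypeMap class and its methods are hypothetical, the SDP
handling only looks at "a=rtpmap:" attribute lines, and the static table
holds just two well-known entries.

```python
from typing import Dict

# A couple of statically assigned RTP payload types.
STATIC_PAYLOAD_TYPES: Dict[int, str] = {0: "PCMU", 8: "PCMA"}


class RtpPayloadTypeMap:
    """Hypothetical store for mappings learned from signalling."""

    def __init__(self) -> None:
        self._dynamic: Dict[int, str] = {}

    def learn_from_sdp(self, sdp_text: str) -> None:
        # Remember every "a=rtpmap:<payload type> <encoding>/<clock rate>" line.
        prefix = "a=rtpmap:"
        for line in sdp_text.splitlines():
            line = line.strip()
            if not line.startswith(prefix):
                continue
            pt_text, _, encoding = line[len(prefix):].partition(" ")
            if pt_text.isdigit() and encoding:
                self._dynamic[int(pt_text)] = encoding.split("/")[0]

    def codec_for(self, rtp_packet: bytes) -> str:
        payload_type = rtp_packet[1] & 0x7F  # low 7 bits of the second byte
        if payload_type in self._dynamic:
            return self._dynamic[payload_type]
        return STATIC_PAYLOAD_TYPES.get(payload_type, "unknown")


mapping = RtpPayloadTypeMap()
dynamic_packet = bytes([0x80, 96]) + bytes(10)  # RTP version 2, payload type 96
static_packet = bytes([0x80, 0]) + bytes(10)    # RTP version 2, payload type 0

print(mapping.codec_for(dynamic_packet))  # before any signalling was seen

mapping.learn_from_sdp(
    "v=0\r\n"
    "m=audio 49170 RTP/AVP 0 96\r\n"
    "a=rtpmap:96 opus/48000/2\r\n"
)

print(mapping.codec_for(dynamic_packet))  # after the SDP was seen
print(mapping.codec_for(static_packet))
```

The order of packets matters in such a scheme: an RTP packet seen before the
SDP that describes it cannot be interpreted using that mapping yet.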

To recap, these general methods exist in Wireshark to determine the protocol 
stack in a packet:

1. Explicit, through some protocol type field
2. Heuristically, through payload examination
3. User configuration
4. In-band or out-of-band signalling

Regards,
Jaap

> On 31 Mar 2025, at 08:35, brave1094 <brave1...@korea.ac.kr> wrote:
> 
> Dear Wireshark Team,
> 
> My name is Yoon-Seong Jang, a combined Master's and Ph.D. student at Korea 
> University in the Republic of Korea.
> 
> We are currently conducting research focused on analyzing various types of 
> application traffic and malicious traffic, with the goal of classifying them 
> using deep learning techniques.
> 
> In this process, Wireshark has been an invaluable tool and is widely used in 
> our research.
> 
> The reason I am reaching out via email is to ask about how Wireshark 
> determines the protocol of each packet or flow when decoding a given pcap 
> file.
> 
> From our observations, it seems that the protocol is often determined based 
> on the port number. However, we would greatly appreciate a more objective 
> explanation or documentation regarding the actual rules or logic used by 
> Wireshark for protocol decoding.
> 
> A detailed explanation would be extremely helpful for our research.
> 
> Thank you very much for taking the time to read this email despite your busy 
> schedule.
> 
> Sincerely,
> Yoon-Seong Jang
>

