On Jul 1, 2011, at 8:07 AM, Matt Godbolt wrote:

> I've just hit an issue where an Endace packet file (ERF) that I'm trying to 
> load into wireshark is being incorrectly loaded as a "packetlogger" file type.
> 
> From looking at the source, the packetlogger_open() call doesn't seem to 
> be very restrictive - I can see how it could generate false positives.  I can 
> also see from file_access.c that packetlogger files have sometimes been 
> mis-identified as mpegs.

Part of the problem is that "magic numbers" are ultimately just a form of 
heuristic, as there's no guarantee that a file that has the magic number in the 
appropriate location is a file of the type corresponding to that magic number.

Some magic numbers are probably pretty strong - I suspect relatively few 
non-pcap files start with A1 B2 C3 D4 or D4 C3 B2 A1.

Some magic numbers, not so much - there are probably plenty of files beginning 
with 00 00 01 that aren't MPEG-2 packetized elementary streams.
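
To make the strong/weak distinction concrete, here is a minimal sketch in C.  
It is not Wireshark code: the enum and rate_magic() are hypothetical names 
made up for illustration, and the only byte patterns it knows are the ones 
mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical rating of how much a magic number tells us. */
enum magic_strength { MAGIC_NONE = 0, MAGIC_WEAK, MAGIC_STRONG };

/*
 * Rate the leading bytes of a file.  Only the patterns discussed in
 * this thread are checked; a real sniffer would know many more.
 */
static enum magic_strength
rate_magic(const uint8_t *buf, size_t len)
{
    static const uint8_t pcap_a[4] = { 0xA1, 0xB2, 0xC3, 0xD4 };
    static const uint8_t pcap_b[4] = { 0xD4, 0xC3, 0xB2, 0xA1 };
    static const uint8_t start_code[3] = { 0x00, 0x00, 0x01 };

    /* Four fairly unusual bytes: few non-pcap files start this way. */
    if (len >= 4 &&
        (memcmp(buf, pcap_a, 4) == 0 || memcmp(buf, pcap_b, 4) == 0))
        return MAGIC_STRONG;

    /* Three very common bytes: plenty of unrelated files match. */
    if (len >= 3 && memcmp(buf, start_code, 3) == 0)
        return MAGIC_WEAK;

    return MAGIC_NONE;
}
```

Even MAGIC_STRONG here is still only a heuristic; the rating just says how 
much confidence a match deserves.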

> An obvious solution would be to move the erf_open routine above 
> packetlogger_open, which would also appear to require moving netscreen_open 
> above too (false positives there too)...
> 
> Given how fragile this whole process is, would that be safe - and how might I 
> go about testing that I haven't broken anything else if I were to do so?
> 
> Failing all that, there's quite a simple way to detect ERFs (in the case that 
> I'm seeing...) - relying on the '.erf' at the end of the filename. Presumably 
> that's a no-go for other reasons.

The file suffix is not an *absolute* guarantee of file type, for several 
reasons:

        1) some files are generated by UN*X command-line programs (e.g., 
tcpdump) and might not even attempt to enforce a file suffix on the files they 
write;

        2) some files were generated by classic Mac OS applications (e.g., 
EtherPeek) and didn't use suffixes (relying on type and creator code, probably, 
which is why the old *Peek format didn't have a magic number, either);

        3) some files are text files, so if they have suffixes at all, it's 
probably ".txt";

        4) some suffixes are used by multiple programs with their own different 
binary formats, such as ".cap".

However, it can be used as a *hint*, just as data in the file can be used as a 
hint.  For example, files whose "standard" creator gives them a suffix, such as 
PacketLogger, could perhaps be sorted later in the list, *but* have their open 
routine called *before* the open routine for weak-heuristic or no-heuristic 
file formats *if* the file suffix matches their specified file suffix.

I.e., for a file whose name ends in ".pklg", packetlogger_open() would be 
called before erf_open() or mpeg_open(), but, for a file whose name *doesn't* 
end in ".pklg", it would be called *after* erf_open() or mpeg_open().

(We might also want to split mpeg_open() into separate routines, so that the 
routine that checks for MPEG-2 packetized elementary streams comes later in the 
list, due to the relative weakness of its magic number.)

The formats with reasonably strong magic numbers would still be checked first 
(especially formats such as pcap or pcap-ng, many of whose files have no suffix 
or a non-standard suffix).
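
As a rough sketch of that ordering, in C: the struct, its fields, and the 
function names below are all hypothetical and do not reflect how the open 
routines are actually stored in file_access.c.  It assumes each format can be 
described by an optional standard suffix and a flag saying whether its magic 
number is strong.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one open routine (not a wiretap type). */
struct open_entry {
    const char *name;    /* label only, e.g. "packetlogger_open" */
    const char *suffix;  /* standard suffix including the dot, or NULL */
    int strong_magic;    /* nonzero if the magic number is strong */
};

/* Return nonzero if filename ends with suffix (case-sensitive). */
static int
has_suffix(const char *filename, const char *suffix)
{
    size_t flen, slen;

    if (suffix == NULL)
        return 0;
    flen = strlen(filename);
    slen = strlen(suffix);
    return flen >= slen && strcmp(filename + flen - slen, suffix) == 0;
}

/* 0 = strong magic number, 1 = suffix matches, 2 = everything else. */
static int
entry_rank(const struct open_entry *e, const char *filename)
{
    if (e->strong_magic)
        return 0;
    if (has_suffix(filename, e->suffix))
        return 1;
    return 2;
}

/*
 * Fill order[] (n slots) with indices into table[] in the order the
 * routines should be tried.  Table order is preserved within a rank.
 */
static void
order_open_routines(const struct open_entry *table, size_t n,
                    const char *filename, size_t *order)
{
    size_t out = 0;

    for (int rank = 0; rank <= 2; rank++)
        for (size_t i = 0; i < n; i++)
            if (entry_rank(&table[i], filename) == rank)
                order[out++] = i;
}
```

With a table listing pcap (strong), ERF, MPEG, and PacketLogger (suffix 
".pklg") in that order, a file named "capture.pklg" would be tried as pcap, 
then PacketLogger, then ERF and MPEG; for any other name, PacketLogger stays 
last.  The suffix only moves a routine earlier; it never skips one.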
