On 2019-03-29 16:34:35 +0000, Paul Moore wrote:
> On Fri, 29 Mar 2019 at 16:16, Peter J. Holzer <hjp-pyt...@hjp.at> wrote:
>
> > Obviously you need some way to describe the specific binary format you
> > want to parse - in other words, a grammar. The library could then use
> > the grammar to parse the input - either by interpreting it directly, or
> > by generating (Python) code from it. The latter has the advantage that
> > it has to be done only once, not every time you want to parse a file.
> >
> > If that sounds familiar, it's what yacc does. Except that it does it for
> > text files, not binary files. I am not aware of any generic binary
> > parser generator for Python. I have read research papers about such
> > generators for (I think) C and Java, but I don't remember the names and
> > I'm not sure if the generators got beyond the proof of concept stage.
>
> That's precisely what I'm looking at. The construct library
> (https://pypi.org/project/construct/) basically does that, but using a
> DSL implemented in Python rather than generating Python code from a
> grammar.
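The two strategies in the quoted text (interpret a grammar directly, or generate Python code from it once) can be sketched with nothing but the standard library's `struct` module. This is a minimal illustration under stated assumptions, not construct's API: the grammar notation (a list of `(name, spec)` pairs, with `("bytes", "<earlier field>")` for length-prefixed data), the sample header layout, and the names `GRAMMAR`, `interpret`, `generate_source` and `compile_grammar` are all hypothetical.

```python
import struct
from typing import Any

# Hypothetical grammar notation (NOT construct's API): an ordered list of
# (field name, spec) pairs. A spec is either a struct format string for a
# fixed-size field, or ("bytes", "<earlier field>") for a payload whose
# length was read from an earlier field.
GRAMMAR = [
    ("magic", ">4s"),
    ("version", ">H"),
    ("length", ">I"),
    ("payload", ("bytes", "length")),
]


def interpret(grammar, data: bytes) -> dict[str, Any]:
    """Strategy 1: walk the grammar every time a buffer is parsed."""
    result: dict[str, Any] = {}
    offset = 0
    for name, spec in grammar:
        if isinstance(spec, tuple):
            size = result[spec[1]]  # length taken from an earlier field
            value = data[offset:offset + size]
            if len(value) != size:
                raise ValueError(f"{name}: truncated at offset {offset}")
        else:
            size = struct.calcsize(spec)
            # unpack_from raises struct.error if the buffer is too short
            (value,) = struct.unpack_from(spec, data, offset)
        result[name] = value
        offset += size
    return result


def generate_source(grammar) -> str:
    """Strategy 2: emit Python source specialised to the grammar (done once)."""
    lines = ["def parse(data):", "    r = {}", "    off = 0"]
    for name, spec in grammar:
        if isinstance(spec, tuple):
            lines += [
                f"    n = r[{spec[1]!r}]",
                "    v = data[off:off + n]",
                "    if len(v) != n:",
                f"        raise ValueError({name!r} + ': truncated at offset ' + str(off))",
                f"    r[{name!r}] = v",
                "    off += n",
            ]
        else:
            lines += [
                f"    r[{name!r}] = struct.unpack_from({spec!r}, data, off)[0]",
                f"    off += {struct.calcsize(spec)}",
            ]
    lines.append("    return r")
    return "\n".join(lines)


def compile_grammar(grammar):
    """Turn the generated source into a callable parser."""
    namespace = {"struct": struct}
    exec(generate_source(grammar), namespace)
    return namespace["parse"]


sample = b"DEMO" + struct.pack(">HI", 1, 3) + b"abc"
parse = compile_grammar(GRAMMAR)
print(interpret(GRAMMAR, sample))
# {'magic': b'DEMO', 'version': 1, 'length': 3, 'payload': b'abc'}
print(parse(sample) == interpret(GRAMMAR, sample))
# True
```

Both parsers produce the same result; the generated one only walks the grammar once, when its source is built, which is the advantage Peter describes.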
Good to know. I'll add that to my list of Tools Which I'm Not Likely To
Use Soon But Which May Be Useful Some Day.

> However, the resulting parser works, but it gives horrible error
> messages. This is a normal problem with generated parsers, there are
> plenty of books and articles covering how to persuade tools like yacc
> to produce usable error reports on parse failures.

Yeah, that still seems to be an unsolved problem.

> I don't know which solution I'll ultimately use, but it's an
> interesting exercise doing it both ways. And parsing binary data,
> unlike parsing text, is actually easy enough that hand crafting a
> parser isn't that much of a bother - maybe that's why there's less
> existing work in this area.

I'm a bit sceptical about that. Writing a hand-crafted parser for most
text-based grammars isn't that hard either, but there are readily
available tools (like yacc), so people use them (despite problems like
horrible error messages). For binary protocols, such tools are much less
well known.

It may be true that binary grammars seem simpler. But in practice there
are lots and lots of security holes because hand-crafted parsers tend to
use unwarranted shortcuts (see Heartbleed or the JPEG parsing bug of the
week), which an automatically generated parser would not take.

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
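The "unwarranted shortcut" behind bugs like Heartbleed is trusting a length field taken from the input. A hedged sketch of a hand-written parser that refuses that shortcut, and that also reports which field failed and at what offset (the error-message complaint quoted above), might look like this. The record layout (1-byte kind, 2-byte big-endian length, then the payload) and the names `ParseError`, `Record`, `HEADER` and `parse_records` are hypothetical, chosen only for illustration.

```python
import struct
from dataclasses import dataclass


class ParseError(ValueError):
    """A parse failure that records which field failed and where."""

    def __init__(self, field: str, offset: int, message: str) -> None:
        super().__init__(f"{field} at offset {offset}: {message}")
        self.field = field
        self.offset = offset


@dataclass
class Record:
    kind: int
    payload: bytes


HEADER = struct.Struct(">BH")  # hypothetical layout: 1-byte kind, 2-byte length


def parse_records(data: bytes) -> list[Record]:
    """Parse back-to-back kind/length/payload records, checking every length."""
    records = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < HEADER.size:
            raise ParseError("header", offset,
                             f"need {HEADER.size} bytes, only {len(data) - offset} left")
        kind, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        remaining = len(data) - offset
        # The shortcut would be to trust `length` as sent. Check it against
        # the bytes that actually arrived instead.
        if length > remaining:
            raise ParseError("payload", offset,
                             f"declared length {length} exceeds the {remaining} bytes left")
        records.append(Record(kind, data[offset:offset + length]))
        offset += length
    return records


good = HEADER.pack(1, 2) + b"hi" + HEADER.pack(2, 0)
print(parse_records(good))
# [Record(kind=1, payload=b'hi'), Record(kind=2, payload=b'')]

try:
    parse_records(HEADER.pack(1, 500) + b"hi")
except ParseError as err:
    print(err)
# payload at offset 3: declared length 500 exceeds the 2 bytes left
```

In C, skipping the `length > remaining` check lets a copy read past the end of the buffer, which is roughly what happened with Heartbleed; in Python the same shortcut would instead return a silently truncated payload. Either way, the fix is the same explicit bounds check.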
-- https://mail.python.org/mailman/listinfo/python-list