Giovanni Bajo wrote:

> You need struct.unpack() to parse these datas, and you need custom
> packer/unpacker to avoid post-processing the output of unpack() just
> because it just knows of basic Python types. In binary structs, there
> happen to be *types* which do not map 1:1 to Python types, nor they
> are just basic C types (like the ones struct supports). Using custom
> formatter is a way to better represent these types (instead of
> mapping them to the "most similar" type, and then post-process it).
>
> In my example, "S" is a basic-type which is a "A 0-terminated 20-byte
> string", and expressing it in the struct format with the single
> letter "S" is more meaningful in my code than using "20s" and then
> post-processing the resulting string each and every time this happens.
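For illustration, here is a minimal sketch of the post-processing the quoted text describes, using only the standard struct module. The record layout (a uint16 followed by the 20-byte field) and the helper name unpack_record are assumptions made up for this example; they are not taken from the original code.

```python
import struct

# Hypothetical record layout: a big-endian uint16 id followed by a
# 0-terminated string stored in a fixed 20-byte field.
RECORD_FORMAT = ">H20s"


def unpack_record(data):
    """Unpack one record and trim the string at its first NUL byte."""
    record_id, raw_name = struct.unpack(RECORD_FORMAT, data)
    # struct only knows "20 raw bytes"; the 0-terminator has to be
    # handled by hand after every unpack() call.
    name = raw_name.split(b"\0", 1)[0]
    return record_id, name


# "20s" pads the short value with NUL bytes up to 20 bytes.
packed = struct.pack(RECORD_FORMAT, 7, b"hello")
print(unpack_record(packed))
```

With a custom "S" format code, the trimming step would live in one place instead of being repeated after each unpack() call.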
Another compelling example is the SSH protocol:

http://www.openssh.com/txt/draft-ietf-secsh-architecture-12.txt

Section 4, "Data Type Representations Used in the SSH Protocols",
describes the data types used by the SSH protocol. In a perfect world, I
would write custom packers/unpackers for the types that struct does not
already handle (like the "mpint" format), so that I could use struct to
parse and compose SSH messages. What I ended up doing was writing a new
module, sshstruct.py, from scratch. It duplicates struct's work, just
because I couldn't extend struct. Some examples:

client.py:
    cookie, server_algorithms, guess, reserverd = sshstruct.unpack("16b10LBu", data[1:])

client.py:
    prompts = sshstruct.unpack("sssu" + "sB"*num_prompts, pkt[1:])

connection.py:
    pkt = sshstruct.pack("busB", SSH_MSG_CHANNEL_REQUEST, self.recipient_number, type, reply) + custom

kex.py:
    self.P, self.G = sshstruct.unpack("mm", pkt[1:])

Notice for instance how "s" is an SSH string and unpacks directly to a
Python string, and "m" is an SSH mpint (multiple precision integer) but
unpacks directly into a Python long. With struct.unpack() alone this
would not have been possible; it would have required a lot of
post-processing.

Actually, another thing that struct should support to cover the SSH
protocol (and many other binary protocols) is the ability to parse
strings whose size is not known at import-time (variable-length data
types). For instance, type "string" in the SSH protocol is a string
prefixed with its size as a uint32, so its actual size depends on each
instance. For this reason, my sshstruct did not have the equivalent of
struct.calcsize(). I guess that if there were a way to extend struct, it
would have to cover variable-size data types (and calcsize() would
return -1 or raise an exception).

-- 
Giovanni Bajo
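As an illustration of the variable-length point above, here is a minimal sketch of unpacking the two SSH types mentioned, following the wire formats in section 4 of the draft: a "string" is a uint32 length followed by that many bytes, and an "mpint" is a string holding a big-endian two's complement integer. The function names unpack_ssh_string and unpack_ssh_mpint are hypothetical; this is not the sshstruct module, whose code is not shown in the post.

```python
import struct


def unpack_ssh_string(data, offset=0):
    """Return (payload, next_offset) for an SSH "string" at offset."""
    # The length prefix is a big-endian uint32.
    (length,) = struct.unpack_from(">I", data, offset)
    start = offset + 4
    end = start + length
    if end > len(data):
        raise ValueError("truncated SSH string")
    return data[start:end], end


def unpack_ssh_mpint(data, offset=0):
    """Return (value, next_offset) for an SSH "mpint" at offset."""
    # An mpint is a string whose payload is a big-endian,
    # two's complement integer (an empty payload means zero).
    raw, end = unpack_ssh_string(data, offset)
    return int.from_bytes(raw, "big", signed=True), end


# One string followed by one mpint, parsed back to back.
data = struct.pack(">I", 3) + b"ssh" + b"\x00\x00\x00\x02\x00\x80"
name, pos = unpack_ssh_string(data)
value, pos = unpack_ssh_mpint(data, pos)
print(name, value, pos)
```

Each function has to return the next offset because the number of bytes consumed is only known after reading the length prefix, which is why a fixed calcsize() result cannot exist for such formats.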