On 1/06/2006 9:52 PM, Giovanni Bajo wrote:
> John Machin wrote:
>
>>> given the ongoing work on struct (which I thought was a dead
>>> module), I was wondering if it would be possible to add an API to
>>> register custom parsing codes for struct. Whenever I use it for
>>> non-trivial tasks, I always happen to write small wrapper functions
>>> to adjust the values returned by struct.
>>>
>>> An example API would be the following:
>>>
>>> ============================================
>>> def mystring_len():
>>>     return 20
>>>
>>> def mystring_pack(s):
>>>     if len(s) > 20:
>>>         raise ValueError, "a mystring can be at max 20 chars"
>>>     s = (s + "\0"*20)[:20]
>> Have you considered s.ljust(20, "\0") ?
>
> Right. This happened to be an example...
>
>>>     s = struct.pack("20s", s)
>>>     return s
>> I am an idiot, so please be gentle with me: I don't understand why you
>> are using struct.pack at all:

Given a choice between reading that as a question about the particular
instance of struct.pack two lines above, or as doubt about the general
utility of the struct module, you appear to have chosen the latter,
erroneously.

>
> Because I want to be able to parse large chunks of binary data with
> custom formatting. Did you miss the whole point of my message:

No.

>
> struct.unpack("3liiSiiShh", data)
>
> You need struct.unpack() to parse this data, and you need a custom
> packer/unpacker to avoid post-processing the output of unpack() just
> because it only knows of basic Python types. In binary structs, there
> happen to be *types* which do not map 1:1 to Python types, nor are they
> just basic C types (like the ones struct supports). Using a custom
> formatter is a way to better represent these types (instead of mapping
> them to the "most similar" type, and then post-processing it).
>
> In my example, "S" is a basic type which is "a 0-terminated 20-byte
> string", and expressing it in the struct format with the single letter
> "S" is more meaningful in my code than using "20s" and then
> post-processing the resulting string each and every time this happens.
>
>
>> >>> import struct
>> >>> x = ("abcde" + "\0" * 20)[:20]
>> >>> x
>> 'abcde\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>> >>> len(x)
>> 20
>> >>> y = struct.pack("20s", x)
>> >>> y == x
>> True
>> Looks like a big fat no-op to me; you've done all the heavy lifting
>> yourself.
>
> Looks like you totally misread my message.

Not at all.

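For anyone replaying that session today: a minimal sketch of the same check on Python 3, where struct works on bytes rather than str. The names `padded`, `packed` and `packed_raw` are mine, chosen only for illustration; they do not appear in the thread.

```python
import struct

# Pad by hand, as the quoted mystring_pack does, then pass it through "20s".
padded = (b"abcde" + b"\0" * 20)[:20]
packed = struct.pack("20s", padded)

# struct.pack("20s", ...) pads short input with NUL bytes on its own,
# so the unpadded input produces the same 20 bytes.
packed_raw = struct.pack("20s", b"abcde")

print(packed == padded)      # the pack call changed nothing
print(packed_raw == padded)  # the manual padding was not needed either

# Input longer than the field is truncated to 20 bytes, not rejected,
# which is why the explicit length check is still doing real work.
print(len(struct.pack("20s", b"x" * 25)))
```

Both comparisons print True and the last line prints 20, so the only line in the packer that "20s" cannot replace is the length check.
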
Your function:

def mystring_pack(s):
     if len(s) > 20:
         raise ValueError, "a mystring can be at max 20 chars"
     s = (s + "\0"*20)[:20]
     s = struct.pack("20s", s)
     return s

can be replaced, even better (after reading the manual: "For packing,
the string is truncated or padded with null bytes as appropriate to make
it fit."), by:

def mystring_pack(s):
     if len(s) > 20:
         raise ValueError, "a mystring can be at max 20 chars"
     return s
     # return s = (s + "\0"*20)[:20] # not needed, according to the manual
     # s = struct.pack("20s", s)

As I said, this particular instance of using struct.pack is a big fat
no-op.

> Your string "x" is what I find in binary data, and I need to *unpack*
> it into a regular Python string, which would be "abcde".
>

And you unpack it with a custom function that also contains a fat no-op:

def mystring_unpack(s):
     assert len(s) == 20
     s = struct.unpack("20s", s)[0] # does nothing
     idx = s.find("\0")
     if idx >= 0:
         s = s[:idx]
     return s

>
>>>     idx = s.find("\0")
>>>     if idx >= 0:
>>>         s = s[:idx]
>>>     return s
>> Have you considered this:
>>
>> >>> z.rstrip("\0")
>> 'abcde'
>
>
> This would not work because, in the actual binary data I have to parse,
> only the first \0 is meaningful and terminates the string (like in C).
> There is absolutely no guarantee that the rest of the padding is made
> of \0s as well.

Point taken.

Cheers,
John
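
A footnote on that last point, as a minimal sketch on Python 3 bytes. The `mystring_unpack` below is a hypothetical rewrite of the function quoted above, without the struct.unpack call, and `clean` and `dirty` are made-up sample fields, not data from the thread.

```python
import struct

def mystring_unpack(raw):
    """Return the bytes before the first NUL of a 20-byte field."""
    if len(raw) != 20:
        raise ValueError("a mystring field is exactly 20 bytes")
    # partition() splits at the first NUL only; whatever follows is ignored,
    # which matches C-style termination.
    text, _, _ = raw.partition(b"\0")
    return text

# A field padded with NULs, as struct.pack("20s", ...) would write it.
clean = struct.pack("20s", b"abcde")

# A field where the bytes after the terminator are not NULs.
dirty = b"abcde\0" + b"garbage-bytes!"

print(mystring_unpack(clean))   # b'abcde'
print(mystring_unpack(dirty))   # b'abcde'
print(clean.rstrip(b"\0"))      # b'abcde'
print(dirty.rstrip(b"\0") == dirty)  # True: rstrip removes nothing here
```

On the NUL-padded field the two approaches agree; on the field with junk after the terminator, rstrip leaves all 20 bytes in place, which is exactly the objection raised above.
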