+1 on the idea of supporting variable-length strings with the length encoded in the preceding packed element!
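
To make that concrete, here is a minimal sketch of what reading such fields takes with today's struct module. The layout (a big-endian unsigned 16-bit length followed by the payload) and the helper names unpack_prefixed and unpack_record are made up for illustration; they are not taken from any real protocol or library. Each field needs two steps at the Python level, one to read the length and one to slice the payload, and that per-field loop is the part a length-aware specifier could presumably push down into C.

```python
import struct


def unpack_prefixed(buf, offset=0, prefix="!H"):
    """Read one length-prefixed bytestring starting at `offset`.

    Hypothetical helper: `prefix` is the struct format of the length
    field (assumed here to be a big-endian unsigned 16-bit integer).
    Returns (payload, offset_of_next_field).
    """
    (length,) = struct.unpack_from(prefix, buf, offset)
    start = offset + struct.calcsize(prefix)
    end = start + length
    if end > len(buf):
        raise ValueError("buffer ends before the declared string length")
    return bytes(buf[start:end]), end


def unpack_record(buf, count):
    """Read `count` consecutive length-prefixed fields from `buf`."""
    fields = []
    offset = 0
    for _ in range(count):  # one Python-level iteration per field
        payload, offset = unpack_prefixed(buf, offset)
        fields.append(payload)
    return fields


# Two fields, each stored as: 2-byte length, then that many bytes.
record = struct.pack("!H3s", 3, b"foo") + struct.pack("!H5s", 5, b"hello")
print(unpack_record(record, 2))  # [b'foo', b'hello']
```

The format string cannot describe the whole record up front, because the size of each string is only known after its length field has been decoded.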

Several months ago I was trying to write a parser and writer for PostgreSQL's COPY ... WITH BINARY format. I started out trying to implement it in pure Python using the struct module. Because the format contains variable-length strings encoded in precisely the way you mention, it was not possible to parse an entire row of data without invoking pure-Python-level logic. This made the implementation infeasibly slow. I had to switch to Cython to get it fast enough (implementation is here: https://github.com/spitz-dan-l/postgres-binary-parser).

I believe that with this single change ($, or whatever format specifier one wishes to use), assuming it were implemented efficiently in C, I could have avoided Cython and gotten satisfactory performance from the struct module together with Python/NumPy's already-performant bytestring manipulation facilities.

-Dan Spitz

On Wed, Jan 18, 2017 at 5:32 AM Elizabeth Myers <[email protected]> wrote:

> Hello,
>
> I've noticed a lot of binary protocols require variable length
> bytestrings (with or without a null terminator), but it is not easy to
> unpack these in Python without first reading the desired length, or
> reading bytes until a null terminator is reached.
>
> I've noticed the netstruct library
> (https://github.com/stendec/netstruct) has a format specifier, $, which
> assumes the previous type to pack/unpack is the string's length. This is
> an interesting idea in and of itself, but doesn't handle the null-terminated
> string case. I know $ is similar to Pascal strings, but sometimes you
> need more than 255 characters :p.
>
> For null-terminated strings, it may be simpler to have a specifier for
> those. I propose 0, but this point can be bikeshedded over endlessly if
> desired ;) (I thought about using n/N but they're :P).
>
> It's worth noting that (maybe one of?) Perl's equivalent to the struct
> module, whose name escapes me atm, has a module which can handle this
> case. I can't remember if it handled variable length or zero-terminated
> though; maybe it did both. Perl is more or less my 10th language. :p
>
> This pain point is an annoyance imo and would greatly simplify a lot of
> code if implemented, or something like it. I'd be happy to take a look
> at implementing it if the idea is received sufficiently warmly.
>
> --
> Elizabeth
> _______________________________________________
> Python-ideas mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
