Hi comp.lang.python,

I am a novice Python programmer working on a project where I deal with large binary files (>50 GB each) consisting of a series of variable-sized data packets.
Each packet consists of a small header with size and other information and a much larger payload containing the actual data.

Using Python 2.5, struct and numpy arrays, I am able to parse such a file quite efficiently into Header and Payload objects, which I then manipulate in various ways. The most time-consuming part of the parsing is converting a proprietary form of 32-bit floats in the payloads into the IEEE floats used internally in Python.

For many use cases I am not interested in parsing the payload right when I pass through it, as I may want to use the attributes of the header to select the 1 in 1000 payloads whose data I actually have to look into and do the resource-intensive float conversion for.

I would therefore like to have two variants of a Payload class: one which is instantiated right away, with the payload parsed up into float arrays available as instance attributes, and another variant where the Payload object at the time of instantiation only contains a pointer to the place (f.tell()) in the file where the payload begins. Only when the initially non-existing attribute for the parsed-up data is actually accessed should the data be read and parsed and the attribute created.

In pseudocode:

    class PayloadInstant(object):
        """
        This is a normal Payload, where the data are parsed up when
        instantiated.
        """

        @classmethod
        def read_from_file(cls, f, size):
            """
            Returns a PayloadInstant instance with float data parsed up
            and immediately accessible in the data attribute.
            Instantiation is slow, but after instantiation access is fast.
            """

        def __init__(self, the_data):
            self.data = the_data


    class PayloadOnDemand(object):
        """
        Behaves as a PayloadInstant object, but instantiation is faster,
        as only the position of the payload in the file is stored
        initially in the object. Only when accessing the initially
        non-existing data attribute are the data actually read and the
        attribute created and bound to the instance. This first access
        will be a little slower than in PayloadInstant, as the correct
        file position has to be sought out first. On later calls the
        object has as efficient attribute access as PayloadInstant.
        """

        @classmethod
        def read_from_file(cls, f, size):
            pos = f.tell()
            f.seek(pos + size)  # Skip to end of payload
            return cls(pos)

        # I probably need some __getattr__ or __getattribute__ magic
        # here...??

        def __init__(self, a_file_position):
            self.file_position = a_file_position

My question is whether this is a pythonic way to do it, and I would then like a hint as to how to make the hook inside the PayloadOnDemand class, such that the inner lazy creation of the attribute is completely hidden from the outside.

I guess I could also just make a single class and let an OnDemand attribute decide how it should behave.

My real application is considerably more complicated than this, but I think the example captures the problem in a nutshell.

-- Slaunger
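
P.S. To make the question more concrete, here is a rough sketch of the kind of hook I am picturing. It relies on __getattr__ being called only when normal attribute lookup fails, so once data is bound to the instance the hook should not be triggered again. The parse_payload function and the extra f and size arguments to __init__ are hypothetical stand-ins, not part of my real code, and the sketch assumes the file stays open for as long as the payload objects live. I am not sure this is the pythonic way, hence the question.

```python
import io
import struct


def parse_payload(raw):
    """Hypothetical stand-in for the slow proprietary-float conversion.

    Here the bytes are simply read as little-endian IEEE 32-bit floats.
    """
    count = len(raw) // 4
    return list(struct.unpack("<%df" % count, raw))


class PayloadOnDemand(object):
    """Stores only the payload position; parses on first access to .data."""

    def __init__(self, f, file_position, size):
        # Assumption: the file object is kept so the payload can be read later.
        self._file = f
        self.file_position = file_position
        self.size = size

    @classmethod
    def read_from_file(cls, f, size):
        pos = f.tell()
        f.seek(pos + size)  # skip to the end of the payload
        return cls(f, pos, size)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. before .data exists.
        if name != "data":
            raise AttributeError(name)
        saved = self._file.tell()
        self._file.seek(self.file_position)
        raw = self._file.read(self.size)
        self._file.seek(saved)  # leave the file position as we found it
        self.data = parse_payload(raw)  # bind to the instance for later calls
        return self.data


# Small demonstration with an in-memory file holding three floats.
demo_file = io.BytesIO(struct.pack("<3f", 1.0, 2.0, 3.0))
payload = PayloadOnDemand.read_from_file(demo_file, 12)
first_access = payload.data   # triggers the read and the conversion
second_access = payload.data  # plain attribute access, no hook involved
```

After the first access, data lives in the instance __dict__ like any ordinary attribute, so from the outside the object should look the same as a PayloadInstant.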