On 3/23/07, Bjoern Schliessmann <[EMAIL PROTECTED]> wrote:
> "one blank line" == "EOF"? That's strange. Intended?

In my case, I know my input data doesn't have any blank lines. However, I'm glad you (and others) clarified the issue, because I wasn't aware of the better methods for checking for EOF.

> > Example 2: read lines into objects:
> > # begin readobjects.py
> > import sys, time
> > class FileRecord:
> >     def __init__(self, line):
> >         self.line = line
>
> What's this class intended to do?

Store a line :) I just wanted to post two runnable examples, so the class above is only a (contrived) example.

In the program I actually wrote, my class structure was a bit more interesting. After storing the input line, I'd call split("|") to tokenize it. Each token would then be assigned to a member variable, and some of the member variables were converted to ints or floats as well. My input data had three record types, all with a few common attributes, so I created a parent class and three child classes.

Also, many folks have suggested operating on only one line at a time (i.e., not storing the whole data set). Unfortunately, I'm constantly "looking" forward and backward in the record set while I process the data (i.e., to process any particular record, I sometimes need to know the whole contents of the file). (This is purchased proprietary vendor data that needs to be converted into our own internal format.)

Finally, for what it's worth: the total run-time memory requirements of my program are roughly 20x the data file size. A 200MB file requires about 4GB of RAM to process effectively. Note that, in addition to the class structure I described above, I also create two caches of all the data (two dicts with different keys, built from the collection of objects). This is necessary to ensure the program runs in a semi-reasonable amount of time.

Thanks to all for your input and suggestions. I received many more responses than I expected!

Matt
--
http://mail.python.org/mailman/listinfo/python-list
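
For anyone following along, here is a minimal sketch of the structure described above: a parent class with three child classes, lines tokenized with split("|"), and two dicts keyed differently over the same objects. The field layout, the type codes "P", "V" and "N", and all class and function names are hypothetical and chosen only for illustration; the real vendor format is not shown in this thread. The loader iterates over the file object directly, so the loop ends at the real end of file and a blank line is simply skipped rather than mistaken for EOF.

```python
import io


class Record:
    """Parent class: attributes assumed common to every record type."""

    def __init__(self, tokens):
        self.rec_type = tokens[0]      # hypothetical: field 0 is a type code
        self.rec_id = int(tokens[1])   # hypothetical: field 1 is an integer id


class PriceRecord(Record):
    def __init__(self, tokens):
        super().__init__(tokens)
        self.symbol = tokens[2]
        self.price = float(tokens[3])  # token converted to float


class VolumeRecord(Record):
    def __init__(self, tokens):
        super().__init__(tokens)
        self.symbol = tokens[2]
        self.volume = int(tokens[3])   # token converted to int


class NoteRecord(Record):
    def __init__(self, tokens):
        super().__init__(tokens)
        self.text = tokens[2]


# Hypothetical mapping from type code to child class.
RECORD_CLASSES = {"P": PriceRecord, "V": VolumeRecord, "N": NoteRecord}


def load_records(stream):
    """Read every line into an object; keep them all in file order."""
    records = []
    for line in stream:                # iteration stops at end of file
        line = line.rstrip("\n")
        if not line:                   # a blank line is skipped, not treated as EOF
            continue
        tokens = line.split("|")
        records.append(RECORD_CLASSES[tokens[0]](tokens))
    return records


def build_indexes(records):
    """Build two dicts with different keys over the same objects."""
    by_id = {}
    by_type = {}
    for rec in records:
        by_id[rec.rec_id] = rec
        by_type.setdefault(rec.rec_type, []).append(rec)
    return by_id, by_type


# Small in-memory stand-in for the data file, including one blank line.
sample = io.StringIO("P|1|ABC|10.5\n\nV|2|ABC|300\nN|3|hello\n")
records = load_records(sample)
by_id, by_type = build_indexes(records)
```

Keeping the full list in file order is what allows looking forward and backward from any record by index. The dicts hold references to the same objects rather than copies, so they add lookup speed at the cost of the dict overhead only.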