Sorry for replying so late, guys. I can't read news from work, and I can't reply to messages through Google Groups, so I had to do it from home. Let me address a few of the remarks and questions you raised:
First of all, the example I gave was just that: an example. Yes, I know Python indexes start at 0, and I know you cannot fit a 4-digit number into 2 positions; it was only meant to give the idea. To clarify: right now I need to browse text files of 1-80 MB. At the moment I have 16 different record definitions, labelled A, B, C1-C8 and D-H. Each record definition has 20-60 different attributes. Moreover, these formats change regularly, and I want to build (or find) something I can reuse for *other* applications or sites as well. As I said, I have already run into this type of problem in numerous places.

> John wrote:
> I have a Python script that takes layout info and an input file and can
> produce an output file in one of two formats:

Yes John, I was thinking along these lines myself. The problem is that I have to go through several of these large files every day for debugging, and browsing converted output seems just too tedious and inefficient. I would REALLY like a GUI, preferably something portable that I can reuse later on.

> This should be pretty easy. If each record is CRLF terminated, then you
> can get one record at a time simply by iterating over the file ("for line
> in open('myfile.dat'): ...").

Jeff, this was indeed the way I was thinking. But instead of just iterating, I need to be able to browse forward and backward.

> You can have a dictionary of classes or factory functions, one for each
> record type, keyed off of the 2-character identifier. Each class/factory
> would know the layout of that record type, and return a(n)
> instance/dictionary with fields separated out into attributes/items.

This is of course a clean approach, but it would mean re-coding every time a record definition changes, which happens frequently! I would really like to edit only a data definition file.

> The trickiest part would be in displaying the data; you could potentially
> use COM to insert it into a Word or Excel document, or code your own GUI
> in Python.
> The former would be pretty easy if you're happy with fairly
> simple formatting; the latter would require a bit more effort, but if you
> used one of Python's RAD tools (Boa Constructor, or maybe PythonCard, as
> examples) you'd be able to get very nice results.

I will at least look into Boa and PythonCard. Thanks for the hint.

> This is plausible only under the condition that Santa Claus is paying
> you $X per class/factory or per line of code, or you are so speed-crazy
> that you are machine-generating C code for the factories.

Unfortunately, neither is the case :)

> I'd suggest "data driven"

Yeah!

> Then you need a function to load this layout file into dictionaries,
> and build cross-references field_name -> field_number (0,1,2,...) and
> vice versa.
> As your record name is not in a fixed position in the record, you will
> also need to supply a function (file_type, record_string) ->
> record_name.

I thought about supplying a flat ASCII definition such as:

    [record type] <TAB> [fieldname] <TAB> [start] <TAB> [end]

> Then you have *ONE* function that takes a file_type, a record_name, and
> a record_string, and gives you a list of the values. That is all you
> need for a generic browser application.

I like this.

> You *don't* have to hand-craft a class for each record type. And you
> wouldn't want to, if you were dealing with files whose spec keeps on
> having fields added and fields obsoleted.

Exactly.

> I think that's overly pessimistic. I *was* presuming a case where the
> number of record types was fairly small, and the definitions of those
> records reasonably constant. For ~10 or fewer types whose spec doesn't
> change, hand-coding the conversion would probably be quicker and/or more
> straightforward than writing a spec-parser as you suggest.

Unfortunately, all wrong :) Lots of records, lots of changes, lots of different record types; hardcoding doesn't seem the right way.

> "Parse"?
> No parsing, and not much code at all: The routine to "load"
> (not "parse") the layout from the layout.csv file into dicts of dicts
> is only 35 lines of Python code. The routine to take an input line and
> serve up an object instance is about the same. It does more than the
> OP's browsing requirement already. The routine to take an object and
> serve up a correctly formatted output line is only 50 lines of which
> 1/4 is comment or blank.

John, do you have suggestions where I can find examples of these functions? I can program, but I'm not proficient in Python, so any help or examples I can adapt would be welcome.

> Also, files used to "create printed pages by
> an external company" (especially by a company that had "leaseplan" in
> its e-mail address) would indicate "many" and "complicated" to me.

How right you are. Think about production runs of 150,000 invoices, each invoice consisting of 2-10 records, and you are on the right track.

> I suspect
> that we're both assuming a case similar to our own personal
> experiences, which are different enough to lead to different
> preferred solutions. ;)

Seconded.

> My personal experiences and attitudes: (1) extreme aversion to having
> to type (correctly) lots of numbers (column positions and lengths), and
> to having to mentally translate start = 663, len = 13 to [662:675] or
> having ugliness like [663-1:663+13-1] (2) cases like 17 record types
> and 112 fields in one file, 8 record types and 86 fields in a second --
> this being a new relatively clean simple exercise in exchanging files
> with a government department (3) Past history of this govt dept is that
> there are at least another 7 file types in regular use and they change
> the _major_ version number of each file type about once a year on
> average (3) These things tend to start out deceptively small and simple
> and turn into monsters.

Our experiences are remarkably similar...

Cheers,
Paul
--
http://mail.python.org/mailman/listinfo/python-list
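For reference, here is a minimal sketch (Python 3) of the data-driven approach discussed in this thread: one routine loads a tab-separated layout file of the form `[record type] <TAB> [fieldname] <TAB> [start] <TAB> [end]`, and one generic routine splits any record using it. It assumes 1-based, inclusive column positions and, by default, a record type code in the first two columns; the names `load_layout`, `type_from_first_columns` and `split_record` are hypothetical, and since the record type may not sit in a fixed position, the type lookup is passed in as a function.

```python
import csv
import io


def load_layout(lines):
    """Load a tab-separated layout definition into a dict of field lists.

    Each non-blank line is assumed to read:
        record_type <TAB> field_name <TAB> start <TAB> end
    with 1-based, inclusive column positions. Lines starting with '#'
    are treated as comments.
    Returns {record_type: [(field_name, slice), ...]} in file order.
    """
    layout = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec_type, name, start, end = (col.strip() for col in row)
        # Convert 1-based inclusive [start, end] to a 0-based half-open
        # slice, e.g. start=663, end=675 (len 13) -> slice(662, 675).
        field = (name, slice(int(start) - 1, int(end)))
        layout.setdefault(rec_type, []).append(field)
    return layout


def type_from_first_columns(record, width=2):
    """Hypothetical default: the record type code is in the first columns."""
    return record[:width].strip()


def split_record(layout, record, get_type=type_from_first_columns):
    """Return [(field_name, value), ...] for one fixed-width record."""
    rec_type = get_type(record)
    fields = layout.get(rec_type)
    if fields is None:
        raise KeyError("no layout defined for record type %r" % rec_type)
    # Trailing blanks (and any leftover line ending) are stripped per field.
    return [(name, record[sl].rstrip()) for name, sl in fields]


if __name__ == "__main__":
    LAYOUT = (
        "# record_type\tfield\tstart\tend\n"
        "A\ttype\t1\t2\n"
        "A\tcustomer\t3\t10\n"
        "A\tamount\t11\t18\n"
    )
    layout = load_layout(io.StringIO(LAYOUT))
    print(split_record(layout, "A 12345   00012.50"))
    # prints [('type', 'A'), ('customer', '12345'), ('amount', '00012.50')]
```

Adding a field or changing a column range then means editing only the layout file, not the code. The field_name -> field_number cross-reference John mentions can be derived from each list, e.g. `{name: i for i, (name, _) in enumerate(layout["A"])}`.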
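Plain iteration over a file (`for line in open(...)`) only moves forward. One way to also browse backward, sketched here under stated assumptions (Python 3, one record per line, a single-byte encoding such as Latin-1), is to scan the file once, remember the byte offset where each line starts, and later `seek()` straight to any record. The class name `RecordFile` is hypothetical; this is an illustration, not an existing library.

```python
import array


class RecordFile:
    """Random access to the lines (records) of a large text file.

    A single pass stores the byte offset at which each line starts, so any
    record can later be fetched by number with one seek. That is enough for
    a browser's "next", "previous" and "go to" actions without holding the
    whole file in memory.
    """

    def __init__(self, path, encoding="latin-1"):  # encoding is an assumption
        self._f = open(path, "rb")
        self._encoding = encoding
        self._offsets = array.array("q")  # one 8-byte integer per line
        pos = 0
        for line in self._f:  # each line keeps its b"\n" or b"\r\n" ending
            self._offsets.append(pos)
            pos += len(line)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, index):
        if index < 0:
            index += len(self._offsets)  # support negative indices like a list
        if not 0 <= index < len(self._offsets):
            raise IndexError("record index out of range")
        self._f.seek(self._offsets[index])
        line = self._f.readline()
        return line.decode(self._encoding).rstrip("\r\n")

    def close(self):
        self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

A GUI built on this would keep the current record number and clamp it to the range `0 .. len(records) - 1` whenever the user steps forward or backward.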