Sorry for the late reply, guys - I cannot access news from work, and Google
Groups would not let me reply to a message, so I had to do it from home. Let
me address a few of the remarks and questions you raised:
First of all, the example I gave was just that - an example. Yes, I know
Python starts counting at 0, and I know that you cannot fit a 4-digit number
in 2 positions; this was just to give the idea. To clarify, at THIS moment I
need to browse text files of 1-80 MB in size. At this moment, I have 16
different record definitions, numbered A, B, C1-C8, D-H. Each record
definition has 20-60 different attributes.
Not only that, but these formats change regularly; and I want to create or
use something I can use on *other* applications or sites as well. As I said,
I have encountered the type of problem I've described in numerous places
already.
> John wrote:
> I have a Python script that takes layout info and an input file and can
> produce an output file in one of two formats:
Yes John, I was thinking along these lines myself. The problem is that I
have to parse several of these large files each day (debugging) and browsing
converted output seems just too tedious and inefficient. I would REALLY like
a GUI, and preferably something portable I can re-use later on.
> This should be pretty easy. If each record is CRLF terminated, then you
> can get one record at a time simply by iterating over the file ("for line
> in open('myfile.dat'): ...").
Jeff, this was indeed the way I was thinking. But instead of iterating I
need the ability to browse forward and backward.
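To make clear what I mean by browsing in both directions, here is a rough sketch of one way it might be done: scan the file once, remember the byte offset at which each record starts, and then seek straight to any record. The function names build_line_index and read_record are made up for illustration, and I am assuming one record per line in a single-byte (ASCII-like) encoding.

```python
def build_line_index(path):
    """Scan the file once and return the byte offset of every record (line)."""
    offsets = []
    pos = 0
    # Binary mode, so the offsets are exact byte positions even with CRLF.
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets


def read_record(path, offsets, n):
    """Return record number n (0-based) by seeking directly to its offset."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        raw = f.readline()
    # Assumption: records are ASCII text terminated by CRLF or LF.
    return raw.rstrip(b"\r\n").decode("ascii", errors="replace")
```

A browser would then only need to keep a current record number and call read_record with n - 1 or n + 1. The index holds one integer per record, so it should stay small compared to the file itself.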
> You can have a dictionary of classes or factory functions, one for each
> record type, keyed off of the 2-character identifier. Each class/factory
> would know the layout of that record type, and return a(n)
> instance/dictionary with fields separated out into attributes/items.
This is of course a clean approach, but it would mean re-coding every time a
record definition changes - which is frequently! I would really like to edit
only a data definition file.
> The trickiest part would be in displaying the data; you could potentially
> use COM to insert it into a Word or Excel document, or code your own GUI
> in Python. The former would be pretty easy if you're happy with fairly
> simple formatting; the latter would require a bit more effort, but if you
> used one of Python's RAD tools (Boa Constructor, or maybe PythonCard, as
> examples) you'd be able to get very nice results.
I will at least look into Boa and PythonCard. Thanks for the hint.
> This is plausible only under the condition that Santa Claus is paying
> you $X per class/factory or per line of code, or you are so speed-crazy
> that you are machine-generating C code for the factories.
Unfortunately, neither is the case :)
> I'd suggest "data driven"
Yeah!
> Then you need a function to load this layout file into dictionaries,
> and build cross-references field_name -> field_number (0,1,2,...) and
> vice versa.
> As your record name is not in a fixed position in the record, you will
> also need to supply a function (file_type, record_string) ->
> record_name.
I thought about supplying a flat ASCII definition such as:
[record type] [fieldname] [start] [end]
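A minimal sketch of how such a definition might be loaded, assuming whitespace-separated columns, field names without spaces, and "#" for comment lines (all of which are my assumptions, nothing is fixed yet). The field names in the sample and the function name load_layout are invented for illustration.

```python
from collections import defaultdict


def load_layout(lines):
    """Build {record_type: [(fieldname, start, end), ...]} from definition lines."""
    layout = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rec_type, name, start, end = line.split()
        layout[rec_type].append((name, int(start), int(end)))
    return dict(layout)


# Hypothetical definition file contents: record type, fieldname, start, end
SAMPLE = """\
# record type  fieldname  start  end
A  rectype   1  2
A  customer  3  10
B  rectype   1  2
B  amount    3  9
"""

layout = load_layout(SAMPLE.splitlines())
```

With something like this, a changed record format would only mean editing the definition file, not the program.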
> Then you have *ONE* function that takes a file_type, a record_name, and
> a record_string, and gives you a list of the values. That is all you
> need for a generic browser application.
I like this.
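If I understand the idea correctly, that one function could be as small as the sketch below. I have left out the file_type argument for brevity. The layout dictionary, the field names and the name split_record are hypothetical, and I am assuming start and end are 1-based, inclusive column positions.

```python
def split_record(layout, record_name, record_string):
    """Return the field values of one record as a list of strings."""
    fields = layout[record_name]
    # Assumption: start/end are 1-based and inclusive, so the slice is
    # [start - 1:end]; padding blanks are stripped from each value.
    return [record_string[start - 1:end].strip() for _name, start, end in fields]


# Hypothetical layout: record type -> list of (fieldname, start, end)
layout = {"A": [("rectype", 1, 2), ("customer", 3, 10)]}

values = split_record(layout, "A", "A 12345678extra")
# values is now ["A", "12345678"]
```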
> You *don't* have to hand-craft a class for each record type. And you
> wouldn't want to, if you were dealing with files whose spec keeps on
> having fields added and fields obsoleted.
Exactly.
> I think that's overly pessimistic. I *was* presuming a case where the
> number of record types was fairly small, and the definitions of those
> records reasonably constant. For ~10 or fewer types whose spec doesn't
> change, hand-coding the conversion would probably be quicker and/or more
> straightforward than writing a spec-parser as you suggest.
Unfortunately, all wrong :)
Lots of records, lots of changes, lots of different record types -
hardcoding doesn't seem to be the right way.
> "Parse"? No parsing, and not much code at all: The routine to "load"
> (not "parse") the layout from the layout.csv file into dicts of dicts
> is only 35 lines of Python code. The routine to take an input line and
> serve up an object instance is about the same. It does more than the
> OP's browsing requirement already. The routine to take an object and
> serve up a correctly formatted output line is only 50 lines of which
> 1/4 is comment or blank.
John, do you have suggestions for where I can find examples of these
functions? I can program, but as I am not proficient in Python, any help or
examples I can adapt would be nice.
> Also, files used to "create printed pages by
> an external company" (especially by a company that had "leaseplan" in
> its e-mail address) would indicate "many" and "complicated" to me.
How right you are. Think about production r