Browsing text ; Python the right tool?

2005-01-25 Thread Paul Kooistra
I need a tool to browse text files with a size of 10-20 MB. These
files have a fixed record length of 800 bytes (CR/LF), and contain
records used to create printed pages by an external company.

Each line (record) contains a 2-character identifier, like 'A0' or
'C1'. The identifier identifies the record format for the line,
thereby allowing different record formats to be used in a textfile.
For example:

An A0 record may consist of:
recordnumber [1:4]
name [5:25]
filler   [26:800]

while a C1 record consists of:
recordnumber [1:4]
phonenumber  [5:15]
zipcode  [16:20]
filler   [21:800]

As you see, all records have a fixed column format. I would like to
build a utility which allows me (in a windows environment) to open a
textfile and browse through the records (ideally with a search
option), where each recordtype is displayed according to its
recordformat ('Attributename: Value' format). This would mean that
browsing from an A0 to a C1 record results in a different list of
attributes + values on the screen, allowing me to analyze the data
generated a lot easier than I do now, browsing in a text editor with a
stack of printed record formats at hand.

This is of course quite a common way of encoding data in textfiles.
I've tried to find a generic text-based browser which allows me to do
just this, but cannot find anything. Enter Python; I know the language
by name, I know it handles text just fine, but I am not really
interested in learning Python just now, I just need a tool to do what
I want.

What I would REALLY like is a way to define standard record formats in a
separate definition, like:
- defining a common record length;
- defining the different record formats (attributes, positions in the
line);
- and defining when a specific record format is to be used, dependent
on 1 or more identifiers in the record.
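
A rough sketch of what such a separate definition could look like, written as plain Python data. Everything here is an assumption made for illustration: the names `DEFINITION` and `pick_format`, the `when` conditions, and in particular the identifier position (columns 26-27), which is not given above. Positions are read as 1-based and inclusive, as in the example formats.

```python
# Hypothetical definition; names and identifier positions are illustrative only.
DEFINITION = {
    "record_length": 800,
    "formats": {
        "A0": {
            # All (start, end, expected) conditions must match for this format.
            "when": [(26, 27, "A0")],
            "fields": [("recordnumber", 1, 4), ("name", 5, 25)],
        },
        "C1": {
            "when": [(26, 27, "C1")],
            "fields": [("recordnumber", 1, 4), ("phonenumber", 5, 15),
                       ("zipcode", 16, 20)],
        },
    },
}


def pick_format(definition, record):
    """Return the name of the first format whose conditions all match, else None."""
    for name, fmt in definition["formats"].items():
        # Convert 1-based inclusive positions to Python's 0-based slices.
        if all(record[start - 1:end] == expected
               for start, end, expected in fmt["when"]):
            return name
    return None
```

Because a format may list several conditions, the same structure would cover the case where the format depends on more than one identifier in the record.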

I CAN probably build something from scratch, but if I can (re)use
something that already exists it would be so much better and faster...
And a utility to do what I just described would be REALLY useful in
LOTS of environments.

This means I have the following questions:

1. Does anybody know of a generic tool (not necessarily Python based)
that does the job I've outlined?
2. If not, is there some framework or widget in Python I can adapt to
do what I want?
3. If not, should I consider building all this just from scratch in
Python - which would probably mean not only learning Python, but some
other GUI related modules?
4. Or should I forget about Python and build something in another
environment?

Any help would be appreciated.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Browsing text ; Python the right tool?

2005-01-31 Thread Paul Kooistra
Sorry to reply this late guys - I cannot access news from work, and Google 
Groups cannot reply to a message so I had to do it at home. Let me address a 
few of the remarks and questions you guys asked:

First of all, the example I gave was just that - an example. Yes, I know 
Python starts with 0, and I know that you cannot fit a 4-digit number in 2 
positions, this was just to give the idea. To clarify, at THIS moment I need 
to browse text files of 1-80 MB. At this moment, I have 16 different 
record definitions, numbered A, B, C1-C8, D-H. Each record definition has 
20-60 different attributes.

Not only that, but these formats change regularly; and I want to create or 
use something I can use on *other* applications or sites as well. As I said, 
I have encountered the type of problem I've described in numerous places 
already.

> John wrote:
> I have a Python script that takes layout info and an input file and can
> produce an output file in one of two formats:

Yes John, I was thinking along these lines myself. The problem is that I 
have to parse several of these large files each day (debugging) and browsing 
converted output seems just too tedious and inefficient. I would REALLY like 
a GUI, and preferably something portable I can re-use later on.

> This should be pretty easy.  If each record is CRLF terminated, then you 
> can get one record at a time simply by iterating over the file ("for line 
> in open('myfile.dat'): ...").

Jeff, this was indeed the way I was thinking. But instead of iterating I 
need the ability to browse forward and backward.
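
Since the records have a fixed length, browsing in both directions could be done by seeking to a record's byte offset instead of iterating. A minimal sketch, assuming every record occupies exactly `record_size` bytes on disk including the CR/LF (for example 802 if the 800 bytes exclude the terminator, which is a guess) and that the data is plain ASCII. The class name `RecordFile` and its methods are made up for illustration.

```python
import os


class RecordFile:
    """Random access to fixed-length records in a file.

    record_size is the number of bytes per record on disk, terminator included.
    """

    def __init__(self, path, record_size):
        self.record_size = record_size
        self.handle = open(path, "rb")  # binary mode keeps byte offsets exact
        self.count = os.path.getsize(path) // record_size
        self.position = 0

    def read(self, index):
        """Return record number `index` (0-based) without its line terminator."""
        if not 0 <= index < self.count:
            raise IndexError(index)
        self.handle.seek(index * self.record_size)
        self.position = index
        raw = self.handle.read(self.record_size)
        return raw.rstrip(b"\r\n").decode("ascii", errors="replace")

    def next(self):
        """Move one record forward, stopping at the last record."""
        return self.read(min(self.position + 1, self.count - 1))

    def previous(self):
        """Move one record backward, stopping at the first record."""
        return self.read(max(self.position - 1, 0))

    def close(self):
        self.handle.close()
```

Only one record is read at a time, so the size of the file should not matter much for memory use.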

> You can have a dictionary of classes or factory functions, one for each 
> record type, keyed off of the 2-character identifier.  Each class/factory 
> would know the layout of that record type, and return a(n) 
> instance/dictionary with fields separated out into attributes/items.

This is of course a clean approach, but would mean re-coding every time a 
record is changed - frequently! I really would like to edit only a data 
definition file.

> The trickiest part would be in displaying the data; you could potentially 
> use COM to insert it into a Word or Excel document, or code your own GUI 
> in Python.  The former would be pretty easy if you're happy with fairly 
> simple formatting; the latter would require a bit more effort, but if you 
> used one of Python's RAD tools (Boa Constructor, or maybe PythonCard, as 
> examples) you'd be able to get very nice results.

I will at least look into Boa and PythonCard. Thanks for the hint.

> This is plausible only under the condition that Santa Claus is paying
> you $X per class/factory or per line of code, or you are so speed-crazy
> that you are machine-generating C code for the factories.

Unfortunately, neither is the case :)

> I'd suggest "data driven"

Yeah!

> Then you need a function to load this layout file into dictionaries,
> and build cross-references field_name -> field_number (0,1,2,...) and
> vice versa.

> As your record name is not in a fixed position in the record, you will
> also need to supply a function (file_type, record_string) ->
> record_name.

I thought about supplying a flat ASCII definition such as:

[record type]  [fieldname]  [start]  [end]
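
A minimal sketch of loading such a definition, assuming one field per line, columns separated by whitespace, no spaces inside field names, and 1-based start/end positions. The function name `load_layout` and the `#` comment convention are assumptions, not part of any existing tool.

```python
def load_layout(lines):
    """Build {record_type: [(fieldname, start, end), ...]} from definition lines."""
    layout = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        record_type, fieldname, start, end = line.split()
        layout.setdefault(record_type, []).append(
            (fieldname, int(start), int(end)))
    return layout


# Example definition lines; in practice these would come from a text file,
# e.g. load_layout(open("layout.txt")) with a hypothetical file name.
example = [
    "# type  field         start  end",
    "A0      recordnumber  1      4",
    "A0      name          5      25",
    "C1      recordnumber  1      4",
    "C1      phonenumber   5      15",
    "C1      zipcode       16     20",
]
layout = load_layout(example)
```

Changing a record format would then only mean editing the definition text, not the code.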

> Then you have *ONE* function that takes a file_type, a record_name, and
> a record_string, and gives you a list of the values. That is all you
> need for a generic browser application.

I like this.
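
As I understand it, that one function could look roughly like the sketch below. It assumes the layouts are held as `layouts[file_type][record_name]`, a list of `(fieldname, start, end)` tuples with 1-based inclusive positions. The names `record_values` and `describe` are hypothetical, and stripping trailing blanks from the values is my own choice.

```python
def record_values(layouts, file_type, record_name, record_string):
    """Return the field values of one record as a list of strings."""
    fields = layouts[file_type][record_name]
    # 1-based inclusive (start, end) becomes the Python slice [start - 1:end].
    return [record_string[start - 1:end].rstrip()
            for _name, start, end in fields]


def describe(layouts, file_type, record_name, record_string):
    """Return 'Attributename: Value' lines for display in a browser."""
    fields = layouts[file_type][record_name]
    values = record_values(layouts, file_type, record_name, record_string)
    return ["%s: %s" % (name, value)
            for (name, _start, _end), value in zip(fields, values)]
```

A GUI would only need to call `describe` for the current record and show the resulting lines.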

> You *don't* have to hand-craft a class for each record type. And you
> wouldn't want to, if you were dealing with files whose spec keeps on
> having fields added and fields obsoleted.

Exactly.

> I think that's overly pessimistic.  I *was* presuming a case where the 
> number of record types was fairly small, and the definitions of those 
> records reasonably constant.  For ~10 or fewer types whose spec doesn't 
> change, hand-coding the conversion would probably be quicker and/or more 
> straightforward than writing a spec-parser as you suggest.

Unfortunately, all wrong :)

Lots of records, lots of changes, lots of different record types - 
hardcoding doesn't seem the right way.

> "Parse"? No parsing, and not much code at all: The routine to "load"
> (not "parse") the layout from the layout.csv file into dicts of dicts
> is only 35 lines of Python code. The routine to take an input line and
> serve up an object instance is about the same. It does more than the
> OP's browsing requirement already. The routine to take an object and
> serve up a correctly formatted output line is only 50 lines of which
> 1/4 is comment or blank.

John, do you have suggestions where I can find examples of these functions? I 
can program, but I am not proficient in Python, so any help or examples I can 
adapt would be nice.

> Also, files used to "create printed pages by
> an external company" (especially by a company that had "leaseplan" in
> its e-mail address) would indicate "many" and "complicated" to me.

How right you are. Think about production r