"gov" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Hi, > > I've just started to learn programming and was told this was a good > place to ask questions :) > > Where I work, we receive large quantities of data which is currently > all printed on large, obsolete, dot matrix printers. This is a problem > because the replacement parts will not be available for much longer. > > So I'm trying to create a program which will capture the fixed width > text file data and convert as well as sort the data (there are several > different report types) into a different format which would allow it to > be printed normally, or viewed on a computer.

Text file data has no concept of "fixed width". Somewhere in your system, text file data is being thrown at your dot matrix printer. It would seem a trivial exercise to simply plug in a newer and probably inexpensive replacement printer. What am I missing here?

> I've been reading up on the Regular Expression module and ways in which
> to manipulate strings however it has been difficult to think of a way
> in which to extract an address.
>
> Here's an example of the raw text that I have to work with:
>
<snip>

How are you intercepting this text data? Are you replacing your old printer with a Python-speaking computer? How will you deliver this data to your Python program?

> (the # = any number, and the X's are just regular text)
> I would like to extract the address information, but the two different
> text objects on the right hand side are difficult to remove. I think
> it would be easier if I could just extract a fixed square of
> information, but I don't have a clue as to how to go about it.

Assuming you know how your Python code will "see" this data, you would need no more than standard Python string handling to perform these tasks.

There is no concept of a "fixed square" here. This is a continuous stream of (probably ASCII) characters. If you could pick the data up from a file, you would use readlines() to build a list of individual lines. If you were picking the data up from a serial port, you might assemble the whole thing into one big string and use split('\n') to build your list of lines.

Once you had a full record (print page?) as a list of individual lines, you could identify each line by its position in the list *if*, as is likely, each item arrives at the same line position. If not, your code can read each line and test its content.
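
To make that concrete, here is a rough sketch of getting a page into a list of lines and then cutting a block of rows and columns out of it with ordinary slicing, which is about as close to a "fixed square" as a character stream gets. The function names, the sample text and the column positions are all made up for illustration; I have not seen your real report layout.

```python
# Sketch only: the sample text and the row/column positions are invented
# for illustration; the real report layout is not known here.

def lines_from_text(text):
    """Split one big string (e.g. collected from a serial port) into lines.

    splitlines() is used rather than split('\n') so that a trailing
    newline does not leave an empty last entry and "\r\n" is handled too.
    """
    return text.splitlines()


def lines_from_file(path):
    """Read a captured report file into a list of lines, newlines removed."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]


def cut_block(lines, first_row, last_row, first_col, last_col):
    """Return the rectangle of text covering the given rows and columns.

    Rows and columns are zero-based and the end positions are exclusive,
    exactly as with ordinary Python slices.
    """
    return [line[first_col:last_col] for line in lines[first_row:last_row]]


sample = "AAAA  1111\nBBBB  2222\nCCCC  3333\n"
page = lines_from_text(sample)

# Keep only the left-hand four columns of the first two rows.
left_block = cut_block(page, 0, 2, 0, 4)
```

Slicing a line that is shorter than the requested columns simply returns whatever is there, so ragged line lengths do not raise an error.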

For example:

The line
"#######"
seems to immediately precede several address lines
" MRS XXX X XXXXXXX"
" #####"
" ####: " ###-###-#"

If you can rely on this, you would know that the line "#######" is immediately followed by several lines of an address, up until the empty line. And you can look at each of those address lines and use strip() to remove leading and trailing blanks.

Similarly, the line that begins " LANG:" would seem to immediately precede another address.

None of this is particularly difficult with standard Python. But then, if we are merely replacing an old printer, we are already working way too hard!

Thomas Bartkus
-- 
http://mail.python.org/mailman/listinfo/python-list
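
P.S. A minimal sketch of the "marker line, then address lines up to the first blank line" idea. Everything specific in it is an assumption: the helper names are hypothetical, the sample page is invented, and I am guessing that the "#######" line is exactly seven digits. Adjust the tests to whatever your real data looks like.

```python
# Sketch only: helper names, sample data and the "seven digits" rule are
# assumptions made for illustration, not taken from the real reports.

def address_after(lines, is_marker):
    """Return the stripped lines that follow the first marker line,
    stopping at the first empty (or all-blank) line."""
    collecting = False
    address = []
    for line in lines:
        if not collecting:
            if is_marker(line):
                collecting = True  # start collecting from the next line
            continue
        if not line.strip():
            break  # a blank line ends the address
        address.append(line.strip())
    return address


def is_number_line(line):
    """Assumed stand-in for the '#######' line: exactly seven digits."""
    text = line.strip()
    return len(text) == 7 and text.isdigit()


def is_lang_line(line):
    """The line that begins ' LANG:' precedes the second address."""
    return line.startswith(" LANG:")


page = [
    "REPORT HEADER",
    "1234567",
    "   MRS JANE A EXAMPLE",
    "   12 SAMPLE STREET",
    "   ANYTOWN",
    "",
    " LANG: EN",
    "   MR JOHN EXAMPLE",
    "   34 OTHER ROAD",
    "",
]

first_address = address_after(page, is_number_line)
second_address = address_after(page, is_lang_line)
```

If no marker line is found the function just returns an empty list, which your code can treat as "this page has no address of that kind".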