James Pruitt wrote:
Given a number of files, say 3, that represent technical support tickets
in the same format, I am looking for a way to generate regular
expressions for the different fields automatically.
An example of one line from each file:
Date: 12/30/2008 Room: 457 Building: Main
Date: 12/31/2008 Room: A21 Building: Annex
Date: 1/4/2009 Room: L69 Building: Library
The program would then, possibly using the python diff library, generate
the regular expression needed to parse out different fields. In this
case it might return a tuple like
("^Date:\s+(.*)\s+Room", "Room:\s+(.*)\s+Building", "Building:\s+(.*)\s*$")
that would match each of the fields based on the common data, on the
assumption that whatever changes between the lines is the data we are
looking for.
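
For illustration only, here is a rough sketch of how the diff-based idea
might look with difflib from the standard library. It assumes the lines
can be split on whitespace and that every token shared by all lines is a
field label; the names common_tokens and infer_pattern are made up for
this example and are not from the original post.

```python
import difflib
import re
from functools import reduce

def common_tokens(a, b):
    """Return the tokens that appear, in order, in both token lists."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    shared = []
    for block in matcher.get_matching_blocks():
        shared.extend(a[block.a:block.a + block.size])
    return shared

def infer_pattern(lines):
    """Build one regex whose groups capture the parts that vary."""
    # Assumption: tokens common to every line are the constant labels.
    labels = reduce(common_tokens, (line.split() for line in lines))
    # Each label is followed by one lazily captured, variable value.
    body = r"\s+".join(re.escape(label) + r"\s+(.*?)" for label in labels)
    return re.compile("^" + body + r"\s*$")

samples = [
    "Date: 12/30/2008 Room: 457 Building: Main",
    "Date: 12/31/2008 Room: A21 Building: Annex",
    "Date: 1/4/2009 Room: L69 Building: Library",
]
pattern = infer_pattern(samples)
print(pattern.match(samples[2]).groups())
```

One caveat: if a value happens to be identical in every sample (say all
three tickets were for the Main building), this sketch would mistake it
for a label, so more samples or a rule such as "labels end in a colon"
would be needed.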
Why not just assume that each field consists of a word terminated by a
colon, then some text, then the next field or the end of the line?
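
A minimal sketch of that assumption, using only the re module. The names
FIELD_RE and parse_fields are invented for this example, and the pattern
treats any run of word characters followed by a colon as a field name.

```python
import re

# Assumption: a field name is a single word followed by a colon, and its
# value runs up to the next "word:" or to the end of the line.
FIELD_RE = re.compile(r"(\w+):\s*(.*?)\s*(?=\w+:|$)")

def parse_fields(line):
    """Return a dict mapping each field name to its text."""
    return dict(FIELD_RE.findall(line))

print(parse_fields("Date: 12/31/2008 Room: A21 Building: Annex"))
```

This needs no sample comparison at all, but it will misparse a value that
itself contains a word followed by a colon, such as a time like 12:30.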