James Pruitt wrote:
Given a number of files, say 3, that represent technical support tickets in the same format, I am looking for a way to generate regular expressions for the different fields automatically.

An example of one line from each file:
Date: 12/30/2008 Room: 457 Building: Main
Date: 12/31/2008 Room: A21 Building: Annex
Date: 1/4/2009 Room: L69 Building: Library

The program would then, possibly using Python's difflib module, generate the regular expressions needed to parse out the different fields. In this case it might return a tuple like ("^Date:\s+(.*)\s+Room", "Room:\s+(.*)\s+Building", "Building:\s+(.*)\s*$") that would match each of the fields based on the common text, on the assumption that what doesn't change between the files is the field labels and what does change is the data we are looking for.
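
A rough sketch of that idea, assuming whitespace-separated tokens and one sample line per file. The helper names common_tokens and build_pattern are made up for illustration. It uses difflib.SequenceMatcher to keep only the tokens shared by every sample, treats those as field labels, and puts a capture group after each one:

```python
import re
from difflib import SequenceMatcher

def common_tokens(lines):
    """Return the whitespace-separated tokens shared, in order, by every line."""
    common = lines[0].split()
    for line in lines[1:]:
        tokens = line.split()
        matcher = SequenceMatcher(None, common, tokens, autojunk=False)
        kept = []
        # Each matching block is a run of tokens present in both sequences.
        for block in matcher.get_matching_blocks():
            kept.extend(common[block.a:block.a + block.size])
        common = kept
    return common

def build_pattern(lines):
    """Build one regex with a capture group after each shared token.

    Assumes every line starts with a label and every label is followed
    by a value.
    """
    labels = common_tokens(lines)
    parts = [re.escape(label) + r"\s+(.*?)" for label in labels]
    return re.compile("^" + r"\s+".join(parts) + r"\s*$")

samples = [
    "Date: 12/30/2008 Room: 457 Building: Main",
    "Date: 12/31/2008 Room: A21 Building: Annex",
    "Date: 1/4/2009 Room: L69 Building: Library",
]
pattern = build_pattern(samples)
print(pattern.pattern)
print(pattern.match(samples[1]).groups())
```

One caveat with this approach: if a value happens to be identical in every sample (for instance, all three tickets from the same building), it would be mistaken for a label, so the samples need to differ in every field.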

Why not just assume that each field consists of a word terminated by a colon, then some text, then the next field or the end of the line?
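
A minimal sketch of that assumption, using only the re module. The names FIELD_RE and parse_fields are made up for illustration. A label is taken to be a word followed by a colon, and its value runs until whitespace followed by the next label, or until the end of the line:

```python
import re

# label, colon, optional whitespace, then the shortest value that is
# followed either by whitespace plus another "word:" or by the end of the line.
FIELD_RE = re.compile(r"(\w+):\s*(.*?)(?=\s+\w+:|\s*$)")

def parse_fields(line):
    """Return a dict mapping each field label to its value."""
    return dict(FIELD_RE.findall(line))

print(parse_fields("Date: 12/30/2008 Room: 457 Building: Main"))
```

This needs no sample comparison at all, but it would misread a value that itself contains a word followed by a colon after whitespace (a free-text note such as "Status: see note: urgent", say), so it only fits data like the examples above.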