"manstey" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi,
>
> I have a text file with about 450,000 lines. Each line has 4-5 fields,
> separated by various delimiters (spaces, @, etc).
>
> I want to load in the text file and then run routines on it to produce
> 2-3 additional fields.
>
<snip>

Matthew -

If you find regular expressions (re's) a bit cryptic, here is a pyparsing version that may be more readable, and that will easily scan through your input file:

================
from pyparsing import OneOrMore, Word, alphas, oneOf, restOfLine, lineno

data = """gee fre asd[234
ger dsf asd[243
gwer af as.:^25a"""

# define format of input line, that is:
# - one or more words, composed of alphabetic characters, periods, and colons
# - one of the characters '[' or '^'
# - the rest of the line
entry = OneOrMore(Word(alphas + ".:")) + oneOf("[ ^") + restOfLine

# scan for matches in input data - for each match, scanString will
# report the matching tokens, and start and end locations
for toks, start, end in entry.scanString(data):
    print(toks)
print()

# scan again, this time generating additional fields
for toks, start, end in entry.scanString(data):
    tokens = list(toks)
    # change these lines to implement your
    # desired generation code - couldn't guess
    # what you wanted from your example
    tokens.append(toks[0] + toks[1])                # join first two words
    tokens.append(toks[-1] + toks[-1][-1])          # repeat last character
    tokens.append(str(lineno(start, data)))         # 1-based line number
    print(tokens)
================

prints:

['gee', 'fre', 'asd', '[', '234']
['ger', 'dsf', 'asd', '[', '243']
['gwer', 'af', 'as.:', '^', '25a']

['gee', 'fre', 'asd', '[', '234', 'geefre', '2344', '1']
['ger', 'dsf', 'asd', '[', '243', 'gerdsf', '2433', '2']
['gwer', 'af', 'as.:', '^', '25a', 'gweraf', '25aa', '3']

You asked about data structures specifically. The core collections in Python are lists, dicts, and, more recently, sets. Pyparsing returns the tokens from its matching process in a pyparsing-defined class called ParseResults. Fortunately, thanks to Python's "duck-typing" model, you can treat a ParseResults object just like a list, or like a dict if you have assigned names to the fields in the parsing expression.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul
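The point above about reading ParseResults like a list, or like a dict once fields are named, can be sketched as follows. This is a minimal illustration assuming a pyparsing install; the field names "words", "delim" and "rest" are hypothetical labels chosen here, not part of the original example. Names are attached with the expr("name") shorthand for setResultsName().

```python
# Sketch assuming pyparsing is installed; the results names "words",
# "delim" and "rest" are hypothetical labels, not from the post.
from pyparsing import OneOrMore, Word, alphas, oneOf, restOfLine

data = """gee fre asd[234
ger dsf asd[243
gwer af as.:^25a"""

# Same line format as the example above, but each part gets a name
entry = (OneOrMore(Word(alphas + ".:"))("words")
         + oneOf("[ ^")("delim")
         + restOfLine("rest"))

records = []
for toks, start, end in entry.scanString(data):
    record = {
        "first": toks[0],                  # list-style: positional index
        "words": list(toks["words"]),      # dict-style: look up by name
        "delim": toks["delim"],
        "rest": toks.rest,                 # attribute-style access by name
    }
    records.append(record)

for rec in records:
    print(rec)
```

Collecting each line into a plain dict is one way to keep the 2-3 generated fields next to the parsed ones. For the real 450,000-line file, `data` would be read from disk instead (e.g. with `open(...).read()`, filename yours to supply).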