[Ernesto]
> I'm still fairly new to python, so I need some guidance here...
>
> I have a text file with lots of data. I only need some of the data. I
> want to put the useful data into an [array of] struct-like
> mechanism(s). The text file looks something like this:
>
> [BUNCH OF NOT-USEFUL DATA....]
>
> Name: David
> Age: 108 Birthday: 061095 SocialSecurity: 476892771999
>
> [MORE USELESS DATA....]
>
> Name........
>
> I would like to have an array of "structs." Each struct has
>
> struct Person{
>     string Name;
>     int Age;
>     int Birhtday;
>     int SS;
> }
>
> I want to go through the file, filling up my list of structs.
>
> My problems are:
>
> 1. How to search for the keywords "Name:", "Age:", etc. in the file...
> 2. How to implement some organized "list of lists" for the data
>    structure.

Since you're just starting out in Python, this problem presents an
excellent opportunity to learn Python's two basic approaches to text
parsing.

The first approach involves looping over the input lines, searching for
key phrases, and extracting the fields with string slicing, using
str.strip() to trim irregular-length input fields. The start/stop logic
is governed by the first and last key phrases, and the results get
accumulated in a list. This approach is easy to program, maintain, and
explain to others:

    # Approach suitable for inputs with fixed input positions
    # (inputData holds the full text of the file as one string)
    result = []
    for line in inputData.splitlines():
        if line.startswith('Name:'):
            name = line[7:].strip()
        elif line.startswith('Age:'):
            # Column offsets assume the file's fixed layout;
            # adjust them to match your data
            age = line[5:8].strip()
            bd = line[20:26]
            ssn = line[45:54]
            # The Age line is the last one in a record, so store it here
            result.append((name, age, bd, ssn))
    print(result)

The second approach uses regular expressions. The pattern is to search
for a key phrase, skip over whitespace, and grab the data field in a
parenthesized group. Unlike slicing, this approach is tolerant of
loosely formatted data where the target fields do not always appear in
the same column position. The trade-off is having less flexibility in
parsing logic (i.e. the target fields must arrive in a fixed order):

    # Approach for more loosely formatted inputs
    import re

    # (?x) turns on verbose mode, so whitespace in the pattern is ignored
    pattern = r'''(?x)
        Name:\s+(\w+)\s+
        Age:\s+(\d+)\s+
        Birthday:\s+(\d+)\s+
        SocialSecurity:\s+(\d+)
    '''
    print(re.findall(pattern, inputData))

Other respondents have suggested the third-party PyParsing module, which
provides a powerful general-purpose toolset for text parsing; however,
it is always worth mastering Python basics before moving on to
special-purpose tools. The above code fragments are easy to construct
and not hard to explain to others. Maintenance is a breeze.


Raymond


P.S.
Once you've formed a list of tuples, it is trivial to create Person
objects for your Pascal-like structure:

    class Person(object):
        def __init__(self, fields):
            # fields is one (name, age, bd, ssn) tuple from the result list
            self.name, self.age, self.bd, self.ssn = fields

    personlist = list(map(Person, result))
    for p in personlist:
        print(p.name, p.age, p.bd, p.ssn)
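
To see the pieces working together, here is a minimal, self-contained
sketch that runs the regular-expression approach described earlier on a
small sample and builds Person objects from the matches. The sample text
in `sample_data` and the names `PERSON_PATTERN` and `parse_people` are
invented for illustration and are not part of the original post; the
field layout is assumed to match the example in the question. As one
possible design choice, the age is converted to an int, while the
birthday and social security number stay strings so that leading zeros
survive.

```python
import re

# Hypothetical sample input, modelled on the layout shown in the question.
sample_data = """\
[BUNCH OF NOT-USEFUL DATA....]

Name: David
Age: 108   Birthday: 061095   SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name: Maria
Age: 42 Birthday: 120377 SocialSecurity: 123456789012
"""

# Key phrase, whitespace, then a captured field; re.VERBOSE lets the
# pattern be laid out one field per line.
PERSON_PATTERN = re.compile(r"""
    Name:\s+(\w+)\s+
    Age:\s+(\d+)\s+
    Birthday:\s+(\d+)\s+
    SocialSecurity:\s+(\d+)
""", re.VERBOSE)


class Person(object):
    def __init__(self, name, age, bd, ssn):
        self.name = name
        self.age = int(age)   # numeric field
        self.bd = bd          # kept as text to preserve leading zeros
        self.ssn = ssn        # kept as text to preserve leading zeros

    def __repr__(self):
        return "Person(%r, %r, %r, %r)" % (
            self.name, self.age, self.bd, self.ssn)


def parse_people(text):
    """Return a list of Person objects, one per matching record in text."""
    return [Person(*fields) for fields in PERSON_PATTERN.findall(text)]


people = parse_people(sample_data)
for p in people:
    print(p.name, p.age, p.bd, p.ssn)
```

Because findall() returns one tuple of groups per record, each tuple can
be unpacked straight into the Person constructor. Records whose fields
arrive in a different order are silently skipped by this pattern, which
is the trade-off mentioned for the regular-expression approach.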