"Ernesto" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > I'm still fairly new to python, so I need some guidance here... > > I have a text file with lots of data. I only need some of the data. I > want to put the useful data into an [array of] struct-like > mechanism(s). The text file looks something like this: > > [BUNCH OF NOT-USEFUL DATA....] > > Name: David > Age: 108 Birthday: 061095 SocialSecurity: 476892771999 > > [MORE USELESS DATA....] > > Name........ > > I would like to have an array of "structs." Each struct has > > struct Person{ > string Name; > int Age; > int Birhtday; > int SS; > } > > I want to go through the file, filling up my list of structs. > > My problems are: > > 1. How to search for the keywords "Name:", "Age:", etc. in the file... > 2. How to implement some organized "list of lists" for the data > structure. > > Any help is much appreciated. > Ernesto -

Since you are searching for keywords and matching fields, and trying to
populate data structures as you go, this sounds like a good fit for
pyparsing.  Pyparsing has built-in features for scanning through text and
extracting data, with suitably named data fields for accessing later.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

------------------------------------------------
from pyparsing import *

inputData = """[BUNCH OF NOT-USEFUL DATA....]

Name:  David
Age: 108   Birthday: 061095   SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name:  Fred
Age: 101   Birthday: 061065   SocialSecurity: 587903882000

[MORE USELESS DATA....]

Name:  Barney
Age: 99   Birthday: 061265   SocialSecurity: 698014993111

[MORE USELESS DATA....]
"""

dob = Word(nums,exact=6)

# this matches your sample data, but I think SSNs are only 9 digits long
socsecnum = Word(nums,exact=12)

# define the personalData pattern - use results names to associate
# field names with matched tokens, can then access data as if they were
# attributes on an object
# ("empty" skips the whitespace after "Name:" so that restOfLine starts
# at the name itself)
personalData = ( "Name:" + empty + restOfLine.setResultsName("Name") +
                 "Age:" + Word(nums).setResultsName("Age") +
                 "Birthday:" + dob.setResultsName("Birthday") +
                 "SocialSecurity:" + socsecnum.setResultsName("SS") )

# use personalData.scanString to scan through the input, returning the
# matching tokens, and their respective start/end locations in the string
for person,s,e in personalData.scanString(inputData):
    print "Name:", person.Name
    print "Age:", person.Age
    print "DOB:", person.Birthday
    print "SSN:", person.SS
    print

# or use a list comp to scan the whole file, and return your Person data,
# giving you your requested array of "structs" - not really structs, but
# ParseResults objects
persons = [person for person,s,e in personalData.scanString(inputData)]

# or convert to Python dicts, which some people prefer to pyparsing's
# ParseResults
persons = [dict(p) for p,s,e in personalData.scanString(inputData)]
print persons[0]
print

# or create an array of Person objects, as suggested in previous postings
class Person(object):
    def __init__(self,parseResults):
        # copy the named fields (Name, Age, Birthday, SS) onto the instance
        self.__dict__.update(dict(parseResults))

    def __str__(self):
        return "Person(%s, %s, %s, %s)" % (self.Name,self.Age,self.Birthday,self.SS)

persons = [Person(p) for p,s,e in personalData.scanString(inputData)]
for p in persons:
    print p.Name,"->",p
--------------------------------------

prints out:

Name: David
Age: 108
DOB: 061095
SSN: 476892771999

Name: Fred
Age: 101
DOB: 061065
SSN: 587903882000

Name: Barney
Age: 99
DOB: 061265
SSN: 698014993111

{'SS': '476892771999', 'Age': '108', 'Birthday': '061095', 'Name': 'David'}

David -> Person(David, 108, 061095, 476892771999)
Fred -> Person(Fred, 101, 061065, 587903882000)
Barney -> Person(Barney, 99, 061265, 698014993111)

-- 
http://mail.python.org/mailman/listinfo/python-list
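
For comparison, here is a rough sketch of the same extraction using only the standard library's re module instead of pyparsing. This is a different technique from the one in the post above, and it is written for Python 3, so print is a function here. It assumes the fields always appear in the order shown in the sample (Name on its own line, then Age, Birthday and SocialSecurity). The names RECORD and parse_people and the lowercase field names are made up for illustration; they are not part of pyparsing or of the original post.

```python
import re
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    birthday: str  # kept as text so a leading zero such as "061095" survives
    ss: str        # kept as text for the same reason


# One record = a "Name:" line followed by Age / Birthday / SocialSecurity.
# Assumption: the fields always appear in exactly this order.
RECORD = re.compile(
    r"Name:\s*(?P<name>[^\n]*?)\s*\n"
    r"\s*Age:\s*(?P<age>\d+)\s+"
    r"Birthday:\s*(?P<birthday>\d{6})\s+"
    r"SocialSecurity:\s*(?P<ss>\d+)"
)


def parse_people(text):
    """Scan text for records, skipping everything that does not match."""
    return [
        Person(m["name"], int(m["age"]), m["birthday"], m["ss"])
        for m in RECORD.finditer(text)
    ]


sample = """[BUNCH OF NOT-USEFUL DATA....]

Name:  David
Age: 108   Birthday: 061095   SocialSecurity: 476892771999

[MORE USELESS DATA....]

Name: Fred
Age: 101   Birthday: 061065   SocialSecurity: 587903882000
"""

people = parse_people(sample)
for p in people:
    print(p.name, "->", p)
```

The regular expression is fine for a fixed layout like this one. If the field order or the surrounding text varies, a grammar such as the pyparsing one above is likely easier to extend.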