Hi,

Below is your solution, ready to run. Put get_statistics () in a loop that feeds it the names from your file, builds an output file name from each name, and passes the output file name and 'statistics' to file_statistics ().
Cheers,

Frederic

----- Original Message -----
From: <[EMAIL PROTECTED]>
Newsgroups: comp.lang.python
To: <python-list@python.org>
Sent: Monday, July 24, 2006 5:48 PM
Subject: Parsing Baseball Stats

> I would like to parse a couple of tables within an individual player's
> SHTML page. For example, I would like to get the "Actual Pitching
> Statistics" and the "Translated Pitching Statistics" portions of Babe
> Ruth page (http://www.baseballprospectus.com/dt/ruthba01.shtml) and
> store that info in a CSV file.
>
> Also, I would like to do this for numerous players whose IDs I have
> stored in a text file (e.g.: cobbty01, ruthba01, speaktr01, etc.).
> These IDs should change the URL to get the corresponding player's
> stats. Is this doable and if yes, how? I have only recently finished
> learning Python (used the book: How to Think Like a Computer Scientist:
> Learning with Python). Thanks for your help...
>
> --
> http://mail.python.org/mailman/listinfo/python-list

import SE, urllib

Tag_Stripper = SE.SE ('"~<.*?>~= " "~<[^>]*~=" "~[^<]*>~=" ')
CSV_Maker = SE.SE (' "~\s+~=(9)" ')

# SE is the hacker's Swiss army knife. You find it in the Cheese Shop.
# It strips your tags and puts in the CSV separator and if you needed other
# translations, it would do those too on two lines of code.
# If you don't want tabs, define the CSV_Maker accordingly, putting
# your separator in the place of '(9)':
# CSV_Maker = SE.SE ('"~\s+~=,"') # Now it's a comma

def get_statistics (name_of_player):

   statistics = {
      # Uncomment those you want
      # 'Actual Batting Statistics' : [],
      'Actual Pitching Statistics' : [],
      # 'Advanced Batting Statistics' : [],
      'Advanced Pitching Statistics' : [],
      # 'Fielding Statistics as Center Fielder' : [],
      # 'Fielding Statistics as First Baseman' : [],
      # 'Fielding Statistics as Left Fielder' : [],
      # 'Fielding Statistics as Pitcher' : [],
      # 'Fielding Statistics as Right Fielder' : [],
      # 'Statistics as DH/PH/Other' : [],
      # 'Translated Batting Statistics' : [],
      # 'Translated Pitching Statistics' : [],
   }

   url = 'http://www.baseballprospectus.com/dt/%s.shtml' % name_of_player
   htm_page = urllib.urlopen (url)
   htm_lines = htm_page.readlines ()
   htm_page.close ()

   current_list = None   # list of the table being collected, if any
   for line in htm_lines:
      text_line = Tag_Stripper (line).strip ()
      if line.startswith ('<h3'):
         # A heading starts a new table; collect it only if it was requested.
         if text_line in statistics:
            current_list = statistics [text_line]
            current_list.append (text_line)
         else:
            current_list = None
      else:
         if current_list is not None:
            if text_line:
               current_list.append (CSV_Maker (text_line))

   return statistics

def show_statistics (statistics):

   for category in statistics:
      for record in statistics [category]:
         print record
      print

def file_statistics (file_name, statistics):

   f = file (file_name, 'w')   # was 'wa', which is not a valid mode
   for category in statistics:
      f.write ('%s\n' % category)
      # Record 0 is the category title, already written above.
      for line in statistics [category][1:]:
         f.write ('%s\n' % line)
   f.close ()
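
The loop mentioned at the top of this message is not spelled out, so here is a minimal sketch of one way it could look. It assumes the ID file holds one player ID per line (the original question does not say how the IDs are laid out), and the names output_name, read_player_ids, process_players and players.txt are made up for illustration. The two worker functions are passed in as arguments so the sketch runs on its own, without SE or a network connection.

```python
import os

def output_name(player_id, out_dir="."):
    """Build an output file name such as ./ruthba01.csv from a player ID."""
    return os.path.join(out_dir, player_id + ".csv")

def read_player_ids(id_file):
    """Return the stripped, non-empty lines of the ID file.

    Assumption: one player ID per line.
    """
    with open(id_file) as f:
        return [line.strip() for line in f if line.strip()]

def process_players(id_file, fetch, save, out_dir="."):
    """For each ID, call fetch(player_id), then save(file_name, statistics).

    fetch and save stand in for get_statistics and file_statistics.
    Returns the list of output file names that were passed to save.
    """
    written = []
    for player_id in read_player_ids(id_file):
        statistics = fetch(player_id)
        file_name = output_name(player_id, out_dir)
        save(file_name, statistics)
        written.append(file_name)
    return written
```

With the functions from this message it would be called as process_players ('players.txt', get_statistics, file_statistics), where players.txt is a placeholder for your own ID file. Note the argument order of file_statistics: file name first, statistics second.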