Hi, I'm trying to write a script that will extract the value of an attribute from an element using the attribute value of another element as the basis for extraction.
For example, I have a pre-defined list of main sections. I want to extract the id attribute of each form element and build a dictionary of graphic ID and section number pairs, but only for the sections in my pre-defined list; id values from any section that is not on my list should be excluded. I.e., I want to know the id values for the forms that appear in sections 1 and 3 but not in 2.

Boiled down, my SGML looks something like this:

<main-section no="1">
<form id="graphic_1.tif">
<form id="graphic_2.tif">
<main-section no="2">
<form id="graphic_3.tif">
<main-section no="3">
<form id="graphic_4.tif">
<form id="graphic_5.tif">
<form id="graphic_6.tif">

This is what I have come up with on my own so far. My problem is that I can't seem to pick up the value of the id attribute.

Any advice appreciated.

Greg

###

import os, re, csv

root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the CSV file containing the section numbers: ")
sgmlname = raw_input("Enter name of the SGML file to search: ")
print

given,ext = os.path.splitext(fname)
root_name = os.path.join(root,fname)
n = given + '.new'
outputName = os.path.join(root,n)

reader = csv.reader(open(root_name, 'r'), delimiter=',')

sections = []
for row in reader:
    sections.append(row[0])

inputFile = open(os.path.join(root,sgmlname), 'r')

illoList ={}

while 1:
    lines = inputFile.readlines()
    if not lines:
        break
    for line in lines:
        main = re.search(r'(?i)(?m)(?s)<main-section no=\"(\w+)\"', line)
        id = re.search(r'(?i)id=\"(.*?tif)\"', line)
        if main is not None and main.group(1) in sections:
            if id is not None:
                illoList[illo.group(1)] = main.group(1)
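
Two things in the script above look likely to cause the symptom. Both regexes are run against the same line, so an id is only recorded when a form tag shares a line with its main-section tag, and the final assignment refers to illo, which is never defined (the match object is named id). One possible approach, sketched here in Python 3 rather than the Python 2 of the original, is to scan the tags in document order and remember the most recent section number. The function name map_graphics_to_sections, the TAG_RE pattern, and the inline sample string are illustrative assumptions, and the sketch assumes the attributes appear exactly as in the boiled-down sample (double-quoted, no first and id as the first attribute of their tags).

```python
import re

# Matches either a <main-section no="..."> tag or a <form id="..."> tag.
# Only one of the two named groups is set for any given match.
TAG_RE = re.compile(
    r'<main-section\s+no="(?P<section>\w+)"|<form\s+id="(?P<graphic>[^"]+)"',
    re.IGNORECASE,
)


def map_graphics_to_sections(sgml_text, wanted_sections):
    """Return {graphic id: section number} for forms inside wanted sections."""
    wanted = set(wanted_sections)
    graphics = {}
    current_section = None  # section number of the most recent <main-section>
    for match in TAG_RE.finditer(sgml_text):
        if match.group("section") is not None:
            # A new section starts: remember its number for the forms that follow.
            current_section = match.group("section")
        elif current_section in wanted:
            # A form tag inside a section that is on the list.
            graphics[match.group("graphic")] = current_section
    return graphics


# Sample data copied from the boiled-down SGML in the post.
sample = (
    '<main-section no="1"> <form id="graphic_1.tif"> <form id="graphic_2.tif"> '
    '<main-section no="2"> <form id="graphic_3.tif"> '
    '<main-section no="3"> <form id="graphic_4.tif"> <form id="graphic_5.tif"> '
    '<form id="graphic_6.tif">'
)

result = map_graphics_to_sections(sample, ["1", "3"])
print(result)
```

Because finditer runs over the whole text, the sketch does not depend on how the tags are split across lines. In the real script, sgml_text would come from reading the SGML file in one go and wanted_sections from the first column of the CSV file.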