Hi, I'm a total newbie to Python so any and all advice is greatly appreciated.
I'm trying to use regular expressions to process text in an SGML file, but only in one section. So the input would look like this:

<ch-part no="I"><title>RESEARCH GUIDE
<sec-main no="1.01"><title>content
<para>content
<sec-main no="2.01"><title>content
<para>content
<ch-part no="II"><title>FORMS
<sec-main no="3.01"><title>content
<sec-sub1 no="1"><title>content
<para>content
<sec-sub2 no="1"><title>content
<para>content

and the output like this:

<ch-part no="I"><title>RESEARCH GUIDE
<sec-main no="1.01"><title>content
<biblio>
<para>content
</biblio>
<sec-main no="2.01"><title>content
<biblio>
<para>content
</biblio>
<ch-part no="II"><title>FORMS
<sec-main no="3.01"><title>content
<sec-sub1 no="1"><title>content
<para>content
<sec-sub2 no="1"><title>content
<para>content

But no matter what I try, I end up changing the entire file rather than just one part. Here's what I've come up with so far, but I can't think of anything else.

***

import os, re

setpath = raw_input("Enter the path where the program should run: ")
print

for root, dirs, files in os.walk(setpath):
    fname = files
    for fname in files:
        inputFile = file(os.path.join(root,fname), 'r')
        line = inputFile.read()
        inputFile.close()

        chpart_pattern = re.compile(r'<ch-part no=\"[A-Z]{1,4}\"><title>(RESEARCH)', re.IGNORECASE)

        while 1:
            if chpart_pattern.search(line):
                line = re.sub(r"<sec-main no=(\"[0-9]*.[0-9]*\")><title>(.*)", r"<sec-main no=\1><title>\2\n<biblio>", line)
                outputFile = file(os.path.join(root,fname), 'w')
                outputFile.write(line)
                outputFile.close()
                break
            if chpart_pattern.search(line) is None:
                print 'none'
                break

Thanks,
Greg
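A note on why the whole file changes: re.sub() is applied to the full file contents, so once the RESEARCH test succeeds anywhere in the file, every <sec-main> in every part gets rewritten. One possible way around that is to cut the text into <ch-part> blocks first and run the substitution only on the block whose title matches. What follows is a minimal sketch, not a definitive fix. It is written for Python 3 (the script above is Python 2) and assumes each tag in the samples starts on its own line. The names add_biblio, process_part and wrap_section are made up for illustration.

```python
import re

# One <ch-part> block: from its opening tag up to (but not including)
# the next <ch-part> tag, or the end of the text.
CH_PART = re.compile(r'<ch-part\b.*?(?=<ch-part\b|\Z)', re.DOTALL)

# Assumption: the part to modify is the one whose title starts with RESEARCH.
RESEARCH = re.compile(r'<ch-part no="[A-Z]{1,4}"><title>RESEARCH', re.IGNORECASE)

# One <sec-main> section: group 1 is the heading line, group 2 is the
# body up to the next <sec-main> or the end of the block.
SEC_MAIN = re.compile(
    r'(<sec-main no="[0-9]*\.[0-9]*"><title>[^\n]*\n)(.*?)(?=<sec-main\b|\Z)',
    re.DOTALL)


def wrap_section(match):
    """Wrap the body of one <sec-main> section in <biblio> ... </biblio>."""
    heading, body = match.group(1), match.group(2)
    if not body.strip():
        # Nothing to wrap: leave an empty section as it is.
        return match.group(0)
    if not body.endswith('\n'):
        body += '\n'
    return heading + '<biblio>\n' + body + '</biblio>\n'


def process_part(match):
    """Rewrite a <ch-part> block only if it is the RESEARCH part."""
    block = match.group(0)
    if RESEARCH.match(block):
        return SEC_MAIN.sub(wrap_section, block)
    return block


def add_biblio(text):
    """Return text with <biblio> wrappers added inside the RESEARCH part only."""
    return CH_PART.sub(process_part, text)
```

add_biblio(text) returns the modified string, so it could stand in for the re.sub() call in the loop. The FORMS part passes through untouched because process_part only rewrites a block whose title matches RESEARCH.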