Newbie Count Question
I have a newbie count question. I have a number of SGML documents divided into sections, but over the course of editing them some sections have been deleted (and perhaps others added). I'd like to renumber them.

The input documents look like this:

and after renumbering I would like the sections to look like this:

so they are numbered sequentially from 1 through to the last section.

I've gotten this far thanks to looking at other posts on the board, but no matter what I try, every section ends up numbered with the total number of sections in the document. E.g., if there are 100 sections in the document, the "no" attribute is "1.100" for each one.

import os, re

setpath = raw_input("Enter the path where the program should run: ")
print

for root, folders, files in os.walk(setpath):
    for name in files:
        filepath = os.path.join(root, name)
        fileopen = open(filepath, 'r')
        data = fileopen.read()
        fileopen.close()
        secmain_pattern = re.compile(r'', re.IGNORECASE)
        m = secmain_pattern.search(data)
        all = secmain_pattern.findall(data)
        counter = 0
        for i in range(0,len(all)):
            counter = counter + 1
            print counter
        if m is not None:
            def new_number(match):
                return '' % (match.group(1), counter)
            data = secmain_pattern.sub(new_number, data)
        outputFile = file(os.path.join(root,name), 'w')
        outputFile.write(data)
        outputFile.close()

Thanks for your help!
-- 
http://mail.python.org/mailman/listinfo/python-list
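The behavior described above happens because `counter` is computed in a separate loop before `sub()` runs. As a result, `new_number` only ever sees the final total. A minimal Python 3 sketch of the usual fix advances the counter inside the replacement function itself, because `re.sub` calls that function once per match, in document order. The tag shape `<sec-main no="1.7">` is an assumption, since the real pattern was lost from the archived post. The helper name `renumber_sections` is hypothetical.

```python
import itertools
import re

# Assumed tag shape; the original pattern was stripped from the archived post.
SECMAIN_RE = re.compile(r'<sec-main no="(\d+)\.\d+">', re.IGNORECASE)


def renumber_sections(data):
    """Renumber every <sec-main> tag sequentially: 1.1, 1.2, 1.3, ..."""
    numbers = itertools.count(1)  # fresh counter for each document

    def new_number(match):
        # re.sub calls this once per match, in order, so each call
        # takes the next number rather than the final total.
        return '<sec-main no="%s.%d">' % (match.group(1), next(numbers))

    return SECMAIN_RE.sub(new_number, data)


sample = '<sec-main no="1.5">a</sec-main><sec-main no="1.9">b</sec-main>'
print(renumber_sections(sample))
# <sec-main no="1.1">a</sec-main><sec-main no="1.2">b</sec-main>
```

`numbers` is created inside the function, so each file processed in the `os.walk` loop starts again at 1.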
newbie write to file question
Hi, I'm trying to create a script that will search an SGML file for the numbers and titles of the hierarchical elements (section-level headings) and create a dictionary with the section number as the key and the title as the value. I've made some progress, but I'd like to get some general feedback on it so far, plus ask a question. When I run this script on a directory that contains multiple files, even the files that don't contain any matches generate log files, usually with the contents of the last file that contained matches. I'm not sure what I'm missing, so I'd appreciate some advice.

Thanks,
Greg

Here's a very simplified version of my SGML:

section title 1.01
title 1
title 2
title a
title b
title i
section title 2.02
section title 3.03
title 1
title 2
section title 4.04
section title 5.05

And here's what I've written so far:

import os
import re

setpath = raw_input("Enter the path where the program should run: ")
print

table ={}
for root, dirs, files in os.walk(setpath):
    fname = files
    for fname in files:
        inputFile = file(os.path.join(root,fname), 'r')
        while 1:
            lines = inputFile.readlines(1)
            if not lines:
                break
            for line in lines:
                main = re.search(r'(?i)\n?(.*?)\n' , line)
                sub_one = re.search(r'(?i)\n?(.*?)\n' , line)
                sub_two = re.search(r'(?i)\n?(.*?)\n' , line)
                sub_three = re.search(r'(?i)\n?(.*?)\n' , line)
                if main is not None:
                    table[main.group(1)] = main.group(2)
                    m = main.group(1)
                if main is None:
                    pass
                if sub_one is not None:
                    one = m + '[' + sub_one.group(1) + ']'
                    table[one] = sub_one.group(2)
                if sub_one is None:
                    pass
                if sub_two is not None:
                    two = one + '[' + sub_two.group(1) + ']'
                    table[two] = sub_two.group(2)
                if sub_two is None:
                    pass
                if sub_three is not None:
                    three = two + '[' + sub_three.group(1) + ']'
                    table[three] = sub_three.group(2)
                if sub_three is None:
                    pass
        str_table = str(table)
        (name,ext) = os.path.splitext(fname)
        output_name = name + '.log'
        outputFile = file(os.path.join(root,output_name), 'w')
        outputFile.write(str_table)
        outputFile.close()
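Two things in the script above would explain the stray logs:

- `table` is created once, before the `os.walk` loop, so every file's log also contains the entries from earlier files.
- The log is written for every file, whether or not anything matched.

Below is a minimal Python 3 sketch that builds a fresh dictionary per file and skips empty results. The heading tags (`<sec-main>`, `<sec-sub1>` and so on, each with a `no` attribute) are assumptions standing in for the patterns lost from the post. `build_table` and `write_logs` are hypothetical names.

```python
import os
import re

# Assumed heading markup (the real patterns were stripped from the archive):
# <sec-main no="1.01">Title</sec-main>, <sec-sub1 no="a">Title</sec-sub1>, ...
HEADING_RES = [
    re.compile(r'<sec-main no="([^"]+)">(.*?)</sec-main>', re.IGNORECASE),
    re.compile(r'<sec-sub1 no="([^"]+)">(.*?)</sec-sub1>', re.IGNORECASE),
    re.compile(r'<sec-sub2 no="([^"]+)">(.*?)</sec-sub2>', re.IGNORECASE),
    re.compile(r'<sec-sub3 no="([^"]+)">(.*?)</sec-sub3>', re.IGNORECASE),
]


def build_table(lines):
    """Return a new {section key: title} dict for one file's lines."""
    table = {}   # created per call, so nothing carries over between files
    path = []    # key part at each level, e.g. ['1.01', '[a]', '[i]']
    for line in lines:
        for level, regex in enumerate(HEADING_RES):
            m = regex.search(line)
            if m:
                part = m.group(1) if level == 0 else '[' + m.group(1) + ']'
                path = path[:level] + [part]   # drop deeper levels
                table[''.join(path)] = m.group(2)
    return table


def write_logs(top):
    for root, dirs, files in os.walk(top):
        for fname in files:
            with open(os.path.join(root, fname)) as f:
                table = build_table(f)
            if not table:   # no matches: write no log
                continue
            name, ext = os.path.splitext(fname)
            with open(os.path.join(root, name + '.log'), 'w') as out:
                out.write(str(table))
```

Keeping the per-file work in its own function makes it hard for state to leak between files. It also lets `build_table` be tried on a list of strings without touching the disk.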
join dictionaries using keys from one & values
I'm still learning Python, so this might be a crazy question, but I thought I would ask anyway. Can anyone tell me if it is possible to join two dictionaries together to create a new dictionary using the keys from the old dictionaries? The keys in the new dictionary would be the keys from the first dictionary (dict1), and the values in the new dictionary would be the keys from the second dictionary (dict2). The keys would be joined by matching the values from dict1 and dict2. The keys in each dictionary are unique.

dict1 = {1:'bbb', 2:'aaa', 3:'ccc'}
dict2 = {5.01:'bbb', 6.01:'ccc', 7.01:'aaa'}
dict3 = {1 : 5.01, 3 : 6.01, 2 : 7.01}

I looked at "update" but I don't think it's what I'm looking for.

Thanks,
Greg
Re: join dictionaries using keys from one & values
Thanks so much. I never would have been able to figure this out on my own.

def dictionary_join(one, two):
    # invert the second dictionary: value -> key
    dict2x = dict(((two[k], k) for k in two.iterkeys()))
    # look up each value of the first dictionary in the inverted one
    dict3 = dict(((k, dict2x[v]) for k,v in one.iteritems()))
    print dict3

dict1 = {1:'bbb', 2:'aaa', 3:'ccc'}
dict2 = {'5.01':'bbb', '6.01':'ccc', '7.01':'aaa'}
dictionary_join(dict1, dict2)
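Python 3 no longer has `iterkeys()` and `iteritems()`, so there the same join can be sketched with dictionary comprehensions. This version assumes, as in the question, that the values of the second dictionary are unique; otherwise inverting it would silently keep only one key per value. It also skips keys whose value has no partner rather than raising `KeyError`. That skipping behavior is a design choice, not something the original asked for.

```python
def dictionary_join(one, two):
    """Map each key of `one` to the key of `two` that holds the same value."""
    # Invert `two` (value -> key); relies on its values being unique.
    inverted = {value: key for key, value in two.items()}
    # Keep only keys whose value also appears in `two`.
    return {key: inverted[value] for key, value in one.items() if value in inverted}


dict1 = {1: 'bbb', 2: 'aaa', 3: 'ccc'}
dict2 = {5.01: 'bbb', 6.01: 'ccc', 7.01: 'aaa'}
print(dictionary_join(dict1, dict2))  # {1: 5.01, 2: 7.01, 3: 6.01}
```

Returning the dictionary instead of printing it inside the function lets the caller decide what to do with the result.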
Re: join dictionaries using keys from one & values
Thanks again. This is very helpful.
Newbie Question: CSV to XML
Hi, I'm learning more and more about Python all the time, but I'm still a real newbie. I wrote this little script to convert CSV to XML and was hoping to get some feedback on it, if anyone is willing to comment. It works, but I was wondering if there was anything I could do better, e.g., incorporate minidom somehow? I'm totally in the dark as to how I would do this.

Thanks,
Greg

###
#csv to XML conversion utility
import os, re, csv

root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the unconverted file: ")
print

given,ext = os.path.splitext(fname)
root_name = os.path.join(root,fname)
n = given + '.xml'
outputName = os.path.join(root,n)

reader = csv.reader(open(root_name, 'r'), delimiter=',')

output = open(outputName, 'w')
output.write('\n\n')
output.write('\n %s %s \n\n\n' % ('TAS input file for ', given))

for row in reader:
    for i in range(0, len(row)):
        if i == 0:
            output.write('\n\n%s' % (i, row[i]))
        if i > 0 and i < len(row) - 1:
            output.write('\n%s' % (i, row[i]))
        if i == len(row) - 1:
            output.write('\n%s\n' % (i, row[i]))

output.write('\n\n\n\n')
output.close()
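On the minidom question, one option in Python 3's standard library is `xml.etree.ElementTree`. This swaps in a different module; it is not what the script uses. It builds the tree and escapes characters such as `&` and `<` for you, which the string-writing approach does not do. The sketch assumes the element names `<data>`, `<title>`, `<record>` and `<field n="...">`. The real tag names were stripped from the archived post, so substitute your own.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, title):
    """Convert CSV text to an XML string with one <record> per row.

    Element names are hypothetical stand-ins for the original tags.
    """
    root = ET.Element('data')
    ET.SubElement(root, 'title').text = 'TAS input file for ' + title
    for row in csv.reader(io.StringIO(csv_text)):
        record = ET.SubElement(root, 'record')
        for i, value in enumerate(row):
            # The column index becomes an attribute; ElementTree escapes
            # &, < and > in the text when serializing.
            ET.SubElement(record, 'field', n=str(i)).text = value
    return ET.tostring(root, encoding='unicode')


print(csv_to_xml('name,qty\nwidget,3\n', 'parts'))
```

For indented output, `xml.dom.minidom.parseString(xml_text).toprettyxml()` will reformat the string that `csv_to_xml` returns. That is one place where minidom fits naturally.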
Newbie Question: CSV to XML
Hi, would anyone be willing to give me some feedback on this little script I wrote to convert CSV to XML? I'll happily admit that I still have a lot to learn about Python, so I'm always grateful for constructive feedback.

Thanks,
Greg

###
#csv to XML conversion utility
import os, re, csv

root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the unconverted file: ")
print

given,ext = os.path.splitext(fname)
root_name = os.path.join(root,fname)
n = given + '.xml'
outputName = os.path.join(root,n)

reader = csv.reader(open(root_name, 'r'), delimiter=',')

output = open(outputName, 'w')
output.write('\n')
output.write('\n %s %s \n\n\n' % ('TAS input file for ', given))

for row in reader:
    for i in range(0, len(row)):
        if i == 0:
            output.write('\n\n%s' % (i, row[i]))
        if i > 0 and i < len(row) - 1:
            output.write('\n%s' % (i, row[i]))
        if i == len(row) - 1:
            output.write('\n%s\n' % (i, row[i]))

output.write('\n\n\n\n')
output.close()
Newbie ? -- SGML metadata extraction
Hi, I'm trying to write a script that will extract the value of an attribute from one element, using the attribute value of another element as the basis for extraction. In my situation I have a pre-defined list of main sections. I want to extract the id attribute of each form element and create a dictionary of graphic ID and section number pairs, but only for the sections on my pre-defined list. The id values from any section not on the list should be excluded. I.e., I want the id values for the forms that appear in sections 1 and 3 but not in 2.

Boiled down, my SGML looks something like this:

This is what I have come up with on my own so far. My problem is that I can't seem to pick up the value of the id attribute. Any advice appreciated.

Greg

###
import os, re, csv

root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the CSV file containing the section numbers: ")
sgmlname = raw_input("Enter name of the SGML file to search: ")
print

given,ext = os.path.splitext(fname)
root_name = os.path.join(root,fname)
n = given + '.new'
outputName = os.path.join(root,n)

reader = csv.reader(open(root_name, 'r'), delimiter=',')

sections = []
for row in reader:
    sections.append(row[0])

inputFile = open(os.path.join(root,sgmlname), 'r')
illoList ={}
while 1:
    lines = inputFile.readlines()
    if not lines:
        break
    for line in lines:
        main = re.search(r'(?i)(?m)(?s)
Re: Newbie ? -- SGML metadata extraction
Thanks. One more question, though. I'm not sure how to limit the scope of my search so that I'm just extracting the id attribute from the sections that I want. I.e., I want the id attributes from the forms in sections 1 and 3 but not from 2. Maybe I'm missing something.
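One way to limit the scope is to let a parser track which section it is currently inside. It then records a form's `id` only when that section is on the list. Below is a minimal Python 3 sketch using the standard `html.parser.HTMLParser`, the Python 3 name of the Python 2 `HTMLParser` module. The markup `<sec-main no="1">` containing `<form id="...">` is an assumption, because the sample SGML did not survive in the post. `FormIdCollector` is a hypothetical name.

```python
from html.parser import HTMLParser


class FormIdCollector(HTMLParser):
    """Collect {form id: section number} for forms in wanted sections only."""

    def __init__(self, wanted_sections):
        super().__init__()
        self.wanted = set(wanted_sections)
        self.current = None      # 'no' of the enclosing <sec-main>, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)      # HTMLParser lowercases tag and attribute names
        if tag == 'sec-main':
            self.current = attrs.get('no')
        elif tag == 'form' and self.current in self.wanted:
            self.found[attrs.get('id')] = self.current

    def handle_endtag(self, tag):
        if tag == 'sec-main':
            self.current = None  # forms outside any section are ignored


parser = FormIdCollector(['1', '3'])   # e.g. the numbers read from the CSV file
parser.feed('<sec-main no="1"><form id="G001"></form></sec-main>'
            '<sec-main no="2"><form id="G002"></form></sec-main>')
parser.close()
print(parser.found)  # {'G001': '1'}
```

Section numbers read with the `csv` module are strings, so they compare directly with the `no` attribute values.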
Re: Newbie ? -- SGML metadata extraction
Thanks very much for your help. It's greatly appreciated. It took a couple of tries to see what was happening, but I've figured it out.

Greg
Is it possible to combine handle_data and regular expressions?
Hi, I've experimented with regular expressions to solve my problems in the past, but I have seen so many comments about HTMLParser and sgmllib that I thought I would try a different approach this time, so I tried using HTMLParser. I want to search through my SGML file for various strings of text and find out what section they're in. What I have here does this to a certain extent, but I was wondering if I could make handle_data and regular expressions work together to make this work a little better. For instance, when I search for "above" as I am here, I just get something like this:

'174.114[1]':'above'

but this isn't very useful because I want to know the context of "above" (i.e., the information on either side of it), and maybe even use a regular expression to filter the search a little more. Any ideas? As always, I'd appreciate feedback on my efforts.

Thanks,
Greg

###
from HTMLParser import HTMLParser
import os, re

root = raw_input("Enter the path where the program should run: ")
fname = raw_input("Enter name of the file: ")
print

given,ext = os.path.splitext(fname)
inputFile = open(os.path.join(root,fname), 'r')
data = inputFile.read()

class PartFinder(HTMLParser):
    _full = None
    _secDict = dict()

    def found(self):
        return self._secDict

    def handle_starttag(self, tag, attrs):
        if tag == "sec-main":
            self._main = dict(attrs).get('no')
            self._full = self._main
        if tag == "sec-sub1":
            self._subone = dict(attrs).get('no')
            self._full = self._main + '[' + self._subone + ']'
        if tag == "sec-sub2":
            self._subtwo = dict(attrs).get('no')
            self._full = self._main + '[' + self._subone + ']' + '[' + self._subtwo + ']'

    def handle_data(self, data):
        if "Pt" in data:
            if not self._secDict.has_key(self._main):
                self._secDict[self._full] = [data]
        print self._secDict

if __name__ == "__main__":
    parser = PartFinder()
    parser.feed(data)
    x = parser.found()
    output_part = given + '.parts'
    outputFile = file(os.path.join(root,output_part), 'w')
    outputFile.write(str(x))
    outputFile.close()
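To get context around each hit, `handle_data` can run a compiled regular expression over the text it receives. It then slices out a few characters on either side of every match. Below is a minimal Python 3 sketch. The class name `ContextFinder` and the `width` setting are illustrative choices, and only `<sec-main no="...">` is tracked, for brevity. The sketch also creates its dictionary in `__init__`. In `PartFinder` above, `_secDict = dict()` is a class attribute, so every instance would share one dictionary.

```python
import re
from html.parser import HTMLParser


class ContextFinder(HTMLParser):
    """Record each regex match in text content with surrounding context."""

    def __init__(self, pattern, width=30):
        super().__init__()
        self.regex = re.compile(pattern)
        self.width = width        # characters of context on each side
        self.section = None
        self.hits = {}            # section number -> list of snippets

    def handle_starttag(self, tag, attrs):
        if tag == 'sec-main':
            self.section = dict(attrs).get('no')

    def handle_data(self, data):
        for m in self.regex.finditer(data):
            start = max(0, m.start() - self.width)
            snippet = data[start:m.end() + self.width]
            self.hits.setdefault(self.section, []).append(snippet)


finder = ContextFinder(r'\babove\b', width=5)
finder.feed('<sec-main no="174.114"><p>as noted above, the rule</p></sec-main>')
finder.close()
print(finder.hits)  # {'174.114': ['oted above, the']}
```

Each `handle_data` call sees one run of text between tags. A phrase interrupted by an inline element will therefore not match, and the snippet never extends past the surrounding tags.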
Trying to find an element's XPath and store it as an attribute
Hi all, I've been struggling with this for a while, so I'm hoping that someone can point me in the right direction. Here's my problem: I'm trying to get the XPath for a given node in my document and then store that XPath as an attribute of the element itself. If anyone has a recommendation, I'd be happy to hear it.

Thanks,
Provo

For instance, I would take this XML

###before
An XSLT Programmer Hello, World!

###after
An XSLT Programmer Hello, World!

###
import sets
import amara
from amara import binderytools

doc = amara.parse('hello.xml')
elems = {}
for e in doc.xml_xpath('//*'):
    paths = elems.setdefault((e.namespaceURI, e.localName), sets.Set())
    path = u'/'.join([n.nodeName for n in e.xml_xpath(u'ancestor::*')])
    paths.add(u'/' + path)

for name in elems:
    doc.name.km = elems[name]
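Two details in the posted code are worth noting. The `ancestor::*` axis does not include the element itself, so each path stops at the parent. Also, `doc.name.km` looks up an attribute literally called `name` rather than using the loop variable.

For comparison, here is a minimal sketch using Python 3's standard `xml.etree.ElementTree` instead of Amara. This swaps in a different library; it is not the one in the post. The sketch walks the tree from the root and builds each element's absolute path as it goes. When siblings share a tag name it adds a position such as `[2]`, so each path identifies exactly one element. The result is stored in an attribute. The attribute name `xpath` and the function name are assumptions. Namespaced tags appear in ElementTree's `{uri}local` form and would need mapping to prefixes for a usable XPath.

```python
import xml.etree.ElementTree as ET


def add_xpath_attributes(root, attr='xpath'):
    """Store each element's absolute, position-qualified path in `attr`."""

    def walk(elem, path):
        elem.set(attr, path)
        # Count children per tag name to decide whether [n] is needed.
        totals = {}
        for child in elem:
            totals[child.tag] = totals.get(child.tag, 0) + 1
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            step = child.tag
            if totals[child.tag] > 1:
                step += '[%d]' % seen[child.tag]
            walk(child, path + '/' + step)

    walk(root, '/' + root.tag)
    return root


doc = ET.fromstring('<doc><title>An XSLT Programmer</title>'
                    '<p>Hello, World!</p><p>Bye</p></doc>')
add_xpath_attributes(doc)
print(ET.tostring(doc, encoding='unicode'))
```

Computing the path top-down during one walk avoids re-querying the ancestors of every element.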
regular expressions, unicode and XML
Hi, I'm hoping someone can help me; I'm hopelessly lost. I'm trying to make a change in some XML files using a regular expression (re.sub). I can capture the text I want to replace OK, but when I replace it I end up with nothing, i.e., just a "" character in my file.

data = re.sub(r'(?i)(?u)Sample Title\—(.*?):', ' Sample Title—\1:', data)

I think my problem is that I don't understand Unicode, or even know how my XML is encoded, because there is nothing in the XML declaration at the top of the file. I'd be grateful if someone could give a little advice or point me in the right direction. I've read a bunch of stuff on the board, but nothing seems to click. I'm guessing I have to decode my file when I read it, something like this:

raw = inputFile.read()
fileencoding = "utf-8"
data = raw.decode(fileencoding)

and then write it out similarly, but this doesn't seem to work.

Any help appreciated,
Greg
Re: regular expressions, unicode and XML
Thanks for this, but I'm still getting an "empty" character (I don't know what else to call it) in my replaced text rather than the text captured by my regular expression. I even added the UTF encoding declaration to my input data, but still no luck. Any suggestions?
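The "empty" character is most likely not an encoding problem. The replacement string passed to `re.sub` in the original post is not a raw string, so Python turns `\1` into the control character chr(1) before `re.sub` ever sees it, and no backreference happens. A raw string (`r'...\1...'`) passes the backslash through, and `re.sub` inserts group 1.

The Python 3 sketch below shows both versions. The `<i>` markup around the captured text is a hypothetical stand-in for whatever tags the original replacement contained, and `retitle` is a hypothetical name.

```python
import re

EM_DASH = '\u2014'
PATTERN = re.compile('Sample Title' + EM_DASH + r'(.*?):', re.IGNORECASE)

# Not raw: '\1' here is chr(1), a control character, so group 1 is never used.
BROKEN_REPL = 'Sample Title' + EM_DASH + '<i>\1</i>:'
# Raw: re.sub receives a backslash followed by 1 and inserts group 1.
FIXED_REPL = 'Sample Title' + EM_DASH + r'<i>\1</i>:'


def retitle(text, repl=FIXED_REPL):
    return PATTERN.sub(repl, text)


line = 'Sample Title' + EM_DASH + 'Chapter 1: intro'
print(repr(retitle(line, BROKEN_REPL)))  # captured text replaced by '\x01'
print(repr(retitle(line)))               # captured text 'Chapter 1' kept
```

On the encoding question: under the XML 1.0 specification, a file with no encoding declaration (and no byte-order mark) must be UTF-8 or UTF-16. Reading it in Python 3 with `open(path, encoding='utf-8')` is therefore a reasonable first assumption.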
NewB question on text manipulation
I'm totally stumped by this problem, so I'm hoping someone can give me a little advice or point me in the right direction. I have a file that looks like this:

APPEAL40-24; 40-46; 42-46; 42-48; 42-62; 42-63 PROC GUIDE921(b)(1)

(i.e., <[chapter name][multiple or single book page ranges][chapter name][multiple or single book page ranges][code]

but I want to change it so that it looks like this:

<1>APPEAL40-241(b)(1)
<1>APPEAL40-461(b)(1)
<1>APPEAL42-461(b)(1)
<1>APPEAL42-481(b)(1)
<1>APPEAL42-621(b)(1)
<1>APPEAL42-631(b)(1)
<1>PROC GUIDE921(b)(1)

but I'm not at all sure how to do it. I've come up with a simple function that will change the order of the text, but I'm not sure how to break out

def Switch(m):
    return '%s%s' % (m.group(2), m.group(1))

data = re.sub(r'''<1>(.*?)(.*?)\n''', Switch, data)

But I'm still a long way from what I need. Any pointers would be greatly appreciated.

Thanks,
Greg
Re: NewB question on text manipulation
Thanks very much for this; I really appreciate it. I've pasted what I've got now, thanks to you. I only have one issue that I can't figure out. When I print the new string, I'm getting all of the values in the lt list rather than just the one that corresponds to the original entry. E.g., my original data looks like this:

<1>FAM LAW ENF259-232-687
<1>APPEAL40-38; 40-44; 44-18; 45-151

I want my output to look like this:

<1>FAM LAW ENF259-232-687
<1>APPEAL40-381
<1>APPEAL40-441
<1>APPEAL44-181
<1>APPEAL45-151

But instead I'm getting the output below: all of the entries in the lt list are being added to my string when I just want one. I'm not sure how to select just the entry in the lt list that I want.

<1>FAM LAW ENF259-232-6871
<1>APPEAL40-38-6871
<1>APPEAL40-44-6871
<1>APPEAL44-18-6871
<1>APPEAL45-15-6871

###
Here's what I've got so far:

s_space = " " # a single space
s_empty = "" # empty string

pat = re.compile(r"\s*([^<]+)([^<]+)")

lst = []
while True:
    m = pat.search(s)
    if not m:
        break
    title = m.group(1).strip()
    xc = m.group(2)
    xc = xc.replace(s_space, s_empty)
    tup = (title, xc)
    lst.append(tup)
    s = pat.sub(s_empty, s, 1)

lt = s.strip()

for title, xc in lst:
    lst_pp = xc.split(";")
    for pp in lst_pp:
        print "<1>%s%s%s" % (title, pp, lt)
Re: NewB question on text manipulation
Thanks again, and sorry about the lack of examples. It didn't even occur to me that my example wasn't comprehensive enough when I posted my first message, but I can see the issue now. Your solution is really helpful for me to see; I can't tell you how much I appreciate it. I thought that adding more values to the tuple was the way to go, but I couldn't get my mind around how to capture the info that I needed. Thanks!
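The fix discussed in this thread is to keep each entry's code together with its title and page ranges, instead of using one leftover string (`lt`) for every line. Here is a minimal Python 3 sketch of the expansion step. It assumes the entries have already been parsed into hypothetical `(title, ranges, code)` tuples, because the markup that separated them was lost from the archive. It reproduces the plain `<1>` + title + pages + code layout shown in the desired output earlier in the thread.

```python
def expand_entries(entries):
    """Yield one line per page range, each carrying its own entry's code.

    `entries` holds (title, ranges, code) tuples, with ranges separated by ';'.
    """
    for title, ranges, code in entries:
        for pages in ranges.split(';'):
            # strip() removes the space that follows each ';'
            yield '<1>%s%s%s' % (title, pages.strip(), code)


entries = [('APPEAL', '40-24; 40-46', '1(b)(1)'), ('PROC GUIDE', '92', '1(b)(1)')]
for line in expand_entries(entries):
    print(line)
```

This prints `<1>APPEAL40-241(b)(1)`, `<1>APPEAL40-461(b)(1)` and `<1>PROC GUIDE921(b)(1)`. Because the code travels inside each tuple, an entry can never pick up another entry's code.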
Looking for help with Regular Expression
Hi, I'm looking for a little advice about regular expressions. I want to capture a string of text that falls between an opening square bracket and a closing square bracket (i.e., "[" and "]"), but I've run into a small problem. I've been using this as my pattern:

'''\[(.*?)\]'''

I was expecting this to be greedy, but the funny thing is that it's not greedy enough in some situations. Here's my problem: the end of my string sometimes contains a cross-reference to a section in a book, and the subsections are cited using square brackets exactly like the one I'm using as the ending point in my original regular expression. E.g., the text string in my data looks like this:

see discussion in § 512.16[3][b]]

But my regular expression stops after the first "]", so after I add the new markup the output looks like this:

see discussion in § 512.16[3][b]]

So the last subsection is outside of the note tag. I want something like this:

see discussion in § 512.16[3][b]]

I'm not sure how to make my capture more greedy, so I've resorted to cleaning up the data after I make the first round of replacements:

data = re.sub(r'''\[(\d*?)\]\[(\w)\]\]''', r'''[\1][\2]]''', data)

There's got to be a better way, but I'm not sure what it is.

Thanks,
Greg
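A note on terms: `.*?` is the lazy (non-greedy) form, which is why it stops at the first `]`. Making it greedy (`.*`) would instead run to the last `]` on the line, which can swallow several notes at once. A pattern that allows one level of nested bracket pairs inside the outer brackets avoids the cleanup pass. The Python 3 sketch below assumes notes never nest more than one level deep. The `<note>` replacement tag is hypothetical, since the real markup was stripped from the post.

```python
import re

# '[' + any run of (a non-bracket character | a simple [..] pair) + ']'
NOTE_RE = re.compile(r'\[((?:[^\[\]]|\[[^\[\]]*\])*)\]')


def mark_notes(text):
    # Hypothetical <note> tag in place of the original (stripped) markup.
    return NOTE_RE.sub(r'<note>\1</note>', text)


print(mark_notes('text [see discussion in § 512.16[3][b]] more'))
# text <note>see discussion in § 512.16[3][b]</note> more
```

The two alternatives inside the group can never match the same character, so the pattern does not backtrack heavily even on long notes.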
Newbie Class/Counter question
Hi, I've always struggled with classes, and this one is no exception. I'm working in an SGML file and I want to renumber a couple of elements in the hierarchy based on the previous level. E.g., my document looks like this:

A. Title Text
1. Title Text
1. Title Text
1. Title Text
B. Title Text
1. Title Text
1. Title Text

but I want to change the numbering of the second level to sequential numbers like 1, 2, 3, etc., so my output would look like this:

A. Title Text
1. Title Text
2. Title Text
3. Title Text
B. Title Text
1. Title Text
2. Title Text

This is what I've come up with on my own, but it doesn't work. I was hoping someone could critique it and point me in the right (or a better) direction.

Thanks,
Greg

###
def Fix(m):
    new = m.group(1)
    class ReplacePtSubNumber(object):
        def __init__(self):
            self._count = 0
            self._ptsubtwo_re = re.compile(r'', re.IGNORECASE| re.UNICODE)
            # self._ptsubone_re = re.compile(r'' % (self._count)
    new = ReplacePtSubNumber().sub(new)
    return '
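A class can help here because it keeps the running count between calls. A minimal Python 3 sketch is an object with a `__call__` method, which `re.sub` accepts as its replacement just like a function. The count restarts at every level-one heading, and each level-two heading takes the next number. The tags `<pt-sub1 no="A">` and `<pt-sub2 no="1">` are hypothetical stand-ins inferred from the `_ptsubone_re`/`_ptsubtwo_re` names in the post; the actual patterns did not survive in the archive. `SubRenumberer` and `renumber` are likewise hypothetical names.

```python
import re

# Hypothetical markup: level-one <pt-sub1 no="A">, level-two <pt-sub2 no="1">.
TAG_RE = re.compile(r'<pt-sub([12]) no="([^"]*)">')


class SubRenumberer(object):
    """re.sub callback: numbers level-two tags 1, 2, 3, ... and restarts
    the count at every level-one tag."""

    def __init__(self):
        self.count = 0

    def __call__(self, match):
        if match.group(1) == '1':
            self.count = 0          # new parent: restart numbering
            return match.group(0)   # leave the level-one tag unchanged
        self.count += 1
        return '<pt-sub2 no="%d">' % self.count


def renumber(data):
    # A fresh instance per document, so counts never carry over between files.
    return TAG_RE.sub(SubRenumberer(), data)


doc = ('<pt-sub1 no="A">T<pt-sub2 no="1">T<pt-sub2 no="1">T'
       '<pt-sub1 no="B">T<pt-sub2 no="1">T')
print(renumber(doc))
```

Matching both levels with one pattern keeps the tags in document order, so the object always sees a parent before its children.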
noobie mkdir problem/question
Hi, I'm trying to write a script that will create a new directory and then write the results to this newly created directory, but it doesn't seem to work for me and I don't know why. I'm hoping someone can see my mistake or at least point me in the right direction.

I start like this, capturing the root directory and making my new "xrefs" directory (I can see the new folder in Windows Explorer):

root = raw_input("Enter the path where the program should run: ")
xrefs = os.path.join(root,'xrefs')
if (os.path.isdir(xrefs) == 0):
    os.mkdir(xrefs)
else:
    sys.exit('LOG folder already exists. Exiting program.')

...I do everything else...

And then I'm trying to write the results out to xrefs. But instead of being written to xrefs, they're written to the original directory, i.e., root, and I'm not sure why.

outputFname = given + '.log'
outputFile = open(os.path.join(xrefs,outputFname), 'w')
outputFile.write(data)
outputFile.close()

Anyone?

Thanks,
Greg
Re: noobie mkdir problem/question
I understand that, but I'm still puzzled. Is this the reason why I can't write files to this directory? The xrefs directory is created the way I expect using mkdir, but I can't seem to write to it. I thought that my results would be written to the xrefs directory here, but they're ending up in the original folder, not the subfolder.

outputFile = open(os.path.join(xrefs,outputFname), 'w')
outputFile.write(data)
outputFile.close()

What am I missing?

[EMAIL PROTECTED] wrote:
> if (os.path.isdir(xrefs) == 0):
>     os.mkdir(xrefs)
>
> os.path.isdir(stuff) returns
> True or False
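As the quoted reply says, `os.path.isdir()` returns `True` or `False`. Comparing with `== 0` happens to work because `False == 0`, but `if not os.path.isdir(xrefs):` says the same thing more directly.

For the files landing in the wrong folder, one cause worth checking is `given`. This is an assumption, since the code between the two snippets is not shown. `os.path.join` discards every earlier component when a later one is an absolute path. So if `given` was derived from a full path, `os.path.join(xrefs, given + '.log')` points back at the original directory.

Below is a minimal Python 3 sketch that guards against this by taking the base name first; `log_path` and `write_log` are hypothetical names.

```python
import os


def log_path(out_dir, source_path):
    """Return out_dir/<base name of source_path without extension>.log."""
    # basename() strips any directories, so an absolute path hidden in
    # `given` can no longer override out_dir in the join below.
    given, ext = os.path.splitext(os.path.basename(source_path))
    return os.path.join(out_dir, given + '.log')


def write_log(root, source_path, data):
    xrefs = os.path.join(root, 'xrefs')
    os.makedirs(xrefs, exist_ok=True)   # no error if it already exists
    path = log_path(xrefs, source_path)
    with open(path, 'w') as out:
        out.write(data)
    return path   # returning the path makes it easy to print and check


# An absolute second argument wins: the first component is thrown away.
print(os.path.join('xrefs', os.path.abspath('chapter1.log')))
```

Unlike the original, `write_log` does not exit when `xrefs` already exists. Keep the `sys.exit` check if that behavior matters. Printing the returned path is a quick way to confirm where each file actually goes.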