[EMAIL PROTECTED] wrote:
> def simplecsdtoorc(filename):
>     file = open(filename,"r")
>     alllines = file.read_until("</CsInstruments>")
>     pattern1 = re.compile("</")
>     orcfilename = filename[-3:] + "orc"
>     for line in alllines:
>         if not pattern1
>             print >>orcfilename, line
>
> I am pretty sure my code isn't close to what I want. I need to be able
> to skip html like commands from <defined> to <undefined> and to key on
> another word in adition to </CsInstruments> to end the routine
>
> I was also looking at se 2.2 beta but didn't see any easy way to use it
> for this or for that matter search and replace where I could just add
> it as a menu item and not worry about it.
>
> thanks for any help in advance
If you're dealing with html or html-like files, do check out beautifulsoup. I had reason to use it the other day and man, is it ever useful!

Meantime, there are a few minor points about the code you posted:

1) open() defaults to 'r', so you can leave it out when you call open() to read a file.

2) 'file' is a builtin type (it's the type of the file objects returned by open()), so you shouldn't use it as a variable name.

3) File objects don't have a read_until() method. You could say something like:

    f = open(filename)
    lines = []
    for line in f:
        lines.append(line)
        if '</CsInstruments>' in line:
            break

4) filename[-3:] will give you the last 3 chars in filename. I'm guessing that you want all but the last 3 chars, which is filename[:-3], but see the os.path.splitext() function, and indeed the other functions in os.path too:
http://docs.python.org/lib/module-os.path.html

5) The regular expression objects returned by re.compile() will always evaluate True, so you want to call their search() method on the data to search:

    if not pattern1.search(line):

But,

6) using re for a pattern as simple as "</" is way overkill. Just use 'in' or the find() method of strings:

    if "</" not in line:

or:

    pos = line.find("</")
    if pos == -1:
        print >>orcfilename, line
    else:
        print >>orcfilename, line[:pos]

7) The "print >> file" usage requires a file (or a file-like object; anything with a write() method, I think), not a string. You need to use it like this:

    orcfile = open(orcfilename, 'w')
    #...
    print >> orcfile, line

8) If you have a list of lines anyway, you can use the writelines() method of files to write them in one go:

    open(orcfilename, 'w').writelines(lines)

after first stripping the unwanted data out of that last line using find(), as shown above.

I hope this helps. Check out the docs on file objects:
http://docs.python.org/lib/bltin-file-objects.html
but like I said, if you're dealing with html or html-like files, be sure to check out beautifulsoup.
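The numbered points above can be combined with the two things the question asks for (skipping a <defined> ... <undefined> region and ending on a second keyword) into one function. This is a minimal sketch, not code from the thread: the name csd_to_orc and its end_markers and skip parameters are hypothetical, and all markers are matched as plain substrings of each line. It uses with blocks and writelines() instead of print >>, so it should run on Python 2.6 and later as well as Python 3.

```python
import os


def csd_to_orc(filename, end_markers=("</CsInstruments>",), skip=()):
    """Copy lines of filename into a same-named .orc file; return its name.

    Hypothetical helper, not from the original post:
    - reading stops at the first line containing any string in end_markers
      (text before the marker on that line is kept);
    - skip is a sequence of (start, end) string pairs; every line from the
      one containing start through the one containing end is dropped;
    - in the remaining lines, anything from "</" onward is cut off.
    """
    orcfilename = os.path.splitext(filename)[0] + ".orc"  # see point 4
    lines = []
    skip_until = None  # end string of the region currently being skipped
    with open(filename) as f:  # 'r' is the default mode (point 1)
        for line in f:
            # Stop at the first end marker, keeping whatever precedes it.
            hits = [line.find(m) for m in end_markers if m in line]
            if hits:
                head = line[:min(hits)].rstrip()
                if head:
                    lines.append(head + "\n")
                break
            # Inside a skipped region: drop lines until its end string appears.
            if skip_until is not None:
                if skip_until in line:
                    skip_until = None
                continue
            # Does this line open a region to skip?
            ends = [end for start, end in skip if start in line]
            if ends:
                if ends[0] not in line:  # a region may open and close on one line
                    skip_until = ends[0]
                continue
            # Cut off anything from "</" onward (point 6); drop lines left empty.
            pos = line.find("</")
            if pos != -1:
                line = line[:pos].rstrip()
                if not line:
                    continue
                line += "\n"
            lines.append(line)
    with open(orcfilename, "w") as out:
        out.writelines(lines)  # point 8
    return orcfilename
```

For example, passing skip=[("<CsoundSynthesizer>", "<CsInstruments>"), ("<defined>", "<undefined>")] drops everything up to and including the opening <CsInstruments> tag as well as the marked region, and adding a second string such as "endin" to end_markers stops the copy at that keyword instead.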
Also, there's the elementtree package for parsing XML that could help here too.

~Simon
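The ElementTree suggestion could look roughly like the sketch below, using xml.etree.ElementTree from the standard library. It assumes the .csd file is well-formed XML, which Csound code need not be: a bare < or & inside the orchestra makes ET.fromstring() raise ET.ParseError. The function name orchestra_text is hypothetical. It returns the text directly inside <CsInstruments> plus any text that follows nested elements, while dropping the nested elements' own content.

```python
import xml.etree.ElementTree as ET


def orchestra_text(csd_text):
    """Return the text inside <CsInstruments>, or None if there is none.

    Hypothetical helper. Assumes csd_text is well-formed XML; otherwise
    ET.fromstring raises ET.ParseError.
    """
    root = ET.fromstring(csd_text)
    node = root.find("CsInstruments")  # searches direct children of the root
    if node is None:
        return None
    parts = [node.text or ""]
    for child in node:
        # child.text and deeper content belong to the nested element (dropped);
        # child.tail is the text after its closing tag (kept).
        parts.append(child.tail or "")
    return "".join(parts)
```

Whether this beats the line-based approach depends on the files: it handles tags split or combined across lines, but fails outright on orchestra code that is not valid XML.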