Thank you both. In the end I used recode, which I wasn't aware of. Fredrik, I had come across your script while googling for solutions, but failed to make it work.
On Dec 13, 2:21 pm, "Fredrik Lundh" <[EMAIL PROTECTED]> wrote:
> "ardief" wrote:
>
> > sorry if I'm asking something very obvious but I'm stumped. I have a
> > text that looks like this:
> >
> > Sentence 401
> > 4.00pm — We set off again; this time via Tony's home to collect
> > a variety of possessions, finally arriving at hospital no.3.
> > Sentence 402
> > 4.55pm — Tony is ushered into a side ward with three doctors and
> > I stay outside with Mum.
> >
> > And I want the HTML char codes to turn into their equivalent plain
> > text. I've looked at the newsgroup archives, the cookbook, the web in
> > general and can't manage to sort it out.
> >
> > file = open('filename', 'r')
> > ofile = open('otherfile', 'w')
> >
> > done = 0
> >
> > while not done:
> >     line = file.readline()
> >     if 'THE END' in line:
> >         done = 1
> >     elif '—' in line:
> >         line.replace('—', '--')
>
> this returns a new line; it doesn't update the line in place.
>
> >         ofile.write(line)
> >     else:
> >         ofile.write(line)
>
> for a more general solution to the actual replace problem, see:
>
>     http://effbot.org/zone/re-sub.htm#unescape-html
>
> you may also want to lookup the "fileinput" module in the library reference
> manual.
>
> </F>

--
http://mail.python.org/mailman/listinfo/python-list
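
For the archive, here is a minimal sketch of the fix pointed out in the quoted reply: str.replace returns a new string, so its result has to be assigned back before writing. This is not the effbot script linked above; it swaps in the standard library's html.unescape (Python 3.4+) for the general case. The function name convert, the sample text, and the assumption that the dash appears in the file as the numeric reference &#8212; are all illustrative and not taken from the original post.

```python
import html
import io


def convert(src, dst, sentinel="THE END"):
    """Copy lines from src to dst, decoding HTML character references.

    Stops at the first line containing the sentinel, which is not
    written (as in the original loop). Also stops at end of input.
    """
    for line in src:
        if sentinel in line:
            break
        # str.replace returns a new string; rebind the name to keep it.
        # "&#8212;" is an assumed spelling of the dash reference.
        line = line.replace("&#8212;", "--")
        # Decode any remaining named or numeric character references.
        line = html.unescape(line)
        dst.write(line)


# Hypothetical sample input standing in for the real file.
sample = io.StringIO(
    "Sentence 401\n"
    "4.00pm &#8212; We set off again\n"
    "Tony &amp; Mum\n"
    "THE END\n"
    "ignored\n"
)
out = io.StringIO()
convert(sample, out)
result = out.getvalue()
print(result)
```

With real files, the same function can be called inside "with open('filename') as src, open('otherfile', 'w') as dst:", which also closes both files. Iterating over the file ends at end of input, so the loop cannot spin forever if the sentinel line is missing.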