Re: HTML to dictionary

bearophileHUGS Tue, 27 Feb 2007 03:16:06 -0800

Tina I:
> I have a small, probably trivial even, problem. I have the following HTML:


This is a little data munging problem.
If it's a one-shot problem, then you can just load it with a browser,
copy and paste it as text, and then process the lines of the text in a
simple way (splitting lines according to ":", and using the stripped
pairs to feed a dict).
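
A minimal sketch of that one-shot approach follows. The original HTML is not reproduced in this message, so the sample text is made up for illustration, and text_to_dict is a hypothetical helper name rather than anything from a library.

```python
def text_to_dict(text):
    """Turn 'key: value' lines into a dict (hypothetical helper)."""
    result = {}
    for line in text.splitlines():
        # Skip blank lines and lines without a separator.
        if ":" not in line:
            continue
        # Split on the first ":" only, so values may contain colons.
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip()
    return result


# Made-up sample standing in for text pasted from a browser.
pasted = """
Name: Example Person
Phone: 555-0100
Hours: 09:00 to 17:00
"""
print(text_to_dict(pasted))
```

Splitting on the first ":" only is a design choice here: it keeps values such as times intact, at the cost of assuming that keys never contain a colon.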

If there are more HTML files, or you want to automate things further,
you can use html2text:
http://www.aaronsw.com/2002/html2text/

A little script like this may help you:

from html2text import html2text

txt = html2text(the_html_data)
lines = txt.replace("**", "").strip().splitlines()
# Split on the first ":" only, and skip lines that contain no ":".
fields = [[field.strip() for field in line.split(":", 1)]
          for line in lines if ":" in line]
print(dict(fields))

Note that splitlines() can be tricky: if you run into problems, you
may want a smarter splitter.
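
The message does not say which problems it has in mind. One possibility is that splitlines() on a Unicode string also breaks on form feeds and a few other control characters, not just on newlines. Under that assumption, a stricter splitter could look like this sketch (split_lines is a hypothetical name, and the sample string is made up):

```python
import re


def split_lines(text):
    """Split on LF, CRLF or CR only, and drop blank lines (hypothetical helper)."""
    return [line for line in re.split(r"\r\n|\r|\n", text) if line.strip()]


# Made-up sample: a form feed inside a value, plus an empty line.
sample = "Name: A\x0cB\r\n\r\nPhone: 555-0100\n"
print(split_lines(sample))
```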

Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list
