[web2py] Re: preferred solution for parsing {xht,ht,x}ml

Massimo Di Pierro Sat, 05 Jan 2013 16:53:27 -0800

page = """
<a keypress="1" href="kp1.html">This is some text 1</a> 
<a keypress="2" href="kp2.html">This is some text 2</a> 
<a keypress="3" href="kp3.html">This is some text 3</a> 
"""


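from gluon.html import TAG  # already in scope inside web2py models and controllers

html = TAG(page)
# element() returns the first matching tag; [0] is its first child (the link text)
print(html.element('a', _keypress="1")[0])
print(html.element('a', _keypress="2")[0])
print(html.element('a', _keypress="3")[0])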

This should work, but the web2py parser is based on the built-in Python XML 
parser, which chokes in two cases: invalid XML and non-UTF-8 characters.
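
On the "purely with python" question: if a page trips the parser, one possible 
fallback is the standard library's html.parser, which tends to be forgiving 
about malformed markup. The following is only a sketch, assuming Python 3 
(on Python 2 the module is named HTMLParser); the class name KeypressLinks 
and its links attribute are made up for this example and are not part of 
web2py.

```python
from html.parser import HTMLParser


class KeypressLinks(HTMLParser):
    """Collect the text of each <a> tag, keyed by its keypress attribute.

    Hypothetical helper for illustration; not a web2py API.
    """

    def __init__(self):
        super().__init__()
        self.links = {}       # keypress value -> link text
        self._current = None  # keypress of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            key = dict(attrs).get("keypress")
            if key is not None:
                self._current = key
                self.links[key] = ""

    def handle_data(self, data):
        # Accumulate text only while inside a matching <a> tag.
        if self._current is not None:
            self.links[self._current] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


page = """
<a keypress="1" href="kp1.html">This is some text 1</a>
<a keypress="2" href="kp2.html">This is some text 2</a>
<a keypress="3" href="kp3.html">This is some text 3</a>
"""

parser = KeypressLinks()
parser.feed(page)
parser.close()

print(parser.links["1"])  # This is some text 1
```

The result is a plain dict, so parser.links["2"] and parser.links["3"] give 
the other two link texts without any dependency on web2py.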

Massimo



On Saturday, 5 January 2013 13:46:49 UTC-6, rh wrote:
>
> Hello, 
>
> After using fetch to get a web page (XHTML, HTML, or XML), what is the 
> preferred and most future-proof way to extract data from that page using 
> the functions and features of web2py? 
>
> Suppose the page has these within: 
>
> <a keypress="1" href="kp1.html">This is some text 1</a> 
> <a keypress="2" href="kp2.html">This is some text 2</a> 
> <a keypress="3" href="kp3.html">This is some text 3</a> 
>
> And if I use web2py what's the preferred way to do it? 
>
> use web2pyHTMLParser or use DIV, etc. 
> An example would be very helpful too. 
>
>
> Should I do this purely with python? 
>
>
