page = """
<a keypress="1" href="kp1.html">This is some text 1</a>
<a keypress="2" href="kp2.html">This is some text 2</a>
<a keypress="3" href="kp3.html">This is some text 3</a>
"""
html = TAG(page)
print html.element('a',_keypress="1")[0]
print html.element('a',_keypress="2")[0]
print html.element('a',_keypress="3")[0]

This should work, but the web2py parser is based on the built-in Python XML parser, which chokes in two cases: invalid XML and non-UTF-8 characters.

Massimo

On Saturday, 5 January 2013 13:46:49 UTC-6, rh wrote:
>
> Hello,
>
> After using fetch to get a web page in any of xhtml,html,xml and using the
> functions/features of web2py what is the preferred and most future-proof
> way to extract the data from that page?
>
> Suppose the page has these within:
>
> <a keypress="1" href="kp1.html">This is some text 1</a>
> <a keypress="2" href="kp2.html">This is some text 2</a>
> <a keypress="3" href="kp3.html">This is some text 3</a>
>
> And if I use web2py what's the preferred way to do it?
>
> use web2pyHTMLParser or use DIV, etc.
> An example would be very helpful too.
>
>
> Should I do this purely with python?
>
> --
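
On the last question in the quoted message (doing this purely with Python), here is a minimal sketch that uses only the standard library's HTMLParser instead of the web2py helpers. The class name KeypressLinkParser and its links dictionary are hypothetical names chosen for illustration, not part of web2py or of the standard library, and how the parser copes with badly malformed pages is not covered here.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class KeypressLinkParser(HTMLParser):
    """Collect href and text for every <a> tag that has a keypress attribute.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = {}       # keypress value -> [href, text]
        self._current = None  # keypress of the <a> currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'keypress' in attrs:
            self._current = attrs['keypress']
            self.links[self._current] = [attrs.get('href'), '']

    def handle_data(self, data):
        # Only accumulate text while inside a matching <a> element.
        if self._current is not None:
            self.links[self._current][1] += data

    def handle_endtag(self, tag):
        if tag == 'a':
            self._current = None


page = """
<a keypress="1" href="kp1.html">This is some text 1</a>
<a keypress="2" href="kp2.html">This is some text 2</a>
<a keypress="3" href="kp3.html">This is some text 3</a>
"""

parser = KeypressLinkParser()
parser.feed(page)
parser.close()

for key in sorted(parser.links):
    href, text = parser.links[key]
    print('%s %s %s' % (key, href, text))
```

Keying the results by the keypress value mirrors the lookups in the web2py example, so parser.links['2'] plays the role of html.element('a',_keypress="2").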