Although I've never tried it myself, you may want to look into running the
HTML through a separate JavaScript engine, such as SpiderMonkey or Rhino, and
then parsing the output of that.
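
Roughly what I have in mind, as an untested sketch rather than a known-good recipe: stub out document.write so that it prints its argument, run the page's script in the engine's command-line shell, and feed whatever it prints to an ordinary HTML parser. The names run_js, extract_srcs, SrcCollector and JS_PRELUDE are mine, and I am assuming a shell called `js` on your PATH that accepts `-e <code>`; adjust the command for whichever engine you actually install. The page's own variables (rtfPossibleString, main.clientargs) also have to be defined before the document.write call runs, otherwise the engine will stop with a ReferenceError.

```python
import subprocess
from html.parser import HTMLParser

# Stub so that document.write() prints its argument instead of needing a
# browser. Assumption: the engine's shell provides a print() function.
JS_PRELUDE = "var document = { write: function (s) { print(s); } };\n"


def run_js(script, prelude=JS_PRELUDE, engine=("js", "-e")):
    """Run `script` in an external JS shell and return what it printed.

    Assumption: the command in `engine` exists and takes the code to run
    as its final argument. Raises CalledProcessError if the script fails.
    """
    result = subprocess.run(
        list(engine) + [prelude + script],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


class SrcCollector(HTMLParser):
    """Collect (tag, src) pairs from every start tag with a src attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        for name, value in attrs:
            if name == "src":
                self.found.append((tag, value))


def extract_srcs(html):
    """Return the list of (tag, src) pairs found in an HTML string."""
    parser = SrcCollector()
    parser.feed(html)
    parser.close()
    return parser.found
```

With that in place, something like extract_srcs(run_js(page_script)) should give you the frame URLs, where page_script is the text of the page's script that you have already pulled out of the HTML. The parsing half uses only the standard library, so you can try it on a hand-written frameset before wiring up the engine.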

On Friday, February 11, 2011 2:20:32 AM UTC-6, yanghq wrote:
> hi,
>     I want to get attribute values such as href and src from HTML.
> 
>     For a simple HTML page, libxml2dom can parse it into a DOM and I can
> get what I want;
> 
>     but for some pages rendered by JS, like:
> 
> document.write(
> '<frameset border="0" frameborder="no" rows="0,*,0" onLoad="start()" onUnload="end()" onResize="change()">'+
>   '<frameset border="0" frameborder="no" cols="*,*,*,*,*,0">'+
> '<frame name="cfgFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="mboxFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="cmdFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="msgFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="pabFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="cnFrame" noresize scrolling="no" src="../frame.html?' + main.clientargs + '">'+
>   ''+
>   '<frame name="mailFrame" marginwidth="0" marginheight="0" noresize src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="appletFrame" marginwidth="0" marginheight="0" noresize src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> ''
> )
> How can I get the attribute value of 'src'? Thank you for any help.
> 


-- 
http://mail.python.org/mailman/listinfo/python-list
