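Even though I've never tried it myself, you may want to look into running the HTML through a standalone JavaScript engine, such as SpiderMonkey or Rhino, and then parsing the output it produces. An HTML parser like libxml2dom never executes scripts, so the frames written by document.write only exist once the script has actually run.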
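
If wiring up a JavaScript engine is more than you need, a rough stdlib-only fallback might be enough for a page like this one. To be clear, this is a different technique from the one above: it does not execute any script. It only stitches together the string literals passed to document.write, puts a placeholder wherever a JavaScript expression appears (so values such as rtfPossibleString stay unknown), and feeds the result to html.parser. The names SrcCollector, extract_srcs, SAMPLE and the {VAR} placeholder are my own, and the sketch assumes a single document.write call per script.

```python
import re
from html.parser import HTMLParser

# Single- or double-quoted JavaScript string literal. Backslash escapes are
# skipped over but NOT decoded (an assumption that is fine for plain markup).
JS_STRING = re.compile(r"'((?:[^'\\]|\\.)*)'|\"((?:[^\"\\]|\\.)*)\"")


class SrcCollector(HTMLParser):
    """Collects the value of every src attribute in the markup it is fed."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "src" and value is not None:
                self.srcs.append(value)


def extract_srcs(js_source, placeholder="{VAR}"):
    """Approximate the markup of one document.write(...) call and list its srcs.

    Nothing is evaluated: each JavaScript expression between the string
    literals is replaced by `placeholder`.
    """
    marker = "document.write("
    start = js_source.index(marker) + len(marker)
    end = js_source.rindex(")")  # assumes the call is the last thing in the script
    argument = js_source[start:end]

    pieces = []
    position = 0
    for match in JS_STRING.finditer(argument):
        # Anything other than whitespace and '+' between literals is an expression.
        if argument[position:match.start()].strip(" \t\r\n+"):
            pieces.append(placeholder)
        literal = match.group(1) if match.group(1) is not None else match.group(2)
        pieces.append(literal)
        position = match.end()
    if argument[position:].strip(" \t\r\n+"):
        pieces.append(placeholder)

    collector = SrcCollector()
    collector.feed("".join(pieces))
    collector.close()
    return collector.srcs


# Shortened version of the script from the original question.
SAMPLE = """document.write(
'<frameset border="0" rows="0,*,0">'+
'<frame name="cfgFrame" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
'<frame name="cnFrame" src="../frame.html?' + main.clientargs + '">'+
''
)"""

if __name__ == "__main__":
    for src in extract_srcs(SAMPLE):
        print(src)
```

This only gives you the static part of each URL. If you need the real values of the variables, you are back to evaluating the script in an actual engine.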
On Friday, February 11, 2011 2:20:32 AM UTC-6, yanghq wrote:
> hi,
> I wanna get attribute value like href,src... in html.
>
> for simple html page libxml2dom can help me parse it into dom, and get what I want;
>
> but for some pages rendered by js, like:
>
> document.write(
> '<frameset border="0" frameborder="no" rows="0,*,0" onLoad="start()" onUnload="end()" onResize="change()">'+
> '<frameset border="0" frameborder="no" cols="*,*,*,*,*,0">'+
> '<frame name="cfgFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="mboxFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="cmdFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="msgFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="pabFrame" noresize scrolling="no" src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="cnFrame" noresize scrolling="no" src="../frame.html?' + main.clientargs + '">'+
> ''+
> '<frame name="mailFrame" marginwidth="0" marginheight="0" noresize src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> '<frame name="appletFrame" marginwidth="0" marginheight="0" noresize src="../frame.html?rtfPossible=' + rtfPossibleString + '">'+
> ''
> )
> how can I get the attribute value of 'src', thank you for any help.
> --
> http://mail.python.org/mailman/listinfo/python-list