Re: parsing javascript

Philip Semanchuk Sun, 12 Oct 2008 07:29:04 -0700


On Oct 12, 2008, at 5:25 AM, S.Selvam Siva wrote:

I have to parse web pages and fetch URLs. My problem is that many of the URLs I
need to parse are loaded dynamically by a JavaScript function
(onload()). How can I fetch those links from Python? Thanks in advance.


Selvam,

You can try to find them yourself using string parsing, but that's difficult. The closer you want to get to "perfect" at finding URLs expressed in JS, the closer you'll get to rewriting a JS interpreter. For instance, this is not so hard to understand:

"http://example.com/";

but this is:

"http://ZZZ_DOMAIN_ZZZ/index.html".replace(/ZZZ_DOMAIN_ZZZ/,the_domain_variable)
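
To make the difference concrete, here is a minimal sketch of the string-parsing approach in Python. The pattern and the function name find_literal_urls are my own illustration, not an established recipe; the sketch assumes URLs appear as quoted string literals beginning with http:// or https://.

```python
import re

# Hypothetical pattern: a quoted string literal that starts with
# http:// or https:// and ends at the matching quote character.
URL_LITERAL = re.compile(r"""(["'])(https?://[^"'\s]+)\1""")


def find_literal_urls(js_source):
    """Return the URL-looking string literals found in JavaScript source."""
    return [match.group(2) for match in URL_LITERAL.finditer(js_source)]


# The easy case: the URL is written out in full.
easy = 'var home = "http://example.com/";'

# The hard case: the real URL exists only after the JS has run.
hard = ('var page = "http://ZZZ_DOMAIN_ZZZ/index.html"'
        '.replace(/ZZZ_DOMAIN_ZZZ/, the_domain_variable);')

print(find_literal_urls(easy))
print(find_literal_urls(hard))
```

The first call finds the complete URL. The second finds only the template string with the placeholder still in it, because nothing here evaluates replace() or knows the value of the_domain_variable.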

This is a long-standing problem for any program that parses Web pages. You either have to embed a JS interpreter in your application or just ignore the JavaScript. Most Web parsing robots take the latter route.
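
For what it's worth, the "ignore the JavaScript" route can be sketched with nothing but the standard library. The class name LinkCollector, the helper static_links, and the sample page are made up for illustration; the sketch assumes you only want the href values of ordinary anchor tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags; script contents are not parsed as tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip empty hrefs and javascript: pseudo-links.
            if name == "href" and value and not value.lower().startswith("javascript:"):
                self.links.append(urljoin(self.base_url, value))


def static_links(html, base_url):
    """Return absolute URLs for the links present in the static HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


sample = """
<html><body onload="addLinks()">
<a href="/about.html">About</a>
<a href="javascript:void(0)">Menu</a>
<script>document.write('<a href="http://example.com/dynamic">x</a>');</script>
</body></html>
"""

print(static_links(sample, "http://example.com/"))
```

Only the static link is reported. The link written by the script is never seen, which is exactly the trade-off described above.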


Good luck
Philip
--
http://mail.python.org/mailman/listinfo/python-list
