On Oct 12, 2008, at 5:25 AM, S.Selvam Siva wrote:
I have to parse web pages and fetch URLs. My problem is that many URLs I
need to parse are dynamically loaded using a JavaScript function
(onload()). How can I fetch those links from Python? Thanks in advance.
Selvam,
You can try to find them yourself using string parsing, but that's
difficult. The closer you want to get to "perfect" at finding URLs
expressed in JS, the closer you'll get to rewriting a JS interpreter.
For instance, this is not so hard to understand:
"http://example.com/"
but this is much harder:
"http://ZZZ_DOMAIN_ZZZ/index.html".replace(/ZZZ_DOMAIN_ZZZ/, the_domain_variable)
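To make the difference concrete, here is a minimal sketch of the string-parsing approach, assuming Python 3 and assuming URLs appear as quoted string literals in the script text. The helper name `find_url_literals` and the regex are my own illustration, not a standard API. It finds the first example correctly, but for the second it can only return the placeholder, because learning the real domain would mean actually running the JavaScript.

```python
import re

# Hypothetical pattern: a single- or double-quoted string literal that
# starts with http:// or https://. This is plain text matching only;
# no JavaScript is evaluated.
URL_LITERAL = re.compile(r"""["'](https?://[^"'\s]+)["']""")


def find_url_literals(js_source):
    """Return every quoted http(s) URL literal found in js_source."""
    return URL_LITERAL.findall(js_source)


simple = 'location.href = "http://example.com/";'
tricky = ('var u = "http://ZZZ_DOMAIN_ZZZ/index.html"'
          '.replace(/ZZZ_DOMAIN_ZZZ/, the_domain_variable);')

print(find_url_literals(simple))  # ['http://example.com/']
print(find_url_literals(tricky))  # ['http://ZZZ_DOMAIN_ZZZ/index.html']
```

Every extra JS idiom you want to handle (string concatenation, variables, replace()) means another special case, which is how this road leads toward a JS interpreter.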
This is a long-standing problem for any program that parses Web pages.
You either have to embed a JS interpreter in your application or just
ignore the JavaScript. Most Web parsing robots take the latter route.
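For the "ignore the JavaScript" route, a minimal sketch using Python 3's standard-library html.parser might look like this. The class name `LinkCollector` and the sample page (including its `loadMore()` handler) are hypothetical. It collects `href` and `src` attributes from the markup and never executes scripts, so a link that exists only inside a `<script>` block, or that an onload() handler would build at run time, is not reported.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from static HTML.

    Script bodies are delivered to handle_data, which this class does
    not override, so URLs inside JavaScript are silently ignored.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


page = """
<html><body onload="loadMore()">
  <a href="http://example.com/static.html">static link</a>
  <script>
    var u = "http://ZZZ_DOMAIN_ZZZ/index.html";
  </script>
</body></html>
"""

parser = LinkCollector()
parser.feed(page)
parser.close()
print(parser.links)  # ['http://example.com/static.html']
```

This is cheap and robust, which is why most crawlers settle for it; the price is that anything generated by onload() is invisible.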
Good luck
Philip