[EMAIL PROTECTED] wrote:
> hey there,
> i am using beautiful soup to parse a few pages (screen scraping)
> easy stuff.
> the issue i am having is with one particular web page that uses a
> javascript to display some numbers in tables.
>
> now if i open the file in mozilla and "save as" i get the numbers in
> the source. cool. but i click on the "view source" or download the url
> with urlretrieve, i get the source, but not the numbers.
>
> is there a way around this ?
>
> thanks

If the JavaScript is generated by the server with the numbers in a known
location, you can use a regular expression to extract them. For example,
if the code contains something like:

    var numbersToDisplay = [123,456,789];

then you could use (warning: this is not fully tested):

    import re

    js_source = "... the source inside the <script> tag ..."
    # Capture everything between the brackets of the array literal.
    match = re.search(r'numbersToDisplay = \[([^]]*)\];', js_source)
    # Guard against a failed match instead of calling .group() on None.
    numbers_str = match.group(1) if match else ""
    numbers_list = numbers_str.split(",") if numbers_str else []

You'll obviously have to vary this to match your particular script.

Bear in mind that this won't work if the values are computed in
JavaScript rather than on the server. If that's the case, then unless
you feel like implementing a complete IE- and Mozilla-compatible browser
DOM and JavaScript interpreter, you're out of luck.

-- David
--
http://mail.python.org/mailman/listinfo/python-list
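
For what it's worth, here is a slightly fuller sketch of the regular-expression approach described in the reply, going from the raw page source to a list of integers. It rests on assumptions: the sample HTML is made up, numbersToDisplay is only the illustrative variable name used above, and extract_numbers is a hypothetical helper, not part of any library. Pulling <script> bodies out with a regex is a shortcut that suits simple pages; it is not a general HTML parser.

```python
import re

# Hypothetical page source. The variable name numbersToDisplay is the
# illustrative one from the reply, not taken from any real page.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">
var numbersToDisplay = [123, 456, 789];
</script>
<table id="numbers"></table>
</body></html>
"""

# Body of each <script> element (a shortcut, not a full HTML parser).
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>",
                       re.IGNORECASE | re.DOTALL)
# Contents of the array literal assigned to numbersToDisplay.
ARRAY_RE = re.compile(r"numbersToDisplay\s*=\s*\[([^\]]*)\]")


def extract_numbers(html):
    """Return the integers in the first numbersToDisplay array, or []."""
    for script_body in SCRIPT_RE.findall(html):
        match = ARRAY_RE.search(script_body)
        if match:
            # int() tolerates surrounding whitespace; skip empty items.
            return [int(item) for item in match.group(1).split(",")
                    if item.strip()]
    return []


print(extract_numbers(SAMPLE_HTML))
```

In real use the html argument would be whatever urlretrieve (or a similar download call) saved for the page. As noted above, this only helps when the numbers appear literally in the script text sent by the server.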