Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Paul Rubin
Steve Holden <[EMAIL PROTECTED]> writes:
> I especially like the terms and conditions they ask you to acknowledge
> if you want to sign up as a worker:
> http://www.captchasolver.com/join/worker#

Heh, cute, I guess you have to solve a different type of puzzle to read them. I'm surprised anyone

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Steve Holden
Paul Rubin wrote:
> "Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
>> Obviously this wouldn't really help, as you can't predict what a
>> website actually wants which events, in possibly which
>> order. Especially if the site does not _want_ to be scrapable- think
>> of a simple "click on the image

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread John Nagle
[EMAIL PROTECTED] wrote:
> How extract the visible numerical data from this Microsoft financial
> web site?
>
> http://tinyurl.com/yw2w4h
>
> If you simply download the HTML file you'll see the data is *not*
> embedded in it but loaded from some other file.
>
> Surely if I can see the data in my

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Paul Rubin
"Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
> Obviously this wouldn't really help, as you can't predict what a
> website actually wants which events, in possibly which
> order. Especially if the site does not _want_ to be scrapable- think
> of a simple "click on the images in the order of the nu

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Diez B. Roggisch
Paul Rubin wrote:
> "Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
>> Nice idea, but not really helpful in the end. Besides the rather nasty
>> parts of the DOMs that make JS programming the PITA it is, I think the
>> whole event-based stuff makes this basically impossible.
>
> Obviously the Pyt

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Paul Rubin
"Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
> Nice idea, but not really helpful in the end. Besides the rather nasty
> parts of the DOMs that make JS programming the PITA it is, I think the
> whole event-based stuff makes this basically impossible.

Obviously the Python interface would need ways

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Diez B. Roggisch
Paul Rubin wrote:
> "Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
>> Still, some pages are AJAX, you won't be able to scrape them easily
>> without analyzing the JS code.
>
> Sooner or later it would be great to have a JS interpreter written in
> Python for this purpose. It would do all the sa

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Paul Rubin
"Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
> Still, some pages are AJAX, you won't be able to scrape them easily
> without analyzing the JS code.

Sooner or later it would be great to have a JS interpreter written in Python for this purpose. It would do all the same operations on an HTML/XML D

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Diez B. Roggisch
> It's an AJAX-site. You have to carefully analyze it and see what
> actually happens in the javascript, then use that. Maybe something like
> the http header plugin for firefox helps you there.

Oops, obviously I wasn't looking closely enough at the site. Sorry for the confusion. Still, some pages are
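
For pages that really do load their data from JavaScript, the advice quoted above (watch the HTTP traffic, then request the same resource yourself) might look roughly like the sketch below. Everything specific in it is an assumption for illustration: the thread never names the data URL or its format, so `fetch_json`, `collect_numbers`, and the idea that the endpoint returns JSON are all hypothetical. A real site may return XML or an HTML fragment instead, in which case the parsing step would need to change.

```python
import json
import urllib.request


def fetch_json(url, referer=None):
    """Request a data URL directly, as the page's JavaScript would.

    `url` is whatever address showed up in the HTTP header log
    (hypothetical here). Some servers check the Referer header,
    so it can be passed along optionally.
    """
    headers = {"User-Agent": "Mozilla/5.0"}
    if referer:
        headers["Referer"] = referer
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        # Assumption: the endpoint answers with UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))


def collect_numbers(node):
    """Recursively gather every int/float in a decoded JSON structure."""
    if isinstance(node, bool):
        # bool is a subclass of int in Python; skip true/false values.
        return []
    if isinstance(node, (int, float)):
        return [node]
    if isinstance(node, dict):
        return [n for value in node.values() for n in collect_numbers(value)]
    if isinstance(node, list):
        return [n for value in node for n in collect_numbers(value)]
    return []
```

Once the real data URL is known, something like `collect_numbers(fetch_json(data_url, referer=page_url))` would return the numbers in document order, where `data_url` and `page_url` are placeholders for addresses read off the header log.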

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Max Erickson
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> How extract the visible numerical data from this Microsoft
> financial web site?
>
> http://tinyurl.com/yw2w4h
>
> If you simply download the HTML file you'll see the data is *not*
> embedded in it but loaded from some other file.
>
> Surely if I

Re: help!! *extra* tricky web page to extract data from...

2007-03-13 Thread Diez B. Roggisch
[EMAIL PROTECTED] wrote:
> How extract the visible numerical data from this Microsoft financial
> web site?
>
> http://tinyurl.com/yw2w4h
>
> If you simply download the HTML file you'll see the data is *not*
> embedded in it but loaded from some other file.
>
> Surely if I can see the data in

help!! *extra* tricky web page to extract data from...

2007-03-13 Thread [EMAIL PROTECTED]
How do I extract the visible numerical data from this Microsoft financial web site?

http://tinyurl.com/yw2w4h

If you simply download the HTML file you'll see the data is *not* embedded in it but loaded from some other file.

Surely if I can see the data in my browser I can grab it somehow right in a P