Hi,

2013/11/26 Luciano Montanaro <mikel...@gmail.com>:
> On Nov 26, 2013 2:07 AM, "Robin Burchell" <robin.burch...@jolla.com> wrote:
> [...]
> My application too depends on it to scrape data from a web page. I need the
> QWebElement interface, otherwise I will need to parse the html on my own.
> [...]
> Well, access to the DOM model...

Depending on how JavaScript-laden the page you are trying to scrape is, something like BeautifulSoup or Mechanize might do the job. Both are written in Python; the latter might sound familiar to Perl programmers, as it is modeled on WWW::Mechanize. They are also more lightweight: for simple scraping there is no need to download images, execute JS or lay out the page.

http://www.crummy.com/software/BeautifulSoup/
http://wwwsearch.sourceforge.net/mechanize/

Of course, this drags in a new dependency (Python) that also isn't supported at the moment, but as mentioned in the announcement[1], "we are actively working on getting Python support into shape". Once that is supported (PyOtherSide QML Plugin), it might be easier to integrate and more efficient than loading the whole webpage into a WebView and walking through it with the DOM.

And if your page is JavaScript-laden and you can't parse the static HTML using BeautifulSoup or Mechanize, chances are the data processed by the JavaScript is also available as JSON somewhere (just look into the webpage code or watch the network traffic), and that'll definitely be easier to parse, too :)

HTH :)
Thomas

[1] https://lists.sailfishos.org/pipermail/devel/2013-November/001319.html
_______________________________________________
SailfishOS.org Devel mailing list
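
P.S. To make the two suggestions above a bit more concrete, here is a rough sketch of what "scrape the static HTML" and "parse the JSON instead" could look like in Python. It is only an illustration: to keep it free of extra dependencies it uses the standard library's html.parser as a stand-in for BeautifulSoup (which would be shorter), and the sample markup, the JSON shape and the names CellCollector, scrape_cells and parse_departures are all made up for this example, not taken from any real page.

```python
import json
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> element in a static HTML document."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_td:
            self.cells[-1] += data


def scrape_cells(html):
    """Return the stripped text of all table cells; no JS, images or layout."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


def parse_departures(payload):
    """Parse a hypothetical payload: {"departures": [{"line": ..., "time": ...}]}."""
    return [(d["line"], d["time"]) for d in json.loads(payload)["departures"]]


# Made-up sample data standing in for a downloaded page and a JSON endpoint.
SAMPLE_HTML = "<table><tr><td>Line 7</td><td> 12:30 </td></tr></table>"
SAMPLE_JSON = '{"departures": [{"line": "Line 7", "time": "12:30"}]}'

if __name__ == "__main__":
    print(scrape_cells(SAMPLE_HTML))
    print(parse_departures(SAMPLE_JSON))
```

Either function only needs the raw text of the response, so nothing has to be rendered; with the JSON variant you skip the HTML entirely, which is why it tends to be the less fragile option when it is available.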