Hi Alistair, this is cool. Do you have a small example so that we can see how to use it?
Stef

On Sat, Nov 11, 2017 at 4:38 PM, Alistair Grant <akgrant0...@gmail.com> wrote:
> On 9 November 2017 at 00:00, Kjell Godo <squeakl...@gmail.com> wrote:
>> i like to collect some newspaper comics from an online newspaper
>> but it takes really long to do it by hand
>> i tried Soup but i didn’t get anywhere
>> the pictures were hidden behind a script or something
>> is there anything to do about that?
>
> Most of the web pages I want to scrape use javascript to construct the
> DOM, which makes Soup, XMLHTMLParser, etc. useless.
>
> I've extended Torsten's Pharo-Chrome library and use that to navigate
> the DOM in a way similar to Soup:
>
> https://github.com/akgrant43/Pharo-Chrome
>
> This gets around the issue with javascript since it waits for the
> browser to load the page, run the javascript and construct the DOM.
>
> HTH,
> Alistair
>
>> i don’t want to collect them all
>> i have the XPath .pdf but i haven’t read it yet
>>
>> these browsers seem to gobble up memory
>> and while open they just keep getting bigger till the OS session crash
>> might there be a browser that is more minimal?
>>
>> Vivaldi seems better at not bloating up RAM