exampleNavigation
	| chrome page logger |
	logger := InMemoryLogger new.
	logger start.
	chrome := GoogleChrome new
		debugOn;
		debugSession;
		open;
		yourself.
	page := chrome tabPages first.
	page enablePage.
	page enableDOM.
	page navigateTo: 'http://pharo.org'.
	page getDocument.
	page getMissingChildren.
	page updateTitle.
	logger stop.
	^{ chrome. page. logger }
but in fact I realised that I would like a simple doc :)

On Sun, Nov 12, 2017 at 2:44 PM, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
> Hi Alistair
>
> this is cool.
> Do you have one little example so that we can see how we can use it?
>
> Stef
>
>
> On Sat, Nov 11, 2017 at 4:38 PM, Alistair Grant <akgrant0...@gmail.com> wrote:
>> On 9 November 2017 at 00:00, Kjell Godo <squeakl...@gmail.com> wrote:
>>> I'd like to collect some newspaper comics from an online newspaper,
>>> but it takes really long to do it by hand.
>>> I tried Soup but I didn't get anywhere;
>>> the pictures were hidden behind a script or something.
>>> Is there anything to do about that?
>>
>> Most of the web pages I want to scrape use javascript to construct the
>> DOM, which makes Soup, XMLHTMLParser, etc. useless.
>>
>> I've extended Torsten's Pharo-Chrome library and use that to navigate
>> the DOM in a way similar to Soup:
>>
>> https://github.com/akgrant43/Pharo-Chrome
>>
>> This gets around the issue with javascript since it waits for the
>> browser to load the page, run the javascript and construct the DOM.
>>
>> HTH,
>> Alistair
>>
>>
>>
>>> I don't want to collect them all.
>>> I have the XPath .pdf but I haven't read it yet.
>>>
>>> These browsers seem to gobble up memory,
>>> and while open they just keep getting bigger till the OS session crashes.
>>> Might there be a browser that is more minimal?
>>>
>>> Vivaldi seems better at not bloating up RAM.
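For Kjell's comics use case, a sketch in the spirit of the exampleNavigation method above might look like the following. Only the messages that appear in exampleNavigation (GoogleChrome new, debugOn, debugSession, open, tabPages, enablePage, enableDOM, navigateTo:, getDocument) are taken from the thread; the DOM-walking messages (nodesWithTag: and attributeAt:) and the URL are placeholders for whatever Soup-like protocol Alistair's Pharo-Chrome fork actually offers, not confirmed API:

```smalltalk
"Hypothetical sketch: collect the image URLs from a javascript-rendered
comics page. Because Chrome itself loads the page and runs the scripts,
the <img> nodes exist in the DOM by the time we walk it."
| chrome page document urls |
chrome := GoogleChrome new
	debugOn;
	debugSession;
	open;
	yourself.
page := chrome tabPages first.
page enablePage.
page enableDOM.
page navigateTo: 'https://example.com/comics'.  "placeholder URL"
document := page getDocument.
"Placeholder protocol: nodesWithTag: and attributeAt: stand in for the
library's real Soup-like DOM navigation messages."
urls := (document nodesWithTag: 'img')
	collect: [ :node | node attributeAt: 'src' ].
urls do: [ :each | Transcript showln: each ]
```

From there each URL could be fetched with ZnClient and written to disk, which avoids keeping a bloated browser session open while saving the images.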