exampleNavigation
	| chrome page logger |
	logger := InMemoryLogger new.
	logger start.
	chrome := GoogleChrome new
		debugOn;
		debugSession;
		open;
		yourself.
	page := chrome tabPages first.
	page enablePage.
	page enableDOM.
	page navigateTo: 'http://pharo.org'.
	page getDocument.
	page getMissingChildren.
	page updateTitle.
	logger stop.
	^{ chrome. page. logger }
but in fact I realised that I would like a simple doc :)

On Sun, Nov 12, 2017 at 2:44 PM, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
> Hi Alistair
>
> this is cool.
> Do you have one little example so that we can see how we can use it?
>
> Stef
>
>
> On Sat, Nov 11, 2017 at 4:38 PM, Alistair Grant <akgrant0...@gmail.com> wrote:
>> On 9 November 2017 at 00:00, Kjell Godo <squeakl...@gmail.com> wrote:
>>> I'd like to collect some newspaper comics from an online newspaper,
>>> but it takes really long to do it by hand.
>>> I tried Soup but I didn't get anywhere;
>>> the pictures were hidden behind a script or something.
>>> Is there anything to do about that?
>>
>> Most of the web pages I want to scrape use javascript to construct the
>> DOM, which makes Soup, XMLHTMLParser, etc. useless.
>>
>> I've extended Torsten's Pharo-Chrome library and use that to navigate
>> the DOM in a way similar to Soup:
>>
>> https://github.com/akgrant43/Pharo-Chrome
>>
>> This gets around the issue with javascript since it waits for the
>> browser to load the page, run the javascript and construct the DOM.
>>
>> HTH,
>> Alistair
>>
>>
>>
>>> I don't want to collect them all.
>>> I have the XPath .pdf but I haven't read it yet.
>>>
>>> These browsers seem to gobble up memory,
>>> and while open they just keep getting bigger till the OS session crashes.
>>> Might there be a browser that is more minimal?
>>>
>>> Vivaldi seems better at not bloating up RAM.
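For Kjell's comics use case, a sketch in the spirit of the exampleNavigation method above might look like the following. Only the messages that appear in exampleNavigation (GoogleChrome new, debugOn, debugSession, open, tabPages, enablePage, enableDOM, navigateTo:, getDocument) are taken from the thread; the DOM-walking messages (nodesWithTag: and attributeAt:) and the URL are placeholders for whatever Soup-like protocol Alistair's Pharo-Chrome fork actually offers, not confirmed API:

```smalltalk
"Hypothetical sketch: collect the image URLs from a javascript-rendered
comics page. Because Chrome itself loads the page and runs the scripts,
the <img> nodes exist in the DOM by the time we walk it."
| chrome page document urls |
chrome := GoogleChrome new
	debugOn;
	debugSession;
	open;
	yourself.
page := chrome tabPages first.
page enablePage.
page enableDOM.
page navigateTo: 'https://example.com/comics'.  "placeholder URL"
document := page getDocument.
"Placeholder protocol: nodesWithTag: and attributeAt: stand in for the
library's real Soup-like DOM navigation messages."
urls := (document nodesWithTag: 'img')
	collect: [ :node | node attributeAt: 'src' ].
urls do: [ :each | Transcript showln: each ]
```

From there each URL could be fetched with ZnClient and written to disk, which avoids keeping a bloated browser session open while saving the images.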