Re: [Pharo-users] Soup bug(fix)

Stephane Ducasse Sun, 12 Nov 2017 05:46:00 -0800

Hi Alistair

this is cool.
Do you have one little example so that we can see how we can use it?


Stef


On Sat, Nov 11, 2017 at 4:38 PM, Alistair Grant <akgrant0...@gmail.com> wrote:
> On 9 November 2017 at 00:00, Kjell Godo <squeakl...@gmail.com> wrote:
>> i like to collect some newspaper comics from an online newspaper
>>      but it takes really long to do it by hand
>> i tried Soup but i didn’t get anywhere
>>      the pictures were hidden behind a script or something
>> is there anything to do about that?
>
> Most of the web pages I want to scrape use javascript to construct the
> DOM, which makes Soup, XMLHTMLParser, etc. useless.
>
> I've extended Torsten's Pharo-Chrome library and use that to navigate
> the DOM in a way similar to Soup:
>
> https://github.com/akgrant43/Pharo-Chrome
>
> This gets around the issue with javascript since it waits for the
> browser to load the page, run the javascript and construct the DOM.
>
> HTH,
> Alistair
>
>
>
>>         i don’t want to collect them all
>> i have the XPath .pdf but i haven’t read it yet
>>
>> these browsers seem to gobble up memory
>>      and while open they just keep getting bigger till the OS session crashes
>>      might there be a browser that is more minimal?
>>
>> Vivaldi seems better at not bloating up RAM
>
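
A minimal sketch of why a static parser such as Soup finds nothing on a
page like this. It is not from the thread and uses Python's standard
html.parser rather than Soup or Pharo-Chrome; the page markup, the image
path and the names TagCounter and count_tags are all made up for
illustration. The point is only that an image added by a script does not
exist in the HTML as delivered, so a parser that never runs the script
cannot see it.

```python
from html.parser import HTMLParser

# Hypothetical page: the comic image is not in the markup, a script adds it
# to the DOM after the browser has loaded the page.
PAGE = """<html><body>
<div id="comic"></div>
<script>
  var img = document.createElement("img");
  img.src = "/comics/today.png";
  document.getElementById("comic").appendChild(img);
</script>
</body></html>"""


class TagCounter(HTMLParser):
    """Counts start tags found in the static markup (scripts are not run)."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Parse the HTML text and return a dict mapping tag name to count."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


counts = count_tags(PAGE)
print("img tags seen by the static parser:", counts.get("img", 0))
print("script tags seen by the static parser:", counts.get("script", 0))
```

The static parse sees the script element but no img element, which is
why driving a real browser and reading the DOM after the javascript has
run is the workable route for such pages.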
