Re: [Pharo-users] Pharo-Chrome (was: Soup bug(fix))

Stephane Ducasse Sun, 12 Nov 2017 07:07:20 -0800

Tx and one day we can turn it into another little booklet :)

Stef


On Sun, Nov 12, 2017 at 3:04 PM, Alistair Grant <akgrant0...@gmail.com> wrote:
> Hi Stef,
>
> On 12 November 2017 at 14:47, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>> exampleNavigation
>>     | chrome page logger |
>>     logger := InMemoryLogger new.
>>     logger start.
>>     chrome := GoogleChrome new
>>         debugOn;
>>         debugSession;
>>         open;
>>         yourself.
>>     page := chrome tabPages first.
>>     page enablePage.
>>     page enableDOM.
>>     page navigateTo: 'http://pharo.org'.
>>     page getDocument.
>>     page getMissingChildren.
>>     page updateTitle.
>>     logger stop.
>>     ^ { chrome. page. logger }
>>
>> but in fact I realised that I would like to have a simple doc :)
>>
>>
>> On Sun, Nov 12, 2017 at 2:44 PM, Stephane Ducasse
>> <stepharo.s...@gmail.com> wrote:
>>> Hi Alistair
>>>
>>> this is cool.
>>> Do you have one little example so that we can see how we can use it?
>>>
>>> Stef
>
> Fair enough :-)
>
> I'll try and extend the readme to include some basic documentation.
>
> Cheers,
> Alistair
>
>
>
>>> On Sat, Nov 11, 2017 at 4:38 PM, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>> On 9 November 2017 at 00:00, Kjell Godo <squeakl...@gmail.com> wrote:
>>>>> i like to collect some newspaper comics from an online newspaper
>>>>>      but it takes really long to do it by hand
>>>>> i tried Soup but i didn’t get anywhere
>>>>>      the pictures were hidden behind a script or something
>>>>> is there anything to do about that?
>>>>
>>>> Most of the web pages I want to scrape use javascript to construct the
>>>> DOM, which makes Soup, XMLHTMLParser, etc. useless.
>>>>
>>>> I've extended Torsten's Pharo-Chrome library and use that to navigate
>>>> the DOM in a way similar to Soup:
>>>>
>>>> https://github.com/akgrant43/Pharo-Chrome
>>>>
>>>> This gets around the issue with javascript since it waits for the
>>>> browser to load the page, run the javascript and construct the DOM.
>>>>
>>>> HTH,
>>>> Alistair
>>>>
>>>>
>>>>
>>>>>         i don’t want to collect them all
>>>>> i have the XPath .pdf but i haven’t read it yet
>>>>>
>>>>> these browsers seem to gobble up memory
>>>>>      and while open they just keep getting bigger till the OS session crashes
>>>>>      might there be a browser that is more minimal?
>>>>>
>>>>> Vivaldi seems better at not bloating up RAM
>>>>
>>
>
