Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

PBKResearch Sat, 30 Nov 2019 16:01:03 -0800

Sean

I used Soup a few times, but found it difficult to interpret the output,
because the parse did not seem to reflect the hierarchy of the nodes in the
original; in particular, sibling nodes were not necessarily at the same
level in the Soup. XMLHTMLParser always gets the structure right, in my
experience. I think this is essential if you want to use XPath to process
the parse. The worked examples in the scraping booklet show how the parser
and XPath can work together.
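
A minimal sketch of that combination, not taken from the booklet: it
assumes the XPath package is loaded alongside XMLParserHTML, and the HTML
string is made up for illustration. Because the parse keeps the two <li>
elements as siblings under the same <ul>, a single XPath expression can
select both.

```smalltalk
"Sketch only: assumes the XMLParserHTML and XPath packages are loaded.
 The HTML string is invented for illustration."
| doc items |
doc := XMLHTMLParser parse: '<ul><li>one</li><li>two</li></ul>'.

"Both <li> elements are children of the same <ul>, so one XPath
 expression selects them as siblings."
items := doc xpath: '//ul/li'.

"Collect the text content of each selected element."
items collect: [ :each | each contentString ]
```

If the parse placed the second <li> at a different depth, as can apparently
happen with Soup, a path such as //ul/li would miss it.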

HTH

Peter Kenny

-----Original Message-----
From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Sean P. DeNigris
Sent: 30 November 2019 16:43
To: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

cedreek wrote
> To me, far better than using Soup. 

Ah, interesting! I use Soup almost exclusively. What did you find superior
about XMLParserHTML? I may give it a try...

cedreek wrote
> Google chrome pharo integration helps too to scrape complex full JS web 
> sites like google ;)

Also interesting! Any publicly available examples? How does one load "Google
chrome pharo integration"? Also, there is often the "poor man's" way (albeit
requiring manual intervention) by inspecting the Ajax http requests in a
developer console and then recreating directly in Pharo.

-----
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
