Sean, I used Soup a few times, but found it difficult to interpret the output, because the parse did not seem to reflect the hierarchy of the nodes in the original; in particular, sibling nodes were not necessarily at the same level in the Soup. XMLHTMLParser always gets the structure right, in my experience. I think this is essential if you want to use XPath to process the parse. The worked examples in the scraping booklet show how the parser and XPath can work together.
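A minimal sketch of the parser/XPath combination described above, assuming the XMLParserHTML and XPath packages are loaded; the sample HTML string is invented for illustration:

```smalltalk
"Parse a (deliberately unclosed) HTML fragment; XMLHTMLParser repairs it
while keeping sibling <li> nodes at the same level in the tree."
| doc links |
doc := XMLHTMLParser parse: '<html><body>
    <ul><li><a href="/a">A</a><li><a href="/b">B</a></ul>'.

"Query the repaired tree with XPath: collect every link target."
links := XPath for: '//li/a/@href' in: doc.
links collect: [:each | each value]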
HTH
Peter Kenny

-----Original Message-----
From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Sean P. DeNigris
Sent: 30 November 2019 16:43
To: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

cedreek wrote
> To me, far better than using Soup.

Ah, interesting! I use Soup almost exclusively. What did you find superior about XMLParserHTML? I may give it a try...

cedreek wrote
> Google chrome pharo integration helps to scrape complex, full-JS web sites like Google ;)

Also interesting! Any publicly available examples? How does one load "Google chrome pharo integration"? There is also often the "poor man's" way (albeit requiring manual intervention): inspect the Ajax HTTP requests in a developer console and then recreate them directly in Pharo.

-----
Cheers,
Sean
--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
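The "poor man's" approach mentioned above can be sketched with Zinc and NeoJSON (both standard in Pharo); the endpoint URL and query parameter here are hypothetical stand-ins for whatever the developer console actually shows:

```smalltalk
"Replay an Ajax request copied from the browser's developer console
and decode its JSON payload. 'example.com/api/items' and the 'page'
parameter are invented placeholders."
| response |
response := ZnClient new
    url: 'https://example.com/api/items';
    queryAt: 'page' put: '1';
    accept: ZnMimeType applicationJson;
    get.
NeoJSONReader fromString: response
```

Inspecting the result in a Pharo inspector usually makes it quicker to locate the data of interest than scraping the rendered HTML would.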