Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread PBKResearch
Sean, I used Soup a few times, but found it difficult to interpret the output, because the parse did not seem to reflect the hierarchy of the nodes in the original; in particular, sibling nodes were not necessarily at the same level in the Soup. XMLHTMLParser always gets the structure right, in my

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Cédrick Béler
I couldn’t get it from Zn because (I think) some JS libraries defer the full rendering. I have the same problem with a site in France (leboncoin). They use https://datadome.co to complicate web scraping, so a headless browser is the only solution I know. Cheers, C

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Esteban Maringolo
Why use Chrome instead of ZnClient? To get a "real" render of the content (including JS and whatnot)? Regards! Esteban A. Maringolo On Sat, Nov 30, 2019 at 8:11 PM Cédrick Béler wrote: > > Also interesting! Any publicly available examples? How does one load "Google chrome pharo in

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Cédrick Béler
> Also interesting! Any publicly available examples? How does one load "Google chrome pharo integration"? https://github.com/astares/Pharo-Chrome and https://github.com/akgrant43/Pharo-Chrome Cheers, Cédrick

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Cédrick Béler
> cedreek wrote >> To me, far better than using Soup. > Ah, interesting! I use Soup almost exclusively. What did you find superior about XMLParserHTML? I may give it a try... It’s mainly XPath, which I find easier than navigating the HTML tree with Soup or even with XMLHTMLParser. I us

Re: [Pharo-users] [ANN] XMLParserHTML moved to GitHub

2019-11-30 Thread Sean P. DeNigris
cedreek wrote > To me, far better than using Soup. Ah, interesting! I use Soup almost exclusively. What did you find superior about XMLParserHTML? I may give it a try... cedreek wrote > Google chrome pharo integration also helps to scrape complex full JS web sites like google ;) Also interestin