Siemen,

Stef should have added that XPath depends on using Monty's XMLParser suite. I tried your snippet on XMLDOMParser, and it parses correctly. I always use XMLHTMLParser for parsing HTML, because I can always see the exact relationship between the parsed structure and the original HTML. With Soup, I often found that relationship difficult or even impossible to see.
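
For anyone following along, here is a minimal playground sketch of the XPath route applied to the snippet from the original message. It assumes the XMLParserHTML and XPath packages are loaded in the image; the selectors `parse:`, `xpath:` and `stringValue` are written from memory of those packages and should be checked against the version you have loaded.

```smalltalk
"Assumption: XMLParserHTML and XPath are loaded in the image."
| html doc spans |

"The same nested markup as in the failing Soup test."
html := '<div><button>
<span>text</span>
</button>
</div>'.

"XMLHTMLParser builds a DOM tree that mirrors the HTML source."
doc := XMLHTMLParser parse: html.

"Evaluate an XPath expression with the document as the context node."
spans := doc xpath: '//div/button/span'.

"Print or inspect this to see the text content of the first match."
spans first stringValue
```

If the span is found under the button here, the tree has kept the nesting that the Soup test expects.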

HTH

Peter Kenny

-----Original Message-----
From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of Stephane Ducasse
Sent: 08 November 2017 21:19
To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] Soup bug(fix)

Hi Siemen,

Let me know your login and I can add you as a committer.
Paul is also taking care of Soup.
Now I like XPath for scraping. Did you see the tutorial I wrote with Peter?

Stef

On Wed, Nov 8, 2017 at 2:17 PM, Siemen Baader <siemenbaa...@gmail.com> wrote:
> Hi all,
>
> who maintains Soup, the HTML parser? Stef?
>
> It seems to auto-close <button> (and <a>) tags when nested inside
> another element. I wrote this test that fails:
>
> testNestedButton
>     "this works with nested <div> tags instead of <button> and when
>     there is no enclosing <div> at all. but here <button> is auto-closed."
>
>     "a does not work either"
>
>     | soup |
>     soup := Soup
>         fromString:
> '<div><button>
> <span>text</span>
> </button>
> </div>'.
>     self assert: soup div button span string equals: 'text'
>
> ----
>
>
> Where should I look to prevent Soup from auto-closing the tag, and
> where & how should I submit my fix?
>
> cheers,
> Siemen