Hi Monty

Just to say that I have obtained the Moose 6.0 image (Pharo5.0 Latest update: #50761) and installed the XMLHTMLParser, and I seem able to reproduce the nucleus of the results I had from my old image. Some of my old XPath strings do not work (e.g. it did not recognise [1]), but I have worked my way round that, and I should soon have worked out the new syntax.

Thanks for your suggestions for the variable case; I can now see the way ahead for that. Thanks for all the help.

@stef I have loaded the same version of TextLint as I had in the previous image. It was accepted with no problems. I have tested it for the limited uses I make of it (just the parsers, not the rules) and everything seems OK. So now I am upgraded to Pharo 5, so far with no problems!

Peter Kenny

-----Original Message-----
From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of monty
Sent: 03 September 2016 13:00
To: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] Coding XPath as Smalltalk

> Sent: Saturday, September 03, 2016 at 5:30 AM
> From: PBKResearch <pe...@pbkresearch.co.uk>
> To: "'Any question about pharo is welcome'" <pharo-users@lists.pharo.org>
> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
>
> Hi Monty
>
> Many thanks. I have picked up a project that I had not worked on for a while,
> which explains why I am using an old image. I shall try the latest Moose
> image, as you suggest. My only anxiety is that I need to be able to use a
> rather ancient package called TextLint, and I do not know whether it will
> load OK in a new Pharo. If not, I shall try to update my existing image.

If you'd looked at the CI job, you'd see that XPath builds on Pharo 5 through 3 (but should work back to 1.4). You can always start fresh with a clean, old image from http://files.pharo.org/image/ or the Moose website if TextLint doesn't work anymore.

> With the latest XPath, will it be clear how to use the binary syntax to carry
> out node tests like the example of '//div[@id=''catlinks'']//' that I cited
> below? The case I am interested in is where the actual identifier ('catlinks'
> in this case) is a variable rather than a constant. It would be possible to
> do it in standard XPath by assembling the XPath string with a variable
> component, but it might be more convenient in the binary syntax.
>

You could do this:

    ((doc // 'div') select: [:each | (each attributeAt: 'id') = catlinks]) // 'li' // 'text()'

where "catlinks" is a var.

Or you could use xPath:context: with an XPath var that you dynamically bind using custom contexts:

    doc
        xPath: '//div[@id=$catlinks]//li//text()'
        context: (XPathContext variables: {'catlinks' -> catlinks})

The advantage over this:

    doc xPath: '//div[@id=''', catlinks, ''']//li//text()'

is that the xPath: expression string is the same each time, so it's only compiled once, the first time, and cached for later uses (inspect 'XPath compiledXPathCache') instead of being compiled each time the xPath: expression string arg changes.

> Many thanks for your help.
>
> Peter Kenny
>
> -----Original Message-----
> From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of
> monty
> Sent: 03 September 2016 06:54
> To: pharo-users@lists.pharo.org
> Subject: Re: [Pharo-users] Coding XPath as Smalltalk
>
> Peter, you're using an ancient version with bugs that were fixed last fall.
> The newest version has more tests and correct behavior (checked against a
> reference implementation). Just download a new Moose image and you'll get it,
> along with an up to date XMLParser. (But if you insist on upgrading in your
> old image, run "XPath initialize" after.)
>
> The binary syntax (there are keyword equivalents now) officially only
> supports XPath axis selectors like #/ and #// that take node test arguments,
> where the node tests can be name tests like 'name', '*', 'prefix:*' or type
> tests like 'text()', 'comment()', 'element(name)'.
>
> Filters aren't officially supported with that syntax, but you can always use
> select: on the result. ?? was removed, but I might add it back as shorthand.
> Filters are implemented differently now.
>
> > From: PBKResearch <pe...@pbkresearch.co.uk>
> > To: pharo-users@lists.pharo.org
> > Subject: [Pharo-users] Coding XPath as Smalltalk
> >
> > Hello
> >
> > I am using XPath as a way of dissecting web pages, especially from
> > Wiktionary.
> > Generally I get good results, but I could get useful extra
> > flexibility by using the binary Smalltalk operators to represent XPath, as
> > mentioned at the end of the class comment for XPath. However, the
> > description there is very terse, and I am having difficulty seeing how to
> > include more complex expressions, especially attribute tests. I have put
> > some of my XPath expressions through the XPath compiler and looked at the
> > output, and out of that I have found expressions which work but look very
> > clumsy. As an example, I have used the fragment:
> >
> >     document xPath: '//div[@id=''catlinks'']//li//text()'
> >
> > and found that an equivalent is:
> >
> >     document //'div' ?? [:node :x :y|(node attributeAt: 'id') =
> >     'catlinks']//'li'//[:n| n isStringNode].
> >
> > (I had to put two dummy arguments in the three-argument block to get it to
> > work.)
> >
> > Is there a more extensive explanation of the use of these binary operators?
> > If not, could some kind person show me the most concise translation of the
> > sample XPath above, to give me a start in working out more complex cases?
> >
> > Many thanks for any help.
> >
> > Peter Kenny