Hi Cédrik

 

I started out using Soup, but I found out that it does what its name suggests, 
and jumbles up the contents of the pages. I now parse the pages with 
XMLHTMLParser, which preserves the original structure exactly. The point of 
XPath is that it is a convenient way of specifying a route through the 
structure to the desired information. So the XPath I cited says ‘find a DIV 
node, at any depth, which has id="catlinks", then find any descendant which is 
an LI node, then find the text nodes among its descendants.’
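
A minimal sketch of that route, for a Pharo playground with the XMLParserHTML and XPath packages loaded. The sample HTML is made up for illustration, and I am assuming that text nodes answer their contents to #string; treat it as a starting point rather than a tested recipe.

```smalltalk
| html document texts |
"Hypothetical sample page: only the div with id='catlinks' should match."
html := '<html><body>
<div id="content"><ul><li>not wanted</li></ul></div>
<div id="catlinks"><ul>
<li><a href="/c1">English nouns</a></li>
<li><a href="/c2">English lemmas</a></li>
</ul></div>
</body></html>'.

"Parse into a DOM tree that keeps the page structure."
document := XMLHTMLParser parse: html.

"Walk: any DIV with id='catlinks', then any LI below it, then any text node below that."
texts := OrderedCollection new.
(document xPath: '//div[@id=''catlinks'']//li//text()')
	do: [:each | texts add: each string].	"assumption: XMLString>>string"

texts inspect.
```

The LI inside the first DIV is skipped because the id test restricts the search to the catlinks DIV before the LI step is applied.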

 

Peter

 

 

Hi Peter,

 

I have never used XPath, so I cannot help there. I just wonder if you could not 
use Soup to « dissect » your web pages?

http://www.smalltalkhub.com/#!/~PharoExtras/Soup

 

HTH,

 

Cédrik

 

On 1 Sept. 2016 at 15:26, PBKResearch <pe...@pbkresearch.co.uk> wrote:

 

Hello

 

I am using XPath as a way of dissecting web pages, especially from Wiktionary. 
Generally I get good results, but I could get useful extra flexibility by using 
the binary Smalltalk operators to represent XPath, as mentioned at the end of 
the class comment for XPath. However, the description there is very terse, and 
I am having difficulty seeing how to include more complex expressions, 
especially attribute tests. I have put some of my XPath expressions through the 
XPath compiler and looked at the output, and out of that I have found 
expressions which work but look very clumsy. As an example, I have used the 
fragment:

 

document xPath: '//div[@id=''catlinks'']//li//text()'

 

and found that an equivalent is:

 

document //'div' ?? [:node :x :y| (node attributeAt: 'id') = 'catlinks']
	//'li' //[:n| n isStringNode].

(I had to put two dummy arguments in the three-argument block to get it to 
work.) 

 

Is there a more extensive explanation of the use of these binary operators? If 
not, could some kind person show me the most concise translation of the sample 
XPath above, to give me a start in working out more complex cases?

 

Many thanks for any help.

 

Peter Kenny

 
