Monty

As an update, I have rebuilt from the Moose 6.0 download. The version of 
XML-Parser in that was dated 18 July 2016 (configuration monty.233), so I 
installed versions of XML-Parser-HTML and XML-Parser-StAX contemporary with 
that. (The respective configurations are monty.48 and monty.39.) With these 
versions all my previous XMLHTMLParser operations work as before, and I have 
been able to use the StAX parser in a simple way. So I can start exploring as I 
intended.

I have made repeated attempts to update this rebuilt image to more recent 
versions of the HTML and StAX parsers, and every time I run into the same error 
reported below. I started from the latest version and worked backwards, but 
gave up quickly; it takes about 6 minutes on my machine to load and compile a 
version, and it soon gets tedious. If I feel more enthusiastic tomorrow, I 
might start working forwards from my current versions.

Anyway, I now have a working system with the StAX and HTML parsers, so I can 
continue to explore.

Best wishes

Peter Kenny

-----Original Message-----
From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of 
PBKResearch
Sent: 15 May 2017 20:44
To: 'Any question about pharo is welcome' <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] [Zinc] ZnInvalidUTF8: Illegal leading byte for utf-8 
encoding

Monty

I have just started trying to use the StAX parsers, and I have found that the 
update has introduced a problem, which means that XMLHTMLParser no longer works 
on examples I have used before. I updated to 
ConfigurationOfXMLParser(monty.302), which is the latest version on the 
smalltalkhub repository, and then used the load version in the class comment, 
which loads the stable default. Similarly, I loaded 
ConfigurationOfXMLParserHTML(monty.62) and 
ConfigurationOfXMLParserStAX(monty.51), again using stable and default. When I 
try to run the XMLHTMLParser example I quoted below, I get an error message 
'MessageNotUnderstood: receiver of "critical:" is nil'. The same message comes 
up with anything else I try with XMLHTMLParser or with StAXHTMLParser.

I am not really up to using the debugger on someone else's code, but the one 
thing I can see is that the problem lies in XMLKeyValueCache>>critical:, which 
has the code:
    ^ self mutex critical: aBlock
The problem is that mutex is nil at that point.

In my enthusiasm, I saved the updated image with the same name as the old 
image, which is now therefore overwritten. If I cannot solve this problem, my 
only way out is to rebuild my image from the Moose 6.0 download. Any 
suggestions gratefully received.

Thanks in advance

Peter Kenny

-----Original Message-----
From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of 
PBKResearch
Sent: 15 May 2017 19:16
To: 'Any question about pharo is welcome' <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] [Zinc] ZnInvalidUTF8: Illegal leading byte for utf-8 
encoding

Monty

Many thanks for this. My original purpose was just to answer Paul deBruicker's 
query, namely to parse an HTML file and stop reading at the end of the <head> 
section. I solved this by trial and error using the code shown below (which 
actually stops at the opening tag of the body). This was not my problem at all, 
but Paul's; I just tackled it for fun.

However, your note has prompted me to update my version of the whole XML system 
- I was using the version I downloaded with Moose 6.0, which was dated August 
2016. I am looking at the StAX parsers as a possible way of simplifying what I 
currently do, which involves downloading an entire web page as a DOM and then 
manipulating it with XPath to extract the bits I am interested in. I may be 
able to use StAX to do some of the selection and manipulation as I am reading.

It's all a new topic to me, so I foresee a lot of experimentation. It all helps 
to keep the grey matter active.

Thanks again

Peter Kenny

-----Original Message-----
From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of 
monty
Sent: 15 May 2017 12:15
To: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [Zinc] ZnInvalidUTF8: Illegal leading byte for utf-8 
encoding

For that kind of incremental parsing, you could also use XMLParserStAX, a 
pull-parser that parses a document as a stream of event objects you control 
with #next, #peek, and #atEnd. It also supports pull-DOM parsing with messages 
like #nextNode, #nextElement, and #nextElementNamed:, which return the next 
event object(s) as DOM subtrees (searchable with XPath). See the StAXParser 
class comment for an example. (The StAXHTMLParser class requires XMLParserHTML 
be installed to work.)
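
As a rough illustration of the pull-DOM style described above, here is a minimal 
sketch for the "stop after the <head>" case. It is untested and rests on two 
assumptions: that StAXHTMLParser can be created with on: from a string, and that 
#nextElementNamed: skips forward to the next element with the given name and 
returns it as a DOM subtree. The sample HTML is made up for the example.

```smalltalk
| parser head |
"Assumption: StAXHTMLParser class>>on: accepts the HTML source as a String."
parser := StAXHTMLParser on:
	'<html><head><title>Demo</title></head><body><p>Not needed</p></body></html>'.

"#nextElementNamed: is one of the pull-DOM messages listed above. It is assumed
 here to consume events up to and including the <head> element, so the body is
 never turned into DOM nodes."
head := parser nextElementNamed: 'head'.

"Inspect the resulting subtree; it should be searchable with XPath like any
 other DOM node."
head inspect.
```

The StAXParser class comment remains the authoritative example; this only shows 
the shape of the calls.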



