[Pharo-users] Problems loading XML System ( was Re: [Zinc] ZnInvalidUTF8: Illegal leading byte for utf-8 encoding)

PBKResearch Mon, 15 May 2017 15:52:23 -0700

Monty

As an update, I have rebuilt from the Moose 6.0 download. The version of 
XML-Parser in that was dated 18 July 2016 (configuration monty.233), so I 
installed versions of XML-Parser-HTML and XML-Parser-StAX contemporary with 
that. (The respective configurations are monty.48 and monty.39). With these 
versions all my previous XMLHTMLParser operations work as before, and I have 
been able to use the StAX parser in a simple way. So I can start exploring as I 
intended.


I have made repeated attempts to update this rebuilt image to more recent 
versions of the HTML and StAX parsers, and every time I run into the same error 
reported below. I started from the latest version and worked backwards, but 
gave up quickly; it takes about 6 minutes on my machine to load and compile a 
version, and it soon gets tedious. If I feel more enthusiastic tomorrow, I 
might start working forwards from my current versions.

Anyway, I now have a working system with the StAX and HTML parsers, so I can 
continue to explore.

Best wishes

Peter Kenny

-----Original Message-----
From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of 
PBKResearch
Sent: 15 May 2017 20:44
To: 'Any question about pharo is welcome' <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] [Zinc] ZnInvalidUTF8: Illegal leading byte for utf-8 
encoding

Monty

I have just started trying to use the StAX parsers, and I have found that the 
update has introduced a problem, which means that XMLHTMLParser no longer works 
on examples I have used before. I updated to 
ConfigurationOfXMLParser(monty.302), which is the latest version on the 
smalltalkhub repository, and then used the load version in the class comment, 
which loads the stable default. Similarly, I loaded 
ConfigurationOfXMLParserHTML(monty.62) and 
ConfigurationOfXMLParserStAX(monty.51), again using stable and default. When I 
try to run the XMLHTMLParser example I quoted below, I get an error message 
'MessageNotUnderstood: receiver of "critical:" is nil'. The same message comes 
up with anything else I try with XMLHTMLParser or with StAXHTMLParser.

I am not really up to using the debugger on someone else's code, but the one 
thing I can see is that the problem lies in XMLKeyValueCache>>critical:, which 
has the code:

    ^ self mutex critical: aBlock

The problem is that mutex is nil, so critical: is sent to nil.
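
To illustrate the failure mode: critical: is sent to whatever "self mutex" answers, so a nil answer gives exactly this MessageNotUnderstood. A common general guard against an uninitialized instance variable is lazy initialization in the accessor. The sketch below uses a hypothetical DemoCache class, written in the usual "Class >> selector" listing notation. It is not the XMLParser source and not a confirmed fix for this error.

```smalltalk
"Hypothetical class for illustration only; not part of XMLParser."
Object subclass: #DemoCache
	instanceVariableNames: 'mutex'
	classVariableNames: ''
	package: 'Demo'.

DemoCache >> mutex
	"Create the mutex on first use, so an instance whose
	 variable was never set still answers a usable Mutex."
	^ mutex ifNil: [mutex := Mutex new]

DemoCache >> critical: aBlock
	"Same shape as the method quoted above."
	^ self mutex critical: aBlock
```

Whether the real accessor can be nil because of instances created before the update, or for some other reason, is not something I can tell from the error alone.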

In my enthusiasm, I saved the updated image with the same name as the old 
image, which is now therefore overwritten. If I cannot solve this problem, my 
only way out is to rebuild my image from the Moose 6.0 download. Any 
suggestions gratefully received.

Thanks in advance

Peter Kenny

-----Original Message-----
From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of 
PBKResearch
Sent: 15 May 2017 19:16
To: 'Any question about pharo is welcome' <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] [Zinc] ZnInvalidUTF8: Illegal leading byte for utf-8 
encoding

Monty

Many thanks for this. My original purpose was just to answer Paul deBruicker's 
query, namely to parse an html file and stop reading at the end of the <head> 
section. I solved this by trial and error using the code shown below (which 
actually stops at the opening tag of the body). This was not my problem at all, 
but Paul's; I just tackled it for fun.

However, your note has prompted me to update my version of the whole XML system 
- I was using the version I downloaded with Moose 6.0, which was dated August 
2016. I am looking at the StAX parsers as a possible way of simplifying what I 
currently do, which involves downloading an entire web page as a DOM and then 
manipulating it with XPath to extract the bits I am interested in. I may be 
able to use StAX to do some of the selection and manipulation as I am reading.

It's all a new topic to me, so I foresee a lot of experimentation. It all helps 
to keep the grey matter active.

Thanks again

Peter Kenny

-----Original Message-----
From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of 
monty
Sent: 15 May 2017 12:15
To: pharo-users@lists.pharo.org
Subject: Re: [Pharo-users] [Zinc] ZnInvalidUTF8: Illegal leading byte for utf-8 
encoding

For that kind of incremental parsing, you could also use XMLParserStAX, a 
pull-parser that parses a document as a stream of event objects you control 
with #next, #peek, and #atEnd. It also supports pull-DOM parsing with messages 
like #nextNode, #nextElement, and #nextElementNamed:, which return the next 
event object(s) as DOM subtrees (searchable with XPath). See the StAXParser 
class comment for an example. (The StAXHTMLParser class requires XMLParserHTML 
be installed to work.)
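
As a rough sketch of the two styles described above, for the "stop at the end of <head>" case: this is not taken from the StAXParser class comment. It assumes the parser is created with on: and that event objects answer isEndTag and name; only #next, #peek, #atEnd and #nextElementNamed: are named above. The HTML string is made up.

```smalltalk
| html parser events head |
html := '<html><head><title>Demo</title></head><body><p>Unread</p></body></html>'.

"Event style: consume events until the end tag of head is next.
 Assumes events answer #isEndTag and #name."
parser := StAXHTMLParser on: html.
events := OrderedCollection new.
[parser atEnd
	or: [parser peek isEndTag and: [parser peek name = 'head']]]
	whileFalse: [events add: parser next].

"Pull-DOM style: ask for the head element as a DOM subtree."
parser := StAXHTMLParser on: html.
head := parser nextElementNamed: 'head'.
```

In both cases the parser is never asked for anything past the head section, so the body should be left unread.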
