In another thread (on SVG Icons) Sven referred to ways of getting input from a URL for XMLDOMParser. I have recently had some problems doing this. I have found a workaround, so it is not urgent, but I thought I should put it on record in case anyone else is bitten by it, and so maybe Monty can look at it.
I am using the subclass XMLHTMLParser, and my usual way of invoking it was:

1. XMLHTMLParser parseURL: <urlstring>.

This works in most cases, but with one particular site - http://www.corriere.it/....., which is an Italian newspaper - I had frequent failures, with the error message 'Invalid UTF8 encoding'. The parser has the option of parsing a string, which is obtained by other means, so I tried reading it in with Zinc:

2. XMLHTMLParser parse: <urlstring> asZnUrl retrieveContents.

And this worked, so clearly the encoding on the site is OK. I realised that the XML-Parser package has its own methods, which reproduce a lot of the functionality of Zinc, so I tried the equivalent:

3. XMLHTMLParser parse: <urlstring> asXMLURI get.

To my surprise, this worked equally well. I had expected problems, because presumably forms (1) and (3) use the same UTF8 decoding.

For now, I am using the form (3) for all my work, and have not had any problems since. So the message to anyone who is using the form (1) and getting problems is to try (2) or (3) instead.

I am using Moose 6.1 (Pharo 6.0 Latest update: #60486) on Windows 10. I think most articles on the Corriere web site will generate the error, but one which has always failed for me is:

http://www.corriere.it/esteri/17_ottobre_03/felipe-spagna-catalogna-discorso-8f7ac0d6-a86d-11e7-a090-96160224e787.shtml

I tried to trace through the error using the debugger, but it got too confusing. However, I did establish that the failure occurred early in decoding the HTML <head>, in the line beginning <meta name="description".. The only unusual thing at this point is the French-style open-quote: '«'. Whether this could explain the problem I don't know.

Any suggestions gratefully received.

Peter Kenny
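
P.S. For anyone who would rather keep form (1) and only switch when it fails, the advice above can be sketched roughly as follows. This is a minimal sketch, not a tested fix: it assumes the 'Invalid UTF8 encoding' failure is signalled as some subclass of Error (the exact exception class was not established above), and the variable names are only illustrative. The URL is left elided as in the text and needs to be replaced with a real article address.

```smalltalk
| url document |
"Elided in the text above; substitute a real article URL before running."
url := 'http://www.corriere.it/.....'.

document := [ XMLHTMLParser parseURL: url ]	"form (1)"
	on: Error	"assumption: the encoding failure is an Error subclass"
	do: [ :err |
		"Fall back to form (3): fetch the content first, then parse the string."
		XMLHTMLParser parse: url asXMLURI get ].
```

Catching Error is deliberately broad, so it will also hide unrelated failures such as network errors; narrowing it to the specific exception class would be better once that class is known.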