Hello

 

I have been trying to use Soup class>>fromUrl: to access the contents of a
web page. It halts with a message from Zinc about malformed UTF-8. The page
displays perfectly in Firefox, so I copied the page source from there to a
local file and tried to read that instead. Again I got a message from Zinc:
'Invalid utf8 input detected'. This is strange, because the page is not in
UTF-8; the head contains: <meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">. I have tried to find out how to specify the
character set when reading files with Zinc, but without success.*

 

If it's relevant, I am using Pharo 4.0, latest update #40613, downloaded two
days ago. The address of the web page is:
http://kompakt.handelsblatt-service.com/ff/display.php?msgID=725164109&adr=pe...@pbkresearch.co.uk
Other pages from the same source are loaded and analysed with no problem.
Processing of this page seems to go off course as soon as it encounters the
character code 246, which is a correct o-umlaut in ISO-8859-1.

 

Any advice gratefully received.

 

Peter Kenny

 

*I would be happy with advice to RTFM, if someone would point out the
relevant bit of the FM.
