On Sat, May 9, 2015 at 8:18 AM, PBKResearch <pe...@pbkresearch.co.uk> wrote:
> Sven
>
> Many thanks for the quick response. I always like to try to solve problems
> myself before appealing for help, so I had worked out what was wrong, but
> did not know how to tell Zinc to use a specific coding. I had tried by
> reading through your very full note on Zinc, but did not find the trick you
> describe - which works perfectly, of course.
>
> It seems unfortunate that Zinc does not use the coding specified in the
> html head. Evidently browsers like Firefox must do it, since the page
> displays correctly. If it cannot be done, I think it would be helpful to
> reconsider the error message produced when the user is dumped out, because
> in this context it is misleading.

Now we have moldable tools by default, I wonder if ZnResponse (which I guess
typically people will inspect while troubleshooting) might have a tab called
something like "Someone Else's Problem" or "Protocol Errors".

cheers -ben

> I spent some time tracing debugger output, trying to work out what was
> wrong with the UTF-8, before I spotted that one of the bytes was displayed
> in character form as $ö, and began to suspect it might be a different
> coding; I finally confirmed this by reading the page source in Firefox.
>
> Thanks again for your help.
>
> Peter Kenny
>
> -----Original Message-----
> From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf
> Of Sven Van Caekenberghe
> Sent: 08 May 2015 20:04
> To: Any question about pharo is welcome
> Subject: Re: [Pharo-users] Problem using Zinc in Pharo 4 (Moose 5.1)
>
> Peter,
>
> Thanks for the URL, it makes it much easier to help you.
>
> The answer is easy: the server is incorrect, it serves a specific encoding
> without saying so.
>
> Consider:
>
> (ZnClient new
>    head: 'http://kompakt.handelsblatt-service.com/ff/display.php?msgID=725164109&adr=pe...@pbkresearch.co.uk';
>    response) contentType.
>
> => 'text/html'
>
> If no charset/encoding is specified, the modern default is UTF-8, so Zn
> tries that but fails.
>
> You can change the default for unspecified encoding as follows:
>
> ZnDefaultCharacterEncoder
>    value: ZnByteEncoder iso88591
>    during: [
>      ZnClient new
>        get: 'http://kompakt.handelsblatt-service.com/ff/display.php?msgID=725164109&adr=pe...@pbkresearch.co.uk' ].
>
> The server should have used the following mime type to avoid the confusion:
>
> ZnMimeType textHtml charSet: #iso88591
>
> => 'text/html;charset=iso88591'
>
> HTH,
>
> Sven
>
> PS: the encoding inside the document cannot be used because (1) no
> interpretation inside documents is done and (2) at that point it is too
> late, the contents is already converted from bytes to characters
>
> > On 08 May 2015, at 18:51, PBKResearch <pe...@pbkresearch.co.uk> wrote:
> >
> > Hello
> >
> > I have been trying to use Soup class>> fromUrl: to access the contents
> > of a web page. It halts with a message from Zinc about malformed UTF-8. The
> > page displays perfectly in Firefox, so I copied the page source from there
> > to a local file and tried to read it from there. Again a message from Zinc:
> > 'Invalid utf8 input detected'. It’s strange, because the page is not in
> > UTF-8. The head contains: <meta content="text/html; charset=ISO-8859-1"
> > http-equiv="Content-Type">. I have tried to find how to specify the
> > character set in reading files with Zinc, but without success.*
> >
> > If it’s relevant, I am using Pharo4.0 Latest update: #40613, downloaded
> > two days ago. The address of the web page is:
> > http://kompakt.handelsblatt-service.com/ff/display.php?msgID=725164109&adr=pe...@pbkresearch.co.uk.
> > Other pages from the same source are loaded and analysed with no problem.
> > Processing this page seems to go off course as soon as it encounters the
> > character code 246, which is a correct o-umlaut in ISO-8859-1.
> >
> > Any advice gratefully received.
> >
> > Peter Kenny
> >
> > *I would be happy with advice to RTFM, if someone would point out the
> > relevant bit of the FM.
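
The failure discussed in this thread can be reproduced without any network access. What follows is only a rough playground sketch, not Zinc's documented behaviour: the byte values and variable names are invented for illustration (the bytes spell "schön" as an ISO-8859-1 server would send it), `ZnByteEncoder iso88591` is taken from Sven's message, and the sketch assumes that `decodeBytes:` is available on the Zinc encoders in your image. The exact error text may differ between Zinc versions.

```smalltalk
"Illustrative bytes only: 'schön' in ISO-8859-1, where 246 is o-umlaut."
| bytes latin1 utf8Outcome |
bytes := #[115 99 104 246 110].

"Decoding with the encoding the page really uses should succeed."
latin1 := ZnByteEncoder iso88591 decodeBytes: bytes.

"Decoding the same bytes as UTF-8 is expected to fail: 246 starts a
 multi-byte sequence, but the following 110 ($n) is not a continuation byte.
 The error is caught generically and its description kept for inspection."
utf8Outcome := [ ZnUTF8Encoder new decodeBytes: bytes ]
	on: Error
	do: [ :error | error description ].
```

Inspecting `latin1` and `utf8Outcome` side by side shows why the page looked like broken UTF-8 even though its bytes were valid ISO-8859-1.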