On Sat, May 9, 2015 at 8:18 AM, PBKResearch <pe...@pbkresearch.co.uk> wrote:

> Sven
>
> Many thanks for the quick response. I always like to try to solve problems
> myself before appealing for help, so I had worked out what was wrong, but
> did not know how to tell Zinc to use a specific encoding. I had read
> through your very full note on Zinc, but did not find the trick you
> describe, which works perfectly, of course.
>
> It seems unfortunate that Zinc does not use the encoding specified in the
> HTML head. Evidently browsers like Firefox do, since the page displays
> correctly. If that cannot be done, I think it would be helpful to
> reconsider the error message produced when the user is dumped out, because
> in this context it is misleading.


Now that we have moldable tools by default, I wonder whether ZnResponse
(which I guess people will typically inspect while troubleshooting) might
have a tab called something like "Someone Else's Problem" or "Protocol
Errors".

cheers -ben



> I spent some time tracing debugger output, trying to work out what was
> wrong with the UTF-8, before I spotted that one of the bytes was displayed
> in character form as $ö and began to suspect a different encoding; I
> finally confirmed this by reading the page source in Firefox.
>
> Thanks again for your help.
>
> Peter Kenny
>
> -----Original Message-----
> From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf
> Of Sven Van Caekenberghe
> Sent: 08 May 2015 20:04
> To: Any question about pharo is welcome
> Subject: Re: [Pharo-users] Problem using Zinc in Pharo 4 (Moose 5.1)
>
> Peter,
>
> Thanks for the URL, it makes it much easier to help you.
>
> The answer is easy: the server is at fault. It serves a specific encoding
> without declaring it.
>
> Consider:
>
> (ZnClient new
>    head: 'http://kompakt.handelsblatt-service.com/ff/display.php?msgID=725164109&adr=pe...@pbkresearch.co.uk';
>    response) contentType.
>
>  => 'text/html'
>
> If no charset/encoding is specified, the modern default is UTF-8, so Zn
> tries that but fails.
>
> You can change the default for an unspecified encoding as follows:
>
> ZnDefaultCharacterEncoder
>   value: ZnByteEncoder iso88591
>   during: [
>     ZnClient new
>       get: 'http://kompakt.handelsblatt-service.com/ff/display.php?msgID=725164109&adr=pe...@pbkresearch.co.uk'
> ].
>
> The server should have used the following MIME type to avoid the confusion:
>
> ZnMimeType textHtml charSet: #iso88591
>
>   => 'text/html;charset=iso88591'
>
> HTH,
>
> Sven
>
> PS: the encoding declared inside the document cannot be used because (1) Zn
> does not interpret the contents of documents and (2) by that point it is too
> late: the contents have already been converted from bytes to characters.
>
> > On 08 May 2015, at 18:51, PBKResearch <pe...@pbkresearch.co.uk> wrote:
> >
> > Hello
> >
> > I have been trying to use Soup class>>fromUrl: to access the contents
> > of a web page. It halts with a message from Zinc about malformed UTF-8. The
> > page displays perfectly in Firefox, so I copied the page source from there
> > to a local file and tried to read it from there. Again a message from Zinc:
> > 'Invalid utf8 input detected'. It’s strange, because the page is not in
> > UTF-8. The head contains: <meta content="text/html; charset=ISO-8859-1"
> > http-equiv="Content-Type">. I have tried to find how to specify the
> > character set in reading files with Zinc, but without success.*
> >
> > If it’s relevant, I am using Pharo4.0 Latest update: #40613, downloaded
> > two days ago. The address of the web page is:
> > http://kompakt.handelsblatt-service.com/ff/display.php?msgID=725164109&adr=pe...@pbkresearch.co.uk.
> > Other pages from the same source are loaded and analysed with no problem.
> > Processing this page seems to go off course as soon as it encounters the
> > character code 246, which is a correct o-umlaut in ISO-8859-1.
> >
> > Any advice gratefully received.
> >
> > Peter Kenny
> >
> > *I would be happy with advice to RTFM, if someone would point out the
> > relevant bit of the FM.
