Re: [Pharo-users] [Zinc] ZnInvalidUTF8: Illegal leading byte for utf-8 encoding

Udo Schneider Fri, 12 May 2017 11:49:53 -0700

Hi Sven,

I didn't tell the whole truth :-)

I'm /mainly/ parsing the header (extracting published dates). For some sites, however, I have to resort to finding a date in the body.


CU,

Udo


On 12/05/17 at 09:03, Norbert Hartl wrote:

Just to mention: if you are not interested in the content body, you could do a 
HEAD request instead of a GET.

Norbert

On 11.05.2017 at 22:44, Udo Schneider <udo.schnei...@homeaddress.de> wrote:

Hi Sven,

that's perfect. To be honest I don't care about the content - I'm just parsing 
the header. And even if there is a wrong decoding in there... I can live with 
that.

Thank you very very much! For your help but also your stuff in general.

CU,

Udo

On 11/05/17 at 22:35, Sven Van Caekenberghe wrote:
Hi Udo,

On 11 May 2017, at 21:37, Udo Schneider <udo.schnei...@homeaddress.de> wrote:

All,

I'm hitting an error where fetching web content fails. The website does indeed 
use invalid characters.

The easiest way to reproduce:

ZnEasy get: 'http://www.darkreading.com/partner-perspectives/malwarebytes/locky-returns-with-a-new-(borrowed)-distribution-method/a/d-id/1328723'

Is there any way to tell Zinc to simply ignore that error and to continue?

CU,

Udo

That server/page has a mime-type of text/plain with no explicit encoding 
(charset) setting, so we have to guess. Neither utf-8 nor pure latin1 
(ISO-8859-1) works. The following does work, but you can't be sure everything 
went well (beLenient takes some bytes as they are).
ZnDefaultCharacterEncoder
   value: ZnCharacterEncoder latin1 beLenient
   during: [
     ZnClient new
       get: 'http://www.darkreading.com/partner-perspectives/malwarebytes/locky-returns-with-a-new-(borrowed)-distribution-method/a/d-id/1328723';
       yourself ].
I added some API earlier today, so that the following should also work (you 
need to load Zn #bleedingEdge first).
ZnClient new
   defaultEncoder: ZnCharacterEncoder latin1 beLenient;
   get: 'http://www.darkreading.com/partner-perspectives/malwarebytes/locky-returns-with-a-new-(borrowed)-distribution-method/a/d-id/1328723';
   yourself.
HTH,
Regards,
Sven
