Hi Sven,
I didn't tell the whole truth :-)
I'm /mainly/ parsing the header (extracting published dates). For some
sites, however, I have to resort to finding a date in the body.
CU,
Udo
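A rough Python sketch of the strategy Udo describes: look in the HTML <head> first, and fall back to searching for a date elsewhere. This is not the code used in this thread. It assumes the date sits in a <meta> tag whose property or name is one of a few hypothetical keys (DATE_META_KEYS, e.g. article:published_time). It also assumes a fallback date is written as YYYY-MM-DD. Real sites vary widely. For simplicity, the fallback scans the whole document rather than only the body.

```python
from html.parser import HTMLParser
import re

# Hypothetical set of <meta> keys that often carry a publication date.
DATE_META_KEYS = {"article:published_time", "date", "pubdate"}

# Hypothetical fallback pattern: an ISO-style YYYY-MM-DD date anywhere in the text.
ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


class PublishedDateParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.published = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter; keep the first matching date found.
        if tag != "meta" or self.published is not None:
            return
        a = dict(attrs)
        key = a.get("property") or a.get("name")
        if key in DATE_META_KEYS and a.get("content"):
            self.published = a["content"]


def find_published_date(html):
    # 1. Prefer a date declared in a <meta> tag.
    parser = PublishedDateParser()
    parser.feed(html)
    parser.close()
    if parser.published:
        return parser.published
    # 2. Otherwise fall back to the first YYYY-MM-DD found in the document.
    m = ISO_DATE.search(html)
    return m.group(1) if m else None
```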
On 12/05/17 at 09:03, Norbert Hartl wrote:
Just to mention: if you are not interested in the content body, you could do a
HEAD request instead of a GET.
Norbert
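To illustrate the HEAD-versus-GET suggestion outside Pharo, here is a minimal Python sketch using only the standard library. The helper names head_request and fetch_headers are hypothetical. A HEAD response carries only the HTTP status and response headers, with no body. So it does not include the HTML document's <head> element either.

```python
import urllib.request


def head_request(url):
    # A HEAD request asks for the same status line and headers a GET would
    # return, but the server sends no message body.
    return urllib.request.Request(url, method="HEAD")


def fetch_headers(url, timeout=10):
    # Sends the request over the network and returns the response headers
    # as a plain dict (repeated header names collapse to one entry).
    with urllib.request.urlopen(head_request(url), timeout=timeout) as resp:
        return dict(resp.headers)
```

Calling fetch_headers needs network access. Building the request with head_request does not.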
On 11.05.2017 at 22:44, Udo Schneider <udo.schnei...@homeaddress.de> wrote:
Hi Sven,
that's perfect. To be honest, I don't care about the content; I'm just parsing
the header. And even if something in there is decoded wrongly... I can live with
that.
Thank you very, very much! For your help, but also for your work in general.
CU,
Udo
On 11/05/17 at 22:35, Sven Van Caekenberghe wrote:
Hi Udo,
On 11 May 2017, at 21:37, Udo Schneider <udo.schnei...@homeaddress.de> wrote:
All,
I'm hitting an error when fetching web content. The website does indeed
contain invalid characters.
The easiest way to reproduce:
ZnEasy get:
    'http://www.darkreading.com/partner-perspectives/malwarebytes/locky-returns-with-a-new-(borrowed)-distribution-method/a/d-id/1328723'
Is there any way to tell Zinc to simply ignore that error and to continue?
CU,
Udo
That server/page returns the MIME type text/plain without an explicit encoding
(charset) setting, so we have to guess. Like UTF-8, plain latin1 (ISO-8859-1)
does not work.
The following does work, but you can't be sure everything was decoded correctly
(beLenient passes some bytes through as they are).
ZnDefaultCharacterEncoder
    value: ZnCharacterEncoder latin1 beLenient
    during: [
        ZnClient new
            get:
                'http://www.darkreading.com/partner-perspectives/malwarebytes/locky-returns-with-a-new-(borrowed)-distribution-method/a/d-id/1328723';
            yourself ].
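The guessing logic above can be sketched in Python as an analogy. This is not Zinc's implementation. Use the charset from Content-Type when one is declared. Otherwise try UTF-8 strictly, then fall back to a lenient single-byte decode. Python's latin-1 codec maps all 256 byte values and never fails. So to show the strict-versus-lenient difference, the sketch swaps in cp1252, which leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90 and 0x9D). The error-handler name passthrough and all function names are hypothetical.

```python
import codecs
from email.message import Message


def _passthrough(err):
    # Error handler: emit each undecodable byte as the code point with the
    # same numeric value (b"\x81" becomes "\x81") and resume after it.
    bad = err.object[err.start:err.end]
    return "".join(chr(b) for b in bad), err.end


# "passthrough" is a hypothetical handler name registered for this sketch.
codecs.register_error("passthrough", _passthrough)


def charset_from_content_type(content_type):
    # Parse the charset parameter of a Content-Type value (lower-cased),
    # or return None when no charset is declared.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body, content_type, fallback="cp1252"):
    charset = charset_from_content_type(content_type)
    if charset:
        # The server declared an encoding: trust it and decode strictly.
        return body.decode(charset)
    try:
        # No charset given, so guess: UTF-8 first, strictly.
        return body.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to a single-byte guess, passing undefined bytes through.
        return body.decode(fallback, errors="passthrough")
```

As with beLenient, a decode that succeeds this way only means no error was raised. The fallback can still produce wrong characters if the guess was wrong.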
Earlier today I added some API so that the following should also work (you
need to load Zn #bleedingEdge first).
ZnClient new
    defaultEncoder: ZnCharacterEncoder latin1 beLenient;
    get:
        'http://www.darkreading.com/partner-perspectives/malwarebytes/locky-returns-with-a-new-(borrowed)-distribution-method/a/d-id/1328723';
    yourself.
HTH,
Regards,
Sven