> On 20 Jul 2021, at 11:03, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>
> Hi Tim,
>
> An introduction to this part of the system is in
> https://ci.inria.fr/pharo-contribution/job/EnterprisePharoBook/lastSuccessfulBuild/artifact/book-result/Zinc-Encoding-Meta/Zinc-Encoding-Meta.html
> [Character Encoding and Resource Meta Description] from the "Enterprise
> Pharo" book.
>
> The error means that the file you are trying to read as UTF-8 contains byte
> sequences that are invalid according to the UTF-8 standard.
>
> Are you sure the file is in UTF-8? Maybe it is in ASCII, Latin-1 or
> something else.
>
> It is possible to customise the encoding to something other than the
> default UTF-8. For non-UTF encoders, there is a strict/lenient option to
> disallow/allow illegal bytes (but then you will get these in your strings).
>
> I can show you how to do that if you want.
>
> "Default encoding (UTF-8)"
> '/var/log/system.log' asFileReference readStreamDo: [ :in | in upToEnd ].
>
> "Explicit ASCII encoding"
> '/var/log/system.log' asFileReference binaryReadStreamDo: [ :in |
>   (ZnCharacterReadStream on: in encoding: #ascii) upToEnd ].
>
> "ASCII encoder made lenient"
> '/var/log/system.log' asFileReference binaryReadStreamDo: [ :in |
>   (ZnCharacterReadStream on: in encoding: ZnCharacterEncoder ascii
>     beLenient) upToEnd ].
>
> HTH
>
> Sven
>
>> On 20 Jul 2021, at 10:31, Tim Mackinnon <tim@testit.works> wrote:
>>
>> Hi - I’m doing a bit of log file processing with Pharo - and I’ve hit an
>> unexpected error and am wondering what the best way to approach it is.
>>
>> It seems that I have a log file that has unexpected characters, and so my
>> readStream loop that reads lines gets an error: "ZnInvalidUTF8: Illegal
>> continuation byte for utf-8 encoding".
>>
>> For some reason this file (unlike my others) seems to contain characters
>> that it shouldn’t - but what is the best way for me to continue processing?
>> Should I be opening my files in a different way - or can I resume the error
>> somehow? I’m not familiar with this area of Pharo and am after a bit of
>> advice.
>>
>> My code is like this (and I get the error when doing nextLine)
>>
>>
>> parseStream: aFileStream with: aBlock
>>     | line items |
>>     [ (line := aFileStream nextLine) isNil ]
>>         whileFalse: [
>>             items := $/ split: line.
>>             items size = 3 ifTrue: [ aBlock value: items ] ]
>>
>> My stream is created like this:
>>
>> firmEfs := (pathName , '/' , firmName , '_files') asFileReference.
>> details parseStream: firmEfs readStream.
>>
>>
>> Should I be opening the stream a bit differently - or can I catch that
>> encoding error and resume it with some safe character?
>>
>> Thanks for any help.
>>
>> Tim
>
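
Putting the two messages together, one possible way to keep processing the
problem file is to hand parseStream:with: a lenient character stream instead
of the default UTF-8 one. This is only a sketch: pathName, firmName and
details are the names from the original post, the encoder expression is the
lenient ASCII one quoted above, and the block passed to with: is a
hypothetical placeholder. As noted above, the illegal bytes are let through,
so they will end up in the lines that get parsed.

```smalltalk
"Sketch only: pathName, firmName and details are assumed to be defined
 as in the original post."
| firmEfs |
firmEfs := (pathName , '/' , firmName , '_files') asFileReference.

"Open the file as binary and decode it with a lenient ASCII encoder."
firmEfs binaryReadStreamDo: [ :in |
    details
        parseStream: (ZnCharacterReadStream
            on: in
            encoding: ZnCharacterEncoder ascii beLenient)
        with: [ :items |
            "Hypothetical handler: just print the three parsed fields."
            Transcript show: items printString; cr ] ]
```

If the file turns out to be in some other known encoding, passing that
encoding in place of the lenient ASCII encoder would presumably be the
cleaner fix.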