Thanks Guille for this great post!
I will turn it into a doc :)


On Mon, Mar 19, 2018 at 6:10 PM, Guillermo Polito <guillermopol...@gmail.com> wrote:

> Well, I'd say we did it in the name of modularity. And yes, I believe
> that having separate responsibilities helps in designing, testing and
> ensuring more easily the correctness of each of the parts in isolation.
>
> I've also done some profiling and it does not look like we've lost in
> performance either (reading and decoding a 35MB file):
>
> [file := Smalltalk sourcesFile fullName.
> (File named: file) readStreamDo: [ :binaryFile |
>     (ZnCharacterReadStream
>         on: (ZnBufferedReadStream on: binaryFile)
>         encoding: 'utf8') next: binaryFile size ]
> ] timeToRun. "0:00:00:01.976"
>
> [file := Smalltalk sourcesFile fullName.
> (MultiByteFileStream fileNamed: file)
> converter: (TextConverter newForEncoding: 'utf8');
> upToEnd
> ] timeToRun. "0:00:00:02.147"
>
>
>
> On Mon, Mar 19, 2018 at 5:51 PM, Richard O'Keefe <rao...@gmail.com> wrote:
>
>> The idea of stacking things like character coding and buffering &c
>> by using layers of wrapping is very much in the spirit of Java and
>> is neither good for performance nor helpful for correctness.
>>
>>
>> On 20 March 2018 at 05:19, Guillermo Polito <guillermopol...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I've spent a few minutes summarizing the new APIs provided by the
>>> combination of the new File implementation and the Zn encoders. They all
>>> basically follow the decorator pattern to stack different responsibilities
>>> such as buffering, encoding and line ending conversions.
>>>
>>> Please, do not hesitate to give your feedback.
>>>
>>> Guille
>>>
>>>
>>> 1. Basic Files
>>>
>>> By default, files are binary and not buffered.
>>>
>>> (File named: 'name') readStream.
>>> (File named: 'name') readStreamDo: [ :stream | ... ].
>>> (File named: 'name') writeStream.
>>> (File named: 'name') writeStreamDo: [ :stream | ... ].
>>>
>>>
>>> 2. Encoding
>>>
>>> To add encoding, wrap a stream with a corresponding
>>> ZnCharacterRead/WriteStream.
>>>
>>> "Reading"
>>> utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
>>> utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf16'.
>>>
>>> "Writing"
>>> utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
>>> utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf16'.
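>>>
>>> As a minimal sketch of this wrapping, assuming an in-memory ByteArray
>>> stands in for the binary file stream (the variable names are
>>> illustrative only), a string can be encoded and then decoded again:
>>>
>>> "Encode: characters go in, utf8 bytes come out"
>>> bytes := ByteArray streamContents: [ :binaryStream |
>>>     (ZnCharacterWriteStream on: binaryStream encoding: 'utf8')
>>>         nextPutAll: 'Les élèves' ].
>>>
>>> "Decode: wrap a binary read stream to get the characters back"
>>> decoded := (ZnCharacterReadStream on: bytes readStream encoding: 'utf8')
>>>     upToEnd.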
>>>
>>> 3. Buffering
>>>
>>> To add buffering, wrap a stream with a corresponding
>>> ZnBufferedRead/WriteStream.
>>>
>>> bufferedReadStream := ZnBufferedReadStream on: aStream.
>>> bufferedWriteStream := ZnBufferedWriteStream on: aStream.
>>>
>>> It is generally better to buffer the reads on the binary file and apply
>>> the decoding to the in-memory buffer than the other way around. Compare:
>>>
>>> [file := Smalltalk sourcesFile fullName.
>>> (File named: file) readStreamDo: [ :binaryFile |
>>>     (ZnCharacterReadStream
>>>         on: (ZnBufferedReadStream on: binaryFile)
>>>         encoding: 'utf8') upToEnd ]
>>> ] timeToRun. "0:00:00:09.288"
>>>
>>> [file := Smalltalk sourcesFile fullName.
>>> (File named: file) readStreamDo: [ :binaryFile |
>>>     (ZnBufferedReadStream
>>>         on: (ZnCharacterReadStream on: binaryFile encoding: 'utf8'))
>>>         upToEnd ]
>>> ] timeToRun. "0:00:00:14.189"
>>>
>>> 4. File System
>>>
>>> By default, file system files are buffered and utf8 encoded to keep
>>> backwards compatibility.
>>>
>>> 'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | ... ].
>>> 'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | ... ].
>>>
>>> FileSystem also provides access to plain binary files using the
>>> #binaryRead/WriteStream messages. Binary streams are also buffered by
>>> default.
>>>
>>> 'name' asFileReference
>>>     binaryReadStreamDo: [ :bufferedBinaryStream | ... ].
>>> 'name' asFileReference
>>>     binaryWriteStreamDo: [ :bufferedBinaryStream | ... ].
>>>
>>> If you want a file with another encoding (to come in the PR
>>> https://github.com/pharo-project/pharo/pull/1134), you can specify it
>>> while obtaining the stream:
>>>
>>> 'name' asFileReference
>>>     readStreamEncoded: 'utf16'
>>>     do: [ :bufferedUtf16Stream | ... ].
>>>
>>> 'name' asFileReference
>>>     writeStreamEncoded: 'utf16'
>>>     do: [ :bufferedUtf16Stream | ... ].
>>>
>>> 5. Line Ending Conventions
>>>
>>> If you want to write files following a specific line ending convention,
>>> use the ZnNewLineWriterStream.
>>> This stream decorator will transform any line ending (cr, lf, crlf) into
>>> a defined line ending.
>>> By default it chooses the platform line ending convention.
>>>
>>> lineWriter := ZnNewLineWriterStream on: aStream.
>>>
>>> If you want to choose another line ending convention you can do:
>>>
>>> lineWriter forCr.
>>> lineWriter forLf.
>>> lineWriter forCrLf.
>>> lineWriter forPlatformLineEnding.
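>>>
>>> As a minimal sketch, assuming an in-memory String stream stands in for
>>> the wrapped character stream (the variable names are illustrative only),
>>> mixed line endings can be normalized to lf like this:
>>>
>>> normalized := String streamContents: [ :out |
>>>     (ZnNewLineWriterStream on: out)
>>>         forLf;
>>>         nextPutAll: 'first';
>>>         nextPutAll: String crlf;
>>>         nextPutAll: 'second';
>>>         nextPut: Character cr ].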
>>>
>>> --
>>>
>>>
>>>
>>> Guille Polito
>>>
>>> Research Engineer
>>>
>>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>>
>>> CRIStAL - UMR 9189
>>>
>>> French National Center for Scientific Research - *http://www.cnrs.fr
>>> <http://www.cnrs.fr>*
>>>
>>>
>>> *Web:* *http://guillep.github.io* <http://guillep.github.io>
>>>
>>> *Phone: *+33 06 52 70 66 13 <+33%206%2052%2070%2066%2013>
>>>
>>
>>
>
>
