Thanks Guille for this great post! I will turn it into a doc :)
On Mon, Mar 19, 2018 at 6:10 PM, Guillermo Polito <guillermopol...@gmail.com> wrote:

> Well, I'd say we did it in the name of modularity. And yes, I believe
> that having separate responsibilities helps in designing, testing and
> more easily ensuring the correctness of each of the parts in isolation.
>
> I've also done some profiling and it does not look like we've lost
> performance either (reading and decoding a 35MB file):
>
> [file := Smalltalk sourcesFile fullName.
> (File named: file) readStreamDo: [ :binaryFile |
>     (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding: 'utf8') next: binaryFile size.
> ]] timeToRun. "0:00:00:01.976"
>
> [file := Smalltalk sourcesFile fullName.
> (MultiByteFileStream fileNamed: file)
>     converter: (TextConverter newForEncoding: 'utf8');
>     upToEnd
> ] timeToRun. "0:00:00:02.147"
>
> On Mon, Mar 19, 2018 at 5:51 PM, Richard O'Keefe <rao...@gmail.com> wrote:
>
>> The idea of stacking things like character coding and buffering &c
>> by using layers of wrapping is very much in the spirit of Java and
>> is neither good for performance nor helpful for correctness.
>>
>> On 20 March 2018 at 05:19, Guillermo Polito <guillermopol...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I've spent a few minutes summarizing the new APIs provided by the
>>> combination of the new File implementation and the Zn encoders. They all
>>> basically follow the decorator pattern to stack different responsibilities
>>> such as buffering, encoding, and line ending conversions.
>>>
>>> Please do not hesitate to give your feedback.
>>>
>>> Guille
>>>
>>>
>>> 1. Basic Files
>>>
>>> By default, files are binary and not buffered.
>>>
>>> (File named: 'name') readStream.
>>> (File named: 'name') readStreamDo: [ :stream | ... ].
>>> (File named: 'name') writeStream.
>>> (File named: 'name') writeStreamDo: [ :stream | ... ].
>>>
>>>
>>> 2. Encoding
>>>
>>> To add encoding, wrap a stream with the corresponding
>>> ZnCharacterRead/WriteStream.
>>>
>>> "Reading"
>>> utf8Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf8'.
>>> utf16Encoded := ZnCharacterReadStream on: aBinaryStream encoding: 'utf16'.
>>>
>>> "Writing"
>>> utf8Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf8'.
>>> utf16Encoded := ZnCharacterWriteStream on: aBinaryStream encoding: 'utf16'.
>>>
>>> 3. Buffering
>>>
>>> To add buffering, wrap a stream with the corresponding
>>> ZnBufferedRead/WriteStream.
>>>
>>> bufferedReadStream := ZnBufferedReadStream on: aStream.
>>> bufferedWriteStream := ZnBufferedWriteStream on: aStream.
>>>
>>> In general, it is better to buffer the reads on the binary file and
>>> apply the encoding to the in-memory buffer than the other way around.
>>> Compare:
>>>
>>> [file := Smalltalk sourcesFile fullName.
>>> (File named: file) readStreamDo: [ :binaryFile |
>>>     (ZnCharacterReadStream on: (ZnBufferedReadStream on: binaryFile) encoding: 'utf8') upToEnd
>>> ]] timeToRun. "0:00:00:09.288"
>>>
>>> [file := Smalltalk sourcesFile fullName.
>>> (File named: file) readStreamDo: [ :binaryFile |
>>>     (ZnBufferedReadStream on: (ZnCharacterReadStream on: binaryFile encoding: 'utf8')) upToEnd
>>> ]] timeToRun. "0:00:00:14.189"
>>>
>>> 4. File System
>>>
>>> By default, file system files are buffered and utf8 encoded to keep
>>> backwards compatibility.
>>>
>>> 'name' asFileReference readStreamDo: [ :bufferedUtf8Stream | ... ].
>>> 'name' asFileReference writeStreamDo: [ :bufferedUtf8Stream | ... ].
>>>
>>> File references also provide access to plain binary files using the
>>> #binaryRead/WriteStream messages. Binary streams are also buffered by
>>> default.
>>>
>>> 'name' asFileReference binaryReadStreamDo: [ :bufferedBinaryStream | ... ].
>>> 'name' asFileReference binaryWriteStreamDo: [ :bufferedBinaryStream | ... ].
>>>
>>> If you want a file with another encoding (to come in the PR
>>> https://github.com/pharo-project/pharo/pull/1134), you can specify it
>>> while obtaining the stream:
>>>
>>> 'name' asFileReference
>>>     readStreamEncoded: 'utf16'
>>>     do: [ :bufferedUtf16Stream | ... ].
>>>
>>> 'name' asFileReference
>>>     writeStreamEncoded: 'utf16'
>>>     do: [ :bufferedUtf16Stream | ... ].
>>>
>>> 5. Line Ending Conventions
>>>
>>> If you want to write files following a specific line ending convention,
>>> use the ZnNewLineWriterStream.
>>> This stream decorator will transform any line ending (cr, lf, crlf) into
>>> a defined line ending.
>>> By default it chooses the platform line ending convention.
>>>
>>> lineWriter := ZnNewLineWriterStream on: aStream.
>>>
>>> If you want to choose another line ending convention, you can do:
>>>
>>> lineWriter forCr.
>>> lineWriter forLf.
>>> lineWriter forCrLf.
>>> lineWriter forPlatformLineEnding.
>>>
>>> --
>>> Guille Polito
>>> Research Engineer
>>>
>>> Centre de Recherche en Informatique, Signal et Automatique de Lille
>>> CRIStAL - UMR 9189
>>> French National Center for Scientific Research - http://www.cnrs.fr
>>>
>>> Web: http://guillep.github.io
>>> Phone: +33 06 52 70 66 13
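
For reference, here is how the decorators described in the quoted post might be stacked on a single binary file stream. This is an untested sketch, not the official recipe: it only uses the classes and messages named in the post, plus the ordinary #nextPut:, #do: and #flush messages. The file name 'example.txt' and the variable names are made up for illustration.

```smalltalk
"Hypothetical file name; any writable path would do."
(File named: 'example.txt') writeStreamDo: [ :binaryStream |
	| buffered encoded lineWriter |
	"Buffer closest to the file, then encode, then normalize line endings."
	buffered := ZnBufferedWriteStream on: binaryStream.
	encoded := ZnCharacterWriteStream on: buffered encoding: 'utf8'.
	lineWriter := ZnNewLineWriterStream on: encoded.
	lineWriter forLf.
	"Write character by character; the cr below should be emitted as lf."
	'first line' do: [ :each | lineWriter nextPut: each ].
	lineWriter nextPut: Character cr.
	'second line' do: [ :each | lineWriter nextPut: each ].
	"Flush the buffer explicitly before writeStreamDo: closes the file."
	buffered flush ]
```

The order follows the advice in section 3 of the post: the buffer wraps the binary stream directly, and the encoder works on top of the buffer rather than the other way around.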