Sven,

Yes, ByteArray>>zipped/unzipped are a simple, neat and intuitive way of zipping/unzipping binary data. I also love the new idioms. They look clean and concise.
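
For anyone who wants to try the round trip in a playground before the methods are added, here is a minimal sketch. It assumes nothing beyond the GZipWriteStream / GZipReadStream expressions quoted below; the two blocks named zip and unzip are hypothetical stand-ins for the proposed ByteArray>>#zipped and ByteArray>>#unzipped, not existing API.

```smalltalk
"Playground sketch: zip and unzip are local stand-ins (hypothetical names)
 for the proposed ByteArray>>#zipped and ByteArray>>#unzipped."
| zip unzip original compressed restored |

"bytes -> GZIP-compressed bytes"
zip := [ :bytes |
	ByteArray streamContents: [ :out |
		(GZipWriteStream on: out) nextPutAll: bytes; close ] ].

"GZIP-compressed bytes -> bytes"
unzip := [ :bytes | (GZipReadStream on: bytes) upToEnd ].

"The euro sign does not fit in one byte, so this literal is a WideString."
original := 'foo 10 €'.

"String -> UTF-8 ByteArray -> compressed ByteArray"
compressed := zip value: original utf8Encoded.

"compressed ByteArray -> UTF-8 ByteArray -> String"
restored := (unzip value: compressed) utf8Decoded.

"Compare the round-tripped string with the input."
restored = original
```

With the two methods installed, the same steps would read original utf8Encoded zipped and compressed unzipped utf8Decoded.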
Best Regards,
---
tomo

On Thu, 3 Oct 2019 at 20:14, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>
> Actually, thinking about this a bit more, why not add #zipped #unzipped to
> ByteArray ?
>
>
> ByteArray>>#zipped
>   "Return a GZIP compressed version of the receiver as a ByteArray"
>
>   ^ ByteArray streamContents: [ :out |
>       (GZipWriteStream on: out) nextPutAll: self; close ]
>
> ByteArray>>#unzipped
>   "Assuming the receiver contains GZIP encoded data,
>    return the decompressed data as a ByteArray"
>
>   ^ (GZipReadStream on: self) upToEnd
>
>
> The original oneliner then becomes
>
>   'string' utf8Encoded zipped.
>
> and
>
>   data unzipped utf8Decoded
>
> which is pretty clear, simple and intention-revealing, IMHO.
>
> > On 3 Oct 2019, at 13:04, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> >
> > Hi Tomo,
> >
> > Indeed, I stand corrected, it does indeed seem possible to use the existing
> > gzip classes to work from bytes to bytes, this works fine:
> >
> > data := ByteArray streamContents: [ :out | (GZipWriteStream on: out)
> >     nextPutAll: 'foo 10 €' utf8Encoded; close ].
> >
> > (GZipReadStream on: data) upToEnd utf8Decoded.
> >
> > Now regarding the encoding option, I am not so sure that is really
> > necessary (though nice to have). Why would anyone use anything except UTF8
> > (today).
> >
> > Thanks again for the correction !
> >
> > Sven
> >
> >> On 3 Oct 2019, at 12:41, Tomohiro Oda <tomohiro.tomo....@gmail.com> wrote:
> >>
> >> Peter and Sven,
> >>
> >> zip API from string to string works fine except that aWideString
> >> zipped generates malformed zip string.
> >> I think it might be a good guidance to define
> >> String>>zippedWithEncoding: and ByteArray>>unzippedWithEncoding: .
> >> Such as
> >> String>>zippedWithEncoding: encoder
> >> zippedWithEncoding: encoder
> >>     ^ ByteArray
> >>         streamContents: [ :stream |
> >>             | gzstream |
> >>             gzstream := GZipWriteStream on: stream.
> >>             encoder
> >>                 next: self size
> >>                 putAll: self
> >>                 startingAt: 1
> >>                 toStream: gzstream.
> >>             gzstream close ]
> >>
> >> and ByteArray>>unzippedWithEncoding: encoder
> >> unzippedWithEncoding: encoder
> >>     | byteStream |
> >>     byteStream := GZipReadStream on: self.
> >>     ^ String
> >>         streamContents: [ :stream |
> >>             [ byteStream atEnd ]
> >>                 whileFalse: [ stream nextPut: (encoder nextFromStream: byteStream) ] ]
> >>
> >> Then, you can write something like
> >> zipped := yourLongWideString zippedWithEncoding: ZnCharacterEncoder utf8.
> >> unzipped := zipped unzippedWithEncoding: ZnCharacterEncoder utf8.
> >>
> >> This will not affect the existing zipped/unzipped API and you can
> >> handle other encodings.
> >> This zippedWithEncoding: generates a ByteArray, which is kind of
> >> conformant to the encoding API.
> >> And you don't have to create many intermediate byte arrays and byte
> >> strings.
> >>
> >> I hope this helps.
> >> ---
> >> tomo
> >>
> >> 2019/10/3(Thu) 18:56 Sven Van Caekenberghe <s...@stfx.eu>:
> >>>
> >>> Hi Peter,
> >>>
> >>> About #zipped / #unzipped and the inflate / deflate classes: your
> >>> observation is correct, these work from string to string, while clearly
> >>> the compressed representation should be binary.
> >>>
> >>> The contents (input, what is inside the compressed data) can be anything,
> >>> it is not necessarily a string (it could be an image, so also something
> >>> binary). Only the creator of the compressed data knows, you cannot assume
> >>> to know in general.
> >>>
> >>> It would be possible (and it would be very nice) to change this, however
> >>> that will have serious impact on users (as the contract changes).
> >>>
> >>> About your use case: why would your DB not be capable of storing large
> >>> strings ? A good DB should be capable of storing any kind of string (full
> >>> unicode) efficiently.
> >>>
> >>> What DB and what sizes are we talking about ?
> >>>
> >>> Sven
> >>>
> >>>> On 3 Oct 2019, at 11:06, PBKResearch <pe...@pbkresearch.co.uk> wrote:
> >>>>
> >>>> Hello
> >>>>
> >>>> I have a problem with text storage, to which I seem to have found a
> >>>> solution, but it’s a bit clumsy-looking. I would be grateful for
> >>>> confirmation that (a) there is no neater solution, (b) I can rely on
> >>>> this to work – I only know that it works in a few test cases.
> >>>>
> >>>> I need to store a large number of text strings in a database. To avoid
> >>>> the database files becoming too large, I am thinking of zipping the
> >>>> strings, or at least the less frequently accessed ones. Depending on the
> >>>> source, some of the strings will be instances of ByteString, some of
> >>>> WideString (because they contain characters not representable in one
> >>>> byte). Storing a WideString uncompressed seems to occupy 4 bytes per
> >>>> character, so I decided, before thinking of compression, to store the
> >>>> strings utf8Encoded, which yields a ByteArray. But zipped can only be
> >>>> applied to a String, not a ByteArray.
> >>>>
> >>>> So my proposed solution is:
> >>>>
> >>>> For compression:    myZipString := myWideString utf8Encoded asString zipped.
> >>>> For decompression:  myOutputString := myZipString unzipped asByteArray utf8Decoded.
> >>>>
> >>>> As I said, it works in all the cases I tried, whether WideString or not,
> >>>> but the chains of transformations look clunky somehow. Can anyone see a
> >>>> neater way of doing it? And can I rely on it working, especially when I
> >>>> am handling foreign texts with many multi-byte characters?
> >>>>
> >>>> Thanks in advance for any help.
> >>>>
> >>>> Peter Kenny