Hi Sven,

The DB system I am using at the moment is OmniBase - despite Todd Blanchard's warning, I have decided to experiment with it. It has the advantage of being fully object based, though I am not yet using anything more elaborate than strings, dictionaries and arrays as the data types. One secondary advantage is that I still use Dolphin occasionally, and my version of Dolphin 6.1 comes with OmniBase built in. I have checked that an OmniBase DB built in Pharo can be read in Dolphin.
As to size, there is no problem with storing large strings in OmniBase, except for the amount of disk space occupied in total. I am looking far ahead - my toy development DB is only about 15MB, but if I get to where I want to be, it could be tens of GB. With modern machines this may not be a problem, but I thought there might come a time when I want to think about trade-offs between storage space and unzipping time.

I had a few qualms when I looked inside my development DB; it seems that an OmniBase DB consists of a few smallish index files and one ginormous file called 'objects'. I am not sure how the OS will get on with a huge single file. But all this is speculative at the moment. For now I shall continue with storing the strings unzipped (but utf8Encoded - thanks for such a neat facility), bearing in mind that if I need to save space later, my method as described will work.

Peter Kenny

-----Original Message-----
From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Sven Van Caekenberghe
Sent: 03 October 2019 10:56
To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] How to zip a WideString

Hi Peter,

About #zipped / #unzipped and the inflate / deflate classes: your observation is correct, these work from string to string, while clearly the compressed representation should be binary. The contents (the input, what is inside the compressed data) can be anything; it is not necessarily a string (it could be an image, so also something binary). Only the creator of the compressed data knows, so you cannot assume to know in general. It would be possible (and it would be very nice) to change this; however, that would have a serious impact on users, as the contract changes.

About your use case: why would your DB not be capable of storing large strings? A good DB should be capable of storing any kind of string (full Unicode) efficiently. What DB and what sizes are we talking about?
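The string-to-string contract is easy to see in a Playground. A minimal sketch, assuming a recent Pharo image with the #zipped / #unzipped extensions discussed in this thread:

```smalltalk
"Playground sketch of the current String-to-String contract of #zipped.
Assumes a recent Pharo image; all selectors below appear in this thread."
| plain packed |
plain := 'hello'.
packed := plain zipped.    "a String again, but its characters are
                            really the bytes of the compressed stream"
packed unzipped = plain    "true - the round trip restores the input"
```

The point is that `packed` prints as a string, yet its contents are binary compressed data, which is why a cleaner contract would be String (or ByteArray) in, ByteArray out.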
Sven

> On 3 Oct 2019, at 11:06, PBKResearch <pe...@pbkresearch.co.uk> wrote:
>
> Hello
>
> I have a problem with text storage, to which I seem to have found a solution, but it’s a bit clumsy-looking. I would be grateful for confirmation that (a) there is no neater solution, and (b) I can rely on this to work – I only know that it works in a few test cases.
>
> I need to store a large number of text strings in a database. To avoid the database files becoming too large, I am thinking of zipping the strings, or at least the less frequently accessed ones. Depending on the source, some of the strings will be instances of ByteString, some of WideString (because they contain characters not representable in one byte). Storing a WideString uncompressed seems to occupy 4 bytes per character, so I decided, before thinking of compression, to store the strings utf8Encoded, which yields a ByteArray. But zipped can only be applied to a String, not a ByteArray.
>
> So my proposed solution is:
>
> For compression: myZipString := myWideString utf8Encoded asString zipped.
> For decompression: myOutputString := myZipString unzipped asByteArray utf8Decoded.
>
> As I said, it works in all the cases I tried, whether WideString or not, but the chains of transformations look clunky somehow. Can anyone see a neater way of doing it? And can I rely on it working, especially when I am handling foreign texts with many multi-byte characters?
>
> Thanks in advance for any help.
>
> Peter Kenny
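For the archive: the chain proposed above can be kept in one place by wrapping it in two small helper methods. This is only a sketch; #compress: and #decompress: are illustrative names, not existing API, and the sketch assumes the String>>#zipped / #unzipped and UTF-8 selectors used in this thread:

```smalltalk
"Hypothetical helpers (not existing API) wrapping the chain proposed
above, so the transformations are written once. Assumes Pharo's
String>>#zipped / #unzipped, String>>#utf8Encoded and
ByteArray>>#utf8Decoded, as used elsewhere in this thread."

compress: aString
    "Answer a String of deflated UTF-8 bytes. Works for both
    ByteString and WideString inputs."
    ^ aString utf8Encoded asString zipped

decompress: aZipString
    "Answer the original string (a WideString when multi-byte
    characters are present) from a result of #compress:."
    ^ aZipString unzipped asByteArray utf8Decoded
```

Storing then becomes `db at: key put: (self compress: text)` and reading `self decompress: (db at: key)` (db is a placeholder name); if #zipped ever gains a proper binary contract, only these two methods would need to change.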