Hi Sven

The DB system I am using at the moment is OmniBase - despite Todd Blanchard's 
warning, I have decided to experiment with it. It has the advantage of being 
fully object based, though I am not yet using anything more elaborate than 
strings, dictionaries and arrays as the data types. One secondary advantage is 
that I still use Dolphin occasionally, and my version of Dolphin 6.1 comes with 
OmniBase built in. I have checked that an OmniBase DB built in Pharo can be 
read in Dolphin.
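In case it helps anyone following along, the basic usage looks something like this (the selectors are from my reading of the OmniBase protocol - do check them against the version in your image):

    | db txn |
    db := OmniBase createOn: 'testdb'.     "or: OmniBase openOn: 'testdb'"
    txn := db newTransaction.
    txn root at: 'doc-1' put: 'some text' utf8Encoded.  "store a ByteArray"
    txn commit.
    db close.

The transaction root behaves like a dictionary, so I just key my strings (or ByteArrays) off it.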

As to size, there is no problem with storing large strings in OmniBase, except 
for the amount of disk space occupied in total. I am looking far ahead - my toy 
development DB is only about 15MB, but if I get to where I want to be, it could 
be tens of GB. With modern machines this may not be a problem, but I thought 
there might come a time when I want to think about trade-offs between storage 
space and unzipping time. I had a few qualms when I looked inside my 
development DB; it seems that an OmniBase DB consists of a few smallish index 
files and one ginormous file called 'objects'. I am not sure how the OS will 
get on with a huge single file.

But all this is speculative at the moment. For now I shall continue with 
storing the strings unzipped (but utf8Encoded - thanks for such a neat 
facility), bearing in mind that if I need to save space later, my method as 
described will work.
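To spell out the round trip, this is exactly the method from my earlier message (it has worked in all my test cases, including multi-byte texts):

    | original stored restored |
    original := 'Grüße aus Köln'.                        "a WideString with multi-byte characters"
    stored := original utf8Encoded asString zipped.      "what would go into the DB"
    restored := stored unzipped asByteArray utf8Decoded. "what comes back out"
    restored = original                                  "=> true"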

Peter Kenny

-----Original Message-----
From: Pharo-users <pharo-users-boun...@lists.pharo.org> On Behalf Of Sven Van 
Caekenberghe
Sent: 03 October 2019 10:56
To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] How to zip a WideString

Hi Peter,

About #zipped / #unzipped and the inflate / deflate classes: your observation 
is correct, these work from string to string, while clearly the compressed 
representation should be binary.

The contents (the input, what is inside the compressed data) can be anything; it is 
not necessarily a string (it could be an image, so something binary). Only 
the creator of the compressed data knows; you cannot assume to know in general.
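A quick sketch of the current behaviour (string in, string out):

    | compressed |
    compressed := 'hello hello hello' zipped.  "answers a String whose characters are really binary bytes"
    compressed unzipped                        "=> 'hello hello hello'"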

It would be possible (and it would be very nice) to change this; however, it 
would have a serious impact on users, as the contract changes.

About your use case: why would your DB not be capable of storing large strings? 
A good DB should be capable of storing any kind of string (full Unicode) 
efficiently.

What DB and what sizes are we talking about ?

Sven

> On 3 Oct 2019, at 11:06, PBKResearch <pe...@pbkresearch.co.uk> wrote:
> 
> Hello
>  
> I have a problem with text storage, to which I seem to have found a solution, 
> but it’s a bit clumsy-looking. I would be grateful for confirmation that (a) 
> there is no neater solution, (b) I can rely on this to work – I only know 
> that it works in a few test cases.
>  
> I need to store a large number of text strings in a database. To avoid the 
> database files becoming too large, I am thinking of zipping the strings, or 
> at least the less frequently accessed ones. Depending on the source, some of 
> the strings will be instances of ByteString, some of WideString (because they 
> contain characters not representable in one byte). Storing a WideString 
> uncompressed seems to occupy 4 bytes per character, so I decided, before 
> thinking of compression, to store the strings utf8Encoded, which yields a 
> ByteArray. But zipped can only be applied to a String, not a ByteArray.
>  
> So my proposed solution is:
>  
> For compression:    myZipString := myWideString utf8Encoded asString zipped.
> For decompression:  myOutputString := myZipString unzipped asByteArray utf8Decoded.
>  
> As I said, it works in all the cases I tried, whether WideString or not, but 
> the chains of transformations look clunky somehow. Can anyone see a neater 
> way of doing it? And can I rely on it working, especially when I am handling 
> foreign texts with many multi-byte characters?
>  
> Thanks in advance for any help.
>  
> Peter Kenny


