On Mon, 2012-10-01 at 13:02 +0200, Noel Grandin wrote:
> That was something I was thinking about the other day - given that the
> bulk of our strings are pure 7-bit ASCII, it might be a worthwhile
> optimisation to store a bit that says "this string is 7-bit ASCII", and
> then store the string as a sequence of bytes.
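For concreteness, a minimal sketch of the idea quoted above: keep a flag that says "pure 7-bit ASCII" and store one byte per character when it is set, falling back to 16-bit units otherwise. The class name CompactString and all of its members are hypothetical and for illustration only; this is not rtl::OUString and says nothing about how such a change would fit the existing ABI.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration only: not LibreOffice's OUString.
class CompactString {
public:
    explicit CompactString(const std::u16string& s) : ascii_(true) {
        // Decide once, at construction, whether every unit is 7-bit ASCII.
        for (char16_t c : s) {
            if (c > 0x7F) { ascii_ = false; break; }
        }
        if (ascii_) {
            // One byte per character.
            bytes_.reserve(s.size());
            for (char16_t c : s) bytes_.push_back(static_cast<char>(c));
        } else {
            // Fall back to the 16-bit representation.
            wide_ = s;
        }
    }

    bool isAscii() const { return ascii_; }

    // Length in 16-bit code units, so callers see the same indices
    // regardless of the internal representation.
    std::size_t length() const {
        return ascii_ ? bytes_.size() : wide_.size();
    }

    // Callers always get a 16-bit unit back (no bounds checking here).
    char16_t at(std::size_t i) const {
        return ascii_
            ? static_cast<char16_t>(static_cast<unsigned char>(bytes_[i]))
            : wide_[i];
    }

    // Bytes used by the character data itself (ignores allocator overhead).
    std::size_t payloadBytes() const {
        return ascii_ ? bytes_.size() : wide_.size() * sizeof(char16_t);
    }

private:
    bool ascii_;          // the "this string is 7-bit ASCII" bit
    std::string bytes_;   // used when ascii_ is true
    std::u16string wide_; // used otherwise
};
```

With this layout an ASCII string costs half the payload of its 16-bit form, and every access pays for a branch on the flag.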
Optimisation ? :-) IMHO the ideal is to store all strings as UTF-8 under the hood anyway. Everyone I've discussed this with who objected turned out (after some discussion) to have a weak understanding of UTF-8, UTF-16 and of rendering complex text ;-) Of course, perhaps I should discuss it with more people.

The only problem with a change there is our ABI, which explicitly exposes the string encoding.

> The latest Java VM does this trick internally - it pretends that String
> is stored with an array of 16-bit values, but actually it stores them as
> UTF-8.

Interesting - for all strings ? Is there a pointer to the code / docs for that detail somewhere ? :-)

Last I looked, Java also stored partial strings chained to their parent; so 'substring' takes a reference on the parent (be it ever so large), and can return a single-character string out of it without re-allocation. IIRC that can cause huge grief when parsing big files into little ones ;-)

> Even in an app running in a language other than US-English, strings are
> used for so many internal things that >90% of the strings are 7-bit ASCII.

Sure - so define the define, see what it prints, and do the quick calculation of how much time/space we save by doing it :-)

Then again - last I looked we still had some real dumbness that needed hunting down, relating to many (tens of?) thousands of allocations and frees of the "/" string at startup ;-)

ATB,

Michael.

--
michael.me...@suse.com <><, Pseudo Engineer, itinerant idiot