On 01/10/12 13:25, Michael Meeks wrote:
>
> On Mon, 2012-10-01 at 13:02 +0200, Noel Grandin wrote:
>> That was something I was thinking about the other day - given that the
>> bulk of our strings are pure 7-bit ASCII, it might be a worthwhile
>> optimisation to store a bit that says "this string is 7-bit ASCII", and
>> then store the string as a sequence of bytes.
>
> Optimisation ? :-) IMHO the ideal is to store all strings as UTF-8
> underneath the hatches anyway. All the people I've discussed this with
> that objected to that, turned out (after some discussion) to have a weak
> understanding of UTF-8, UTF-16 and of rendering complex text ;-) Of
> course, perhaps I should discuss with more people.
>
> The only problem with a change there is our ABI - which explicitly
> exposes the encoding of that.

the right time to do it is for LO4. sadly nobody has signed up for that
yet :( ... (while there are volunteers for far sillier proposals, like
getting rid of com.sun.star...)

of course this would only affect the C++ binding (and possibly Python --
i am not up to date on how that does Unicode; there are differences
between 2 and 3 iirc; of course we should migrate to Python 3 as
well...), while the Java binding still uses UTF-16, but i assume we have
to copy strings passed over the Java UNO bridge anyway.

>> The latest Java VM does this trick internally - it pretends that String
>> is stored with an array of 16-bit values, but actually it stores them as
>> UTF-8.
>
> Interesting - for all strings ? is there a pointer to the code / docs
> for that detail somewhere ? :-) Last I looked Java also stored partial

i would expect they take advantage of the JVM's tendency to generate code
at runtime to some non-trivial extent :)

> strings chained to its parent; so 'substring' takes a reference on the
> parent (be it ever so large), and can return a single character string
> out of it without re-allocation. IIRC that can cause huge grief when
> parsing big files into little ones ;-)

that is a potential advantage of immutable string buffers that afaik we
don't take advantage of in LO so far.
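
a minimal sketch of the substring trade-off mentioned above, in Python
rather than Java or C++ purely for brevity. the names shared_slice and
copied_slice are hypothetical, and memoryview only stands in for a
substring that keeps a reference to its parent buffer; it is not how
Java or LO strings are actually implemented.

```python
def shared_slice(buf: bytes, start: int, end: int) -> memoryview:
    """Return a view that shares storage with buf (no copy is made).

    The view holds a reference to buf, so buf cannot be freed while the
    view is alive, however small the slice is.
    """
    return memoryview(buf)[start:end]


def copied_slice(buf: bytes, start: int, end: int) -> bytes:
    """Return a copy of the slice that holds no reference to buf."""
    return buf[start:end]


if __name__ == "__main__":
    big = bytes(10_000_000)              # stands in for a big parsed file
    tiny_view = shared_slice(big, 0, 1)  # cheap to create, pins all of big
    tiny_copy = copied_slice(big, 0, 1)  # costs a copy, big can go away
    print(tiny_view.obj is big, tiny_view.nbytes, len(tiny_copy))
```

the shared variant avoids re-allocation but keeps the whole parent
alive, which is the "huge grief" case when a big file is parsed into
many little strings; the copying variant pays for the copy up front.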