Re: optimising OUString for space

2012-10-02 Thread Lionel Elie Mamane
On Mon, Oct 01, 2012 at 01:58:24PM +0200, Michael Stahl wrote: > On 01/10/12 13:25, Michael Meeks wrote: >> On Mon, 2012-10-01 at 13:02 +0200, Noel Grandin wrote: >>> That was something I was thinking about the other day - given that >>> the bulk of our strings are pure 7-bit ASCII, it might be a

Re: optimising OUString for space

2012-10-01 Thread Stephan Bergmann
On 10/01/2012 05:29 PM, Norbert Thiebaud wrote: removal of the need to have 2 set of classes (one for SBCS and one for UCS-2) Note that it might still be useful to keep a "heavyweight" distinction (like different sets of classes) between the concepts of "byte-serializations of sequences of en

Re: optimising OUString for space

2012-10-01 Thread Norbert Thiebaud
On Mon, Oct 1, 2012 at 9:05 AM, Stephan Bergmann wrote: > Note that in the common case of accessing (i.e., searching for, etc.) 7-bit > ASCII content in a string, regardless of whether it is internally > represented as UTF-8 or UTF-16, going via an operator[] interface that > operates directly on
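
The snippet above is cut off, so the following is only one reading of the point about 7-bit ASCII content. In both UTF-8 and UTF-16, a code unit below 0x80 always stands for an ASCII character and never forms part of a longer sequence, so a plain scan over code units finds ASCII characters correctly in either representation. A minimal sketch, using standard C++ strings rather than OUString; findAsciiUnit is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index, counted in code units, of the
// first unit equal to `ascii`, or -1 if there is none. It works for UTF-8
// (CharT = char) and UTF-16 (CharT = char16_t) because units below 0x80
// only ever encode ASCII characters in both encodings.
template <typename CharT>
std::ptrdiff_t findAsciiUnit(const std::basic_string<CharT>& s, char ascii) {
    assert(static_cast<unsigned char>(ascii) < 0x80);  // ASCII needles only
    const CharT target = static_cast<CharT>(ascii);
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == target) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

The returned index is a code-unit offset, so the same text can give different offsets in the two encodings; it is only meaningful against the representation that was searched.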

Re: optimising OUString for space

2012-10-01 Thread Stephan Bergmann
On 10/01/2012 01:47 PM, Michael Stahl wrote: ... which brings me to another point: in a hypothetical future when we could efficiently create a UTF8String from a string literal in C++ without copying the darn thing, what should hypothetical operations to mutate the string's buffer do? If we cont

Re: optimising OUString for space

2012-10-01 Thread Stephan Bergmann
On 10/01/2012 02:45 PM, Michael Stahl wrote: On 01/10/12 14:23, Noel Grandin wrote: On 2012-10-01 13:58, Michael Stahl wrote: The only problem with a change there is our ABI - which explicitly exposes the encoding of that. the right time to do it is for LO4. sadly nobody has signed up

Re: optimising OUString for space

2012-10-01 Thread Michael Meeks
On Mon, 2012-10-01 at 14:23 +0200, Noel Grandin wrote: > Perhaps we need to split out some preparatory tasks? > For example > - fix code that directly accesses the underlying buffer Sure ! > - create an external iterator class (which would currently be a thin > wrapper around int) f
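
As a rough illustration of the "external iterator class" idea quoted above, here is a sketch of a position type that is just a wrapper around int today but hides the index from callers, so the storage could change later. StringPos, Text and countChar are hypothetical names for this example and are not part of any LibreOffice API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical opaque position: a thin wrapper around int.
// Callers can advance and compare it but cannot do index arithmetic on it.
class StringPos {
public:
    explicit StringPos(int index) : index_(index) {}
    StringPos next() const { return StringPos(index_ + 1); }
    bool operator==(StringPos o) const { return index_ == o.index_; }
    bool operator!=(StringPos o) const { return index_ != o.index_; }
private:
    friend class Text;  // only the string class sees the raw index
    int index_;
};

// Hypothetical string class that hands out positions instead of indices.
class Text {
public:
    explicit Text(std::u16string s) : data_(std::move(s)) {}
    StringPos begin() const { return StringPos(0); }
    StringPos end() const { return StringPos(static_cast<int>(data_.size())); }
    char16_t get(StringPos p) const {
        assert(p.index_ >= 0 && p.index_ < static_cast<int>(data_.size()));
        return data_[static_cast<std::size_t>(p.index_)];
    }
private:
    std::u16string data_;
};

// Example caller written purely against the position interface.
inline int countChar(const Text& t, char16_t c) {
    int n = 0;
    for (StringPos p = t.begin(); p != t.end(); p = p.next()) {
        if (t.get(p) == c) ++n;
    }
    return n;
}
```

Code written this way never assumes that position N means "the Nth 16-bit unit", which is the property a later change of internal encoding would need.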

Re: optimising OUString for space

2012-10-01 Thread Noel Grandin
On 2012-10-01 14:45, Michael Stahl wrote: guess you could comment out operator[], that should find lots of convertible call sites Don't we have some kind of deprecated warning system in C++ ? Would be less disruptive :-)
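
For reference, a sketch of what a deprecation warning on operator[] could look like. It uses the standard [[deprecated]] attribute, which arrived with C++14 and so postdates this thread; at the time, the compiler-specific spelling (for GCC, __attribute__((deprecated))) would have been the option. LegacyString and charAt are hypothetical names, not the real OUString interface:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical string wrapper used only to show the attribute.
class LegacyString {
public:
    explicit LegacyString(std::u16string s) : data_(std::move(s)) {}

    // The old accessor keeps working, but every call site produces a
    // compiler warning, which makes the sites to convert easy to list.
    [[deprecated("use charAt() instead of operator[]")]]
    char16_t operator[](std::size_t i) const { return data_[i]; }

    // Replacement accessor that call sites are expected to migrate to.
    char16_t charAt(std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }

    std::size_t length() const { return data_.size(); }

private:
    std::u16string data_;
};
```

Unlike commenting the operator out, this keeps the tree building while the call sites are converted one by one.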

Re: optimising OUString for space

2012-10-01 Thread Michael Stahl
On 01/10/12 14:23, Noel Grandin wrote: > > On 2012-10-01 13:58, Michael Stahl wrote: >> On 01/10/12 13:25, Michael Meeks wrote: >> >> The only problem with a change there is our ABI - which explicitly >> exposes the encoding of that. >> the right time to do it is for LO4. sadly nobody has si

Re: optimising OUString for space

2012-10-01 Thread Noel Grandin
On 2012-10-01 13:58, Michael Stahl wrote: On 01/10/12 13:25, Michael Meeks wrote: The only problem with a change there is our ABI - which explicitly exposes the encoding of that. the right time to do it is for LO4. sadly nobody has signed up for that yet :( ... (while there are volunte

Re: optimising OUString for space

2012-10-01 Thread Noel Grandin
On 2012-10-01 14:06, Michael Stahl wrote: Or are you talking about memory management? The current OUString class allocates a new character buffer for every mutation, I assume we'd keep that strategy. you mean if i have some string and then add "a" "b" "c" to it it will re-allocate 3 times? tha
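
To make the cost being asked about concrete, here is a toy model of "a new character buffer for every mutation" next to a builder that accumulates first and creates one buffer at the end. ImmutableString and StringBuilder are hypothetical classes written for this sketch; the counter tracks buffers created by the toy class, not real OUString behaviour:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical immutable string: concat() never edits in place,
// it always builds a fresh buffer for the result.
class ImmutableString {
public:
    explicit ImmutableString(const std::string& s)
        : data_(std::make_shared<const std::string>(s)) {
        ++allocations();  // count one new buffer
    }
    ImmutableString concat(const std::string& tail) const {
        return ImmutableString(*data_ + tail);
    }
    char at(std::size_t i) const {
        assert(i < data_->size());
        return (*data_)[i];
    }
    const std::string& str() const { return *data_; }

    // Number of buffers created so far by this toy class.
    static std::size_t& allocations() {
        static std::size_t n = 0;
        return n;
    }

private:
    std::shared_ptr<const std::string> data_;
};

// Hypothetical builder: collects pieces in a mutable buffer and
// produces a single immutable string when finished.
class StringBuilder {
public:
    explicit StringBuilder(std::size_t capacity) { buf_.reserve(capacity); }
    StringBuilder& append(const std::string& s) {
        buf_ += s;
        return *this;
    }
    ImmutableString finish() const { return ImmutableString(buf_); }

private:
    std::string buf_;
};
```

In this model, starting from one string and appending "a", "b" and "c" through concat() creates four immutable buffers in total, while the builder route creates one.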

Re: optimising OUString for space

2012-10-01 Thread Michael Stahl
On 01/10/12 13:55, Noel Grandin wrote: > > On 2012-10-01 13:47, Michael Stahl wrote: >> ... which brings me to another point: in a hypothetical future when we >> could efficiently create a UTF8String from a string literal in C++ >> without copying the darn thing, what should hypothetical operati

Re: optimising OUString for space

2012-10-01 Thread Michael Stahl
On 01/10/12 13:25, Michael Meeks wrote: > > On Mon, 2012-10-01 at 13:02 +0200, Noel Grandin wrote: >> That was something I was thinking about the other day - given that the >> bulk of our strings are pure 7-bit ASCII, it might be a worthwhile >> optimisation to store a bit that says "this string

Re: optimising OUString for space

2012-10-01 Thread Noel Grandin
On 2012-10-01 13:47, Michael Stahl wrote: ... which brings me to another point: in a hypothetical future when we could efficiently create a UTF8String from a string literal in C++ without copying the darn thing, what should hypothetical operations to mutate the string's buffer do? We need ex

Re: optimising OUString for space

2012-10-01 Thread Michael Stahl
On 01/10/12 13:02, Noel Grandin wrote: > > On 2012-10-01 12:38, Michael Meeks wrote: >> We could do some magic there; of course - space is a bit of an issue - >> we already pointlessly bloat bazillions of ascii strings into UCS-2 >> (nominally UTF-16) representations and nail a ref-count and len

Re: optimising OUString for space

2012-10-01 Thread Noel Grandin
On 2012-10-01 13:25, Michael Meeks wrote: The latest Java VM does this trick internally - it pretends that String is stored with an array of 16-bit values, but actually it stores them as UTF-8. Interesting - for all strings ? is there a pointer to the code / docs for that detail somewhe

Re: optimising OUString for space

2012-10-01 Thread Michael Meeks
On Mon, 2012-10-01 at 13:02 +0200, Noel Grandin wrote: > That was something I was thinking about the other day - given that the > bulk of our strings are pure 7-bit ASCII, it might be a worthwhile > optimisation to store a bit that says "this string is 7-bit ASCII", and > then store the string

Re: optimising OUString for space

2012-10-01 Thread Stephan Bergmann
On 10/01/2012 01:02 PM, Noel Grandin wrote: That was something I was thinking about the other day - given that the bulk of our strings are pure 7-bit ASCII, it might be a worthwhile optimisation to store a bit that says "this string is 7-bit ASCII", and then store the string as a sequence of byte
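
The proposal quoted above (a flag saying "this string is 7-bit ASCII", with the text then stored as bytes) can be sketched roughly as follows. This is only an illustration in standard C++; CompactString and its members are hypothetical and say nothing about how OUString is or would be laid out:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical string that stores pure 7-bit ASCII text as one byte per
// character and falls back to 16-bit units for everything else.
class CompactString {
public:
    explicit CompactString(const std::u16string& s)
        : ascii_(std::all_of(s.begin(), s.end(),
                             [](char16_t c) { return c < 0x80; })) {
        if (ascii_) {
            narrow_.reserve(s.size());
            for (char16_t c : s) {
                narrow_.push_back(static_cast<char>(c));
            }
        } else {
            wide_ = s;
        }
    }

    // The "this string is 7-bit ASCII" bit from the proposal.
    bool isAscii() const { return ascii_; }

    std::size_t length() const {
        return ascii_ ? narrow_.size() : wide_.size();
    }

    // Callers always see 16-bit units, whatever the storage is.
    char16_t at(std::size_t i) const {
        assert(i < length());
        return ascii_
            ? static_cast<char16_t>(static_cast<unsigned char>(narrow_[i]))
            : wide_[i];
    }

    // Bytes used by the character data alone (no header, no ref-count).
    std::size_t payloadBytes() const {
        return ascii_ ? narrow_.size() : wide_.size() * sizeof(char16_t);
    }

private:
    bool ascii_;
    std::string narrow_;   // used when ascii_ is true
    std::u16string wide_;  // used otherwise
};
```

For a five-character ASCII string the payload drops from 10 bytes to 5, while a string with a single non-ASCII character keeps the 16-bit layout. The sketch ignores mutation and the ABI concerns raised elsewhere in the thread.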

optimising OUString for space

2012-10-01 Thread Noel Grandin
On 2012-10-01 12:38, Michael Meeks wrote: We could do some magic there; of course - space is a bit of an issue - we already pointlessly bloat bazillions of ascii strings into UCS-2 (nominally UTF-16) representations and nail a ref-count and length on the beginning. If you turn on the lifecycle