On 10/01/2012 02:45 PM, Michael Stahl wrote:
On 01/10/12 14:23, Noel Grandin wrote:
On 2012-10-01 13:58, Michael Stahl wrote:
        The only problem with a change there is our ABI - which explicitly
exposes the encoding of that.
the right time to do it is for LO4.  sadly nobody has signed up for that
yet :( ... (while there are volunteers for far sillier proposals, like
getting rid of com.sun.star...)

Perhaps we need to split out some preparatory tasks?
For example
   - fix code that directly accesses the underlying buffer
   - create an external iterator class (which would currently be a thin
     wrapper around int) for looping over the buffer and indexing into it
   - fix code that indexes into an OUString to use the new external iterator
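
To make the second item concrete, here is a rough sketch of what such a thin wrapper might look like. It uses std::u16string as a stand-in for OUString, and every name in it (StrPos, beginOf, endOf, unitAt, countUnit) is hypothetical and not part of any existing sal/rtl API. It only illustrates the idea that call sites stop using a bare integer as the index:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical position type: today it just wraps a code-unit index, but
// call sites no longer depend on the index being a plain integer.
class StrPos
{
public:
    explicit StrPos(std::size_t unit = 0) : unit_(unit) {}
    StrPos& operator++() { ++unit_; return *this; }
    bool operator==(StrPos o) const { return unit_ == o.unit_; }
    bool operator!=(StrPos o) const { return unit_ != o.unit_; }
    std::size_t raw() const { return unit_; } // escape hatch for legacy code
private:
    std::size_t unit_;
};

// Free functions standing in for what would be string member functions.
inline StrPos beginOf(const std::u16string&) { return StrPos(0); }
inline StrPos endOf(const std::u16string& s) { return StrPos(s.size()); }

inline char16_t unitAt(const std::u16string& s, StrPos p)
{
    assert(p.raw() < s.size());
    return s[p.raw()];
}

// Example of a converted call site: loop via StrPos instead of an int index.
inline std::size_t countUnit(const std::u16string& s, char16_t c)
{
    std::size_t n = 0;
    for (StrPos p = beginOf(s); p != endOf(s); ++p)
        if (unitAt(s, p) == c)
            ++n;
    return n;
}
```

Once call sites look like countUnit above, the representation behind StrPos could in principle change without touching them again.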

there already exists a method iterateCodePoints, using a pointer to the
index of the next code unit as the iterator (note that this interface
depends on immutability of the buffer):

     inline sal_uInt32 iterateCodePoints(
         sal_Int32 * indexUtf16, sal_Int32 incrementCodePoints = 1) const

problem is, nobody is using it...
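
For illustration, a loop in that style might look roughly like the sketch below. It does not use the real OUString: nextCodePoint is a hypothetical stand-in on std::u16string that only mimics the forward case (incrementCodePoints = 1) as I understand it, i.e. return the code point at the index and advance the index past it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the forward case of iterateCodePoints.
// Returns the code point starting at s[*indexUtf16] and advances the index
// by one code unit, or by two if a valid surrogate pair was consumed.
inline std::uint32_t nextCodePoint(const std::u16string& s, std::size_t* indexUtf16)
{
    assert(indexUtf16 != nullptr && *indexUtf16 < s.size());
    char16_t hi = s[(*indexUtf16)++];
    if (hi >= 0xD800 && hi <= 0xDBFF && *indexUtf16 < s.size()) {
        char16_t lo = s[*indexUtf16];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++*indexUtf16;
            return 0x10000 + ((std::uint32_t(hi) - 0xD800) << 10)
                   + (std::uint32_t(lo) - 0xDC00);
        }
    }
    return hi; // BMP character, or an unpaired surrogate passed through as-is
}

// Typical call site: the loop variable is only ever advanced by the iterator.
inline std::size_t countCodePoints(const std::u16string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size();) {
        nextCodePoint(s, &i);
        ++n;
    }
    return n;
}
```

With s = u"a\U0001D11Eb" the string has 4 code units but countCodePoints(s) gives 3, which is exactly the difference that indexing via operator[] hides.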

guess you could comment out operator[], that should find lots of
convertible call sites :)

Note that in the common case of accessing (searching for, etc.) 7-bit ASCII content in a string, going via an operator[] interface that operates directly on the string object's innards might be more efficient than going via an iterator interface, regardless of whether the string is internally represented as UTF-8 or UTF-16. (An iterator interface is, of course, necessary when potentially accessing non-ASCII content.)
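
As a small sketch of why the direct approach is safe for ASCII: in both encodings, a code unit below 0x80 never occurs as part of a multi-unit sequence, so a plain scan over code units cannot produce a false match. The two helpers below are hypothetical and use std::u16string and std::string as stand-ins for the two possible internal representations:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Scan UTF-16 code units for a 7-bit ASCII character.
// Surrogates live in 0xD800..0xDFFF, so they can never equal an ASCII value.
inline std::ptrdiff_t indexOfAsciiUtf16(const std::u16string& s, char c)
{
    assert(static_cast<unsigned char>(c) < 0x80);
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == static_cast<char16_t>(c))
            return static_cast<std::ptrdiff_t>(i);
    return -1; // not found
}

// Scan UTF-8 bytes for a 7-bit ASCII character.
// Lead and continuation bytes of multi-byte sequences are all >= 0x80.
inline std::ptrdiff_t indexOfAsciiUtf8(const std::string& s, char c)
{
    assert(static_cast<unsigned char>(c) < 0x80);
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == c)
            return static_cast<std::ptrdiff_t>(i);
    return -1; // not found
}
```

Note that the returned index is in code units of the respective encoding (2 for '=' in UTF-16 u"\U0001D11E=x", 4 for the same text in UTF-8), so it is only meaningful for the representation it was computed on.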

What an ideal string abstraction would look like is not clear to me at all.

Stephan