Re: Improvements of OUString

Herbert Duerr Tue, 03 Dec 2013 01:35:35 -0800

On 03.12.2013 09:13, Andre Fischer wrote:

A developer who apparently wants to remain anonymous has added the
function isEmpty() to the rtl::OUString class.  See
main/sal/inc/rtl/ustring.hxx for not much more information.


Sorry for being too short. The full semantics of isEmpty() are:

"The method isEmpty() returns true if the string is empty. If the length of the string is one or two or three or any number bigger than zero then isEmpty() returns false."

I added isEmpty() to make it possible to cleanly express the check for an empty string. In our codebase there were quite a few constructs such as

        if( aString) {}
which were intended to mean
        if( aString.isEmpty()) {}

What's funny is that the old construct compiled but did the wrong thing: the string was implicitly converted to a pointer to its elements and that pointer was then compared against NULL. For our OUString that pointer was always non-NULL though.
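To make the pitfall concrete, here is a minimal sketch. LegacyString is a hypothetical stand-in, not the real rtl::OUString; it only assumes what is described above: an implicit conversion to the element pointer, and a pointer that is never NULL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a string class that converts implicitly to a
// pointer to its elements. This is not the real rtl::OUString.
class LegacyString {
public:
    LegacyString() : data_(kEmpty), length_(0) {}
    LegacyString(const char16_t* s, std::size_t n) : data_(s), length_(n) {}

    bool isEmpty() const { return length_ == 0; }

    // The dangerous part: implicit conversion to the element pointer.
    operator const char16_t*() const { return data_; }

private:
    static const char16_t kEmpty[1];
    const char16_t* data_;   // never NULL, even for an empty string
    std::size_t length_;
};

const char16_t LegacyString::kEmpty[1] = { 0 };

// What `if (aString)` really tests: the pointer, not the length.
inline bool truthiness(const LegacyString& s) {
    if (s) {
        return true;
    }
    return false;
}

// Usage: both an empty and a non-empty string take the "true" branch.
inline void demoPitfall() {
    LegacyString empty;
    LegacyString text(u"abc", 3);
    assert(empty.isEmpty() && truthiness(empty));
    assert(!text.isEmpty() && truthiness(text));
}
```

So the condition carries no information about the string at all, which is why an explicit isEmpty() call is the clearer way to write the check.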

Please see issue 123068 for further problems caused by the implicit conversion of the OUString to a pointer to its elements. This dangerous conversion is now disabled. By making the method private all such problems will be found and prevented by the compiler. When we're confident that all has been found the operator can be removed completely.
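As a rough illustration of why a private conversion operator turns such code into compile errors, consider the sketch below. GuardedString is a hypothetical class, not the actual change made for issue 123068.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical class: the pointer conversion still exists but is private,
// so any use of it from outside the class is rejected by the compiler.
class GuardedString {
public:
    GuardedString() : data_(u""), length_(0) {}
    GuardedString(const char16_t* s, std::size_t n) : data_(s), length_(n) {}

    bool isEmpty() const { return length_ == 0; }

    // Explicit accessor for callers that really need the pointer.
    const char16_t* getStr() const { return data_; }

private:
    operator const char16_t*() const { return data_; }

    const char16_t* data_;
    std::size_t length_;
};

// Outside the class the conversion is inaccessible, so the type no longer
// counts as convertible and `if (aString)` would fail to compile.
static_assert(!std::is_convertible<GuardedString, const char16_t*>::value,
              "implicit pointer conversion must not be usable");

// Usage: callers have to state what they mean.
inline bool hasContent(const GuardedString& s) {
    return !s.isEmpty();
}
```

Every remaining accidental use then shows up as a build failure at the exact call site, so nothing has to be found by searching the sources by hand.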

This in itself may not yet be very exciting but I hope that it is the
first of several improvements to one of our most frequently used
classes.  Sadly, we missed the opportunity to make some more substantial
but incompatible changes for the 4.0 release. However, some changes that
make OUString more accessible to new (and old) developers might include:

- Make construction from string literal more straightforward.  At the
moment you have to write
     ::rtl::OUString("text", sizeof("text"), RTL_TEXTENCODING_ASCII_US)
   or slightly shorter and safer
     ::rtl::OUString::createFromAscii("text")

Allocating heap space, transcoding a literal string to this memory and deallocating it later when the string is deleted are quite wasteful operations. Especially when considering that the literal string is already there. It would be great if constructs such as

        OUString( L"hello")

used the pointer to the UTF-16 literal directly instead of copying its contents around. The same applies to OString(). The 'L' prefix yields UTF-16 only on Windows, where wchar_t is 16 bits wide, but C++11 has even more possibilities with its support for Unicode string literals.
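A minimal sketch of that idea, assuming a C++11 compiler and its u"..." literals. LiteralString is a hypothetical class, not an existing OUString constructor: it stores only the address and length of the literal and never allocates or copies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning string that points straight at a UTF-16 literal.
// Literals have static storage duration, so the pointer stays valid.
class LiteralString {
public:
    // N includes the terminating zero, so the length is N - 1.
    // Caveat: any char16_t array binds here, not only literals.
    template <std::size_t N>
    explicit LiteralString(const char16_t (&literal)[N])
        : data_(literal), length_(N - 1) {}

    const char16_t* getStr() const { return data_; }
    std::size_t getLength() const { return length_; }
    bool isEmpty() const { return length_ == 0; }

private:
    const char16_t* data_;
    std::size_t length_;
};

// Usage: no heap allocation and no transcoding takes place.
inline std::size_t demoLiteralLength() {
    LiteralString s(u"hello");
    return s.getLength();
}
```

The length is deduced at compile time from the array type, so there is no strlen-style scan at run time either. A real string class would still need an owning representation for strings built at run time.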

Also we shouldn't bother our main string classes with non-Unicode support. Having external tooling for converting from/to other encodings is still needed though.

Looking over our string processing I'm confident that we could get along great with UTF-8 strings. Only when interfacing with other APIs an eventual conversion to UTF-16 would be needed.

And if we were using UTF-8 byte strings we could base them directly on the standard std::string.
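If UTF-8 in a std::string were the main representation, the conversion needed at such API boundaries could look roughly like the sketch below. utf8ToUtf16 is a hypothetical helper, and it assumes well-formed UTF-8 input; it does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into UTF-16.
// Malformed input is not detected; that is an assumption of this sketch.
inline std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Usage: ASCII text passes through with one UTF-16 unit per byte.
inline std::size_t demoAsciiLength() {
    return utf8ToUtf16("text").size();
}
```

Production code would have to reject or replace malformed sequences, which is the kind of work better left to the external conversion tooling mentioned above.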

- Conversion back to char* is not much better
     ::rtl::OUStringToOString(sOUStringVariable,
         RTL_TEXTENCODING_ASCII_US).getStr()

This awful construct could be made much simpler if our strings were always Unicode (UTF-8/UTF-16/UTF-32).

Do you have more ideas?

Using ideas from languages such as Python/Perl/Java for convenient and powerful string processing to replace the awkward string handling that is too often seen in our code base. E.g. having regexp enabled match() or search() methods would be a great start.
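As a rough sketch of what such helpers might look like on top of the C++11 <regex> library and UTF-8 byte strings: regexMatch and regexSearch are hypothetical names, not existing OUString methods. Note that std::regex works on bytes here, so '.' matches one byte rather than one code point.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole text matches the pattern.
inline bool regexMatch(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: true if the pattern occurs anywhere in the text.
// If firstHit is given, it receives the first matching substring.
inline bool regexSearch(const std::string& text, const std::string& pattern,
                        std::string* firstHit = nullptr) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) {
        return false;
    }
    if (firstHit) {
        *firstHit = m.str(0);
    }
    return true;
}

// Usage: pull the first number out of a sentence.
inline std::string demoFirstNumber(const std::string& text) {
    std::string hit;
    regexSearch(text, "[0-9]+", &hit);
    return hit;  // empty if the text contains no digits
}
```

Compiling the pattern on every call keeps the sketch short; code that reuses a pattern often would want to keep the std::regex object around instead.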


Herbert

