On 03.12.2013 09:13, Andre Fischer wrote:
A developer who apparently wants to remain anonymous has added the
function isEmpty() to the rtl::OUString class.  See
main/sal/inc/rtl/ustring.hxx for not much more information.

Sorry for being too short. The full semantics of isEmpty() are:

"The method isEmpty() returns true if the string is empty. If the length of the string is one or two or three or any number bigger than zero then isEmpty() returns false."

I added isEmpty() to make it possible to cleanly express the check for an empty string. In our codebase there were quite a few constructs such as
        if( aString) {}
which were intended to mean
        if( !aString.isEmpty()) {}
What's funny is that the old construct compiled but it did the wrong thing: The string was implicitly converted to a pointer to its elements and that pointer was then compared against NULL. For our OUString that pointer was always non-NULL though.
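
To illustrate the pitfall, here is a minimal sketch. LegacyString is a hypothetical stand-in that I made up for this mail, not the real rtl::OUString; it only assumes an implicit conversion to a pointer to the elements, as described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a string class with an implicit conversion to a
// pointer to its elements. This is NOT the real rtl::OUString.
class LegacyString {
public:
    explicit LegacyString(const char* text) : maData(text) {}

    // The implicit conversion that lets "if (aString)" compile.
    operator const char*() const { return maData.c_str(); }

    bool isEmpty() const { return maData.empty(); }

private:
    std::string maData;
};

// What "if (aString)" really evaluates: a pointer-versus-null test.
bool pointerTest(const LegacyString& aString) {
    if (aString) {
        return true;
    }
    return false;
}

void demonstratePitfall() {
    const LegacyString aEmpty("");
    const LegacyString aFilled("abc");
    // The pointer is never null, so the test cannot tell the two apart.
    assert(pointerTest(aEmpty));
    assert(pointerTest(aFilled));
    // The explicit query does tell them apart.
    assert(aEmpty.isEmpty());
    assert(!aFilled.isEmpty());
}
```

The point of the sketch is only that the pointer test gives the same answer for an empty and a non-empty string, while isEmpty() does not.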

Please see issue 123068 for further problems caused by the implicit conversion of an OUString to a pointer to its elements. This dangerous conversion is now disabled: the conversion operator has been made private, so the compiler finds and rejects all such uses. Once we're confident that every case has been found, the operator can be removed completely.
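
A rough sketch of that technique, again with a hypothetical class (GuardedString is not the real rtl::OUString, and the real header may look different): the operator stays declared but is private, so every use outside the class becomes a compile error.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in, not the real rtl::OUString.
class GuardedString {
public:
    explicit GuardedString(const char* text) : maData(text) {}

    bool isEmpty() const { return maData.empty(); }
    const char* getStr() const { return maData.c_str(); }

private:
    // Still declared, but private: any use outside the class fails to compile.
    operator const char*() const { return maData.c_str(); }

    std::string maData;
};

// Checked at compile time: outside code can no longer convert implicitly.
static_assert(!std::is_convertible<GuardedString, const char*>::value,
              "implicit conversion to a pointer must be inaccessible");

bool hasContent(const GuardedString& aString) {
    // if (aString) {}   // would be rejected now: the operator is private
    return !aString.isEmpty();
}

void demonstrateGuard() {
    assert(hasContent(GuardedString("abc")));
    assert(!hasContent(GuardedString("")));
}
```

Callers that really need the pointer have to ask for it explicitly (getStr() in this sketch), which makes the intent visible in the code.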

This in itself may not yet be very exciting but I hope that it is the
first of several improvements to one of our most frequently used
classes.  Sadly, we missed the opportunity to make some more substantial
but incompatible changes for the 4.0 release. However, some changes that
make OUString more accessible to new (and old) developers might include:

- Make construction from string literal more straightforward.  At the
moment you have to write
     ::rtl::OUString("text", sizeof("text") - 1, RTL_TEXTENCODING_ASCII_US)
   or slightly shorter and safer
     ::rtl::OUString::createFromAscii("text")

Allocating heap space, transcoding a literal string into that memory, and deallocating it later when the string is destroyed are quite wasteful operations, especially considering that the literal string is already there. It would be great if constructs such as
        OUString( L"hello")
used the pointer to the UTF-16 literal directly instead of copying its contents around. The same applies to OString. The 'L' prefix yields UTF-16 only on Windows, where wchar_t is 16 bits wide, but C++11 offers more portable options with its Unicode string literals (u8"", u"" and U"").
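
As a sketch of what I mean, assuming a C++11 compiler: LiteralView below is a hypothetical class invented for this mail (nothing like it exists in rtl today). It keeps the pointer to a u"" literal and lets the compiler deduce the length, so nothing is allocated or copied.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view of a UTF-16 literal. An illustration only,
// not part of the rtl API.
class LiteralView {
public:
    // Taking the array by reference lets the compiler deduce its size, so no
    // length scan and no heap allocation are needed.
    template <std::size_t N>
    constexpr LiteralView(const char16_t (&rLiteral)[N])
        : mpData(rLiteral), mnLength(N - 1) {}  // N includes the trailing NUL

    constexpr const char16_t* data() const { return mpData; }
    constexpr std::size_t length() const { return mnLength; }
    constexpr bool isEmpty() const { return mnLength == 0; }

private:
    const char16_t* mpData;
    std::size_t mnLength;
};

// Five code units plus the terminating NUL.
static_assert(sizeof(u"hello") == 6 * sizeof(char16_t),
              "a u\"\" literal is an array of char16_t");

void demonstrateLiteralView() {
    static const char16_t aText[] = u"hello";
    const LiteralView aView(aText);
    assert(aView.data() == aText);  // same storage, nothing was copied
    assert(aView.length() == 5);
    assert(!aView.isEmpty());
}
```

A real string class would additionally need to know whether it owns its buffer, so this is only the literal-handling half of the idea.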

Also, we shouldn't burden our main string classes with non-Unicode support. External tooling for converting from and to other encodings is still needed, though.

Looking over our string processing, I'm confident that we could get along well with UTF-8 strings. Only when interfacing with other APIs might a conversion to UTF-16 be needed.

And if we were using UTF-8 byte strings we could base them directly on the standard std::string.
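
A minimal sketch of that split, under the assumption that the input is well-formed UTF-8: the text lives in a plain std::string, and a helper (utf8ToUtf16 is a name I made up) converts it to UTF-16 only at an API boundary. A production version would have to validate its input and reject malformed sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts UTF-8 to UTF-16. Hypothetical helper that assumes well-formed
// input; it does not detect overlong or otherwise invalid sequences.
std::u16string utf8ToUtf16(const std::string& rUtf8) {
    std::u16string aResult;
    std::size_t i = 0;
    while (i < rUtf8.size()) {
        const unsigned char c = static_cast<unsigned char>(rUtf8[i]);
        char32_t nCodePoint;
        std::size_t nExtra;  // number of continuation bytes that follow
        if (c < 0x80)      { nCodePoint = c;        nExtra = 0; }
        else if (c < 0xE0) { nCodePoint = c & 0x1F; nExtra = 1; }
        else if (c < 0xF0) { nCodePoint = c & 0x0F; nExtra = 2; }
        else               { nCodePoint = c & 0x07; nExtra = 3; }
        assert(i + nExtra < rUtf8.size());  // sequence must not be truncated
        for (std::size_t k = 1; k <= nExtra; ++k) {
            nCodePoint = (nCodePoint << 6)
                | (static_cast<unsigned char>(rUtf8[i + k]) & 0x3F);
        }
        i += nExtra + 1;
        if (nCodePoint < 0x10000) {
            aResult.push_back(static_cast<char16_t>(nCodePoint));
        } else {
            // Code points beyond the BMP become a surrogate pair.
            nCodePoint -= 0x10000;
            aResult.push_back(static_cast<char16_t>(0xD800 + (nCodePoint >> 10)));
            aResult.push_back(static_cast<char16_t>(0xDC00 + (nCodePoint & 0x3FF)));
        }
    }
    return aResult;
}

void demonstrateUtf8() {
    const std::string aEuro("\xE2\x82\xAC");  // U+20AC as three UTF-8 bytes
    assert(aEuro.size() == 3);                // std::string counts bytes
    assert(utf8ToUtf16(aEuro) == u"\u20AC");  // one UTF-16 code unit
}
```

Note that std::string::size() reports bytes, not characters, which is one of the things a UTF-8 based string class would have to make clear to its users.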

- Conversion back to char* is not much better
     ::rtl::OUStringToOString(sOUStringVariable, RTL_TEXTENCODING_ASCII_US).getStr()

This awful construct could be made much simpler if our strings were always Unicode (UTF-8/UTF-16/UTF-32).

Do you have more ideas?

We could borrow ideas from languages such as Python, Perl and Java for convenient and powerful string processing, to replace the awkward string handling that is too often seen in our code base. E.g. regexp-enabled match() or search() methods would be a great start.
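
To give an idea of how that could look, here is a small sketch built on the C++11 std::regex library and std::string. The free functions match() and search() are hypothetical; in a real design they might become members of our string class.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helpers layered on std::regex (default ECMAScript syntax).

// True if the whole text matches the pattern.
bool match(const std::string& rText, const std::string& rPattern) {
    return std::regex_match(rText, std::regex(rPattern));
}

// Returns the first substring that matches the pattern, or an empty string
// if nothing matches.
std::string search(const std::string& rText, const std::string& rPattern) {
    const std::regex aRegex(rPattern);
    std::smatch aMatch;
    if (std::regex_search(rText, aMatch, aRegex)) {
        return aMatch.str(0);
    }
    return std::string();
}

void demonstrateRegex() {
    assert(match("issue 123068", "issue [0-9]+"));
    // match() has to cover the whole text, so a prefix makes it fail.
    assert(!match("see issue 123068", "issue [0-9]+"));
    assert(search("see issue 123068", "[0-9]+") == "123068");
}
```

Compiling the pattern on every call is wasteful, so a real interface would probably accept a precompiled pattern object as well.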

Herbert

