On Wednesday 22 of February 2012, Stephan Bergmann wrote:
> On 02/22/2012 11:25 AM, Michael Meeks wrote:
> > Great ! :-) incidentally, I had one minor point around the ASCII vs.
> > UTF-8 side; the rtl_string2UString (cf. sal/rtl/source/string.cxx) does
> > a typically slower UTF-8 length counting loop; I suggest that we could
> > do better performance wise (and we do create a biggish scad of these
> > strings) by sticking with ascii, and doing a single, simple copy/expand
> > of the string. Perhaps in a new rtl_uString_newFromAsciiL method.

Actually rtl_string2UString() is reasonably optimized for the case when the
data is ASCII or UTF-8-that-in-fact-is-ASCII, so the one loop analysing the
contents is the only overhead. Makes me wonder if avoiding that one loop is
really worth it. I'll go with 'no' for the time being, until somebody shows
me otherwise.

> Thinking about it again, the restriction to ASCII could become a
> hindrance in the longer run. C++11 has provision for UTF-8 string
> literals (u8"..."), but they still have type char const[], so are not
> distinguishable from traditional plain "..." literals via function
> overloading. So, if we ever wanted to extend the new facilities to also
> support UTF-8 string literals, but would want to keep the performance
> benefit for the ASCII-only case, we could not offer the same simple syntax
>
> rtl::OUString("foo");
> rtl::OUString(u8"I\u2764C++");
>
> for both.

We could have OUString::fromUtf8( utf8literal ), which I consider
acceptable, especially given that IMO we are unlikely to have a larger
number of utf8 literals anyway. But I think it's better to go for utf8
always and optimize if we find out it's worth it.

I thought there could be a way to test string literal contents at
compile-time, but string literals are not considered to be compile-time
constants just because the standard says so, so templates can't take them
as arguments, and while I've eventually found a way to do it, based on
http://www.macieira.org/blog/2011/07/initialising-an-array-with-cx0x-using-constexpr-and-variadic-templates/ ,
see attachment, it turns out to be unusable in practice. Maybe later.

--
Lubos Lunak
l.lu...@suse.cz

// With gcc-4.5.1 this is awfully slow to compile.
// Also, for longer strings the computation is no longer done at compile
// time and instead code for handling it at runtime is generated.

#include <stdio.h>

constexpr inline int sum()
{
    return 0;
}

template< typename... T >
constexpr inline int sum( int v1, T... v2 )
{
    return v1 + sum( v2... );
}

// TODO BUG
// This is the other way around, it should in fact lead to skipping ret-1
// following characters, so this needs to be handled as
// { utf8LengthChar( s[ i ] )... ) } (i.e. array) to ensure ordering.
constexpr inline int utf8LengthChar( unsigned char c )
{
    return !( c & 0x80 ) ? 1
        : ( c & 0xe0 ) == 0xc0 ? 2
        : ( c & 0xf0 ) == 0xe0 ? 3
        : ( c & 0xf8 ) == 0xf0 ? 4
        : ( c & 0xfc ) == 0xf8 ? 5
        : ( c & 0xfe ) == 0xfc ? 6
        : 1;
}

template< int... >
struct IndexList
{
};

template< typename IndexList, int Right >
struct Merge;

template< int... Left, int Right >
struct Merge< IndexList< Left... >, Right >
{
    typedef IndexList< Left..., Right > Range;
};

template< int N >
struct Indexes
{
    typedef typename Merge< typename Indexes< N - 1 >::Range, N >::Range Range;
};

template<>
struct Indexes< 0 >
{
    typedef IndexList<> Range;
};

template< int N, typename T >
struct Utf8LengthHelper;

template< int N, int... i >
struct Utf8LengthHelper< N, IndexList< i... > >
{
    constexpr inline Utf8LengthHelper( const char s[ N ] )
        : value( sum( utf8LengthChar( s[ i ] )... ))
    {
    }
    const int value;
};

template< int N >
constexpr inline int utf8Length( const char s[ N ] )
{
    return Utf8LengthHelper< N, typename Indexes< N >::Range >( s ).value;
}

template< int N >
inline void foo( const char (&s)[ N ] )
{
    fprintf( stderr, "%s %d\n", s, utf8Length< N - 1 >( s ));
}

int main()
{
    foo( "testé" );
}
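
The ordering problem flagged in the TODO above can probably be sidestepped by not looking at sequence lengths at all: in well-formed UTF-8 every code point has exactly one byte that is not a 10xxxxxx continuation byte, so counting those bytes gives the same result in any evaluation order. What follows is only a minimal sketch of that idea, not a fix to the attachment itself. It assumes well-formed input, the names isLeadByte, countFrom and utf8CodePoints are hypothetical and not part of sal/rtl, and it counts code points rather than UTF-16 code units, so a four-byte sequence would still need two units in an OUString.

```cpp
#include <cstddef>

// Hypothetical helper: true for a byte that starts a code point,
// i.e. anything that is not a 10xxxxxx continuation byte.
constexpr bool isLeadByte( unsigned char c )
{
    return ( c & 0xc0 ) != 0x80;
}

// Written recursively with a single return statement so that it stays
// within the C++11 constexpr rules.
constexpr int countFrom( const char* s, std::size_t i, std::size_t n )
{
    return i == n ? 0
        : ( isLeadByte( static_cast< unsigned char >( s[ i ] )) ? 1 : 0 )
            + countFrom( s, i + 1, n );
}

// N includes the terminating NUL, hence N - 1 bytes of content.
template< std::size_t N >
constexpr int utf8CodePoints( const char (&s)[ N ] )
{
    return countFrom( s, 0, N - 1 );
}

// Checked at compile time; the bytes are spelled out as escapes so the
// result does not depend on the source file encoding.
static_assert( utf8CodePoints( "test" ) == 4, "plain ASCII" );
static_assert( utf8CodePoints( "test\xc3\xa9" ) == 5, "two-byte sequence counts once" );
static_assert( utf8CodePoints( "I\xe2\x9d\xa4" "C++" ) == 5, "three-byte sequence counts once" );
```

The recursion depth grows with the literal's length, so very long literals may still run into compiler limits; this sketch does not address the compile-time cost mentioned at the top of the attachment.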
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice