Re: RTL_CONSTASCII_USTRINGPARAM: cleanup wanted?

Lubos Lunak Wed, 22 Feb 2012 05:56:21 -0800

On Wednesday 22 of February 2012, Stephan Bergmann wrote:
> On 02/22/2012 11:25 AM, Michael Meeks wrote:
> >     Great ! :-) incidentally, I had one minor point around the ASCII vs.
> > UTF-8 side; the rtl_string2UString (cf. sal/rtl/source/string.cxx) does
> > a typically slower UTF-8 length counting loop; I suggest that we could
> > do better performance wise (and we do create a biggish scad of these
> > strings) by sticking with ascii, and doing a single, simple copy/expand
> > of the string. Perhaps in a new rtl_uString_newFromAsciiL method.


 Actually rtl_string2UString() is reasonably optimized for the case when the 
data is ASCII or UTF-8-that-in-fact-is-ASCII, so the one loop analysing the 
contents is the only overhead. Makes me wonder if avoiding that one loop is 
really worth it. I'll go with 'no' for the time being, until somebody shows 
me otherwise.
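
 To illustrate the kind of fast path meant here, a rough sketch follows. It is not the actual code from sal/rtl/source/string.cxx; `isPureAscii` and `widenAscii` are hypothetical names made up for this example, and std::u16string merely stands in for the real string buffer. The point is only that one cheap scan decides whether a plain byte-to-code-unit copy is enough.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of sal/rtl): one pass over the data,
// returns true when every byte is 7-bit ASCII.
inline bool isPureAscii( const char* data, std::size_t len )
{
    for( std::size_t i = 0; i < len; ++i )
        if( static_cast< unsigned char >( data[ i ] ) & 0x80 )
            return false;
    return true;
}

// Hypothetical helper: widen ASCII bytes one-to-one into UTF-16 code units.
// Only valid when isPureAscii() returned true for the same data.
inline std::u16string widenAscii( const char* data, std::size_t len )
{
    std::u16string out;
    out.reserve( len );
    for( std::size_t i = 0; i < len; ++i )
        out.push_back( static_cast< char16_t >(
            static_cast< unsigned char >( data[ i ] )));
    return out;
}
```

 For ASCII input the analysing scan is the only extra work on top of the copy, which is the overhead being discussed above.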

> Thinking about it again, the restriction to ASCII could become a
> hindrance in the longer run.  C++11 has provision for UTF-8 string
> literals (u8"..."), but they still have type char const[], so are not
> distinguishable from traditional plain "..." literals via function
> overloading.  So, if we ever wanted to extend the new facilities to also
> support UTF-8 string literals, but would want to keep the performance
> benefit for the ASCII-only case, we could not offer the same simple syntax
>
>    rtl::OUString("foo");
>    rtl::OUString(u8"I\u2764C++");
>
> for both.

 We could have OUString::fromUtf8( utf8literal ), which I consider acceptable, 
especially given that IMO we are unlikely to have a large number of UTF-8 
literals anyway. But I think it's better to always go for UTF-8 and optimize 
only if we find out it's worth it.
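
 As a sketch of how such a split API could look, assuming a stand-in class: `Text` below is hypothetical and is not rtl::OUString, and its `fromUtf8` is only modelled on the name suggested above. The plain constructor assumes ASCII and widens byte by byte, while the named function does the UTF-8 decoding. The decoder assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a UTF-16 string class; not rtl::OUString.
class Text
{
public:
    // Plain literal: assumed to be 7-bit ASCII, widened one byte at a time.
    template< std::size_t N >
    Text( const char (&literal)[ N ] )
    {
        for( std::size_t i = 0; i + 1 < N; ++i ) // skip the trailing NUL
            units_.push_back( static_cast< char16_t >(
                static_cast< unsigned char >( literal[ i ] )));
    }

    // Named factory for UTF-8 data (assumes well-formed input).
    static Text fromUtf8( const std::string& utf8 )
    {
        Text result;
        std::size_t i = 0;
        while( i < utf8.size())
        {
            const unsigned char lead = static_cast< unsigned char >( utf8[ i ] );
            // Number of continuation bytes announced by the lead byte.
            const std::size_t extra = lead < 0x80 ? 0
                : ( lead & 0xe0 ) == 0xc0 ? 1
                : ( lead & 0xf0 ) == 0xe0 ? 2
                : 3;
            char32_t cp = extra == 0 ? lead : ( lead & ( 0x3f >> extra ));
            for( std::size_t k = 1; k <= extra && i + k < utf8.size(); ++k )
                cp = ( cp << 6 )
                    | ( static_cast< unsigned char >( utf8[ i + k ] ) & 0x3f );
            i += extra + 1;
            if( cp < 0x10000 )
                result.units_.push_back( static_cast< char16_t >( cp ));
            else
            {   // Encode as a UTF-16 surrogate pair.
                cp -= 0x10000;
                result.units_.push_back( static_cast< char16_t >( 0xd800 + ( cp >> 10 )));
                result.units_.push_back( static_cast< char16_t >( 0xdc00 + ( cp & 0x3ff )));
            }
        }
        return result;
    }

    const std::u16string& units() const { return units_; }

private:
    Text() {}
    std::u16string units_;
};
```

 Usage would then be Text( "foo" ) for the common case and Text::fromUtf8( "I\xe2\x9d\xa4" "C++" ) for the rare UTF-8 literal (the literal is split so that the hex escape does not swallow the following 'C').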

 I thought there could be a way to test string literal contents at 
compile time, but string literals are not considered compile-time constants, 
simply because the standard says so, and therefore templates can't take them 
as arguments. I eventually found a way to do it anyway, based on 
http://www.macieira.org/blog/2011/07/initialising-an-array-with-cx0x-using-constexpr-and-variadic-templates/
(see attachment), but it turns out to be unusable in practice. Maybe later.

-- 
 Lubos Lunak
 l.lu...@suse.cz

// With gcc-4.5.1 this is awfully slow to compile.
// Also, for longer strings the computation is no longer done at compile
// time and instead code for handling it at runtime is generated.

#include <stdio.h>

constexpr inline
int sum()
    {
    return 0;
    }

template< typename... T >
constexpr inline 
int sum( int v1, T... v2 )
    {
    return v1 + sum( v2... );
    }

// TODO BUG
// This gets it backwards: a lead byte for which this returns ret should
// cause the ret-1 following (continuation) bytes to be skipped, so this needs
// to be handled as { utf8LengthChar( s[ i ] )... } (i.e. an array) to ensure
// ordering.
constexpr inline 
int utf8LengthChar( unsigned char c )
    {
    return !( c & 0x80 ) ? 1
        : ( c & 0xe0 ) == 0xc0 ? 2
        : ( c & 0xf0 ) == 0xe0 ? 3
        : ( c & 0xf8 ) == 0xf0 ? 4
        : ( c & 0xfc ) == 0xf8 ? 5
        : ( c & 0xfe ) == 0xfc ? 6
        : 1;
    }

template< int... >
struct IndexList
    {
    };

template< typename IndexList, int Right >
struct Merge;

template< int... Left, int Right >
struct Merge< IndexList< Left... >, Right >
    {
    typedef IndexList< Left..., Right > Range;
    };

template< int N >
struct Indexes
    {
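    // Append N - 1 so that Indexes< N > yields 0 .. N-1 rather than 1 .. N
    // (which would skip s[ 0 ] and read the terminating NUL instead).
    typedef typename Merge< typename Indexes< N - 1 >::Range, N - 1 >::Range Range;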
    };

template<>
struct Indexes< 0 >
    {
    typedef IndexList<> Range;
    };

template< int N, typename T >
struct Utf8LengthHelper;

template< int N, int... i >
struct Utf8LengthHelper< N, IndexList< i... > >
    {
    constexpr inline Utf8LengthHelper( const char s[ N ] )
        : value( sum( utf8LengthChar( s[ i ] )... ))
        {
        }
    const int value;
    };

template< int N >
constexpr inline int utf8Length( const char s[ N ] )
    {
    return Utf8LengthHelper< N, typename Indexes< N >::Range >( s ).value;
    }

template< int N >
inline
void foo( const char (&s)[ N ] )
    {
    fprintf( stderr, "%s %d\n", s, utf8Length< N - 1 >( s ));
    }

int main()
    {
    foo( "testé" );
    }
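
One way around the ordering problem noted in the TODO, offered only as a sketch and not as a fix to the code above: instead of summing sequence lengths, count the bytes that are not UTF-8 continuation bytes (those of the form 10xxxxxx). Each character has exactly one such byte, so no byte has to be skipped and the order of evaluation no longer matters. The names `isLeadByte`, `countCodePoints` and `utf8CodePoints` are made up for this example. It uses plain C++11 constexpr recursion, so long literals may still hit the compiler's constexpr recursion depth limit, and malformed UTF-8 is not detected.

```cpp
#include <cstddef>

// True for every byte that starts a character (ASCII or a UTF-8 lead byte),
// false for continuation bytes of the form 10xxxxxx.
constexpr bool isLeadByte( unsigned char c )
{
    return ( c & 0xc0 ) != 0x80;
}

// Counts the characters in the first n bytes of s (C++11: single return).
constexpr std::size_t countCodePoints( const char* s, std::size_t n )
{
    return n == 0 ? 0
        : ( isLeadByte( static_cast< unsigned char >( *s )) ? 1 : 0 )
            + countCodePoints( s + 1, n - 1 );
}

// Entry point for string literals; N includes the terminating NUL.
template< std::size_t N >
constexpr std::size_t utf8CodePoints( const char (&s)[ N ] )
{
    return countCodePoints( s, N - 1 );
}

// Evaluated at compile time. The second literal is "test" followed by
// U+00E9 written as explicit UTF-8 bytes, so it does not depend on the
// encoding of the source file.
static_assert( utf8CodePoints( "foo" ) == 3, "ASCII: one character per byte" );
static_assert( utf8CodePoints( "test\xc3\xa9" ) == 5, "two-byte sequence counts once" );
```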

