Re: Long initialisation for utf8* encodings

Abdelrazak Younes Fri, 08 Feb 2008 11:14:39 -0800

Abdelrazak Younes wrote:

Juergen,
There is something fishy in this method.

set<char_type> Encoding::getSymbolsList() const
{
    // assure the used encoding is properly initialized
    init();

    // first all encodable characters
    CharSet symbols = encodable_;
    // add those below start_encodable_
    for (char_type c = 0; c < start_encodable_; ++c)
        symbols.insert(c);
    // now the ones from the unicodesymbols file
    CharInfoMap::const_iterator const end = unicodesymbols.end();
    CharInfoMap::const_iterator it = unicodesymbols.begin();
    for (; it != end; ++it)
        symbols.insert(it->first);
    return symbols;
}
The lengthy initialization for utf8* encodings is not due to iconv, and this is normal, as we shouldn't have to do a lookup for utf8 encodable characters: all of them are. No, the problem lies where we insert the symbols from the unicodesymbols file. For utf8, we shouldn't do that because _all_ symbols are already in there. On each insertion, std::set has to search whether the given symbol is already present; as you have 1114112 symbols...
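
To illustrate the point about redundant insertions, here is a minimal sketch (not LyX code; the helper name countRedundantInserts and the char_type typedef are assumptions made for the example). std::set::insert reports through the bool in its return value whether the element was actually added; when the value is already in the set, the call still searches the tree and then adds nothing.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

typedef std::uint32_t char_type;   // assumption: stand-in for LyX's char_type

// Hypothetical helper: insert every value in [0, extra) and count how many
// of those insertions found the value already present.
std::size_t countRedundantInserts(std::set<char_type> & symbols, char_type extra)
{
    std::size_t redundant = 0;
    for (char_type c = 0; c < extra; ++c) {
        // insert() has to search the tree first; .second is false on a duplicate
        if (!symbols.insert(c).second)
            ++redundant;
    }
    return redundant;
}
```

In the utf8 case described above, every symbol coming from the unicodesymbols file would fall into the redundant count.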

Well, the initial insertion from 0 to ucs4_max is also quite lengthy, of course. I guess the solution is just to use a vector instead of a set.
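
A rough sketch of what the vector idea could look like, next to the set-based fill for comparison. This is only an illustration under assumptions: the function names (buildSymbolSet, buildSymbolFlags, addSymbol) and the char_type typedef are made up here, and a flag-per-code-point std::vector<bool> is just one possible reading of "use a vector".

```cpp
#include <cstdint>
#include <set>
#include <vector>

typedef std::uint32_t char_type;   // assumption: stand-in for LyX's char_type

// Set-based fill: each insert is a tree search plus a node allocation.
std::set<char_type> buildSymbolSet(char_type limit)
{
    std::set<char_type> symbols;
    for (char_type c = 0; c < limit; ++c)
        symbols.insert(c);
    return symbols;
}

// Vector-based fill: one flag per code point, all set in a single allocation.
std::vector<bool> buildSymbolFlags(char_type limit)
{
    return std::vector<bool>(limit, true);
}

// Marking a symbol is an indexed store; no search is needed, and marking
// an already present symbol costs the same as marking a new one.
void addSymbol(std::vector<bool> & flags, char_type c)
{
    if (c < flags.size())
        flags[c] = true;
}
```

With limit set to 1114112, the set version performs over a million node allocations, while the flag vector needs one allocation of roughly 136 KB, since std::vector<bool> is typically bit-packed.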


Abdel.
