Abdelrazak Younes wrote:
Juergen,

There is something fishy in this method.

set<char_type> Encoding::getSymbolsList() const
{
    // assure the used encoding is properly initialized
    init();

    // first all encodable characters
    CharSet symbols = encodable_;
    // add those below start_encodable_
    for (char_type c = 0; c < start_encodable_; ++c)
        symbols.insert(c);
    // now the ones from the unicodesymbols file
    CharInfoMap::const_iterator const end = unicodesymbols.end();
    CharInfoMap::const_iterator it = unicodesymbols.begin();
    for (; it != end; ++it)
        symbols.insert(it->first);
    return symbols;
}

The lengthy initialization for the utf8* encodings is not due to iconv, which is as expected: we shouldn't have to do a lookup for utf8-encodable characters, since all of them are encodable. No, the problem lies in inserting the symbols from the unicodesymbols file. For utf8, we shouldn't do that because _all_ symbols are already in there. On each insertion, std::set has to search whether the given symbol is already present; as you have 1114112 symbols...
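
To illustrate the point about redundant insertions, here is a minimal standalone sketch. It is not the LyX code: char_type is assumed to be a 32-bit integer type, and countNewInsertions is a made-up helper. It only shows that inserting a key which is already present adds nothing to the set, although the tree search is still performed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumption: stand-in for LyX's char_type (a 32-bit code point type).
using char_type = std::uint32_t;

// Hypothetical helper: insert every value in [first, last) into `symbols`
// and count how many insertions actually added a new element.
// std::set::insert returns a pair whose .second is false when the key was
// already present; the lookup in the tree is paid in both cases.
std::size_t countNewInsertions(std::set<char_type> & symbols,
                               char_type first, char_type last)
{
    assert(first <= last);
    std::size_t added = 0;
    for (char_type c = first; c < last; ++c)
        if (symbols.insert(c).second)
            ++added;
    return added;
}
```

Calling the helper a second time over a range that is already in the set returns 0, yet it still walks the tree once per value.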

Well, the initial insertion from 0 to ucs4_max is also quite lengthy, of course. I guess the solution is just to use a vector instead of a set.
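
One possible shape for the vector idea, as a rough sketch only: fill a vector with the contiguous range, append the extra symbols, then sort and deduplicate once at the end instead of searching on every insertion. The names collectSymbols, range_end and extras are invented for this example, and char_type is again assumed to be a 32-bit integer type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Assumption: stand-in for LyX's char_type (a 32-bit code point type).
using char_type = std::uint32_t;

// Hypothetical helper: return all values in [0, range_end) plus `extras`
// as a sorted vector without duplicates.
std::vector<char_type> collectSymbols(char_type range_end,
                                      std::vector<char_type> const & extras)
{
    // Fill the contiguous range 0, 1, ..., range_end - 1 in one pass.
    std::vector<char_type> symbols(range_end);
    std::iota(symbols.begin(), symbols.end(), char_type(0));

    // Append the extra symbols without checking for duplicates yet.
    symbols.reserve(symbols.size() + extras.size());
    symbols.insert(symbols.end(), extras.begin(), extras.end());

    // Sort once, then drop adjacent duplicates.
    std::sort(symbols.begin(), symbols.end());
    symbols.erase(std::unique(symbols.begin(), symbols.end()), symbols.end());

    assert(std::is_sorted(symbols.begin(), symbols.end()));
    return symbols;
}
```

Whether this fits depends on how the callers use the result; code that relies on set lookups would need std::binary_search on the sorted vector instead.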

Abdel.
