Abdelrazak Younes wrote:
Juergen,
There is something fishy in this method.
set<char_type> Encoding::getSymbolsList() const
{
    // assure the used encoding is properly initialized
    init();

    // first all encodable characters
    CharSet symbols = encodable_;
    // add those below start_encodable_
    for (char_type c = 0; c < start_encodable_; ++c)
        symbols.insert(c);
    // now the ones from the unicodesymbols file
    CharInfoMap::const_iterator const end = unicodesymbols.end();
    CharInfoMap::const_iterator it = unicodesymbols.begin();
    for (; it != end; ++it)
        symbols.insert(it->first);
    return symbols;
}
The lengthy initialization for the utf8* encodings is not due to iconv.
That is expected: we shouldn't have to do a lookup for utf8 encodable
characters, since all of them are encodable.
No, the problem lies in inserting the symbols from the
unicodesymbols file. For utf8, we shouldn't do that because _all_
symbols are already in there. On each insertion, std::set has to
search whether the given symbol is already present; as you have 1114112
symbols...
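To illustrate the point, here is a minimal sketch of skipping the merge when the set is already complete. It is not existing LyX code: mergeUnicodeSymbols and range_size are hypothetical names, and char_type and CharInfoMap are simplified stand-ins for the real types. It assumes every element of the set lies below range_size (0x110000, i.e. 1114112, for the full UCS-4 range).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

typedef std::uint32_t char_type;              // assumption: stand-in for LyX's char_type
typedef std::set<char_type> CharSet;
typedef std::map<char_type, int> CharInfoMap; // assumption: the real map stores CharInfo values

// Hypothetical helper: merge the keys of unicodesymbols into symbols,
// unless symbols already holds every code point in [0, range_size).
// Returns the number of code points actually added.
std::size_t mergeUnicodeSymbols(CharSet & symbols,
                                CharInfoMap const & unicodesymbols,
                                std::size_t range_size)
{
    // Assumption: all elements of symbols are below range_size.
    assert(symbols.size() <= range_size);

    // A complete set cannot gain anything, so skip the lookups.
    if (symbols.size() == range_size)
        return 0;

    std::size_t added = 0;
    CharInfoMap::const_iterator const end = unicodesymbols.end();
    for (CharInfoMap::const_iterator it = unicodesymbols.begin(); it != end; ++it) {
        // insert() reports through .second whether the key was new
        if (symbols.insert(it->first).second)
            ++added;
    }
    return added;
}
```

For a utf8 encoding the early return would avoid one lookup per entry of the unicodesymbols file; for other encodings the behaviour is the same as the existing loop.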
Well, the initial insertion from 0 to ucs4_max is also quite lengthy, of
course. I guess the solution is just to use a vector instead of a set.
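Something along these lines is what I have in mind for the vector variant; a rough sketch only, with a hypothetical free function collectSymbols standing in for the method and simplified stand-in types. It appends everything first, then sorts and removes duplicates once at the end, instead of searching on every insertion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

typedef std::uint32_t char_type;              // assumption: stand-in for LyX's char_type
typedef std::set<char_type> CharSet;
typedef std::map<char_type, int> CharInfoMap; // assumption: the real map stores CharInfo values

// Hypothetical replacement for the body of getSymbolsList():
// returns the symbols as a sorted vector without duplicates.
std::vector<char_type> collectSymbols(CharSet const & encodable,
                                      char_type start_encodable,
                                      CharInfoMap const & unicodesymbols)
{
    std::vector<char_type> symbols;
    symbols.reserve(start_encodable + encodable.size() + unicodesymbols.size());

    // those below start_encodable
    for (char_type c = 0; c < start_encodable; ++c)
        symbols.push_back(c);
    // all encodable characters
    symbols.insert(symbols.end(), encodable.begin(), encodable.end());
    // the ones from the unicodesymbols file
    CharInfoMap::const_iterator const end = unicodesymbols.end();
    for (CharInfoMap::const_iterator it = unicodesymbols.begin(); it != end; ++it)
        symbols.push_back(it->first);

    // one sort plus one pass to drop duplicates
    std::sort(symbols.begin(), symbols.end());
    symbols.erase(std::unique(symbols.begin(), symbols.end()), symbols.end());
    assert(std::is_sorted(symbols.begin(), symbols.end()));
    return symbols;
}
```

Callers that only iterate over the list would not notice the difference; callers that test membership could use std::binary_search on the sorted result.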
Abdel.