Andre Poenitz wrote:
On Fri, Feb 08, 2008 at 08:04:56PM +0100, Abdelrazak Younes wrote:
Juergen,
There is something fishy in this method.
set<char_type> Encoding::getSymbolsList() const
{
    // assure the used encoding is properly initialized
    init();

    // first all encodable characters
    CharSet symbols = encodable_;
    // add those below start_encodable_
    for (char_type c = 0; c < start_encodable_; ++c)
        symbols.insert(c);
    // now the ones from the unicodesymbols file
    CharInfoMap::const_iterator const end = unicodesymbols.end();
    CharInfoMap::const_iterator it = unicodesymbols.begin();
    for (; it != end; ++it)
        symbols.insert(it->first);
    return symbols;
}
The lengthy initialization for utf8* encodings is not due to iconv. That is
expected: we shouldn't have to do a lookup for utf8-encodable characters,
since all of them are encodable.
No, the problem lies in inserting the symbols from the unicodesymbols
file. For utf8, we shouldn't do that because _all_ symbols are already in
there. On each insertion, std::set has to search whether the given symbol is
already present; as you have 1114112 symbols...
Wouldn't a bool array or such do as well?
Yep, or a bitset.
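
A minimal sketch of what the bitset variant could look like, just to make the
idea concrete. This is not LyX code: `SymbolBits`, `kCodePoints` and the
`char_type` typedef below are hypothetical names for illustration, and the
sketch assumes char_type is a 32-bit code point. One bit per code point means
insertion and lookup are constant time, with no tree search and no per-node
allocation.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for LyX's char_type (assumed to hold a code point).
typedef std::uint32_t char_type;

// Size of the Unicode codespace: U+0000 .. U+10FFFF, i.e. 1114112 values.
constexpr std::size_t kCodePoints = 0x110000;

// Hypothetical replacement for a std::set<char_type> of symbols.
class SymbolBits {
public:
    SymbolBits() : bits_(new std::bitset<kCodePoints>()) {}

    // Constant-time insertion; inserting the same symbol twice is a no-op.
    void insert(char_type c)
    {
        if (c < kCodePoints)
            bits_->set(c);
    }

    // Constant-time membership test; out-of-range values are never present.
    bool contains(char_type c) const
    {
        return c < kCodePoints && bits_->test(c);
    }

    // Number of distinct symbols currently stored.
    std::size_t size() const { return bits_->count(); }

private:
    // About 136 KB of bits, kept on the heap rather than on the stack.
    std::unique_ptr<std::bitset<kCodePoints>> bits_;
};
```

With something like this, the two loops in getSymbolsList() would call
insert() exactly as before, but re-inserting symbols that are already present
costs a single bit write instead of a search.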
Abdel.