Re: Need to do some "dirty" UTF-8 handling

Dmitry Olshansky Sat, 25 Jun 2011 17:05:16 -0700

On 26.06.2011 3:25, Nick Sabalausky wrote:

"Dmitry Olshansky"<dmitry.o...@gmail.com>  wrote in message
news:iu5n32$2vjd$1...@digitalmars.com...

On 26.06.2011 1:49, Nick Sabalausky wrote:

"Andrej Mitrovic"<andrej.mitrov...@gmail.com>   wrote in message
news:mailman.1215.1309019944.14074.digitalmars-d-le...@puremagic.com...

I've had a similar requirement some time ago. I've had to copy and
modify the phobos function std.utf.decode for a custom text editor
because the function throws when it finds an invalid code point. This
is way too slow for my needs. I'm actually displaying invalid code
points with special marks (just like Scintilla), so I need decoding to
work as fast as possible.


The new function simply replaces throwing exceptions with flagging a
boolean.
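A minimal sketch of that idea, for illustration only. This is not the
modified Phobos code from the message above: decodeFlagged and its out
parameter are hypothetical names, and returning U+FFFD for bad input is an
assumption (an editor could just as well return the raw byte).

```d
import std.stdio : writefln;

// Hypothetical helper (not a Phobos function): decodes one code point
// starting at s[i], advances i past the bytes it consumed, and sets
// `invalid` instead of throwing on malformed input.
dchar decodeFlagged(const(char)[] s, ref size_t i, out bool invalid)
{
    enum dchar replacement = '\uFFFD';   // assumption: stand-in for bad input

    immutable uint b0 = s[i++];
    if (b0 < 0x80)
        return cast(dchar) b0;           // plain ASCII

    uint c;        // code point being assembled
    size_t extra;  // number of continuation bytes expected
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; c = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; c = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; c = b0 & 0x07; }
    else
    {
        invalid = true;                  // stray continuation or illegal lead byte
        return replacement;
    }

    foreach (n; 0 .. extra)
    {
        if (i >= s.length || (s[i] & 0xC0) != 0x80)
        {
            invalid = true;              // truncated or malformed sequence
            return replacement;          // offending byte is left for the next call
        }
        c = (c << 6) | (s[i++] & 0x3F);
    }

    // Reject overlong forms, surrogates, and values past U+10FFFF.
    static immutable uint[4] minValue = [0, 0x80, 0x800, 0x10000];
    if (c < minValue[extra] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    {
        invalid = true;
        return replacement;
    }
    return cast(dchar) c;
}

void main()
{
    // "a", a lone 0xFF byte (never valid in UTF-8), then "b".
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62];
    auto text = cast(const(char)[]) raw;

    size_t i = 0;
    while (i < text.length)
    {
        bool invalid;
        immutable dc = decodeFlagged(text, i, invalid);
        writefln("%s U+%04X", invalid ? "invalid" : "ok", cast(uint) dc);
    }
}
```

Since nothing is thrown, the caller pays only for a flag check per code
point and can draw its own marker wherever the flag comes back set.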

I think I may end up doing something like that :/

I was hoping to be able to do something vaguely sensible like this:

string newStr;
foreach(dchar dc; str)
{
      if(isValidDchar(dc))
          newStr ~= dc;
      else
          newStr ~= 'X';
}
str = newStr;

But that just blows up in my face.

std.encoding to the rescue?
It looks like a well established module that was forgotten for some
reason.

And here I'm wondering what a function named sanitize could do :)
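For reference, a small sketch of what that might look like, assuming the
std.encoding documentation is accurate: sanitize returns a copy in which
malformed sequences are replaced by the encoding's replacement character,
and isValid reports whether a string is well-formed. The sample bytes are
made up for illustration.

```d
import std.encoding : isValid, sanitize;
import std.stdio : writeln;

void main()
{
    // "ab", a lone 0xFF byte (never valid in UTF-8), then "cd".
    immutable(ubyte)[] raw = [0x61, 0x62, 0xFF, 0x63, 0x64];
    string dirty = cast(string) raw;

    assert(!isValid(dirty));

    // No exception here: the bad byte is swapped for a replacement character.
    string clean = sanitize(dirty);
    assert(isValid(clean));

    writeln(clean);
}
```

After that, the foreach(dchar dc; str) loop quoted above should run over
the sanitized string without throwing.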

Ahh, I didn't even notice that module.

Same here. Just a couple of days(!) ago I somehow managed to find decode
in the wrong place (in std.encoding instead of std.utf). It looked useful,
but I had never heard of it. Seriously, how many totally irrelevant old
modules do we have around here? (hint: std.gregorian!)

Even if it's imperfect and goes away, it looks like it'll at least get the
job done for me. And the encoding conversions should even give me an easy
way to save at least some of the invalid chars (which wasn't really a
requirement of mine, but it'll still be nice).

Yeah, given the amount of necessary work in the Phobos realm, it could
hang around for quite some time ;)


--
Dmitry Olshansky
