Many of us have run into character encoding issues because we don't control 
our input sources and assumed, as is common, that UTF-8 covers everything.

In my lab, for example, some of our social media posts used the Zawgyi 
Burmese character set rather than Unicode Burmese. (Because Myanmar's 
technology developed in an environment largely closed off from the world, a 
non-standard character set emerged that is still very common on mobile 
phones.) We had fully tested the app with Unicode Burmese, but honestly 
didn't know Zawgyi was even a thing we would see in our dataset. We've also 
had problems with non-Unicode word separators in Arabic.
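For what it's worth, here is a crude sketch of the kind of check that can flag suspect text (this is a hypothetical heuristic, not what we actually ran; real Zawgyi detection is harder, and dedicated tools such as Google's myanmar-tools exist for it). The idea is that Zawgyi repurposes codepoints in U+1060–U+1097, letters for other languages of Myanmar that rarely appear in genuine Unicode Burmese, so their presence alongside Burmese letters is a red flag:

```python
def maybe_zawgyi(text: str) -> bool:
    """Crude heuristic: Zawgyi repurposes U+1060-U+1097 (Mon, Karen, and
    related letters that are rare in genuine Unicode Burmese) for Burmese
    glyph forms, so seeing them mixed with Burmese letters is suspicious.
    A real detector scores many more signals than this."""
    has_burmese = any("\u1000" <= ch <= "\u109f" for ch in text)
    has_suspect = any("\u1060" <= ch <= "\u1097" for ch in text)
    return has_burmese and has_suspect
```

Plain ASCII, or Burmese text that stays out of the repurposed range, would pass this check; text mixing Burmese letters with those codepoints would be flagged for closer inspection.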

What we've found helpful is to view the offending text in a hex editor and 
work out which non-standard characters may be causing the problem.
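The same inspection can be scripted. A small sketch that prints each character's codepoint and Unicode name, which makes unexpected characters jump out much faster than scanning raw hex:

```python
import unicodedata

def dump_codepoints(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' line per character: the scripted
    equivalent of eyeballing the bytes in a hex editor."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]

# U+200B is invisible in most editors but shows up plainly here:
for line in dump_codepoints("a\u200bb"):
    print(line)
# U+0061 LATIN SMALL LETTER A
# U+200B ZERO WIDTH SPACE
# U+0062 LATIN SMALL LETTER B
```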

Some data conversion may be necessary before insertion. But the first step 
is knowing WHICH characters are causing the issue.
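As a concrete example of that conversion step (the replacement table below is purely hypothetical; you would fill it with whatever characters your own codepoint dump turns up), once the offending characters are identified they can be mapped to standard equivalents before insertion:

```python
# Hypothetical cleanup table: map the non-standard separators the
# inspection revealed to their ordinary Unicode equivalents.
REPLACEMENTS = {
    "\u200b": " ",   # zero-width space -> ordinary space (example only)
    "\u00a0": " ",   # no-break space   -> ordinary space (example only)
}

def sanitize(text: str) -> str:
    """Apply the replacement table before inserting into the database."""
    return text.translate(str.maketrans(REPLACEMENTS))
```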
