Re: How do I display unicode value stored in a string variable using ord()

DJC Sun, 19 Aug 2012 08:38:14 -0700

On 19/08/12 15:25, Steven D'Aprano wrote:

Not necessarily. Presumably you're scanning each page into a single
string. Then only the pages containing a supplementary plane char will be
bloated, which is likely to be rare. Especially since I don't expect your
OCR application would recognise many non-BMP characters -- what does
U+110F3, "SORA SOMPENG DIGIT THREE", look like? If the OCR software
doesn't recognise it, you can't get it in your output. (If you do, the
OCR software has a nasty bug.)
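
A minimal sketch of the point above, assuming Python 3.3 or later, where each string is stored using the narrowest width that fits its widest character. The helper name supplementary_chars and the sample pages are made up for illustration. It lists any characters outside the BMP in a page and prints the in-memory size of each page. The sizes come from sys.getsizeof and are specific to CPython.

```python
import sys

def supplementary_chars(text):
    """Return the characters in text that lie outside the BMP (code point > U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

# Hypothetical scanned pages: one pure ASCII, one with a single non-BMP character.
bmp_page = "a" * 1000
wide_page = bmp_page + "\U000110F3"  # SORA SOMPENG DIGIT THREE

print(supplementary_chars(bmp_page))
print([hex(ord(ch)) for ch in supplementary_chars(wide_page)])

# Only the page holding the supplementary-plane character needs the wider storage.
print(sys.getsizeof(bmp_page), sys.getsizeof(wide_page))
```

Running a check like this over the OCR output is one way to see how many pages, if any, actually contain such characters.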


Anyway, in my ignorant opinion the proper fix here is to tell the OCR
software not to bother trying to recognise Imperial Aramaic, Domino
Tiles, Phaistos Disc symbols, or Egyptian Hieroglyphs if you aren't
expecting them in your source material. Not only will the scanning go
faster, but you'll get fewer wrong characters.

Consider the automated recognition of a CAPTCHA. As the chars have to be
entered by the user on a keyboard, only the most basic charset can be
used, so the problem of which chars are possible is quite limited.
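
As a rough illustration of restricting the possible characters, here is a sketch that flags anything in recognised text that falls outside an expected alphabet. The alphabet (ASCII letters and digits) and the names CAPTCHA_CHARS and unexpected_chars are assumptions for the example, not part of any OCR package.

```python
import string

# Assumed alphabet: what a user could reasonably type for a CAPTCHA.
CAPTCHA_CHARS = frozenset(string.ascii_letters + string.digits)

def unexpected_chars(text, allowed=CAPTCHA_CHARS):
    """Return the sorted list of distinct characters in text that are not in allowed."""
    return sorted(set(text) - allowed)

# A clean result has nothing to report; a stray non-BMP character is flagged.
print(unexpected_chars("x7Kq9"))
print(unexpected_chars("x7\U000110F3q9"))
```

A non-empty result suggests the recogniser was looking for characters that cannot occur in the source material.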

--
http://mail.python.org/mailman/listinfo/python-list
