Re: UNICODE character identification

George Milten Tue, 10 Feb 2015 06:56:38 -0800

The data is in UTF-8.

Thank you


2015-02-10 16:54 GMT+02:00 Kool,Wouter <[email protected]>:

> What encoding is your data in? UTF-8? A single-byte encoding? MARC-8? That
> information matters a lot in determining whether your idea would work. If
> it is in a single-byte encoding, there is often no way to determine the
> script a character belongs to.
>
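
To illustrate the point about single-byte encodings, here is a small sketch in Python 3 (standard library only). The sample bytes are made up for the example: the same byte 0xEF is a Latin letter under ISO-8859-1 and a Greek omicron under ISO-8859-7, so the byte alone says nothing about the script. In UTF-8 the Greek omicron has its own two-byte sequence, which is why a per-character script check becomes possible there.

```python
import unicodedata

# Hypothetical sample: "Plat" followed by the single byte 0xEF.
raw = b"Plat\xef"

# The same bytes decoded under two different single-byte encodings.
as_latin1 = raw.decode("latin-1")     # ISO-8859-1
as_greek = raw.decode("iso-8859-7")   # ISO-8859-7 (Greek)

# The last character differs depending on the assumed encoding.
name_latin1 = unicodedata.name(as_latin1[-1])
name_greek = unicodedata.name(as_greek[-1])
print(name_latin1)  # LATIN SMALL LETTER I WITH DIAERESIS
print(name_greek)   # GREEK SMALL LETTER OMICRON

# In UTF-8 the Greek omicron (U+03BF) is encoded unambiguously.
utf8 = "Plat\u03bf".encode("utf-8")
print(utf8)         # b'Plat\xce\xbf'
```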
> *Wouter Kool*
> Metadata Specialist *·* OCLC B.V.
> Schipholweg 99 *·* P.O. Box 876 *·* 2300 AW Leiden *·* The Netherlands
> t +31-(0)71-524 6500
>
> [email protected] *·* www.oclc.org
>
> *From:* George Milten [mailto:[email protected]]
> *Sent:* Tuesday 10 February 2015 13:27
> *To:* [email protected]
> *Subject:* UNICODE character identification
>
> Hello friendly folks,
>
> Here is what I am trying to do; I am looking for your help in finding the
> cleverest way to achieve it.
>
> We have records that contain typos like this: take a word such as Plato,
> where the last "o" was typed with the keyboard set to Greek. We need
> something that parses all the metadata character by character, determines
> which script the majority of the characters in each word belong to, and
> reports the odd characters, the script they belong to, and the identifier
> of the record they were found in, so that we can correct them.
>
> Thank you in advance
>
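
The check described in the original message could be sketched roughly as follows, assuming Python 3 and UTF-8 data that has already been decoded to strings. Python's standard `unicodedata` module does not expose the Unicode Script property, so this sketch uses the first word of each character's Unicode name (for example LATIN or GREEK) as a rough stand-in; that is an approximation, not a real script lookup. The function names and the `(record_id, text)` record layout are hypothetical, chosen only for illustration.

```python
import unicodedata
from collections import Counter

def script_of(ch):
    """Rough script guess: the first word of the Unicode character name.

    Returns None for non-letters and for characters without a name.
    """
    if not ch.isalpha():
        return None
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        return None

def odd_characters(word):
    """Return (position, character, script) for letters outside the
    majority script of the word. Ties go to the script seen first."""
    tagged = [(i, ch, script_of(ch)) for i, ch in enumerate(word)]
    counts = Counter(s for _, _, s in tagged if s is not None)
    if len(counts) < 2:
        return []  # zero or one script: nothing is out of place
    majority = counts.most_common(1)[0][0]
    return [(i, ch, s) for i, ch, s in tagged
            if s is not None and s != majority]

def scan_records(records):
    """Yield (record_id, word, character, script) for every odd character.

    `records` is assumed to be an iterable of (record_id, text) pairs.
    """
    for rec_id, text in records:
        for word in text.split():
            for _, ch, script in odd_characters(word):
                yield rec_id, word, ch, script

# Hypothetical sample data: the final "o" of the first "Plato" is U+03BF.
sample = [("rec1", "Plat\u03bf wrote"), ("rec2", "Plato")]
for hit in scan_records(sample):
    print(hit)
```

Splitting on whitespace is the simplest possible tokenisation; real metadata would probably also need punctuation stripped, and fields that legitimately mix scripts would have to be excluded by hand.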
