date:20140630

Re: Finding non-unicode characters

2014-06-30 Thread Patrick Hochstenbach

Hi, you can use this regular expression to check whether a piece of text might contain invalid UTF-8 byte sequences (it cannot check that the Unicode itself is correct):
perl -l -ne '/
 ^(
 ([\x00-\x1D]) # 1-byte pattern
 |([\x1F-\x7F]) # 1-byte pattern
 |([\xC2-\xDF][\x80-\xBF]) #
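The same kind of byte-level check can be sketched in Python, for readers who would rather not maintain a regular expression. This is not the Perl one-liner from the message above, only a rough equivalent that relies on Python's strict UTF-8 decoder; the function name `invalid_utf8_offsets` is made up for illustration.

```python
def invalid_utf8_offsets(data: bytes) -> list[int]:
    """Return the byte offsets where strict UTF-8 decoding fails.

    Hypothetical helper: it only detects malformed byte sequences,
    not text that is valid UTF-8 but semantically wrong.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises on the first malformed sequence.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            offsets.append(pos + err.start)
            pos += err.end
    return offsets


if __name__ == "__main__":
    print(invalid_utf8_offsets("café".encode("utf-8")))  # well-formed
    print(invalid_utf8_offsets(b"caf\xe9"))              # Latin-1 byte
```

Like the regular expression, this can only say that the bytes are not well-formed UTF-8. It cannot tell whether well-formed text is what the cataloger intended.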

Finding non-unicode characters

2014-06-30 Thread Anne Highsmith

Can someone suggest a way to identify whether a MARC record coded LDR/09 = ‘a’ contains non-Unicode characters? Grasping at straws, I tried the following against a record that I know has non-Unicode characters, but it didn’t report any errors. # $bib_id is defined as 001 field
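One possible way to approach the question, sketched in Python rather than the Perl used in the thread: work on the raw bytes of the file, before any MARC library has decoded them. The sketch assumes records are separated by the ISO 2709 record terminator (0x1D) and that leader position 09 is `a` for records declared as UCS/Unicode. The function name `records_with_bad_utf8` and the sample leaders are invented for illustration.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte
LEADER_LENGTH = 24


def records_with_bad_utf8(raw: bytes):
    """Yield (record_index, byte_offset) for records that claim UTF-8
    in LDR/09 but contain bytes that do not decode as UTF-8.

    Hypothetical helper: it reports only the first bad byte per record.
    """
    for index, chunk in enumerate(raw.split(RECORD_TERMINATOR)):
        if len(chunk) < LEADER_LENGTH:
            continue  # empty tail after the last terminator, or junk
        if chunk[9:10] != b"a":
            continue  # not declared as Unicode, so not checked here
        try:
            chunk.decode("utf-8")
        except UnicodeDecodeError as err:
            yield index, err.start


if __name__ == "__main__":
    # Made-up leaders; the record lengths are not real.
    utf8_leader = b"00000nam a2200000 a 4500"
    marc8_leader = b"00000nam  2200000 a 4500"
    good = utf8_leader + "café".encode("utf-8") + b"\x1e\x1d"
    bad = utf8_leader + b"caf\xe9\x1e\x1d"
    other = marc8_leader + b"caf\xe9\x1e\x1d"
    print(list(records_with_bad_utf8(good + bad + other)))
```

Checking the raw bytes matters because a library that has already decoded the record may have replaced or dropped the offending bytes, which could explain a check that reports no errors.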