Hi
You can use this regular expression to check whether a piece of text might
contain invalid UTF-8 (it can’t check whether the Unicode itself is correct):
perl -l -ne '/
^( ([\x00-\x1D]) # 1-byte pattern
|([\x1F-\x7F]) # 1-byte pattern
|([\xC2-\xDF][\x80-\xBF]) # 2-byte pattern
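
In case it helps, here is a minimal, self-contained sketch of the same byte-pattern idea written as a small script rather than a one-liner. It is not the pattern above: the sub name is_valid_utf8 and the sample strings are only my illustration, and it accepts every ASCII byte (including \x1E) as a valid 1-byte sequence. It only checks that the bytes are well-formed UTF-8; it says nothing about whether the characters are the ones you intended.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
# Expects a raw byte string, not an already-decoded character string.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return $bytes =~ /
        \A(?:
            [\x00-\x7F]                          # 1-byte (ASCII)
          | [\xC2-\xDF][\x80-\xBF]               # 2-byte
          | \xE0[\xA0-\xBF][\x80-\xBF]           # 3-byte, excluding overlongs
          | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # 3-byte
          | \xED[\x80-\x9F][\x80-\xBF]           # 3-byte, excluding surrogates
          | \xF0[\x90-\xBF][\x80-\xBF]{2}        # 4-byte, excluding overlongs
          | [\xF1-\xF3][\x80-\xBF]{3}            # 4-byte
          | \xF4[\x80-\x8F][\x80-\xBF]{2}        # 4-byte, up to U+10FFFF
        )*\z
    /x ? 1 : 0;
}

# Illustrative inputs only.
print is_valid_utf8("caf\xC3\xA9") ? "ok\n" : "invalid\n";   # e-acute as UTF-8: ok
print is_valid_utf8("caf\xE9")     ? "ok\n" : "invalid\n";   # e-acute as Latin-1: invalid
```

If the Encode module is available, wrapping decode('UTF-8', $bytes, Encode::FB_CROAK) in an eval block should be another way to get the same yes/no answer without maintaining the pattern by hand.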
Can someone suggest a way to identify if a MARC record, coded at LDR/09 = ‘a’,
has non-Unicode characters in it? I tried the following, kind of grasping at
straws, against a record that I know has non-Unicode characters. It didn’t
report any errors.
# $bib_id is defined as 001 field