Hi
Just for the record. On 22 June 2017 I wrote:
Yesterday I wrote:
Our library receives MARC data from EKZ (a German cataloging data
provider) which includes two unwanted characters:
* a "non-sorting character" at the beginning
* a "non-sorting character" at the end
These characters are not visible in the OPAC or in the hit list of the
staff client, but they do appear in the framework and also in the title
bar of the web browser. Here is an example of a file containing such
characters: http://adminkuhn.ch/download/kuhn0000000
When the original .mrc file is opened with vi, these characters show as:
<98>The<9c> obsession
With "od -c" they show as:
302 230 T h e 302 234 o b s e s s i o n
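To reproduce the byte view above, here is a minimal sketch assuming a POSIX shell with `printf` and `od`. The sequences C2 98 and C2 9C (octal 302 230 and 302 234) are the UTF-8 encodings of the control characters U+0098 and U+009C, which are commonly used in MARC 21 data as "non-sort begin" and "non-sort end" markers. The helper name `show_bytes` is hypothetical.

```shell
#!/bin/sh
# show_bytes (hypothetical helper): print every byte of stdin as hex,
# without offsets, so multi-byte UTF-8 sequences are easy to spot.
show_bytes() {
    od -An -tx1
}

# Octal 302 230 = hex C2 98 (U+0098), octal 302 234 = hex C2 9C (U+009C),
# matching the "od -c" output quoted above.
# The output begins: c2 98 54 68 65 c2 9c 20 ...
printf '\302\230The\302\234 obsession' | show_bytes
```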
Of course these characters could be removed, e.g. with sed, but this
results in a wrong record length in MARC LEADER positions 0-4 (and, in
binary MARC, in wrong field lengths and starting positions in the
directory). It also has to be done separately on the shell, outside of
and before the regular import process. Another option would be software
like MarcEdit.
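To illustrate why the record length goes stale, here is a sketch assuming GNU sed (the `\xHH` escapes used in the sed expressions in this thread are a GNU extension and may not work in other sed implementations) and `LC_ALL=C`, so that sed works byte by byte. Each marker is two bytes in UTF-8, so removing both shortens the data by four bytes, while the stored length still holds the old value. `strip_nonsort` and `sample` are hypothetical names.

```shell
#!/bin/sh
# strip_nonsort (hypothetical helper): remove the UTF-8 encoded
# non-sorting markers U+0098 (C2 98) and U+009C (C2 9C) from stdin.
# LC_ALL=C makes GNU sed match raw bytes regardless of the locale.
strip_nonsort() {
    LC_ALL=C sed -e 's/\xc2\x98//g' -e 's/\xc2\x9c//g'
}

# sample (hypothetical): the title from the example above, with markers.
sample() {
    printf '\302\230The\302\234 obsession\n'
}

# wc -c counts bytes; MARC LEADER positions 0-4 store a length in bytes.
before=$(sample | wc -c)
after=$(sample | strip_nonsort | wc -c)

echo "removed $((before - after)) bytes"   # removed 4 bytes
```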
Now the question is: is there an EASY way to delete these unwanted
characters within Koha, for example with the MARC modification
templates, which are used anyway when loading such data?
About four or even five hours later, after trying different approaches, I
finally found the following solution for my case. Unfortunately there is
no "easy" way; external software is needed:
    catmandu convert MARC to MARC --type XML < inputfile \
      | sed -e 's/\xc2\x98//g' -e 's/\xc2\x9c//g' \
      | catmandu convert MARC --type XML to MARC > outputfile
In fact I experimented with quite a few tools (and, of course, with
different character representations), among them yaz-marcdump (which is
part of the YAZ toolkit), xml2marc by Galen Charlton and even MarcEdit.
One of the problems I had with MarcEdit is that I couldn't find a way to
remove one single character throughout the record. So I finally settled
on first transforming the original MARC file to MARCXML using
yaz-marcdump, then removing the unwanted characters with sed, and finally
transforming the MARCXML back to MARC using MarcEdit. Since I'm not a big
fan of GUIs, I then looked for a tool to do the same on the shell.
Unfortunately Galen Charlton's slim "xml2marc" from 2011 seems to have a
problem with character sets, so I went for the heavier Catmandu
(http://librecat.org/Catmandu/), which eventually did the trick.
What I learned is that even a (seemingly) minor change in a MARC record
can be real hell. Of course, now that I have the solution, it looks easy.
However, I was also quite surprised that it is not possible to load
MARCXML directly via the Koha menu "Tools > Stage MARC records for
import". And I was mildly disappointed that Koha only told me "1
records not staged because of MARC error" without giving any hint what
the error really was.
By the way: after deleting the unwanted characters with sed, the record
length is of course no longer correct. You may replace the incorrect
LEADER positions 0-4 with 00000, or simply transform the MARCXML back to
MARC: MarcEdit and Catmandu both generated correct LEADER positions 0-4
automatically.
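The first option, replacing LEADER positions 0-4 with 00000 in the MARCXML, could be sketched with sed as follows. It assumes each leader sits on one line as `<leader>` or `<marc:leader>` (the namespace prefix depends on the tool that wrote the file); `reset_leader_length` is a hypothetical helper, not part of Koha or Catmandu.

```shell
#!/bin/sh
# reset_leader_length (hypothetical helper): replace the five-digit record
# length at the start of each MARCXML leader with 00000.
# \1 re-inserts the opening tag; sed backreferences are single digits,
# so "\100000" means "\1" followed by the literal "00000".
reset_leader_length() {
    sed -E 's#(<(marc:)?leader>)[0-9]{5}#\100000#g'
}

printf '%s\n' '<marc:leader>01234nam a2200289 c 4500</marc:leader>' \
    | reset_leader_length
# prints <marc:leader>00000nam a2200289 c 4500</marc:leader>
```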
Thanks again to everybody who helped with hints and ideas!
It turns out that the following command, which I mentioned above, does
NOT convert the first record of the original MARC file!
    catmandu convert MARC to MARC --type XML < inputfile \
      | sed -e 's/\xc2\x98//g' -e 's/\xc2\x9c//g' \
      | catmandu convert MARC --type XML to MARC > outputfile
I don't know what the problem is (and at the moment I don't really care).
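A dropped record like the one described above is easy to overlook. One rough check, sketched here on the assumption that both files are binary MARC (ISO 2709), is to count the record terminators (byte 0x1D) in the input and in the output. `count_records` is a hypothetical helper; it only detects a missing record and says nothing about why it went missing.

```shell
#!/bin/sh
# count_records (hypothetical helper): count ISO 2709 records in a binary
# MARC file by counting record terminators (0x1D, octal 035).
# tr -cd deletes every byte except 0x1D; wc -c counts what is left.
count_records() {
    LC_ALL=C tr -cd '\035' < "$1" | wc -c | tr -d ' '
}

# Usage (file names as in the commands above):
#   count_records inputfile
#   count_records outputfile
# Different numbers mean records were lost on the way.
```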
However, the following command produces an output file that also
contains the very first record:
    yaz-marcdump -t utf-8 -o marcxml -l 9=97 inputfile \
      | sed -e 's/\xc2\x98//g' -e 's/\xc2\x9c//g' \
      | catmandu convert MARC --type XML to MARC > outputfile
Just in case someone else ever needs this command.
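On `-l 9=97` in the yaz-marcdump command above: the yaz-marcdump manual describes `-l` as a list of leader modifications of the form position=value, where the value may be a character code. Read that way, `9=97` sets LEADER position 09 to character 97, i.e. `a`, which in MARC 21 marks the record as UCS/Unicode (a blank there means MARC-8). The sketch below, with the hypothetical helper `leader09`, prints that position for the first record of a binary MARC file, assuming the leader is the file's first 24 bytes.

```shell
#!/bin/sh
# leader09 (hypothetical helper): print LEADER position 09 (character
# coding scheme) of the first record in a binary MARC file.
# Positions count from 0, so position 09 is the 10th byte.
leader09() {
    head -c 10 "$1" | tail -c 1
}

# 'a' (character code 97) = UCS/Unicode, ' ' (blank) = MARC-8.
printf '%d\n' "'a"   # prints 97
```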
Best wishes: Michael
--
Managing Director · Certified Librarian BBS, IT Specialist with Swiss Federal Certificate
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Switzerland
T 0041 (0)61 261 55 61 · E m...@adminkuhn.ch · W www.adminkuhn.ch
_______________________________________________
Koha mailing list http://koha-community.org
Koha@lists.katipo.co.nz
https://lists.katipo.co.nz/mailman/listinfo/koha