Hello community,
how would you treat the following?
I need a way to identify all tags and subfields that have an ISSN
number stored in them.
What would you suggest as a clever approach for this?
Thank you
Hi, Sergio:
you can try MARCgrep http://en.pusc.it/bib/MARCgrep.
Its help is:
MARCgrep.pl
Extracts MARC records that match a condition on fields. Count and
invert are available.
SYNOPSIS
MARCgrep.pl [options] [-e condition] file.mrc
Options:
-h print this
Thank you, dear Stefano,
I am aware of this module; it works great.
But my problem is which regex to use to identify whether a
subfield's content is an ISSN number. Say our mrc has ISSN numbers thrown
into any tag you could imagine...
So my approach would be to search the whole mrc,
Hi Sergio,
Try
^\d{4}-\d{3}[\dxX]$
if you know that they will always be formatted with a hyphen in the middle, or
^\d{4}-?\d{3}[\dxX]$
if you can't be sure of that.
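A minimal sketch of how the two anchored patterns above could be applied to a single subfield value, written in Python purely for illustration. The function name `looks_like_issn`, the `require_hyphen` flag and the whitespace stripping are assumptions, not part of the suggestion itself.

```python
import re

# Both patterns are copied from the message above.
STRICT = re.compile(r"^\d{4}-\d{3}[\dxX]$")    # hyphen required
LENIENT = re.compile(r"^\d{4}-?\d{3}[\dxX]$")  # hyphen optional


def looks_like_issn(value, require_hyphen=True):
    """Return True if the whole value has the shape of an ISSN.

    Only the shape is checked here, not the check digit.
    """
    pattern = STRICT if require_hyphen else LENIENT
    # Assumption: stray surrounding whitespace should be ignored.
    return pattern.match(value.strip()) is not None
```

Because both patterns are anchored with `^` and `$`, a value such as `ISSN 0378-5955` is rejected; the ISSN has to be the entire subfield content.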
(and if you're interested in spotting ISSNs in the middle of a field use
\b\d{4}-?\d{3}[\dxX]\b
but beware this also finds year
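Since the word-boundary pattern also matches strings that merely look like an ISSN, one possible safeguard is to verify the ISSN check digit (weights 8 down to 2, modulus 11, with `X` standing for 10). The check-digit step is an addition, not part of the suggestion above, and it will not catch every false positive. The sketch below is in Python; the function names and the `(tag, code, value)` tuple shape for subfields are hypothetical, so reading the tuples out of an actual .mrc file is left to whatever MARC library is in use.

```python
import re

# Word-boundary pattern from the message above, with groups added
# so the parts can be reassembled in a normalized form.
CANDIDATE = re.compile(r"\b(\d{4})-?(\d{3})([\dxX])\b")


def issn_check_char(first_seven):
    """Compute the ISSN check character for the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)


def find_issns(text):
    """Return normalized ISSNs (NNNN-NNNC) found anywhere in text."""
    found = []
    for m in CANDIDATE.finditer(text):
        check = m.group(3).upper()
        if issn_check_char(m.group(1) + m.group(2)) == check:
            found.append(f"{m.group(1)}-{m.group(2)}{check}")
    return found


def issn_subfields(subfields):
    """Report (tag, code, issn) for every subfield containing an ISSN.

    Assumption: subfields is an iterable of (tag, code, value) tuples.
    """
    return [
        (tag, code, issn)
        for tag, code, value in subfields
        for issn in find_issns(value)
    ]
```

With this check, a year range such as `1990-1995` still matches the pattern but is dropped because its last digit is not the expected check character, while `ISSN 0378-5955 (print)` yields `0378-5955`.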
Thank you very much.
2016-11-02 12:28 GMT+02:00 Ben Soares :
> Hi Sergio,
>
> Try
>
> ^\d{4}-\d{3}[\dxX]$
In Catmandu you can do this with the following script (which will also filter out all
valid ISSN numbers)…
# cpanm Catmandu Catmandu::Identifier
$ cat myfix.txt
marc_map('***',text.$append)
filter(text,'(\b\d{4}-?\d{3}[\dxX]\b)')
replace_all(text.*,'.*(\b\d{4}-?\d{3}[\dxX]\b).*',$1)
do list(path:text)