Re: identify ISSN numbers in an mrc file

Sergio Letuche Wed, 02 Nov 2016 03:07:45 -0700

Thank you, dear Stefano,

I am aware of this module; it works great.


But my problem is which regex to use in order to identify whether a
subfield's content is an ISSN. Say our mrc has ISSNs thrown into any tag
you could imagine...

So my approach would be to search the whole mrc, but I do not know which
regex to use...
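
For reference, one possible starting point is sketched below in Python. It
assumes the usual printed form of an ISSN (four digits, a hyphen, three
digits, then a check character that is a digit or X) and then verifies the
check character with the standard modulus-11 calculation, since a regex alone
will also match things like phone numbers or year ranges. The names ISSN_RE,
issn_check_digit and find_issns are made up for this illustration and do not
come from MARCgrep or any MARC library.

```python
import re

# Candidate pattern: NNNN-NNNC, where C is a digit or X.
# The lookarounds reject matches embedded in longer digit/hyphen runs
# (for example, a segment of a hyphenated ISBN).
ISSN_RE = re.compile(r"(?<![0-9-])([0-9]{4})-([0-9]{3}[0-9Xx])(?![0-9-])")


def issn_check_digit(first_seven):
    """Return the expected check character for the first seven ISSN digits."""
    # Weights run from 8 down to 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def find_issns(text):
    """Return all check-digit-valid ISSNs found in text, normalized."""
    found = []
    for m in ISSN_RE.finditer(text):
        digits = m.group(1) + m.group(2)
        if issn_check_digit(digits[:7]) == digits[7].upper():
            found.append(digits[:4] + "-" + digits[4:].upper())
    return found
```

The idea would be to call find_issns on every subfield value (however the
records are read) and report the tag and subfield code whenever the result is
not empty. ISSNs stored without the hyphen would need a looser pattern, at the
cost of more false candidates for the check digit test to weed out.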

2016-11-02 11:52 GMT+02:00 Stefano Bargioni <bargi...@pusc.it>:

> Hi, Sergio:
> you can try MARCgrep http://en.pusc.it/bib/MARCgrep.
> Its help is:
>
> MARCgrep.pl
>        Extracts MARC records that match a condition on fields. Count and
>        invert are available.
>
> SYNOPSIS
>        MARCgrep.pl [options] [-e condition] file.mrc
>
>         Options:
>           -h   print this help message and exit
>           -c   count only
>           -e   condition
>           -f   comma separated list of fields to print
>           -o   output format "marc" | "line" | "INLINE"
>           -s   separator string for condition, default ","
>           -v   invert match
>
>         Condition:
>           -e  'tag,indicator1,indicator2,subfield,value'
>
> OPTIONS
>        -h      Print this message and exit.
>
>        -c      Count and print number of matching records
>
>        -e      The condition to match in the record.
>                 For data fields, the syntax is:
>
>                   tag,indicator1,indicator2,subfield,value
>
>                 where tag, indicator1, indicator2, subfield, and value are
>                 regular expression patterns.
>                 Do not put spaces around the separators.
>
>                 For control fields, the syntax is:
>
>                   tag,pos1,pos2,value
>
>                 where tag starts with '00' (use '000' or 'LDR' for leader),
>                 pos1 is the starting position, pos2 is the ending position,
>                 both 0-based. Value is a regular expression.
>
>                 Default condition (-e not specified) matches any data field.
>                 For control fields, only the tag is mandatory.
>
>                 Examples: -e '100,,,a,^A' will match records that contain
>                             100$a starting with 'A'
>                           -e '008,35,37,(ita|eng)' will match records with
>                             language ita or eng in 008
>                           -e '(1|7)(0|1)(0|1),,2' will match
>                             100,110,111,700,710,711 with ind2=2
>
>        -f      Comma separated list of fields (tags) to print if output format
>                is "line" or "inline". Default is any field.
>                Note that if a tag is preceded by '#' sign (like in '#nnn'), a
>                count of occurrences will be printed instead.
>
>                 Examples: -f '100,245' will print field 100 and 245
>                           -f '400,#400' will print all occurrences of 400
>                             field as well as the number of its occurrences
>
>        -o      Output format: "marc" for ISO2709, "line" for each subfield in
>                a line, "inline" (default) for each field in a line.
>
>        -s      Specify a string separator for condition. Default is ','.
>
>        -v      Invert the sense of matching, to select non-matching records.
>
>        -V      Print the version and exit.
>
>        file.mrc
>                The mandatory ISO2709 file to read. Can be STDIN, '-'.
>
> DESCRIPTION
>        Like grep, the famous Unix utility, MARCgrep.pl lets you filter MARC
>        bibliographic records based on conditions on tag, indicators, and
>        field value.
>
>        Conditions can be applied to data fields, control fields or the leader.
>
>        In case of data fields, the condition can specify tag, indicators,
>        subfield and value using regular expressions. In case of control
>        fields, the condition must contain the tag name, the starting and
>        ending position (both 0-based), and a regular expression for the
>        value.
>
>        Options -c and -v respectively count matching records and invert
>        the match.
>
>        If option -c is not specified, the output format can be "line" or
>        "inline" (both human readable), or "marc" for MARC binary (ISO2709).
>        For formats "line" or "inline", the -f option specifies the fields
>        to print.
>
>        You can chain more conditions using
>
>        ./MARCgrep.pl -o marc -e condition1 file.mrc | ./MARCgrep.pl -e condition2 -
>
> KNOWN ISSUES
>        Performance.
>
>        Accepts and returns only UTF-8.
>
>        Checks are case sensitive.
>
> AUTHOR
>        Pontificia Universita' della Santa Croce <http://www.pusc.it/bib/>
>
>        Stefano Bargioni <bargi...@pusc.it>
>
> SEE ALSO
>        marktriggs / marcgrep at <https://github.com/marktriggs/marcgrep> for
>        filtering large data sets
>
>
> > On 02 nov 2016, at 09:57, Sergio Letuche <code4libus...@gmail.com> wrote:
> >
> > Hello community,
> >
> > how would you treat the following?
> >
> > I need a way to identify all tags and subfields that have an ISSN stored in them.
> >
> > What would you suggest as a clever approach for this?
> >
> > Thank you
>
>
