Re: Splitting a large file of MARC records into smaller files

Ashley Sanders Mon, 25 Jan 2010 07:41:22 -0800

Jennifer,

I am working with files of MARC records that are over a million records each. 
I'd like to split them down into smaller chunks, preferably using a command 
line. MarcEdit works, but it is slow and made for the desktop. I've looked
around and haven't found anything truly useful. Endeavor's MARCsplit comes
close, but it only splits by matching criteria rather than into evenly sized
files, so there could be lots of record duplication between files.


Any idea where to begin? I am a (super) novice Perl person.


Well... if you have a *nix style command line and the usual
utilities and your file of MARC records is in exchange format
with the records just delimited by the end-of-record character
0x1d, then you could do something like this:

tr '\035' '\n' < my-marc-file.mrc > recs.txt
split -l 1000 recs.txt

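The tr command will turn the MARC end-of-record characters
into newlines, so each record sits on its own line. Then use
the split command to carve up the output of tr into files of
1000 records each (named xaa, xab, and so on by default).
With over a million records that is more files than a
two-letter suffix allows (676), so you may need to give
split the -a 3 option as well.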

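You will then need to run tr over each output file to convert
the newlines back to MARC end-of-record characters, as the
pieces are not valid MARC until you do. All of this assumes
that no record contains a newline character of its own.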
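
Putting the steps together, here is a rough sketch of the whole
round trip as a small shell function. The function name
split_marc and the output prefix are made up for illustration,
and it assumes that no record contains a literal newline byte
(otherwise lines would no longer correspond to records).

```shell
#!/bin/sh
# Sketch only: split a MARC exchange-format file into pieces of
# N records each. split_marc is a hypothetical helper name.
# Assumes records never contain a newline byte.
split_marc() {
    infile=$1      # input file, e.g. my-marc-file.mrc
    per_file=$2    # records per output file, e.g. 1000
    prefix=$3      # output name prefix, e.g. chunk-

    # End-of-record (0x1d, octal 035) -> newline, so that each
    # record is one line; split then cuts every $per_file lines.
    # -a 3 gives three-letter suffixes (room for 17576 files).
    tr '\035' '\n' < "$infile" | split -l "$per_file" -a 3 - "$prefix"

    # Convert each piece back: newline -> end-of-record.
    for piece in "$prefix"*; do
        case $piece in
            *.mrc) continue ;;   # skip output from an earlier run
        esac
        tr '\n' '\035' < "$piece" > "$piece.mrc" && rm "$piece"
    done
}
```

Something like "split_marc my-marc-file.mrc 1000 chunk-" should
then leave chunk-aaa.mrc, chunk-aab.mrc and so on, each holding
up to 1000 records. Concatenating the pieces in order should
give back the original file, which is an easy sanity check.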

Ashley.

--
Ashley Sanders               a.sand...@manchester.ac.uk
Copac http://copac.ac.uk A Mimas service funded by JISC
