Wow, I'm really confused. I'm trying to remove duplicate lines from a MARC21
text file, and I have spent countless hours searching for scripts.

What I find frustrating while trying to learn Perl is that most solutions
assume you already know what to do with them. For example, someone gives the
code to find and replace, and that's it. If the complete script were there, I
think I could learn much faster. I have no idea how to put the code into a
script.

I did manage to find a few Perl one-liners, but they removed the blank lines
between the records, which must be retained in order to convert the file back
to actual MARC format before downloading into the database.

They also removed non-adjacent lines that were identical to a line in another
record. Those lines must also be kept, as they are an important part of the
file.

Any help would be more than appreciated. Below is part of a very large
file. Approximately 100,000 records need to be processed. For now, I just want
to remove adjacent duplicate fields.

=LDR  01548cam  2200397La 45{92}0
=001  ocm42328427\
=003  OCoLC
=005  20010526091201.0
=006  m\\\\\\\\u\\\\\\\\
=007  cr\cn-
=008  831108s1984\\\\inua\\\\sb\\\\001\0\eng\d
=010  \\$z   83048636 
=035  \\1234 (sirsi)
=035  \\1234 (sirsi)
=040  \\$aN{dollar}T$cN{dollar}T$dOCL
=020  \\$a0585000905 (electronic bk.)
=020  \\$z0253366062
=020  \\$z0253203252
=050  14$aNX180.F4$bL38 1984eb
=082  04$a700/.88042$219
=049  [EMAIL PROTECTED]
=100  1\$aLauter, Estella,$d1940-
=245  10$aWomen as mythmakers$h[computer file] :$bpoetry and visual art by 
twentieth-century women /$cEstella Lauter.
=260  \\$aBloomington :$bIndiana University Press,$cc1984.
=300  \\$axvii, 267 p. :$bill. ;$c24 cm.
=504  \\$aBibliography: p. 247-260.
=500  \\$aIncludes index.
=533  \\$aElectronic reproduction.$bBoulder, Colo. 
:$cNetLibrary,$d1999.$nAvailable via the World Wide Web.$nAvailable in multiple 
electronic file formats.$nAccess may be limited to NetLibrary affiliated 
libraries.
=SUBJ  \0$aFeminism and the arts.
=SUBJ  \0$aWomen artists.
=SUBJ  \0$aWomen poets.
=SUBJ  \0$aArt and mythology.
=SUBJ  \0$aArts, Modern$y20th century.
=655  \7$aElectronic books.$2local
=710  2\$aNetLibrary, Inc.
=776  1\$cOriginal$w(DLC)   83048636$w(OCoLC)10162146
=856  4\$3Bibliographic record 
display$uhttp://www.netlibrary.com/urlapi.asp?action=summary&v=1&bookid=652$zAn 
electronic book accessible through the World Wide Web; click for information
=994  \\$a92$bM7@

=LDR  01470cam  2200349La 45{92}0
=001  ocm42328450\
=003  OCoLC
=005  20010526091202.0
=006  m\\\\\\\\u\\\\\\\\
=007  cr\cn-
=008  980609s1998\\\\couab\\\sbf\\\001\0\eng\d
=010  \\$z   98026266 
=035  \\1234 (sirsi)
=035  \\1234 (sirsi)
=040  \\$aN{dollar}T$cN{dollar}T$dOCL
=020  \\$a0585001413 (electronic bk.)
=020  \\$z1555662307
=050  14$aQB581$b.L66 1998eb
=082  04$a523.3$221
=049  [EMAIL PROTECTED]
=100  1\$aLong, Kim.
=245  14$aThe moon book$h[computer file] :$bfascinating facts about the 
magnificent, mysterious moon /$cKim Long ; science advisor, Larry Sessions.
=250  \\$aRev. and expanded.
=260  \\$aBoulder, Colo. :$bJohnson Books,$cc1998.
=300  \\$a149 p. :$bill., maps ;$c22 cm.
=500  \\$aIncludes 1 errata sheet.
=504  \\$aIncludes bibliographical references (p. 132-133) and index.
=533  \\$aElectronic reproduction.$bBoulder, Colo. 
:$cNetLibrary,$d1999.$nAvailable via the World Wide Web.$nAvailable in multiple 
electronic file formats.$nAccess may be limited to NetLibrary affiliated 
libraries.
=651  \0$aMoon$vHandbooks, manuals, etc.
=655  \7$aElectronic books.$2local
=710  2\$aNetLibrary, Inc.
=776  1\$cOriginal$w(DLC)   98026266$w(OCoLC)39299241
=856  4\$3Bibliographic record 
display$uhttp://www.netlibrary.com/urlapi.asp?action=summary&v=1&bookid=140$zAn 
electronic book accessible through the World Wide Web; click for information
=994  \\$a92$bM7@
=994  \\$a92$bM7@
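
A minimal sketch of a complete script for the adjacent-duplicate case described
above. It assumes each field sits on one physical line (fields that were
wrapped across lines would need to be rejoined first) and that records are
separated by blank lines. The sub name remove_adjacent_duplicates and the file
names dedupe_fields.pl and records.mrk are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy lines from $in to $out, skipping any line that is identical to the
# line directly before it. Blank lines (the record separators) are always
# kept, and they reset the comparison, so a field is never dropped because
# it matches a field in a different record.
sub remove_adjacent_duplicates {
    my ($in, $out) = @_;
    my $previous;                       # last non-blank line written
    while (my $line = <$in>) {
        if ($line =~ /^\s*$/) {         # blank line: end of a record
            print {$out} $line;
            undef $previous;
            next;
        }
        # Drop the line only if it repeats the one just before it.
        next if defined $previous && $line eq $previous;
        print {$out} $line;
        $previous = $line;
    }
    return;
}

# Assumed usage: perl dedupe_fields.pl records.mrk > records_clean.mrk
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    remove_adjacent_duplicates($in, \*STDOUT);
    close $in;
}
else {
    warn "usage: perl dedupe_fields.pl INPUT_FILE > OUTPUT_FILE\n";
}
```

The script reads one line at a time, so the size of the file should not matter.
Run against the sample above, it would keep a single =035 line in each record
and a single =994 line at the end of the second record, and leave the blank
line between the records in place.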
