> Wow, I'm really confused. I'm trying to remove duplicate 
> lines from a marc21 text file.  I have spent countless hours 
> searching for scripts etc. 

I'm also very new to Perl, and just to see how it can be done I wrote a
long, newbie-ish script that does exactly what the Unix command
"sort FILENAME | uniq" does.

What I did was read the file's lines into an array and use the sort()
function to sort them. After that it's easy: do what Joe recommended
and check whether the current line is equal to the previous one. (For
your file you would skip the sort, since it would scramble the records;
the line-by-line comparison alone removes adjacent duplicates.)
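
Since you asked for a complete script rather than a fragment, here is a
minimal sketch of the "compare with the previous line" idea. It is only
my reading of what you need, not a tested MARC tool. Assumptions: each
field sits on one line in the real file (the wrapping above looks like
the mail program's doing), blank lines are record separators that must
always be kept, and the names dedupe.pl, input.mrk and output.mrk are
made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the given lines with adjacent duplicates dropped.
# Blank lines are always kept and reset the comparison, so the
# separators between records survive. Nothing is sorted, so identical
# lines that are not next to each other are left alone.
sub dedupe_adjacent {
    my @lines = @_;
    my @kept;
    my $previous;
    for my $line (@lines) {
        if ($line =~ /^\s*$/) {          # blank line: keep it, start fresh
            push @kept, $line;
            undef $previous;
            next;
        }
        next if defined $previous && $line eq $previous;  # same as last line
        push @kept, $line;
        $previous = $line;
    }
    return @kept;
}

# Hypothetical usage: perl dedupe.pl input.mrk > output.mrk
if (@ARGV) {
    my $file = shift @ARGV;
    open my $in, '<', $file or die "Cannot open $file: $!";
    my @lines = <$in>;                   # whole file in memory, as I did
    close $in;
    print dedupe_adjacent(@lines);
}
```

Save it as a file, run it with the input file name as the only argument,
and redirect the output to a new file so the original stays untouched.
If the file turns out to be too big to hold in memory, the same
comparison works while reading one line at a time.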

HTH

> 
> What I find frustrating while trying to learn Perl, is that 
> most solutions assume you know what to do.  For example, 
> someone gives the code to find and replace, and that's it. In 
> other words, if the complete script was there, I think I 
> could learn much faster. I have no idea of how to put the 
> code into a script. 
> 
> I did manage to find a few perl one liners but it removed the 
> blank lines between the records, which must be retained in 
> order to convert the file back to actual marc format before 
> downloading into the database.
> 
> It also removed non sequential lines if they were the same in 
> another record.  They must also be kept as they are an 
> important part of the file.
> 
> Any help would be more than appreciated. Below is part of a 
> very large file. Approx. 100,000 records need to be processed. 
> For now, I just want to remove adjacent duplicate fields.
> 
> =LDR  01548cam  2200397La 45{92}0
> =001  ocm42328427\
> =003  OCoLC
> =005  20010526091201.0
> =006  m\\\\\\\\u\\\\\\\\
> =007  cr\cn-
> =008  831108s1984\\\\inua\\\\sb\\\\001\0\eng\d
> =010  \\$z   83048636 
> =035  \\1234 (sirsi)
> =035  \\1234 (sirsi)
> =040  \\$aN{dollar}T$cN{dollar}T$dOCL
> =020  \\$a0585000905 (electronic bk.)
> =020  \\$z0253366062
> =020  \\$z0253203252
> =050  14$aNX180.F4$bL38 1984eb
> =082  04$a700/.88042$219
> =049  [EMAIL PROTECTED]
> =100  1\$aLauter, Estella,$d1940-
> =245  10$aWomen as mythmakers$h[computer file] :$bpoetry and 
> visual art by twentieth-century women /$cEstella Lauter.
> =260  \\$aBloomington :$bIndiana University Press,$cc1984.
> =300  \\$axvii, 267 p. :$bill. ;$c24 cm.
> =504  \\$aBibliography: p. 247-260.
> =500  \\$aIncludes index.
> =533  \\$aElectronic reproduction.$bBoulder, Colo. 
> :$cNetLibrary,$d1999.$nAvailable via the World Wide 
> Web.$nAvailable in multiple electronic file formats.$nAccess 
> may be limited to NetLibrary affiliated libraries.
> =SUBJ  \0$aFeminism and the arts.
> =SUBJ  \0$aWomen artists.
> =SUBJ  \0$aWomen poets.
> =SUBJ  \0$aArt and mythology.
> =SUBJ  \0$aArts, Modern$y20th century.
> =655  \7$aElectronic books.$2local
> =710  2\$aNetLibrary, Inc.
> =776  1\$cOriginal$w(DLC)   83048636$w(OCoLC)10162146
> =856  4\$3Bibliographic record 
> display$uhttp://www.netlibrary.com/urlapi.asp?action=summary&v
> =1&bookid=652$zAn electronic book accessible through the 
> World Wide Web; click for information
> =994  \\$a92$bM7@
> 
> =LDR  01470cam  2200349La 45{92}0
> =001  ocm42328450\
> =003  OCoLC
> =005  20010526091202.0
> =006  m\\\\\\\\u\\\\\\\\
> =007  cr\cn-
> =008  980609s1998\\\\couab\\\sbf\\\001\0\eng\d
> =010  \\$z   98026266 
> =035  \\1234 (sirsi)
> =035  \\1234 (sirsi)
> =040  \\$aN{dollar}T$cN{dollar}T$dOCL
> =020  \\$a0585001413 (electronic bk.)
> =020  \\$z1555662307
> =050  14$aQB581$b.L66 1998eb
> =082  04$a523.3$221
> =049  [EMAIL PROTECTED]
> =100  1\$aLong, Kim.
> =245  14$aThe moon book$h[computer file] :$bfascinating facts 
> about the magnificent, mysterious moon /$cKim Long ; science 
> advisor, Larry Sessions.
> =250  \\$aRev. and expanded.
> =260  \\$aBoulder, Colo. :$bJohnson Books,$cc1998.
> =300  \\$a149 p. :$bill., maps ;$c22 cm.
> =500  \\$aIncludes 1 errata sheet.
> =504  \\$aIncludes bibliographical references (p. 132-133) and
> index.
> =533  \\$aElectronic reproduction.$bBoulder, Colo. 
> :$cNetLibrary,$d1999.$nAvailable via the World Wide 
> Web.$nAvailable in multiple electronic file formats.$nAccess 
> may be limited to NetLibrary affiliated libraries.
> =651  \0$aMoon$vHandbooks, manuals, etc.
> =655  \7$aElectronic books.$2local
> =710  2\$aNetLibrary, Inc.
> =776  1\$cOriginal$w(DLC)   98026266$w(OCoLC)39299241
> =856  4\$3Bibliographic record 
> display$uhttp://www.netlibrary.com/urlapi.asp?action=summary&v
> =1&bookid=140$zAn electronic book accessible through the 
> World Wide Web; click for information
> =994  \\$a92$bM7@
> =994  \\$a92$bM7@
> 
> 
