> Wow, I'm really confused. I'm trying to remove duplicate
> lines from a marc21 text file. I have spent countless hours
> searching for scripts etc.

I'm also very new to Perl, and I wrote a long, newbie-ish script that does exactly what the Unix command "sort FILENAME | uniq" does, just to see how it can be done. I read the file's lines into an array and used the sort() function to sort them. After that it's easy: do what Joe recommended and check whether the current line is equal to the previous line.

HTH

>
> What I find frustrating while trying to learn Perl, is that
> most solutions assume you know what to do. For example,
> someone gives the code to find and replace, and that's it. In
> other words, if the complete script was there, I think I
> could learn much faster. I have no idea of how to put the
> code into a script.
>
> I did manage to find a few perl one liners but it removed the
> blank lines between the records, which must be retained in
> order to convert the file back to actual marc format before
> downloading into the database.
>
> It also removed non sequential lines if they were the same in
> another record. They must also be kept as they are an
> important part of the file.
>
> Any help would be more than appreciated. Below is part of a
> very large file.Approx 100,000 records need to be processed.
> For now, I just want to remove adjacent duplicate fields.
>
> =LDR 01548cam 2200397La 45{92}0
> =001 ocm42328427\
> =003 OCoLC
> =005 20010526091201.0
> =006 m\\\\\\\\u\\\\\\\\
> =007 cr\cn-
> =008 831108s1984\\\\inua\\\\sb\\\\001\0\eng\d
> =010 \\$z 83048636
> =035 \\1234 (sirsi)
> =035 \\1234 (sirsi)
> =040 \\$aN{dollar}T$cN{dollar}T$dOCL
> =020 \\$a0585000905 (electronic bk.)
> =020 \\$z0253366062
> =020 \\$z0253203252
> =050 14$aNX180.F4$bL38 1984eb
> =082 04$a700/.88042$219
> =049 [EMAIL PROTECTED]
> =100 1\$aLauter, Estella,$d1940-
> =245 10$aWomen as mythmakers$h[computer file] :$bpoetry and
> visual art by twentieth-century women /$cEstella Lauter.
> =260 \\$aBloomington :$bIndiana University Press,$cc1984.
> =300 \\$axvii, 267 p. :$bill. ;$c24 cm.
> =504 \\$aBibliography: p. 247-260.
> =500 \\$aIncludes index.
> =533 \\$aElectronic reproduction.$bBoulder, Colo.
> :$cNetLibrary,$d1999.$nAvailable via the World Wide
> Web.$nAvailable in multiple electronic file formats.$nAccess
> may be limited to NetLibrary affiliated libraries.
> =SUBJ \0$aFeminism and the arts.
> =SUBJ \0$aWomen artists.
> =SUBJ \0$aWomen poets.
> =SUBJ \0$aArt and mythology.
> =SUBJ \0$aArts, Modern$y20th century.
> =655 \7$aElectronic books.$2local
> =710 2\$aNetLibrary, Inc.
> =776 1\$cOriginal$w(DLC) 83048636$w(OCoLC)10162146
> =856 4\$3Bibliographic record
> display$uhttp://www.netlibrary.com/urlapi.asp?action=summary&v
> =1&bookid=652$zAn electronic book accessible through the
> World Wide Web; click for information
> =994 \\$a92$bM7@
>
> =LDR 01470cam 2200349La 45{92}0
> =001 ocm42328450\
> =003 OCoLC
> =005 20010526091202.0
> =006 m\\\\\\\\u\\\\\\\\
> =007 cr\cn-
> =008 980609s1998\\\\couab\\\sbf\\\001\0\eng\d
> =010 \\$z 98026266
> =035 \\1234 (sirsi)
> =035 \\1234 (sirsi)
> =040 \\$aN{dollar}T$cN{dollar}T$dOCL
> =020 \\$a0585001413 (electronic bk.)
> =020 \\$z1555662307
> =050 14$aQB581$b.L66 1998eb
> =082 04$a523.3$221
> =049 [EMAIL PROTECTED]
> =100 1\$aLong, Kim.
> =245 14$aThe moon book$h[computer file] :$bfascinating facts
> about the magnificent, mysterious moon /$cKim Long ; science
> advisor, Larry Sessions.
> =250 \\$aRev. and expanded.
> =260 \\$aBoulder, Colo. :$bJohnson Books,$cc1998.
> =300 \\$a149 p. :$bill., maps ;$c22 cm.
> =500 \\$aIncludes 1 errata sheet.
> =504 \\$aIncludes bibliographical references (p. 132-133) and index.
> =533 \\$aElectronic reproduction.$bBoulder, Colo.
> :$cNetLibrary,$d1999.$nAvailable via the World Wide
> Web.$nAvailable in multiple electronic file formats.$nAccess
> may be limited to NetLibrary affiliated libraries.
> =651 \0$aMoon$vHandbooks, manuals, etc.
> =655 \7$aElectronic books.$2local
> =710 2\$aNetLibrary, Inc.
> =776 1\$cOriginal$w(DLC) 98026266$w(OCoLC)39299241
> =856 4\$3Bibliographic record
> display$uhttp://www.netlibrary.com/urlapi.asp?action=summary&v
> =1&bookid=140$zAn electronic book accessible through the
> World Wide Web; click for information
> =994 \\$a92$bM7@
> =994 \\$a92$bM7@
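
Since the question asked for a complete script rather than a fragment, here is a minimal sketch of the "compare the current line with the previous line" check on its own. It skips the sort() step on purpose, because sorting would reorder the records and drop the layout the original poster needs to keep. The names is_adjacent_duplicate, dedupe.pl, input.mrk and output.mrk are made up for illustration. The sketch assumes that each field sits on one line in the real file (the wrapped lines above look like the mail program's doing) and that duplicate lines are identical character for character, line ending included.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when $line repeats the line just before it.
# Blank lines are never treated as duplicates, so the empty lines
# between records are always kept.
sub is_adjacent_duplicate {
    my ($line, $previous) = @_;
    return 0 unless defined $previous;   # first line: nothing to compare with
    return 0 if $line !~ /\S/;           # blank line (record separator): keep
    return $line eq $previous;           # exact match with the previous line
}

# Only runs when a file name is given on the command line:
#   perl dedupe.pl input.mrk > output.mrk
if (@ARGV) {
    my $previous;
    while (my $line = <>) {              # read the file one line at a time
        print $line unless is_adjacent_duplicate($line, $previous);
        $previous = $line;               # remember this line for the next round
    }
}
```

Reading one line at a time keeps memory use small even with 100,000 records. Because only neighbouring lines are compared, a field that shows up again in another record (or later in the same record) is left alone.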