Jim Meyering wrote:
> Bob Proulx wrote:
> > If you want to print only the first of a unique line then this perl
> > one-liner will do it.
> >
> >   perl -lne 'print $_ if ! defined $a{$_}; $a{$_}=$_;'
>
> Thanks, but with large files, isn't it better to store not
> the full line, but rather a constant?
>
>   perl -lne 'print $_ if ! defined $seen{$_}; $seen{$_}=1'

Good point!  I hadn't given it much thought since it usually runs so
quickly in my usage that I never worried about it.

> (actually, using "1" could be seen as misleading, since 0 or even undef
> would also work)
>
> I think you can drop the "l".
> I have a slight preference for this:
>
>   perl -ne 'defined $seen{$_} or print; $seen{$_}=1'

Referring to "print" v. "print $_" here: I have never liked implicit
use of $_ and so I tend to avoid it.  At one time there was a push in
the perl community to make all uses explicit.  And whether to use the
'if (expr) { stmt }' or 'stmt if expr' or 'expr or stmt' form is a
matter of taste.  Might as well discuss the one true indentation and
brace styles.  :-)

For one-liners I do tend to use short variable names to keep the line
length down.  In order to compact a line I also sacrifice whitespace
when required.

But you have me thinking about conserving memory.  If the file is
large because of long lines then memory use will be proportionately
large, since every distinct line is stored as a key.  This could be
reduced by using a digest of the line as the storage key instead of
the entire line.  But the savings depend on the average line size.  If
the average line is shorter than the digest (16 bytes for a raw MD5)
then this would increase memory use instead.

  perl -MDigest::MD5=md5 -lne '$m=md5($_); print $_ if ! defined $a{$m}; $a{$m}=1'

If you are ever going to debug and print out the md5 value then
substitute md5_hex for md5 to get a printable result.

Bob