I think I have this right, but before I run it on a 53-meg text file, I
want to make sure. On a test of about a dozen lines, it seems to work
just fine.

I need to sort a file that is 456,193 lines long. People were allowed
to enter a contest, but not more than once a day. Their names and the
date were logged to this file. Some people voted more than once a day,
on many days; others, just once a day. I am getting rid of duplicates
for the same day, discarding each person's second and later votes for
that day.

The file looks like this:

7/17/0|First Last|xxxx Vinca Cr|Charlotte|NC|28213|NA|[EMAIL PROTECTED]|name4
07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name1
07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name1
07/17/0|dd|dd|dd|NC|dd|dd|[EMAIL PROTECTED]|name5
07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name2
07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name2
07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|steve_park
07/17/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|steve_park
07/18/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|steve_park
007/18/0|M Settle|0000 Audubon Dr.|Foley|AL|36535|555-555-5000|[EMAIL PROTECTED]|name6
07/18/0|d|d|d|NC|d|d|[EMAIL PROTECTED]
07/19/0|d|d|d|NC|d|d|[EMAIL PROTECTED]|name2
07/20/0|N Sipe|900 Way Cr|Charlotte|NC|28213|NA|[EMAIL PROTECTED]|name3

To strip out duplicate entries by the same person on the same day, I am
sorting this file first by the email address (field 8) and then by the
first field (yeah, I know it should have been a proper date, but the
script apparently wasn't Y2K-compatible, and it really doesn't matter
anyway). I am doing the following:

sort -t \| -k 8 -k 1 test.txt | uniq
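One detail worth knowing about the command above: a key given as `-k 8` or `-k 1`, with no end field, runs from that field to the end of the line (so `-k 1` is the whole line), and `uniq` then drops only lines that are byte-for-byte identical. Below is a minimal sketch with each key bounded to a single field, assuming the email is always field 8 and the date is always field 1, and assuming GNU sort. The function name `dedupe_sorted` is hypothetical.

```shell
#!/bin/sh
# Sketch: keep one vote per (email, date) pair.
# Assumptions: '|' separated fields, email in field 8, date in field 1,
# GNU sort.
# -k8,8 and -k1,1 limit each key to exactly one field; -u then treats
# lines as duplicates when those two keys match, even if other fields
# differ.
dedupe_sorted() {
    sort -t '|' -u -k8,8 -k1,1 "$@"
}
```

Usage: `dedupe_sorted test.txt > deduped.txt`. POSIX only promises that `-u` keeps one line from each set with equal keys; GNU sort disables its last-resort comparison under `-u`, so in practice the earliest such line in the input is kept. The keys are compared as plain strings, so `7/17/0` and `07/17/0` would count as different days.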

It seems to work on the test file, so I am presuming it will work on the
real one. Have I overlooked anything? And will I run into problems given
the huge size of this file (as I said, 53 megs), sorting names from top
to bottom each time? If it is a problem, any advice is appreciated. Is
there another approach I should be taking? Please reply by email as well.
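For comparison, one alternative that skips sorting entirely: a single `awk` pass that remembers each (date, email) pair it has seen and prints a line only the first time its pair appears. This is a sketch under the same field assumptions as above (date in field 1, email in field 8). The output stays in the original file order, and memory grows with the number of distinct pairs, not with the file size. The name `dedupe_first_seen` is hypothetical.

```shell
#!/bin/sh
# Sketch: drop repeat votes in one pass, keeping each person's first
# vote per day in original file order (no sort needed).
# Assumptions: '|' separated fields, date in field 1, email in field 8.
dedupe_first_seen() {
    # seen[] is keyed on date + separator + email. The expression
    # !seen[key]++ is true only the first time a key occurs, so only
    # that line is printed.
    awk -F '|' '!seen[$1 FS $8]++' "$@"
}
```

Usage: `dedupe_first_seen test.txt > deduped.txt`.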

Gary

_______________________________________________
Redhat-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-list