Re: a program to delete duplicate files

John Machin Sat, 12 Mar 2005 00:55:05 -0800

David Eppstein wrote:
> In article <[EMAIL PROTECTED]>,
>  Patrick Useldinger <[EMAIL PROTECTED]> wrote:
>
> > > Well, but the spec didn't say efficiency was the primary
criterion, it
> > > said minimizing the number of comparisons was.
> >
> > That's exactly what my program does.
>
> If you're doing any comparisons at all, you're not minimizing the
number
> of comparisons.


Gah. You two should be lawyers. I don't know why you're arguing about
whatever you are arguing about.

Just look at the efficiency of processing N files of the same size S,
where they differ after d bytes: [If they don't differ, d = S]

PU: O(Nd) reading time, O(Nd) data comparison time [Actually (N-1)d
which is important for small N and large d].

Hashing method: O(NS) reading time, O(NS) hash calc time

1. d <= S
2. Anyone have a usable hash function that's faster than string
comparison?
3. Anyone have an answer to PU's request for advice where his algorithm
could be improved by using a hash method?

Temporary injunction in favour of PU.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: a program to delete duplicate files

Reply via email to