On 20Dec2019 08:23, Chris Angelico <ros...@gmail.com> wrote:
On Fri, Dec 20, 2019 at 8:06 AM Eli the Bearded <*@eli.users.panix.com> wrote:
Consider a sort that first compares file size and, if two files are the
same number of bytes, then compares file checksums. Any decently scaled
real-world implementation would memoize the checksum for speed, but would
only work it out for files that do not have a unique file size. The key
method requires it to be worked out in advance for everything.
But I see the key method handles the memoization under the hood for you,
so those simpler, more common sorts of sort get an easy-to-see benefit.
I guess that's a strange situation that might actually need this kind
of optimization, but if you really do have that situation, you can
make a magical key that behaves the way you want.
[... example implementation ...]
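For anyone following along, the kind of "magical key" Chris means could be
built with functools.cmp_to_key over a comparison that only ever checksums
same-sized files, memoizing as it goes. This is just my sketch of the idea,
not his elided code; sha256, the helper names and the `paths` list below are
placeholders:

    import functools
    import hashlib
    import os

    _checksums = {}  # path -> digest, filled in lazily

    def checksum(path):
        # compute and memoize the checksum only when it is first needed
        if path not in _checksums:
            with open(path, 'rb') as f:
                _checksums[path] = hashlib.sha256(f.read()).digest()
        return _checksums[path]

    def cmp_files(a, b):
        # size first; only files of equal size ever get checksummed
        diff = os.path.getsize(a) - os.path.getsize(b)
        if diff:
            return -1 if diff < 0 else 1
        ca, cb = checksum(a), checksum(b)
        return (ca > cb) - (ca < cb)

    paths.sort(key=functools.cmp_to_key(cmp_files))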
The classic situation matching Eli's criteria is comparing file trees
for equivalent files, for backup, synchronisation or hard-linking
purposes; I've a script which does exactly what he describes in terms of
comparison (size, then checksum, though I checksum a short prefix before
doing a full-file checksum, so it's even more fiddly).
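In outline, that comparison looks something like this (a sketch only; the
prefix size and sha256 are arbitrary stand-ins, not what my script actually
uses):

    import hashlib
    import os

    PREFIX = 16384  # hash this many leading bytes before hashing the whole file

    def prefix_checksum(path):
        with open(path, 'rb') as f:
            return hashlib.sha256(f.read(PREFIX)).digest()

    def full_checksum(path):
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                h.update(chunk)
        return h.digest()

    def same_content(a, b):
        # size, then cheap prefix checksum, then full checksum
        if os.path.getsize(a) != os.path.getsize(b):
            return False
        if prefix_checksum(a) != prefix_checksum(b):
            return False
        return full_checksum(a) == full_checksum(b)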
However, my example above isn't very amenable to sorts, because you
never bother looking at checksums at all for files of different sizes.
OTOH, I do sort the files by size before the checksum phases, which lets
one sync/reclaim the big files first, for example - a policy choice.
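Roughly (illustrative only; `paths` and `process` are placeholders):

    import os

    # biggest files first, so they get synced/reclaimed before the small ones
    for path in sorted(paths, key=os.path.getsize, reverse=True):
        process(path)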
Cheers,
Cameron Simpson <c...@cskk.id.au>