On 20Dec2019 08:23, Chris Angelico <ros...@gmail.com> wrote:
On Fri, Dec 20, 2019 at 8:06 AM Eli the Bearded <*@eli.users.panix.com> wrote:
Consider a sort that first compares file size and, if two files are the
same number of bytes, then compares file checksums. Any decently scaled
real-world implementation would memoize the checksum for speed, but would
only work it out for files that do not have a unique file size. The key
method requires it to be worked out in advance for everything.
But I see the key method handles the memoization under the hood for you,
so those simpler, more common sorts of sort get an easy-to-see benefit.
I guess that's a strange situation that might actually need this kind
of optimization, but if you really do have that situation, you can
make a magical key that behaves the way you want.
[... example implementation ...]
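For anyone following along, the kind of "magical key" Chris means could be
built with functools.cmp_to_key over a comparison that only ever checksums
same-sized files, memoizing as it goes. This is just my sketch of the idea,
not his elided code; sha256, the helper names and the `paths` list below are
placeholders:

    import functools
    import hashlib
    import os

    _checksums = {}  # path -> digest, filled in lazily

    def checksum(path):
        # compute and memoize the checksum only when it is first needed
        if path not in _checksums:
            with open(path, 'rb') as f:
                _checksums[path] = hashlib.sha256(f.read()).digest()
        return _checksums[path]

    def cmp_files(a, b):
        # size first; only files of equal size ever get checksummed
        diff = os.path.getsize(a) - os.path.getsize(b)
        if diff:
            return -1 if diff < 0 else 1
        ca, cb = checksum(a), checksum(b)
        return (ca > cb) - (ca < cb)

    paths.sort(key=functools.cmp_to_key(cmp_files))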
The classic situation matching Eli's criteria is comparing file trees
for equivalent files, for backup, synchronisation or hard-linking
purposes; I've a script which does exactly what he describes in terms of
comparison (size, then checksum, though I checksum a short prefix before
doing a full-file checksum, so it's even more fiddly).
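In outline, that comparison looks something like this (a sketch only; the
prefix size and sha256 are arbitrary stand-ins, not what my script actually
uses):

    import hashlib
    import os

    PREFIX = 16384  # hash this many leading bytes before hashing the whole file

    def prefix_checksum(path):
        with open(path, 'rb') as f:
            return hashlib.sha256(f.read(PREFIX)).digest()

    def full_checksum(path):
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                h.update(chunk)
        return h.digest()

    def same_content(a, b):
        # size, then cheap prefix checksum, then full checksum
        if os.path.getsize(a) != os.path.getsize(b):
            return False
        if prefix_checksum(a) != prefix_checksum(b):
            return False
        return full_checksum(a) == full_checksum(b)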
However, my example above isn't very amenable to sorts, because you
never bother looking at checksums at all for files of different sizes.
OTOH, I do sort the files by size before the checksum phases, which lets
one sync/reclaim the big files first, for example - a policy choice.
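Roughly (illustrative only; `paths` and `process` are placeholders):

    import os

    # biggest files first, so they get synced/reclaimed before the small ones
    for path in sorted(paths, key=os.path.getsize, reverse=True):
        process(path)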
Cheers,
Cameron Simpson <c...@cskk.id.au>