Chris Angelico wrote:
hash_to_filename = defaultdict(list)
for fn in files:
    # Step 1: Hash every file.
    hash = calculate_hash(fn)
    # Step 2: Locate all pairs of files with identical hashes
    hash_to_filename[hash].append(fn)
I think you can avoid hashing the files altogether.
Firs
On Tue, Feb 9, 2016 at 3:13 PM, Steven D'Aprano wrote:
> On Tuesday 09 February 2016 02:11, Chris Angelico wrote:
>
>> That's fine for comparing one file against one other. He started out
>> by saying he already had a way to compare files for equality. What he
>> wants is a way to capitalize on th
On Tuesday 09 February 2016 02:11, Chris Angelico wrote:
> That's fine for comparing one file against one other. He started out
> by saying he already had a way to compare files for equality. What he
> wants is a way to capitalize on that to find all the identical files
> in a group. A naive appro
On Tue, Feb 9, 2016 at 1:49 AM, Random832 wrote:
> On Sun, Feb 7, 2016, at 20:07, Cem Karan wrote:
>> a) Use Chris Angelico's suggestion and hash each of the files (use the
>> standard library's 'hashlib' for this). Identical files will always have
>> identical hashes, but there may be fa
On Sun, Feb 7, 2016, at 20:07, Cem Karan wrote:
> a) Use Chris Angelico's suggestion and hash each of the files (use the
> standard library's 'hashlib' for this). Identical files will always have
> identical hashes, but there may be false positives, so you'll need to verify
> that files t
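The hash-then-verify idea quoted above could be sketched roughly as follows. This is only an illustration, not code from the thread: the names `file_digest` and `find_duplicate_groups`, the choice of SHA-256, and the 64 KiB chunk size are assumptions. Files are first bucketed by digest, and each bucket is then confirmed byte for byte with `filecmp.cmp(..., shallow=False)` so that a hash collision cannot produce a false positive.

```python
import filecmp
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(paths):
    """Bucket paths by digest, then confirm each bucket byte for byte."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[file_digest(path)].append(path)

    groups = []
    for candidates in by_digest.values():
        if len(candidates) < 2:
            continue  # a unique digest cannot have a duplicate
        # Verification pass: split the bucket into confirmed-equal clusters.
        clusters = []
        for path in candidates:
            for cluster in clusters:
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

With a strong hash the verification pass almost never splits a bucket, so its cost is roughly one extra read of each duplicated file.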
At 21:46 on 07-02-2016, Paulo da Silva wrote:
> Hello!
>
> This may not be a strict python question, but ...
>
> Suppose I have already a class MyFile that has an efficient method (or
> operator) to compare two MyFile s for equality.
>
> What is the most efficient way to obtain all sets of eq
On Feb 7, 2016, at 4:46 PM, Paulo da Silva wrote:
> Hello!
>
> This may not be a strict python question, but ...
>
> Suppose I have already a class MyFile that has an efficient method (or
> operator) to compare two MyFile s for equality.
>
> What is the most efficient way to obtain all sets
On 2016-02-08 00:05, Paulo da Silva wrote:
> At 22:17 on 07-02-2016, Tim Chase wrote:
>> all_files = list(generate_MyFile_objects())
>> interesting = [
>>     (my_file1, my_file2)
>>     for i, my_file1
>>         in enumerate(all_files, 1)
>>     for my_file2
>>         in all_files[i:]
>>     if m
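The quoted comprehension visits each unordered pair of files once. A roughly equivalent sketch uses `itertools.combinations`. The filter in the quoted snippet is cut off, so the `a == b` test below is an assumption about what it checks, and `equal_pairs` is a hypothetical name.

```python
from itertools import combinations


def equal_pairs(items):
    """Return every unordered pair (a, b) with a == b, each pair once."""
    # combinations(items, 2) yields each pair of positions exactly once,
    # so n items cost n * (n - 1) / 2 equality checks.
    return [(a, b) for a, b in combinations(items, 2) if a == b]
```

The quadratic number of comparisons is why this approach suits small collections only.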
At 22:17 on 07-02-2016, Tim Chase wrote:
> On 2016-02-07 21:46, Paulo da Silva wrote:
...
>
> If the MyFile objects can be unique but compare for equality
> (e.g. two files on the file-system that have the same SHA1 hash, but
> you want to know the file-names), you'd have to do a paired se
On 2016-02-07 21:46, Paulo da Silva wrote:
> Suppose I have already a class MyFile that has an efficient method
> (or operator) to compare two MyFile s for equality.
>
> What is the most efficient way to obtain all sets of equal files (of
> course each set must have more than one file - all single
On 7 Feb 2016 21:51, "Paulo da Silva" wrote:
>
> Hello!
>
> This may not be a strict python question, but ...
>
> Suppose I have already a class MyFile that has an efficient method (or
> operator) to compare two MyFile s for equality.
>
> What is the most efficient way to obtain all sets of equal
On Mon, Feb 8, 2016 at 8:46 AM, Paulo da Silva wrote:
> Hello!
>
> This may not be a strict python question, but ...
>
> Suppose I have already a class MyFile that has an efficient method (or
> operator) to compare two MyFile s for equality.
>
> What is the most efficient way to obtain all sets of
Hello!
This may not be a strict python question, but ...
Suppose I have already a class MyFile that has an efficient method (or
operator) to compare two MyFile s for equality.
What is the most efficient way to obtain all sets of equal files (of
course each set must have more than one file - all
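If only an equality test is available, as the question assumes, one baseline is to compare each object against a single representative of every group found so far. The sketch below assumes that `==` is transitive for the objects involved; `group_equal` is a hypothetical name, and plain integers stand in for MyFile objects in the usage line.

```python
def group_equal(items):
    """Partition items into groups of mutually equal members.

    Only groups with more than one member are returned.
    """
    groups = []
    for item in items:
        for group in groups:
            # One comparison per existing group: test the representative.
            if group[0] == item:
                group.append(item)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([item])
    return [group for group in groups if len(group) > 1]


print(group_equal([1, 2, 1, 3, 2, 1]))  # [[1, 1, 1], [2, 2]]
```

This needs at most one comparison per item per distinct group, which beats comparing every pair when many files are equal but is still quadratic when nearly all files differ.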
13 matches