On 24/09/2013 00:17, David Christensen wrote:
I'm looking for a hash function and a related function or operator such
that:
H(string1 . string2) = f(H(string1), H(string2))
H(string1 . string2) = H(string1) op H(string2)
where:
H() is the hash function
string1 is a string
string2 is a string
. is the string concatenation operator
f() is a function
op is a binary operator
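One family with roughly this shape is a polynomial hash, provided H() also carries the string length so that f() knows how far to shift the first value. A rough sketch in Python, where BASE and MOD are arbitrary illustrative choices and not something from this thread:

```python
# Sketch only: BASE and MOD are arbitrary illustrative constants.
BASE = 257
MOD = (1 << 61) - 1  # a large prime modulus

def H(s):
    """Return (hash value, byte length) of the string s."""
    data = s.encode("utf-8")
    value = 0
    for byte in data:
        value = (value * BASE + byte) % MOD
    return value, len(data)

def f(h1, h2):
    """Combine H(string1) and H(string2) into H(string1 . string2)."""
    v1, n1 = h1
    v2, n2 = h2
    # Shift the first value left by n2 "digits" in base BASE, then add the second.
    return (v1 * pow(BASE, n2, MOD) + v2) % MOD, n1 + n2

assert H("/usr" + "/local") == f(H("/usr"), H("/local"))
```

This is not a cryptographic hash, so it only suits cases where accidental collisions are tolerable.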
> On 09/23/13 15:29, Rob Dixon wrote:
>> Could you explain the problem you're trying to solve?
> Writing scripts that look for duplicate, similar, and/or
> missing files.
I assume this is about paths and filenames. Have you considered an rsync
dry-run?
I also assume that you want to communicate as little as possible, so you
don't have supersets of all strings on all sides. (Otherwise it would
become a simple indexing problem.)
I also assume that you are more interested in missing items, so
hash-value collisions are not a problem.
I also assume that the set of string1 is smaller than that of string2,
let's say 100 vs. 10000 different values.
For local deduplication, you would store each path component as a
directory name plus the index of its parent:
#table=path
#columns=id,name,pid
1,"",0
2,"usr",1
3,"local",2
Then keep a list of filenames and, per filename, the paths in which it exists.
#table=file
#columns=id,name
#table=detail
#columns=file_id,path_id,size,md5
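As a minimal sketch of how the path table above could be filled (in Python; the helper name intern_path and the in-memory structures are hypothetical, only the id,name,pid layout comes from the tables):

```python
# Illustrative sketch: rows follow the id,name,pid columns of the path table.
path_rows = [(1, "", 0)]     # (id, name, pid); row 1 is the root directory
path_index = {("", 0): 1}    # (name, pid) -> id, for fast lookup

def intern_path(path):
    """Return the id of an absolute directory path, adding rows as needed."""
    pid = 1  # start at the root
    for name in path.strip("/").split("/"):
        if not name:
            continue  # skip empty components such as in "/" or "a//b"
        key = (name, pid)
        if key not in path_index:
            new_id = len(path_rows) + 1
            path_rows.append((new_id, name, pid))
            path_index[key] = new_id
        pid = path_index[key]
    return pid

intern_path("/usr/local")  # adds the rows for "usr" and "local"
```

Each directory name is stored once per parent, so a long shared prefix costs no extra rows.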
To combine index values, use something like (i1 << N) | i2,
where N is the number of bits needed by i2.
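A small sketch of that packing, with N_BITS as an assumed width (here enough for i2 values below 2**20) and combine/split as hypothetical helper names:

```python
# N_BITS is an assumed width; pick it from the largest i2 you expect.
N_BITS = 20

def combine(i1, i2):
    """Pack two non-negative indexes into one integer key."""
    if not 0 <= i2 < (1 << N_BITS):
        raise ValueError("i2 does not fit in N_BITS bits")
    return (i1 << N_BITS) | i2

def split(key):
    """Recover (i1, i2) from a key made by combine()."""
    return key >> N_BITS, key & ((1 << N_BITS) - 1)

key = combine(3, 42)
assert split(key) == (3, 42)
```

Unlike a hash, this mapping is reversible and collision-free as long as i2 stays within N bits.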
I would not involve string concatenation: keep things separate once
separated. Use arrays.
Use (parts of) the MD5 digests of the strings if you need to compare against remote locations.
So it would be best to first explain *more* about what you are trying to solve.
Is it a single computer or multiple computers, connected or not?
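For example, a shortened digest could be taken like this (a sketch; the name short_digest and the default of 8 hex characters are assumptions, and a shorter prefix means more collisions):

```python
import hashlib

def short_digest(s, hex_chars=8):
    """Return the first hex_chars hex digits of the MD5 digest of s."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()[:hex_chars]

print(short_digest("usr/local"))
```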
Suppose one computer sends a concise email about what it has, such that
the other computer can reply with an even more concise email about what
it has and what it needs. IOW: diff+patch.
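The reply side of such an exchange could be sketched as a pair of set differences (the function name reply and the "has"/"needs" keys are hypothetical; the items could be filenames or short digests):

```python
def reply(remote_items, local_items):
    """Given what the remote side announced, list what only we have and what we lack."""
    remote = set(remote_items)
    local = set(local_items)
    return {
        "has": sorted(local - remote),    # items the remote side is missing
        "needs": sorted(remote - local),  # items we are missing
    }

print(reply(["a", "b"], ["b", "c"]))
```

Items present on both sides never appear in the reply, which is what keeps it shorter than the first message.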
--
Greetings, Ruud