As noted earlier, I am preparing a contribution of Bloom Filter classes to
the collections module.  As part of this submission there are several
methods that operate on BitSets and are used in Bloom Filter manipulation
and analysis.  My question is: should these be contributed as Bloom Filter
specific methods, or would it be better to submit a BitSet function
library?

The methods in question are:
hammingDistance() = cardinality(A xor B)
jaccardDistance() = 1 - jaccardSimilarity()
jaccardSimilarity() = cardinality(A and B) / cardinality(A or B)
cosineDistance() = 1 - cosineSimilarity()
cosineSimilarity() = cardinality(A and B) / (sqrt(cardinality(A)) *
sqrt(cardinality(B)))
estimatedLog() = estimated log2 of the BitSet when treated as a large
unsigned int
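
To make the definitions concrete, here is a rough sketch of how they might
look as static helpers over java.util.BitSet.  The class name BitSetMetrics,
the return values chosen for empty BitSets, and the 53-bit approximation in
estimatedLog() are placeholders of mine for discussion, not the code I
intend to submit.

```java
import java.util.BitSet;

/** Hypothetical helper class; the name and edge-case handling are assumptions. */
public final class BitSetMetrics {
    private BitSetMetrics() {}

    /** cardinality(A and B), computed on a copy so the inputs are not modified. */
    private static int andCardinality(BitSet a, BitSet b) {
        BitSet tmp = (BitSet) a.clone();
        tmp.and(b);
        return tmp.cardinality();
    }

    /** cardinality(A xor B) */
    public static int hammingDistance(BitSet a, BitSet b) {
        BitSet tmp = (BitSet) a.clone();
        tmp.xor(b);
        return tmp.cardinality();
    }

    /** cardinality(A and B) / cardinality(A or B) */
    public static double jaccardSimilarity(BitSet a, BitSet b) {
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int unionSize = union.cardinality();
        if (unionSize == 0) {
            return 1.0; // assumption: two empty sets are treated as identical
        }
        return (double) andCardinality(a, b) / unionSize;
    }

    public static double jaccardDistance(BitSet a, BitSet b) {
        return 1.0 - jaccardSimilarity(a, b);
    }

    /** cardinality(A and B) / (sqrt(cardinality(A)) * sqrt(cardinality(B))) */
    public static double cosineSimilarity(BitSet a, BitSet b) {
        int ca = a.cardinality();
        int cb = b.cardinality();
        if (ca == 0 || cb == 0) {
            return 0.0; // assumption: similarity with an empty set is 0
        }
        return andCardinality(a, b) / (Math.sqrt(ca) * Math.sqrt(cb));
    }

    public static double cosineDistance(BitSet a, BitSet b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    /**
     * Approximate log2 of the BitSet read as an unsigned integer in which
     * bit i has weight 2^i.  Only the top 53 bits are used, since that is
     * all a double can represent exactly.
     */
    public static double estimatedLog(BitSet a) {
        int top = a.length() - 1; // index of the highest set bit, -1 if empty
        if (top < 0) {
            return Double.NEGATIVE_INFINITY; // log2(0)
        }
        int low = Math.max(0, top - 52);
        long mantissa = 0L;
        for (int i = top; i >= low; i--) {
            mantissa = (mantissa << 1) | (a.get(i) ? 1L : 0L);
        }
        // value is approximately mantissa * 2^low
        return Math.log(mantissa) / Math.log(2.0) + low;
    }
}
```

None of these helpers reference anything specific to Bloom Filters, which
is part of why I am asking whether a general BitSet library is the better
home for them.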

Opinions requested.

Claude
--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
