Re: Different key with the same digest in log compaction

2017-01-06 Thread radai
i just noticed my link didnt parse correctly. the formula for probability of collision is explained by googling "birthday paradox". if anything, we could further optimize this code to use a trivial hash map for cases where sizeOf(hash) > sizeOf(key) On Fri, Jan 6, 2017 at 10:00 AM, Colin McCabe w

Re: Different key with the same digest in log compaction

2017-01-06 Thread Colin McCabe
That's a fair point. The calls to Arrays.equals are comparing just the hashes, not the keys. Colin On Tue, Jan 3, 2017, at 17:15, radai wrote: > looking at the code (SkimpyOffsetMap.get/put) they both start with > hashInto(key, hash1) and then ignore key from that point on - so we're > not > usi

Re: Different key with the same digest in log compaction

2017-01-03 Thread radai
looking at the code (SkimpyOffsetMap.get/put) they both start with hashInto(key, hash1) and then ignore key from that point on - so we're not using the key unless im missing something? as for the probability of collision - it depends on the hash algo and the number of keys. if you configure it to

Re: Different key with the same digest in log compaction

2017-01-03 Thread Colin McCabe
Can you be a little bit clearer on why you think that different keys with the same digest value will be treated as the same key? SkimpyOffsetMap#get and SkimpyOffsetMap#put compare the key, not just the hash digest of the key. best, Colin On Wed, Dec 21, 2016, at 23:27, Renkai Ge wrote: > Hi,al

Re: Different key with the same digest in log compaction

2016-12-22 Thread radai
i think the code assumes that with a "good enough" hash function (and maybe few enough keys) the chance of such a collision is acceptably small to justify the savings of not keeping the keys in memory. On Wed, Dec 21, 2016 at 11:50 PM, Renkai wrote: > Hi, all: > > I am just learning the Kafka