Re: Different key with the same digest in log compaction

2017-01-06 Thread radai
i just noticed my link didnt parse correctly. the formula for probability of collision is explained by googling "birthday paradox". if anything, we could further optimize this code to use a trivial hash map for cases where sizeOf(hash) > sizeOf(key) On Fri, Jan 6, 2017 at 10:00 AM, Colin McCabe w

Re: Different key with the same digest in log compaction

2017-01-06 Thread Colin McCabe
That's a fair point. The calls to Arrays.equals are comparing just the hashes, not the keys. Colin On Tue, Jan 3, 2017, at 17:15, radai wrote: > looking at the code (SkimpyOffsetMap.get/put) they both start with > hashInto(key, hash1) and then ignore key from that point on - so we're > not > usi

Re: Different key with the same digest in log compaction

2017-01-03 Thread radai
looking at the code (SkimpyOffsetMap.get/put) they both start with hashInto(key, hash1) and then ignore key from that point on - so we're not using the key unless im missing something? as for the probability of collision - it depends on the hash algo and the number of keys. if you configure it to

Re: Different key with the same digest in log compaction

2017-01-03 Thread Colin McCabe
Can you be a little bit clearer on why you think that different keys with the same digest value will be treated as the same key? SkimpyOffsetMap#get and SkimpyOffsetMap#put compare the key, not just the hash digest of the key. best, Colin On Wed, Dec 21, 2016, at 23:27, Renkai Ge wrote: > Hi,al

Re: Different key with the same digest in log compaction

2016-12-22 Thread radai
i think the code assumes that with a "good enough" hash function (and maybe few enough keys) the chance of such a collision is acceptably small to justify the savings of not keeping the keys in memory. On Wed, Dec 21, 2016 at 11:50 PM, Renkai wrote: > Hi, all: > > I am just learning the Kafka

Different key with the same digest in log compaction

2016-12-21 Thread Renkai
Hi, all: I am just learning the Kafka codebase, as what I saw in https://github.com/apache/kafka/blob/6ed3e6b1cb8a73b1f5f78926ccb247a8953a554c/core/src/main/scala/kafka/log/OffsetMap.scala#L43-L43 if different log keys have the same digest value, they will be treated as the same key in log c

Different key with the same digest in log compaction

2016-12-21 Thread Renkai Ge
Hi,all: I am just learning the kafka codebase, as what I saw in https://github.com/apache/kafka/blob/6ed3e6b1cb8a73b1f5f78926ccb247a8953a554c/core/src/main/scala/kafka/log/OffsetMap.scala#L43-L43 if different log keys have the same digest value, they will be treated as the same key in log compact