Mark Dickinson <dicki...@gmail.com> added the comment:

This shouldn't be a problem: there's no rule that says that different objects 
should have different hashes. Indeed, with a countable infinity of possible 
different hashable inputs, a deterministic hashing algorithm, and only finitely 
many outputs, such a rule would be a mathematical impossibility. For example:

>>> hash(-1) == hash(-2)
True

Are these hash collisions causing real issues in your code? While a single hash 
collision like this shouldn't be an issue, if there are many collisions within 
a single (non-artificial) dataset, that _can_ lead to performance issues.
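(To illustrate the performance point, here's a minimal, deliberately artificial sketch: a toy key class, invented for this demonstration, whose instances all hash to the same value, so every dict operation degrades to a linear scan over the colliding entries.)

    class Collider:
        """Toy key whose instances all collide: the hash is constant."""
        def __init__(self, x):
            self.x = x
        def __hash__(self):
            return 0  # every instance lands in the same hash bucket
        def __eq__(self, other):
            return isinstance(other, Collider) and self.x == other.x

    # Each insertion must probe past all previously inserted keys,
    # so building this dict is O(n**2) instead of O(n).
    d = {Collider(i): i for i in range(1000)}
    print(d[Collider(500)])  # lookups still work, just slowly
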

Looking at the code, we could probably do a better job of making the hash 
collisions less predictable. The current code looks like:

    def __hash__(self):
        return hash(int(self.network_address) ^ int(self.netmask))

I'd propose hashing a tuple instead of using the xor. For example:

    def __hash__(self):
        return hash((int(self.network_address), int(self.netmask)))

Hash collisions would almost certainly still occur with this scheme, but they'd 
be a tiny bit less obvious and harder to find.
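For a concrete instance of the current scheme's predictability (a sketch; the helper names xor_hash and tuple_hash are mine, not stdlib): any network whose address equals its netmask xors to zero, so for example 0.0.0.0/0 and 128.0.0.0/1 collide, while the tuple scheme keeps these particular values apart:

    from ipaddress import IPv4Network

    def xor_hash(net):
        # the current scheme quoted above
        return hash(int(net.network_address) ^ int(net.netmask))

    def tuple_hash(net):
        # the proposed scheme
        return hash((int(net.network_address), int(net.netmask)))

    a = IPv4Network('0.0.0.0/0')    # address 0, netmask 0
    b = IPv4Network('128.0.0.0/1')  # address == netmask == 0x80000000
    assert a != b
    assert xor_hash(a) == xor_hash(b)      # both xor to 0: collision
    assert tuple_hash(a) != tuple_hash(b)  # the tuples differ, and so do their hashes
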

----------
nosy: +mark.dickinson

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33784>
_______________________________________