New submission from Arnim Rupp <pyt...@rupp.de>:
Problem: hashlib only offers digest() and hexdigest(), but the fastest way to work with hashes is as integers. The first thing Loki does after getting the hashes is to convert them to int:

    md5, sha1, sha256 = generateHashes(fileData)
    md5_num=int(md5, 16)
    sha1_num=int(sha1, 16)
    sha256_num=int(sha256, 16)

https://github.com/Neo23x0/Loki/blob/master/loki.py

All of the ~50,000 hashes to compare against are also converted to int after being read from a file. The comparison is about twice as fast as comparing hexdigest strings because it uses just half the memory. (The use case here is to compare these 50,000 hashes to the hashes of all 200,000 files on a system that is being scanned for malicious files.)

Solution: Add decdigest() to hashlib, which returns the int version of the hash. This has 2 advantages:

1. It saves the time spent converting the hash to hex and back.
2. Having decdigest() in the documentation inspires more programmers to work with hashes as ints as opposed to slow strings (where performance is relevant).

It should be just a few lines of code for each algorithm; I could do the PR.

    static PyObject *
    _sha3_shake_128_hexdigest(SHA3object *self, PyObject *arg)
    {
        PyObject *return_value = NULL;
        unsigned long length;

        if (!_PyLong_UnsignedLong_Converter(arg, &length)) {
            goto exit;
        }
        return_value = _sha3_shake_128_hexdigest_impl(self, length);

https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Modules/_sha3/clinic/sha3module.c.h

----------
components: Library (Lib)
messages: 385150
nosy: 2d4d
priority: normal
severity: normal
status: open
title: Feature request: Add decdigest() to hashlib
type: performance
versions: Python 3.10

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42942>
_______________________________________
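For illustration, here is a minimal sketch of the integer form requested in this issue, using only calls that already exist in the standard library: int.from_bytes() applied to digest() gives the same value as int(hexdigest(), 16) without building the hex string. The helper name digest_as_int and the KNOWN_BAD set are hypothetical, chosen for this example; this is not the proposed decdigest() implementation.

```python
import hashlib


def digest_as_int(data: bytes, algorithm: str = "sha256") -> int:
    """Hypothetical helper: return the hash of `data` as an int."""
    raw = hashlib.new(algorithm, data).digest()
    # Big-endian byte order matches the value of int(hexdigest, 16).
    return int.from_bytes(raw, "big")


# Hypothetical blocklist, stored as ints rather than hex strings.
KNOWN_BAD = {digest_as_int(b"malicious sample")}

file_data = b"malicious sample"
# Membership test on ints instead of hex strings.
is_known_bad = digest_as_int(file_data) in KNOWN_BAD
print(is_known_bad)
```

Whether this is faster than comparing hexdigest strings in a given workload is something to measure rather than assume.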