New submission from Kevin Mills <kevin.mills226+bugs.pyt...@gmail.com>:
Sorry for the vague title; I'm not sure how to describe this issue succinctly. The following code:

```
with open("data.bin", "rb") as f:
    data = f.read()

base = 15403807 * b'\xff'
longer = base + b'\xff'

print(data.find(base))
print(data.find(longer))
```

always hangs on the second call to find. It might complete eventually, but I've left it running and never seen it finish. Because of the structure of data.bin, the second call should find the same position as the first call. The first call completes and prints almost instantly. That makes the pathological performance of the second call even more mystifying, since its needle is only one b"\xff" longer than the first.

I tried to upload the data.bin file I was working with as an attachment here, but the upload failed multiple times. I assume it's too large for an attachment: it's a 32 MiB file consisting only of 00 bytes and FF bytes. Since I couldn't attach it, I uploaded it to a gist instead. I hope that's okay.

https://gist.github.com/Zeturic/7d0480a94352968c1fe92aa62e8adeaf

I wasn't able to trigger the pathological runtime with other byte sequences, which is why I uploaded the file in the first place. For example, if the data is randomly generated, the hang doesn't occur.

I've verified that this happens on multiple versions of CPython (as well as PyPy) and on multiple computers and operating systems.

----------
messages: 378197
nosy: Zeturic
priority: normal
severity: normal
status: open
title: bytes.find consistently hangs in a particular scenario
type: performance
versions: Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________
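In the report, both needles consist only of b'\xff' bytes. For a needle made of a single repeated byte, the first occurrence of k copies always begins at the start of the first maximal run of that byte whose length is at least k. That means the expected index can be cross-checked without handing bytes.find a needle of over 15 million bytes.

The following is a minimal sketch under that assumption. The helper name first_run_of_byte is hypothetical: it is not part of CPython or any library. It scans runs with re.finditer. It is meant as an independent cross-check or workaround for this specific needle shape, not as a fix for bytes.find, and it has not been timed on data.bin.

```python
import re


def first_run_of_byte(data: bytes, byte: int, k: int) -> int:
    """Hypothetical helper: index of the first run of >= k copies of `byte`.

    Intended to match data.find(bytes([byte]) * k), but it only works for
    needles made of ONE repeated byte (the assumption this sketch relies on).
    Returns -1 if no such run exists.
    """
    if k <= 0:
        return 0  # mirrors data.find(b"") == 0
    # re.escape stops bytes like b"." or b"*" from acting as regex metacharacters.
    run = re.compile(re.escape(bytes([byte])) + b"+")
    for m in run.finditer(data):
        # finditer yields maximal runs left to right; the earliest run that is
        # long enough is where the first k consecutive copies must start.
        if m.end() - m.start() >= k:
            return m.start()
    return -1
```

For example, with data read from data.bin as in the report, first_run_of_byte(data, 0xFF, 15403807 + 1) should give the index the second find call is expected to return, provided the reporter's reasoning about the file's structure holds.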