New submission from Kevin Mills <kevin.mills226+bugs.pyt...@gmail.com>:

Sorry for the vague title. I'm not sure how to succinctly describe this issue.

The following code:

```
with open("data.bin", "rb") as f:
    data = f.read()

base = 15403807 * b'\xff'
longer = base + b'\xff'

print(data.find(base))
print(data.find(longer))
```

always hangs on the second call to find.

It might complete eventually, but I've left it running and never seen it do so. 
Because of the structure of data.bin, it should find the same position as the 
first call to find.

The first call to find completes and prints near instantly, which makes the 
pathological performance of the second (which is only searching for one b"\xff" 
more than the first) even more mystifying.
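
To put numbers on "near instantly" versus "never finishes", the two calls can be timed separately. The following is only a minimal sketch: `timed_find` is a hypothetical helper name, and the small buffer is a stand-in that is not expected to reproduce the hang. It only shows the measurement. Against the real data.bin the second call may never return, so it would need to be interrupted by hand.

```python
import time


def timed_find(haystack: bytes, needle: bytes):
    """Return (position, elapsed_seconds) for haystack.find(needle)."""
    start = time.perf_counter()
    pos = haystack.find(needle)
    return pos, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in data; the real data.bin is about 32 MiB.
    haystack = b"\x00" * 1000 + b"\xff" * 501 + b"\x00" * 1000
    base = 500 * b"\xff"
    longer = base + b"\xff"
    for name, needle in (("base", base), ("longer", longer)):
        pos, secs = timed_find(haystack, needle)
        print(f"{name}: position {pos}, {secs:.6f} s")
```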

I attempted to upload the data.bin file I was working with as an attachment 
here, but it failed multiple times. I assume it's too large for an attachment; 
it's a 32MiB file consisting only of 00 bytes and FF bytes.
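
For anyone who cannot download the file, its structure can be summarized instead. The sketch below is an assumption about what is useful to report, not something taken from the file itself: it checks that a buffer holds only 00 and FF bytes and lists each maximal run of FF bytes as (start, length). The helper names `only_00_and_ff` and `ff_runs` are hypothetical, and the sample bytes are a tiny stand-in for data.bin.

```python
import re


def only_00_and_ff(data: bytes) -> bool:
    """True if data contains no byte other than 0x00 and 0xFF."""
    # Deleting every 0x00 and 0xFF byte leaves nothing if no other value occurs.
    return not data.translate(None, b"\x00\xff")


def ff_runs(data: bytes):
    """Return (start, length) for every maximal run of 0xFF bytes in data."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(rb"\xff+", data)]


if __name__ == "__main__":
    # Hypothetical stand-in; replace with the contents of data.bin.
    sample = b"\x00\xff\xff\x00\x00\xff\xff\xff\x00"
    print(only_00_and_ff(sample))
    runs = ff_runs(sample)
    print(runs)
    # The longest run shows whether a needle of a given length can match at all.
    print(max(runs, key=lambda run: run[1]))
```

A longest run of at least 15403808 FF bytes would be consistent with both searches matching at the same position.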

Since I couldn't attach it, I uploaded it to a gist. I hope that's okay.

https://gist.github.com/Zeturic/7d0480a94352968c1fe92aa62e8adeaf

I wasn't able to trigger the pathological runtime behavior with other sequences 
of bytes, which is why I uploaded the file in the first place. For example, 
randomly generated data doesn't trigger it.

I've verified that this happens on multiple versions of CPython (as well as 
PyPy) and on multiple computers / operating systems.

----------
messages: 378197
nosy: Zeturic
priority: normal
severity: normal
status: open
title: bytes.find consistently hangs in a particular scenario
type: performance
versions: Python 3.8, Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41972>
_______________________________________