[issue24555] Python logic error when deal with re and muti-threading
New submission from bee13oy:

Bug 0x01 is the main problem.

    t.start()
    t.join(timeout)

In the normal case, where I run a while() loop in the sub-thread, the main thread regains control of the program once the join on the sub-thread times out. But in our POC, even after the sub-thread has timed out, the main thread still cannot continue executing. After analyzing, I found that the main thread is trapped in an infinite loop, as I described in the PDF.

--
components: Regular Expressions
files: python_logic_error.pdf
messages: 246138
nosy: bee13oy, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: Python logic error when deal with re and muti-threading
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5
Added file: http://bugs.python.org/file39850/python_logic_error.pdf

___
Python tracker <http://bugs.python.org/issue24555>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
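For reference, `Thread.join(timeout)` always returns `None`, so the only way to tell a timeout from a normal finish is to call `is_alive()` afterwards. The sketch below illustrates the normal case described in the report. The helper name `wait_with_timeout` and the `spin` worker are made up for this illustration and are not part of the submitted POC.

```python
import threading
import time


def wait_with_timeout(target, timeout):
    """Run target in a daemon thread; return True if it finished in time."""
    t = threading.Thread(target=target)
    t.daemon = True
    t.start()
    t.join(timeout)          # returns None whether or not the thread finished
    return not t.is_alive()  # still alive means the join timed out


stop = threading.Event()


def spin():
    # Hypothetical worker: loops in Python code (with short sleeps) until told
    # to stop, so the main thread gets regular chances to run.
    while not stop.is_set():
        time.sleep(0.01)


finished = wait_with_timeout(spin, 0.2)       # worker outlives the timeout
stop.set()                                    # let the worker exit
quick = wait_with_timeout(lambda: None, 1.0)  # worker ends immediately

print("long-running worker finished in time:", finished)
print("trivial worker finished in time:", quick)
```

In this sketch the main thread regains control after 0.2 seconds because the worker keeps returning to the interpreter. The POC differs in that the worker is stuck inside a single `regexp.search()` call.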
[issue24555] Python logic error when deal with re and muti-threading
bee13oy added the comment:

# Python logic error when dealing with re and multi-threading

## Bug Description

Using re together with multi-threading triggers the bug.

Bug type: `Logic Error`

Test Environment:

* `Windows 7 SP1 x64 + python 3.4.3`
* `Linux kali 3.14-kali1-amd64 + python 2.7.3`

## Normal Case

1. main-thread: join(timeout), waits for the sub-thread to finish
2. sub-thread: while(1), an infinite loop

Test Code:

    #!/usr/bin/python
    __author__ = 'bee13oy'

    import re
    import threading

    timeout = 2
    source = "(.*(.)?)*bcd\\t\\n\\r\\f\\a\\e\\071\\x3b\\$\?caxyz"

    def run(source):
        while(1):
            print("test1")

    def handle():
        try:
            t = threading.Thread(target=run,args=(source,))
            t.setDaemon(True)
            t.start()
            t.join(timeout)
            print("thread finished...It's an normal case!\n")
        except:
            print("exception ...\n")

    handle()

## Bug Case

1. main-thread: join(timeout), waits for the sub-thread to finish
2. sub-thread:
   1) we construct the special pattern "(.*(.)?)*bcd\\t\\n\\r\\f\\a\\e\\071\\x3b\\$\?caxyz"
   2) regexp.search() can't deal with it and hangs
   3) join(timeout) is called and the sub-thread runs past the timeout; at this point the main thread should have regained control of the program, but it didn't.

POC:

    #!/usr/bin/python
    __author__ = 'bee13oy'

    import re
    import os
    import threading

    timeout = 2
    source = "(.*(.)?)*bcd\\t\\n\\r\\f\\a\\e\\071\\x3b\\$\?caxyz"

    def run(source):
        regexp = re.compile(r''+source+'')
        sgroup = regexp.search(source)

    def handle():
        try:
            t = threading.Thread(target=run,args=(source,))
            t.setDaemon(True)
            t.start()
            t.join(timeout)
            print("finished...\n")
        except:
            print("exception ...\n")

    handle()

## Bug Analysis

We use Python multithreading and call `join(timeout)` to wait until the **thread terminates** or the **timeout expires**.

1. In the normal case, where I run a while() loop in the sub-thread, the main thread regains control of the program once the join on the sub-thread times out.
2. In our POC, even after the sub-thread has timed out, the main thread still cannot continue executing.
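One way to keep the caller responsive no matter how long a search runs is to execute it in a child interpreter and let the operating system enforce the deadline. This is a different technique from the thread-based POC above, not a fix proposed in this report. The sketch assumes the hang comes from the regex engine never handing control back to other threads. The helper `search_with_timeout`, the `CHILD` script, and the sample patterns are all hypothetical.

```python
import subprocess
import sys

# Script executed by the child interpreter: pattern and text arrive as argv.
CHILD = (
    "import re, sys\n"
    "m = re.search(sys.argv[1], sys.argv[2])\n"
    "print('match' if m else 'no match')\n"
)


def search_with_timeout(pattern, text, timeout):
    """Run re.search in a child process; return 'match', 'no match' or 'timeout'."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", CHILD, pattern, text],
            stdout=subprocess.PIPE,
            timeout=timeout,   # the child is killed when this expires
            check=True,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return done.stdout.decode().strip()


# A cheap pattern completes; a pattern with nested quantifiers that
# backtracks heavily is cut off by the deadline.
fast = search_with_timeout("b+c", "aabbbcc", 10)
slow = search_with_timeout("(a+)+$", "a" * 40 + "b", 1)
print(fast, slow)
```

The cost is one interpreter start-up per search, so this suits untrusted or fuzzed patterns better than hot paths.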
After analyzing, I found that the main thread is trapped in an infinite loop. At first, execution enters the sub-thread, which never ends normally. Meanwhile, join(timeout) waits for the sub-thread to return or for the timeout to expire, and then tries to call the timeout function so that the main thread can regain control of the program. The bug is that the sub-thread is in an infinite loop and the main thread ends up in an infinite loop too, which causes the program to hang.

By analyzing the source code of Python, we found that:

- the sub-thread is in an infinite loop (code block 0)
- the main-thread is in an infinite loop (code block 1)

## code block 0

The following code is where the sub-thread is trapped in an **infinite loop**:

```
LOCAL(Py_ssize_t)
SRE(match)(SRE_STATE* state, SRE_CODE* pattern, int match_all)
{
    SRE_CHAR* end = (SRE_CHAR *)state->end;
    Py_ssize_t alloc_pos, ctx_pos = -1;
    Py_ssize_t i, ret = 0;
    Py_ssize_t jump;
    unsigned int sigcount=0;

    SRE(match_context)* ctx;
    SRE(match_context)* nextctx;

    TRACE(("|%p|%p|ENTER\n", pattern, state->ptr));

    DATA_ALLOC(SRE(match_context), ctx);
    ctx->last_ctx_pos = -1;
    ctx->jump = JUMP_NONE;
    ctx->pattern = pattern;
    ctx->match_all = match_all;
    ctx_pos = alloc_pos;
    ...
    /* Cycle code which will never return*/
    for (;;) {
        ++sigcount;
        if ((0 == (sigcount &
```
[issue24566] Unsigned Integer Overflow in sre_lib.h
New submission from bee13oy:

I found an Unsigned Integer Overflow in sre_lib.h.

Tested on En Windows 7 x86 + Python 3.4.3 / Python 3.5.0b2

Crash:
--
(1a84.16b0): Access violation - code c0000005 (!!! second chance !!!)
eax=0002 ebx=0038f40c ecx=0002 edx=0526cbb8 esi=83e0116b edi=c3e011eb
eip=58bcfa53 esp=0038f384 ebp=0038f394 iopl=0 nv up ei ng nz na po cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010283
python35+0x1fa53:
58bcfa53 380e cmp byte ptr [esi],cl ds:002b:83e0116b=??

code:
--
58bcfa3d 8b4a04 mov ecx,dword ptr [edx+4]
58bcfa40 0fb6c1 movzx eax,cl
58bcfa43 3bc1 cmp eax,ecx
58bcfa45 0f859300 jne python35+0x1fade (58bcfade)
58bcfa4b 3bf7 cmp esi,edi
58bcfa4d 0f838b00 jae python35+0x1fade (58bcfade)
58bcfa53 380e cmp byte ptr [esi],cl ds:002b:83e0116b=??
58bcfa55 0f858300 jne python35+0x1fade (58bcfade)

stack:
--
0:000> kb
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
0038f394 58bcfedf 4080 0038f40c 83e0116c python35+0x1fa53
0038f3c0 58bd0f58 06016508 0526cb60 python35+0x1fedf
0038f400 58bd5039 58e40c58 83e0116b 03e01158 python35+0x20f58
0038f480 58bd76b2 7fff python35+0x25039
0038f4a4 58c925cf 0526cb60 0528a4d0 python35+0x276b2
0038f4c4 58cf3633 06016508 0528a4d0 python35!PyCFunction_Call+0x2f
0038f4f8 58cf0b05 05840f90 03e0ab90 0001 python35!PyEval_GetFuncDesc+0x373
0038f570 58cf3791 03e0ab90 0001 python35!PyEval_EvalFrameEx+0x22d5
0038f594 58cf3692 0001 0001 python35!PyEval_GetFuncDesc+0x4d1
0038f5c8 58cf0b05 03e08de0 0012e850 python35!PyEval_GetFuncDesc+0x3d2
0038f640 58cf25bb 0012e850 065feff0 python35!PyEval_EvalFrameEx+0x22d5
0038f68c 58d29302 03dcfaa8 python35!PyEval_EvalFrameEx+0x3d8b
0038f6c8 58d29195 03dcfaa8 03dcfaa8 0038f790 python35!PyRun_FileExFlags+0x1f2
0038f6f4 58d2820a 05994fc8 052525a8 0101 python35!PyRun_FileExFlags+0x85
0038f738 58bfe9f7 05994fc8 052525a8 0001 python35!PyRun_SimpleFileExFlags+0x20a
0038f764 58bff32b 0038f790 5987b648 5987cc94 python35!Py_hashtable_copy+0x5e17
0038f808 1c6f11df 0003 05796f70 05210f50 python35!Py_Main+0x90b

source code:

    LOCAL(Py_ssize_t)
    SRE(search)(SRE_STATE* state, SRE_CODE* pattern)
    {
        SRE_CHAR* ptr = (SRE_CHAR *)state->start;
        SRE_CHAR* end = (SRE_CHAR *)state->end;
        Py_ssize_t status = 0;
        Py_ssize_t prefix_len = 0;
        Py_ssize_t prefix_skip = 0;
        SRE_CODE* prefix = NULL;
        SRE_CODE* charset = NULL;
        SRE_CODE* overlap = NULL;
        int flags = 0;

        if (pattern[0] == SRE_OP_INFO) {
            /* optimization info block */
            /* <1=skip> <2=flags> <3=min> <4=max> <5=prefix info> */

            flags = pattern[2];

            if (pattern[3] > 1) {
                /* adjust end point (but make sure we leave at least one
                   character in there, so literal search will work) */
                end -= pattern[3] - 1;
                if (end <= ptr)
                    end = ptr;
            }
            ...
        }
        ...
        } else
            /* general case */
            while (ptr <= end) {
                TRACE(("|%p|%p|SEARCH\n", pattern, ptr));
                state->start = state->ptr = ptr++;
                status = SRE(match)(state, pattern, 0);
                if (status != 0)
                    break;
            }
    }

    SRE(count)(SRE_STATE* state, SRE_CODE* pattern, Py_ssize_t maxcount)
    {
        SRE_CODE chr;
        SRE_CHAR c;
        SRE_CHAR* ptr = (SRE_CHAR *)state->ptr;
        SRE_CHAR* end = (SRE_CHAR *)state->end;
        Py_ssize_t i;

        /* adjust end */
        if (maxcount < end - ptr && maxcount != SRE_MAXREPEAT)
            end = ptr + maxcount;
        ...
    #if SIZEOF_SRE_CHAR < 4
            if ((SRE_CODE) c != chr)
                ; /* literal can't match: doesn't fit in char width */
            else
    #endif
            while (ptr < end && *ptr == c) // crash here, ptr points to an unreadable memory.
                ptr++;
            break;
    }

poc code:

---cut---
    import re
    pattern = "([\\2]{1073741952})"
    regexp = re.compile(r''+pattern+'')
    sgroup = regexp.search(pattern)
---cut---

1.) In SRE(search), pattern[3] is equal to 1073741952 (0x40000080). What's more, the program doesn't limit the maximum size, so the end pointer ends up pointing to an invalid, very large address (bigger than ptr).
2.) Then the program runs while (ptr <= end) { state->start = state->ptr = ptr++,..} , but the state->end pointer still holds the original value.
3.) After running for a while, it comes to SRE(count) and adjusts the
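The end-pointer adjustment from step 1.) can be sketched with plain integers. This is only an illustration, assuming 32-bit pointers that wrap modulo 2**32 as on the x86 build in the report. The function name `adjust_end` and the addresses below are hypothetical, chosen to show how the `end <= ptr` guard is bypassed.

```python
MASK32 = 0xFFFFFFFF  # assumption: pointers are 32 bits wide and wrap around


def adjust_end(ptr, end, min_width):
    """Mimic `end -= pattern[3] - 1; if (end <= ptr) end = ptr;` on 32 bits."""
    if min_width > 1:
        end = (end - (min_width - 1)) & MASK32  # may wrap below zero
        if end <= ptr:
            end = ptr                           # guard only catches small underruns
    return end


ptr = 0x03E01158   # hypothetical start of the subject string
end = ptr + 19     # hypothetical: a 19-character subject

small = adjust_end(ptr, end, 5)             # ordinary minimum width
clamped = adjust_end(ptr, end, 100)         # wider than the subject: clamped to ptr
wrapped = adjust_end(ptr, end, 1073741952)  # value of pattern[3] in the POC

print(hex(small), hex(clamped), hex(wrapped))
```

With the large minimum width the subtraction wraps past zero, so the adjusted `end` compares greater than `ptr` and the clamp never fires, which matches the description of `end` pointing to a large invalid address.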
[issue24566] Unsigned Integer Overflow in sre_lib.h
bee13oy added the comment:

I didn't test that patch. I just found this bug in Python 3.4.3 by fuzzing the re module, and tested Python 3.5.0b2 on Windows 7 x86; it has the same problem.
[issue24566] Unsigned Integer Overflow in sre_lib.h
bee13oy added the comment:

I have just tested Python 2.7.10 on Windows 7 x86 with the POC code; it also crashes Python.
[issue24566] Unsigned Integer Overflow in sre_lib.h
bee13oy added the comment:

I tested this patch, and it really fixes the issue. But I'm wondering: Python 2.7.10 was released on May 23, 2015, and this patch was created on March 22, 2015. Does that mean Python 2.7.10/3.5.0b2 were compiled and released without this patch applied?
[issue24566] Unsigned Integer Overflow in sre_lib.h
bee13oy added the comment:

Thank you. I got it.

2015-07-06 18:53 GMT+08:00 Serhiy Storchaka :

> Serhiy Storchaka added the comment:
>
> Yes, this patch was not applied because it had no visible effect on Linux.
> Now, with your report, there is a case on Windows.