Bugs item #1366311, was opened at 2005-11-25 13:57
Message generated for change (Comment added) made by eric_noyau
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1366311&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Eric Noyau (eric_noyau)
Assigned to: Nobody/Anonymous (nobody)
Summary: SRE engine do not release the GIL

Initial Comment:
In a multi-threaded program that does lots of regular expression
searching, some of it on very long strings with complex regexes,
we've noticed that everything stops while a regular expression is
searching.

One of the issues is that the re engine does not release the
interpreter lock while it is running. All the other threads are
therefore blocked for the entire time it takes to do the regular
expression search.

See the thread in python-dev about it:

http://mail.python.org/pipermail/python-dev/2005-November/058316.html

----------------------------------------------------------------------

>Comment By: Eric Noyau (eric_noyau)
Date: 2005-11-28 14:11

Message:
Logged In: YES
user_id=1388768

Thanks for your comments. I've updated the patch to fix your
issues, but without introducing a per-state object lock.

What I did instead is to mark a state as not supporting
concurrency when a scanner object creates it. So the GIL will
not be released for scanner objects at all.

For consistency, match also releases the GIL now, if possible.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2005-11-25 21:38

Message:
Logged In: YES
user_id=4771

The patch looks good, but I wonder if it is safe. The
SRE_STATE structure that SRE_SEARCH_INNER uses is
potentially visible to the application-level Python code,
via the (undocumented) scanner objects:

>>> r = re.compile(r"hello")
>>> s = r.scanner("big string in which to search")
>>> s.search()
<_sre.SRE_Match object at 0x12345678>

Each call to s.search() continues the previous search with
the same SRE_STATE.
The problem with releasing the GIL as you do is that several
threads could call s.search() concurrently, which would most
probably crash CPython.

This probably means that you need to add a lock in SRE_STATE
and acquire it while searching, to serialize its usage. Of
course, we should then be careful about what overhead this
gives to applications that use regexps on a lot of small
strings...

Another note: for consistency, match() should also release
the GIL if search() does.

----------------------------------------------------------------------

Comment By: Eric Noyau (eric_noyau)
Date: 2005-11-25 14:02

Message:
Logged In: YES
user_id=1388768

I'm attaching a diff to this bug that removes this limitation
when it is sane to do so. If a search is done on a string or a
unicode object (which by definition are immutable), the GIL
is released and reacquired every time a search is done.

I've tested this patch both with a simple test (start a
thread with a greedy regex on a monstrous string and
verify that the other Python threads are still active) and by
running our internal application and verifying that nothing
blocks anymore.

----------------------------------------------------------------------
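
For readers who want to reproduce the kind of check described in the
2005-11-25 14:02 comment, here is a minimal sketch. It is not the test
attached to this bug: the helper name count_ticks_during, the
backtracking-heavy pattern and the input size are all assumptions chosen
for illustration, and the code targets a current Python 3 rather than the
2005 interpreter. A helper thread counts how often it gets to run while
the main thread performs one long regular expression search.

```python
import re
import threading
import time


def count_ticks_during(work, interval=0.001):
    """Call work() while a helper thread counts how often it gets to run.

    Hypothetical helper, not part of the patch discussed in this thread.
    Returns (elapsed_seconds, ticks).
    """
    ticks = 0
    stop = threading.Event()

    def ticker():
        nonlocal ticks
        # Each loop iteration needs the GIL, so the count only grows
        # while this thread is actually allowed to run.
        while not stop.is_set():
            ticks += 1
            time.sleep(interval)

    helper = threading.Thread(target=ticker)
    helper.start()
    start = time.perf_counter()
    try:
        work()
    finally:
        elapsed = time.perf_counter() - start
        stop.set()
        helper.join()
    return elapsed, ticks


if __name__ == "__main__":
    # Assumed workload: nested quantifiers force heavy backtracking on an
    # input that cannot match. Raise or lower the repeat count to taste.
    subject = "a" * 22 + "b"
    elapsed, ticks = count_ticks_during(
        lambda: re.search(r"(a+)+$", subject)
    )
    print("search took %.2fs, helper thread ran %d times" % (elapsed, ticks))
```

If the search holds the GIL for its whole duration, the helper thread
would be expected to run only a handful of times regardless of how long
the search takes; if the GIL is released during the search, the count
should grow roughly in proportion to the elapsed time divided by the
sleep interval.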