Feature Requests item #1701452, was opened at 2007-04-16 14:28 Message generated for change (Comment added) made by thomasda You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Thomas Dybdahl Ahle (thomasda) Assigned to: Gustavo Niemeyer (niemeyer) Summary: Feature request: Comparing regexps Initial Comment: It would be very nice with a function in the re module to compare two regular expressions and see how they overlap. Return value would perhaps be an a constant like DISJOINT, SUBSET etc. ---------------------------------------------------------------------- >Comment By: Thomas Dybdahl Ahle (thomasda) Date: 2007-04-18 16:33 Message: Logged In: YES user_id=1304417 Originator: YES I talked with the guy who wrote the ZZ comparator. """I can give you the source code under the GPL if you like. However, I think it would be difficult to port to python. It's 1100 lines of very SML-style mostly-uncommented code. Do you know SML? It's available at svn://mlton.org/mltonlib/trunk/ca/terpstra/regexp. I would expect credit for the algorithm. :-) Let me know if you succeed in porting it.""" I don't know any SML though. If anybody does or can write a python equaliant of the algorithm: """1. Parse the regular expression (in GNU regular expression syntax) 2. Convert that parse tree into an NFA 3. Convert the NFA into a DFA by an algorithm I invented (it's why I wrote this program; I thought of the algorithm and found it amusing to see it in action) For comparing regular expressions I take the following additional steps 4. Combine the two DFA's (with the cross product) 4a. For finding common words, I intersect them 4b. For finding words in A, but not B, I intersect A with the negated DFA of B 4c. ... 5. Find the minimal DFA corresponding to this intersection 6. Run a weighted depth-first search (similar to Dijkstra's) to find the least-weighted path from the initial state to an accepting state (the weighting makes it prefer human readable characters in the examples)""" It could be cool. Otherwise I can also try to learn sml and port it. ---------------------------------------------------------------------- Comment By: Thomas Dybdahl Ahle (thomasda) Date: 2007-04-17 09:51 Message: Logged In: YES user_id=1304417 Originator: YES I found this page with the function to do so: http://terpstra.ca/compare.html I also found this thread: http://bumppo.net/lists/fun-with-perl/1999/09/msg00000.html which discusses how to do this with some canonical formed expressions. A function like this would really be able to speed some applications up, which matches a lot of strings with a number of expressions. If you know which ones are disjoint, you can break the test when one test matches. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2007-04-16 22:43 Message: Logged In: YES user_id=80475 Originator: NO Can this be done in the existing implementation by comparing the racetrack diagrams (character transitions)? Thomas, do you know of anywhere this have been done (third-party modules or in other languages)? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1701452&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com