Bryan Olson wrote: > Licheng Fang wrote: > > Oh, please do have a look at the second link I've posted. There's a > > table comparing the regexp engines. The engines you've tested probably > > all use an NFA implementation. > > Unfortunately, the stuff about NFA's is wrong. Friedl's awful > book was the first time I saw this confusion about what NFA is; > I don't know if he originated the mess or just propagated it. > > "Nondeterministic finite automata" is well defined in theory > of computation. The set of languages accepted by NFA's is > exactly the same as the set accepted by DFA's. > > What Python uses is search-and-backtrack. Unfortunately such > engines don't have much theory behind them, and it's hard to > reason generally about what they do. > > > -- > --Bryan
Thanks for the valuable information. Indeed, when I read the pages, I was a little confused about what it meant by 'NFA'. But I faintly felt, there might be an indirect, and not not very exact mapping from the search-and-backtrack strategy to NFAs in the sense of computer science, e.g. a state machine with the capability to be in several states at one time. Say, when reading 'oneselfsufficient', the engine goes along the NFA first to state 1, and then faces the choice between one self, none selfsufficient, none (start)------->((1))------------>((2))----------------------->((3)) 1) matching 'self', 2) going on to state 2 without matching anything, or 3) just give 'one' as the matching result because state 1 is already a terminal state In such situations it always chooses the greedy way. To match more, it goes to the state 2, munching 'self'. And now it's left with only 'sufficient'. Here it's choices are: 1) matching nothing and going to state 3 2) just give 'oneself' as result because state 2 is also a terminal state Again it's greedy, going on to state 3, in hope of matching more. But here the pattern comes to an end, represented by state 3 as a terminal state. So the engine gives 'oneself' as result and forgets about its previously unexplored possibilities, because it only performs backtrack when a match cannot be found. I think if the backtrack is carried out in an exaustive way, we may say the engine trys every possibility on the NFA, though it's not an NFA itself. -- http://mail.python.org/mailman/listinfo/python-list