I think you've understood correctly. Back references mostly aren't there. Greedy operators aren't there. For back references, this may be due to philosophical reservations; I have a few myself. For greedy operators, I suspect it's more because no one has cared enough to do it. It wouldn't be too hard, as Russ' article says. If someone is going to do this I would suggest going all the way and implementing tags. See http://laurikari.net/ville/spire2000-tnfa.ps.
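To make "it wouldn't be too hard" concrete, here is a minimal sketch of one way greedy and non-greedy repetition with submatch tracking can sit on top of a Thompson-style simulation: keep the thread list in priority order (the approach usually called a Pike VM). This is not the Plan 9 regexp code and not the tagged-NFA construction from the paper above. The instruction set, the hand-compiled programs for a(.*)b and a(.*?)b, and the names compile_a_dots_b, add_thread, pike_match and group are all assumptions made up for the illustration.

```python
# Hypothetical instruction set (not Plan 9's):
#   ("char", c)      consume the character c
#   ("any",)         consume any one character
#   ("split", x, y)  try x first, then y (x has priority)
#   ("jmp", x)       continue at x
#   ("save", n)      record the current position in slot n
#   ("match",)       success

def compile_a_dots_b(greedy):
    """Hand-compiled program for a(.*)b or, if not greedy, a(.*?)b."""
    loop, done = 3, 5
    first, second = (loop, done) if greedy else (done, loop)
    return [
        ("char", "a"),             # 0
        ("save", 0),               # 1: start of the group
        ("split", first, second),  # 2: the only place the two programs differ
        ("any",),                  # 3
        ("jmp", 2),                # 4
        ("save", 1),               # 5: end of the group
        ("char", "b"),             # 6
        ("match",),                # 7
    ]

def add_thread(prog, threads, seen, pc, caps, pos):
    """Follow jmp/split/save from pc, appending threads in priority order."""
    if pc in seen:          # a higher-priority thread already got here
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, threads, seen, op[1], caps, pos)
    elif op[0] == "split":
        add_thread(prog, threads, seen, op[1], caps, pos)
        add_thread(prog, threads, seen, op[2], caps, pos)
    elif op[0] == "save":
        new_caps = list(caps)
        new_caps[op[1]] = pos
        add_thread(prog, threads, seen, pc + 1, new_caps, pos)
    else:                   # char, any, match: wait for the next step
        threads.append((pc, caps))

def pike_match(prog, text, nslots):
    """Match anchored at the start of text; return the saved slots or None."""
    clist = []
    add_thread(prog, clist, set(), 0, [None] * nslots, 0)
    matched = None
    for pos in range(len(text) + 1):
        nlist, seen = [], set()
        for pc, caps in clist:
            op = prog[pc]
            if op[0] == "match":
                matched = caps
                break       # discard all lower-priority threads
            if pos < len(text) and (
                op[0] == "any" or (op[0] == "char" and op[1] == text[pos])
            ):
                add_thread(prog, nlist, seen, pc + 1, caps, pos + 1)
        clist = nlist
        if not clist:
            break
    return matched

def group(prog, text):
    """Return the text captured by the single group, or None."""
    caps = pike_match(prog, text, 2)
    return None if caps is None else text[caps[0]:caps[1]]

GREEDY = compile_a_dots_b(True)
LAZY = compile_a_dots_b(False)

print(group(GREEDY, "aXbYb"))  # XbY
print(group(LAZY, "aXbYb"))    # X
```

Only the order of the two split targets differs between the programs. Each step still visits every instruction at most once, so the running time stays proportional to program size times input length.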

> > well reading the code would be a travesty. it's curious
> > that neither the sam paper nor regexp(6) mentions
> > submatches. maybe i missed them.
> >
> > sed -n 's:.*(KRAK[A-Z]+*) +([a-zA-Z]+).*:\2, \1:gp' </lib/volcanoes
> >
> > - erik
>
> Ok, so despite the documentation, some submatch tracking is there.
> But in all (?) your examples, as well as in the scripts you mentioned,
> this tracking is exclusively used with the s command (which is said to
> be unnecessary at least in sam/acme). If I try sth. like
> /( b(.)b)/a/\1\2/
> on
> bla blb 56
> I get
> bla blb\1\2 56
> which is not quite what I want... How then? (I'd like to get 'bla blblblb 56'
> )
>
> Further, in R. Cox's text (http://swtch.com/~rsc/regexp/regexp1.html)
> he claims that all nice features except for backreferences can be
> implemented with Thomson's NFA algorithm. And even the backreferences
> can be handled gracefully somehow. That is: ALL: non-greedy operators,
> generalized assertions, counted repetitions, character classes CAN be
> processed using the fast algorithm. Why then we don't have it? I once
> wrote a program in python and was pretty happy to have non-greedy
> operators and lookahead assertions on hand. Should I hadn't had those,
> I probably wouldn't have been able to write it (nicely).
>
> Ruda
>

--
John Stalker
School of Mathematics
Trinity College Dublin
tel +353 1 896 1983
fax +353 1 896 2282