Re: [9fans] non greedy regular expressions

2008-12-11 Thread Rudolf Sykora
> Here's a version that works on your example: > > 0k > ,x/ABC/+#0;/CBA|EFG/{ >',.-#0d >.+#0??d >.+#0k > } > > -Derek Thanks. This is what I wanted to see... Ruda

Re: [9fans] non greedy regular expressions

2008-11-30 Thread Yard Ape
"Rudolf Sykora" wrote: > I have a file say like this > > ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi > sodifs > sdfsd > ABC > dasdfas aasdfa > njnjn CBA > > and I want to get > > ' asassadfasdf asdfasdf asdfasdf ' > 'dasdfas aasdfa' > 'njnjn' > > ...i.e. delimited with AB

Re: [9fans] non greedy regular expressions

2008-10-28 Thread Eris Discordia
Thanks for the explanations. The lowlife learns a bit or two :-) --On Tuesday, October 28, 2008 2:51 PM + "Brian L. Stuart" <[EMAIL PROTECTED]> wrote: > This guy seems to blur the distinctions here. His discussion He doesn't. If one reads the whole section part of which was quoted one w

Re: [9fans] non greedy regular expressions

2008-10-28 Thread Eris Discordia
It is merely the traditional POSIX flavor. Some people like that flavor, some don't. Understandable. It is more that Perl simply was never part of the picture for the people who develop(ed) and use(d) Plan 9. It's like asking why the paper on the Plan 9 C compiler doesn't state that C++ clas

Re: [9fans] non greedy regular expressions

2008-10-28 Thread Brian L. Stuart
> > This guy seems to blur the distinctions here. His discussion > > He doesn't. If one reads the whole section part of which was quoted one > will see that he clearly states DFA and NFA are theoretically equivalent, > but then goes on to explain that DFA and NFA _implementations_ are not > id

Re: [9fans] non greedy regular expressions

2008-10-27 Thread Aharon Robbins
> > As other mails have pointed out, anything that isn't leftmost longest > > has weird semantics. Non-greedy operators are mostly syntactic sugar. > > Is (leftmost-longest + all-greedy operators) syntactic salt then? It is merely the traditional POSIX flavor. Some people like that flavor, some

Re: [9fans] non greedy regular expressions

2008-10-27 Thread Eris Discordia
As other mails have pointed out, anything that isn't leftmost longest has weird semantics. Non-greedy operators are mostly syntactic sugar. Is (leftmost-longest + all-greedy operators) syntactic salt then? Not in the least. The Plan 9 regexp library in fact gives you close to the same nirvana

Re: [9fans] non greedy regular expressions

2008-10-27 Thread Eris Discordia
First of all, thanks for the explanation. It's above my head, but thanks anyway. This guy seems to blur the distinctions here. His discussion He doesn't. If one reads the whole section part of which was quoted one will see that he clearly states DFA and NFA are theoretically equivalent, bu

Re: [9fans] non greedy regular expressions

2008-10-27 Thread Eris Discordia
there's a reason they're not called regularly expressions. As explained in the post by Brian L. Stuart it's a matter of "grammar" :-P (if this were the definition, an expression's regularlyness would depend on the target text, would it not?) Yes, and that _would_ be why you wouldn't craft a

Re: [9fans] non greedy regular expressions

2008-10-27 Thread Aharon Robbins
> > GNU grep takes a simple but effective approach. It uses a DFA when > > possible, reverting to an NFA when backreferences are used. GNU awk does > > something similar---it uses GNU grep's fast shortest-leftmost DFA engine > > for simple "does it match" checks, and reverts to a different engine f

Re: [9fans] non greedy regular expressions

2008-10-27 Thread Brian L. Stuart
> The set of "big books on regular expressions" includes Jeffrey Friedl's > "Mastering Regular Expressions" that happens to contain a chapter by the > title "NFA, DFA, and POSIX" wherein he says: > > > DFA Speed with NFA Capabilities: Regex Nirvana? This guy seems to blur the distinctions here.

Re: [9fans] non greedy regular expressions

2008-10-27 Thread erik quanstrom
>> practical application. now there are big books on `regular expressions' >> mainly because they are no longer regular but a big collection of ad-hoc > > I thought they were "regular" because they "regularly" occurred in the > target text. Turns out other interpretations are possible. Though, m

Re: [9fans] non greedy regular expressions

2008-10-27 Thread Eris Discordia
practical application. now there are big books on `regular expressions' mainly because they are no longer regular but a big collection of ad-hoc I thought they were "regular" because they "regularly" occurred in the target text. Turns out other interpretations are possible. Though, mine has t

Re: [9fans] non greedy regular expressions

2008-10-27 Thread Charles Forsyth
>Both systems are complex enough that essentially no one completely understands >them. this touches on an important point. the first introduction of regular expressions to editors was great, because it took some formal language theory and made it useful in an `every day' way. conversely, the th

Re: [9fans] non greedy regular expressions

2008-10-27 Thread Rudolf Sykora
> Leftmost-first matching is difficult to explain. > When POSIX had to define the rules for regexps, > they chose to define them as leftmost-longest > even though all the implementations were leftmost-first, > because describing leftmost-first precisely was too > complex. > > Leftmost-first matchin

Re: [9fans] non greedy regular expressions

2008-10-26 Thread Russ Cox
As I explained in an earlier post, your suggested > /ABC(.*?)CBA/ is less robust than Charles's spelled-out version, since yours doesn't handle nested expressions well. That's a serious enough limitation that your scenario stops being a compelling argument for leftmost-first matching and non-gree

Re: [9fans] non greedy regular expressions

2008-10-26 Thread Rob Pike
The ability to put \1 in the right hand side of a substitution was done by jason@ at the Uni of Sydney, but after the Sam papers were published. It was a welcome feature that added special functionality to the 's' command within Sam. (Ed(1) had the feature, within its limited regexps, long before,

Re: [9fans] non greedy regular expressions

2008-10-26 Thread Eris Discordia
I loved this thread. Thanks everyone. Thanks Rudolf Sykora.

Re: [9fans] non greedy regular expressions

2008-10-26 Thread John Stalker
> Now. If the leftmost-longest match is usable for my problem, I am fine > with C + regexp(6). If not I only see the possibility to use > perl/python nowadays (if I don't want to go mad like above). There is another option: yacc. I'm not saying it's simpler than perl or python, but it's not much

Re: [9fans] non greedy regular expressions

2008-10-25 Thread Steve Simon
> ...I have started to like to use as few tools as > possible. I entirely agree, there is too much to learn and you have to be selective; however my selection is: sed, awk, C and sam (for impromptu, well, editing). I cannot really comment directly on greedy operators, I have never knowingly us

Re: [9fans] non greedy regular expressions

2008-10-25 Thread Rudolf Sykora
2008/10/25 Tom Simons <[EMAIL PROTECTED]>: > Is awk available? This worked for me, but it's not on Plan9. It does copy > the newline after the 2nd "ABC" (I wasn't sure if leading or all blank lines > should be deleted). > $ awk 'BEGIN {RS = "ABC"; FS = "CBA"}NR == 1 {next}{print $1}' a.data To t

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Tom Simons
Is awk available? This worked for me, but it's not on Plan9. It does copy the newline after the 2nd "ABC" (I wasn't sure if leading or all blank lines should be deleted). $ cat a.data dflkdl dlkrwo3je4ogjmdmxd ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi sodifs sdfsd ABC da
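
For readers without awk at hand, the record/field trick above can be sketched in Python. This is only an approximation of the one-liner: it assumes an awk that accepts a multi-character RS (as gawk does), and the line breaks in the sample data are a guess, since the archive has flattened the original file. The name `between` is made up for this illustration.

```python
# Sample input; the line breaks are an assumption, the archive flattened them.
sample = (
    "dflkdl\n"
    "dlkrwo3je4ogjmdmxd\n"
    "ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi\n"
    "sodifs\n"
    "sdfsd\n"
    "ABC\n"
    "dasdfas aasdfa\n"
    "njnjn CBA\n"
)

def between(text, left="ABC", right="CBA"):
    """Mimic: awk 'BEGIN {RS = "ABC"; FS = "CBA"} NR == 1 {next} {print $1}'."""
    records = text.split(left)[1:]                     # RS = "ABC"; skip record 1
    return [rec.split(right)[0] for rec in records]    # FS = "CBA"; keep $1

for part in between(sample):
    print(repr(part))
```

As in the awk version, the newline that follows the second ABC is kept, and a record with no closing CBA would be returned whole.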

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Charles Forsyth
>i didn't spend any time deciding whether there was a better way >to express the trailing word delimiter. it's too late. [EMAIL PROTECTED] meanwhile pointed out one way: change the words to otherwise unused special characters

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Charles Forsyth
although it can be satisfying to do things in one instruction (one command) i confess i often find it quicker to split up the problem in sam or acme:
1. delete everything not between delimiters
	,y/ABC([^C]|C[^B]|CB[^A]|\n)+CBA/d
2. delete the delimiters
	,x/ABC|CBA/d
3. look to deci
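
The spelled-out pattern can be tried outside sam. Below is a rough Python sketch, not the sam commands themselves. Two assumptions: the sample text uses guessed line breaks (the archive flattened the file), and the `|\n` branch is dropped because a negated class in Python's `re` already matches newline, which, as far as I read regexp(6), it does not on Plan 9. A capture group is added so the delimiters need no second pass.

```python
import re

# Sample input; the line breaks are an assumption.
text = (
    "ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi\n"
    "sodifs\n"
    "sdfsd\n"
    "ABC\n"
    "dasdfas aasdfa\n"
    "njnjn CBA\n"
)

# The body may contain anything that does not begin the closing "CBA".
spelled_out = re.compile(r"ABC((?:[^C]|C[^B]|CB[^A])+)CBA")
parts = [m.group(1) for m in spelled_out.finditer(text)]

# For comparison: the non-greedy form that Perl-style engines offer.
non_greedy = re.findall(r"ABC(.*?)CBA", text, flags=re.DOTALL)

for part in parts:
    print(repr(part))
```

On this input both forms pick out the same two stretches of text. The spelled-out form uses only operators that a leftmost-longest engine also provides.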

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Gabriel Diaz Lopez de la Llave
hello using sed and only one reg-exp is mandatory? cat t.txt| sed 's/(ABC | CBA)/ \n\1\n /g' | awk '/ABC/,/CBA/' | grep -v 'ABC|CBA' that's a naive and simple approach, but i can't see why you need to use just one reg-exp and just one sed. Maybe i missed something through the thread :-

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Rudolf Sykora
> doesn't s/ABC(the_interesting_part)CBA/x/g work for you? > maybe i don't understand the example. if so, could you explain? > > - erik I think not. I have a file say like this ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi sodifs sdfsd ABC dasdfas aasdfa njnjn CBA and I wan

Re: [9fans] non greedy regular expressions

2008-10-24 Thread erik quanstrom
> Ok, I finally see the point... thanks. > Only one last question: So is there any simple way (using existing > regexps) to find 'the interesting parts' in my last example?: > ABCthe_interesting_partCBA blabla bla ABCthe_interesting_partCBA ... > ...i.e. delimited with ABC from the left and CBA (or

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Rudolf Sykora
> Greedy leftmost-first is different from leftmost-longest. > Search for /a*(ab)?/ in "ab". The leftmost-longest match > is "ab", but the leftmost-first match (because of the > greedy star) is "a". In the leftmost-first case, the greediness > of the star caused an overall short match. Ok, I fina

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Russ Cox
> I thought greedy=leftmost-longest, while non-greedy=leftmost-first: Greedy leftmost-first is different from leftmost-longest. Search for /a*(ab)?/ in "ab". The leftmost-longest match is "ab", but the leftmost-first match (because of the greedy star) is "a". In the leftmost-first case, the gree
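
The /a*(ab)?/ example can be checked with a short Python sketch. Python's `re` is a backtracking, leftmost-first matcher, so `re.search` shows that behaviour directly. The helper `leftmost_longest` is a made-up, brute-force stand-in for a leftmost-longest engine. It only illustrates the rule and is not how Plan 9's library works.

```python
import re

def leftmost_longest(pattern, text):
    """Brute-force emulation of leftmost-longest search (illustration only)."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):                # leftmost start first
        for end in range(len(text), start - 1, -1):   # then the longest end
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None

# Leftmost-first: the greedy star grabs "a", (ab)? then matches empty.
leftmost_first = re.search(r"a*(ab)?", "ab").group(0)

# Leftmost-longest: of all matches starting at 0, take the longest.
longest = leftmost_longest(r"a*(ab)?", "ab")

print(repr(leftmost_first))  # 'a'
print(repr(longest))         # 'ab'
```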

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Rudolf Sykora
2008/10/24 Charles Forsyth <[EMAIL PROTECTED]>: >>If I try sth. like >>/( b(.)b)/a/\1\2/ >>on >>bla blb 56 >>I get >>bla blb\1\2 56 >>which is not quite what I want... How then? (I'd like to get 'bla blblblb 56') > > echo bla blb 56 | sed 's/( b(.)b)/&\1\2/' > bla blb blbl 56 > > similarly use `s'

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Rudolf Sykora
> In that model, it is not accurate to describe the * + ? > operators as greedy or not. None of them is working > toward any goal other than the overall longest match > at the leftmost position. So then I must be mistaken about the terminology. I thought greedy=leftmost-longest, while non-greedy=

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Charles Forsyth
>If I try sth. like >/( b(.)b)/a/\1\2/ >on >bla blb 56 >I get >bla blb\1\2 56 >which is not quite what I want... How then? (I'd like to get 'bla blblblb 56') echo bla blb 56 | sed 's/( b(.)b)/&\1\2/' bla blb blbl 56 similarly use `s' not `a' in sam.
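
The same substitution can be sketched in Python for comparison. The only translation needed is that sed's `&` (the whole match) is spelled `\g<0>` in a Python replacement string; `count=1` mirrors sed's `s` without the `g` flag.

```python
import re

line = "bla blb 56"

# \g<0> is the whole match (sed's &); \1 and \2 are the two submatches.
result = re.sub(r"( b(.)b)", r"\g<0>\1\2", line, count=1)

print(result)  # bla blb blbl 56
```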

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Russ Cox
Regarding greedy vs non-greedy etc. In Plan 9, a regular expression search always looks for the "leftmost-longest" match, which means a match that starts as far to the left as possible in the target text, and of the matches that start there, the longest one. In that model, it is not accurate to d

Re: [9fans] non greedy regular expressions

2008-10-24 Thread John Stalker
> you probably mean NON-greedy ops. Yes, my mistake. I'll risk making a very minor correction to Rob's post as well: > Backreferences within the pattern (such as in /(.*)\1/) make the > matcher non-regular and exponentially hard. They do change the class of the grammar and nobody knows how to i

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Rudolf Sykora
2008/10/24 John Stalker <[EMAIL PROTECTED]>: > I think you've understood correctly. Back references mostly aren't > there. Greedy operators aren't there. you probably mean NON-greedy ops. I was not that concerned about backreferences but submatch extraction (according to the terminology used by

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Rob Pike
Backreferences within the pattern (such as in /(.*)\1/) make the matcher non-regular and exponentially hard. It was a deliberate decision not to implement them in sam and I'd make the same decision today. As far as greedy/non-greedy operators go, they have more utility but I believe they have beco
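
To make the /(.*)\1/ example concrete, here is a small Python sketch (Python's `re` does implement backreferences). Anchored to the whole input, the pattern accepts exactly the strings that consist of some text followed by a copy of itself, a language no finite automaton can recognise. The helper name `is_doubled` is made up for this illustration.

```python
import re

doubled = re.compile(r"(.*)\1")   # some text followed by a copy of itself

def is_doubled(s):
    """True if s has the form ww for some string w."""
    return doubled.fullmatch(s) is not None

print(is_doubled("abcabc"))  # True
print(is_doubled("abcabd"))  # False
```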

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Uriel
On Fri, Oct 24, 2008 at 6:11 PM, Rudolf Sykora <[EMAIL PROTECTED]> wrote: > Further, in R. Cox's text (http://swtch.com/~rsc/regexp/regexp1.html) > he claims that all nice features except for backreferences can be > implemented with Thompson's NFA algorithm. And even the backreferences > can be hand

Re: [9fans] non greedy regular expressions

2008-10-24 Thread John Stalker
I think you've understood correctly. Back references mostly aren't there. Greedy operators aren't there. For back references, this may be due to philosophical reservations; I have a few myself. For greedy operators, I suspect it's more because no one has cared enough to do it. It wouldn't be to

Re: [9fans] non greedy regular expressions

2008-10-24 Thread erik quanstrom
> Ok, so despite the documentation, some submatch tracking is there. > But in all (?) your examples, as well as in the scripts you mentioned, > this tracking is exclusively used with the s command (which is said to > be unnecessary at least in sam/acme). If I try sth. like > /( b(.)b)/a/\1\2/ this

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Rudolf Sykora
> well reading the code would be a travesty. it's curious > that neither the sam paper nor regexp(6) mentions > submatches. maybe i missed them. > > sed -n 's:.*(KRAK[A-Z]+*) +([a-zA-Z]+).*:\2, \1:gp' - erik Ok, so despite the documentation, some submatch tracking is there. But in all (?) your

Re: [9fans] non greedy regular expressions

2008-10-24 Thread erik quanstrom
> But any manual page (regexp(6), that of sam) keeps completely silent > about eg. any submatch tracking. > So what's wrong? Can anybody clarify the situation for me or do I > really have to read the codes? well reading the code would be a travesty. it's curious that neither the sam paper nor re

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Aharon Robbins
You are not missing anything. Subexpression matching means when you have an expression like q(a+b)(c*d)z that you can get access to the exact text matched by the two parenthesized subexpressions. You asked about non-greedy regular expressions which were first popularized by perl. IIRC
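
A minimal Python sketch of what subexpression matching gives you for q(a+b)(c*d)z. The input string `qaabccdz` is made up for the illustration.

```python
import re

m = re.fullmatch(r"q(a+b)(c*d)z", "qaabccdz")

# Each parenthesized subexpression reports the exact text it matched.
print(m.group(1))  # aab
print(m.group(2))  # ccd
```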

Re: [9fans] non greedy regular expressions

2008-10-24 Thread Rudolf Sykora
> russ has a great writeup on this. > http://swtch.com/~rsc/regexp/ > i think it covers all your questions. > > - erik I read through some of that already yesterday. Anyway, I am still puzzled. In the text of Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python,

Re: [9fans] non greedy regular expressions

2008-10-23 Thread erik quanstrom
> regexp(6) seems to know only greedy regular expressions. So does > probably awk, sed, grep, ..., since these are based on that regexp. > My question is what to do if one needs non-greedy regexps. russ has a great writeup on this. http://swtch.com/~rsc/regexp/ i think it covers all your questio

[9fans] non greedy regular expressions

2008-10-23 Thread Rudolf Sykora
Hello regexp(6) seems to know only greedy regular expressions. So does probably awk, sed, grep, ..., since these are based on that regexp. My question is what to do if one needs non-greedy regexps. Also, is there anything like submatch extraction, counted repetition of regexp (like {3,5}), (looka