> Here's a version that works on your example:
>
> 0k
> ,x/ABC/+#0;/CBA|EFG/{
>',.-#0d
>.+#0??d
>.+#0k
> }
>
> -Derek
Thanks. This is what I wanted to see...
Ruda
"Rudolf Sykora" wrote:
> I have a file say like this
>
> ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi
> sodifs
> sdfsd
> ABC
> dasdfas aasdfa
> njnjn CBA
>
> and I want to get
>
> ' asassadfasdf asdfasdf asdfasdf '
> 'dasdfas aasdfa'
> 'njnjn'
>
> ...i.e. delimited with AB
Thanks for the explanations. The lowlife learns a bit or two :-)
--On Tuesday, October 28, 2008 2:51 PM + "Brian L. Stuart"
<[EMAIL PROTECTED]> wrote:
> This guy seems to blur the distinctions here. His discussion
He doesn't. If one reads the whole section part of which was quoted one
w
It is merely the traditional POSIX flavor. Some people like that
flavor, some don't.
Understandable.
It is more that Perl simply was never part of the picture for the people
who develop(ed) and use(d) Plan 9. It's like asking why the paper on
the Plan 9 C compiler doesn't state that C++ clas
> > This guy seems to blur the distinctions here. His discussion
>
> He doesn't. If one reads the whole section part of which was quoted one
> will see that he clearly states DFA and NFA are theoretically equivalent,
> but then goes on to explain that DFA and NFA _implementations_ are not
> id
> > As other mails have pointed out, anything that isn't leftmost longest
> > has weird semantics. Non-greedy operators are mostly syntactic sugar.
>
> Is (leftmost-longest + all-greedy operators) syntactic salt then?
It is merely the traditional POSIX flavor. Some people like that
flavor, some
As other mails have pointed out, anything that isn't leftmost longest
has weird semantics. Non-greedy operators are mostly syntactic sugar.
Is (leftmost-longest + all-greedy operators) syntactic salt then?
Not in the least. The Plan 9 regexp library in fact gives you close to
the same nirvana
First of all, thanks for the explanation. It's above my head, but thanks
anyway.
This guy seems to blur the distinctions here. His discussion
He doesn't. If one reads the whole section part of which was quoted one
will see that he clearly states DFA and NFA are theoretically equivalent,
bu
there's a reason they're not called regularly expressions.
As explained in the post by Brian L. Stuart it's a matter of "grammar" :-P
(if this were the definition, an expression's regularlyness would
depend on the target text, would it not?)
Yes, and that _would_ be why you wouldn't craft a
> > GNU grep takes a simple but effective approach. It uses a DFA when
> > possible, reverting to an NFA when backreferences are used. GNU awk does
> > something similar---it uses GNU grep's fast shortest-leftmost DFA engine
> > for simple "does it match" checks, and reverts to a different engine f
> The set of "big books on regular expressions" includes Jeffrey Friedl's
> "Mastering Regular Expressions" that happens to contain a chapter by the
> title "NFA, DFA, and POSIX" wherein he says:
>
> > DFA Speed with NFA Capabilities: Regex Nirvana?
This guy seems to blur the distinctions here.
>> practical application. now there are big books on `regular expressions'
>> mainly because they are no longer regular but a big collection of ad-hoc
>
> I thought they were "regular" because they "regularly" occurred in the
> target text. Turns out other interpretations are possible. Though, m
practical application. now there are big books on `regular expressions'
mainly because they are no longer regular but a big collection of ad-hoc
I thought they were "regular" because they "regularly" occurred in the
target text. Turns out other interpretations are possible. Though, mine has
t
>Both systems are complex enough that essentially no one completely understands
>them.
this touches on an important point. the first introduction of regular expressions
to editors was great, because it took some formal language theory and made it useful
in an `every day' way. conversely, the th
> Leftmost-first matching is difficult to explain.
> When POSIX had to define the rules for regexps,
> they chose to define them as leftmost-longest
> even though all the implementations were leftmost-first,
> because describing leftmost-first precisely was too
> complex.
>
> Leftmost-first matchin
As I explained in an earlier post, your suggested
> /ABC(.*?)CBA/
is less robust than Charles's spelled-out version,
since yours doesn't handle nested expressions well.
That's a serious enough limitation that your
scenario stops being a compelling argument for
leftmost-first matching and non-gree
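
To make the pattern under discussion concrete, here is a minimal Python sketch of /ABC(.*?)CBA/ applied to the sample file from earlier in the thread. Python's re module stands in for a leftmost-first engine with non-greedy operators; the helper name and the second, made-up input are assumptions for illustration, and whether that second case is exactly the nesting limitation meant above is also an assumption.

```python
import re

# Sample text as given by the original poster earlier in the thread.
SAMPLE = (
    "ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi\n"
    "sodifs\n"
    "sdfsd\n"
    "ABC\n"
    "dasdfas aasdfa\n"
    "njnjn CBA\n"
)

# Non-greedy body; re.DOTALL lets "." cross newlines.
LAZY = re.compile(r"ABC(.*?)CBA", re.DOTALL)


def between_delimiters(text):
    """Return the text captured between each ABC ... CBA pair (hypothetical helper)."""
    return LAZY.findall(text)


print(between_delimiters(SAMPLE))
# Made-up input with a second ABC before the first CBA:
# the inner ABC ends up inside the captured text.
print(between_delimiters("ABC x ABC y CBA z CBA"))
```

On the sample this yields the two delimited stretches, the second one still containing its newlines; splitting that into lines is a separate step.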
The ability to put \1 in the right hand side of a substitution was
done by jason@ at the Uni of Sydney, but after the Sam papers were published.
It was a welcome feature that added special functionality to the 's' command
within Sam. (Ed(1) had the feature, within its limited regexps, long before,
I loved this thread. Thanks everyone. Thanks Rudolf Sykora.
> Now. If the leftmost-longest match is usable for my problem, I am fine
> with C + regexp(6). If not I only see the possibility to use
> perl/python nowadays (if I don't want to go mad like above).
There is another option: yacc. I'm not saying it's simpler
than perl or python, but it's not much
> ...I have started to like to use as few tools as
> possible.
I entirely agree, there is too much to learn and you have to be selective;
however my selection is: sed, awk, C and sam (for impromptu, well, editing).
I cannot really comment directly on greedy operators, I have never
knowingly us
2008/10/25 Tom Simons <[EMAIL PROTECTED]>:
> Is awk available? This worked for me, but it's not on Plan9. It does copy
> the newline after the 2nd "ABC" (I wasn't sure if leading or all blank lines
> should be deleted).
> $ awk 'BEGIN {RS = "ABC"; FS = "CBA"}NR == 1 {next}{print $1}' a.data
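
For readers without awk at hand, the same record/field idea can be sketched in Python: treat "ABC" as the record separator, skip the first record, and print the part of each remaining record before the first "CBA". This is only an illustration of the approach quoted above; the function name is hypothetical and the input is the sample from the start of the thread.

```python
def first_fields(text, record_sep="ABC", field_sep="CBA"):
    """Mimic: awk 'BEGIN {RS = "ABC"; FS = "CBA"} NR == 1 {next} {print $1}'."""
    records = text.split(record_sep)      # RS = "ABC"
    # NR == 1 {next}: skip whatever precedes the first ABC.
    # {print $1}: keep the text before the first CBA in each record.
    return [record.split(field_sep)[0] for record in records[1:]]


sample = (
    "ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi\n"
    "sodifs\n"
    "sdfsd\n"
    "ABC\n"
    "dasdfas aasdfa\n"
    "njnjn CBA\n"
)

for field in first_fields(sample):
    print(repr(field))
```

As with the awk version, the newline after the second "ABC" is kept in the output.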
To t
Is awk available? This worked for me, but it's not on Plan9. It does copy
the newline after the 2nd "ABC" (I wasn't sure if leading or all blank lines
should be deleted).
$ cat a.data
dflkdl dlkrwo3je4ogjmdmxd
ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi
sodifs
sdfsd
ABC
da
>i didn't spend any time deciding whether there was a better way
>to express the trailing word delimiter. it's too late.
[EMAIL PROTECTED] meanwhile pointed out one way: change the words to otherwise
unused special characters
although it can be satisfying to do things in one instruction (one command)
i confess i often find it quicker to split up the problem in sam or acme:
1. delete everything not between delimiters
,y/ABC([^C]|C[^B]|CB[^A]|\n)+CBA/d
2. delete the delimiters
,x/ABC|CBA/d
3. look to deci
hello
using sed and only one reg-exp is mandatory?
cat t.txt| sed 's/(ABC | CBA)/ \n\1\n /g' | awk '/ABC/,/CBA/' | grep -v 'ABC|CBA'
that's a naive and simple approach, but i can't see why you need to
use just one reg-exp and just one sed. May be i missed something
through the thread :-
> doesn't s/ABC(the_interesting_part)CBA/x/g work for you?
> maybe i don't understand the example. if so, could you explain?
>
> - erik
I think not.
I have a file say like this
ABC asassadfasdf asdfasdf asdfasdf CBA hhjjioioioi
sodifs
sdfsd
ABC
dasdfas aasdfa
njnjn CBA
and I wan
> Ok, I finally see the point... thanks.
> Only one last question: So is there any simple way (using existing
> regexps) to find 'the interesting parts' in my last example?:
> ABCthe_interesting_partCBA blabla bla ABCthe_interesting_partCBA ...
> ...i.e. delimited with ABC from the left and CBA (or
> Greedy leftmost-first is different from leftmost-longest.
> Search for /a*(ab)?/ in "ab". The leftmost-longest match
> is "ab", but the leftmost-first match (because of the
> greedy star) is "a". In the leftmost-first case, the greediness
> of the star caused an overall short match.
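
The /a*(ab)?/ example can be checked with a small Python sketch. Python's re module is a leftmost-first (backtracking) engine, so it shows the "a" result directly; the leftmost-longest result is simulated here by brute force over end positions, which is only an illustration and not how the Plan 9 library works. Both function names are hypothetical.

```python
import re


def leftmost_first(pattern, text):
    """Match as Python's backtracking engine does (leftmost-first)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def leftmost_longest(pattern, text):
    """Brute-force simulation: at the first start position that admits
    any match, return the longest substring the pattern matches in full."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None


print(leftmost_first(r"a*(ab)?", "ab"))    # a
print(leftmost_longest(r"a*(ab)?", "ab"))  # ab
```

The greedy star grabs the "a" first, after which (ab)? can only match the empty string; the longest overall match requires the star to take nothing.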
Ok, I fina
> I thought greedy=leftmost-longest, while non-greedy=leftmost-first:
Greedy leftmost-first is different from leftmost-longest.
Search for /a*(ab)?/ in "ab". The leftmost-longest match
is "ab", but the leftmost-first match (because of the
greedy star) is "a". In the leftmost-first case, the gree
2008/10/24 Charles Forsyth <[EMAIL PROTECTED]>:
>>If I try sth. like
>>/( b(.)b)/a/\1\2/
>>on
>>bla blb 56
>>I get
>>bla blb\1\2 56
>>which is not quite what I want... How then? (I'd like to get 'bla blblblb 56')
>
> echo bla blb 56 | sed 's/( b(.)b)/&\1\2/'
> bla blb blbl 56
>
> similarly use `s'
> In that model, it is not accurate to describe the * + ?
> operators as greedy or not. None of them is working
> toward any goal other than the overall longest match
> at the leftmost position.
So then I must be mistaken about the terminology.
I thought greedy=leftmost-longest, while non-greedy=
>If I try sth. like
>/( b(.)b)/a/\1\2/
>on
>bla blb 56
>I get
>bla blb\1\2 56
>which is not quite what I want... How then? (I'd like to get 'bla blblblb 56')
echo bla blb 56 | sed 's/( b(.)b)/&\1\2/'
bla blb blbl 56
similarly use `s' not `a' in sam.
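
The same substitution, sketched in Python for comparison: sed's & (the whole match) corresponds to \g<0> in re.sub, and \1 and \2 are the two parenthesized submatches. The function name is made up for the example, and count=1 mirrors sed's behaviour without the g flag.

```python
import re


def append_submatches(line):
    """Equivalent of: sed 's/( b(.)b)/&\\1\\2/' (first match only)."""
    return re.sub(r"( b(.)b)", r"\g<0>\1\2", line, count=1)


print(append_submatches("bla blb 56"))  # bla blb blbl 56
```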
Regarding greedy vs non-greedy etc.
In Plan 9, a regular expression search always looks
for the "leftmost-longest" match, which means
a match that starts as far to the left as possible
in the target text, and of the matches that start there,
the longest one.
In that model, it is not accurate to d
> you probably mean NON-greedy ops.
Yes, my mistake. I'll risk making a very minor correction to
Rob's post as well:
> Backreferences within the pattern (such as in /(.*)\1/) make the
> matcher non-regular and exponentially hard.
They do change the class of the grammar and nobody knows how to
i
2008/10/24 John Stalker <[EMAIL PROTECTED]>:
> I think you've understood correctly. Back references mostly aren't
> there. Greedy operators aren't there.
you probably mean NON-greedy ops.
I was not that concerned about backreferences but submatch extraction
(according to the terminology used by
Backreferences within the pattern (such as in /(.*)\1/) make the
matcher non-regular and exponentially hard. It was a deliberate
decision not to implement them in sam and I'd make the same decision
today.
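
As a small illustration of why /(.*)\1/ leaves regular-language territory: anchored to the whole string, it accepts exactly the strings that consist of some text repeated twice, a standard example of a non-regular language. The sketch below uses Python's backtracking re module, which does support backreferences; the function name is hypothetical.

```python
import re

# Group 1 followed by an exact copy of whatever group 1 matched.
DOUBLED = re.compile(r"(.*)\1")


def is_doubled(s):
    """True if s is some string w followed immediately by w again."""
    return DOUBLED.fullmatch(s) is not None


for candidate in ["abcabc", "abcab", "aa", "aba"]:
    print(candidate, is_doubled(candidate))
```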
As far as greedy/non-greedy operators go, they have more utility but I
believe they have beco
On Fri, Oct 24, 2008 at 6:11 PM, Rudolf Sykora <[EMAIL PROTECTED]> wrote:
> Further, in R. Cox's text (http://swtch.com/~rsc/regexp/regexp1.html)
> he claims that all nice features except for backreferences can be
> implemented with Thompson's NFA algorithm. And even the backreferences
> can be hand
I think you've understood correctly. Back references mostly aren't
there. Greedy operators aren't there. For back references, this may
be due to philosophical reservations; I have a few myself. For greedy
operators, I suspect it's more because no one has cared enough to do
it. It wouldn't be to
> Ok, so despite the documentation, some submatch tracking is there.
> But in all (?) your examples, as well as in the scripts you mentioned,
> this tracking is exclusively used with the s command (which is said to
> be unnecessary at least in sam/acme). If I try sth. like
> /( b(.)b)/a/\1\2/
this
> well reading the code would be a travesty. it's curious
> that neither the sam paper nor regexp(6) mentions
> submatches. maybe i missed them.
>
> sed -n 's:.*(KRAK[A-Z]+*) +([a-zA-Z]+).*:\2, \1:gp'
> - erik
Ok, so despite the documentation, some submatch tracking is there.
But in all (?) your
> But any manual page (regexp(6), that of sam) keeps completely silent
> about eg. any submatch tracking.
> So what's wrong? Can anybody clarify the situation for me or do I
> really have to read the codes?
well reading the code would be a travesty. it's curious
that neither the sam paper nor re
You are not missing anything.
Subexpression matching means when you have an expression like
q(a+b)(c*d)z
that you can get access to the exact text matched by the two
parenthesized subexpressions.
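
A minimal sketch of what that looks like in practice, using Python's re module rather than the Plan 9 library; the input string is made up for the example.

```python
import re

# Two parenthesized subexpressions, as in the expression above.
pattern = re.compile(r"q(a+b)(c*d)z")

match = pattern.search("xxqaaabccdzyy")
whole = match.group(0)   # text matched by the entire expression
first = match.group(1)   # text matched by (a+b)
second = match.group(2)  # text matched by (c*d)

print(whole, first, second)  # qaaabccdz aaab ccd
```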
You asked about non-greedy regular expressions which were first
popularized by perl.
IIRC
> russ has a great writeup on this.
> http://swtch.com/~rsc/regexp/
> i think it covers all your questions.
>
> - erik
I read through some of that already yesterday. Anyway, I am still
puzzled. In the text of
Regular Expression Matching Can Be Simple And Fast
(but is slow in Java, Perl, PHP, Python,
> regexp(6) seems to know only greedy regular expressions. So does
> probably awk, sed, grep, ..., since these are based on that regexp.
> My question is what to do if one needs non-greedy regexps.
russ has a great writeup on this.
http://swtch.com/~rsc/regexp/
i think it covers all your questio
Hello
regexp(6) seems to know only greedy regular expressions. So does
probably awk, sed, grep, ..., since these are based on that regexp.
My question is what to do if one needs non-greedy regexps.
Also, is there anything like submatch extraction, counted repetition
of regexp (like {3,5}), (looka