Now that MRAB has shown me the error of my ways I would
like to learn how to properly write the regular expression I
need.

This part:

> rx_works = '\$<[^<:]+?::.*?::\d*?>\$|\$<[^<:]+?::.*?::\d+-\d+>\$'
> # it fails if switched around:
> rx_fails = '\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$'

suggests that I already have a solution. However, in reality this line:

> line = 'junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk'

can be either way round (match_A then match_B, or vice
versa), which in turn swaps which of rx_works/rx_fails
actually works.
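
For reference, here is a small check of that order dependence, using the
two patterns and the sample line quoted above. line_swapped is a made-up
variant with the two placeholders exchanged. As far as I can tell, the
alternative tried first fails on the nearest placeholder and then
stretches its lazy '.*?' to a later '::', so both placeholders end up
inside one match:

```python
import re

rx_works = r'\$<[^<:]+?::.*?::\d*?>\$|\$<[^<:]+?::.*?::\d+-\d+>\$'
rx_fails = r'\$<[^<:]+?::.*?::\d+-\d+>\$|\$<[^<:]+?::.*?::\d*?>\$'

line = 'junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk'
# hypothetical variant: same placeholders, opposite order
line_swapped = 'junk  $<match_B::options B::4-5>$  junk  $<match_A::options A::4>$  junk'

for rx_name, rx in (('rx_works', rx_works), ('rx_fails', rx_fails)):
    for label, text in (('line', line), ('line_swapped', line_swapped)):
        matches = re.findall(rx, text)
        # 2 matches = each placeholder found separately,
        # 1 match = both swallowed into a single over-long match
        print(rx_name, label, len(matches), matches)
```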

Let me try to explain the expression I am actually after
(assuming .compile with re.VERBOSE):

rx_works = r'''
        \$<             # start of match is literal '$<' anywhere inside string
        [^<:]+?::       # followed by at least one "character", except '<' or ':',
                        # until the next '::' (this is the placeholder "name")
        .*?::           # followed by any number of any "character", until the
                        # next '::' (this is the placeholder "options")
        \d*?            # followed by any number of digits (the max length of
                        # placeholder output)
        >\$             # followed by '>$'
        |               # -- OR (in *either* order) --
        \$<             # start of match is literal '$<' anywhere inside string
        [^<:]+?::       # followed by at least one "character", except '<' or ':',
                        # until the next '::' (this is the placeholder "name")
        .*?::           # followed by any number of any "character", until the
                        # next '::' (this is the placeholder "options")
                        # now the difference:
        \d+-\d+         # followed by one-or-many digits, a '-', and one-or-many
                        # digits (this is the *range* within placeholder output)
        >\$             # followed by '>$'
'''

I want this to work for

        any number of matches

        in any order of max-length or output-range

inside one string.
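
One possible direction, as a rough sketch only: fold both alternatives
into a single pattern whose last field is "range or max length", and
keep the options field from running past the end of its own placeholder
with a negative lookahead for '>$'. The names PLACEHOLDER and
find_placeholders are made up for illustration, and this assumes the
options never contain '>$' themselves, so it does NOT handle nested
placeholders:

```python
import re

# Hypothetical single pattern for first-level placeholders.
PLACEHOLDER = re.compile(r"""
    \$<                               # literal '$<'
    (?P<name>[^<:]+?)::               # name: no '<' or ':', up to the next '::'
    (?P<options>(?:(?!>\$).)*?)::     # options: lazy, but never crossing a '>$'
    (?P<arg>\d+-\d+|\d*)              # a range like 4-5, or a max length (may be empty)
    >\$                               # literal '>$'
""", re.VERBOSE)


def find_placeholders(text):
    """Return (name, options, arg) for every first-level placeholder in text."""
    return [m.group('name', 'options', 'arg') for m in PLACEHOLDER.finditer(text)]


line = 'junk  $<match_A::options A::4>$  junk  $<match_B::options B::4-5>$  junk'
line_swapped = 'junk  $<match_B::options B::4-5>$  junk  $<match_A::options A::4>$  junk'

print(find_placeholders(line))
# [('match_A', 'options A', '4'), ('match_B', 'options B', '4-5')]
print(find_placeholders(line_swapped))
# [('match_B', 'options B', '4-5'), ('match_A', 'options A', '4')]
```

Because the options field cannot cross a '>$', a failed attempt on the
nearest placeholder can no longer succeed by reaching into the next
one, so the order of the placeholders in the string should stop
mattering.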

Now, why the [^<:]+? dance ?

Because three levels of placeholders

        $<...::...::>$
        $<<...::...::>>$
        $<<<...::...::>>>$

need to be nestable inside each other ;-)
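
To illustrate the point about '<' being excluded from the name, here is
a minimal sketch that builds one pattern per nesting level.
placeholder_rx is a made-up helper, and the samples are single,
non-nested placeholders only. Since the name must start with a
character other than '<', the level-1 pattern cannot start matching at
'$<<', and the level-2 pattern cannot start matching at '$<<<':

```python
import re


def placeholder_rx(level):
    """Hypothetical helper: compile a pattern for placeholders of one nesting level."""
    opening = r'\$' + '<' * level      # '$<', '$<<' or '$<<<'
    closing = '>' * level + r'\$'      # '>$', '>>$' or '>>>$'
    return re.compile(opening + r'[^<:]+?::.*?::(?:\d+-\d+|\d*)' + closing)


samples = {
    1: '$<name::opts::4>$',
    2: '$<<name::opts::4>>$',
    3: '$<<<name::opts::4-5>>>$',
}

for rx_level in (1, 2, 3):
    rx = placeholder_rx(rx_level)
    for sample_level, sample in samples.items():
        # each pattern should only find the placeholder of its own level
        print(rx_level, sample_level, bool(rx.search(sample)))
```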

Anyone able to help ?

This seems beyond my current grasp of regular expressions.

Thanks,
Karsten
-- 
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B