Re: How to use replaceregexp in multi-line context?

Oliver Ashoff Thu, 18 May 2006 09:43:37 -0700

Hello David!
 
Sorry, my explanation was a little bit short, I guess. ;)
 
Like you, I was puzzled that replaceregexp works fine in 
    (1)  one-line mode
but not in 
    (2)  multi-line mode.
 
So let's look at how the inputs differ.
In outline:
 
    (1)   <any character sequence without any line feed><an optional line feed>
 
    (2)   <any character sequence without any line feed><a line feed>
          <any character sequence without any line feed><a line feed>
          ....
          <any character sequence without any line feed><an optional line feed>
 
So I thought the line feeds could be the problem, because not all line
feeds are equal :)
To be more precise, not all 'line end markers' (characters or character
sequences that mark the end of a line) are equal.
For an explanation of 'line feed' see
http://en.wikipedia.org/wiki/Line_Feed
 
How to insert a 'line feed' in a regular expression?
 
Like you, I tried '\n'. But that did not help.
 
The problem could be that the line end marker used in the input consists
of 2 characters!
For example: <carriage return><line feed> = <ASCII 13><ASCII 10>
 
Hence, inserting an extra ASCII 10, encoded as "&#10;", into the regular
expression solved the problem!  Hurray! :)
 
For 'Numerical Character References' see
http://www.w3.org/MarkUp/html3/latin1.html   (&#10; = line feed)
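 
If the two-character line end marker really is the culprit, another way
to make the input predictable might be to normalize the line ends before
the replacement. A minimal sketch, assuming Ant's fixcrlf task is
acceptable here and that it is fine to rewrite config.xml with plain line
feeds (both are assumptions, not something tested on these files):

```xml
<!-- Assumption: rewriting config.xml with LF-only line ends is acceptable. -->
<fixcrlf srcdir="${etc.dir}" includes="config.xml" eol="lf" />
```

After this step every line end in the file should be a single ASCII 10.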
 
Example: Consider the following lines of an input file:
 
      <entry name="statistics.enabled">
        <value>true</value>
      </entry>
 
Now we want to replace the string "true" by "false", but only for the
entry "statistics.enabled". So I use a regular expression that matches
all three of the above lines:
 
  <replaceregexp byline="false" flags="m">
    <regexp pattern="(.*&lt;entry.*name=&#34;statistics.enabled&#34;.*&gt;.*&#10;.*&lt;value&gt;).*(&lt;/value&gt;.*&#10;.*&lt;/entry&gt;.*$)" />
    <substitution expression="\1false\2" />
    <fileset dir="${etc.dir}" includes="config.xml" />
  </replaceregexp>
 
As you can see, I inserted the sequence
   &#10;
twice so that the line breaks are recognized. Without it the pattern does
not match.
Additionally, I escape the characters '<' and '>' as &lt; and &gt;, and
the double quote as the numerical character reference &#34;.
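 
If the file really ends its lines with <carriage return><line feed>, a
variant that allows an optional ASCII 13 before each ASCII 10 may be more
robust. This is only a sketch and has not been run against the files
discussed here; it assumes that "&#13;" reaches the regular expression
engine as a literal carriage return in the same way "&#10;" arrives as a
line feed:

```xml
<replaceregexp byline="false" flags="m">
  <!-- &#13;? allows an optional carriage return before each line feed,
       so both CR LF and plain LF input should match (assumption) -->
  <regexp pattern="(.*&lt;entry.*name=&#34;statistics.enabled&#34;.*&gt;.*&#13;?&#10;.*&lt;value&gt;).*(&lt;/value&gt;.*&#13;?&#10;.*&lt;/entry&gt;.*$)" />
  <substitution expression="\1false\2" />
  <fileset dir="${etc.dir}" includes="config.xml" />
</replaceregexp>
```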
 
I guess you have got the crucial point now. ;)
I did not investigate your regular expression, because I assume that you
know your regular expression and only missed the trick of inserting the
"&#10;" character sequence.
I don't know if there is another solution, perhaps a smarter one.
But this is at least an acceptable work-around for me.
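 
One candidate for a different approach could be to avoid naming the line
end marker at all and let \s* span whatever whitespace separates the
tags. Again a sketch only, untested here: it assumes the regular
expression engine used by Ant treats \s as "any whitespace, including
carriage return and line feed", and it uses &quot; instead of &#34; for
the quotes:

```xml
<replaceregexp byline="false">
  <!-- \s* covers indentation and any line end marker between the tags;
       [^&lt;]* matches the old value up to the closing tag -->
  <regexp pattern="(&lt;entry\s+name=&quot;statistics\.enabled&quot;\s*&gt;\s*&lt;value&gt;)[^&lt;]*(&lt;/value&gt;)" />
  <substitution expression="\1false\2" />
  <fileset dir="${etc.dir}" includes="config.xml" />
</replaceregexp>
```

Because the pattern no longer depends on "." crossing or stopping at a
line end, no "m" or "s" flag should be needed.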
 
If you don't succeed, let me know. Then I will try to give you further
assistance... ;)
 
 
Cheers, Oliver




________________________________

        From: David [mailto:[EMAIL PROTECTED] 
        Sent: Thursday, 18 May 2006 16:50
        To: Oliver Ashoff; Ant Apache User Group
        Subject: Re: How to use replaceregexp in multi-line context?
        
        
        Dear Oliver, 
         
        Thanks for your interest in my problem. Regarding your comment,
        I don't understand it very well; could you please be a little
        more explicit?
         
        As far as I understand, you mean to include the new line
        character in the match expression. I tested this too, without
        success:
         
        <replaceregexp byline = "false" flags = "g"
                file="${sql.dir}/oracle/lra-create-index-oracle.sql"
                match="[EMAIL PROTECTED](\)\n]*;"
                replace=";">
         
        does not work, even with the \n character added. I have also
        tried the Java property ${line.separator}:
         
        <replaceregexp byline = "false" flags = "g"
                file="${sql.dir}/oracle/lra-create-index-oracle.sql"
                match="[EMAIL PROTECTED](\)${line.separator}]*;"
                replace=";">
         
        Both variants are accepted by Ant, but the input file does not
        change.
         
        I have a simple example that works in a multi-line context, but
        there I don't have to specify the list of allowed characters in
        the match expression:
         
                <replaceregexp byline = "false" flags = "gs">
                    <regexp pattern = "${CVI.begin}(.*)${CVI.end}"/>
                    <substitution expression = "${CVI.begin}${nl}${CVI.body.java}${CVI.end}"/>
                    <fileset dir = ".">
                        <exclude name="**/*.properties"/>
                        <patternset refid = "java.patternset"/>
                    </fileset>
                </replaceregexp>
         
        where:
        CVI.begin                   = @BEGIN_CONTROL_VERSION_INFO@
        CVI.end                     = @END_CONTROL_VERSION_INFO@
         
        and ${nl} = ${line.separator}. With this piece of code I delete
        the contents of the CVI block, for example:
         
        @BEGIN_CONTROL_VERSION_INFO@
        Control Version Information
        ================================================================================
        $Log: DynamicInstance.java,v $
        Revision 1.4  2004/10/04 19:24:03  UF367151
        Checkstyle test passed.
        Revision 1.3  2004/09/14 17:56:48  UF367151
        ================================================================================
        @END_CONTROL_VERSION_INFO@
         
        This case is easy because the end token is on a new line and it
        is a string rather than a single character as in my case (";"),
        so we can specify the "s" flag and "eat" everything with the .*
        pattern (including the new line, because the "s" option allows
        that).
         
        Please let me know if you have any suggestions.
         
        Thanks,
         
        David

        