Hi,

<target name="depends">
  <echo file="Y:/test.html">
<![CDATA[
<html>
<head>
<title>summary</title>
<link rel="stylesheet" href="summary.css" type="text/css">
</head>
<body>
<a name="overview"></a>
<center>
<table class="summary"> was wrong </table>
</center>
</html>
]]>
  </echo>
</target>

<target name="main" depends="depends">
  <loadfile srcfile="Y:/test.html" property="summary">
    <filterchain>
      <containsregex
        pattern='&lt;table[^&lt;/]*>(.*?)&lt;/table>'
        replace="\1"
        byline="true" />
      <tokenfilter>
        <!-- to get rid of whitespace in ${summary} -->
        <trim/>
      </tokenfilter>
    </filterchain>
  </loadfile>
  <echo>Summary == ${summary}</echo>
</target>

gives only the text:

depends:
main:
     [echo] Summary == was wrong
BUILD SUCCESSFUL
Total time: 407 milliseconds

You have to use \1 and byline="true".

Regards, Gilbert

-----Original Message-----
From: George Bills [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 28, 2006 6:14 AM
To: Ant Users List
Subject: Re: containsregex and concat

Thanks: the regular expression works now, which is progress.
Unfortunately I'm getting all of the concatenated text, not just the
matching text. If I use replace:

<filterchain>
  <!--<tokenfilter><filetokenizer />-->
  <!-- byline="false" implies filetokenizer -->
  <containsregex flags="isg"
                 pattern="${summary.regex}"
                 replace="SUMMARYTABLE"
                 byline="false" />
  <!-- </tokenfilter>-->
</filterchain>

I end up getting something like:

   [concat] <html>
   [concat] <head>
   [concat] <title>summary</title>
   [concat] <link rel="stylesheet" href="summary.css" type="text/css">
   [concat] </head>
   [concat] <body>
   [concat] <a name="overview"></a>
   [concat] <center>
   [concat] SUMMARYTABLE
   [concat] </center>
   [concat] ...more HTML here...
   [concat] </html>

I'm assuming it's because the file is just one big token - but if I use
a line tokenizer, will I be able to match regular expressions over
multiple lines?

Thanks for the help.
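
To illustrate the "one big token" question above, here is a minimal Java sketch of the difference between substituting inside a whole-file token and extracting only group 1. It assumes the regular expressions behave as in java.util.regex, which may not be the implementation a given Ant install uses. The class name, method names and sample HTML are hypothetical and not taken from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeFileTokenSketch {
    // Hypothetical sample document: the table spans several lines.
    public static final String HTML =
        "<html>\n<body>\n<center>\n"
        + "<table class=\"summary\">\n<tr><td>was wrong</td></tr>\n</table>\n"
        + "</center>\n</body>\n</html>\n";

    // DOTALL stands in for the "s" flag, CASE_INSENSITIVE for the "i" flag.
    public static final Pattern TABLE = Pattern.compile(
        "<table[^>]*>(.*?)</table>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Substituting inside one big token rewrites only the matched span;
    // all text around the match is kept.
    public static String substitute(String doc) {
        return TABLE.matcher(doc).replaceAll("SUMMARYTABLE");
    }

    // Reading group 1 returns only the table body, trimmed of whitespace.
    public static String extract(String doc) {
        Matcher m = TABLE.matcher(doc);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(substitute(HTML));
        System.out.println(extract(HTML));
    }
}
```

In this sketch substitute() keeps the surrounding HTML, the same shape as the [concat] output above, while extract() yields only the table body. With one token per line the unmatched lines can simply be dropped, but a match can then no longer span a line break.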
Rebhan, Gilbert wrote:
> Hi,
>
> <table[^>/]*>(.*?)</table>
>
> should match:
>
> <table class="summary">foobar</table>
>
> also with more than one attribute
>
> <table class="summary" foo="bar">foobar</table>
>
>
> foobar is \1 (group 1)
>
>
> Regards, Gilbert
>
>
> -----Original Message-----
> From: George Bills [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 27, 2006 6:41 AM
> To: Ant Users List
> Subject: Re: containsregex and concat
>
> Hrm, it probably isn't since advanced regexes are still black magic to
> me. The "." was supposed to match any character, including a newline
> (with the s flag), the * to say match 0-n of them and the ? to say be
> lazy, match as little as possible (so that I don't pull in
> <table>...</table><table>...</table> in one match).
>
> I just tried [^<], but it doesn't seem to work - I think because of such
> things as "<table><tr>...</tr></table>" - the opening bracket of <tr>
> conflicts. I tried [.<>]*? to make sure that the "regex.body" part
> was matching the brackets, but that didn't work either.
>
> Also, <table class="summary"> was wrong - <table class="summary"(.*?)>
> is a little better since the tables can have more than the class
> attribute (in fact, all of them do). But after changing that I'm
> matching the entire document - <html> through to </html>. That might
> just be because I'm using filetokenizer - if I make one match within
> filetokenizer, do I end up getting the entire document? If so, how do I
> get only the matching text?
>
> Regex is now: <table class="summary".*?>.*?</table>
>
> Thanks for the help, I appreciate it.
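
The points in the quoted message about the lazy quantifier, the s flag and the [.<>] attempt can be checked with a small Java sketch. It assumes java.util.regex semantics, with Pattern.DOTALL standing in for the s flag. The class name, helper and sample strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyMatchSketch {
    // Returns the first match of regex in text, or null if nothing matches.
    public static String firstMatch(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String two = "<table>a</table><table>b</table>";
        String multi = "<table>\na\n</table>";

        // Greedy: runs on to the last </table>, swallowing both tables.
        System.out.println(firstMatch("<table>.*</table>", 0, two));
        // Lazy: stops at the first </table>.
        System.out.println(firstMatch("<table>.*?</table>", 0, two));
        // Without DOTALL, "." does not match a newline, so nothing is found.
        System.out.println(firstMatch("<table>.*?</table>", 0, multi));
        // With DOTALL, the match may span lines.
        System.out.println(firstMatch("<table>.*?</table>", Pattern.DOTALL, multi));
        // Inside a character class "." is a literal dot, so [.<>] only
        // matches the three characters '.', '<' and '>'.
        System.out.println(firstMatch("<table>[.<>]*?</table>", 0, "<table>a</table>"));
    }
}
```

The last call suggests why [.<>]*? did not help: ordinary text such as "a" is not in that class, so the body of the table can never match.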
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]