DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=19187>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=19187

ReplaceRegExp cannot handle multi-byte encodings

           Summary: ReplaceRegExp cannot handle multi-byte encodings
           Product: Ant
           Version: 1.5.1
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Optional Tasks
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

The ReplaceRegExp task throws IndexOutOfBoundsException for files containing
multi-byte encodings.

java.lang.IndexOutOfBoundsException
    at java.io.BufferedReader.read(BufferedReader.java:256)
    at org.apache.tools.ant.taskdefs.optional.ReplaceRegExp.doReplace(ReplaceRegExp.java:404)
    at org.apache.tools.ant.taskdefs.optional.ReplaceRegExp.execute(ReplaceRegExp.java:491)
    at org.apache.tools.ant.Task.perform(Task.java:319)
    at org.apache.tools.ant.Target.execute(Target.java:309)
    at org.apache.tools.ant.Target.performTasks(Target.java:336)
    at org.apache.tools.ant.Project.executeTarget(Project.java:1306)
    at org.apache.tools.ant.Project.executeTargets(Project.java:1250)
    at org.apache.tools.ant.Main.runBuild(Main.java:610)
    at org.apache.tools.ant.Main.start(Main.java:196)
    at org.apache.tools.ant.Main.main(Main.java:235)

The task was:

    <replaceregexp flags="g" file="regtst">
      <regexp pattern="((Header:\s+\S+|Revision)\s+\S+\s+\S+\s+\S+)\s+(\w+)"/>
      <substitution expression="\1"/>
    </replaceregexp>

The root cause seems to be the assumption that the length of the file is the
same as the number of characters in the file. This assumption fails for
multi-byte encodings. ReplaceRegExp.java lines 398 to 406 are:

    int flen = (int) f.length();
    char tmpBuf[] = new char[flen];
    int numread = 0;
    int totread = 0;
    while (numread != -1 && totread < flen) {
        numread = br.read(tmpBuf, totread, flen);
        totread += numread;
    }

The flen is the number of bytes in the file, but it's being misused as the
number of characters.
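
For illustration only, here is a rough sketch of a read loop that does not depend on the file length at all. It is not a patch against ReplaceRegExp.java. The class name ReadAllChars, the method readFully, and the idea of passing in a Charset are all assumptions made for this example, and it uses newer Java syntax than Ant 1.5.1 targets. In the quoted loop, the exception appears to come from passing flen as the length on every call: once a read returns fewer than flen characters, totread + flen is larger than the array, and BufferedReader.read rejects that with IndexOutOfBoundsException. Reading into a fixed-size buffer and appending to a StringBuilder avoids both the byte/character confusion and the offset arithmetic.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Ant.
public class ReadAllChars {

    // Reads the whole file as characters in the given encoding.
    // The encoding parameter is an assumption: the task in 1.5.1 has no such setting.
    static String readFully(File f, Charset encoding) throws IOException {
        try (Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), encoding))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            // Always read into the start of the buffer, so offset + length
            // can never exceed the array, whatever the file size in bytes.
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }
}
```

Called as ReadAllChars.readFully(new File("regtst"), Charset.forName("UTF-8")), this should return the full text regardless of how many bytes each character occupies on disk.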

Related symptom: if you use a fileset, you don't get the full stacktrace, only
a summary:

    [replaceregexp] An error occurred processing file: '/home/jdb/projects/foo/regtst': java.lang.IndexOutOfBoundsException

Workaround: byline="true" uses a different block of code. (But it's still apt
to munge your encoding.)

Suggested enhancement: add a file encoding parameter to the task.

Sorry I don't have time to fix this right now.