I am having a weird problem with the encoding during token replacement (ant code below).

Details: CentOS 5, SunOS 5, Darwin 9 // ant 1.6/1.7 // java 1.3/1.5/1.6

I have a simple token file with a simple variable ('A') and a single Chinese character (广 or cat -v: M-eM-9M-?).
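
For reference, the byte sequence above can be double-checked outside of Ant. This is only a minimal sketch: the class name Utf8Bytes and the helper toHex are made up for illustration and have nothing to do with Ant itself. It encodes the character (U+5E7F) as UTF-8 and prints the bytes in hex, which should line up with the cat -v output (M-e = E5, M-9 = B9, M-? = BF).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Ant: dumps the UTF-8 bytes of a string.
public class Utf8Bytes {

    // Format a byte array as space-separated upper-case hex pairs.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ch = "\u5E7F"; // the Chinese character from the tokens file
        // prints "E5 B9 BF": three bytes, as expected for UTF-8
        System.out.println(toHex(ch.getBytes(StandardCharsets.UTF_8)));
    }
}
```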

The template is just "Hello @a...@."

The file command says that the tokens file is UTF-8 and the template file is ASCII.

If I specify UTF-8 or don't specify an encoding at all, the encoding gets messed up: the 3-byte Chinese character is replaced with 4 bytes that print as nonsense. BUT, if I specify latin1 for the encoding, the Chinese character is preserved properly.

I also tried putting a Chinese character into the template itself so that the file command would see the template file as UTF-8. That didn't help: the token replacement was still broken when I went back to the UTF-8 encoding. Interestingly, the Chinese character in the template itself is preserved regardless of the copy encoding used (again, only the token replacement gets messed up under UTF-8).
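
To make the latin1 observation concrete: ISO-8859-1 maps every byte to exactly one character and back, so bytes pass through a latin1 decode/encode unchanged no matter what they really represent. The sketch below only illustrates that property. It is not Ant's actual code path (I have not checked the Ant source), and the class name Latin1RoundTrip and the method reencode are made up. Note also that re-encoding as UTF-8 here gives 6 bytes, not the 4 I am seeing, so at best this is part of the picture.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not Ant code: decode raw bytes as ISO-8859-1,
// then write them back out in a chosen output charset.
public class Latin1RoundTrip {

    // Decode 'raw' as ISO-8859-1 (one char per byte) and re-encode as 'out'.
    public static byte[] reencode(byte[] raw, Charset out) {
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(out);
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of the character in the tokens file: E5 B9 BF
        byte[] utf8 = "\u5E7F".getBytes(StandardCharsets.UTF_8);

        // latin1 in, latin1 out: the original three bytes survive untouched
        byte[] viaLatin1 = reencode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(utf8, viaLatin1)); // prints "true"

        // latin1 in, UTF-8 out: each of the three chars becomes two bytes
        byte[] viaUtf8 = reencode(utf8, StandardCharsets.UTF_8);
        System.out.println(viaUtf8.length); // prints "6"
    }
}
```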

I simply cannot explain this. What am I missing? If all of the files are seen as UTF-8 and the Chinese character indeed seems to be encoded as a 3-byte UTF-8 character (as opposed to unicode), what does latin1 have to do with this?

Any clues will be appreciated!

Brian

-------------------------------------------------------------------------------------------------------------------
<project name="TokenReplacement" default="replace.tokens" basedir="../.">
  <property name="replace.dir" value="/tmp/ant-char-problem"/>
  <property name="tokens.file" value="/tmp/ant-char-problem/tokens"/>
  <target name="replace.tokens">
    <copy todir="${replace.dir}" overwrite="true" encoding="utf-8">
      <fileset dir="${replace.dir}">
        <include name="**/*.tmpl"/>
      </fileset>
      <filterset begintoken="@" endtoken="@">
        <filtersfile file="${tokens.file}"/>
      </filterset>
      <mapper type="glob" from="*.tmpl" to="*"/>
    </copy>
  </target>
</project>

