Hi devs,

I ran into an issue where a test file containing UTF-8 text was being
displayed in Eclipse as US-ASCII.

I had thought that Tika would use UTF-8 everywhere for file encodings, but…

Currently the tika-parent/pom.xml has:

  <properties>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <project.reporting.outputEncoding>${project.build.sourceEncoding}</project.reporting.outputEncoding>
    <commons.compress.version>1.10</commons.compress.version>
    <commons.io.version>2.4</commons.io.version>
    <slf4j.version>1.7.12</slf4j.version>
    <pax.exam.version>4.4.0</pax.exam.version>
  </properties>

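Note that project.reporting.outputEncoding is set to
${project.build.sourceEncoding}, but project.build.sourceEncoding itself
is not defined anywhere in that properties block. When it is unset, Maven
plugins fall back to the platform default encoding, so the result depends
on the machine doing the build.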

Is there a reason for this? I can go ahead and switch it to be explicitly 
UTF-8, in the 2.x branch.
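
For concreteness, the change I have in mind would look roughly like the
following. This is an untested sketch: only the sourceEncoding line is new,
its position within the block is my assumption, and the other existing
properties would stay as they are.

  <properties>
    <!-- proposed addition: define the source encoding explicitly -->
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- existing line, which would then resolve to UTF-8 -->
    <project.reporting.outputEncoding>${project.build.sourceEncoding}</project.reporting.outputEncoding>
  </properties>

With that in place, the reporting encoding would no longer depend on an
undefined property.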

Thanks,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
