Hi devs,

I’m trying to remember how Tika’s mime-type detection for plain text files 
has evolved.

Currently, if I run a Shift-JIS encoded file (suffix “.env”) through Tika, it 
gets detected as application/octet-stream.

I thought that previously we had a check along these lines: if the only bytes 
in the 0x00-0x1F range were tab/LF/CR (so no other control chars), and there 
was a reasonable number of line ending chars, then we’d return text/plain 
instead of application/octet-stream. Does anyone remember whether that was 
the case?
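
To make the check concrete, here is a rough sketch of the heuristic as I 
remember it. This is not Tika's actual code: the class name, method names and 
the line-ending threshold are all made up for illustration, and it assumes 
the caller passes in a prefix of the file's bytes.

```java
// Hypothetical sketch of the heuristic described above, not Tika's code.
final class TextHeuristic {

    // Assumed threshold: at least one CR/LF per this many bytes counts as
    // "a reasonable number of line endings". The value is a guess.
    static final int MAX_BYTES_PER_LINE_ENDING = 1000;

    // True if the data has no control bytes in 0x00-0x1F other than
    // tab/LF/CR, and contains enough line-ending bytes for its length.
    static boolean looksLikeText(byte[] data) {
        if (data.length == 0) {
            return false;
        }
        int lineEndings = 0;
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 0x20) {
                if (v == '\n' || v == '\r') {
                    lineEndings++;
                } else if (v != '\t') {
                    return false; // some other control char: assume binary
                }
            }
        }
        return (long) lineEndings * MAX_BYTES_PER_LINE_ENDING >= data.length;
    }

    // Maps the heuristic result onto the two types mentioned above.
    static String guessType(byte[] prefix) {
        return looksLikeText(prefix) ? "text/plain" : "application/octet-stream";
    }
}
```

Bytes at 0x80 and above are deliberately ignored, so multi-byte encodings 
such as Shift-JIS would pass as long as they contain no stray control bytes.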

Thanks,

— Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
