Hello, I'm trying to parse a file whose content type is UTF-16, but I'm unable to parse the document using the following code. Please help me.

    ContentHandler textHandler = new BodyContentHandler();
    TeeContentHandler teeHandler = new TeeContentHandler(textHandler);
    parser.parse(input, teeHandler, metadata, context);
    String tt = textHandler.toString();
    // to print the text
    byte[] converttoBytes = tt.getBytes("UTF-16");
    String string = new String(converttoBytes, "utf-8");
    System.out.println(string);

But it is printing the text along with all the HTML tags.

Thank you,
Rajesh Chejerla

--
View this message in context: http://lucene.472066.n3.nabble.com/AutoDetectParser-is-not-parsing-UTF-16-content-types-tp4004075.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
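
A note on the conversion step in the code above: a Java String is already decoded text, so encoding it with one charset and decoding the bytes with another changes the characters instead of "converting" them. The sketch below uses only the JDK and a hypothetical stand-in string in place of textHandler.toString(); the class and method names are made up for illustration. It only shows the effect of the charset mismatch and is not claimed to be the cause of the HTML tags appearing in the output.

```java
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Mirrors the conversion in the post: encode as UTF-16, decode as UTF-8.
    public static String mismatched(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16, StandardCharsets.UTF_8);
    }

    // Encode and decode with the same charset; this returns the original text.
    public static String matched(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the value of textHandler.toString().
        String extracted = "Hello";

        System.out.println(extracted.equals(matched(extracted)));    // prints "true"
        System.out.println(extracted.equals(mismatched(extracted))); // prints "false"
    }
}
```

Under these assumptions, printing the extracted string directly (System.out.println(tt)) would avoid the mismatch, since no byte round trip is needed once the parser has produced a String.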