Thanks Erick,
I have already gone through the Tika example link you shared.
Please look at the code in bold: I believe the entire content is still
held in memory, since the handler accumulates everything in the chunks list.
Sorry for copying the lengthy code from the Tika site.
Regards
Neo
*Streaming the plain text in chunks*
Sometimes you want to chunk the resulting text up, perhaps to output as
you go and minimise memory use, perhaps to output to HDFS files, or for
any other reason! With a small custom content handler, you can do that.
public List<String> parseToPlainTextChunks() throws IOException,
        SAXException, TikaException {
    // MAXIMUM_TEXT_CHUNK_SIZE is a constant defined elsewhere in the example class
    final List<String> chunks = new ArrayList<>();
    chunks.add("");
    // Append text to the current chunk; start a new chunk once the limit is hit
    ContentHandlerDecorator handler = new ContentHandlerDecorator() {
        @Override
        public void characters(char[] ch, int start, int length) {
            String lastChunk = chunks.get(chunks.size() - 1);
            String thisStr = new String(ch, start, length);
            if (lastChunk.length() + length > MAXIMUM_TEXT_CHUNK_SIZE) {
                chunks.add(thisStr);
            } else {
                chunks.set(chunks.size() - 1, lastChunk + thisStr);
            }
        }
    };
    AutoDetectParser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    try (InputStream stream =
            ContentHandlerExample.class.getResourceAsStream("test2.doc")) {
        *parser.parse(stream, handler, metadata);*
        return chunks;
    }
}
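The concern is fair: the chunks list in the example above retains all of the
extracted text until parsing finishes. A minimal sketch of the alternative is a
handler that flushes each completed chunk downstream (a file, an HDFS stream,
etc.) instead of accumulating it. To stay self-contained this sketch extends
the JDK's SAX DefaultHandler rather than Tika's ContentHandlerDecorator; the
class name FlushingTextHandler and the size limit are hypothetical. With Tika
on the classpath you would extend ContentHandlerDecorator the same way and pass
the handler to parser.parse(stream, handler, metadata).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: writes each chunk out as soon as it reaches the
// size limit, so at most maxChunkSize characters are buffered at a time.
class FlushingTextHandler extends DefaultHandler {
    private final Writer out;
    private final int maxChunkSize;
    private final StringBuilder buffer = new StringBuilder();

    FlushingTextHandler(Writer out, int maxChunkSize) {
        this.out = out;
        this.maxChunkSize = maxChunkSize;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
        // Flush full chunks immediately instead of collecting them in a list
        while (buffer.length() >= maxChunkSize) {
            flushChunk(maxChunkSize);
        }
    }

    @Override
    public void endDocument() {
        // Flush whatever partial chunk remains at the end of the document
        if (buffer.length() > 0) {
            flushChunk(buffer.length());
        }
    }

    private void flushChunk(int size) {
        try {
            out.write(buffer.substring(0, size)); // emit one chunk downstream
            out.flush();
            buffer.delete(0, size);               // drop it from memory
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the handler pushes text out as it arrives, memory use stays bounded by
the chunk size regardless of document length; the trade-off is that you can no
longer return a List of chunks from the method.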
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html