RE: fetching content from archives and images

Maciej Liżewski Mon, 07 Jan 2013 07:29:57 -0800

Could you provide an example of how to use it to recursively index the files
in an archive? Let's say I have archive.zip with 3 files: file.txt, file.doc,
file.pdf. I would like the output to contain the text content of all those
files. I am not very familiar with Tika, I just use it as the extract handler
in a Solr server, so more specific help would be appreciated.
Thanks in advance.

Maciek

-----Original Message-----
From: Nick Burch [mailto:[email protected]] 
Sent: Monday, January 07, 2013 2:58 PM
To: Maciej Liżewski
Cc: [email protected]
Subject: RE: fetching content from archives and images

On Mon, 7 Jan 2013, Maciej Liżewski wrote:
> And is there some default parser to recursively index all files in
> archive?

You can just use AutoDetectParser if you don't need any special handling.
I think a lot of people have a small custom parser that outputs some special
markup / flags it in some way, then delegates to AutoDetectParser to handle
the contents.

Nick
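
For reference, a minimal sketch of the AutoDetectParser approach described
above, assuming the Tika parsers are on the classpath. The class name
ArchiveText and the method extract are made up for illustration. The key step
is registering the parser in the ParseContext, so that each entry found inside
the archive is handed back to the same AutoDetectParser. Newer Tika releases
may do this by default; setting it explicitly should be harmless.

```java
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, not part of Tika.
final class ArchiveText {

    // Returns the plain text of the document and of everything embedded in it.
    static String extract(InputStream in) throws Exception {
        Parser parser = new AutoDetectParser();

        // Registering the parser here is what makes the parse recursive:
        // embedded documents (e.g. zip entries) are passed back to it.
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);

        // -1 disables the default limit on the amount of text collected.
        BodyContentHandler handler = new BodyContentHandler(-1);

        parser.parse(in, handler, new Metadata(), context);
        return handler.toString();
    }
}
```

For the archive.zip case from the question, the stream could come from
Files.newInputStream(Paths.get("archive.zip")); the returned string should then
hold the text of file.txt, file.doc and file.pdf one after another. Marking
where each file starts would need the custom handling Nick mentions.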
