Hi,

I am indexing documents using Tika and the ExtractingRequestHandler, configured as follows:

<requestHandler class="org.apache.solr.handler.extraction.ExtractingRequestHandler" name="/update/extract">
  <lst name="defaults">
    <str name="fmap.content">DocContentS</str>
    <str name="lowernames">false</str>
  </lst>
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
</requestHandler>

After indexing, I see that my field DocContentS contains not only the text of the documents but also metadata, such as:

  stream_size 204286
  X-Parsed-By org.apache.tika.parser.DefaultParser
  X-Parsed-By org.apache.tika.parser.pkg.PackageParser
  stream_content_type application/zip
  stream_name http://localhost:808

I tried using extractFormat=text, but I realised it only works together with extractOnly=true, and in that case it only returns the extracted text.

Is there any way of extracting only the text when the data gets indexed?

Thanks and regards,
Sergio Maroto