Found the solution. We have to set captureAttr=true and uprefix=ignored_ in solrconfig.xml:

<requestHandler class="org.apache.solr.handler.extraction.ExtractingRequestHandler" name="/update/extract">
  <lst name="defaults">
    <str name="fmap.content">DocContentS</str>
    <str name="lowernames">false</str>
    <str name="captureAttr">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
</requestHandler>

Then, in schema.xml, add:

<dynamicField name="ignored_*" type="ignored" multiValued="true" />
<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

On Fri, 3 Feb 2023 at 16:50, Sergio García Maroto <marot...@gmail.com> wrote:

> Hi,
>
> I am indexing documents using Tika and the ExtractingRequestHandler.
>
> <requestHandler class="org.apache.solr.handler.extraction.ExtractingRequestHandler" name="/update/extract">
>   <lst name="defaults">
>     <str name="fmap.content">DocContentS</str>
>     <str name="lowernames">false</str>
>   </lst>
>   <lst name="date.formats">
>     <str>yyyy-MM-dd</str>
>   </lst>
> </requestHandler>
>
> After indexing, I see that my field DocContentS contains not only the text
> of the documents but also metadata, like:
>
> \n \n stream_size 204286 \n X-Parsed-By
> org.apache.tika.parser.DefaultParser \n X-Parsed-By
> org.apache.tika.parser.pkg.PackageParser \n stream_content_type
> application/zip \n stream_name http://localhost:808
>
> I tried to use extractFormat=text, but I realised it only works when
> using extractOnly=true, and in that case it only returns the text
> without indexing it.
>
> Is there any way of extracting only the text when the data gets indexed?
>
> Thanks and Regards,
> Sergio Maroto