Hi Grant,

Sorry, I had already posted on the solr-user mailing list. Since the problem seemed to have more to do with the SVN version, I thought I would post here as well. Here is a link to my mail: http://www.mail-archive.com/[email protected]/msg34964.html

As for the request handler:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">content</str>
    <str name="lowernames">false</str>
    <str name="uprefix">tika_</str>
    <str name="defaultField">content</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
  </lst>
</requestHandler>

The fields are basic indexed, multivalued strings for testing purposes.

I would gladly help debug Solr or Tika, but I can't really compile Tika: it uses Maven, which cannot get through my company's proxy to download dependencies. I can debug Solr, though.

Marc
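One way to check whether Tika produces any text inside Solr, independent of the fmap/uprefix settings above, is to call the handler with `extractOnly=true`. With that parameter, ExtractingRequestHandler returns the extracted content instead of indexing it. The Python sketch below is a minimal illustration, not part of the original report. It assumes a local instance at `http://localhost:8983/solr`, and `sample.pdf` is a hypothetical file name. The raw PDF is sent as the request body with a `Content-Type: application/pdf` header.

```python
import urllib.parse
import urllib.request

# Assumed base URL of a local Solr instance; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def build_extract_url(base: str, doc_id: str, extract_only: bool = True) -> str:
    """Build a URL for the /update/extract handler.

    extractOnly=true asks ExtractingRequestHandler to return what Tika
    extracted rather than index it, which shows whether parsing works at all.
    """
    params = {"literal.id": doc_id}
    if extract_only:
        params["extractOnly"] = "true"
    return base.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def post_pdf(path: str, url: str) -> str:
    """POST the raw PDF bytes as the request body and return Solr's response text."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/pdf"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    url = build_extract_url(SOLR_BASE, "doc1")
    print(url)
    # print(post_pdf("sample.pdf", url))  # hypothetical file; needs a running Solr
```

If the response comes back with an empty content section even though the standalone Tika app extracts text from the same file, the parsers are probably not being picked up on Solr's classpath. That would be consistent with the result not changing when tika-parsers-0.7.jar is removed.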
From: [email protected]
Subject: Re: Problem with PDF extraction
Date: Mon, 26 Apr 2010 18:08:04 -0400
To: [email protected]

Hi Marc,

Can you ask on [email protected] and give more information about any errors that occur in your Solr log, plus the setup of the ExtractingRequestHandler and the related schema?

-Grant

On Apr 26, 2010, at 5:04 PM, Marc Ghorayeb wrote:

Hello,

I have been having problems with PDFs randomly crashing the Solr 1.4 server, so I tried out the SVN version, which contains a newer Tika library. On its own, the Tika app correctly extracts the content of my PDF. However, inside Solr, when I upload a PDF file to my update/extract handler, it does not seem to parse it (a blank file is output). The literal values do get indexed, though.

I have had no luck getting the Tika parsing to work. For some reason, I get the same result whether or not tika-parsers-0.7.jar is present in the lib folder, whereas if the tika-core-0.7 jar is absent, it just crashes (which seems normal to me). I don't seem to be the only one having this problem on the user mailing list. Can anyone help me out? It would be greatly appreciated.

I use a fairly classic schema and the default request handlers.

Marc Ghorayeb

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
