Hi,

I'm using ExtractingRequestHandler (Solr Cell) to extract data from
files.  Uploading images (jpeg, tiff, png) fails with an error, while
PDF, doc, xlsx, etc. work without problems.  As far as I can tell, the
server decides that a date value (probably the archiveDate_dt field,
because its value is closest to the date being rejected) isn't a
java.util.Date object.  Somewhere the Date is being converted into a date
string, that string is handed to DateMathParser, which can't parse it,
and kablooey.

I'm on Solr 6.6.6, and I'm using SolrJ to communicate with the server.  I
believe I'm calling the server properly (i.e. using the Solr date format,
etc.).  Remember, it works just fine for PDFs and other document types.
Dates are modeled exactly the same way for those documents.  The only
difference between the files that work and the files that don't is the
content of the file (doc vs. image).  I'm not doing anything fancy on the
server either; this is pretty close to the default configuration when it
comes to the ExtractingRequestHandler.

Any ideas on how to fix this would be very much appreciated.

thanks
Charlie

Here is the request and the exception that is happening in Solr:

2021-05-19 04:31:23.744 INFO  (qtp1543727556-21) [   x:igloo]
o.a.s.u.p.LogUpdateProcessorFactory [igloo]  webapp=/solr
path=/update/extract
params={literal.archiveDate_dt=2021-05-19T04:31:23.688Z&literal.categories=ONBEE&literal.categories=Uploaded&literal._type=document&literal._accountId=7&
literal.id=c308556b-e5f7-4d26-92d6-e6aa7648b2b2&version=2&literal._employeeNumber=PFC010001&literal.organization_id_i=114277&literal.org_level_1_name_s=XXX&literal.org_level_2_name_s=YYYY&literal._batchId=1038&literal.effectiveDate_dt=2019-01-29T05:00:00.000Z&literal._filename=015494_CKEQPQ0I8010.jpeg&wt=javabin{}
0 37

2021-05-19 04:31:23.746 ERROR (qtp1543727556-21) [   x:igloo]
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Invalid
Date String:'Wed May 19 04:31:23 +00:00 2021'
        at
org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:234)
        at org.apache.solr.schema.TrieField.createField(TrieField.java:644)
        at org.apache.solr.schema.TrieField.createFields(TrieField.java:681)
        at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:72)
        at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:179)
        at
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:102)
        at
org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:922)
        at
org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:913)
        at
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
        at
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
        at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
        at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
        at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
        at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
        at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:126)
        at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:131)
        at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:237)
        at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
        at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:724)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:530)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
        at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)
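
To make the mismatch concrete, here is a minimal sketch (plain JDK, no
Solr classes) comparing the value I send in literal.archiveDate_dt with
the string from the exception.  It uses java.time.Instant.parse only as a
stand-in for a strict ISO-8601 check; it is not Solr's DateMathParser, and
the class and method names (DateCheck, isIsoInstant, toIsoInstant) are
made up for illustration.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Date;

public class DateCheck {

    // True when s is a strict ISO-8601 UTC instant, e.g. 2021-05-19T04:31:23.688Z.
    // Stand-in check only: this does NOT call Solr's DateMathParser.
    public static boolean isIsoInstant(String s) {
        try {
            Instant.parse(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Formats a java.util.Date as an ISO-8601 UTC instant string.
    public static String toIsoInstant(Date d) {
        return DateTimeFormatter.ISO_INSTANT.format(d.toInstant());
    }

    public static void main(String[] args) {
        // Value sent in the request parameters.
        System.out.println(isIsoInstant("2021-05-19T04:31:23.688Z"));
        // Value quoted in the "Invalid Date String" exception.
        System.out.println(isIsoInstant("Wed May 19 04:31:23 +00:00 2021"));
        // What a Date looks like once formatted as an ISO instant.
        System.out.println(toIsoInstant(new Date(0L)));
    }
}
```

The first string is what my client sends; the second is not in that
format at all, so it must be produced somewhere after the request is
received.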

Here is my solrconfig.xml:

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
    <lst name="defaults">
        <str name="uprefix">ignored_</str>

        <!-- capture link hrefs but ignore div attributes -->
        <str name="captureAttr">true</str>
        <str name="fmap.a">attr_links</str>
        <str name="fmap.div">ignored_</str>
        <str name="fmap.p">ignored_</str>
        <str name="fmap.img">ignored_</str>
        <str name="fmap.meta">ignored_</str>
    </lst>
    <!--<str name="tika.config">tika.config</str>-->
</requestHandler>

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">

<processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
        <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
        <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
        <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
        <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
        <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
        <str>yyyy-MM-dd'T'HH:mm:ss</str>
        <str>yyyy-MM-dd'T'HH:mmZ</str>
        <str>yyyy-MM-dd'T'HH:mm</str>
        <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
        <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
        <str>yyyy-MM-dd HH:mm:ss.SSS</str>
        <str>yyyy-MM-dd HH:mm:ss,SSS</str>
        <str>yyyy-MM-dd HH:mm:ssZ</str>
        <str>yyyy-MM-dd HH:mm:ss</str>
        <str>yyyy-MM-dd HH:mmZ</str>
        <str>yyyy-MM-dd HH:mm</str>
        <str>yyyy-MM-dd</str>
        <str>dd-MMM-yyyy</str>
        <str>dd-MMM-yy</str>
    </arr>

</processor>
