Hi, I'm using ExtractingRequestHandler (Solr Cell) to extract data from files, and I'm getting an error when uploading images (jpeg, tiff, png). PDF, doc, xlsx, etc. upload without problems. What seems to be happening is that the server decides the date (probably the archiveDate_dt field, because its value is closest to the date being rejected) isn't a java.util.Date object, so it falls into DateMathParser. Somewhere along the way the Date is being converted into a date string that DateMathParser can't parse, and kablowie.
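To illustrate what I mean by "a date string that can't be parsed", here is a minimal sketch in plain java.time. It is not Solr's actual DateMathParser code; I'm only assuming that Solr expects a strict ISO-8601 instant for date fields, and the class and method names (DateStringCheck, isIsoInstant) are made up for this example. It compares the value I send with the value from the exception message.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

public class DateStringCheck {

    // Returns true if the value is a strict ISO-8601 instant,
    // e.g. 2021-05-19T04:31:23.688Z (assumed here to be what Solr wants).
    static boolean isIsoInstant(String value) {
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The literal I send in the request.
        System.out.println(isIsoInstant("2021-05-19T04:31:23.688Z"));        // true
        // The string reported in the "Invalid Date String" exception.
        System.out.println(isIsoInstant("Wed May 19 04:31:23 +00:00 2021")); // false
    }
}
```

So the value I send is fine on its own; the rejected string is in a different format altogether, which is why I suspect it is being produced somewhere on the server side.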
I'm on Solr 6.6.6, and I'm using SolrJ to communicate with the server. I believe I'm calling the server properly (i.e. using the Solr date format, etc.). Remember, it's working just fine for PDFs and other document types, and dates are modeled exactly the same way for those documents. The only difference between the files that work and the files that don't is the content of the file (doc vs. image). I'm not doing anything fancy on the server either; this is pretty close to the default configuration when it comes to the ExtractingRequestHandler.

Any ideas on how to fix this would be very much appreciated.

Thanks,
Charlie

Here is the request and the exception that is happening in Solr:

2021-05-19 04:31:23.744 INFO (qtp1543727556-21) [ x:igloo] o.a.s.u.p.LogUpdateProcessorFactory [igloo] webapp=/solr path=/update/extract params={literal.archiveDate_dt=2021-05-19T04:31:23.688Z&literal.categories=ONBEE&literal.categories=Uploaded&literal._type=document&literal._accountId=7&literal.id=c308556b-e5f7-4d26-92d6-e6aa7648b2b2&version=2&literal._employeeNumber=PFC010001&literal.organization_id_i=114277&literal.org_level_1_name_s=XXX&literal.org_level_2_name_s=YYYY&literal._batchId=1038&literal.effectiveDate_dt=2019-01-29T05:00:00.000Z&literal._filename=015494_CKEQPQ0I8010.jpeg&wt=javabin{} 0 37
2021-05-19 04:31:23.746 ERROR (qtp1543727556-21) [ x:igloo] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Invalid Date String:'Wed May 19 04:31:23 +00:00 2021'
    at org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:234)
    at org.apache.solr.schema.TrieField.createField(TrieField.java:644)
    at org.apache.solr.schema.TrieField.createFields(TrieField.java:681)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:72)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:179)
    at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:102)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:922)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:913)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:126)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:131)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:237)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:724)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:530)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:748)

Here is my solrconfig.xml:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">attr_links</str>
    <str name="fmap.div">ignored_</str>
    <str name="fmap.p">ignored_</str>
    <str name="fmap.img">ignored_</str>
    <str name="fmap.meta">ignored_</str>
  </lst>
  <!--<str name="tika.config">tika.config</str>-->
</requestHandler>

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd'T'HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd'T'HH:mm:ssZ</str>
      <str>yyyy-MM-dd'T'HH:mm:ss</str>
      <str>yyyy-MM-dd'T'HH:mmZ</str>
      <str>yyyy-MM-dd'T'HH:mm</str>
      <str>yyyy-MM-dd HH:mm:ss.SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss,SSSZ</str>
      <str>yyyy-MM-dd HH:mm:ss.SSS</str>
      <str>yyyy-MM-dd HH:mm:ss,SSS</str>
      <str>yyyy-MM-dd HH:mm:ssZ</str>
      <str>yyyy-MM-dd HH:mm:ss</str>
      <str>yyyy-MM-dd HH:mmZ</str>
      <str>yyyy-MM-dd HH:mm</str>
      <str>yyyy-MM-dd</str>
      <str>dd-MMM-yyyy</str>
      <str>dd-MMM-yy</str>
    </arr>
  </processor>