[ 
https://issues.apache.org/jira/browse/SOLR-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Hacker resolved SOLR-4090.
------------------------------

       Resolution: Duplicate
    Fix Version/s: 4.1
    
> Indexing mangles documents under load
> -------------------------------------
>
>                 Key: SOLR-4090
>                 URL: https://issues.apache.org/jira/browse/SOLR-4090
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0
>         Environment: Red Hat 5.8 x86_64, Oracle JDK 1.7.0_09
>            Reporter: Jay Hacker
>             Fix For: 4.1
>
>         Attachments: docs.json, docs.xml
>
>
> Sometimes when indexing documents with multiple concurrent processes, I get 
> intermittent data corruption errors.  The data submitted for indexing is 
> valid, but Solr will complain that it is malformed and return an HTTP 500 or 
> 400 error.  The errors are similar whether submitting as XML, JSON, or CSV.  
> The problem has not occurred using a single process (i.e., one process 
> POSTing to Solr), is rare with four processes, and is pretty reproducible 
> with 16 (or 64, or 128).  I've seen this problem since at least Solr 1.4; we 
> generally just use four threads and hope for the best.  It seems a bit more 
> common with Solr 4, so I decided to track it down.  
> I have dummy documents for the example ("collection1") schema that, when 
> posted using many simultaneous processes, often generate parsing errors.  I 
> use something like this to pound Solr with repeated POSTs of the same 
> document:
> {code}
> yes docs.xml | xargs -P64 -I{} curl -s --data-binary @{} --header \
>   'Content-type: text/xml' 'http://localhost:8983/solr/collection1/update'
> {code}
> Which fairly reliably gives this error:
> {noformat}
> SEVERE: org.apache.solr.common.SolrException: Unexpected '<'  in attribute value
>  at [row,col {unknown-source}]: [2765,74]
>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:159)
>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
>         at org.eclipse.jetty.server.Server.handle(Server.java:351)
>         at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
>         at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
>         at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
>         at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
>         at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
>         at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
>         at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: com.ctc.wstx.exc.WstxParsingException: Unexpected '<'  in attribute value
>  at [row,col {unknown-source}]: [2765,74]
>         at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
>         at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
>         at com.ctc.wstx.sr.BasicStreamReader.parseNormalizedAttrValue(BasicStreamReader.java:1951)
>         at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3037)
>         at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:2936)
>         at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2848)
>         at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
>         at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:370)
>         at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:229)
>         at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
>         ... 31 more
> {noformat}
> When posting the same data as JSON:
> {code}
> yes docs.json | xargs -P64 -I{} curl -s --data-binary @{} --header \
>   'Content-type: application/json' \
>   'http://localhost:8983/solr/collection1/update'
> {code}
> The errors look like this:
> {noformat}
> SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=3] multiple values 
> encountered for non multiValued field f0008_t: 
> [388888888888888888888888888888888888888888888888888, 
> 9888888888888888888888888888888888888888888888888888]
>         at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
>         at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:208)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:432)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:557)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:325)
>         at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>         at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:386)
>         at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:111)
>         at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:95)
>         at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:59)
>         at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
>         at org.eclipse.jetty.server.Server.handle(Server.java:351)
>         at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
>         at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
>         at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
>         at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
>         at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
>         at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
>         at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>         at java.lang.Thread.run(Thread.java:722)
> {noformat}
> I would like to emphasize that the stack traces are misleading: there is no 
> problem with the data. Often the exact same document will index just fine a 
> thousand times before throwing an error.  Somewhere between being submitted 
> and parsed, the data is being occasionally corrupted.  The errors are 
> intermittent, but when they do happen, they often complain about exactly the 
> same byte in the input.
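
Because the failures are intermittent, it may help to measure how often they occur rather than watch the server log. The sketch below is one possible variant of the reproduction command above: it has curl print only the HTTP status of each POST and then tallies the responses that are not 200. The function names `pound` and `count_failures` and the fixed request count are assumptions made for illustration, not part of the original report; `-o`, `-w '%{http_code}'`, and `xargs -P` are standard curl and xargs options.

```shell
# Read one HTTP status code per line on stdin and print how many are not 200.
# curl reports 000 when it gets no response at all, so that counts as a failure too.
count_failures() {
  awk '$1 != "200" { n++ } END { print n + 0 }'
}

# Hypothetical helper: POST FILE to URL a fixed number of times using PROCS
# parallel curl processes, printing one status code per request.
#   $1 = file to post      $2 = Content-type value   $3 = update URL
#   $4 = parallel procs    $5 = total number of requests
pound() {
  file=$1; ctype=$2; url=$3; procs=$4; requests=$5
  yes "$file" | head -n "$requests" |
    xargs -P"$procs" -I{} curl -s -o /dev/null -w '%{http_code}\n' \
      --data-binary @{} --header "Content-type: $ctype" "$url"
}
```

For example, `pound docs.xml text/xml 'http://localhost:8983/solr/collection1/update' 64 10000 | count_failures` would print the number of failed POSTs out of 10000, which makes it easier to compare runs with 1, 4, 16, or 64 processes.
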
> Here's what I know:
> * Small POSTs, or single large documents, don't seem to trigger the problem; 
> it seems to take multiple documents totaling ~150K or more.
> * More threads trigger the error more often.
> * Restricting the JVM heap to say -Xmx384M seems to trigger the problem more 
> often.
> * Setting {{commit=true}} with each post seems to alleviate the problem.
> * I can't seem to reproduce it with 3.6.1, or 4.1-2012-11-16_01-01-43 nightly 
> -- maybe related to 
> [SOLR-3621|https://issues.apache.org/jira/browse/SOLR-3621]?
> * To test if it was Jetty, I wrote a small servlet that just decodes JSON and 
> pounded it under both Jetty 8.1.2 and 8.1.7  without problems -- however, 
> [SOLR-4031|https://issues.apache.org/jira/browse/SOLR-4031] sounds awfully 
> similar, and...
> * If I unzip the 4.0 tarball, and replace jetty (example/{lib/,start.jar}) 
> with the jetty 8.1.7 from 4.1 nightly, it seems to fix the problem.
> * If I take Jetty 8.1.2 from 4.0 and use it with 4.1, it breaks 4.1.
> I'm posting this in case anyone runs into similar problems, and meanwhile I 
> will keep testing 4.1.
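
For anyone who wants to repeat the Jetty swap described in the last two bullets, a minimal sketch follows. It assumes both releases are unpacked side by side and that Jetty lives in `example/lib/` and `example/start.jar`, as the report states; the function name `swap_jetty` and the `.orig` backup names are hypothetical.

```shell
# Hypothetical helper: copy the Jetty runtime from one unpacked Solr tree
# into another, keeping the original files so the change can be reverted.
#   $1 = tree to take Jetty from   $2 = tree to modify
swap_jetty() {
  src=$1; dst=$2
  # Move the existing Jetty out of the way instead of deleting it.
  mv "$dst/example/lib" "$dst/example/lib.orig"
  mv "$dst/example/start.jar" "$dst/example/start.jar.orig"
  # Copy in the other release's Jetty jars and launcher.
  cp -R "$src/example/lib" "$dst/example/lib"
  cp "$src/example/start.jar" "$dst/example/start.jar"
}
```

Calling `swap_jetty <4.1-nightly-dir> <4.0-dir>` corresponds to the first experiment (Jetty 8.1.7 under 4.0), and reversing the arguments on fresh copies corresponds to the second; the directory names are placeholders for wherever the tarballs were unpacked.
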

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira