[ 
https://issues.apache.org/jira/browse/SOLR-10806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039054#comment-16039054
 ] 

Erick Erickson commented on SOLR-10806:
---------------------------------------

[~jpountz] [~thetaphi] [~mikemccand] Any insights here, given that the error 
is coming from Lucene?

> Solr Replica goes down with NumberFormatException: Invalid shift value (64) 
> in prefixCoded bytes (is encoded value really an INT?)
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-10806
>                 URL: https://issues.apache.org/jira/browse/SOLR-10806
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.3
>            Reporter: Sachin Goyal
>
> Our Solr nodes go down within 20-30 minutes of indexing.
> The load rate does not appear to be the cause, because the exception in the 
> logs points to a data problem:
> {color:darkred}
> INFO  - 2017-06-02 23:21:19.094; org.apache.solr.core.SolrCore; \[node-instances_shard2_replica3\] Registered new searcher Searcher@6740879c\[node-instances_shard2_replica3\] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_ne(6.3.0):C200591/8616:delGen=20) Uninverting(_wx(6.3.0):C72132/697:delGen=5) Uninverting(_y0(6.3.0):c5798/27:delGen=3) Uninverting(_yv(6.3.0):c10935/827:delGen=2) Uninverting(_z4(6.3.0):C4163/2277:delGen=1)))}
> ERROR - 2017-06-02 23:21:19.105; org.apache.solr.core.CoreContainer; Error waiting for SolrCore to be created
> java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core \[node-instances_shard2_replica3\]
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:526)
>         at org.apache.solr.core.CoreContainer$$Lambda$38/199449817.run(Unknown Source)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$9/1611272577.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: Unable to create core \[node-instances_shard2_replica3\]
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:855)
>         at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:498)
>         at org.apache.solr.core.CoreContainer$$Lambda$37/1402433372.call(Unknown Source)
>         ... 6 more
> Caused by: java.lang.NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)
>         at org.apache.lucene.util.LegacyNumericUtils.getPrefixCodedLongShift(LegacyNumericUtils.java:163)
>         at org.apache.lucene.util.LegacyNumericUtils$1.accept(LegacyNumericUtils.java:392)
>         at org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:232)
>         at org.apache.lucene.index.Terms.getMax(Terms.java:169)
>         at org.apache.lucene.util.LegacyNumericUtils.getMaxLong(LegacyNumericUtils.java:504)
>         at org.apache.solr.update.VersionInfo.getMaxVersionFromIndex(VersionInfo.java:233)
>         at org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1584)
>         at org.apache.solr.update.UpdateLog.seedBucketsWithHighestVersion(UpdateLog.java:1610)
>         at org.apache.solr.core.SolrCore.seedVersionBuckets(SolrCore.java:949)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:931)
>         at org.apache.solr.core.SolrCore.<init>(SolrCore.java:776)
>         at org.apache.solr.core.CoreContainer.create(CoreContainer.java:842)
>         ... 8 more
> {color}
> It does not seem right that a Solr node itself should go down because of 
> such a problem. The chain of errors is:
> # Error waiting for SolrCore to be created
> java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core
> # Unable to create core
> # NumberFormatException: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)
> i.e. core creation fails because of some confusion between long and 
> integer.
> If there is a data issue, Solr should report it with an exception during 
> ingestion.
> \\
> \\
> *UPDATE*:
> Another issue I see with the above problem is that the Solr cluster is 
> completely inaccessible.
> The Solr UI is also not coming up. I restarted the Solr servers and they 
> refuse to recover.
> I am not even able to delete the collections and create them afresh.
> It seems the only way out is to do an *rm -rf* and re-install.
> Note that this is not a network problem, as I can ssh to the Solr machines 
> and send messages to other Solr machines using nc.
> \\
> \\
> *UPDATE 2*:
> I had a 24-node cluster with 2 collections.
> Each collection used 6 nodes in a 2-shard, 3-replica configuration,
> so 12 of the 24 nodes were in use.
> The remaining 12 nodes ran Solr against the same ZooKeeper but hosted no 
> collections or cores.
> After the above errors began to happen, the Solr UI of all 24 nodes became 
> unresponsive!
> So I tried the delete-collection API from the command line - no response.
> Ultimately I ran the delete-collection call from the command line in a loop 
> and it deleted a part of the collection.
> Then I had to manually delete the *<coreName>/data/index/write.lock* file on 
> some nodes to purge those bad collections.
> It's been a few hours since then. There are no collections, and a few nodes 
> are still unresponsive, with the following messages in the logs:
> {color:brown}
> INFO  - 2017-06-03 06:40:51.308; org.apache.solr.core.SolrCore; Core sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking again.
> INFO  - 2017-06-03 06:40:51.408; org.apache.solr.core.SolrCore; Core sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking again.
> INFO  - 2017-06-03 06:40:51.508; org.apache.solr.core.SolrCore; Core sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking again.
> INFO  - 2017-06-03 06:40:51.608; org.apache.solr.core.SolrCore; Core sync-status_shard1_replica2 is not yet closed, waiting 100 ms before checking again.
> {color}
> It looks like a serious stability problem to me.
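
For readers puzzling over the "Invalid shift value (64)" message in the trace above, here is a minimal sketch of how such a value can arise. It is not Lucene's source. It assumes the legacy prefix-coded format marks long terms with a first byte starting at 0x20 and int terms with a first byte starting at 0x60; the class and method names (PrefixShiftDemo, longShift) are made up for illustration. Under that assumption, decoding an int-encoded term as a long yields 0x60 - 0x20 = 64, which is outside the valid 0..63 range. Whether that is what happened in this index (for example, a field expected to be long but indexed as int) is not established by the report.

```java
// Illustrative sketch only: mimics the shift check, not Lucene's actual code.
class PrefixShiftDemo {
    // Assumed first-byte markers of the legacy prefix-coded encoding.
    static final byte SHIFT_START_LONG = 0x20;
    static final byte SHIFT_START_INT = 0x60;

    // Reads the shift from the first byte, treating the term as a long.
    static int longShift(byte[] term) {
        int shift = term[0] - SHIFT_START_LONG;
        if (shift > 63 || shift < 0) {
            throw new NumberFormatException("Invalid shift value (" + shift
                + ") in prefixCoded bytes (is encoded value really an INT?)");
        }
        return shift;
    }

    public static void main(String[] args) {
        byte[] longTerm = { SHIFT_START_LONG }; // long-encoded, shift 0
        byte[] intTerm = { SHIFT_START_INT };   // int-encoded, shift 0

        System.out.println("long term shift = " + longShift(longTerm));
        try {
            longShift(intTerm); // int term read as a long
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading is right, a first thing to check would be whether any segment holds int-encoded terms for the field that VersionInfo.getMaxVersionFromIndex scans.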


