Hi again,

Bumping this thread, as I still haven't been able to find the root cause of this issue. Perhaps I need to upgrade to 3.0 first? Any ideas would be appreciated.
I opened https://issues.apache.org/jira/browse/CASSANDRA-15172 with more details.

Thanks!

On Thu, Jun 6, 2019 at 5:31 AM Jonathan Koppenhofer <j...@koppedomain.com> wrote:

> Not sure why repair is running, but we are also seeing the same
> merkle tree issue in a mixed-version cluster in which we intentionally
> started a repair against 2 upgraded DCs. We are currently researching and
> can post back if we find the cause, but would also appreciate it if someone
> has a suggestion. We have also run a local repair in an upgraded DC in this
> same mixed-version cluster without issue.
>
> We are going from 2.1.x to 3.0.x... and yes, we know you are not supposed to
> run repairs in mixed-version clusters, so don't do it :) This is a special
> circumstance where other things have gone wrong.
>
> Thanks
>
> On Wed, Jun 5, 2019, 5:23 PM shalom sagges <shalomsag...@gmail.com> wrote:
>
>> If anyone has any idea what might cause this issue, it'd be great.
>>
>> I don't understand what could trigger this exception.
>> But what I really can't understand is why repairs suddenly started to run :-\
>> There's no cron job running, no active repair process, no Validation
>> compactions, and Reaper is turned off... I see repair running only in the
>> logs.
>>
>> Thanks!
>>
>> On Wed, Jun 5, 2019 at 2:32 PM shalom sagges <shalomsag...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I'm in a bad situation: after upgrading 2 nodes (binaries only)
>>> from 2.1.21 to 3.11.4, I'm getting a lot of warnings like the following:
>>>
>>> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-5,5,main]: {}
>>> java.lang.ArrayIndexOutOfBoundsException: null
>>>
>>> I also see errors on repairs, but no repair is running at all. I verified
>>> this with the ps -ef command and nodetool compactionstats.
>>> The error I see is:
>>>
>>> Failed creating a merkle tree for [repair #a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], /1.2.3.4 (see log for details)
>>>
>>> I saw repair errors on data tables as well.
>>> nodetool status shows all nodes as UN, and nodetool describecluster shows
>>> two schema versions, as expected.
>>>
>>> After the warnings appeared, clients started to get timed-out read/write
>>> queries.
>>> Restarting the 2 nodes solved the clients' connection issues, but the
>>> warnings are still being generated in the logs.
>>>
>>> Has anyone encountered such an issue, and does anyone know what this means?
>>>
>>> Thanks!