[ https://issues.apache.org/jira/browse/CASSANDRA-19427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821676#comment-17821676 ]
Stefan Miklosovic commented on CASSANDRA-19427:
-----------------------------------------------

[CASSANDRA-19427-5.0|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19427-5.0]

{noformat}
java17_pre-commit_tests
java17_separate_tests
java11_pre-commit_tests
  ✓ j11_build                               8m 44s
  ✓ j11_cqlsh_dtests_py311                  8m 48s
  ✓ j11_cqlsh_dtests_py311_vnode            9m 9s
  ✓ j11_cqlsh_dtests_py38                   9m 53s
  ✓ j11_cqlsh_dtests_py38_vnode             9m 55s
  ✓ j11_cqlshlib_cython_tests               12m 27s
  ✓ j11_cqlshlib_tests                      10m 19s
  ✓ j11_dtests                              35m 5s
  ✓ j11_dtests_vnode                        33m 46s
  ✓ j11_jvm_dtests                          19m 26s
  ✓ j11_jvm_dtests_vnode                    16m 28s
  ✓ j11_simulator_dtests                    6m 29s
  ✓ j11_unit_tests                          16m 13s
  ✓ j11_utests_oa                           15m 25s
  ✓ j11_utests_system_keyspace_directory    14m 45s
  ✓ j17_cqlsh_dtests_py311                  6m 27s
  ✓ j17_cqlsh_dtests_py311_vnode            6m 17s
  ✓ j17_cqlsh_dtests_py38                   6m 5s
  ✓ j17_cqlsh_dtests_py38_vnode             6m 18s
  ✓ j17_cqlshlib_cython_tests               7m 56s
  ✓ j17_cqlshlib_tests                      6m 58s
  ✓ j17_dtests                              31m 6s
  ✓ j17_dtests_vnode                        34m 14s
  ✓ j17_jvm_dtests                          18m 53s
  ✓ j17_jvm_dtests_vnode                    16m 34s
  ✓ j17_unit_tests                          17m 52s
  ✓ j17_utests_oa                           16m 31s
java11_separate_tests
{noformat}

[java17_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3934/workflows/123d3f12-1fbf-46d4-a621-d4c7270a5a67]
[java17_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3934/workflows/9cd4ca93-e60a-4fce-9899-70a8b76c855d]
[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3934/workflows/7102e866-10e3-4847-8480-64c932810416]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3934/workflows/29ac91cb-d626-4abc-bf75-14c21d490e9e]

> Fix concurrent access of ClientWarn causing AIOBE for SELECT WHERE IN queries
> with multiple coordinator-local partitions
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19427
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19427
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Legacy/Local Write-Read Paths
>            Reporter: Abe Ratnofsky
>            Assignee: Abe Ratnofsky
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x, 4.1.x, 5.0.x, 5.x
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> On one of our clusters, we noticed rare but periodic
> ArrayIndexOutOfBoundsExceptions:
>
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-3,5,main]"
> exception="java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException"
> {code}
>
> The error was in a Runnable, so the stacktrace didn't directly indicate where
> the error was coming from.
> We enabled JFR to log the underlying exception that was thrown:
>
> {code:java}
> message="Uncaught exception on thread Thread[ReadStage-2,5,main]"
> exception="java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2579)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 0
>     at java.base/java.util.ArrayList.add(ArrayList.java:487)
>     at java.base/java.util.ArrayList.add(ArrayList.java:499)
>     at org.apache.cassandra.service.ClientWarn$State.add(ClientWarn.java:84)
>     at org.apache.cassandra.service.ClientWarn$State.access$000(ClientWarn.java:77)
>     at org.apache.cassandra.service.ClientWarn.warn(ClientWarn.java:51)
>     at org.apache.cassandra.db.ReadCommand$1MetricRecording.onClose(ReadCommand.java:596)
>     at org.apache.cassandra.db.transform.BasePartitions.runOnClose(BasePartitions.java:70)
>     at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
>     at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2260)
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2575)
>     ... 6 more"
> {code}
>
> An AIOBE on ArrayList.add(E) should only be possible when multiple threads
> attempt to call the method at the same time.
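
For illustration, the kind of race described above can be sketched outside Cassandra. This is a minimal standalone example, not code from the ticket: the class name UnsafeAddSketch, the Result holder, and the thread and iteration counts are all hypothetical. Several threads call add on one shared ArrayList with no synchronization. The outcome is nondeterministic: a given run may silently lose elements, throw ArrayIndexOutOfBoundsException inside add, or happen to finish cleanly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical standalone demo; it does not use any Cassandra classes.
final class UnsafeAddSketch
{
    /** Outcome of one racy run: how many add calls threw, and the final list size. */
    static final class Result
    {
        final int failedAdds;
        final int finalSize;

        Result(int failedAdds, int finalSize)
        {
            this.failedAdds = failedAdds;
            this.finalSize = finalSize;
        }
    }

    static Result run(int threads, int addsPerThread)
    {
        // Shared list with no synchronization: this is the unsafe part.
        List<String> warnings = new ArrayList<>();
        AtomicInteger failedAdds = new AtomicInteger();

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++)
                {
                    try
                    {
                        warnings.add("tombstone warning");
                    }
                    catch (ArrayIndexOutOfBoundsException e)
                    {
                        // The failure mode reported in the ticket; count it and carry on.
                        failedAdds.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }

        for (Thread worker : workers)
        {
            try
            {
                worker.join();
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new Result(failedAdds.get(), warnings.size());
    }

    public static void main(String[] args)
    {
        int threads = 4;
        int addsPerThread = 100_000;
        Result result = run(threads, addsPerThread);
        System.out.println("expected size: " + (threads * addsPerThread));
        System.out.println("actual size:   " + result.finalSize);
        System.out.println("failed adds:   " + result.failedAdds);
    }
}
```

When the race does show up, the actual size falls short of the expected size, the failed-add counter is non-zero, or both. A single-threaded run (threads = 1) always reports the expected size and zero failures.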
>
> This was seen while executing a SELECT WHERE IN query with multiple partition
> keys. This exception could happen when multiple local reads are dispatched by
> the coordinator in
> org.apache.cassandra.service.reads.AbstractReadExecutor#makeRequests. In this
> case, multiple local reads exceed the tombstone warning threshold, so
> multiple tombstone warnings are added to the same ClientWarn.State reference.
> Currently, org.apache.cassandra.service.ClientWarn.State#warnings is an
> ArrayList, which isn't safe for concurrent modification, causing the AIOBE to
> be thrown.
>
> I have a patch available for this, and I'm preparing it now. The patch is
> simple - it just changes
> org.apache.cassandra.service.ClientWarn.State#warnings to a thread-safe
> CopyOnWriteArrayList. I also have a jvm-dtest that demonstrates the issue but
> doesn't need to be merged - it shows how a SELECT WHERE IN query with local
> reads that add client warnings can add to the same ClientWarn.State from
> different threads. I'll push that in a separate branch just for demonstration
> purposes.
>
> Demonstration branch:
> [https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19427-aiobe-clientwarn-demo]
> Fix branch:
> [https://github.com/apache/cassandra/compare/trunk...aratno:cassandra:CASSANDRA-19427-aiobe-clientwarn-fix]
> (PR linked below)
>
> This appears to have been an issue since at least 3.11, which was the earliest
> release I checked.
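
As a rough illustration of the approach the description proposes, the sketch below has several threads append warnings to one shared CopyOnWriteArrayList. It is a standalone example under stated assumptions, not the actual patch: the class name CopyOnWriteWarningsSketch, the collect method, and the warning strings are hypothetical, and the real ClientWarn.State is not reproduced here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical standalone demo; it only mirrors the idea of swapping the list type.
final class CopyOnWriteWarningsSketch
{
    /** Starts the given number of threads, each adding warnings to one shared list, and returns the final size. */
    static int collect(int threads, int warningsPerThread)
    {
        // Thread-safe list: each add copies the backing array while holding an internal lock.
        List<String> warnings = new CopyOnWriteArrayList<>();

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < warningsPerThread; i++)
                    warnings.add("warning " + i + " from local read " + id);
            });
            workers[t].start();
        }

        for (Thread worker : workers)
        {
            try
            {
                worker.join();
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return warnings.size();
    }

    public static void main(String[] args)
    {
        // No add is lost and none throws, so the size equals threads * warningsPerThread.
        System.out.println("collected warnings: " + collect(8, 50));
    }
}
```

CopyOnWriteArrayList copies its backing array on every write, so it tends to suit lists that stay small and are read more often than they are written. Whether that trade-off holds for every workload that emits client warnings is an assumption of this sketch rather than something the ticket states.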