[ https://issues.apache.org/jira/browse/CASSANDRA-18858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ke Han updated CASSANDRA-18858: ------------------------------- Description: After dropping the keyspace and then perform the drain, I met the following ERROR message {code:java} rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]] INFO [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 - Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh INFO [Native-Transport-Requests-6] 2023-09-09 02:06:40,549 MigrationManager.java:362 - Drop Keyspace 'uuid00bd53ebd0c84be39baf96c086b9f8d0' INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353 StorageService.java:1679 - DRAINING: starting drain process INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358 HintsService.java:210 - Paused hints dispatch INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364 Server.java:179 - Stop listening for CQL clients INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365 Gossiper.java:1747 - Announcing shutdown INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369 StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373 MessagingService.java:985 - Waiting for messaging service to quiesce INFO [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375 MessagingService.java:1346 - MessagingService has terminated the accept() thread ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384 StorageService.java:4889 - Caught an exception while draining java.lang.AssertionError: Unknown keyspace uuid00bd53ebd0c84be39baf96c086b9f8d0 at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106) at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92) at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89) at com.google.common.collect.Iterators$8.transform(Iterators.java:799) at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) at org.apache.cassandra.service.StorageService.drain(StorageService.java:4824) at org.apache.cassandra.service.StorageService.drain(StorageService.java:4749) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) at sun.rmi.transport.Transport$1.run(Transport.java:200) at sun.rmi.transport.Transport$1.run(Transport.java:197) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:196) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) INFO [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953 CassandraDaemon.java:576 - Cassandra shutting down... INFO [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 - Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml INFO [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node configuration:[allocate_tokens_for_keyspace=null; {code} This is a concurrency problem related to the metadata update. There are two threads interleaving: * Thread1 (meta-data update related thread): update the keyspace metadata * Thread2 (drain): flush all in-memory data to disk. Here's how it happened in detail Step1: The user issues a drop keyspace command. However, the update of the metadata gets delayed. Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing operation {code:java} protected synchronized void drain(boolean isFinalShutdown) throws IOException, InterruptedException, ExecutionException { for (Keyspace keyspace : Keyspace.nonSystem()) { for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores()) flushes.add(cfs.forceFlush()); }{code} keyspace.getColumnFamilyStores() returns a copy of the keyspaces {code:java} // Schema.java public List<String> getNonSystemKeyspaces() { return ImmutableList.copyOf(getNonSystemKeyspacesSet()); } private Set<String> getNonSystemKeyspacesSet() { return Sets.difference(keyspaces.keySet(), SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES); }{code} Step3: Thread1 (Metadata update related threads) updates the metadata and removes the keyspace Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs into exceptions since the keyspace has been dropped. The symptom is similar to CASSANDRA-18636, but the required triggering thread interleaving is different. For CASSANDRA-18636, it's the interleaving between _thread updating metadata_ and {_}thread estimating the size{_}. For this case, it's interleaving between the _thread updating the metadata_ and _thread performing the drain._ They might be fixed together with the [patch|https://github.com/apache/cassandra/pull/2457] by [Stefan Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic] if exhaust all inferences to the old keyspace copies. (The previous patch only targets at one interleaving situation, this ticked shows that there could be other cases). I have attached the system.log file. was: After dropping the keyspace and then perform the drain, I met the following ERROR message {code:java} rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]] INFO [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 - Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh INFO [Native-Transport-Requests-6] 2023-09-09 02:06:40,549 MigrationManager.java:362 - Drop Keyspace 'uuid00bd53ebd0c84be39baf96c086b9f8d0' INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353 StorageService.java:1679 - DRAINING: starting drain process INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358 HintsService.java:210 - Paused hints dispatch INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364 Server.java:179 - Stop listening for CQL clients INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365 Gossiper.java:1747 - Announcing shutdown INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369 StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373 MessagingService.java:985 - Waiting for messaging service to quiesce INFO [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375 MessagingService.java:1346 - MessagingService has terminated the accept() thread ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384 StorageService.java:4889 - Caught an exception while draining java.lang.AssertionError: Unknown keyspace uuid00bd53ebd0c84be39baf96c086b9f8d0 at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129) at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106) at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92) at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89) at com.google.common.collect.Iterators$8.transform(Iterators.java:799) at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) at org.apache.cassandra.service.StorageService.drain(StorageService.java:4824) at org.apache.cassandra.service.StorageService.drain(StorageService.java:4749) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) at sun.rmi.transport.Transport$1.run(Transport.java:200) at sun.rmi.transport.Transport$1.run(Transport.java:197) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:196) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) INFO [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953 CassandraDaemon.java:576 - Cassandra shutting down... INFO [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 - Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml INFO [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node configuration:[allocate_tokens_for_keyspace=null; {code} This is a concurrency problem related to the metadata update. There are two threads interleaving: * Thread1 (meta-data update related thread): update the keyspace metadata * Thread2 (drain): flush all in-memory data to disk. Here's how it happened in detail Step1: The user issues a drop keyspace command. However, the update of the metadata gets delayed. Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing operation {code:java} protected synchronized void drain(boolean isFinalShutdown) throws IOException, InterruptedException, ExecutionException { for (Keyspace keyspace : Keyspace.nonSystem()) { for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores()) flushes.add(cfs.forceFlush()); }{code} keyspace.getColumnFamilyStores() returns a copy of the keyspaces {code:java} // Schema.java public List<String> getNonSystemKeyspaces() { return ImmutableList.copyOf(getNonSystemKeyspacesSet()); } private Set<String> getNonSystemKeyspacesSet() { return Sets.difference(keyspaces.keySet(), SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES); }{code} Step3: Thread1 (Metadata update related threads) updates the metadata and removes the keyspace Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs into exceptions since the keyspace has been dropped. The symptoms is similar to CASSANDRA-18636, but the required triggering thread interleaving is different. For CASSANDRA-18636, it's the interleaving between _thread updating metadata_ and {_}thread estimating the size{_}. For this case, it's interleaving between the _thread updating the metadata_ and _thread performing the drain._ They might be fixed together with the [patch|https://github.com/apache/cassandra/pull/2457] by [Stefan Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic] if exhaust all inferences to the old keyspace copies. (The previous patch only targets at one interleaving situation, this ticked shows that there could be other cases). I have attached the system.log file. > AssertionError: Unknown keyspace after dropping a keyspace combined with > drain operation > ---------------------------------------------------------------------------------------- > > Key: CASSANDRA-18858 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18858 > Project: Cassandra > Issue Type: Bug > Reporter: Ke Han > Priority: Normal > Attachments: system.log > > > After dropping the keyspace and then perform the drain, I met the following > ERROR message > {code:java} > rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]] > INFO [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 - > Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh > INFO [Native-Transport-Requests-6] 2023-09-09 02:06:40,549 > MigrationManager.java:362 - Drop Keyspace > 'uuid00bd53ebd0c84be39baf96c086b9f8d0' > INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353 > StorageService.java:1679 - DRAINING: starting drain process > INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358 > HintsService.java:210 - Paused hints dispatch > INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364 > Server.java:179 - Stop listening for CQL clients > INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365 > Gossiper.java:1747 - Announcing shutdown > INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369 > StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown > INFO [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373 > MessagingService.java:985 - Waiting for messaging service to quiesce > INFO [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375 > MessagingService.java:1346 - MessagingService has terminated the accept() > thread > ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384 > StorageService.java:4889 - Caught an exception while draining > java.lang.AssertionError: Unknown keyspace > uuid00bd53ebd0c84be39baf96c086b9f8d0 > at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316) > at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129) > at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106) > at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92) > at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89) > at com.google.common.collect.Iterators$8.transform(Iterators.java:799) > at > com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) > at > org.apache.cassandra.service.StorageService.drain(StorageService.java:4824) > at > org.apache.cassandra.service.StorageService.drain(StorageService.java:4749) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) > at > com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) > at > com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) > at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) > at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) > at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) > at > javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) > at > javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) > at > javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) > at > javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401) > at > javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) > at sun.rmi.transport.Transport$1.run(Transport.java:200) > at sun.rmi.transport.Transport$1.run(Transport.java:197) > at java.security.AccessController.doPrivileged(Native Method) > at sun.rmi.transport.Transport.serviceCall(Transport.java:196) > at > sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) > at java.security.AccessController.doPrivileged(Native Method) > at > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > INFO [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953 > CassandraDaemon.java:576 - Cassandra shutting down... > INFO [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 - > Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml > INFO [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node > configuration:[allocate_tokens_for_keyspace=null; {code} > This is a concurrency problem related to the metadata update. > There are two threads interleaving: > * Thread1 (meta-data update related thread): update the keyspace metadata > * Thread2 (drain): flush all in-memory data to disk. > Here's how it happened in detail > Step1: The user issues a drop keyspace command. However, the update of the > metadata gets delayed. > Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing > operation > {code:java} > protected synchronized void drain(boolean isFinalShutdown) throws > IOException, InterruptedException, ExecutionException > { > for (Keyspace keyspace : Keyspace.nonSystem()) > { > for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores()) > flushes.add(cfs.forceFlush()); > }{code} > keyspace.getColumnFamilyStores() returns a copy of the keyspaces > > {code:java} > // Schema.java > public List<String> getNonSystemKeyspaces() > { > return ImmutableList.copyOf(getNonSystemKeyspacesSet()); > } > private Set<String> getNonSystemKeyspacesSet() > { > return Sets.difference(keyspaces.keySet(), > SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES); > }{code} > Step3: Thread1 (Metadata update related threads) updates the metadata and > removes the keyspace > Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs > into exceptions since the keyspace has been dropped. > > The symptom is similar to CASSANDRA-18636, but the required triggering thread > interleaving is different. For CASSANDRA-18636, it's the interleaving between > _thread updating metadata_ and {_}thread estimating the size{_}. For this > case, it's interleaving between the _thread updating the metadata_ and > _thread performing the drain._ > They might be fixed together with the > [patch|https://github.com/apache/cassandra/pull/2457] by [Stefan > Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic] > if exhaust all inferences to the old keyspace copies. (The previous patch > only targets at one interleaving situation, this ticked shows that there > could be other cases). > I have attached the system.log file. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org