[ 
https://issues.apache.org/jira/browse/CASSANDRA-18858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated CASSANDRA-18858:
-------------------------------
    Description: 
After dropping the keyspace and then perform the drain, I met the following 
ERROR message
{code:java}
rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 - 
Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh
INFO  [Native-Transport-Requests-6] 2023-09-09 02:06:40,549 
MigrationManager.java:362 - Drop Keyspace 'uuid00bd53ebd0c84be39baf96c086b9f8d0'
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353 
StorageService.java:1679 - DRAINING: starting drain process
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358 
HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364 Server.java:179 
- Stop listening for CQL clients
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365 
Gossiper.java:1747 - Announcing shutdown
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369 
StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373 
MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375 MessagingService.java:1346 
- MessagingService has terminated the accept() thread
ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384 
StorageService.java:4889 - Caught an exception while draining 
java.lang.AssertionError: Unknown keyspace uuid00bd53ebd0c84be39baf96c086b9f8d0
    at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
    at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
    at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
    at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
    at 
com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
    at 
org.apache.cassandra.service.StorageService.drain(StorageService.java:4824)
    at 
org.apache.cassandra.service.StorageService.drain(StorageService.java:4749)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
    at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
    at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
    at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
    at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
    at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
    at sun.rmi.transport.Transport$1.run(Transport.java:200)
    at sun.rmi.transport.Transport$1.run(Transport.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
    at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
    at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
    at java.security.AccessController.doPrivileged(Native Method)
    at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
INFO  [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953 
CassandraDaemon.java:576 - Cassandra shutting down...
INFO  [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 - 
Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml
INFO  [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node 
configuration:[allocate_tokens_for_keyspace=null;  {code}
This is a concurrency problem related to the metadata update.

There are two threads interleaving:
 * Thread1 (meta-data update related thread): update the keyspace metadata
 * Thread2 (drain): flush all in-memory data to disk.

Here's how it happened in detail

Step1: The user issues a drop keyspace command. However, the update of the 
metadata gets delayed.

Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing 
operation
{code:java}
protected synchronized void drain(boolean isFinalShutdown) throws IOException, 
InterruptedException, ExecutionException
{
    for (Keyspace keyspace : Keyspace.nonSystem())
    {
        for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores())
            flushes.add(cfs.forceFlush());
    }{code}
keyspace.getColumnFamilyStores() returns a copy of the keyspaces

 
{code:java}
// Schema.java
public List<String> getNonSystemKeyspaces()
{
    return ImmutableList.copyOf(getNonSystemKeyspacesSet());
}

private Set<String> getNonSystemKeyspacesSet()
{
    return Sets.difference(keyspaces.keySet(), 
SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES);
}{code}
Step3: Thread1 (Metadata update related threads) updates the metadata and 
removes the keyspace
Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs into 
exceptions since the keyspace has been dropped.

 

The symptom is similar to CASSANDRA-18636, but the required triggering thread 
interleaving is different. For CASSANDRA-18636, it's the interleaving between 
_thread updating metadata_ and {_}thread estimating the size{_}. For this case, 
it's interleaving between the _thread updating the metadata_ and _thread 
performing the drain._

They might be fixed together with the 
[patch|https://github.com/apache/cassandra/pull/2457] by [Stefan 
Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic]
 if exhaust all inferences to the old keyspace copies. (The previous patch only 
targets at one interleaving situation, this ticked shows that there could be 
other cases).

I have attached the system.log file.

  was:
After dropping the keyspace and then perform the drain, I met the following 
ERROR message
{code:java}
rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 - 
Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh
INFO  [Native-Transport-Requests-6] 2023-09-09 02:06:40,549 
MigrationManager.java:362 - Drop Keyspace 'uuid00bd53ebd0c84be39baf96c086b9f8d0'
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353 
StorageService.java:1679 - DRAINING: starting drain process
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358 
HintsService.java:210 - Paused hints dispatch
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364 Server.java:179 
- Stop listening for CQL clients
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365 
Gossiper.java:1747 - Announcing shutdown
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369 
StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown
INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373 
MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375 MessagingService.java:1346 
- MessagingService has terminated the accept() thread
ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384 
StorageService.java:4889 - Caught an exception while draining 
java.lang.AssertionError: Unknown keyspace uuid00bd53ebd0c84be39baf96c086b9f8d0
    at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
    at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
    at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
    at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
    at 
com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
    at 
org.apache.cassandra.service.StorageService.drain(StorageService.java:4824)
    at 
org.apache.cassandra.service.StorageService.drain(StorageService.java:4749)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
    at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
    at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
    at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
    at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
    at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
    at sun.rmi.transport.Transport$1.run(Transport.java:200)
    at sun.rmi.transport.Transport$1.run(Transport.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
    at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
    at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
    at java.security.AccessController.doPrivileged(Native Method)
    at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
INFO  [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953 
CassandraDaemon.java:576 - Cassandra shutting down...
INFO  [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 - 
Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml
INFO  [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node 
configuration:[allocate_tokens_for_keyspace=null;  {code}
 

This is a concurrency problem related to the metadata update.

There are two threads interleaving:
 * Thread1 (meta-data update related thread): update the keyspace metadata
 * Thread2 (drain): flush all in-memory data to disk.

Here's how it happened in detail

Step1: The user issues a drop keyspace command. However, the update of the 
metadata gets delayed.

Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing 
operation
{code:java}
protected synchronized void drain(boolean isFinalShutdown) throws IOException, 
InterruptedException, ExecutionException
{
    for (Keyspace keyspace : Keyspace.nonSystem())
    {
        for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores())
            flushes.add(cfs.forceFlush());
    }{code}
keyspace.getColumnFamilyStores() returns a copy of the keyspaces

 
{code:java}
// Schema.java
public List<String> getNonSystemKeyspaces()
{
    return ImmutableList.copyOf(getNonSystemKeyspacesSet());
}

private Set<String> getNonSystemKeyspacesSet()
{
    return Sets.difference(keyspaces.keySet(), 
SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES);
}{code}
Step3: Thread1 (Metadata update related threads) updates the metadata and 
removes the keyspace
Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs into 
exceptions since the keyspace has been dropped.

 

 

The symptoms is similar to CASSANDRA-18636, but the required triggering thread 
interleaving is different. For CASSANDRA-18636, it's the interleaving between 
_thread updating metadata_ and {_}thread estimating the size{_}. For this case, 
it's interleaving between the _thread updating the metadata_ and _thread 
performing the drain._

They might be fixed together with the 
[patch|https://github.com/apache/cassandra/pull/2457] by [Stefan 
Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic]
 if exhaust all inferences to the old keyspace copies. (The previous patch only 
targets at one interleaving situation, this ticked shows that there could be 
other cases).

I have attached the system.log file.


> AssertionError: Unknown keyspace after dropping a keyspace combined with 
> drain operation
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18858
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18858
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ke Han
>            Priority: Normal
>         Attachments: system.log
>
>
> After dropping the keyspace and then perform the drain, I met the following 
> ERROR message
> {code:java}
> rzayqyaovyytmgo],droppedColumns={},triggers=[],indexes=[]]
> INFO  [MigrationStage:1] 2023-09-09 02:06:40,500 ColumnFamilyStore.java:432 - 
> Initializing uuid00bd53ebd0c84be39baf96c086b9f8d0.hlknh
> INFO  [Native-Transport-Requests-6] 2023-09-09 02:06:40,549 
> MigrationManager.java:362 - Drop Keyspace 
> 'uuid00bd53ebd0c84be39baf96c086b9f8d0'
> INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,353 
> StorageService.java:1679 - DRAINING: starting drain process
> INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,358 
> HintsService.java:210 - Paused hints dispatch
> INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,364 
> Server.java:179 - Stop listening for CQL clients
> INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,365 
> Gossiper.java:1747 - Announcing shutdown
> INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:13,369 
> StorageService.java:2604 - Node /192.168.7.2 state jump to shutdown
> INFO  [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,373 
> MessagingService.java:985 - Waiting for messaging service to quiesce
> INFO  [ACCEPT-/192.168.7.2] 2023-09-09 02:07:15,375 
> MessagingService.java:1346 - MessagingService has terminated the accept() 
> thread
> ERROR [RMI TCP Connection(2)-127.0.0.1] 2023-09-09 02:07:15,384 
> StorageService.java:4889 - Caught an exception while draining 
> java.lang.AssertionError: Unknown keyspace 
> uuid00bd53ebd0c84be39baf96c086b9f8d0
>     at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
>     at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
>     at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
>     at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
>     at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
>     at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
>     at 
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
>     at 
> org.apache.cassandra.service.StorageService.drain(StorageService.java:4824)
>     at 
> org.apache.cassandra.service.StorageService.drain(StorageService.java:4749)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
>     at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>     at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>     at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>     at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>     at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>     at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>     at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>     at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
>     at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
>     at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
>     at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
>     at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
>     at sun.rmi.transport.Transport$1.run(Transport.java:200)
>     at sun.rmi.transport.Transport$1.run(Transport.java:197)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>     at 
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>     at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
>     at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> INFO  [RMI TCP Connection(4)-127.0.0.1] 2023-09-09 02:07:17,953 
> CassandraDaemon.java:576 - Cassandra shutting down...
> INFO  [main] 2023-09-09 02:07:26,619 YamlConfigurationLoader.java:104 - 
> Configuration location: file:/etc/apache-cassandra-4.1.3/cassandra.yaml
> INFO  [main] 2023-09-09 02:07:27,094 Config.java:1171 - Node 
> configuration:[allocate_tokens_for_keyspace=null;  {code}
> This is a concurrency problem related to the metadata update.
> There are two threads interleaving:
>  * Thread1 (meta-data update related thread): update the keyspace metadata
>  * Thread2 (drain): flush all in-memory data to disk.
> Here's how it happened in detail
> Step1: The user issues a drop keyspace command. However, the update of the 
> metadata gets delayed.
> Step2: Thread2 (Drain) tries to iterate all the keyspace for the flushing 
> operation
> {code:java}
> protected synchronized void drain(boolean isFinalShutdown) throws 
> IOException, InterruptedException, ExecutionException
> {
>     for (Keyspace keyspace : Keyspace.nonSystem())
>     {
>         for (ColumnFamilyStore cfs : keyspace.getColumnFamilyStores())
>             flushes.add(cfs.forceFlush());
>     }{code}
> keyspace.getColumnFamilyStores() returns a copy of the keyspaces
>  
> {code:java}
> // Schema.java
> public List<String> getNonSystemKeyspaces()
> {
>     return ImmutableList.copyOf(getNonSystemKeyspacesSet());
> }
> private Set<String> getNonSystemKeyspacesSet()
> {
>     return Sets.difference(keyspaces.keySet(), 
> SchemaConstants.LOCAL_SYSTEM_KEYSPACE_NAMES);
> }{code}
> Step3: Thread1 (Metadata update related threads) updates the metadata and 
> removes the keyspace
> Step4: Thread2: it keeps iterating the old copy of the keyspaces and runs 
> into exceptions since the keyspace has been dropped.
>  
> The symptom is similar to CASSANDRA-18636, but the required triggering thread 
> interleaving is different. For CASSANDRA-18636, it's the interleaving between 
> _thread updating metadata_ and {_}thread estimating the size{_}. For this 
> case, it's interleaving between the _thread updating the metadata_ and 
> _thread performing the drain._
> They might be fixed together with the 
> [patch|https://github.com/apache/cassandra/pull/2457] by [Stefan 
> Miklosovic|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=stefan.miklosovic]
>  if exhaust all inferences to the old keyspace copies. (The previous patch 
> only targets at one interleaving situation, this ticked shows that there 
> could be other cases).
> I have attached the system.log file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to