[ 
https://issues.apache.org/jira/browse/CASSANDRA-20135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-20135:
------------------------------------------
    Description: 
We see there are assertion errors thrown in 4.1 at least in 
StoppingTransformation like these:

{code}
java.lang.RuntimeException: java.lang.AssertionError
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
        at 
org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
        at 
org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.AssertionError: null
        at 
org.apache.cassandra.db.transform.StoppingTransformation.attachTo(StoppingTransformation.java:72)
        at org.apache.cassandra.db.transform.BaseRows.add(BaseRows.java:104)
        at 
org.apache.cassandra.db.transform.UnfilteredRows.add(UnfilteredRows.java:49)
        at 
org.apache.cassandra.db.transform.Transformation.add(Transformation.java:198)
        at 
org.apache.cassandra.db.transform.Transformation.apply(Transformation.java:140)
        at 
org.apache.cassandra.db.ReadCommand$CheckForAbort.applyToPartition(ReadCommand.java:616)
        at 
org.apache.cassandra.db.ReadCommand$CheckForAbort.applyToPartition(ReadCommand.java:604)
        at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:97)
        at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:303)
        at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
        at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
        at 
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
        at 
org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:337)
        at 
org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:63)
        at 
org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
        ... 6 common frames omitted
{code}

This does not make sense at first sight and it is quite a rabbit hole to go 
through. If you follow the stacktrace, you see that 

{code}
Caused by: java.lang.AssertionError: null
        at 
org.apache.cassandra.db.transform.StoppingTransformation.attachTo(StoppingTransformation.java:72)
{code}

but ... why? It means that this (1) was called twice because that is the only 
place where "this.rows" are ever updated in that class (and this.rows is 
private) which means that _something_ has to call this twice in a row. Once it 
sets it just fine and another time it goes to set it again but it fails as it 
is not null anymore. Hence, the question is why is that set twice?

The reason is quite elaborative. "attachTo" which throws is ever called in 
BaseRows#add(Transformation) (2) and just on the next line it calls 
"super.add(transformation);" which adds that transformation at the end of a 
stack in Stack class which BaseRows extends.

{code}
    void add(Transformation add)
    {
        if (length == stack.length)
            stack = resize(stack);
        stack[length++] = add;
    }
{code}

Next thing we see from the stacktrace is that CheckForAbort.applyToPartition is 
calling Transformation.apply (3) and what that ultimately does is that it will 
add itself, again, at the end of the stack (4).

When we look at that stacktrace as a whole, what it does is that it is 
iterating over Unfiliteredpartition while building a local data response on a 
read and as it does so, it calls "BasePartitions.hasNext". Now we are getting 
to that ... (5). What "hasNext" is doing is that while this.next is null, it 
will take the stack and it loops in while by taking "next" from "input" and it 
applies all the transformations by calling "fs[i].applyToParition(next)".

So, there is a stack of transformations and they are called just one after 
another until some result of "applyToPartition" returns null or we iterated 
over all transformations. The chain of transformations also include 
"CheckForAbort" transformation which we added here (6) so what happens is that 
when we call "applyToPartitions" for the first time on CheckForAbort, it will 
run just fine, but when that while loop / for loop in BasePartitions is called 
_again_ (e.g. we are calling "hasNext" upon iterating in 
UnfilteredPartitionIterators), then "applyToPartition" for "CheckForAbort" will 
be called again as well. But CheckForAbort is doing this (7).

{code}
        protected UnfilteredRowIterator applyToPartition(UnfilteredRowIterator 
partition)
        {
            if (maybeAbort())
            {
                partition.close();
                return null;
            }

            return Transformation.apply(partition, this);
        }
{code}

Check the last line where it applies itself when it is not aborted:

{code}
Transformation.apply(partition, this)
{code}

The application of this stopping transformation to given partition means that 
it will add that transformation at the end of the stack as we already showed. 
Then, we will iterate over that stack again upon iterating in BasePartitions, 
which eventually calls "attachTo" for the second time, hence the assertion 
error.

The stack might look like

{code}
stack[0] = transformation1
stack[1] = transformation2
stack[2] = CheckForAbort
{code}

then we call "fs[i].applyToParition(next)" which will modify the stack like 
this:

{code}
stack[0] = transformation1
stack[1] = transformation2
stack[2] = CheckForAbort
stack[3] = CheckForAbort 
{code}

Then we will loop over that again and if I am not mistaken, when we hit 
stack[2], it will call applyToPartition on that and it will do 

{code}
stack[0] = transformation1
stack[1] = transformation2
stack[2] = CheckForAbort // this basically adds itself at the end again
stack[3] = CheckForAbort 
stack[4] = CheckForAbort 
{code}

I also see that CheckForAbort was replaced by CASSANDRA-17810 (8) by 
QueryCancellationChecker but except some "cosmetic" changes, the logic remains 
the same. That stopping transformation is applying itself in applyToPartition 
so I think that this problem is present in 5.0+ too. No transformation is 
applying itself like that but this one. 

I am not completely sure what we should do about this but two ideas are obvious 
(with non-zero probability that both of them are wrong)

1) We apply _new instance_ of QueryCancellationChecker / CheckForAbort so we 
will never call attachTo on the _same_ instance
2) We will remove "assert" in attachTo in StoppingTransformation so we will 
enable this to be called twice.

(1) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/transform/StoppingTransformation.java#L72
(2) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/BaseRows.java#L104
(3) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L627
(4) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/Transformation.java#L198
(5) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/BasePartitions.java#L87-L109
(6) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L436
(7) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L619-L628
(8) 
https://github.com/apache/cassandra/commit/f4b69ba0e82bb051e56a92d792142034d9f617f0#diff-554e7dff38b500f5eaed0b9b651c7098c3f8a1bd4f6aca12063eab352e685b9fR690

cc [~jmckenzie] [~marcuse] [~aleksey]

  was:
We see there are assertion errors thrown in 4.1 at least in 
StoppingTransformation like these:

{code}
java.lang.RuntimeException: java.lang.AssertionError
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
        at 
org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
        at 
org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.AssertionError: null
        at 
org.apache.cassandra.db.transform.StoppingTransformation.attachTo(StoppingTransformation.java:72)
        at org.apache.cassandra.db.transform.BaseRows.add(BaseRows.java:104)
        at 
org.apache.cassandra.db.transform.UnfilteredRows.add(UnfilteredRows.java:49)
        at 
org.apache.cassandra.db.transform.Transformation.add(Transformation.java:198)
        at 
org.apache.cassandra.db.transform.Transformation.apply(Transformation.java:140)
        at 
org.apache.cassandra.db.ReadCommand$CheckForAbort.applyToPartition(ReadCommand.java:616)
        at 
org.apache.cassandra.db.ReadCommand$CheckForAbort.applyToPartition(ReadCommand.java:604)
        at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:97)
        at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:303)
        at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
        at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
        at 
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
        at 
org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:337)
        at 
org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:63)
        at 
org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
        at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
        ... 6 common frames omitted
{code}

This does not make sense at first sight and it is quite a rabbit hole to go 
through. If you follow the stacktrace, you see that 

{code}
Caused by: java.lang.AssertionError: null
        at 
org.apache.cassandra.db.transform.StoppingTransformation.attachTo(StoppingTransformation.java:72)
{code}

but ... why? It means that this (1) was called twice because that is the only 
place where "this.rows" are ever updated in that class (and this.rows is 
private) which means that _something_ has to call this twice in a row. Once it 
sets it just fine and another time it goes to set it again but it fails as it 
is not null anymore. Hence, the question is why is that set twice?

The reason is quite elaborative. "attachTo" which throws is ever called in 
BaseRows#add(Transformation) (2) and just on the next line it calls 
"super.add(transformation);" which adds that transformation at the end of a 
stack in Stack class which BaseRows extends.

{code}
    void add(Transformation add)
    {
        if (length == stack.length)
            stack = resize(stack);
        stack[length++] = add;
    }
{code}

Next thing we see from the stacktrace is that CheckForAbort.applyToPartition is 
calling Transformation.apply (3) and what that ultimately does is that it will 
add itself, again, at the end of the stack (4).

When we look at that stacktrace as a whole, what it does is that it is 
iterating over Unfiliteredpartition while building a local data response on a 
read and as it does so, it calls "BasePartitions.hasNext". Now we are getting 
to that ... (5). What "hasNext" is doing is that while this.next is null, it 
will take the stack and it loops in while by taking "next" from "input" and it 
applies all the transformations by calling "fs[i].applyToParition(next)".

So, there is a stack of transformations and they are called just one after 
another until some result of "applyToPartition" returns null or we iterated 
over all transformations. The chain of transformations also include 
"CheckForAbort" transformation which we added here (6) so what happens is that 
when we call "applyToPartitions" for the first time on CheckForAbort, it will 
run just fine, but when that while loop / for loop in BasePartitions is called 
_again_ (e.g. we are calling "hasNext" upon iterating in 
UnfilteredPartitionIterators), then "applyToPartition" for "CheckForAbort" will 
be called again as well. But CheckForAbort is doing this (7).

{code}
        protected UnfilteredRowIterator applyToPartition(UnfilteredRowIterator 
partition)
        {
            if (maybeAbort())
            {
                partition.close();
                return null;
            }

            return Transformation.apply(partition, this);
        }
{code}

Check the last line where it applies itself when it is not aborted:

{code}
Transformation.apply(partition, this)
{code}

The application of this stopping transformation to given partition means that 
it will add that transformation at the end of the stack as we already showed. 
Then, we will iterate over that stack again upon iterating in BasePartitions, 
which eventually calls "attachTo" for the second time, hence the assertion 
error.

I also see that CheckForAbort was replaced by CASSANDRA-17810 (8) by 
QueryCancellationChecker but except some "cosmetic" changes, the logic remains 
the same. That stopping transformation is applying itself in applyToPartition 
so I think that this problem is present in 5.0+ too. No transformation is 
applying itself like that but this one. 

I am not completely sure what we should do about this but two ideas are obvious 
(with non-zero probability that both of them are wrong)

1) We apply _new instance_ of QueryCancellationChecker / CheckForAbort so we 
will never call attachTo on the _same_ instance
2) We will remove "assert" in attachTo in StoppingTransformation so we will 
enable this to be called twice.

(1) 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/transform/StoppingTransformation.java#L72
(2) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/BaseRows.java#L104
(3) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L627
(4) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/Transformation.java#L198
(5) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/BasePartitions.java#L87-L109
(6) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L436
(7) 
https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L619-L628
(8) 
https://github.com/apache/cassandra/commit/f4b69ba0e82bb051e56a92d792142034d9f617f0#diff-554e7dff38b500f5eaed0b9b651c7098c3f8a1bd4f6aca12063eab352e685b9fR690

cc [~jmckenzie] [~marcuse] [~aleksey]


> Assertion errors on CheckForAbort / QueryCancellationChecker on multiple 
> calls of applyToPartition
> --------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20135
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20135
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Stefan Miklosovic
>            Priority: Normal
>
> We see there are assertion errors thrown in 4.1 at least in 
> StoppingTransformation like these:
> {code}
> java.lang.RuntimeException: java.lang.AssertionError
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>         at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>         at 
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:142)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.AssertionError: null
>         at 
> org.apache.cassandra.db.transform.StoppingTransformation.attachTo(StoppingTransformation.java:72)
>         at org.apache.cassandra.db.transform.BaseRows.add(BaseRows.java:104)
>         at 
> org.apache.cassandra.db.transform.UnfilteredRows.add(UnfilteredRows.java:49)
>         at 
> org.apache.cassandra.db.transform.Transformation.add(Transformation.java:198)
>         at 
> org.apache.cassandra.db.transform.Transformation.apply(Transformation.java:140)
>         at 
> org.apache.cassandra.db.ReadCommand$CheckForAbort.applyToPartition(ReadCommand.java:616)
>         at 
> org.apache.cassandra.db.ReadCommand$CheckForAbort.applyToPartition(ReadCommand.java:604)
>         at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:97)
>         at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:303)
>         at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
>         at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
>         at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
>         at 
> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:337)
>         at 
> org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:63)
>         at 
> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
>         ... 6 common frames omitted
> {code}
> This does not make sense at first sight and it is quite a rabbit hole to go 
> through. If you follow the stacktrace, you see that 
> {code}
> Caused by: java.lang.AssertionError: null
>         at 
> org.apache.cassandra.db.transform.StoppingTransformation.attachTo(StoppingTransformation.java:72)
> {code}
> but ... why? It means that this (1) was called twice because that is the only 
> place where "this.rows" are ever updated in that class (and this.rows is 
> private) which means that _something_ has to call this twice in a row. Once 
> it sets it just fine and another time it goes to set it again but it fails as 
> it is not null anymore. Hence, the question is why is that set twice?
> The reason is quite elaborative. "attachTo" which throws is ever called in 
> BaseRows#add(Transformation) (2) and just on the next line it calls 
> "super.add(transformation);" which adds that transformation at the end of a 
> stack in Stack class which BaseRows extends.
> {code}
>     void add(Transformation add)
>     {
>         if (length == stack.length)
>             stack = resize(stack);
>         stack[length++] = add;
>     }
> {code}
> Next thing we see from the stacktrace is that CheckForAbort.applyToPartition 
> is calling Transformation.apply (3) and what that ultimately does is that it 
> will add itself, again, at the end of the stack (4).
> When we look at that stacktrace as a whole, what it does is that it is 
> iterating over Unfiliteredpartition while building a local data response on a 
> read and as it does so, it calls "BasePartitions.hasNext". Now we are getting 
> to that ... (5). What "hasNext" is doing is that while this.next is null, it 
> will take the stack and it loops in while by taking "next" from "input" and 
> it applies all the transformations by calling "fs[i].applyToParition(next)".
> So, there is a stack of transformations and they are called just one after 
> another until some result of "applyToPartition" returns null or we iterated 
> over all transformations. The chain of transformations also include 
> "CheckForAbort" transformation which we added here (6) so what happens is 
> that when we call "applyToPartitions" for the first time on CheckForAbort, it 
> will run just fine, but when that while loop / for loop in BasePartitions is 
> called _again_ (e.g. we are calling "hasNext" upon iterating in 
> UnfilteredPartitionIterators), then "applyToPartition" for "CheckForAbort" 
> will be called again as well. But CheckForAbort is doing this (7).
> {code}
>         protected UnfilteredRowIterator 
> applyToPartition(UnfilteredRowIterator partition)
>         {
>             if (maybeAbort())
>             {
>                 partition.close();
>                 return null;
>             }
>             return Transformation.apply(partition, this);
>         }
> {code}
> Check the last line where it applies itself when it is not aborted:
> {code}
> Transformation.apply(partition, this)
> {code}
> The application of this stopping transformation to given partition means that 
> it will add that transformation at the end of the stack as we already showed. 
> Then, we will iterate over that stack again upon iterating in BasePartitions, 
> which eventually calls "attachTo" for the second time, hence the assertion 
> error.
> The stack might look like
> {code}
> stack[0] = transformation1
> stack[1] = transformation2
> stack[2] = CheckForAbort
> {code}
> then we call "fs[i].applyToParition(next)" which will modify the stack like 
> this:
> {code}
> stack[0] = transformation1
> stack[1] = transformation2
> stack[2] = CheckForAbort
> stack[3] = CheckForAbort 
> {code}
> Then we will loop over that again and if I am not mistaken, when we hit 
> stack[2], it will call applyToPartition on that and it will do 
> {code}
> stack[0] = transformation1
> stack[1] = transformation2
> stack[2] = CheckForAbort // this basically adds itself at the end again
> stack[3] = CheckForAbort 
> stack[4] = CheckForAbort 
> {code}
> I also see that CheckForAbort was replaced by CASSANDRA-17810 (8) by 
> QueryCancellationChecker but except some "cosmetic" changes, the logic 
> remains the same. That stopping transformation is applying itself in 
> applyToPartition so I think that this problem is present in 5.0+ too. No 
> transformation is applying itself like that but this one. 
> I am not completely sure what we should do about this but two ideas are 
> obvious (with non-zero probability that both of them are wrong)
> 1) We apply _new instance_ of QueryCancellationChecker / CheckForAbort so we 
> will never call attachTo on the _same_ instance
> 2) We will remove "assert" in attachTo in StoppingTransformation so we will 
> enable this to be called twice.
> (1) 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/transform/StoppingTransformation.java#L72
> (2) 
> https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/BaseRows.java#L104
> (3) 
> https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L627
> (4) 
> https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/Transformation.java#L198
> (5) 
> https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/transform/BasePartitions.java#L87-L109
> (6) 
> https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L436
> (7) 
> https://github.com/apache/cassandra/blob/cassandra-4.1/src/java/org/apache/cassandra/db/ReadCommand.java#L619-L628
> (8) 
> https://github.com/apache/cassandra/commit/f4b69ba0e82bb051e56a92d792142034d9f617f0#diff-554e7dff38b500f5eaed0b9b651c7098c3f8a1bd4f6aca12063eab352e685b9fR690
> cc [~jmckenzie] [~marcuse] [~aleksey]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to