[
https://issues.apache.org/jira/browse/KAFKA-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941474#comment-14941474
]
Parth Brahmbhatt commented on KAFKA-2587:
-----------------------------------------
I looked at the code to work out why this can happen. The state reported is
indeed one of the valid states reached during our test:
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/test/scala/unit/kafka/security/auth/SimpleAclAuthorizerTest.scala#L217
After that line we remove all acls for that resource, add one acl back to it,
and then remove that one acl. All of those steps pass verification:
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/test/scala/unit/kafka/security/auth/SimpleAclAuthorizerTest.scala#L225
and
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/test/scala/unit/kafka/security/auth/SimpleAclAuthorizerTest.scala#L226
Given that we are using the same instance of the authorizer, the cache of that
instance is updated immediately for both add and remove:
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala#L171
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala#L189
The only other place that can update the cache is the notification handler, as
part of handling an acl-changed notification:
https://github.com/apache/kafka/blob/5764e54de147af81aac85acc00687c23e9646a5c/core/src/main/scala/kafka/security/auth/SimpleAclAuthorizer.scala#L269
However, in that case we read the data from zookeeper and then update the cache.
Even if notification processing was delayed for some reason, it should still
read the acls from zk and then update the cache.
There are pathological cases that can lead to this failure, for example:
1) The notification handler starts and reads acls from zk, and a thread switch
happens before it can update the cache.
2) All the other cache updates go through (remove resource, add the acl, remove
the acl).
3) Before verification finishes for the last "remove one acl", a thread switch
happens and the notification handler updates the cache with the stale acls that
it read before.
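To make the interleaving above concrete, here is a minimal single-threaded
replay of it. This is only a sketch under assumptions: zookeeper and the
authorizer cache are modelled as plain maps, and the class, method, resource
and acl names are hypothetical, not taken from SimpleAclAuthorizer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Kafka code: "zk" and "cache" are plain in-memory maps.
final class StaleCacheInterleaving {
    static final String RESOURCE = "topic:test"; // made-up resource name

    final Map<String, Set<String>> zk = new HashMap<>();
    final Map<String, Set<String>> cache = new HashMap<>();

    // Direct path (add/remove on the same authorizer instance):
    // zk and the local cache are updated together. An empty set deletes the entry.
    void setAcls(Set<String> acls) {
        if (acls.isEmpty()) {
            zk.remove(RESOURCE);
            cache.remove(RESOURCE);
        } else {
            zk.put(RESOURCE, new HashSet<>(acls));
            cache.put(RESOURCE, new HashSet<>(acls));
        }
    }

    // Notification handler, first half: take a snapshot of what zk holds now.
    Set<String> readFromZk() {
        return new HashSet<>(zk.getOrDefault(RESOURCE, Set.of()));
    }

    // Notification handler, second half: publish the snapshot into the cache.
    void publish(Set<String> snapshot) {
        if (snapshot.isEmpty()) {
            cache.remove(RESOURCE);
        } else {
            cache.put(RESOURCE, snapshot);
        }
    }

    // Replays steps 1) to 3) in the order described above.
    static StaleCacheInterleaving replay() {
        StaleCacheInterleaving s = new StaleCacheInterleaving();
        s.setAcls(Set.of("acl1", "acl2"));      // state before the handler runs

        Set<String> snapshot = s.readFromZk();  // 1) handler reads, then is paused

        s.setAcls(Set.of());                    // 2) remove resource
        s.setAcls(Set.of("acl1"));              //    add the acl
        s.setAcls(Set.of());                    //    remove the acl

        s.publish(snapshot);                    // 3) handler resumes with stale acls
        return s;
    }
}
```

In this replay zk ends up with no entry for the resource, while the cache holds
the two acls from the old snapshot, so a check made against the cache right
after step 3 would see acls that no longer exist.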
Even in this case there should be follow-up notifications about adding an acl
and removing an acl, which should again cause the notification handler to read
the state from zookeeper and update the cache to the correct state. Also, this
interleaving seems unlikely enough that it would not happen every other day.
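The self-correcting behaviour described above can be sketched the same way.
Again this is a simplified model under assumptions, with hypothetical names: a
handler that always re-reads zookeeper overwrites whatever stale value the
cache held, once the follow-up notification is processed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not Kafka code: one resource, acls kept as plain sets.
final class FollowUpNotification {
    final Set<String> zkAcls = new HashSet<>();  // what zk holds for the resource
    Set<String> cachedAcls = new HashSet<>();    // what the authorizer cache holds

    // Handler for an acl-changed notification: re-read zk, then replace the cache.
    void onAclChanged() {
        cachedAcls = new HashSet<>(zkAcls);
    }

    static FollowUpNotification replay() {
        FollowUpNotification s = new FollowUpNotification();
        // The last acl has been removed, so zk is empty, but an earlier
        // delayed handler run has left stale acls in the cache.
        s.cachedAcls = new HashSet<>(Set.of("acl1", "acl2"));

        // A follow-up notification (for the add or the remove) is processed.
        s.onAclChanged();
        return s;
    }
}
```

After the follow-up notification the cache matches zk again. In this model the
cache is wrong only until that notification is processed, so a failure would
require the verification to run inside that window.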
I will continue to look into this. In the meantime, if this is a continuous dev
pain, we can remove the last 3 lines of the test, which remove the last acl and
try to verify that the zookeeper path is deleted.
> Transient test failure: `SimpleAclAuthorizerTest`
> -------------------------------------------------
>
> Key: KAFKA-2587
> URL: https://issues.apache.org/jira/browse/KAFKA-2587
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Ismael Juma
> Assignee: Parth Brahmbhatt
> Fix For: 0.9.0.0
>
>
> I've seen `SimpleAclAuthorizerTest` fail a couple of times since its recent
> introduction. Here's one such build:
> https://builds.apache.org/job/kafka-trunk-git-pr/576/console
> [~parth.brahmbhatt], can you please take a look and see if it's an easy fix?