Hrishikesh Gadre created HADOOP-14044:
-----------------------------------------

             Summary: Synchronization issue in delegation token cancel functionality
                 Key: HADOOP-14044
                 URL: https://issues.apache.org/jira/browse/HADOOP-14044
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Hrishikesh Gadre


We are using the Hadoop delegation token authentication functionality in Apache 
Solr. During integration testing, I found the following issue with the 
delegation token cancelation functionality.

Consider a setup with two Solr servers (S1 and S2) that are configured to use 
delegation token functionality backed by ZooKeeper. Now perform the following 
steps:

[Step 1] Send a request to S1 to create a delegation token.
  (Delegation token DT is created successfully)
[Step 2] Send a request to cancel DT to S2
  (DT is canceled successfully. client receives HTTP 200 response)
[Step 3] Send a request to cancel DT to S2 again
  (DT cancelation fails. client receives HTTP 404 response)
[Step 4] Send a request to cancel DT to S1

At this point (step 4) we get one of two different responses:

- DT cancelation fails. client receives HTTP 404 response
- DT cancelation succeeds. client receives HTTP 200 response

Also, as per the current implementation, each server maintains an in-memory 
cache of current tokens, which is updated using the ZK watch mechanism. For 
example, the ZK watch on S1 ensures that its in-memory cache is synchronized 
after step 2.

After investigation, I found that the root cause of this behavior is a race 
condition between step 4 and the firing of the ZK watch on S1. Whenever the 
watch fires before step 4, we get an HTTP 404 response (as expected). When 
it does not, we get an HTTP 200 response along with the following ERROR 
message in the log:

{noformat}
Attempted to remove a non-existing znode /ZKDTSMTokensRoot/DT_XYZ
{noformat}
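
To make the race concrete, here is a minimal, deterministic simulation of the 
four steps. It is only a sketch under stated assumptions, not the actual Hadoop 
or Solr code: SharedStore stands in for ZooKeeper, Server.cache for the 
per-server in-memory cache, and watchFired() for the ZK watch callback. All 
class and method names are hypothetical. The cancel path consults only the 
local cache, which is how the behavior described above would arise.

```java
import java.util.HashSet;
import java.util.Set;

public class CancelRaceDemo {
    /** Hypothetical stand-in for the ZooKeeper-backed token store. */
    static final class SharedStore {
        private final Set<String> znodes = new HashSet<>();
        void create(String token) { znodes.add(token); }
        boolean delete(String token) { return znodes.remove(token); }
        boolean exists(String token) { return znodes.contains(token); }
    }

    /** Hypothetical server with an in-memory token cache. */
    static final class Server {
        private final SharedStore store;
        private final Set<String> cache = new HashSet<>();
        Server(SharedStore store) { this.store = store; }

        void createToken(String token) {
            store.create(token);
            cache.add(token);
        }

        /** Simulates the watch firing: resync the cache entry with the store. */
        void watchFired(String token) {
            if (store.exists(token)) cache.add(token); else cache.remove(token);
        }

        /** Cancel decides validity from the local cache only (the assumed flaw). */
        int cancel(String token) {
            if (!cache.remove(token)) return 404;
            if (!store.delete(token)) {
                System.err.println("Attempted to remove a non-existing znode "
                        + "/ZKDTSMTokensRoot/" + token);
            }
            return 200;
        }
    }

    /** Runs steps 1-4 and returns the HTTP status of step 4. */
    public static int step4Status(boolean watchFiresBeforeStep4) {
        SharedStore zk = new SharedStore();
        Server s1 = new Server(zk);
        Server s2 = new Server(zk);
        s1.createToken("DT_XYZ");   // step 1
        s2.watchFired("DT_XYZ");    // S2 learns about the new token
        s2.cancel("DT_XYZ");        // step 2: 200
        s2.cancel("DT_XYZ");        // step 3: 404
        if (watchFiresBeforeStep4) s1.watchFired("DT_XYZ");
        return s1.cancel("DT_XYZ"); // step 4
    }

    public static void main(String[] args) {
        System.out.println("watch fires before step 4: " + step4Status(true));  // 404
        System.out.println("watch fires after step 4:  " + step4Status(false)); // 200
    }
}
```

In the second run S1's cache is stale, so the request is accepted and the 
failed znode delete is only logged.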

From the client's perspective, the server *should* return an HTTP 404 error 
when a cancel request is sent for an invalid token.
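
One possible way to get that behavior, sketched here purely as an illustration 
and not as a proposed patch: let the outcome of the delete on the shared store 
decide the response, rather than the local cache. The class below is 
hypothetical; sharedStore stands in for ZooKeeper and localCache for the 
per-server in-memory cache. Because Set.remove on a concurrent set returns true 
for exactly one caller, only the first cancel of a token is reported as 
successful, even when another server's cache is stale.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class StoreCheckedCancel {
    private final Set<String> sharedStore;  // hypothetical stand-in for ZooKeeper
    private final Set<String> localCache = ConcurrentHashMap.newKeySet();

    public StoreCheckedCancel(Set<String> sharedStore) {
        this.sharedStore = sharedStore;
    }

    /** Simulates the watch (or token creation) populating the local cache. */
    public void cacheToken(String token) {
        localCache.add(token);
    }

    /** Returns 200 only if this call actually removed the token from the store. */
    public int cancel(String token) {
        localCache.remove(token);                    // drop any cached copy
        boolean deleted = sharedStore.remove(token); // store is the source of truth
        return deleted ? 200 : 404;
    }

    public static void main(String[] args) {
        Set<String> zk = ConcurrentHashMap.newKeySet();
        zk.add("DT_XYZ");
        StoreCheckedCancel s1 = new StoreCheckedCancel(zk);
        StoreCheckedCancel s2 = new StoreCheckedCancel(zk);
        s1.cacheToken("DT_XYZ");
        s2.cacheToken("DT_XYZ");
        System.out.println(s2.cancel("DT_XYZ")); // 200
        System.out.println(s2.cancel("DT_XYZ")); // 404
        System.out.println(s1.cancel("DT_XYZ")); // 404, despite S1's stale cache
    }
}
```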

Ref: the relevant Solr unit test:
https://github.com/apache/lucene-solr/blob/746786636404cdb8ce505ed0ed02b8d9144ab6c4/solr/core/src/test/org/apache/solr/cloud/TestSolrCloudWithDelegationTokens.java#L285





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
