[ 
https://issues.apache.org/jira/browse/SOLR-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886272#comment-15886272
 ] 

Steve Rowe commented on SOLR-9401:
----------------------------------

This has been showing up on Jenkins-s, including mine, every once in a while.  
I beasted it a few hundred times and it happens on my box about 5% of the time.

Looking at the beasting failures, I noticed that the following is always 
present in the logs just before the NPE triggers a test failure (below is from 
[https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Windows/750/consoleText]):

{noformat}
   [junit4]   2> 403449 ERROR 
(TEST-TestPKIAuthenticationPlugin.test-seed#[D2BF175262C5311]) [    ] 
o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: 1488029294206 , 
received timestamp: 1488029300727 , TTL: 5000
{noformat}

In short, the NPEs are triggered because the 5 second TTL has expired (which is 
itself weird - what the hell is taking so long???), causing 
{{PKIAuthenticationPlugin.doAuthenticate()}} to skip the code path that would 
pull the user principal from the headers and populate it on the request (line 
#134 below).  As a result, when test code attempts to verify that the user 
principal on the request has the appropriate value, the NPE is triggered.

{code:java|title=PKIAuthenticationPlugin.java}
 92:   public boolean doAuthenticate(ServletRequest request, ServletResponse 
response, FilterChain filterChain) throws Exception {
[...]
 99:     long receivedTime = System.currentTimeMillis();
100:     String header = ((HttpServletRequest) request).getHeader(HEADER);
[...]
108:     List<String> authInfo = StrUtils.splitWS(header, false);
[...]
115:     String nodeName = authInfo.get(0);
116:     String cipher = authInfo.get(1);
117: 
118:     PKIHeaderData decipher = decipherHeader(nodeName, cipher);
[...]
124:     if ((receivedTime - decipher.timestamp) > MAX_VALIDITY) {
125:       log.error("Invalid key request timestamp: {} , received timestamp: 
{} , TTL: {}", decipher.timestamp, receivedTime, MAX_VALIDITY);
126:         filterChain.doFilter(request, response);
127:         return true;
128:     }
129: 
130:     final Principal principal = "$".equals(decipher.userName) ?
131:         SU :
132:         new BasicUserPrincipal(decipher.userName);
133: 
134:     filterChain.doFilter(getWrapper((HttpServletRequest) request, 
principal), response);
135:     return true;
136:   }
{code}

Why doesn't the overridden {{doAuthenticate()}} follow the contract laid out in 
its superclass's javadocs, i.e. throw an exception on error?:

{code:java|title:AuthenticationPlugin.java}
  /**
   * This method attempts to authenticate the request. Upon a successful 
authentication, this
   * must call the next filter in the filter chain and set the user principal 
of the request,
   * or else, upon an error or an authentication failure, throw an exception.
[...]
   * @throws Exception any exception thrown during the authentication, e.g. 
PrivilegedActionException
  public abstract boolean doAuthenticate(ServletRequest request, 
ServletResponse response,
      FilterChain filterChain) throws Exception;
{code}

MAX_VALIDITY could maybe be modified by the test class (via sysprop 
{{pikiauth.ttl}}) to wait around longer:

{code:java|title=PKIAuthenticationPlugin.java}
63:  private final int MAX_VALIDITY = 
Integer.parseInt(System.getProperty("pkiauth.ttl", "5000"));
{code}

But that ^^ feels hackish.  Seems like client code should react to timeouts 
like this by retrying at least once?

I'm not sure which path is correct.  Ping [~noble.paul] , who AFAICT wrote most 
of this code, and [~janhoy], who was asking on the dev list about one of these 
failures the other day.

> TestPKIAuthenticationPlugin NPE
> -------------------------------
>
>                 Key: SOLR-9401
>                 URL: https://issues.apache.org/jira/browse/SOLR-9401
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Steve Rowe
>
> Failure from my Jenkins, doesn't reproduce for me (this is 
> {{tests-failures.txt}}):
> {noformat}
>   2> Creating dataDir: 
> /var/lib/jenkins/jobs/Lucene-Solr-tests-master/workspace/solr/build/solr-core/test/J7/temp/solr.security.TestPKIAuthenticationPlugi
> n_7AC33B2240CB767D-001/init-core-data-001
>   2> 14521 INFO  
> (SUITE-TestPKIAuthenticationPlugin-seed#[7AC33B2240CB767D]-worker) [    ] 
> o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (fal
> se) via: @org.apache.solr.util.RandomizeSSL(reason=, ssl=NaN, value=NaN, 
> clientAuth=NaN)
>   2> 14540 INFO  
> (TEST-TestPKIAuthenticationPlugin.test-seed#[7AC33B2240CB767D]) [    ] 
> o.a.s.SolrTestCaseJ4 ###Starting test
>   2> 15553 ERROR 
> (TEST-TestPKIAuthenticationPlugin.test-seed#[7AC33B2240CB767D]) [    ] 
> o.a.s.s.PKIAuthenticationPlugin No SolrAuth header present
>   2> 15843 ERROR 
> (TEST-TestPKIAuthenticationPlugin.test-seed#[7AC33B2240CB767D]) [    ] 
> o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: 9 ,
>  received timestamp: 1470760833176 , TTL: 5000
>   2> 15843 INFO  
> (TEST-TestPKIAuthenticationPlugin.test-seed#[7AC33B2240CB767D]) [    ] 
> o.a.s.SolrTestCaseJ4 ###Ending test
>   2> NOTE: download the large Jenkins line-docs file by running 'ant 
> get-jenkins-line-docs' in the lucene directory.
>   2> NOTE: reproduce with: ant test  -Dtestcase=TestPKIAuthenticationPlugin 
> -Dtests.method=test -Dtests.seed=7AC33B2240CB767D -Dtests.slow=true -Dtests.li
> nedocsfile=/home/jenkins/lucene-data/enwiki.random.lines.txt 
> -Dtests.locale=cs -Dtests.timezone=Europe/Tirane -Dtests.asserts=true 
> -Dtests.file.encoding=U
> TF-8
> [12:40:32.094] ERROR   1.35s J7  | TestPKIAuthenticationPlugin.test <<<
>    > Throwable #1: java.lang.NullPointerException
>    >    at 
> __randomizedtesting.SeedInfo.seed([7AC33B2240CB767D:F29704F8EE371B85]:0)
>    >    at 
> org.apache.solr.security.TestPKIAuthenticationPlugin.test(TestPKIAuthenticationPlugin.java:144)
> [...]
>   2> 15867 INFO  
> (SUITE-TestPKIAuthenticationPlugin-seed#[7AC33B2240CB767D]-worker) [    ] 
> o.a.s.SolrTestCaseJ4 ###deleteCore
>   2> NOTE: leaving temporary files on disk at: 
> /var/lib/jenkins/jobs/Lucene-Solr-tests-master/workspace/solr/build/solr-core/test/J7/temp/solr.security.TestPKIAuthenticationPlugin_7AC33B2240CB767D-001
>   2> NOTE: test params are: codec=Asserting(Lucene62): {}, docValues:{}, 
> maxPointsInLeafNode=752, maxMBSortInHeap=5.390190554185364, 
> sim=RandomSimilarity(queryNorm=true): {}, locale=cs, timezone=Europe/Tirane
>   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_77 
> (64-bit)/cpus=16,threads=1,free=255922760,total=336592896
>   2> NOTE: All tests run in this JVM: [TestIndexingPerformance, 
> TestPKIAuthenticationPlugin]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to