[GitHub] lucene-solr pull request #433: SOLR-12357 Premptive creation of collections ...

dsmiley Tue, 07 Aug 2018 21:03:16 -0700

Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/433#discussion_r208448157
  
    --- Diff: 
solr/core/src/java/org/apache/solr/update/processor/TimeRoutedAliasUpdateProcessor.java
 ---
    @@ -167,66 +173,121 @@ private String getAliasName() {
       public void processAdd(AddUpdateCommand cmd) throws IOException {
         SolrInputDocument solrInputDocument = cmd.getSolrInputDocument();
         final Object routeValue = 
solrInputDocument.getFieldValue(timeRoutedAlias.getRouteField());
    -    final Instant routeTimestamp = parseRouteKey(routeValue);
    -
    +    final Instant docTimestampToRoute = parseRouteKey(routeValue);
         updateParsedCollectionAliases();
    -    String targetCollection;
    -    do { // typically we don't loop; it's only when we need to create a 
collection
    -      targetCollection = 
findTargetCollectionGivenTimestamp(routeTimestamp);
    +    String candidateCollection = 
findCandidateCollectionGivenTimestamp(docTimestampToRoute, 
cmd.getPrintableId());
    +    String targetCollection = 
createCollectionsIfRequired(docTimestampToRoute, candidateCollection, 
cmd.getPrintableId());
    +    if (thisCollection.equals(targetCollection)) {
    +      // pass on through; we've reached the right collection
    +      super.processAdd(cmd);
    +    } else {
    +      // send to the right collection
    +      SolrCmdDistributor.Node targetLeaderNode = 
routeDocToSlice(targetCollection, solrInputDocument);
    +      cmdDistrib.distribAdd(cmd, 
Collections.singletonList(targetLeaderNode), new 
ModifiableSolrParams(outParamsToLeader));
    +    }
    +  }
     
    -      if (targetCollection == null) {
    -        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
    -            "Doc " + cmd.getPrintableId() + " couldn't be routed with " + 
timeRoutedAlias.getRouteField() + "=" + routeTimestamp);
    +  private String createCollectionsIfRequired(Instant docTimestamp, String 
targetCollection, String printableId) {
    +    try {
    +      CreationType creationType = requiresCreateCollection(docTimestamp, 
timeRoutedAlias.getPreemptiveCreateWindow());
    +      switch (creationType) {
    +        case SYNCHRONOUS:
    +          // This next line blocks until all collections required by the 
current document have been created
    +          return maintain(targetCollection, docTimestamp, printableId);
    +        case ASYNC_PREEMPTIVE:
    +          // Note: creating an executor is slightly expensive, but only 
likely to happen once per hour/day/week
    +          // (depending on time slice size for the TRA). Executor is used 
to ensure we pick up the MDC logging stuff
    +          // from ExecutorUtil. Even though it is possible that multiple 
requests hit this code in the 1-2 sec that
    +          // it takes to create a collection, it's an established 
anti-pattern to feed data with a very large number
    +          // of client connections. This in mind, we only guard against 
spamming the overseer within a batch of
    +          // updates, intentionally tolerating a low level of redundant 
requests in favor of simpler code. Most
    +          // super-sized installations with many update clients will 
likely be multi-tenant and multiple tenants
    +          // probably don't write to the same alias. As such, we have 
deferred any solution the "many clients causing
    +          // collection creation simultaneously" problem until such time 
as someone actually has that problem in a
    +          // real world use case that isn't just an anti-pattern.
    +          synchronized (execLock) {
    +            if (preemptiveCreationExecutor == null) {
    +              preemptiveCreationExecutor = preemptiveCreationExecutor();
    +            }
    +            preemptiveCreationExecutor.execute(() -> {
    +              maintain(targetCollection, docTimestamp, printableId);
    +              preemptiveCreationExecutor = null;
    +            });
    +            preemptiveCreationExecutor.shutdown(); // shutdown immediately 
to ensure no new requests accepted
    +          }
    +          return targetCollection;
    +        case NONE:
    +          return targetCollection; // just for clarity...
    +        default:
    +          return targetCollection; // could use fall through, but fall 
through is fiddly for later editors.
           }
    +      // do nothing if creationType == NONE
    +    } catch (SolrException e) {
    +      throw e;
    +    } catch (Exception e) {
    +      throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
    +    }
    +  }
     
    -      // Note: the following rule is tempting but not necessary and is not 
compatible with
    -      // only using this URP when the alias distrib phase is NONE; 
otherwise a doc may be routed to from a non-recent
    -      // collection to the most recent only to then go there directly 
instead of realizing a new collection is needed.
    -      //      // If it's going to some other collection (not "this") then 
break to just send it there
    -      //      if (!thisCollection.equals(targetCollection)) {
    -      //        break;
    -      //      }
    -      // Also tempting but not compatible:  check that we're the leader, 
if not then break
    -
    -      // If the doc goes to the most recent collection then do some checks 
below, otherwise break the loop.
    -      final Instant mostRecentCollTimestamp = 
parsedCollectionsDesc.get(0).getKey();
    -      final String mostRecentCollName = 
parsedCollectionsDesc.get(0).getValue();
    -      if (!mostRecentCollName.equals(targetCollection)) {
    -        break;
    -      }
    +  /**
    +   * Create an executor that can only handle one task at a time. 
Additional tasks are rejected silently.
    +   * If we receive a batch update with hundreds of docs, that could queue 
up hundreds of calls to maintain().
    +   * Such a situation will typically create the required collection on the 
first document and then uselessly
    +   * spend time calculating that we don't need to create anything for 
subsequent documents. Therefore we simply
    +   * want to silently discard any additional attempts to maintain this 
alias until the one in progress has completed.
    +   */
    +  private ExecutorService preemptiveCreationExecutor() {
     
    -      // Check the doc isn't too far in the future
    -      final Instant maxFutureTime = 
Instant.now().plusMillis(timeRoutedAlias.getMaxFutureMs());
    -      if (routeTimestamp.isAfter(maxFutureTime)) {
    -        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
    -            "The document's time routed key of " + routeValue + " is too 
far in the future given " +
    -                TimeRoutedAlias.ROUTER_MAX_FUTURE + "=" + 
timeRoutedAlias.getMaxFutureMs());
    -      }
    +    ThreadPoolExecutor.DiscardPolicy discardPolicy = new 
ThreadPoolExecutor.DiscardPolicy(); // exception never thrown
    +    ArrayBlockingQueue<Runnable> oneAtATime = new ArrayBlockingQueue<>(1);
    +    DefaultSolrThreadFactory threadFactory = new 
DefaultSolrThreadFactory("TRA-preemptive-creation");
     
    -      // Create a new collection?
    -      final Instant nextCollTimestamp = 
timeRoutedAlias.computeNextCollTimestamp(mostRecentCollTimestamp);
    -      if (routeTimestamp.isBefore(nextCollTimestamp)) {
    -        break; // thus we don't need another collection
    -      }
    +    // Note: There is an interesting case that could crop up when the 
pre-create interval is longer than
    +    // a single time slice in the alias. With that configuration it is 
possible that a doc that should
    +    // cause several collections to be created is preceded by one that 
only creates a single collection. In that
    +    // case we might be too conservative in our pre-creation, and only 
create one collection. Dealing with that
    +    // presumably rare case adds complexity and is intentionally ignored 
at this time. If this shows itself to
    +    // be a frequent or otherwise important use case this decision can be 
revisited.
     
    -      createCollectionAfter(mostRecentCollName); // *should* throw if 
fails for some reason but...
    -      final boolean updated = updateParsedCollectionAliases();
    -      if (!updated) { // thus we didn't make progress...
    -        // this is not expected, even in known failure cases, but we check 
just in case
    -        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
    -            "We need to create a new time routed collection but for 
unknown reasons were unable to do so.");
    -      }
    -      // then retry the loop ...
    -    } while(true);
    -    assert targetCollection != null;
    +    return newMDCAwareSingleThreadExecutor(threadFactory, discardPolicy, 
oneAtATime);
    +  }
     
    -    if (thisCollection.equals(targetCollection)) {
    -      // pass on through; we've reached the right collection
    -      super.processAdd(cmd);
    +  /**
    +   * Determine if the a new collection will be required based on the 
document timestamp. Passing null for
    +   * preemptiveCreateInterval tells you if the document is beyond all 
existing collections with a response of
    +   * {@link CreationType#NONE} or {@link CreationType#SYNCHRONOUS}, and 
passing a valid date math for
    +   * preemptiveCreateMath additionally distinguishes the case where the 
document is close enough to the end of
    +   * the TRA to trigger preemptive creation but not beyond all existing 
collections with a value of
    +   * {@link CreationType#ASYNC_PREEMPTIVE}.
    +   *
    +   * @param routeTimestamp The timestamp from the document
    +   * @param preemptiveCreateMath The date math indicating the {@link 
TimeRoutedAlias#preemptiveCreateMath}
    +   * @return a {@code CreationType} indicating if and how to create a 
collection
    +   */
    +  private CreationType requiresCreateCollection(Instant routeTimestamp,  
String preemptiveCreateMath) {
    +    // Create a new collection?
    +    final Instant mostRecentCollTimestamp = 
parsedCollectionsDesc.get(0).getKey();
    +    final Instant nextCollTimestamp = 
timeRoutedAlias.computeNextCollTimestamp(mostRecentCollTimestamp);
    +    if (isBlank(preemptiveCreateMath)) {
    +      return !routeTimestamp.isBefore(nextCollTimestamp) ? SYNCHRONOUS : 
NONE;
         } else {
    -      // send to the right collection
    -      SolrCmdDistributor.Node targetLeaderNode = 
routeDocToSlice(targetCollection, solrInputDocument);
    -      cmdDistrib.distribAdd(cmd, 
Collections.singletonList(targetLeaderNode), new 
ModifiableSolrParams(outParamsToLeader));
    +      DateMathParser dateMathParser = new DateMathParser();
    --- End diff --
    
    Lets pull out a utility method to compute preemptNextColCreateTime so that 
the logic here isn't distracted by that & exception handling.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] lucene-solr pull request #433: SOLR-12357 Premptive creation of collections ...

Reply via email to