cecemei commented on code in PR #19541:
URL: https://github.com/apache/druid/pull/19541#discussion_r3350993725
##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/supervisor/SupervisorManager.java:
##########
@@ -186,14 +186,101 @@ public boolean
createOrUpdateAndStartSupervisor(SupervisorSpec spec)
synchronized (lock) {
Preconditions.checkState(started, "SupervisorManager not started");
- final boolean shouldUpdateSpec = shouldUpdateSupervisor(spec);
+ // Persist whenever the spec actually changed (or is new) — independent
of whether a restart is
Review Comment:
this method is now `createOrUpdateAndStartSupervisor(spec, false)` ? also we
dont need `shouldUpdateSupervisor` anymore?
##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/supervisor/SupervisorManager.java:
##########
@@ -186,14 +186,101 @@ public boolean
createOrUpdateAndStartSupervisor(SupervisorSpec spec)
synchronized (lock) {
Preconditions.checkState(started, "SupervisorManager not started");
- final boolean shouldUpdateSpec = shouldUpdateSupervisor(spec);
+ // Persist whenever the spec actually changed (or is new) — independent
of whether a restart is
+ // required. This stops/recreates the supervisor regardless; persistence
must not be gated on the
+ // restart decision, otherwise a no-restart change (e.g. taskCount under
autoscaling) would be
+ // applied to the running supervisor but lost from the metadata store.
+ final boolean specChanged = isSpecChangedAndValidate(spec);
SupervisorSpec existingSpec =
possiblyStopAndRemoveSupervisorInternal(spec.getId(), false);
spec.merge(existingSpec);
- createAndStartSupervisorInternal(spec, shouldUpdateSpec);
- return shouldUpdateSpec;
+ createAndStartSupervisorInternal(spec, specChanged);
+ return specChanged;
}
}
+ /**
+ * Outcome of {@link #createOrUpdateAndStartSupervisor(SupervisorSpec,
boolean)}.
+ */
+ public enum SpecUpdateOutcome
+ {
+ /** Spec was byte-identical to the running spec and {@code
skipRestartIfUnmodified} was set: nothing done. */
+ UNCHANGED,
+ /** Spec changed but did not require a restart: persisted to metadata,
running supervisor left in place. */
+ PERSISTED_WITHOUT_RESTART,
+ /** Supervisor was (re)created and started; spec persisted if it changed.
*/
+ RESTARTED
+ }
+
+ /**
+ * Decides whether the submitted spec needs a restart and applies it under a
single lock, so the decision
+ * cannot go stale between deciding and acting (which would let a concurrent
POST drop a write or persist a
+ * spec that the running supervisor needs to be recreated for). With {@code
skipRestartIfUnmodified} set, an
+ * unchanged spec is a no-op and a changed spec whose {@link
SupervisorSpec#requireRestart} is false (e.g. a
+ * taskCount change under autoscaling) is persisted without recreating the
supervisor; otherwise the
+ * supervisor is stopped and recreated (the only behavior when the flag is
false).
+ */
+ public SpecUpdateOutcome createOrUpdateAndStartSupervisor(SupervisorSpec
spec, boolean skipRestartIfUnmodified)
Review Comment:
nit: maybe we need another `forceRestart`? since supervisor might not
restart when spec is modified but it didnt require so?
##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/supervisor/SupervisorResource.java:
##########
@@ -165,29 +165,39 @@ public Response specPost(
.build();
}
- if (Boolean.TRUE.equals(skipRestartIfUnmodified) &&
!manager.shouldUpdateSupervisor(spec)) {
- return Response.ok(ImmutableMap.of("id", spec.getId(),
"restarted", false)).build();
- }
+ // Decide and apply atomically so the restart decision cannot go
stale under a concurrent POST.
+ final SupervisorManager.SpecUpdateOutcome outcome =
+ manager.createOrUpdateAndStartSupervisor(spec,
Boolean.TRUE.equals(skipRestartIfUnmodified));
- manager.createOrUpdateAndStartSupervisor(spec);
-
- final String auditPayload
- = StringUtils.format("Update supervisor[%s] for datasource[%s]",
spec.getId(), spec.getDataSources());
- auditManager.doAudit(
- AuditEntry.builder()
- .key(spec.getId())
- .type("supervisor")
- .auditInfo(AuthorizationUtils.buildAuditInfo(req))
-
.request(AuthorizationUtils.buildRequestInfo("overlord", req))
- .payload(auditPayload)
- .build()
- );
+ // Audit any path that mutated the persisted spec; a no-op UNCHANGED
submission is not audited.
+ if (outcome != SupervisorManager.SpecUpdateOutcome.UNCHANGED) {
+ auditSupervisorUpdate(spec, req);
+ }
- return Response.ok(ImmutableMap.of("id", spec.getId(), "restarted",
true)).build();
+ final boolean restarted = outcome ==
SupervisorManager.SpecUpdateOutcome.RESTARTED;
+ return Response.ok(ImmutableMap.of("id", spec.getId(), "restarted",
restarted)).build();
Review Comment:
nit: my hunch is that maybe we should have a `modified` boolean field as
well since sometimes it modifies but doesnt restart?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]