yashmayya commented on code in PR #15990:
URL: https://github.com/apache/pinot/pull/15990#discussion_r2138269311
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/TableRebalanceIntegrationTest.java:
##########
@@ -1258,6 +1265,41 @@ private void checkRebalanceDryRunSummary(RebalanceResult
rebalanceResult, Rebala
assertEquals(numServersUnchanged,
summaryResult.getServerInfo().getServersUnchanged().size());
}
+ @Test
+ public void testDisallowMultipleConcurrentRebalancesOnSameTable() throws
Exception {
+ // Manually write an IN_PROGRESS rebalance job to ZK instead of trying to
collide multiple actual rebalance
+ // attempts which will be prone to race conditions and cause this test to
be flaky. We only reject a rebalance job
+ // if there is an IN_PROGRESS rebalance job for the same table in ZK, so
we could actually end up with more than
+ // one active rebalance job if both are started at the exact same time
since the progress stats are written to ZK
+ // after some initial pre-checks are done. However, rebalances are
idempotent, and we don't actually care too much
+ // about avoiding this edge case race condition as long as in most cases
we are able to prevent users from
+ // triggering a rebalance for a table that already has an in-progress
rebalance job.
+ String jobId = TableRebalancer.createUniqueRebalanceJobIdentifier();
+ String tableNameWithType =
TableNameBuilder.forType(TableType.REALTIME).tableNameWithType(getTableName());
+ TableRebalanceProgressStats progressStats = new
TableRebalanceProgressStats();
+ progressStats.setStatus(RebalanceResult.Status.IN_PROGRESS);
+ Map<String, String> jobMetadata = new HashMap<>();
+ jobMetadata.put(CommonConstants.ControllerJob.TABLE_NAME_WITH_TYPE,
tableNameWithType);
+ jobMetadata.put(CommonConstants.ControllerJob.JOB_ID, jobId);
+ jobMetadata.put(CommonConstants.ControllerJob.SUBMISSION_TIME_MS,
Long.toString(System.currentTimeMillis()));
+ jobMetadata.put(CommonConstants.ControllerJob.JOB_TYPE,
ControllerJobType.TABLE_REBALANCE);
+
jobMetadata.put(RebalanceJobConstants.JOB_METADATA_KEY_REBALANCE_PROGRESS_STATS,
+ JsonUtils.objectToString(progressStats));
+ ControllerZkHelixUtils.addControllerJobToZK(_propertyStore, jobId,
jobMetadata, ControllerJobType.TABLE_REBALANCE,
+ prevJobMetadata -> true);
+
+ RebalanceConfig rebalanceConfig = new RebalanceConfig();
+ String response = sendPostRequest(getRebalanceUrl(rebalanceConfig,
TableType.REALTIME));
+ assertTrue(response.contains("Rebalance job is already in progress for
table"));
Review Comment:
Ah whoops, that would have been a lot simpler 😄
Let me know if you're good with the current state of things or would prefer
it to be rolled back to the earlier state - i.e., continue returning `200 OK`,
but with the rebalance result containing `FAILED` and the relevant error
message.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]