Copilot commented on code in PR #3957:
URL: https://github.com/apache/solr/pull/3957#discussion_r2626794303
##########
solr/core/src/test/org/apache/solr/cloud/LeaderElectionTest.java:
##########
@@ -579,23 +581,33 @@ public void run() {
connLossThread.join();
killThread.join();
- int seq = threads.get(getLeaderThread()).getSeq();
+ // Retry getting leader with extended timeout to handle edge cases where
+ // getLeaderUrl() gets an unexpected exception and throws
RuntimeException
+ RetryUtil.retryOnException(
+ Exception.class,
+ 60000, // 60 seconds total timeout
+ 100, // 100ms between retries
+ () -> {
+ int seq = threads.get(getLeaderThread()).getSeq();
+ log.info("Leader election stress test completed, leader seq: {}",
seq);
Review Comment:
The original code retrieved the leader thread and its sequence number to
verify that a leader exists and is accessible. However, the new retry wrapper
only logs the information without any assertion or verification. The test no
longer validates that leader election succeeded - it just retries until the
code executes without throwing an exception. Consider adding an assertion to
verify the leader state, or at minimum verify that getLeaderThread() returns a
valid index and that the sequence number is non-negative.
```suggestion
int leaderThread = getLeaderThread();
assertTrue("Leader election did not produce a valid leader
thread index",
leaderThread >= 0 && leaderThread < threads.size());
int seq = threads.get(leaderThread).getSeq();
assertTrue("Leader election produced an invalid (negative)
sequence number", seq >= 0);
log.info("Leader election stress test completed, leader thread:
{}, leader seq: {}",
leaderThread, seq);
```
##########
solr/core/src/test/org/apache/solr/cloud/LeaderElectionTest.java:
##########
@@ -579,23 +581,33 @@ public void run() {
connLossThread.join();
killThread.join();
- int seq = threads.get(getLeaderThread()).getSeq();
+ // Retry getting leader with extended timeout to handle edge cases where
+ // getLeaderUrl() gets an unexpected exception and throws
RuntimeException
+ RetryUtil.retryOnException(
+ Exception.class,
+ 60000, // 60 seconds total timeout
+ 100, // 100ms between retries
+ () -> {
+ int seq = threads.get(getLeaderThread()).getSeq();
+ log.info("Leader election stress test completed, leader seq: {}",
seq);
+ });
- // we have a leader we know, TODO: lets check some other things
} finally {
// cleanup any threads still running
for (ClientThread thread : threads) {
- thread.close();
- }
-
- // cleanup any threads still running
- for (ClientThread thread : threads) {
- thread.es.zkClient.close();
- thread.close();
+ try {
+ thread.close();
+ } catch (Exception e) {
+ // ignore cleanup errors
Review Comment:
The cleanup logic is incomplete. The original code had two separate loops:
the first called thread.close() and the second called both
thread.es.zkClient.close() and thread.close() again. The modified code only has
the first loop, which means thread.es.zkClient.close() is never called, leading
to a resource leak of ZooKeeper client connections. The second loop that
explicitly closed zkClient should be preserved.
```suggestion
// explicitly close the ZooKeeper client used by this thread
if (thread.es != null && thread.es.zkClient != null) {
thread.es.zkClient.close();
}
} catch (Exception e) {
// ignore cleanup errors for zkClient
}
try {
thread.close();
} catch (Exception e) {
// ignore cleanup errors for thread
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]