[
https://issues.apache.org/jira/browse/CASSANDRA-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704821#comment-17704821
]
Stefan Miklosovic commented on CASSANDRA-18366:
-----------------------------------------------
4.0 currently has this in DefaultFSErrorHandler:
{code}
@Override
public void handleCorruptSSTable(CorruptSSTableException e)
{
    if (!StorageService.instance.isDaemonSetupCompleted())
        handleStartupFSError(e);

    switch (DatabaseDescriptor.getDiskFailurePolicy())
    {
        case die:
        case stop_paranoid:
            // exception not logged here on purpose as it is already logged
            logger.error("Stopping transports as disk_failure_policy is " +
                         DatabaseDescriptor.getDiskFailurePolicy());
            StorageService.instance.stopTransports();
            break;
    }
}
{code}
"case die:" was added in CASSANDRA-18294.
Now, when I remove this "case die:", all tests pass.
However, when I do this:
{code}
@Override
public void handleCorruptSSTable(CorruptSSTableException e)
{
    if (!StorageService.instance.isDaemonSetupCompleted())
        handleStartupFSError(e);

    switch (DatabaseDescriptor.getDiskFailurePolicy())
    {
        case die:
            // exception not logged here on purpose as it is already logged
            logger.error("Stopping transports as disk_failure_policy is " +
                         DatabaseDescriptor.getDiskFailurePolicy());
            StorageService.instance.stopTransports();
            break;
        case stop_paranoid:
            // exception not logged here on purpose as it is already logged
            logger.error("Stopping transports as disk_failure_policy is " +
                         DatabaseDescriptor.getDiskFailurePolicy());
            StorageService.instance.stopTransports();
            break;
    }
}
{code}
It fails again, as expected, since this is equivalent to the original
fall-through. Basically, when we hit "die" and stop transports, the code in
FailingRepairTest that waits for a kill attempt on the replica loops forever,
for a reason not yet known:
{code}
IInvokableInstance replicaInstance = CLUSTER.get(replica);
while (replicaInstance.killAttempts() <= 0)
    Uninterruptibles.sleepUninterruptibly(50, TimeUnit.MILLISECONDS);
{code}
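While the root cause is being investigated, one way to make the test fail fast
instead of hanging until the CI timeout is to bound that wait. The following is
only a sketch, not what FailingRepairTest does today: the class name
BoundedWait and the method awaitCondition are hypothetical helpers made up for
illustration, and the timeout values are arbitrary.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Cassandra): polls a condition with a deadline.
final class BoundedWait
{
    private BoundedWait() {}

    /**
     * Polls the condition every pollMillis milliseconds until it holds
     * or the timeout elapses.
     *
     * @return true if the condition became true, false on timeout or interrupt
     */
    static boolean awaitCondition(BooleanSupplier condition, long timeout, TimeUnit unit, long pollMillis)
    {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!condition.getAsBoolean())
        {
            // Overflow-safe deadline comparison for System.nanoTime()
            if (System.nanoTime() - deadline >= 0)
                return false;
            try
            {
                Thread.sleep(pollMillis);
            }
            catch (InterruptedException e)
            {
                // Preserve the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

In the test, the loop above could then become a call such as
awaitCondition(() -> replicaInstance.killAttempts() > 0, 1, TimeUnit.MINUTES, 50)
followed by an assertion on the result, so a missing kill attempt shows up as
a clear test failure rather than a timeout.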
> Test failure: org.apache.cassandra.distributed.test.FailingRepairTest -
> testFailingMessage[VALIDATION_REQ/parallel/true]
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18366
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18366
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Brandon Williams
> Priority: Normal
> Fix For: 4.0.x
>
>
> First seen
> [here|https://app.circleci.com/pipelines/github/driftx/cassandra/928/workflows/f4e93a72-d4aa-47a2-996f-aa3fb018d848/jobs/16206]
> this test times out for me consistently on both j8 and j11 where 4.1 and
> trunk do not.