+1 I agree with Brandon. It's more like a bug imo.

On 20/9/23 21:42, Caleb Rackliffe wrote:
+1 on a 5.0 backport

On Wed, Sep 20, 2023 at 2:26 PM Brandon Williams <dri...@gmail.com> wrote:

    I think it could be argued that not retrying messages is a bug, I am
    +1 on including this in 5.0.

    Kind Regards,
    Brandon

    On Tue, Sep 19, 2023 at 1:16 PM David Capwell <dcapw...@apple.com>
    wrote:
    >
    > To try to get repair more stable, I added optional retry logic
    (patch is still in review) to a handful of critical repair verbs. 
    This patch is disabled by default but allows you to opt-in to
    retries so ephemeral issues don’t cause a repair to fail after
    running for a long time (assuming they resolve within the retry
    window). There are 2 protocol level changes to enable this:
    VALIDATION_RSP and SYNC_RSP now send an ACK (if the sender doesn’t
    attach a callback, these ACKs get ignored in all versions; see
    org.apache.cassandra.net.ResponseVerbHandler#doVerb and
    Verb.REPAIR_RSP).  Given that we have already forked, I believe we
    would need to give a waiver to allow this patch due to this change.
    >
    > The patch was written on trunk, but I figured backporting to 5.0
    would be rather trivial, and this was brought up during the review,
    so I am floating this to a wider audience.
    >
    > If you look at the patch you will see that it is very large, but
    this is only to make testing of repair coordination easier and
    deterministic. The biggest code changes are:
    >
    > 1) Moving from ActiveRepairService.instance to
    ActiveRepairService.instance() (this is the main reason so many
    files were touched; this was needed so unit tests don’t load the
    whole world)
    > 2) Repair no longer reaches into global space and instead is
    provided the subsystems needed to perform repair; this change is
    local to repair code
    >
    > Both of these changes were only for testing as they allow us to
    simulate 1k repairs in around 15 seconds with 100% deterministic
    execution.
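
For readers following the thread, the opt-in retry idea described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: RetrySketch, sendWithRetry, and the backoff parameters are hypothetical names, and the real change presumably goes through Cassandra's messaging callbacks rather than a BooleanSupplier. The sketch assumes each attempt reports whether an ACK arrived, and that the message is safe to resend.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of opt-in retries; not Cassandra's actual implementation. */
final class RetrySketch {
    // Upper bound on the wait between attempts (assumed value, for illustration only).
    private static final long MAX_BACKOFF_MILLIS = 10_000L;

    private RetrySketch() {}

    /**
     * Runs {@code attempt} until it reports an ACK or the attempt budget is used up.
     *
     * @param attempt           one send; returns true if the receiver ACKed it
     * @param maxAttempts       total sends allowed; 1 means "no retries" (the default-off case)
     * @param baseBackoffMillis wait before the first retry; doubled after each failure
     * @return the number of attempts used, or -1 if no ACK was seen
     */
    static int sendWithRetry(BooleanSupplier attempt, int maxAttempts, long baseBackoffMillis) {
        if (maxAttempts < 1 || baseBackoffMillis < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and backoff >= 0");
        }
        long backoff = baseBackoffMillis;
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i; // ACK received, stop retrying
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for callers
                    return -1;
                }
                // Exponential backoff, capped so long outages do not produce huge sleeps.
                backoff = Math.min(backoff * 2, MAX_BACKOFF_MILLIS);
            }
        }
        return -1; // retry window exhausted without an ACK
    }
}
```

With maxAttempts = 1 this reduces to a single send, which mirrors the disabled-by-default behavior; a larger value lets a transient failure recover as long as it resolves within the retry window.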
