Re: Hanging repairs in Cassandra

Bowen Song Wed, 19 Jan 2022 02:40:02 -0800

May I ask how you run the repair? Is it manually via the nodetool command line tool, or via a tool or script, such as Cassandra Reaper? If you are running the repairs manually, would you mind giving Cassandra Reaper a try?

I have a fairly large cluster under my management, and the last time I tried "nodetool repair -full -pr" on a large table was maybe 3 years ago, and it randomly got stuck (i.e. it sometimes worked fine and sometimes got stuck). To finish the repair, I had to either keep retrying or break down the token ranges into smaller subsets and use the "-st" and "-et" parameters. Since then I've switched to Cassandra Reaper and have never had similar issues.
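
To illustrate the subrange approach, here is a minimal sketch that splits one token range into smaller pieces and prints a "nodetool repair" command for each. It assumes the start and end tokens describe a single range owned by the node; the token values, keyspace and table names are made up, and the commands are only printed for review, not executed. The sketch leaves out "-pr" because the range is given explicitly.

```python
def split_range(start, end, parts):
    """Split the token range (start, end] into up to `parts` contiguous subranges."""
    if parts < 1 or start >= end:
        raise ValueError("need parts >= 1 and start < end")
    width = end - start
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [start + (width * i) // parts for i in range(parts)] + [end]
    return [
        (bounds[i], bounds[i + 1])
        for i in range(parts)
        if bounds[i] < bounds[i + 1]  # drop empty pieces when parts > width
    ]


def repair_commands(keyspace, table, start, end, parts):
    """Build one subrange repair command per piece (printed, not executed)."""
    return [
        f"nodetool repair -full -st {st} -et {et} {keyspace} {table}"
        for st, et in split_range(start, end, parts)
    ]


if __name__ == "__main__":
    # Hypothetical range, keyspace and table, for illustration only.
    for cmd in repair_commands("my_keyspace", "my_table", 1000, 9000, 4):
        print(cmd)
```

Running the pieces one at a time means a stuck session only costs a retry of one small subrange rather than the whole table.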



On 19/01/2022 02:22, manish khandelwal wrote:

Agree with you on that. Just wanted to highlight that I am experiencing the same behavior.


Regards
Manish

On Tue, Jan 18, 2022, 22:50 Bowen Song <bo...@bso.ng> wrote:

    The link relates to Cassandra 1.2, and it is from 9 years ago.
    Cassandra was full of bugs at that time, and it has improved a lot
    since then. For that reason, I would rather not compare the issue
    you have with a 9-year-old issue someone else had.


    On 18/01/2022 16:11, manish khandelwal wrote:

I am not sure what is happening, but it has happened thrice. Merkle
trees are not being received from nodes in the other data center.
The issue is similar to the one described here:

https://user.cassandra.apache.narkive.com/GTbqO6za/repair-hangs-when-merkle-tree-request-is-not-acknowledged

Regards
Manish

On Tue, Jan 18, 2022, 18:18 Bowen Song <bo...@bso.ng> wrote:

Keep reading the logs on the initiator and on the node sending
the merkle tree; does anything follow that? FYI, not all log
entries have the repair ID in them, so please read the relevant
logs in chronological order without filtering (e.g. with "grep")
on the repair ID.
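
One way to read the logs of several nodes in chronological order without filtering is to merge them by timestamp. The sketch below assumes each entry starts with a level, a thread name in square brackets and a timestamp such as "2022-01-14 03:32:18,805", as in the log lines quoted in this thread; lines that do not start that way (stack traces, for example) are kept with the entry before them. The node names and the abbreviated sample lines are for illustration only.

```python
import heapq
import re
from datetime import datetime

# Start of a log entry: level, [thread], timestamp (layout assumed from this thread).
ENTRY_RE = re.compile(
    r"^(?:TRACE|DEBUG|INFO|WARN|ERROR)\s+\[[^\]]*\]\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
)


def entries(node, lines):
    """Yield (timestamp, node, text); continuation lines join the previous entry."""
    current = None
    for line in lines:
        line = line.rstrip("\n")
        match = ENTRY_RE.match(line)
        if match:
            if current:
                yield current
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            current = (ts, node, line)
        elif current:
            current = (current[0], current[1], current[2] + "\n" + line)
    if current:
        yield current


def merge_logs(logs):
    """logs maps a node name to its log lines, each log already in time order."""
    return heapq.merge(*(entries(node, lines) for node, lines in logs.items()))


if __name__ == "__main__":
    logs = {
        "initiator": [
            "INFO  [Repair#3:1] 2022-01-14 03:32:18,805 RepairJob.java:172 - Requesting merkle trees",
            "INFO  [AntiEntropyStage:1] 2022-01-14 03:32:18,856 RepairSession.java:180 - Received merkle tree",
        ],
        "replica": [
            "INFO  [AntiEntropyStage:1] 2022-01-14 03:32:18,850 Validator.java:281 - Sending completed merkle tree",
        ],
    }
    for ts, node, text in merge_logs(logs):
        print(f"{node}: {text}")
```

In practice each value in the dictionary could be an open file handle for that node's system.log, since the function only iterates over lines.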

I'm sceptical that a network issue is causing all this. The merkle
tree is sent over TCP connections, so some dropped packets during
an occasional few seconds of network connectivity trouble should
not cause any issue for the repair. You should only start to see
network-related issues if the network problem persists for a
period close to or longer than the timeout values set in the
cassandra.yaml file; in the case of repair it's
request_timeout_in_ms, which defaults to 10 seconds.
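
As a rough illustration of that comparison, the sketch below pulls the "*_timeout_in_ms" settings out of the text of a cassandra.yaml file and checks whether an outage of a given length reaches one of them. It uses a simple line match rather than a YAML parser, and the sample values in the usage part are placeholders, not read from any real cluster.

```python
import re

# Matches uncommented lines such as "request_timeout_in_ms: 10000".
TIMEOUT_RE = re.compile(r"^\s*([a-z_]+_timeout_in_ms)\s*:\s*(\d+)\s*(?:#.*)?$")


def timeouts_ms(yaml_text):
    """Return {setting name: value in ms} for every *_timeout_in_ms line."""
    result = {}
    for line in yaml_text.splitlines():
        match = TIMEOUT_RE.match(line)
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


def outage_reaches_timeout(outage_seconds, timeout_ms):
    """True if an outage of this length is at least as long as the timeout."""
    return outage_seconds * 1000 >= timeout_ms


if __name__ == "__main__":
    # Placeholder configuration text for illustration.
    sample = "request_timeout_in_ms: 10000\n# cas_contention_timeout_in_ms: 1000\n"
    settings = timeouts_ms(sample)
    print(settings)
    print(outage_reaches_timeout(3, settings["request_timeout_in_ms"]))
```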

Carry on examining the logs; you may find something useful.

BTW, talking about stuck repairs, in my experience this can
happen if two or more repairs were run concurrently on the
same node (regardless of which node was the initiator) involving
the same table. This could happen if you accidentally ran
"nodetool repair" on two nodes and both involve the same
table, or if you cancelled and then restarted a "nodetool
repair" on a node without waiting for, or killing, the remnants
of the first repair session on other nodes.

On 18/01/2022 11:55, manish khandelwal wrote:

        In the system logs, on the node where the repair was
        initiated, I see that the node has requested merkle trees
        from all nodes, including itself:

        INFO  [Repair#3:1] 2022-01-14 03:32:18,805
        RepairJob.java:172 - *[repair
        #6e3385e0-74d1-11ec-8e66-9f084ace9968*] Requesting merkle
        trees for *tablename* (to [*/xyz.abc.def.14,
        /xyz.abc.def.13, /xyz.abc.def.12, /xyz.mkn.pq.18,
        /xyz.mkn.pq.16, /xyz.mkn.pq.17*])
        INFO  [AntiEntropyStage:1] 2022-01-14 03:32:18,841
        RepairSession.java:180 - [repair
        #6e3385e0-74d1-11ec-8e66-9f084ace9968] Received merkle tree
        for *tablename* from */xyz.mkn.pq.17*
        INFO  [AntiEntropyStage:1] 2022-01-14 03:32:18,847
        RepairSession.java:180 - [repair
        #6e3385e0-74d1-11ec-8e66-9f084ace9968] Received merkle tree
        for *tablename* from */xyz.mkn.pq.16*
        INFO  [AntiEntropyStage:1] 2022-01-14 03:32:18,851
        RepairSession.java:180 - [repair
        #6e3385e0-74d1-11ec-8e66-9f084ace9968] Received merkle tree
        for *tablename* from */xyz.mkn.pq.18*
        INFO  [AntiEntropyStage:1] 2022-01-14 03:32:18,856
        RepairSession.java:180 - [repair
        #6e3385e0-74d1-11ec-8e66-9f084ace9968] Received merkle tree
        for *tablename* from */xyz.abc.def.14*
        Line 2480: INFO  [AntiEntropyStage:1] *2022-01-14
        03:32:18*,876 RepairSession.java:180 - [*repair
        #6e3385e0-74d1-11ec-8e66-9f084ace9968*] Received merkle tree
        for *tablename* from */xyz.abc.def.12*

        As per the logs, the merkle tree is not received from the
        node with ip *xyz.abc.def.13*.

        In the system logs of the node with ip *xyz.abc.def.13*, I
        can see the following log entry:

        INFO  [AntiEntropyStage:1] *2022-01-14 03:32:18*,850
        Validator.java:281 - [*repair
        #6e3385e0-74d1-11ec-8e66-9f084ace9968*] Sending completed
        merkle tree to */xyz.mkn.pq.17* for *keyspace.tablename*

        From the above I inferred that the repair task has become
        orphaned: it is waiting for a merkle tree from a node and
        is never going to receive it, since the tree has been lost
        somewhere in the network.
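
The comparison done by hand above (nodes a tree was requested from versus nodes a tree was received from) can be sketched as a small script. It assumes clean, single-line entries worded like the "Requesting merkle trees" and "Received merkle tree" messages quoted above, without the emphasis markers added by the mail client; the regular expressions are derived from those two messages only.

```python
import re

REPAIR_ID_RE = re.compile(r"\[repair #([0-9a-f-]+)\]")
REQUEST_RE = re.compile(r"Requesting merkle trees for (\S+) \(to \[([^\]]*)\]\)")
RECEIVED_RE = re.compile(r"Received merkle tree for (\S+) from (\S+)")


def missing_trees(lines):
    """Return {(repair_id, table): endpoints requested but never received}."""
    requested, received = {}, {}
    for line in lines:
        id_match = REPAIR_ID_RE.search(line)
        if not id_match:
            continue
        repair_id = id_match.group(1)
        req = REQUEST_RE.search(line)
        if req:
            endpoints = {e.strip() for e in req.group(2).split(",") if e.strip()}
            requested.setdefault((repair_id, req.group(1)), set()).update(endpoints)
            continue
        rec = RECEIVED_RE.search(line)
        if rec:
            received.setdefault((repair_id, rec.group(1)), set()).add(rec.group(2))
    # Keep only the sessions that still have outstanding endpoints.
    result = {}
    for key, endpoints in requested.items():
        outstanding = endpoints - received.get(key, set())
        if outstanding:
            result[key] = outstanding
    return result


if __name__ == "__main__":
    rid = "6e3385e0-74d1-11ec-8e66-9f084ace9968"
    nodes = ["/xyz.abc.def.14", "/xyz.abc.def.13", "/xyz.abc.def.12",
             "/xyz.mkn.pq.18", "/xyz.mkn.pq.16", "/xyz.mkn.pq.17"]
    lines = [f"[repair #{rid}] Requesting merkle trees for tablename (to [{', '.join(nodes)}])"]
    lines += [f"[repair #{rid}] Received merkle tree for tablename from {n}"
              for n in nodes if n != "/xyz.abc.def.13"]
    print(missing_trees(lines))
```

Note that this only reports what the initiator logged; it does not explain why a tree is missing, so the logs of the outstanding node still need to be read.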

        Regards
        Manish

        On Tue, Jan 18, 2022 at 4:39 PM Bowen Song <bo...@bso.ng> wrote:

            The entry in the debug.log is not specific to a repair
            session, and it could also be caused by something other
            than a network connectivity issue, such as long STW GC
            pauses. I usually don't start troubleshooting an issue
            from the debug log, as it can be rather noisy. The
            system.log is a better starting point.

            If I were to troubleshoot the issue, I would start from
            the system logs on the node that initiated the repair,
            i.e. the node you ran the "nodetool repair" command on.
            Follow the repair ID (a UUID) in the logs on all nodes
            involved in the repair and read all related logs in
            chronological order to find out what exactly happened.

            BTW, if the issue is easily reproducible, I would re-run
            the repair with a reduced scope (such as a single table
            and token range) to get fewer logs related to the repair
            session. Fewer logs means less time spent on reading and
            analysing them.

            Hope this helps.

            On 18/01/2022 10:03, manish khandelwal wrote:

            I have a Cassandra 3.11.2 cluster with two DCs. While
            running repair, I am observing the following behavior.

            I am seeing that the initiating node is not able to
            receive the merkle tree from one or two nodes. I can also
            see that the missing nodes did send the merkle tree, but
            it was not received. This makes the repair hang on a
            consistent basis. In netstats I can see output as follows

            Mode: NORMAL
            Not sending any streams. Attempted: 7858888
            Mismatch (Blocking): 2560
            Mismatch (Background): 17173
            Pool Name         Active   Pending   Completed   Dropped
            Large messages       n/a         0        6313         3
            Small messages       n/a         0    55978004         3
            Gossip messages      n/a         0       93756       125

            Does it represent network issues? In Debug logs I saw something

            DEBUG [MessagingService-Outgoing-hostname/xxx.yy.zz.kk-Large] 2022-01-14 05:00:19,031 OutboundTcpConnection.java:349 - Error writing to hostname/xxx.yy.zz.kk
            java.io.IOException: Connection timed out
            at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_221]
            at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_221]
            at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_221]
            at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_221]
            at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) ~[na:1.8.0_221]
            at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[na:1.8.0_221]
            at java.nio.channels.Channels.writeFully(Channels.java:98) ~[na:1.8.0_221]
            at java.nio.channels.Channels.access$000(Channels.java:61) ~[na:1.8.0_221]
            at java.nio.channels.Channels$1.write(Channels.java:174) ~[na:1.8.0_221]
            at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205) ~[lz4-1.3.0.jar:na]
            at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:158) ~[lz4-1.3.0.jar:na]

            Does this show any network fluctuations?

            Regards
            Manish
