OversizedMessageException in AntiEntropyStage

Sébastien Rebecchi Wed, 10 Nov 2021 00:52:25 -0800

Hi,

While digging through the Cassandra logs, I saw this exception several
times (see the trace below).


I was wondering what can cause such big messages, and whether this error
could make a repair session fail. My repairs currently take a very long
time, and sometimes I even stop them because they seem to be hanging and
will never return. For information, I run repair one (node+table) at a
time, sequentially, and I stop a repair once it has been stuck on the same
(node+table) for 5 days. I can see that the repair session is still alive
and the pid is still there; there is no repair failure message, but nothing
happens and the repair % does not increase.

Is there something I can do to avoid that error? If it can only be fixed by
changing the configuration, which setting should I increase? I guess it is
commitlog_segment_size_in_mb, but I have not been able to find a definite
confirmation on the web.

Here is some information about the installation:
- Apache Cassandra 4.0 GA release
- output of cqlsh: cqlsh 6.0.0 | Cassandra 4.0.1 | CQL spec 3.4.5 | Native
protocol v5
- OS: CentOS Linux release 8.4.2105
- Output of nodetool status (the address column is hidden because it is
sensitive). As you can see, the cluster is still imbalanced 2 weeks after I
added the last 2 nodes (the ones with the lowest load). Before that it was
perfectly balanced with the same data model, in case this information helps:
--  Load        Tokens  Owns (effective)  Host ID                               Rack
UN  712.72 GiB  8       32.6%             00f8bb86-5283-4b01-9819-fe4d59337680  rack1
UN  830.91 GiB  8       35.5%             b9490ee5-44ba-4898-add7-159a7eeb06d9  rack1
UN  763.69 GiB  8       34.7%             931ccba9-9aef-4d79-9fa0-53e4654554f5  rack1
UN  720.29 GiB  8       33.0%             4b5445ad-fb3c-419e-98b9-80599014e2b4  rack1
UN  273.6 GiB   8       22.8%             34db97d3-6c6d-4544-b656-9d4ae2a82dca  rack1
UN  616.92 GiB  8       41.3%             631d7b77-de74-4fe3-86a2-8bd3beec191e  rack1
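
To put a number on the imbalance, here is a rough sketch that divides each node's load by its effective ownership. The "GiB per owned percent" ratio is only an illustrative metric I am assuming for comparison, not something nodetool reports, and the short host labels are just the first 8 characters of each Host ID.

```python
# (host label, load in GiB, effective ownership in %), copied from the
# nodetool status output above. Labels are shortened Host IDs.
nodes = [
    ("00f8bb86", 712.72, 32.6),
    ("b9490ee5", 830.91, 35.5),
    ("931ccba9", 763.69, 34.7),
    ("4b5445ad", 720.29, 33.0),
    ("34db97d3", 273.6, 22.8),
    ("631d7b77", 616.92, 41.3),
]

# Illustrative metric (an assumption, not a Cassandra metric):
# how many GiB a node stores per percent of the ring it owns.
ratios = {host: load / owns for host, load, owns in nodes}

# Print the nodes from the lowest to the highest ratio.
for host, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{host}: {ratio:.1f} GiB per owned %")
```

If the data were evenly spread, these ratios should be close to each other, so nodes with a clearly lower ratio hold less data than their ownership would suggest.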

Thank you for your help!

Sébastien.

--

/var/log/cassandra/system.log:ERROR [AntiEntropyStage:1] 2021-11-05 00:48:28,074 CassandraDaemon.java:579 - Exception in thread Thread[AntiEntropyStage:1,5,main]
/var/log/cassandra/system.log-org.apache.cassandra.net.Message$OversizedMessageException: Message of size 140580605 bytes exceeds allowed maximum of 134217728 bytes
/var/log/cassandra/system.log- at org.apache.cassandra.net.OutboundConnection.enqueue(OutboundConnection.java:328)
/var/log/cassandra/system.log- at org.apache.cassandra.net.OutboundConnections.enqueue(OutboundConnections.java:84)
/var/log/cassandra/system.log- at org.apache.cassandra.net.MessagingService.doSend(MessagingService.java:338)
/var/log/cassandra/system.log- at org.apache.cassandra.net.OutboundSink.accept(OutboundSink.java:70)
/var/log/cassandra/system.log- at org.apache.cassandra.net.MessagingService.send(MessagingService.java:327)
/var/log/cassandra/system.log- at org.apache.cassandra.net.MessagingService.send(MessagingService.java:314)
/var/log/cassandra/system.log- at org.apache.cassandra.repair.Validator.respond(Validator.java:269)
/var/log/cassandra/system.log- at org.apache.cassandra.repair.Validator.run(Validator.java:257)
/var/log/cassandra/system.log- at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
/var/log/cassandra/system.log- at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
/var/log/cassandra/system.log- at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
/var/log/cassandra/system.log- at java.lang.Thread.run(Thread.java:748)
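
As a quick sanity check of the two sizes in the exception message, the following sketch converts them to MiB and computes by how much the message exceeds the limit. The variable and function names are mine, chosen only for illustration; the two byte counts are copied from the trace.

```python
# Sizes copied from the OversizedMessageException line in the trace.
message_bytes = 140_580_605
limit_bytes = 134_217_728

MIB = 1024 * 1024  # bytes per MiB


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / MIB


# How far the message is over the allowed maximum, in bytes and in percent.
overage_bytes = message_bytes - limit_bytes
overage_pct = 100 * overage_bytes / limit_bytes

print(f"limit:   {to_mib(limit_bytes):.2f} MiB")
print(f"message: {to_mib(message_bytes):.2f} MiB")
print(f"overage: {overage_bytes} bytes ({overage_pct:.2f}% over the limit)")
```

The limit is exactly 128 MiB, and the message is over it by only about 6 MiB.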
