Hello, I have a fairly vague description of a problem I am having with my Ignite cluster, and I was wondering whether anyone has seen similar behaviour.

I am holding an in-memory database on an Ignite cluster with about 500 GB of data in it, spread across 3 nodes. Data is constantly being streamed into the cache while other entries are evicted or expire (no memory problems). When I run complex queries through the JDBC driver (the queries are diverse, and sometimes they work, sometimes they don't), some nodes fail: either a node leaves the topology while its process stays up and forms a topology by itself, or the Ignite node shuts down completely. This usually happens after a spike in CPU usage caused by query execution. The logs aren't very helpful here; they simply say that the respective node is unreachable.

I tested this cluster on Ignite 2.1, which does not lazily stream result sets from the other nodes to the reducer node (as can be done in 2.3), and I suspect the behaviour might be caused by loading the whole result set into memory at once. I tried increasing the JVM heap to 20 GB per node, setting the failure detection timeout on each node to 50000 ms (10 times the default), reducing query parallelism for the cache, and increasing the system thread pool size, but to no avail. One thing to mention is that even during the spikes, CPU usage was only at about 70%. A rough sketch of the per-node configuration I ended up with is below.
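For reference, this is roughly what the tuning looks like on each node. The failure detection timeout is the actual value I set; the cache name, system pool size, and parallelism value here are just placeholders, since I don't have the exact numbers in front of me:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeConfigSketch {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Failure detection timeout raised from the 10s default to 50s,
        // hoping that CPU spikes during query execution don't get the node
        // declared unreachable.
        cfg.setFailureDetectionTimeout(50_000);

        // Increased system thread pool (placeholder value).
        cfg.setSystemThreadPoolSize(32);

        // Cache with reduced query parallelism to limit per-query CPU load
        // (placeholder cache name and value).
        CacheConfiguration<Long, Object> cacheCfg = new CacheConfiguration<>("myCache");
        cacheCfg.setQueryParallelism(1);
        cfg.setCacheConfiguration(cacheCfg);

        Ignite ignite = Ignition.start(cfg);
    }
}

If the problem really is the reducer pulling whole result sets into memory, I suppose the next thing to try is upgrading to 2.3 and enabling lazy result streaming (e.g. SqlFieldsQuery.setLazy(true)), but I haven't done that yet.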
I am trying my luck here; maybe someone has experienced something similar, as I am aware the description is not very precise. I will update with any other findings relevant to this problem. Thank you, Alin