Problem: An Ignite client hangs indefinitely when performing a cache
operation. We have 6 Ignite servers running; the problem goes away when we
reduce this to 3. What effect does expanding or reducing the server cluster
have that could cause this?

See the attachments for a sample stack trace of the hanging client thread, a
server config snippet, a client config snippet, and a cache key snippet. The
logs show various TCP communication errors, such as those in the attached
client and server error files. We tried increasing the (client) failure
detection timeout values as suggested by the server error message, but that
just made system startup hang for a long time (close to an hour).

Usage:

We have a large number of data objects (64k-400M) stored within HDF5 files
and process hundreds of millions of records a day, with total data
throughput ranging from 500GB to 10TB a day. We use Ignite as an in-memory
distributed cache in front of the process that interacts with the HDF5
files.
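
For a sense of scale, the daily totals above can be turned into an average
sustained rate. This is only a rough sketch: it assumes decimal units
(1GB = 1e9 bytes) and a perfectly even load over 24 hours, which real traffic
will not have, and the class and method names are made up for illustration.
Under those assumptions the range works out to roughly 5.8 MB/s at the low
end and about 116 MB/s at the high end, across the whole cluster.

```java
// Hypothetical helper, not part of our code base: converts a daily byte
// total into an average rate, assuming the load is spread evenly over a day.
final class ThroughputSketch {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Returns the average rate in decimal megabytes per second.
    static double averageMegabytesPerSecond(double bytesPerDay) {
        return bytesPerDay / SECONDS_PER_DAY / 1e6;
    }

    public static void main(String[] args) {
        double low = averageMegabytesPerSecond(500e9);  // 500GB a day
        double high = averageMegabytesPerSecond(10e12); // 10TB a day
        System.out.println("low  (MB/s): " + low);
        System.out.println("high (MB/s): " + high);
    }
}
```

Peak rates will be higher than these averages, so the figures are a lower
bound on what the cluster has to absorb.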

Configuration:

1. Ignite version is 2.9.
2. The configuration is a 6-node Ignite cluster using a partitioned cache.
3. Ignite's persistence is disabled, and we wrote a cache store
implementation to persist the cache entries to the backing HDF5 files.
4. Ignite is configured in a write-behind / read-through manner.
5. There are four primary caches split up by data type to reduce the amount
of traffic on any one cache. The caches are configured identically except
for their write-behind properties and the data types they hold, which helps
manage how much data is in any one cache.
6. The cache key is a compound object consisting of the path to the file and
a group / locator string within the file.
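
To illustrate item 6 for readers who do not open the attachment, here is a
minimal sketch of a compound key of that shape. It is a stand-in, not the
attached DataStoreKey.java: the class name, field names, and accessors are
assumptions made for this example. The point it shows is that the key is
immutable and compares by value on both parts.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical stand-in for the attached DataStoreKey.java; all names here
// are assumptions chosen for illustration.
final class CompoundKeySketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String filePath; // path to the HDF5 file
    private final String locator;  // group / locator string within the file

    CompoundKeySketch(String filePath, String locator) {
        this.filePath = Objects.requireNonNull(filePath, "filePath");
        this.locator = Objects.requireNonNull(locator, "locator");
    }

    String filePath() { return filePath; }

    String locator() { return locator; }

    // Two keys are equal only when both the file path and the locator match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CompoundKeySketch)) return false;
        CompoundKeySketch other = (CompoundKeySketch) o;
        return filePath.equals(other.filePath) && locator.equals(other.locator);
    }

    // Hash is derived from the same two fields used in equals.
    @Override
    public int hashCode() {
        return Objects.hash(filePath, locator);
    }

    @Override
    public String toString() {
        return filePath + "::" + locator;
    }
}
```

Both fields are final so a key cannot change after it has been put into a
cache, which keeps lookups for the same file and locator consistent.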

Hardware:

1. At our failure site, there are 6 physical systems running Red Hat
Hyperconverged Infrastructure.
2. Each physical node has a pinned VM running Apache Ignite. The VM has
128GB of memory. Ignite is configured with 16GB of heap memory and 64GB of
off-heap cache.
3. There are 6 other VMs, each running 3 processes that all store to Ignite.
4. There is a single VM that fronts the HDF5 files that Ignite talks to for
persistent storage.

hangingStackTrace.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t3178/hangingStackTrace.txt>
  
serverConfig.xml
<http://apache-ignite-users.70518.x6.nabble.com/file/t3178/serverConfig.xml>  
clientConfig.xml
<http://apache-ignite-users.70518.x6.nabble.com/file/t3178/clientConfig.xml>  
DataStoreKey.java
<http://apache-ignite-users.70518.x6.nabble.com/file/t3178/DataStoreKey.java>  

serverErrors.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t3178/serverErrors.txt>  
clientErrors.txt
<http://apache-ignite-users.70518.x6.nabble.com/file/t3178/clientErrors.txt>  