Re: Re[2]: What does "First 10 long running cache futures" mean?

John Smith Tue, 02 Nov 2021 16:51:59 -0700

That feels like what happened. But I had just rebooted the servers, and I also
force all my clients = true.


On Mon, 1 Nov 2021 at 16:40, Mike Wiesenberg <mike.wiesenb...@gmail.com>
wrote:

> In my experience, seeing the 'First 10 long running cache futures'
> warning (which should probably be logged as Fatal) means the Ignite node
> your client is connected to is broken and needs to be rebooted. In
> addition, since Ignite clients aren't smart enough to disconnect from a
> node in a broken state and try another one, you also need to reboot every
> client that may be connected to those nodes. All in all, it is a very
> problematic situation, and the lack of automatic failover in particular has
> made me seriously question whether Ignite is production-ready software.
>
> On Wed, Oct 6, 2021 at 8:52 AM John Smith <java.dev....@gmail.com> wrote:
>
>> Ok. For now I rebooted all nodes... But it's fairly easy to reproduce.
>>
>> On Wed., Oct. 6, 2021, 2:17 a.m. Zhenya Stanilovsky, <arzamas...@mail.ru>
>> wrote:
>>
>>>
>>> OK, it seems something is going wrong on the node with
>>> id=36edbfd5-4feb-417e-b965-bdc34a0a6f4f. If you still have the problem,
>>> can you send these logs here or directly to me?
>>>
>>>
>>>
>>>
>>> And finally, this on the coordinator node:
>>>
>>> [14:07:41,282][WARNING][exchange-worker-#42%xxxxxx%][GridDhtPartitionsExchangeFuture]
>>> Unable to await partitions release latch within timeout. Some nodes have
>>> not sent acknowledgement for latch completion. It's possible due to
>>> unfinishined atomic updates, transactions or not released explicit locks on
>>> that nodes. Please check logs for errors on nodes with ids reported in
>>> latch `pendingAcks` collection [latch=ServerLatch [permits=1,
>>> pendingAcks=HashSet [36edbfd5-4feb-417e-b965-bdc34a0a6f4f],
>>> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
>>> topVer=AffinityTopologyVersion [topVer=103, minorTopVer=0]]]]]
>>>
>>> On Tue, 5 Oct 2021 at 10:07, John Smith <java.dev....@gmail.com> wrote:
>>>
>>> And I see this...
>>>
>>> [14:04:15,150][WARNING][exchange-worker-#43%raange%][GridDhtPartitionsExchangeFuture]
>>> Unable to await partitions release latch within timeout. For more details
>>> please check coordinator node logs [crdNode=TcpDiscoveryNode
>>> [id=36ad785d-e344-43bb-b685-e79557572b54,
>>> consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
>>> [127.0.0.1, xxxxxx.65], sockAddrs=HashSet [xxxxxx-0002/xxxxxx.65:47500, /
>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>> lastExchangeTime=1633370987399, loc=false,
>>> ver=2.8.1#20200521-sha1:86422096, isClient=false]] [latch=ClientLatch
>>> [coordinator=TcpDiscoveryNode [id=36ad785d-e344-43bb-b685-e79557572b54,
>>> consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
>>> [127.0.0.1, xxxxxx.65], sockAddrs=HashSet [xxxxxx-0002/xxxxxx.65:47500, /
>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>> lastExchangeTime=1633370987399, loc=false,
>>> ver=2.8.1#20200521-sha1:86422096, isClient=false], ackSent=true,
>>> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
>>> topVer=AffinityTopologyVersion [topVer=103, minorTopVer=0]]]]]
>>>
>>> On Tue, 5 Oct 2021 at 10:02, John Smith <java.dev....@gmail.com> wrote:
>>>
>>> Actually, to be clearer:
>>>
>>> http://xxxxxx-0001:8080/ignite?cmd=version responds immediately.
>>>
>>> http://xxxxxx-0001:8080/ignite?cmd=size&cacheName=my-cache doesn't
>>> respond at all.
>>>
>>> On Tue, 5 Oct 2021 at 09:59, John Smith <java.dev....@gmail.com> wrote:
>>>
>>> Yeah, ever since I got this error, the REST API won't return, for example,
>>> and requests are slower. But when I connect with Visor I can get stats,
>>> scan the cache, etc.
>>>
>>> Is it possible that these async futures/threads are not released?
>>>
>>> On Tue, 5 Oct 2021 at 04:11, Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>>>
>>> Hi, this is just a warning showing that something suspicious was observed.
>>> There is no simple answer to your question; in the common case, all these
>>> messages are due to cluster limitations (resources or settings).
>>> Check the documentation on performance tuning [1].
>>>
>>> [1]
>>> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips
>>>
>>>
>>>
>>> Hi, I'm using 2.8.1. I understand the message to mean that my async TRX is
>>> taking longer, but is there a way to prevent it?
>>>
>>> When this happened, I was pushing about 50,000 gets/puts per second from
>>> my API.
>>>
