[
https://issues.apache.org/jira/browse/IGNITE-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669984#comment-16669984
]
Vladimir Ozerov edited comment on IGNITE-9828 at 10/31/18 11:38 AM:
--------------------------------------------------------------------
[~rkondakov], [~gvvinblade], my comments:
1) Currently we have five (!!!) nested loops: group -> partitions -> gaps->
caches -> listeners. Looks like we'd better to assemble gaps, and then apply
them to underlying caches only once. So it would be: groups -> partitions ->
caches -> listeners -> gaps. This way we will minimize garbage and unnecessary
method calls.
2) I think we should not use {{CacheContinuousQueryManager#skipUpdateCounter}}.
First, it generates unnecessary garbage ({{CounterSkipContext}}). Second, it
has very tricky logic, and from what I understand only small subset of the code
could potentially be invoked during the specific case we target. Let's create
separate method which will do only necessary things and put clear assertions in
code for the sake of safety and efficiency.
3) {{GridDhtPartitionTopologyImpl#finalizeUpdateCounters}] - we always pass
{{backup=false}}. But don't we filter out non-backup partitions in the first
place? Is it ok?
4) {{GridDhtPartitionTopologyImpl.finalizeUpdateCounters}} - topology read lock
is obtained in this method. However, several calls deeper we reach
{{GridCacheDataStore#finalizeUpdateCounters}} where store could be potentially
initialized under checkpoint read lock. But we have a strict policy that
checkpoint read lock must be acquired before topology read lock. Otherwise
deadlocks are possible. We need to have an answer to the question: "Is it
possible that a store (partition) will be uninitialized in this place?"
5) {{GridDhtPartitionsExchangeFuture#finalizePartitionCounters}} -
{{doInParallel}} could occupy all system threads. This is safe only assuming
that no blocking operation ever exist in system thread. Our experience shows
that this is not always the case. To be on the safe side we should not occupy
all system threads. Instead, let's occupy (cnt - 2), this is how it is done in
other places where {{doInParallel}} overload is used. Be careful with situation
when there are <= 2 system threads.
6) Once p.1, p.2 and p.4 are addressed, we should re-evaluate whether we need
to separate threads or not. Typically we will have not so many gaps. So if we
rule out any heavy operations and user code execution and make more efficient
loops, then there is no need to delegate to separate threads.
was (Author: vozerov):
[~rkondakov], [~gvvinblade], my comments:
1) Currently we have five (!!!) nested loops: group -> partitions -> gaps->
caches -> listeners. Looks like we'd better to assemble gaps, and then apply
them to underlying caches only once. So it would be: groups -> partitions ->
caches -> listeners -> gaps. This way we will minimize garbage and unnecessary
method calls.
2) I think we should not use {{CacheContinuousQueryManager#skipUpdateCounter}}.
First, it generates unnecessary garbage ({{CounterSkipContext}}). Second, it
has very tricky logic, and from what I understand only small subset of the code
could potentially be invoked during the specific case we target. Let's create
separate method which will do only necessary things and put clear assertions in
code for the sake of safety and efficiency.
3) {{GridDhtPartitionTopologyImpl#finalizeUpdateCounters}] - we always pass
{{backup-false}}. But don't we filter out non-backup partitions in the first
place? Is it ok?
4) {{GridDhtPartitionTopologyImpl.finalizeUpdateCounters}} - topology read lock
is obtained in this method. However, several calls deeper we reach
{{GridCacheDataStore#finalizeUpdateCounters}} where store could be potentially
initialized under checkpoint read lock. But we have a strict policy that
checkpoint read lock must be acquired before topology read lock. Otherwise
deadlocks are possible. We need to have an answer to the question: "Is it
possible that a store (partition) will be uninitialized in this place?"
5) {{GridDhtPartitionsExchangeFuture#finalizePartitionCounters}} -
{{doInParallel}} could occupy all system threads. This is safe only assuming
that no blocking operation ever exist in system thread. Our experience shows
that this is not always the case. To be on the safe side we should not occupy
all system threads. Instead, let's occupy (cnt - 2), this is how it is done in
other places where {{doInParallel}} overload is used. Be careful with situation
when there are <= 2 system threads.
6) Once p.1, p.2 and p.4 are addressed, we should re-evaluate whether we need
to separate threads or not. Typically we will have not so many gaps. So if we
rule out any heavy operations and user code execution and make more efficient
loops, then there is no need to delegate to separate threads.
> MVCC: Continuous query failover.
> --------------------------------
>
> Key: IGNITE-9828
> URL: https://issues.apache.org/jira/browse/IGNITE-9828
> Project: Ignite
> Issue Type: Task
> Components: mvcc
> Reporter: Roman Kondakov
> Assignee: Roman Kondakov
> Priority: Major
> Fix For: 2.7
>
>
> We need to implement failover procedure for CQ with MVCC caches. Thoughts:
> * CQ failover tests should be adopted for mvcc scenarios. See
> {{IgniteCacheQuerySelfTestSuite4}}
> * We need to ensure the correctness of CQ and partition counters consistency
> in cases when some of TX participants are in prepared state and others are
> marked as rollback only. It looks like it doesn't work properly now.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)