[ https://issues.apache.org/jira/browse/IGNITE-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mirza Aliev updated IGNITE-16406:
---------------------------------
Description:
For some reason, a select operation may not return the expected number of rows.
We noticed that this happens when the Raft leader is changing. To make the
problem easier to reproduce, message handling can be slowed down slightly, for
example by adding this code to {{MessageServiceImpl#onMessage(java.lang.String,
org.apache.ignite.network.NetworkMessage)}}:
{code:java}
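// Requires: import java.util.concurrent.ThreadLocalRandom;
// nextInt(3) yields 0, 1 or 2, so roughly two out of three messages are delayed.
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
    try {
        Thread.sleep(300);
    } catch (InterruptedException ex) {
        // Restore the interrupt flag instead of swallowing the interruption.
        Thread.currentThread().interrupt();
    }
}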
{code}
Possible direction for investigation:
check that the cursor.next command is not lost as a Raft response while the
leader is changing.
UPD: We decided to add a consistency check between the received scan command
and the handled scan command in the partition listener, so a user now gets a
state machine error and can retry the command. However, we found another
inconsistency: RocksDB could return hasNext == false after an unexpected
step-down of the leader (https://issues.apache.org/jira/browse/IGNITE-16478).
We then decided to change the replica factor to 1 in {{ItMixedQueriesTest}}, so
that there is only one node in a partition Raft group, but we still couldn't
enable {{ItMixedQueriesTest}} because of a new error:
https://issues.apache.org/jira/browse/IGNITE-16502
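The retry mentioned above (a user gets a state machine error and repeats the
command) could look roughly like the sketch below. It is only an illustration:
{{StateMachineException}}, {{retryOnStateMachineError}} and the attempt limit
are hypothetical names chosen for this example and are not taken from the
Ignite code base.

```java
import java.util.function.Supplier;

public class ScanRetrySketch {
    /** Hypothetical stand-in for the state machine error reported to the user. */
    public static class StateMachineException extends RuntimeException {
        public StateMachineException(String message) {
            super(message);
        }
    }

    /**
     * Runs the whole scan again when a state machine error is reported,
     * giving up after maxAttempts tries and rethrowing the last error.
     */
    public static <T> T retryOnStateMachineError(Supplier<T> scan, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        StateMachineException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return scan.get();
            } catch (StateMachineException e) {
                // The scan state is no longer trustworthy; restart from scratch.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated scan that fails twice (e.g. during a leader change) and then succeeds.
        Supplier<Integer> flakyScan = () -> {
            if (++calls[0] < 3) {
                throw new StateMachineException("scan command mismatch");
            }
            return 42;
        };
        Integer rows = retryOnStateMachineError(flakyScan, 5);
        System.out.println(rows + " rows after " + calls[0] + " attempts");
    }
}
```

The sketch restarts the whole scan rather than resuming the cursor, on the
assumption that a cursor which has reported a mismatch cannot be continued
safely.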
was:
For some reasons select operation couldn't return expected number of rows. We
noticed that this happens when raft leader is changing. To increase
reproducibility, we can slow down a bit message handling, for example by adding
this code to {{MessageServiceImpl#onMessage(java.lang.String,
org.apache.ignite.network.NetworkMessage)}}
{code:java}
if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
try {
Thread.sleep(300);
} catch (Exception ex) {
ex.printStackTrace();
}
}
{code}
Possible direction of research:
we could check that we do not lose cursor.next command as a raft response
during the process of leader changing
> SQL select operation could return incomplete data
> -------------------------------------------------
>
> Key: IGNITE-16406
> URL: https://issues.apache.org/jira/browse/IGNITE-16406
> Project: Ignite
> Issue Type: Bug
> Reporter: Mirza Aliev
> Assignee: Mirza Aliev
> Priority: Blocker
> Labels: ignite-3
>
> For some reason, a select operation may not return the expected number of
> rows. We noticed that this happens when the Raft leader is changing. To make
> the problem easier to reproduce, message handling can be slowed down slightly,
> for example by adding this code to
> {{MessageServiceImpl#onMessage(java.lang.String,
> org.apache.ignite.network.NetworkMessage)}}:
> {code:java}
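> // Requires: import java.util.concurrent.ThreadLocalRandom;
> // nextInt(3) yields 0, 1 or 2, so roughly two out of three messages are delayed.
> if (ThreadLocalRandom.current().nextInt(3) % 2 == 0) {
>     try {
>         Thread.sleep(300);
>     } catch (InterruptedException ex) {
>         // Restore the interrupt flag instead of swallowing the interruption.
>         Thread.currentThread().interrupt();
>     }
> }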
> {code}
> Possible direction for investigation:
> check that the cursor.next command is not lost as a Raft response while the
> leader is changing.
> UPD: We decided to add a consistency check between the received scan command
> and the handled scan command in the partition listener, so a user now gets a
> state machine error and can retry the command. However, we found another
> inconsistency: RocksDB could return hasNext == false after an unexpected
> step-down of the leader (https://issues.apache.org/jira/browse/IGNITE-16478).
> We then decided to change the replica factor to 1 in {{ItMixedQueriesTest}},
> so that there is only one node in a partition Raft group, but we still
> couldn't enable {{ItMixedQueriesTest}} because of a new error:
> https://issues.apache.org/jira/browse/IGNITE-16502
--
This message was sent by Atlassian Jira
(v8.20.1#820001)