Re: IEP-50 Thin Client Continuous Queries

2020-07-15 Thread Alex Plehanov
Pavel,

> OP_QUERY_CONTINUOUS_END_NOTIFICATION is another client -> server message
I think you mean "server -> client" here.

But I still don't get why we need it.
I've briefly looked at the POC implementation and, as far as I understand,
OP_QUERY_CONTINUOUS_END_NOTIFICATION can be sent only when
OP_RESOURCE_CLOSE is received by the server (the client closes the CQ
explicitly).
If the client closes the CQ, it doesn't want to receive any new events. Why
can't we just ignore events for this CQ after that moment?
Also, in the current implementation OP_QUERY_CONTINUOUS_END_NOTIFICATION is
sent before the OP_RESOURCE_CLOSE response, so the OP_RESOURCE_CLOSE
response can be used the same way as OP_QUERY_CONTINUOUS_END_NOTIFICATION.

Such a notification (or something more generalized, like OP_RESOURCE_CLOSED)
can be helpful if the CQ is closed by someone else (for example, if an
administrator calls QueryMXBean.cancelContinuous), but AFAIK we currently
don't have user-side callbacks for this action.


ср, 15 июл. 2020 г. в 01:26, Pavel Tupitsyn :

> Igniters,
>
> I've made an important change to the IEP (and the POC):
> OP_QUERY_CONTINUOUS_END_NOTIFICATION is another client -> server message
> that notifies the client that the query has stopped and no more events
> should be expected.
>
> This is important because client can't immediately stop listening
> for OP_QUERY_CONTINUOUS_EVENT_NOTIFICATION
> after sending OP_RESOURCE_CLOSE - some more events can be present in one of
> the buffers/queues of the server and/or the OS.
>
> Let me know if this makes sense.
>
> On Tue, Jul 14, 2020 at 3:25 PM Pavel Tupitsyn 
> wrote:
>
> > I've removed Initial Query from the POC and IEP (left a note there about
> > the decision).
> >
> > Since there are no other comments and concerns, I'll move on with the
> > final implementation.
> >
> > On Fri, Jul 10, 2020 at 4:14 PM Pavel Tupitsyn 
> > wrote:
> >
> >> Igor, Alex,
> >>
> >> I was aware of the duplicates issue with the initial query, but I did
> not
> >> give it a second thought.
> >>
> >> Now I see that Vladimir was right - Initial query seems to be pointless,
> >> since users can
> >> achieve the same by simply invoking the regular query.
> >>
> >> I will remove Initial Query from the IEP and POC next week if there are
> >> no objections by then.
> >>
> >>
> >> On Fri, Jul 10, 2020 at 3:58 PM Alex Plehanov 
> >> wrote:
> >>
> >>> Igor, Pavel,
> >>>
> >>> Here is discussion about removal: [1]
> >>>
> >>> [1] :
> >>>
> >>>
> http://apache-ignite-developers.2346864.n4.nabble.com/ContinuousQueryWithTransformer-implementation-questions-2-td21418i20.html#a22041
> >>>
> >>> пт, 10 июл. 2020 г. в 17:44, Igor Sapego :
> >>>
> >>> > Can not find proposal to remove them, so maybe it was not on devlist,
> >>> > but here is discussion about the problem: [1]
> >>> >
> >>> > [1] -
> >>> >
> >>> >
> >>>
> http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-duplicates-td39444.html
> >>> >
> >>> > Best Regards,
> >>> > Igor
> >>> >
> >>> >
> >>> > On Fri, Jul 10, 2020 at 3:27 PM Pavel Tupitsyn  >
> >>> > wrote:
> >>> >
> >>> > > > What's about "stop" message? How can user unsubscribe from
> >>> receiving
> >>> > > notifications?
> >>> > > OP_RESOURCE_CLOSE is used for that. I've updated the IEP in an
> >>> attempt to
> >>> > > make this cleaner.
> >>> > >
> >>> > > >  I've seen discussions on removing initial query from continuous
> >>> > queries
> >>> > > Interesting, I'm not aware of this. Can you please link those
> >>> > discussions?
> >>> > >
> >>> > > On Fri, Jul 10, 2020 at 2:04 PM Igor Sapego 
> >>> wrote:
> >>> > >
> >>> > > > Pavel,
> >>> > > >
> >>> > > > What's about "stop" message? How can user unsubscribe from
> >>> receiving
> >>> > > > notifications?
> >>> > > >
> >>> > > > Also, I believe I've seen discussions on removing initial query
> >>> from
> >>> > > > continuous queries,
> >>> > > > as there are not any guarantees about getting consistent results
> >>> with
> >>> > > them.
> >>> > > > Should
> >>> > > > we avoid adding them in thin protocol maybe? It would also
> simplify
> >>> > > > protocol a lot.
> >>> > > >
> >>> > > > Best Regards,
> >>> > > > Igor
> >>> > > >
> >>> > > >
> >>> > > > On Tue, Jun 30, 2020 at 2:39 PM Pavel Tupitsyn <
> >>> ptupit...@apache.org>
> >>> > > > wrote:
> >>> > > >
> >>> > > > > Igniters,
> >>> > > > >
> >>> > > > > Let's discuss Thin Client Continuous Queries,
> >>> > > > > I've prepared an IEP [1] and a PoC [2].
> >>> > > > >
> >>> > > > > [1]
> >>> > > > >
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-50%3A+Thin+Client+Continuous+Queries
> >>> > > > >
> >>> > > > > [2] https://github.com/apache/ignite/pull/7966
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
>


Re: Proposal of new event QUERY_EXECUTION_EVENT

2020-07-15 Thread Max Timonin
Hi Denis, thanks for the answer!

We already checked EVT_CACHE_QUERY_EXECUTED and found that it works only in
these cases:
1. scan queries and SELECT queries (the common pattern is access to cache data);
2. only when query execution succeeds; if execution fails, this event won't
fire (see the sketch below).

Our additional requirements are to log queries:
1. that aren't cache-related (for example, ALTER USER);
2. that relate to multiple caches (while EVT_CACHE_QUERY_EXECUTED has a
cacheName field tied to a specific cache);
3. we also need to log DDL and DML queries.
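
For context, below is a minimal sketch of how EVT_CACHE_QUERY_EXECUTED is
consumed today (assuming the event type has been enabled via
IgniteConfiguration#setIncludeEventTypes; the printed fields are
illustrative). It makes the single-cache, success-only limitations concrete:

```
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.events.CacheQueryExecutedEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class QueryExecutedListener {
    public static void main(String[] args) {
        // Assumes EVT_CACHE_QUERY_EXECUTED was enabled via
        // IgniteConfiguration#setIncludeEventTypes (it is off by default).
        Ignite ignite = Ignition.start();

        IgnitePredicate<Event> lsnr = evt -> {
            CacheQueryExecutedEvent<?, ?> e = (CacheQueryExecutedEvent<?, ?>) evt;

            // The event is bound to exactly one cache and is recorded only
            // for successful executions - the limitations discussed above.
            System.out.println("cache=" + e.cacheName()
                + ", type=" + e.queryType()
                + ", clause=" + e.clause());

            return true; // Keep listening.
        };

        ignite.events().localListen(lsnr, EventType.EVT_CACHE_QUERY_EXECUTED);
    }
}
```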

Regards,
Maksim

On Tue, Jul 14, 2020 at 10:20 PM Denis Magda  wrote:

> Hi Max,
>
> Could you check if the EVT_CACHE_QUERY_EXECUTED event is what you're
> looking for?
>
> https://www.gridgain.com/docs/latest/developers-guide/events/events#cache-query-events
>
> -
> Denis
>
>
> On Fri, Jul 10, 2020 at 3:54 AM Max Timonin 
> wrote:
>
> > Hi Igniters!
> >
> > We're going to log all input SQL queries from our users. Currently
> > there is no such mechanism in Ignite to use for this, so we're proposing
> > to add a new event: QUERY_EXECUTION_EVENT.
> >
> > Requirements for the event:
> > 1. If this event fires, it means that the query is correct and will be
> > executed (and will fail only in exceptional cases);
> >
> > 2. Event fires for all query types;
> >
> > 3. Required fields are:
> > - text of a query (with hidden arguments);
> > - arguments of query;
> > - query type;
> > - node id.
> >
> > It looks like this event should go along with `runningQryMgr::register` in
> > class `IgniteH2Indexing`, as this method is invoked for all input queries
> > too.
> >
> > What do you think?
> >
> > Regards,
> > Maksim
> >
>
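
For illustration only, a hypothetical shape of the proposed event, derived
from the required fields in the quoted proposal; neither the class nor its
accessors exist in Ignite, and all names here are assumptions:

```
import java.util.UUID;

// Hypothetical sketch only: this class does not exist in Ignite; the names
// are assumptions derived from the required fields listed in this thread.
public class QueryExecutionEvent {
    private final String text;     // Query text, with arguments hidden.
    private final Object[] args;   // Query arguments.
    private final String qryType;  // E.g. SQL, DDL, DML, SCAN.
    private final UUID nodeId;     // Node where the query was registered.

    public QueryExecutionEvent(String text, Object[] args, String qryType, UUID nodeId) {
        this.text = text;
        this.args = args;
        this.qryType = qryType;
        this.nodeId = nodeId;
    }

    public String text()      { return text; }
    public Object[] args()    { return args; }
    public String queryType() { return qryType; }
    public UUID nodeId()      { return nodeId; }
}
```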


[jira] [Created] (IGNITE-13260) Improve javadoc documentation for FilePageStore abstraction.

2020-07-15 Thread Sergey Chugunov (Jira)
Sergey Chugunov created IGNITE-13260:


 Summary: Improve javadoc documentation for FilePageStore 
abstraction.
 Key: IGNITE-13260
 URL: https://issues.apache.org/jira/browse/IGNITE-13260
 Project: Ignite
  Issue Type: Task
Reporter: Sergey Chugunov
 Fix For: 2.10


The FilePageStore class javadoc comment doesn't provide any useful information 
about the role of this important class in the overall picture of Ignite Native 
Persistence.

We need to add information about the responsibilities of the class and its 
relationships with other classes in the Ignite Persistence module.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


New component: control.sh

2020-07-15 Thread Ilya Kasnacheev
Hello!

I have added a new component, control.sh, to the IGNITE project.

Please put all of your control.sh tickets under this new
component.

Regards,
-- 
Ilya Kasnacheev


Re: IEP-50 Thin Client Continuous Queries

2020-07-15 Thread Pavel Tupitsyn
Alex,

You are correct, OP_RESOURCE_CLOSE is enough.
Removed the extra op.

> If the client closes the CQ, it doesn't want to receive any new events. Why
> can't we just ignore events for this CQ after that moment?
I don't think that our protocol should involve ignoring messages.
If the client stops the query, the server should guarantee that no events
will be sent
to the client after the OP_RESOURCE_CLOSE response.

I had some concerns about this guarantee, but after reviewing GridNioServer
logic,
the current implementation with OP_RESOURCE_CLOSE seems to be fine.
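
To make the guarantee concrete, here is a hypothetical client-side sketch of
the agreed flow (all names are illustrative, not taken from the POC): events
keep being dispatched while OP_RESOURCE_CLOSE is in flight, and the listener
is removed only after the close response, after which the server guarantees
silence for that resource id.

```
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch of the agreed flow; method and message names are
// illustrative, not the actual POC code.
class ContinuousQueryClientSketch {
    private final Map<Long, Consumer<Object>> listeners = new ConcurrentHashMap<>();

    /** Called for each incoming OP_QUERY_CONTINUOUS_EVENT_NOTIFICATION. */
    void onEventNotification(long resourceId, Object evt) {
        // Events may still arrive while OP_RESOURCE_CLOSE is in flight,
        // so the listener stays registered until the close response.
        Consumer<Object> lsnr = listeners.get(resourceId);

        if (lsnr != null)
            lsnr.accept(evt);
    }

    /** Stops the continuous query identified by resourceId. */
    void close(long resourceId) {
        sendResourceClose(resourceId);  // OP_RESOURCE_CLOSE request.
        awaitCloseResponse(resourceId); // Server sends no events after this.

        listeners.remove(resourceId);   // Now safe: no more events arrive.
    }

    private void sendResourceClose(long resourceId)  { /* assumed transport call */ }
    private void awaitCloseResponse(long resourceId) { /* assumed transport call */ }
}
```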



On Wed, Jul 15, 2020 at 10:09 AM Alex Plehanov 
wrote:

> Pavel,
>
> > OP_QUERY_CONTINUOUS_END_NOTIFICATION is another client -> server message
> I think you mean "server -> client" here.
>
> But I still don't get why we need it.
> I've briefly looked at the POC implementation and, as far as I understand,
> OP_QUERY_CONTINUOUS_END_NOTIFICATION can be sent only when
> OP_RESOURCE_CLOSE is received by the server (the client closes the CQ
> explicitly).
> If the client closes the CQ, it doesn't want to receive any new events. Why
> can't we just ignore events for this CQ after that moment?
> Also, in the current implementation OP_QUERY_CONTINUOUS_END_NOTIFICATION is
> sent before the OP_RESOURCE_CLOSE response, so the OP_RESOURCE_CLOSE
> response can be used the same way as OP_QUERY_CONTINUOUS_END_NOTIFICATION.
>
> Such a notification (or something more generalized, like OP_RESOURCE_CLOSED)
> can be helpful if the CQ is closed by someone else (for example, if an
> administrator calls QueryMXBean.cancelContinuous), but AFAIK we currently
> don't have user-side callbacks for this action.
>
>
> ср, 15 июл. 2020 г. в 01:26, Pavel Tupitsyn :
>
> > Igniters,
> >
> > I've made an important change to the IEP (and the POC):
> > OP_QUERY_CONTINUOUS_END_NOTIFICATION is another client -> server message
> > that notifies the client that the query has stopped and no more events
> > should be expected.
> >
> > This is important because client can't immediately stop listening
> > for OP_QUERY_CONTINUOUS_EVENT_NOTIFICATION
> > after sending OP_RESOURCE_CLOSE - some more events can be present in one
> of
> > the buffers/queues of the server and/or the OS.
> >
> > Let me know if this makes sense.
> >
> > On Tue, Jul 14, 2020 at 3:25 PM Pavel Tupitsyn 
> > wrote:
> >
> > > I've removed Initial Query from the POC and IEP (left a note there
> about
> > > the decision).
> > >
> > > Since there are no other comments and concerns, I'll move on with the
> > > final implementation.
> > >
> > > On Fri, Jul 10, 2020 at 4:14 PM Pavel Tupitsyn 
> > > wrote:
> > >
> > >> Igor, Alex,
> > >>
> > >> I was aware of the duplicates issue with the initial query, but I did
> > not
> > >> give it a second thought.
> > >>
> > >> Now I see that Vladimir was right - Initial query seems to be
> pointless,
> > >> since users can
> > >> achieve the same by simply invoking the regular query.
> > >>
> > >> I will remove Initial Query from the IEP and POC next week if there
> are
> > >> no objections by then.
> > >>
> > >>
> > >> On Fri, Jul 10, 2020 at 3:58 PM Alex Plehanov <
> plehanov.a...@gmail.com>
> > >> wrote:
> > >>
> > >>> Igor, Pavel,
> > >>>
> > >>> Here is discussion about removal: [1]
> > >>>
> > >>> [1] :
> > >>>
> > >>>
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/ContinuousQueryWithTransformer-implementation-questions-2-td21418i20.html#a22041
> > >>>
> > >>> пт, 10 июл. 2020 г. в 17:44, Igor Sapego :
> > >>>
> > >>> > Can not find proposal to remove them, so maybe it was not on
> devlist,
> > >>> > but here is discussion about the problem: [1]
> > >>> >
> > >>> > [1] -
> > >>> >
> > >>> >
> > >>>
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-duplicates-td39444.html
> > >>> >
> > >>> > Best Regards,
> > >>> > Igor
> > >>> >
> > >>> >
> > >>> > On Fri, Jul 10, 2020 at 3:27 PM Pavel Tupitsyn <
> ptupit...@apache.org
> > >
> > >>> > wrote:
> > >>> >
> > >>> > > > What's about "stop" message? How can user unsubscribe from
> > >>> receiving
> > >>> > > notifications?
> > >>> > > OP_RESOURCE_CLOSE is used for that. I've updated the IEP in an
> > >>> attempt to
> > >>> > > make this cleaner.
> > >>> > >
> > >>> > > >  I've seen discussions on removing initial query from
> continuous
> > >>> > queries
> > >>> > > Interesting, I'm not aware of this. Can you please link those
> > >>> > discussions?
> > >>> > >
> > >>> > > On Fri, Jul 10, 2020 at 2:04 PM Igor Sapego 
> > >>> wrote:
> > >>> > >
> > >>> > > > Pavel,
> > >>> > > >
> > >>> > > > What's about "stop" message? How can user unsubscribe from
> > >>> receiving
> > >>> > > > notifications?
> > >>> > > >
> > >>> > > > Also, I believe I've seen discussions on removing initial query
> > >>> from
> > >>> > > > continuous queries,
> > >>> > > > as there are not any guarantees about getting consistent
> results
> > >>> with
> > >>> > > them.
> > >>> > > > Should
> > >>> > > > we avoid adding them in thin protocol maybe? It would also
> > >>> > > > simplify protocol a lot.

[jira] [Created] (IGNITE-13261) Using transactions or continuous queries inside the ignite sandbox can throw an AccessControlException

2020-07-15 Thread Denis Garus (Jira)
Denis Garus created IGNITE-13261:


 Summary: Using transactions or continuous queries inside the 
ignite sandbox can throw an AccessControlException
 Key: IGNITE-13261
 URL: https://issues.apache.org/jira/browse/IGNITE-13261
 Project: Ignite
  Issue Type: Bug
  Components: security
Reporter: Denis Garus
Assignee: Denis Garus


Any subject should be able to use transactions or continuous queries inside the 
ignite sandbox without additional permissions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSSION] Ignite integration testing framework.

2020-07-15 Thread Max Shonichev

Anton,

I've prepared a PoC of running Tiden in dockerized environment.

The code is in a fork of your repo at 
https://github.com/mshonichev/ignite.git, branch 'ignite-ducktape', 
module 'integration-tests'.



Steps to run the PoC are as follows:
```
$ mkdir -p $HOME/tiden_poc
$ cd $HOME/tiden_poc
$ git clone -b ignite-ducktape https://github.com/mshonichev/ignite.git
$ cd ignite
$ scripts/build.sh
$ cd modules/integration-tests
$ mvn -DskipTests -Dmaven.javadoc.skip=true verify

```

A few changes in Tiden 0.6.5 include the ability to set num_nodes on 
application start and minor API fixes.


Most of the work to run in Docker is done by external bash scripts, so no 
changes to the tests themselves are required to run them on bare metal.


You can run them manually or as part of the Maven verify stage; they are 
hooked into pom.xml via maven-exec-plugin.


I've noted the comments in your PoC, and instead of a single Docker image
we prepare a set of images:
 - tiden-master
 - tiden-slave:${JDK_VERSION}
 - tiden-artifacts-ignite:${IGNITE_VERSION}
 - tiden-artifacts-...

During the provisioning stage, all those images are mounted via separate 
volumes into the container's /opt dir. Tiden itself is installed as a package, 
either at a specific version or in 'develop' mode.


Also, it turns out that `docker run --user=...` works well only on macOS or 
on Ubuntu under the default user. Running ducktests under a user with 
UID != 1000 produces inaccessible files in the `results` dir. So, in the 
Tiden PoC I've fixed that too; you can review the Dockerfiles in the 
modules/integration-tests/tiden/docker/ dir.


Next, I've recreated all your benchmarks; their code is in
modules/integration-tests/tiden/suites/benchmarks

Your Java applications are copy-pasted with no changes into
modules/integration-tests/src; a thin Python wrapper over them is in 
modules/integration-tests/apps/igniteaware.app


Unfortunately, no version-filtering decorators are present yet, so instead 
of a single run across all versions, run-tests.sh internally runs Tiden 
several times.



Please check out the sources and share your thoughts.


On 09.07.2020 10:11, Max Shonichev wrote:

Anton,

well, strange thing, but clean up and rerun helped.


Ubuntu 18.04

 


SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id:   2020-07-06--003
run time: 4 minutes 44.835 seconds
tests run:    5
passed:   5
failed:   0
ignored:  0
 

test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1 


status: PASS
run time:   41.927 seconds
{"Rebalanced in (sec)": 1.02205491065979}
 

test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev 


status: PASS
run time:   51.985 seconds
{"Rebalanced in (sec)": 0.0760810375213623}
 

test_id: 
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=2.7.6 


status: PASS
run time:   1 minute 4.283 seconds
{"Streamed txs": "1900", "Measure duration (ms)": "34818", "Worst 
latency (ms)": "31035"}
 

test_id: 
ignitetest.tests.benchmarks.pme_free_switch_test.PmeFreeSwitchTest.test.version=dev 


status: PASS
run time:   1 minute 13.089 seconds
{"Streamed txs": "73134", "Measure duration (ms)": "35843", "Worst 
latency (ms)": "139"}
 

test_id: 
ignitetest.tests.spark_integration_test.SparkIntegrationTest.test_spark_client 


status: PASS
run time:   53.332 seconds
 




MacBook
 


SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id:   2020-07-06--001
run time: 6 minutes 58.612 seconds
tests run:    5
passed:   5
failed:   0
ignored:  0
 

test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1 


status: PASS
run time:   48.724 seconds
{"Rebalanced in (sec)": 3.2574470043182373}
 

test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=dev 


status: PASS
run time:   1 minute 23.210 seconds

Re: Choosing historical rebalance heuristics

2020-07-15 Thread Ivan Rakov
Hi Vladislav,

Thanks for raising this topic.
The currently used IGNITE_PDS_WAL_REBALANCE_THRESHOLD (default is 500_000)
is controversial. Assuming that the default number of partitions is 1024, a
cache should contain a really huge amount of data in order to make WAL
delta rebalancing possible. In fact, it's currently disabled for most
production cases, which makes rebalancing of persistent caches unreasonably
long.

I think your approach [1] makes much more sense than the current
heuristic, let's move forward with the proposed solution.

Though, there are some other corner cases, e.g. this one:
- Configured size of WAL archive is big (>100 GB)
- Cache has small partitions (e.g. 1000 entries)
- Infrequent updates (e.g. ~100 in the whole WAL history of any node)
- There is another cache with very frequent updates which allocate >99% of
WAL
In such scenario we may need to iterate over >100 GB of WAL in order to
fetch <1% of needed updates. Even though the amount of network traffic is
still optimized, it would be more effective to transfer partitions with
~1000 entries fully instead of reading >100 GB of WAL.

I want to highlight that your heuristic definitely makes the situation
better, but due to possible corner cases we should keep the fallback lever
to restrict or limit historical rebalance as before. Probably, it would be
handy to keep IGNITE_PDS_WAL_REBALANCE_THRESHOLD property with a low
default value (1000, 500 or even 0) and apply your heuristic only for
partitions with bigger size.

Regarding case [2]: it looks like an improvement that can mitigate some
corner cases (including the one that I have described). I'm ok with it as
long as it takes data updates reordering on backup nodes into account. We
don't track skipped updates for atomic caches. As a result, detection of
the absence of updates between two checkpoint markers with the same
partition counter can be false positive.

--
Best Regards,
Ivan Rakov

On Tue, Jul 14, 2020 at 3:03 PM Vladislav Pyatkov 
wrote:

> Hi guys,
>
> I want to implement a more honest heuristic for historical rebalance.
> Currently, a cluster chooses between historical and full rebalance based
> only on partition size. This threshold is better known by the property name
> IGNITE_PDS_WAL_REBALANCE_THRESHOLD.
> It can prevent historical rebalance when a partition is too small, but if
> the WAL contains more updates than the size of the partition, historical
> rebalance can still be chosen.
> There is a ticket about implementing a fairer heuristic [1].
>
> My idea for the implementation is to estimate the size of the data that
> will be transferred over the network. In other words, if we need to replay
> a part of the WAL that contains N updates to recover a partition that holds
> M rows in total on another node, historical rebalance should be chosen only
> when N < M (provided the WAL history is available).
>
> This approach is easy to implement, because the coordinator node has the
> partition sizes and the counter intervals. But in this case the cluster can
> still end up reading few updates from a very long WAL history. I assume we
> can work around that if the historical rebalance iterator skips checkpoints
> that contain no updates of the particular cache. Checkpoints can be skipped
> if the counters for the cache (maybe even for specific partitions) did not
> change between one checkpoint and the next.
>
> Ticket for improving the historical rebalance iterator: [2]
>
> I want to hear the community's view on the thoughts above.
> Maybe someone has another opinion?
>
> [1]: https://issues.apache.org/jira/browse/IGNITE-13253
> [2]: https://issues.apache.org/jira/browse/IGNITE-13254
>
> --
> Vladislav Pyatkov
>


Re: [DISCUSSION] Ignite integration testing framework.

2020-07-15 Thread Max Shonichev

Anton, Nikolay,

I want to share some more findings about the ducktests that I've stumbled 
upon while porting them to Tiden.



The first problem was that GridGain Tiden-based tests by default use a real 
production-like configuration for Ignite nodes, notably:


 - persistence enabled
 - ~120 caches in ~40 groups
 - data set size around 1M keys per cache
 - primitive and POJO cache values
 - extensive use of query entities (indices)

When I tried to run 4 nodes with such a configuration in Docker, my 
notebook nearly burned. Nevertheless, the grid started and worked OK, 
but for one little 'but': each successive version under test started 
slower and slower.


2.7.6 was the fastest, 2.8.0 and 2.8.1 were a little slower, and your 
fork (2.9.0-SNAPSHOT) failed to start 4 persistence-enabled nodes within the 
default 120-second timeout. In order to mimic the behavior of your tests I 
had to turn off persistence and use only 1 cache too.


It's a pity that you completely ignore persistence and indices in your 
ducktests; otherwise you would quickly have hit the same limitation.


I hope to adapt the Tiden Docker PoC to our TeamCity soon, and we'll try 
to git-bisect in order to find where this slowdown comes from. After that 
I'll file a bug in the IGNITE Jira.




Another problem with your rebalance benchmark is its low accuracy due 
to the granularity of measurements.


You don't actually measure rebalance time; you measure the time it takes 
to find a specific string in the logs, which is confusing.


The scenario of your test is as follows:

1. start 3 server nodes
2. start 1 data loading client, preload a data, stop client
3. start 1 more server node
4. wait till server joins topology
5. wait till this server node completes exchange and writes the 
'rebalanced=true, wasRebalanced=false' message to the log

6. report the time taken by step 5 as 'Rebalance time'

The confusing thing here is the 'wait till' implementation: you 
continuously re-scan the logs, sleeping a second between scans, until the 
message appears. That means the measured rebalance time has at least 
one-second granularity, even though it is reported with nanosecond 
precision.


But for such a lightweight configuration (a single in-memory cache) and 
such a small data set (only 1M keys), rebalancing is very fast and usually 
completes in under 1 second, or just slightly more.


Before waiting for the rebalance message, you first wait for the topology 
message, and that wait also takes time to execute.


So by the time the Python part of the test performs the first scan of the 
logs, rebalancing is in most cases already done, and the time you report as 
'0.0760810375213623' is actually the time to execute the log-scanning code.


However, if rebalancing completes just a little later after the topology 
update, the first scan of the logs fails, you sleep for a whole second, 
rescan the logs, find your message there, and report it as 
'1.02205491065979'.


Under different conditions, a dockerized application may run a little 
slower or faster, depending on overall system load, free memory, etc. I've 
tried to increase the load on my laptop by running a browser or a Maven 
build, and the time to scan the logs fluctuated from 0.02 to 0.09 or even 
1.02 seconds. Note that in a CI environment, high system load from tenants 
is quite an ordinary situation.


Suppose we adopt the rebalance improvements and all versions after 2.9.0 
perform within 1 second, just like 2.9.0 itself. Then your benchmark could 
report a false negative (e.g. 0.02 for master and 0.03 for the PR) that 
would pass on the next re-run (e.g. 0.07 for master and 0.03 for the PR). 
That's not quite the 'stable and non-flaky' test the Ignite community 
wants.


What suggestions do you have to improve benchmark measurement accuracy?
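
One possible direction, assuming the node logs carry Ignite's default 
'[HH:mm:ss,SSS]' line prefix: derive the duration from the timestamps 
embedded in the two log lines themselves, so the polling interval no longer 
leaks into the reported number. A rough sketch:

```
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RebalanceTiming {
    // Assumes the default "[HH:mm:ss,SSS]" log prefix; adjust the pattern
    // to the actual log4j layout in use.
    private static final Pattern TS = Pattern.compile("\\[(\\d{2}:\\d{2}:\\d{2},\\d{3})\\]");
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    static LocalTime timestampOf(String logLine) {
        Matcher m = TS.matcher(logLine);

        if (!m.find())
            throw new IllegalArgumentException("No timestamp in: " + logLine);

        return LocalTime.parse(m.group(1), FMT);
    }

    public static void main(String[] args) {
        // In a real test these two lines are grepped from the node log.
        String joined = "[12:00:01,120][INFO ] ... Topology snapshot [ver=5, ...]";
        String rebalanced = "[12:00:01,196][INFO ] ... rebalanced=true, wasRebalanced=false";

        Duration d = Duration.between(timestampOf(joined), timestampOf(rebalanced));

        System.out.println("Rebalanced in (sec): " + d.toMillis() / 1000.0);
    }
}
```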


A third question is about the PME free switch benchmark. Under some 
conditions, LongTxStreamerApplication actually hangs up PME. It needs to be 
investigated further, but it was due either to persistence being enabled or 
to the missing -DIGNITE_ALLOW_ATOMIC_OPS_IN_TX=false.


Can you share some details about the IGNITE_ALLOW_ATOMIC_OPS_IN_TX option?
Also, have you tested the PME free switch with 
persistence-enabled caches?



On 09.07.2020 10:11, Max Shonichev wrote:

Anton,

well, strange thing, but clean up and rerun helped.


Ubuntu 18.04

 


SESSION REPORT (ALL TESTS)
ducktape version: 0.7.7
session_id:   2020-07-06--003
run time: 4 minutes 44.835 seconds
tests run:    5
passed:   5
failed:   0
ignored:  0
 

test_id: 
ignitetest.tests.benchmarks.add_node_rebalance_test.AddNodeRebalanceTest.test_add_node.version=2.8.1 


status: PASS
run time:   41.927 seconds
{"Rebalanced in (sec)": 1.02205491065979}
-

Re: [RESULT][VOTE] Stop Maintenance of Ignite Web Console

2020-07-15 Thread Ilya Kasnacheev
Hello!

Well, why? I think we can keep the Web Console docs for now and just add a
note that it is discontinued. We can also put a deprecation notice there.

Regards,
-- 
Ilya Kasnacheev


пн, 13 июл. 2020 г. в 19:46, Dmitriy Pavlov :

> Hi Folks,
>
> I've just found that our docs still point to it and recommend using it.
>
> I suggest removing the automatic RDBMS import option from Cache-Store
> documentation:
> https://apacheignite.readme.io/docs/3rd-party-store#automatic
>
> Sincerely
> Dmitriy Pavlov
>
> ср, 20 мая 2020 г. в 01:49, Denis Magda :
>
> > Igniters,
> >
> > The vote is over [1] with a decision to phase out Ignite Web Console by
> > moving to a separate repository and stopping its development/maintenance
> > officially. The following PMC members cast +1 binding votes:
> >
> >- Saikat Maitra
> >- Alexey Zinoviev
> >- Nikolay Izhikov
> >
> > There are no -1 votes thus we can proceed with the tool discontinuation
> > procedure [2].
> >
> > [1]
> >
> >
> http://apache-ignite-developers.2346864.n4.nabble.com/VOTE-Stop-Maintenance-of-Ignite-WebConsole-td47451.html
> > [2] https://issues.apache.org/jira/browse/IGNITE-13038
> >
> > -
> > Denis
> >
>


Re: Re[2]: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]

2020-07-15 Thread Ivan Bessonov
Guys,

can you please backport https://issues.apache.org/jira/browse/IGNITE-13246
to ignite-2.9? Alexey Kuznetsov and I really want these new events in the
release.

This time I prepared a PR with resolved conflicts:
https://github.com/apache/ignite/pull/8042

Thank you!

вт, 14 июл. 2020 г. в 19:39, Zhenya Stanilovsky :

>
>
>
> Alex, i also suggest to merge this
> https://issues.apache.org/jira/browse/IGNITE-13229 too, GridClient
> leakage and further TC OOM preventing.
>
> >Ivan,
> >
> >It was already in release scope as discussed in this thread.
> >
> >вт, 14 июл. 2020 г. в 14:31, Ivan Rakov < ivan.glu...@gmail.com >:
> >
> >> Hi,
> >>
> >> We are still waiting for a final review of Tracing functionality [1]
> until
> >> the end of tomorrow (July 15).
> >> We anticipate that it will be merged to Ignite master no later than July
> >> 16.
> >>
> >> Sorry for being a bit late here. Alex P., can you include [1] to the
> >> release scope?
> >>
> >> [1]:  https://issues.apache.org/jira/browse/IGNITE-13060
> >>
> >> --
> >> Best Regards,
> >> Ivan Rakov
> >>
> >> On Tue, Jul 14, 2020 at 6:16 AM Alexey Kuznetsov <
> akuznet...@gridgain.com >
> >> wrote:
> >>
> >>> Alex,
> >>>
> >>> Can you cherry-pick to Ignite 2.9 this issue:
> >>>  https://issues.apache.org/jira/browse/IGNITE-13246 ?
> >>>
> >>> This issue is about BASELINE events and it is very useful for
> notification
> >>> external tools about changes in baseline.
> >>>
> >>> Thank you!
> >>>
> >>> ---
> >>> Alexey Kuznetsov
> >>>
> >>
>
>
>
>



-- 
Sincerely yours,
Ivan Bessonov


A few small pull requests

2020-07-15 Thread Stephen Darlington
Hi,

I have a few small quality-of-life pull requests languishing in the backlog. 
Can anyone take a look or suggest what else I need to do to progress them?

IGNITE-12192 Allow ignitevisorcmd to quit when pressing ^D 

IGNITE-12182 ExecutorService defaults to only server nodes 

IGNITE-11715 Allow extra options to be passed in to ignite.sh from Docker 


Thanks,
Stephen



Re: Re[2]: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]

2020-07-15 Thread Alex Plehanov
Zhenya, Ivan,

I've cherry-picked IGNITE-13229 and IGNITE-13246 to ignite-2.9 branch.
Thank you.

ср, 15 июл. 2020 г. в 18:31, Ivan Bessonov :

> Guys,
>
> can you please backport https://issues.apache.org/jira/browse/IGNITE-13246
> to ignite-2.9? Me and Alexey Kuznetsov really want these new events in
> release.
>
> This time I prepared PR with resolved conflicts:
> https://github.com/apache/ignite/pull/8042
>
> Thank you!
>
> вт, 14 июл. 2020 г. в 19:39, Zhenya Stanilovsky  >:
>
> >
> >
> >
> > Alex, i also suggest to merge this
> > https://issues.apache.org/jira/browse/IGNITE-13229 too, GridClient
> > leakage and further TC OOM preventing.
> >
> > >Ivan,
> > >
> > >It was already in release scope as discussed in this thread.
> > >
> > >вт, 14 июл. 2020 г. в 14:31, Ivan Rakov < ivan.glu...@gmail.com >:
> > >
> > >> Hi,
> > >>
> > >> We are still waiting for a final review of Tracing functionality [1]
> > until
> > >> the end of tomorrow (July 15).
> > >> We anticipate that it will be merged to Ignite master no later than
> July
> > >> 16.
> > >>
> > >> Sorry for being a bit late here. Alex P., can you include [1] to the
> > >> release scope?
> > >>
> > >> [1]:  https://issues.apache.org/jira/browse/IGNITE-13060
> > >>
> > >> --
> > >> Best Regards,
> > >> Ivan Rakov
> > >>
> > >> On Tue, Jul 14, 2020 at 6:16 AM Alexey Kuznetsov <
> > akuznet...@gridgain.com >
> > >> wrote:
> > >>
> > >>> Alex,
> > >>>
> > >>> Can you cherry-pick to Ignite 2.9 this issue:
> > >>>  https://issues.apache.org/jira/browse/IGNITE-13246 ?
> > >>>
> > >>> This issue is about BASELINE events and it is very useful for
> > notification
> > >>> external tools about changes in baseline.
> > >>>
> > >>> Thank you!
> > >>>
> > >>> ---
> > >>> Alexey Kuznetsov
> > >>>
> > >>
> >
> >
> >
> >
>
>
>
> --
> Sincerely yours,
> Ivan Bessonov
>


Re: [RESULT][VOTE] Stop Maintenance of Ignite Web Console

2020-07-15 Thread Denis Magda
We should remove the docs and all mentions of it from the website once this
ticket is complete:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/IGNITE-13038

Alex Kuznetsov, are you planning to finish the task in the Ignite 2.9 timeframe?

Denis

On Wednesday, July 15, 2020, Ilya Kasnacheev 
wrote:

> Hello!
>
> Well, why? I think we can keep Web Console docs for now, just add some note
> that it is discontinued. We can also put deprecation notice here.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 13 июл. 2020 г. в 19:46, Dmitriy Pavlov :
>
> > Hi Folks,
> >
> > I've just found our doc still point and recommend using it.
> >
> > I suggest removing the automatic RDBMS import option from Cache-Store
> > documentation:
> > https://apacheignite.readme.io/docs/3rd-party-store#automatic
> >
> > Sincerely
> > Dmitriy Pavlov
> >
> > ср, 20 мая 2020 г. в 01:49, Denis Magda :
> >
> > > Igniters,
> > >
> > > The vote is over [1] with a decision to phase out Ignite Web Console by
> > > moving to a separate repository and stopping its
> development/maintenance
> > > officially. The following PMC members cast +1 binding votes:
> > >
> > >- Saikat Maitra
> > >- Alexey Zinoviev
> > >- Nikolay Izhikov
> > >
> > > There are no -1 votes thus we can proceed with the tool discontinuation
> > > procedure [2].
> > >
> > > [1]
> > >
> > >
> > http://apache-ignite-developers.2346864.n4.nabble.
> com/VOTE-Stop-Maintenance-of-Ignite-WebConsole-td47451.html
> > > [2] https://issues.apache.org/jira/browse/IGNITE-13038
> > >
> > > -
> > > Denis
> > >
> >
>


-- 
-
Denis


Re: [RESULT][VOTE] Stop Maintenance of Ignite Web Console

2020-07-15 Thread Ilya Kasnacheev
Hello!

I think that we indeed should version this change, i.e., only remove
mentions of Web Console in the version where it's already in the attic.

Regards,
-- 
Ilya Kasnacheev


ср, 15 июл. 2020 г. в 17:17, Denis Magda :

> We should remove the docs and all the mentioning from the website once this
> ticket is complete:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/IGNITE-13038
>
> Alex Kuznetsov, are you planning to finish the task in Ignite 2.9
> timeframe?
>
> Denis
>
> On Wednesday, July 15, 2020, Ilya Kasnacheev 
> wrote:
>
> > Hello!
> >
> > Well, why? I think we can keep Web Console docs for now, just add some
> note
> > that it is discontinued. We can also put deprecation notice here.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > пн, 13 июл. 2020 г. в 19:46, Dmitriy Pavlov :
> >
> > > Hi Folks,
> > >
> > > I've just found our doc still point and recommend using it.
> > >
> > > I suggest removing the automatic RDBMS import option from Cache-Store
> > > documentation:
> > > https://apacheignite.readme.io/docs/3rd-party-store#automatic
> > >
> > > Sincerely
> > > Dmitriy Pavlov
> > >
> > > ср, 20 мая 2020 г. в 01:49, Denis Magda :
> > >
> > > > Igniters,
> > > >
> > > > The vote is over [1] with a decision to phase out Ignite Web Console
> by
> > > > moving to a separate repository and stopping its
> > development/maintenance
> > > > officially. The following PMC members cast +1 binding votes:
> > > >
> > > >- Saikat Maitra
> > > >- Alexey Zinoviev
> > > >- Nikolay Izhikov
> > > >
> > > > There are no -1 votes thus we can proceed with the tool
> discontinuation
> > > > procedure [2].
> > > >
> > > > [1]
> > > >
> > > >
> > > http://apache-ignite-developers.2346864.n4.nabble.
> > com/VOTE-Stop-Maintenance-of-Ignite-WebConsole-td47451.html
> > > > [2] https://issues.apache.org/jira/browse/IGNITE-13038
> > > >
> > > > -
> > > > Denis
> > > >
> > >
> >
>
>
> --
> -
> Denis
>


Re: Re[2]: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]

2020-07-15 Thread Alex Plehanov
Ivan,

Looks like master is broken after IGNITE-13246 (but everything is OK in the
2.9 branch).

ср, 15 июл. 2020 г. в 18:54, Alex Plehanov :

> Zhenya, Ivan,
>
> I've cherry-picked IGNITE-13229 and IGNITE-13246 to ignite-2.9 branch.
> Thank you.
>
> ср, 15 июл. 2020 г. в 18:31, Ivan Bessonov :
>
>> Guys,
>>
>> can you please backport
>> https://issues.apache.org/jira/browse/IGNITE-13246
>> to ignite-2.9? Me and Alexey Kuznetsov really want these new events in
>> release.
>>
>> This time I prepared PR with resolved conflicts:
>> https://github.com/apache/ignite/pull/8042
>>
>> Thank you!
>>
>> вт, 14 июл. 2020 г. в 19:39, Zhenya Stanilovsky
>> > >:
>>
>> >
>> >
>> >
>> > Alex, i also suggest to merge this
>> > https://issues.apache.org/jira/browse/IGNITE-13229 too, GridClient
>> > leakage and further TC OOM preventing.
>> >
>> > >Ivan,
>> > >
>> > >It was already in release scope as discussed in this thread.
>> > >
>> > >вт, 14 июл. 2020 г. в 14:31, Ivan Rakov < ivan.glu...@gmail.com >:
>> > >
>> > >> Hi,
>> > >>
>> > >> We are still waiting for a final review of Tracing functionality [1]
>> > until
>> > >> the end of tomorrow (July 15).
>> > >> We anticipate that it will be merged to Ignite master no later than
>> July
>> > >> 16.
>> > >>
>> > >> Sorry for being a bit late here. Alex P., can you include [1] to the
>> > >> release scope?
>> > >>
>> > >> [1]:  https://issues.apache.org/jira/browse/IGNITE-13060
>> > >>
>> > >> --
>> > >> Best Regards,
>> > >> Ivan Rakov
>> > >>
>> > >> On Tue, Jul 14, 2020 at 6:16 AM Alexey Kuznetsov <
>> > akuznet...@gridgain.com >
>> > >> wrote:
>> > >>
>> > >>> Alex,
>> > >>>
>> > >>> Can you cherry-pick to Ignite 2.9 this issue:
>> > >>>  https://issues.apache.org/jira/browse/IGNITE-13246 ?
>> > >>>
>> > >>> This issue is about BASELINE events and it is very useful for
>> > notification
>> > >>> external tools about changes in baseline.
>> > >>>
>> > >>> Thank you!
>> > >>>
>> > >>> ---
>> > >>> Alexey Kuznetsov
>> > >>>
>> > >>
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Sincerely yours,
>> Ivan Bessonov
>>
>


Re: Continuous Queries with several remote filter on the same cache

2020-07-15 Thread Denis Magda
Hi Roman,

Apologies for the late reply. Could you please clarify why the thread-count
parameter would be useful? How are you planning to use it?

-
Denis


On Tue, Jul 7, 2020 at 5:25 AM  wrote:

> Hi Denis,
>
> What do you think about some improvements in @IgniteAsyncCallback
> regarding usability? What if we add a number of threads parameter in
> @IgniteAsyncCallback?
>
> Best regards,
> Roman
>
> -Original Message-
> From: Denis Magda 
> Sent: Friday, June 26, 2020 7:38 PM
> To: dev 
> Subject: Re: Continuous Queries with several remote filter on the same
> cache
>
> Roman,
>
> The updates are ordered per partition. Let's take this example of an
> application updating several records:
>
> put (k1, val1) => mapped to partition_10 => node_A put (k2, val2) =>
> mapped to partition_5 => node_B put (k3, val3) => mapped to partition_10 =>
> node_A
>
> It's guaranteed that a continuous query listener will be notified about k1
> and k3 updates in this order - k1 first and k3 after. As for the k2 update,
> it can arrive at any time (i.e., before k1, after k3 or in the middle).
>
>
>
>
> -
> Denis
>
>
> On Fri, Jun 26, 2020 at 12:58 AM  wrote:
>
> > Hi Denis,
> >
> > Thanks! Is there some guarantee about the order of the updates? Even
> > when we have multiple cache nodes.
> >
> > Best regards,
> > Roman
> >
> > -Original Message-
> > From: Denis Magda 
> > Sent: Monday, June 8, 2020 10:20 PM
> > To: dev 
> > Subject: Re: Continuous Queries with several remote filter on the same
> > cache
> >
> > Roman,
> >
> > Please check the following methods:
> > * CacheContiniousQueryHandler (the filter usage):
> >
> > https://github.com/apache/ignite/blob/6955ac291352dd67c1f84a006cda512e
> > e54f38bb/modules/core/src/main/java/org/apache/ignite/internal/process
> > ors/cache/query/continuous/CacheContinuousQueryHandler.java#L994
> > * CacheContinuousQueryManager (the listener execution):
> >
> > https://github.com/apache/ignite/blob/master/modules/core/src/main/jav
> > a/org/apache/ignite/internal/processors/cache/query/continuous/CacheCo
> > ntinuousQueryManager.java#L376
> >
> > -
> > Denis
> >
> >
> > On Sun, Jun 7, 2020 at 12:19 AM  wrote:
> >
> > > Hi Denis,
> > > A big thank you for the answer.
> > > Could you please tell me where can I find this logic in the sources.
> > > Which package should I look into?
> > >
> > > -Original Message-
> > > From: Denis Magda 
> > > Sent: Saturday, June 6, 2020 2:07 AM
> > > To: dev 
> > > Subject: Re: Continuous Queries with several remote filter on the
> > > same cache
> > >
> > > Hi Roman,
> > >
> > > Every continuous query is a unique entity that is processed by
> > > servers independently. With your example, the server node will
> > > execute all 20 filters for every cache insert/update operation. The
> > > server will notify through local listeners only those clients whose
> > > remote filters returned 'true'.
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Thu, Jun 4, 2020 at 8:44 PM  wrote:
> > >
> > > > Hi Community,
> > > >
> > > > I ask this question here because I haven't found the answer in the
> > > > documentation.
> > > >
> > > > Could you please clarify how Continuous Queries work? What the
> > > > behavior of Continuous Queries if we have several clients with
> > > > different Remote Filters on the same cache? For example, if we have:
> > > > one server node with cache and we have up to 20 client nodes each
> > > > of them will execute Continuous Query on the same cache but with
> > > > different Remote Filters. Will each client get the data according
> > > > to its remote filter? Or it is supposed to have only one Remote
> > > > Filter for all clients and every client should filter data in its
> > > > local event
> > > listener?
> > > > I would be grateful if you send some link which describes the
> > > > behavior of Continuous Queries more thoroughly.
> > > > Best regards,
> > > > Roman
> > > >
> > >
> >
>
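
For readers following the thread, a minimal sketch of the behavior described
above: each client registers its own continuous query with its own remote
filter, the server evaluates every registered filter on each update, and each
local listener sees only the entries that passed that client's filter (the
cache name and filter logic are illustrative):

```
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;
import org.apache.ignite.cache.query.QueryCursor;

public class CqFilterExample {
    public static void main(String[] args) throws InterruptedException {
        Ignite client = Ignition.start(); // Assume client-mode configuration.

        IgniteCache<Integer, String> cache = client.getOrCreateCache("data");

        ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

        // This client's own remote filter, executed on the server for every
        // update; other clients register their own filters independently.
        qry.setRemoteFilterFactory(() -> evt -> evt.getKey() % 2 == 0);

        // The local listener is notified only for entries that passed this
        // client's remote filter, in per-partition order.
        qry.setLocalListener(evts -> evts.forEach(e ->
            System.out.println(e.getKey() + " -> " + e.getValue())));

        try (QueryCursor<Cache.Entry<Integer, String>> cur = cache.query(qry)) {
            Thread.sleep(10_000); // Keep the cursor open while events are needed.
        }
    }
}
```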


Re: Choosing historical rebalance heuristics

2020-07-15 Thread Vladislav Pyatkov
Ivan,

I agree with a combined approach: a threshold for small partitions and an
update count for partitions that outgrow it. This helps to avoid partitions
that are updated infrequently.

Reading a big piece of WAL (more than 100 GB) can happen when a client has
configured it that way intentionally. No doubt we can read it; otherwise the
WAL space would not have been configured that large.

I don't see a connection between the iterator optimization and the issue in
the atomic protocol. Reordering in the WAL that happens within a checkpoint
where the counter did not change is an extremely rare case, and the issue
cannot be solved here for the generic case; it should be fixed within the
protocol itself.

I think we can modify the heuristic as follows:
1) Exclude partitions by threshold (IGNITE_PDS_WAL_REBALANCE_THRESHOLD,
reduced to 500).
2) Select for historical rebalance only those partitions where the difference
between counters is less than the partition size.

Also, implement the mentioned optimization for the historical iterator, which
may reduce the time spent reading a large WAL interval; a compact sketch of
the combined rule is below.
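
The sketch of the resulting decision rule (the names are illustrative, not
the actual implementation; N is the number of WAL updates between the
demanded counters and M is the partition size):

```
public class HistoricalRebalanceHeuristic {
    /**
     * Sketch of the combined heuristic discussed in this thread; the names
     * are illustrative and this is not the actual Ignite implementation.
     */
    static boolean useHistoricalRebalance(
        long partSize,           // M: entries currently in the partition.
        long walUpdates,         // N: updates between the demanded counters.
        long threshold,          // IGNITE_PDS_WAL_REBALANCE_THRESHOLD (e.g. 500).
        boolean historyAvailable // WAL history covers the demanded interval.
    ) {
        // 1) Small partitions are always rebalanced fully.
        if (partSize <= threshold)
            return false;

        // 2) Use history only when replaying the WAL delta would transfer
        //    less data than sending the whole partition (N < M).
        return historyAvailable && walUpdates < partSize;
    }
}
```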

On Wed, Jul 15, 2020 at 3:15 PM Ivan Rakov  wrote:

> Hi Vladislav,
>
> Thanks for raising this topic.
> Currently present IGNITE_PDS_WAL_REBALANCE_THRESHOLD (default is 500_000)
> is controversial. Assuming that the default number of partitions is 1024,
> cache should contain a really huge amount of data in order to make WAL
> delta rebalancing possible. In fact, it's currently disabled for most
> production cases, which makes rebalancing of persistent caches unreasonably
> long.
>
> I think, your approach [1] makes much more sense than the current
> heuristic, let's move forward with the proposed solution.
>
> Though, there are some other corner cases, e.g. this one:
> - Configured size of WAL archive is big (>100 GB)
> - Cache has small partitions (e.g. 1000 entries)
> - Infrequent updates (e.g. ~100 in the whole WAL history of any node)
> - There is another cache with very frequent updates which allocate >99% of
> WAL
> In such scenario we may need to iterate over >100 GB of WAL in order to
> fetch <1% of needed updates. Even though the amount of network traffic is
> still optimized, it would be more effective to transfer partitions with
> ~1000 entries fully instead of reading >100 GB of WAL.
>
> I want to highlight that your heuristic definitely makes the situation
> better, but due to possible corner cases we should keep the fallback lever
> to restrict or limit historical rebalance as before. Probably, it would be
> handy to keep IGNITE_PDS_WAL_REBALANCE_THRESHOLD property with a low
> default value (1000, 500 or even 0) and apply your heuristic only for
> partitions with bigger size.
>
> Regarding case [2]: it looks like an improvement that can mitigate some
> corner cases (including the one that I have described). I'm ok with it as
> long as it takes data updates reordering on backup nodes into account. We
> don't track skipped updates for atomic caches. As a result, detection of
> the absence of updates between two checkpoint markers with the same
> partition counter can be false positive.
>
> --
> Best Regards,
> Ivan Rakov
>
> On Tue, Jul 14, 2020 at 3:03 PM Vladislav Pyatkov 
> wrote:
>
> > Hi guys,
> >
> > I want to implement a more honest heuristic for historical rebalance.
> > Currently, a cluster chooses between historical and full rebalance based
> > only on partition size. This threshold is better known by the property
> > name IGNITE_PDS_WAL_REBALANCE_THRESHOLD.
> > It can prevent historical rebalance when a partition is too small, but if
> > the WAL contains more updates than the size of the partition, historical
> > rebalance can still be chosen.
> > There is a ticket about implementing a fairer heuristic [1].
> >
> > My idea for the implementation is to estimate the size of the data that
> > will be transferred over the network. In other words, if we need to
> > replay a part of the WAL that contains N updates to recover a partition
> > that holds M rows in total on another node, historical rebalance should
> > be chosen only when N < M (provided the WAL history is available).
> >
> > This approach is easy to implement, because the coordinator node has the
> > partition sizes and the counter intervals. But in this case the cluster
> > can still end up reading few updates from a very long WAL history. I
> > assume we can work around that if the historical rebalance iterator skips
> > checkpoints that contain no updates of the particular cache. Checkpoints
> > can be skipped if the counters for the cache (maybe even for specific
> > partitions) did not change between one checkpoint and the next.
> >
> > Ticket for improving the historical rebalance iterator: [2]
> >
> > I want to hear the community's view on the thoughts above.
> > Maybe someone has another opinion?
> >
> > [1]: https://issues.apache.org/jira/browse/IGNITE-13253
> > [2]: https://issues.apache.org/jira/browse/IGNITE-13254
> >
> > --
> > Vladislav Pyatkov
> >
>


-- 
Vladislav Pyatkov


Removal of "default" cache from REST APIs

2020-07-15 Thread Evgeniy Rudenko
Hi guys,

Most of the cache REST commands try to use the "default" cache when cacheName
is not provided. This is pointless, because we don't have such a cache by
default. I would like to change that and just return a "Failed to find
mandatory parameter in request" error if the name is absent.

Please tell me if you have any concerns. The update can be found at
https://github.com/apache/ignite/pull/8041

-- 
Best regards,
Evgeniy