John,

OK, we nailed it. That's the current expected behavior. Generally, I agree with you that the platform should support an option that makes operations fail when the cluster is deactivated. Could you propose the change by starting a discussion on the dev list? You can point to this user list discussion for reference. Let me know if you need help with this.
-
Denis

On Thu, Aug 13, 2020 at 5:55 PM John Smith <[email protected]> wrote:

> No, I reuse the instance. The cache instance is created once at startup of
> the application and I pass it to my "repository" class
>
> public abstract class AbstractIgniteRepository<K,V> implements CacheRepository<K, V> {
>     public final long DEFAULT_OPERATION_TIMEOUT = 2000;
>
>     private Vertx vertx;
>     private IgniteCache<K, V> cache;
>
>     AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
>         this.vertx = vertx;
>         this.cache = cache;
>     }
>
>     ...
>
>     Future<List<JsonArray>> query(final String sql, final long timeoutMs, final Object... args) {
>         final Promise<List<JsonArray>> promise = Promise.promise();
>
>         vertx.setTimer(timeoutMs, l -> {
>             promise.tryFail(new TimeoutException("Cache operation did not complete within: " + timeoutMs + " Ms.")); // THIS FIRES IF THE BLOCK DOESN'T COMPLETE IN TIME.
>         });
>
>         vertx.<List<JsonArray>>executeBlocking(code -> {
>             SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>             query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>
>
>             try (QueryCursor<List<?>> cursor = cache.query(query)) { // <--- BLOCKS HERE.
>                 List<JsonArray> rows = new ArrayList<>();
>                 Iterator<List<?>> iterator = cursor.iterator();
>
>                 while(iterator.hasNext()) {
>                     List<?> currentRow = iterator.next();
>                     JsonArray row = new JsonArray();
>
>                     currentRow.forEach(o -> row.add(o));
>
>                     rows.add(row);
>                 }
>
>                 code.complete(rows);
>             } catch(Exception ex) {
>                 code.fail(ex);
>             }
>         }, result -> {
>             if(result.succeeded()) {
>                 promise.tryComplete(result.result());
>             } else {
>                 promise.tryFail(result.cause());
>             }
>         });
>
>         return promise.future();
>     }
>
>     public <T> T cache() {
>         return (T) cache;
>     }
> }
>
>
>
> On Thu, 13 Aug 2020 at 16:29, Denis Magda <[email protected]> wrote:
>
>> I've created a simple test and I always get the exception below on an
>> attempt to get a reference to an IgniteCache instance when the
>> cluster is not activated:
>>
>> *Exception in thread "main" class org.apache.ignite.IgniteException: Can
>> not perform the operation because the cluster is inactive. Note, that the
>> cluster is considered inactive by default if Ignite Persistent Store is
>> used to let all the nodes join the cluster. To activate the cluster call
>> Ignite.active(true)*
>>
>> Are you trying to get a new IgniteCache reference whenever the client
>> reconnects successfully to the cluster? My gut feeling is that currently Ignite
>> verifies the activation status and generates the exception above whenever
>> you're getting a reference to an IgniteCache or IgniteCompute. But once you
>> have those references and try to run some operations, those get stuck if
>> the cluster is not activated.
>> -
>> Denis
>>
>>
>> On Thu, Aug 13, 2020 at 6:37 AM John Smith <[email protected]>
>> wrote:
>>
>>> The cache.query() starts to block when Ignite server nodes are being
>>> restarted and there's no baseline topology yet. The server nodes do not
>>> block. It's the client that blocks.
>>>
>>> The dump files are of the server nodes.
>>> The screenshot is from the client app, taken with the YourKit profiler
>>> on the client side; the threads are marked as red in YourKit.
>>>
>>> The app is simple: on an HTTP request, it runs a cache SQL query on Ignite
>>> and, if it succeeds, does a put back to Ignite.
>>>
>>> The client disconnected exception only happens when all server nodes in
>>> the cluster are down. The blockage only happens when the cluster is trying
>>> to establish baseline topology.
>>>
>>> On Wed., Aug. 12, 2020, 6:28 p.m. Denis Magda, <[email protected]>
>>> wrote:
>>>
>>>> John,
>>>>
>>>> I don't see any signs of an application-caused deadlock in the thread
>>>> dumps. Please elaborate on the following:
>>>>
>>>> 7- Restart 1st node, run operation, operation fails with
>>>>> ClientDisconnectedException but application still able to complete its
>>>>> request.
>>>>
>>>>
>>>> What's the IP address of the server node the client app uses to join
>>>> the cluster? If that's not the address of the 1st node, which is already
>>>> restarted, then the client couldn't join the cluster and it's expected that
>>>> it fails with the ClientDisconnectedException.
>>>>
>>>> 8- Start 2nd node, run operation, from here on all operations just
>>>>> block.
>>>>
>>>>
>>>> Are the operations unblocked and completed successfully when the third
>>>> node joins the cluster and the cluster gets activated automatically?
>>>>
>>>> -
>>>> Denis
>>>>
>>>>
>>>> On Wed, Aug 12, 2020 at 11:08 AM John Smith <[email protected]>
>>>> wrote:
>>>>
>>>>> Ok Denis here they are...
>>>>>
>>>>> 3 nodes, and I captured a YourKit screenshot of what it thinks are
>>>>> deadlocks on the client app.
>>>>>
>>>>>
>>>>> https://www.dropbox.com/sh/2cxjkngvx0ubw3b/AADa--HQg-rRsY3RBo2vQeJ9a?dl=0
>>>>>
>>>>> On Wed, 12 Aug 2020 at 11:07, John Smith <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Denis. I will ASAP, but I think you were right: it is the query
>>>>>> that blocks.
>>>>>>
>>>>>> My application first runs a select on the cache and then does a
>>>>>> put to cache.
>>>>>>
>>>>>> On Tue, 11 Aug 2020 at 19:22, Denis Magda <[email protected]> wrote:
>>>>>>
>>>>>>> John,
>>>>>>>
>>>>>>> It sounds like a deadlock caused by the application logic. Is there
>>>>>>> any chance that the operation you run on step 8 accesses several keys in
>>>>>>> one order while the other operations work with the same keys but in a
>>>>>>> different order? Deadlocks are possible when you use the Ignite
>>>>>>> Transaction API or simply execute bulk operations such as
>>>>>>> cache.getAll(..) or cache.putAll(..).
>>>>>>>
>>>>>>> Please take and attach thread dumps from all the cluster nodes for
>>>>>>> analysis if we need to dig deeper.
>>>>>>>
>>>>>>> -
>>>>>>> Denis
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 10, 2020 at 6:23 PM John Smith <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Denis, I think you are right. It's the query that blocks; the
>>>>>>>> other k/v operations are ok.
>>>>>>>>
>>>>>>>> Any thoughts on this?
>>>>>>>>
>>>>>>>> On Mon, 10 Aug 2020 at 15:28, John Smith <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I tried with 2.8.1, same issue. Operations block indefinitely...
>>>>>>>>>
>>>>>>>>> 1- Start 3 node cluster
>>>>>>>>> 2- Start client application client = true with Ignition.start()
>>>>>>>>> 3- Run some cache operations, everything ok...
>>>>>>>>> 4- Shut down one node, run operation, still ok
>>>>>>>>> 5- Shut down 2nd node, run operation, still ok
>>>>>>>>> 6- Shut down 3rd node, run operation, still ok... Operations start
>>>>>>>>> failing with ClientDisconnectedException...
>>>>>>>>> 7- Restart 1st node, run operation, operation fails with
>>>>>>>>> ClientDisconnectedException but application still able to complete its
>>>>>>>>> request.
>>>>>>>>> 8- Start 2nd node, run operation, from here on all operations just
>>>>>>>>> block.
>>>>>>>>>
>>>>>>>>> Basically the client application is an HTTP server that does a cache
>>>>>>>>> operation on each HTTP request.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, 7 Aug 2020 at 19:46, John Smith <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> No, everything blocks... Also using 2.7.0 just in case.
>>>>>>>>>>
>>>>>>>>>> The only time I get an exception is if the cluster is completely off,
>>>>>>>>>> then I get ClientDisconnectedException...
>>>>>>>>>>
>>>>>>>>>> On Fri, 7 Aug 2020 at 18:52, Denis Magda <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> If I'm not mistaken, key-value operations (cache.get/put) and
>>>>>>>>>>> compute calls fail with an exception if the cluster is deactivated.
>>>>>>>>>>> Do those fail on your end?
>>>>>>>>>>>
>>>>>>>>>>> As for the async and SQL operations, let's see what other
>>>>>>>>>>> community members say.
>>>>>>>>>>>
>>>>>>>>>>> -
>>>>>>>>>>> Denis
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, any thoughts on this?
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 6 Aug 2020 at 23:33, John Smith <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Here is another example where it blocks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>>>>>>>>>>         "select * from my_table")
>>>>>>>>>>>>>         .setArgs(providerId, carrierCode);
>>>>>>>>>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>>>>>>>>>
>>>>>>>>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>>>>>>>>>>
>>>>>>>>>>>>> cache.query just blocks even with the timeout set.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there a way to time out and at least have the application
>>>>>>>>>>>>> continue and respond with an appropriate message?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi, running 2.7.0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I reboot a node and it begins to rejoin the cluster, or
>>>>>>>>>>>>>> the cluster is not yet activated with baseline topology,
>>>>>>>>>>>>>> operations seem to block forever, including operations that are
>>>>>>>>>>>>>> supposed to return IgniteFuture, i.e. putAsync, getAsync etc...
>>>>>>>>>>>>>> They just block until the cluster resolves its state.
>>>>>>>>>>>>>>
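
On the question raised in the thread ("Is there a way to time out and at least have the application continue and respond with an appropriate message?"), here is a minimal sketch of bounding a blocking call from the caller's side. It uses only java.util.concurrent, with no Ignite or Vert.x types. The names BoundedCall, callWithTimeout and CallTimedOutException are hypothetical, and the Callable stands in for the blocking cache.query(...) call. It only frees the caller: the worker thread may stay blocked, and whether the underlying Ignite call reacts to the interrupt is an assumption this thread does not confirm.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a blocking call on a worker thread and gives up
// waiting for it after a deadline, so the caller can answer with an error.
final class BoundedCall {

    /** Hypothetical unchecked exception signalling that the deadline passed. */
    static final class CallTimedOutException extends RuntimeException {
        CallTimedOutException(String message) {
            super(message);
        }
    }

    /**
     * Submits blockingCall to pool and waits at most timeoutMs for the result.
     * In the thread's scenario, blockingCall would wrap the cache.query(...) code.
     */
    static <T> T callWithTimeout(Callable<T> blockingCall, long timeoutMs, ExecutorService pool) {
        Future<T> future = pool.submit(blockingCall);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Requests an interrupt of the worker; a call that ignores
            // interrupts keeps occupying its thread until it returns.
            future.cancel(true);
            throw new CallTimedOutException(
                "Cache operation did not complete within " + timeoutMs + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Blocking call failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // A call that returns promptly yields its value.
            System.out.println(callWithTimeout(() -> "rows", 1_000, pool));

            // A call that blocks longer than the deadline is abandoned.
            try {
                callWithTimeout(() -> {
                    Thread.sleep(5_000); // stands in for a blocked query
                    return "never";
                }, 100, pool);
            } catch (CallTimedOutException e) {
                System.out.println("timed out: " + e.getMessage());
            }
        } finally {
            pool.shutdownNow(); // interrupts the sleeping worker
        }
    }
}
```

This is close in spirit to the Vert.x timer in the repository class quoted above: the caller gets a failure after the deadline while the blocked work continues in the background. If many requests hit a blocked cluster, the pool can fill up with stuck workers, so a bounded pool or a check of the cluster state before submitting may be worth considering.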
