Hi Denis, so just to understand: is it all operations or just the query?

On Fri., Aug. 14, 2020, 12:53 p.m. Denis Magda, <[email protected]> wrote:

> John,
>
> Ok, we nailed it. That's the current expected behavior. Generally, I agree
> with you that the platform should support an option when operations fail if
> the cluster is deactivated. Could you propose the change by starting a
> discussion on the dev list? You can refer to this user list discussion for
> reference. Let me know if you need help with this.
>
> -
> Denis
>
>
> On Thu, Aug 13, 2020 at 5:55 PM John Smith <[email protected]> wrote:
>
>> No, I reuse the instance. The cache instance is created once at startup
>> of the application and I pass it to my "repository" class.
>>
>> public abstract class AbstractIgniteRepository<K,V> implements CacheRepository<K, V> {
>>     public final long DEFAULT_OPERATION_TIMEOUT = 2000;
>>
>>     private Vertx vertx;
>>     private IgniteCache<K, V> cache;
>>
>>     AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
>>         this.vertx = vertx;
>>         this.cache = cache;
>>     }
>>
>>     ...
>>
>>     Future<List<JsonArray>> query(final String sql, final long timeoutMs,
>>             final Object... args) {
>>         final Promise<List<JsonArray>> promise = Promise.promise();
>>
>>         vertx.setTimer(timeoutMs, l -> {
>>             // THIS FIRES IF THE BLOCK BELOW DOESN'T COMPLETE IN TIME.
>>             promise.tryFail(new TimeoutException("Cache operation did not complete within: " + timeoutMs + " Ms."));
>>         });
>>
>>         vertx.<List<JsonArray>>executeBlocking(code -> {
>>             SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>             query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>
>>             try (QueryCursor<List<?>> cursor = cache.query(query)) { // <--- BLOCKS HERE.
>>                 List<JsonArray> rows = new ArrayList<>();
>>                 Iterator<List<?>> iterator = cursor.iterator();
>>
>>                 while(iterator.hasNext()) {
>>                     List<?> currentRow = iterator.next();
>>                     JsonArray row = new JsonArray();
>>
>>                     currentRow.forEach(o -> row.add(o));
>>
>>                     rows.add(row);
>>                 }
>>
>>                 code.complete(rows);
>>             } catch(Exception ex) {
>>                 code.fail(ex);
>>             }
>>         }, result -> {
>>             if(result.succeeded()) {
>>                 promise.tryComplete(result.result());
>>             } else {
>>                 promise.tryFail(result.cause());
>>             }
>>         });
>>
>>         return promise.future();
>>     }
>>
>>     public <T> T cache() {
>>         return (T) cache;
>>     }
>> }
>>
>>
>> On Thu, 13 Aug 2020 at 16:29, Denis Magda <[email protected]> wrote:
>>
>>> I've created a simple test and I always get the exception below on an
>>> attempt to get a reference to an IgniteCache instance when the
>>> cluster is not activated:
>>>
>>> *Exception in thread "main" class org.apache.ignite.IgniteException: Can
>>> not perform the operation because the cluster is inactive. Note, that the
>>> cluster is considered inactive by default if Ignite Persistent Store is
>>> used to let all the nodes join the cluster. To activate the cluster call
>>> Ignite.active(true)*
>>>
>>> Are you trying to get a new IgniteCache reference whenever the client
>>> reconnects successfully to the cluster? My gut feeling is that currently Ignite
>>> verifies the activation status and generates the exception above whenever
>>> you're getting a reference to an IgniteCache or IgniteCompute. But once you
>>> have those references and try to run some operations, those get stuck if
>>> the cluster is not activated.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Thu, Aug 13, 2020 at 6:37 AM John Smith <[email protected]> wrote:
>>>
>>>> The cache.query() starts to block when Ignite server nodes are being
>>>> restarted and there's no baseline topology yet. The server nodes do not
>>>> block. It's the client that blocks.
>>>>
>>>> The dump files are of the server nodes.
>>>> The screenshot is from the client app, taken with the YourKit profiler
>>>> on the client side; the threads are marked as red in YourKit.
>>>>
>>>> The app is simple: it receives an HTTP request, runs a cache SQL query on
>>>> Ignite and, if that succeeds, does a put back to Ignite.
>>>>
>>>> The client disconnected exception only happens when all server nodes in
>>>> the cluster are down. The blockage only happens when the cluster is trying
>>>> to establish baseline topology.
>>>>
>>>> On Wed., Aug. 12, 2020, 6:28 p.m. Denis Magda, <[email protected]> wrote:
>>>>
>>>>> John,
>>>>>
>>>>> I don't see any signs of an application-caused deadlock in the thread
>>>>> dumps. Please elaborate on the following:
>>>>>
>>>>>> 7- Restart 1st node, run operation, operation fails with
>>>>>> ClientDisconnectedException but application still able to complete its
>>>>>> request.
>>>>>
>>>>> What's the IP address of the server node the client app uses to join
>>>>> the cluster? If that's not the address of the 1st node, which is already
>>>>> restarted, then the client couldn't join the cluster and it's expected
>>>>> that it fails with the ClientDisconnectedException.
>>>>>
>>>>>> 8- Start 2nd node, run operation, from here on all operations just
>>>>>> block.
>>>>>
>>>>> Are the operations unblocked and completed successfully when the third
>>>>> node joins the cluster and the cluster gets activated automatically?
>>>>>
>>>>> -
>>>>> Denis
>>>>>
>>>>>
>>>>> On Wed, Aug 12, 2020 at 11:08 AM John Smith <[email protected]> wrote:
>>>>>
>>>>>> Ok Denis, here they are...
>>>>>>
>>>>>> 3 nodes, and I captured a YourKit screenshot of what it thinks are
>>>>>> deadlocks on the client app.
>>>>>>
>>>>>> https://www.dropbox.com/sh/2cxjkngvx0ubw3b/AADa--HQg-rRsY3RBo2vQeJ9a?dl=0
>>>>>>
>>>>>> On Wed, 12 Aug 2020 at 11:07, John Smith <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Denis. I will ASAP, but I think you were right: it is the query
>>>>>>> that blocks.
>>>>>>>
>>>>>>> My application first runs a select on the cache and then does
>>>>>>> a put to the cache.
>>>>>>>
>>>>>>> On Tue, 11 Aug 2020 at 19:22, Denis Magda <[email protected]> wrote:
>>>>>>>
>>>>>>>> John,
>>>>>>>>
>>>>>>>> It sounds like a deadlock caused by the application logic. Is there
>>>>>>>> any chance that the operation you run on step 8 accesses several keys
>>>>>>>> in one order while the other operations work with the same keys but in a
>>>>>>>> different order? Deadlocks are possible when you use the Ignite
>>>>>>>> Transaction API or simply execute bulk operations such as
>>>>>>>> cache.getAll() or cache.putAll(..).
>>>>>>>>
>>>>>>>> Please take and attach thread dumps from all the cluster nodes for
>>>>>>>> analysis if we need to dig deeper.
>>>>>>>>
>>>>>>>> -
>>>>>>>> Denis
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 10, 2020 at 6:23 PM John Smith <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Denis, I think you are right. It's the query that blocks; the
>>>>>>>>> other k/v operations are ok.
>>>>>>>>>
>>>>>>>>> Any thoughts on this?
>>>>>>>>>
>>>>>>>>> On Mon, 10 Aug 2020 at 15:28, John Smith <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I tried with 2.8.1, same issue. Operations block indefinitely...
>>>>>>>>>>
>>>>>>>>>> 1- Start 3 node cluster
>>>>>>>>>> 2- Start client application client = true with Ignition.start()
>>>>>>>>>> 3- Run some cache operations, everything ok...
>>>>>>>>>> 4- Shut down one node, run operation, still ok
>>>>>>>>>> 5- Shut down 2nd node, run operation, still ok
>>>>>>>>>> 6- Shut down 3rd node, run operation, still ok...
>>>>>>>>>> Operations start failing with ClientDisconnectedException...
>>>>>>>>>> 7- Restart 1st node, run operation, operation fails
>>>>>>>>>> with ClientDisconnectedException but application still able to
>>>>>>>>>> complete its request.
>>>>>>>>>> 8- Start 2nd node, run operation, from here on all operations
>>>>>>>>>> just block.
>>>>>>>>>>
>>>>>>>>>> Basically the client application is an HTTP server that runs a
>>>>>>>>>> cache operation on each HTTP request.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, 7 Aug 2020 at 19:46, John Smith <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> No, everything blocks... Also using 2.7.0 just in case.
>>>>>>>>>>>
>>>>>>>>>>> The only time I get an exception is if the cluster is completely off,
>>>>>>>>>>> then I get ClientDisconnectedException...
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 7 Aug 2020 at 18:52, Denis Magda <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> If I'm not mistaken, key-value operations (cache.get/put) and
>>>>>>>>>>>> compute calls fail with an exception if the cluster is
>>>>>>>>>>>> deactivated. Do those fail on your end?
>>>>>>>>>>>>
>>>>>>>>>>>> As for the async and SQL operations, let's see what other
>>>>>>>>>>>> community members say.
>>>>>>>>>>>>
>>>>>>>>>>>> -
>>>>>>>>>>>> Denis
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, any thoughts on this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 6 Aug 2020 at 23:33, John Smith <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is another example where it blocks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>>>>>>>>>>>         "select * from my_table")
>>>>>>>>>>>>>>         .setArgs(providerId, carrierCode);
>>>>>>>>>>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> cache.query just blocks even with the timeout set.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there a way to time out and at least have the application
>>>>>>>>>>>>>> continue and respond with an appropriate message?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, running 2.7.0.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When I reboot a node and it begins to rejoin the cluster, or
>>>>>>>>>>>>>>> the cluster is not yet activated with baseline topology,
>>>>>>>>>>>>>>> operations seem to block forever, including operations that are
>>>>>>>>>>>>>>> supposed to return IgniteFuture, such as putAsync and getAsync.
>>>>>>>>>>>>>>> They just block until the cluster resolves its state.
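
For readers who are not on Vert.x, the timeout guard described in this thread (give up on the caller's side when cache.query() blocks) can be sketched with plain JDK classes. This is only a rough illustration, not Ignite's API: BoundedCall and callWithTimeout are hypothetical names, and the sleeping task merely stands in for a query that blocks while the cluster is inactive. It is an assumption that interrupting the worker would release a real blocked Ignite call; if it does not, the worker thread stays occupied until the cluster is activated.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Ignite or Vert.x.
public class BoundedCall {

    /**
     * Runs a potentially blocking task on the executor and waits at most
     * timeoutMs for its result. On timeout the worker is interrupted (best
     * effort) and a TimeoutException is thrown to the caller.
     */
    public static <T> T callWithTimeout(ExecutorService executor, Callable<T> task, long timeoutMs)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: a call that ignores interrupts keeps its thread.
            future.cancel(true);
            throw new TimeoutException("Cache operation did not complete within: " + timeoutMs + " ms.");
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            // Stand-in for a cache.query(...) call that blocks on an inactive cluster.
            Callable<String> blockedQuery = () -> {
                Thread.sleep(5_000);
                return "rows";
            };
            try {
                callWithTimeout(executor, blockedQuery, 200);
            } catch (TimeoutException e) {
                // The application can answer the HTTP request with an error here.
                System.out.println("Timed out: " + e.getMessage());
            }
            // A call that returns promptly passes its result through.
            System.out.println(callWithTimeout(executor, () -> "rows", 200));
        } finally {
            executor.shutdownNow();
        }
    }
}
```

As with the Vert.x timer in the repository class above, this only lets the caller continue and respond; it does not by itself unblock the underlying operation.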
