Yeah, I can maybe use the Vert.x event bus or something to do this... But now I have to tie the Ignite instance to the IgniteCache repository I wrote.
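Roughly what I have in mind for the flag idea, as a plain Java sketch with no Ignite or Vert.x types so it stands on its own. CacheGuard, markAvailable and call are made-up names, not Ignite API. The assumption is that something else flips the flag, for example a listener registered through ignite.events().localListen(...) for EventType.EVT_CLIENT_NODE_DISCONNECTED and EVT_CLIENT_NODE_RECONNECTED. I have not verified that those events cover the inactive-cluster window, so that wiring is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical guard around blocking cache calls; not part of Ignite or Vert.x. */
final class CacheGuard {
    // Flipped by whatever listens for disconnect/reconnect (assumed, not shown here).
    private final AtomicBoolean available = new AtomicBoolean(true);
    private final ExecutorService workers;

    CacheGuard(ExecutorService workers) {
        this.workers = workers;
    }

    void markAvailable(boolean value) {
        available.set(value);
    }

    /** Fails fast when flagged unavailable; otherwise runs op on a worker with a deadline. */
    <T> CompletableFuture<T> call(Supplier<T> op, long timeoutMs) {
        if (!available.get()) {
            return CompletableFuture.failedFuture(
                new IllegalStateException("Cache unavailable, operation skipped"));
        }
        // orTimeout (Java 9+) fails the future after the deadline; it does not
        // interrupt the worker thread that is still running op.
        return CompletableFuture.supplyAsync(op, workers)
            .orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CacheGuard guard = new CacheGuard(pool);

        // Stand-in for a real cache call such as cache.query(...).
        System.out.println(guard.call(() -> "row", 500).join()); // prints row

        guard.markAvailable(false);
        System.out.println(guard.call(() -> "row", 500).isCompletedExceptionally()); // prints true

        pool.shutdownNow();
    }
}
```

The timeout only frees the caller. The worker thread running the blocked call stays stuck until the call returns, same as with the vertx.setTimer approach, so enough stuck calls would still exhaust the pool unless the flag is set early.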
But on client reconnect, doesn't it mean it will still block until the cluster is active, even if I get a new IgniteCache instance?

On Fri, 14 Aug 2020 at 18:22, Denis Magda <[email protected]> wrote:

> @Evgenii Zhuravlev <[email protected]>, @Ilya Kasnacheev
> <[email protected]>, any thoughts on this?
>
> As a dirty workaround, you can update your cache references on client
> reconnect events. You will be getting an exception by calling
> ignite.cache(cacheName) in the time when the cluster is not activated yet.
> Does this work for you?
>
> -
> Denis
>
>
> On Fri, Aug 14, 2020 at 3:12 PM John Smith <[email protected]> wrote:
>
>> Is there any work around? I can't have an HTTP server block on all
>> requests.
>>
>> 1- I need to figure out why I lose a server nodes every few weeks, which
>> when rebooting the nodes cause the inactive state until they are back....
>>
>> 2- Implement some kind of logic on the client side not to block the HTTP
>> part...
>>
>> Can IgniteCache instance be notified of disconnected events so I can
>> maybe tell the repository class I have to set a flag to skip the operation?
>>
>>
>> On Fri., Aug. 14, 2020, 5:17 p.m. Denis Magda, <[email protected]> wrote:
>>
>>> My guess that it's standard behavior for all operations (SQL, key-value,
>>> compute, etc.). But I'll let the maintainers of those modules clarify.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Fri, Aug 14, 2020 at 1:44 PM John Smith <[email protected]>
>>> wrote:
>>>
>>>> Hi Denis, so to understand it's all operations or just the query?
>>>>
>>>> On Fri., Aug. 14, 2020, 12:53 p.m. Denis Magda, <[email protected]>
>>>> wrote:
>>>>
>>>>> John,
>>>>>
>>>>> Ok, we nailed it. That's the current expected behavior. Generally, I
>>>>> agree with you that the platform should support an option when operations
>>>>> fail if the cluster is deactivated. Could you propose the change by
>>>>> starting a discussion on the dev list? You can refer to this user list
>>>>> discussion for reference.
>>>>> Let me know if you need help with this.
>>>>>
>>>>> -
>>>>> Denis
>>>>>
>>>>>
>>>>> On Thu, Aug 13, 2020 at 5:55 PM John Smith <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> No, I reuse the instance. The cache instance is created once at
>>>>>> startup of the application and I pass it to my "repository" class
>>>>>>
>>>>>> public abstract class AbstractIgniteRepository<K,V> implements CacheRepository<K, V> {
>>>>>>     public final long DEFAULT_OPERATION_TIMEOUT = 2000;
>>>>>>
>>>>>>     private Vertx vertx;
>>>>>>     private IgniteCache<K, V> cache;
>>>>>>
>>>>>>     AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
>>>>>>         this.vertx = vertx;
>>>>>>         this.cache = cache;
>>>>>>     }
>>>>>>
>>>>>>     ...
>>>>>>
>>>>>>     Future<List<JsonArray>> query(final String sql, final long timeoutMs, final Object... args) {
>>>>>>         final Promise<List<JsonArray>> promise = Promise.promise();
>>>>>>
>>>>>>         vertx.setTimer(timeoutMs, l -> {
>>>>>>             promise.tryFail(new TimeoutException("Cache operation did not complete within: " + timeoutMs + " Ms.")); // THIS FIRES IF THE BLOCK DOESN'T COMPLETE IN TIME.
>>>>>>         });
>>>>>>
>>>>>>         vertx.<List<JsonArray>>executeBlocking(code -> {
>>>>>>             SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>>>>>             query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>>>>>
>>>>>>             try (QueryCursor<List<?>> cursor = cache.query(query)) { // <--- BLOCKS HERE.
>>>>>>                 List<JsonArray> rows = new ArrayList<>();
>>>>>>                 Iterator<List<?>> iterator = cursor.iterator();
>>>>>>
>>>>>>                 while(iterator.hasNext()) {
>>>>>>                     List currentRow = iterator.next();
>>>>>>                     JsonArray row = new JsonArray();
>>>>>>
>>>>>>                     currentRow.forEach(o -> row.add(o));
>>>>>>
>>>>>>                     rows.add(row);
>>>>>>                 }
>>>>>>
>>>>>>                 code.complete(rows);
>>>>>>             } catch(Exception ex) {
>>>>>>                 code.fail(ex);
>>>>>>             }
>>>>>>         }, result -> {
>>>>>>             if(result.succeeded()) {
>>>>>>                 promise.tryComplete(result.result());
>>>>>>             } else {
>>>>>>                 promise.tryFail(result.cause());
>>>>>>             }
>>>>>>         });
>>>>>>
>>>>>>         return promise.future();
>>>>>>     }
>>>>>>
>>>>>>     public <T> T cache() {
>>>>>>         return (T) cache;
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> On Thu, 13 Aug 2020 at 16:29, Denis Magda <[email protected]> wrote:
>>>>>>
>>>>>>> I've created a simple test and always getting the exception below on
>>>>>>> an attempt to get a reference to an IgniteCache instance in cases when
>>>>>>> the cluster is not activated:
>>>>>>>
>>>>>>> *Exception in thread "main" class org.apache.ignite.IgniteException:
>>>>>>> Can not perform the operation because the cluster is inactive. Note,
>>>>>>> that the cluster is considered inactive by default if Ignite Persistent
>>>>>>> Store is used to let all the nodes join the cluster. To activate the
>>>>>>> cluster call Ignite.active(true)*
>>>>>>>
>>>>>>> Are you trying to get a new IgniteCache reference whenever the
>>>>>>> client reconnects successfully to the cluster? My guts feel that
>>>>>>> currently, Ignite verifies the activation status and generates the
>>>>>>> exception above whenever you're getting a reference to an IgniteCache
>>>>>>> or IgniteCompute. But once you got those references and try to run some
>>>>>>> operations then those get stuck if the cluster is not activated.
>>>>>>> -
>>>>>>> Denis
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 13, 2020 at 6:37 AM John Smith <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The cache.query() starts to block when ignite server nodes are
>>>>>>>> being restarted and there's no baseline topology yet. The server nodes
>>>>>>>> do not block. It's the client that blocks.
>>>>>>>>
>>>>>>>> The dump files are of the server nodes. The screenshot is from the
>>>>>>>> client app using the YourKit profiler on the client side; the threads
>>>>>>>> are marked as red in YourKit.
>>>>>>>>
>>>>>>>> The app is simple: make an HTTP request, it runs a cache SQL query on
>>>>>>>> ignite and if it succeeds does a put back to ignite.
>>>>>>>>
>>>>>>>> The client disconnected exception only happens when all server
>>>>>>>> nodes in the cluster are down. The blockage only happens when the
>>>>>>>> cluster is trying to establish baseline topology.
>>>>>>>>
>>>>>>>> On Wed., Aug. 12, 2020, 6:28 p.m. Denis Magda, <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> John,
>>>>>>>>>
>>>>>>>>> I don't see any traits of an application-caused deadlock in the
>>>>>>>>> thread dumps. Please elaborate on the following:
>>>>>>>>>
>>>>>>>>> 7- Restart 1st node, run operation, operation fails with
>>>>>>>>>> ClientDisconnectedException but application still able to complete
>>>>>>>>>> its request.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What's the IP address of the server node the client app uses to
>>>>>>>>> join the cluster? If that's not the address of the 1st node, that is
>>>>>>>>> already restarted, then the client couldn't join the cluster and it's
>>>>>>>>> expected that it fails with the ClientDisconnectedException.
>>>>>>>>>
>>>>>>>>> 8- Start 2nd node, run operation, from here on all operations just
>>>>>>>>>> block.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Are the operations unblocked and completed successfully when the
>>>>>>>>> third node joins the cluster and the cluster gets activated
>>>>>>>>> automatically?
>>>>>>>>>
>>>>>>>>> -
>>>>>>>>> Denis
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Aug 12, 2020 at 11:08 AM John Smith <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Ok Denis here they are...
>>>>>>>>>>
>>>>>>>>>> 3 nodes and I captured a YourKit screenshot of what it thinks are
>>>>>>>>>> deadlocks on the client app.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://www.dropbox.com/sh/2cxjkngvx0ubw3b/AADa--HQg-rRsY3RBo2vQeJ9a?dl=0
>>>>>>>>>>
>>>>>>>>>> On Wed, 12 Aug 2020 at 11:07, John Smith <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Denis. I will asap but I think you were right, it is the
>>>>>>>>>>> query that blocks.
>>>>>>>>>>>
>>>>>>>>>>> My application first runs a select on the cache and then
>>>>>>>>>>> does a put to cache.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 11 Aug 2020 at 19:22, Denis Magda <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> John,
>>>>>>>>>>>>
>>>>>>>>>>>> It sounds like a deadlock caused by the application logic. Is
>>>>>>>>>>>> there any chance that the operation you run on step 8 accesses
>>>>>>>>>>>> several keys in one order while the other operations work with the
>>>>>>>>>>>> same keys but in a different order? The deadlocks are possible when
>>>>>>>>>>>> you use Ignite Transaction API or simply execute bulk operations
>>>>>>>>>>>> such as cache.readAll() or cache.writeAll(..).
>>>>>>>>>>>>
>>>>>>>>>>>> Please take and attach thread dumps from all the cluster nodes
>>>>>>>>>>>> for analysis if we need to dig deeper.
>>>>>>>>>>>>
>>>>>>>>>>>> -
>>>>>>>>>>>> Denis
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 10, 2020 at 6:23 PM John Smith <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Denis, I think you are right. It's the query that blocks;
>>>>>>>>>>>>> the other k/v operations are ok.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any thoughts on this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 10 Aug 2020 at 15:28, John Smith <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I tried with 2.8.1, same issue. Operations block
>>>>>>>>>>>>>> indefinitely...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1- Start 3 node cluster
>>>>>>>>>>>>>> 2- Start client application client = true with Ignition.start()
>>>>>>>>>>>>>> 3- Run some cache operations, everything ok...
>>>>>>>>>>>>>> 4- Shut down one node, run operation, still ok
>>>>>>>>>>>>>> 5- Shut down 2nd node, run operation, still ok
>>>>>>>>>>>>>> 6- Shut down 3rd node, run operation, still ok...
>>>>>>>>>>>>>> Operations start failing with ClientDisconnectedException...
>>>>>>>>>>>>>> 7- Restart 1st node, run operation, operation fails
>>>>>>>>>>>>>> with ClientDisconnectedException but application still able to
>>>>>>>>>>>>>> complete its request.
>>>>>>>>>>>>>> 8- Start 2nd node, run operation, from here on all operations
>>>>>>>>>>>>>> just block.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Basically the client application is an HTTP server; on each
>>>>>>>>>>>>>> HTTP request it does a cache operation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, 7 Aug 2020 at 19:46, John Smith <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No, everything blocks... Also using 2.7.0 just in case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Only time I get an exception is if the cluster is
>>>>>>>>>>>>>>> completely off, then I get ClientDisconnectedException...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, 7 Aug 2020 at 18:52, Denis Magda <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If I'm not mistaken, key-value operations (cache.get/put)
>>>>>>>>>>>>>>>> and compute calls fail with an exception if the cluster is
>>>>>>>>>>>>>>>> deactivated. Do those fail on your end?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As for the async and SQL operations, let's see what other
>>>>>>>>>>>>>>>> community members say.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>>> Denis
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi any thoughts on this?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, 6 Aug 2020 at 23:33, John Smith <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is another example where it blocks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>>>>>>>>>>>>>>>         "select * from my_table")
>>>>>>>>>>>>>>>>>>         .setArgs(providerId, carrierCode);
>>>>>>>>>>>>>>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> cache.query just blocks even with the timeout set.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is there a way to timeout and at least have the
>>>>>>>>>>>>>>>>>> application continue and respond with an appropriate message?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi running 2.7.0
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> When I reboot a node and it begins to rejoin the cluster
>>>>>>>>>>>>>>>>>>> or the cluster is not yet activated with baseline topology
>>>>>>>>>>>>>>>>>>> operations seem to block forever, operations that are supposed
>>>>>>>>>>>>>>>>>>> to return IgniteFuture. I.e: putAsync, getAsync etc... They
>>>>>>>>>>>>>>>>>>> just block, until the cluster resolves it's state.
