Hi Mikhail, proposed changes make sense to me. I left some comments to the pr. Thank you!
On Wed, May 6, 2020 at 2:28 PM Mikhail Petrov <pmgheap....@gmail.com> wrote: > Hello, Igniters. > > I am working on IGNITE-12894 - [1]. It seems that it has the root cause > which is similar to the problem described in this thread. > > To solve these problems, I propose to change the behavior of the > IgniteServiceProcessor#serviceTopology if the timeout argument is 0. > At the moment, IgniteServiceProcessor#serviceTopology returns the > topology immediately, regardless of whether it was initialized or not in > this case. I propose to wait for the service topology to be initialized > if the requested service is already registered on local node, but the > full message was not received from the coordinator yet. > > So the final behavior of IgniteServices#serviceProxy() will be: > 1. If the timeout is specified - it waits for the topology over a > specified timeout even if the requested service was not registered yet. > As in current implementation. > > 2. If the timeout is not specified - if service was not registered it > fails immediately, else it is waiting for the topology initialization > (full message from the coordinator) if needed. > > Here is PR with the implementation of the described proposal - [2]. > > WDYT? > > [1] - https://issues.apache.org/jira/browse/IGNITE-12894 > [2] - https://github.com/apache/ignite/pull/7771 > > On 30.12.2019 13:03, Alexey Goncharuk wrote: > > Agree, sounds like a plan, thanks for taking over! > > > > пн, 30 дек. 2019 г. в 13:00, Vyacheslav Daradur <daradu...@gmail.com>: > > > >> Alexey, > >> > >> I would not make it default in the current implementation. > >> > >> Waiting of proxies on non-deployment-initiator nodes should be > >> improved - additional checks are required: > >> 1) We should not wait if requested service has not been submitted to > >> deploy (when there is no info about such service) > >> 2) If service deployment failed - getting proxy should be failed or > >> interrupted as well (do not wait for all available timeout) > >> > >> Let's schedule this improvement to next release, I'll try to find a > >> time to implement it. > >> > >> What do you think? > >> > >> On Mon, Dec 30, 2019 at 12:05 PM Alexey Goncharuk > >> <alexey.goncha...@gmail.com> wrote: > >>> Vyacheslav, thanks for the explanation, makes sense to me. > >>> > >>> I was thinking though, should we make the behavior with the timeout > >> default > >>> for all proxies? > >>> > >>> Just my opinion - I think for a user it would be hard to control which > >> node > >>> deploys the service, especially if multiple nodes deploy it > concurrently. > >>> Most likely users will end up always calling the second option of the > >> proxy > >>> (with the timeout), so, perhaps, make it default? > >>> > >>> вс, 29 дек. 2019 г. в 21:05, Vyacheslav Daradur <daradu...@gmail.com>: > >>> > >>>> Alexey, > >>>> > >>>> I've prepared pr [1] to show our proxy invocation guarantees and to > >>>> avoid misunderstanding. > >>>> > >>>> Please, let me know if you think that we should improve our guaranties > >>>> in some cases. > >>>> > >>>> [1] https://github.com/apache/ignite/pull/7213 > >>>> > >>>> On Tue, Dec 24, 2019 at 7:27 PM Vyacheslav Daradur < > >> daradu...@gmail.com> > >>>> wrote: > >>>>>> even the local deployment looks broken: if a compute job > >>>>>> is sent to a remote node after the service deployment > >>>>> This is a different case and covered by retries: > >>>>> * If you deploy a service from node A to node B, then take a proxy > >>>>> from node A (deployment initiator) it should NOT fail even if node B > >>>>> has not received yet a message that deployment finished successfully, > >>>>> because of proxy invocation retries. > >>>>> > >>>>> Look like It's better to describe all these cases on the wiki. > >>>>> > >>>>>> Should we schedule this ticket for the further work on Services > >> IEP? > >>>>> If it is a frequent use-case we definitely should implement it. > >>>>> > >>>>> > >>>>> On Tue, Dec 24, 2019 at 6:55 PM Alexey Goncharuk > >>>>> <alexey.goncha...@gmail.com> wrote: > >>>>>> Ok, got it. > >>>>>> > >>>>>> I agree that this is consistent with the old behavior, but this is > >> the > >>>> kind > >>>>>> of errors we wanted to get rid of when we started the IEP. From the > >>>>>> user perspective, even the local deployment looks broken: if a > >> compute > >>>> job > >>>>>> is sent to a remote node after the service deployment, the job > >>>> execution > >>>>>> may fail due to this error. > >>>>>> > >>>>>> Should we schedule this ticket for the further work on Services > >> IEP? > >>>>>> вт, 24 дек. 2019 г. в 18:49, Vyacheslav Daradur < > >> daradu...@gmail.com>: > >>>>>>> Not sure that "user fallback" is the right definition, it is not > >> new > >>>>>>> behaviour in comparison with legacy implementation. > >>>>>>> > >>>>>>> Our synchronous deployment provides guaranties for a deployment > >>>>>>> initiator to be able to start work with service immediately after > >>>>>>> deployment finished successfully. > >>>>>>> For not the deployment initiator we can't provide such guarantees > >>>> now, > >>>>>>> because of unknown deployment result and possibly fail. > >>>>>>> > >>>>>>> In this case, a reasonable timeout might be an acceptable > >> solution. > >>>>>>> We can improve guaranties in future releases, but there is an > >> open > >>>>>>> question: > >>>>>>> - how long taking of proxy should wait? - deployment of "heavy" > >>>>>>> service may take a while > >>>>>>> > >>>>>>> On Tue, Dec 24, 2019 at 6:19 PM Alexey Goncharuk > >>>>>>> <alexey.goncha...@gmail.com> wrote: > >>>>>>>> What should be the user fallback in this case? Retry > >> infinitely? Is > >>>>>>> there a > >>>>>>>> way to wait for the proper deployment? > >>>>>>>> > >>>>>>>> вт, 24 дек. 2019 г. в 12:41, Vyacheslav Daradur < > >>>> daradu...@gmail.com>: > >>>>>>>>> I’ll take a look at the end of the week. > >>>>>>>>> > >>>>>>>>> There is one more use-case: > >>>>>>>>> * if you initiate deployment from node A, but getting proxy > >> on > >>>> node B > >>>>>>>>> (which isn’t deployment initiator) to call service on node A > >> - > >>>> it may > >>>>>>> fail > >>>>>>>>> with "service not found", this is expected behaviour because > >> we > >>>> didn't > >>>>>>>>> provide such guarantees. > >>>>>>>>> > >>>>>>>>> API of getting proxy with timeout should be used in this > >> case: > >>>>>>>>> T serviceProxy(String name, Class<? super T> svcItf, boolean > >>>> sticky, > >>>>>>> long > >>>>>>>>> timeout) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> вт, 24 дек. 2019 г. в 12:11, Alexey Goncharuk < > >>>>>>> alexey.goncha...@gmail.com > >>>>>>>>>> : > >>>>>>>>>> Well, this is exactly the case. The service is deployed > >> from > >>>> node A, > >>>>>>> the > >>>>>>>>>> proxy is created on node B, and "service not found" > >> exception > >>>> gets > >>>>>>> thrown > >>>>>>>>>> to a user anyway. Perhaps, the retry happens too fast? > >>>>>>>>>> > >>>>>>>>>> Created a ticket [1]. > >>>>>>>>>> > >>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12490 > >>>>>>>>>> > >>>>>>>>>> пн, 23 дек. 2019 г. в 22:08, Vyacheslav Daradur < > >>>> daradu...@gmail.com > >>>>>>>> : > >>>>>>>>>>> Hi, Alexey > >>>>>>>>>>> > >>>>>>>>>>> Please attach a reproducer to the ticket. > >>>>>>>>>>> > >>>>>>>>>>> As far as I remember we have the following behaviour for > >> the > >>>>>>> proxies: > >>>>>>>>>>> Let's assume you have deployed service from node A, then: > >>>>>>>>>>> * if you invoke service locally from node A - it is > >>>> guaranteed to > >>>>>>>>>>> service to be deployed and ready to work > >>>>>>>>>>> * if you take a proxy from node A to remote node B right > >>>> after > >>>>>>> deploy > >>>>>>>>>>> - there is might be a race between disco-spi (a message > >> which > >>>>>>> releases > >>>>>>>>>>> deployed service) and comm-spi (remote call works via > >>>> Compute over > >>>>>>>>>>> comm-spi), but it shouldn't affect end-users because the > >>>> failed > >>>>>>>>>>> request will be retried in this case > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Mon, Dec 23, 2019 at 6:55 PM Alexey Goncharuk > >>>>>>>>>>> <alexey.goncha...@gmail.com> wrote: > >>>>>>>>>>>> Nikolay, > >>>>>>>>>>>> > >>>>>>>>>>>> Yes, I've rechecked, the new service processor is being > >>>> used. > >>>>>>> I'll > >>>>>>>>>> file a > >>>>>>>>>>>> bug shortly. > >>>>>>>>>>>> > >>>>>>>>>>>> пн, 23 дек. 2019 г. в 17:33, Николай Ижиков < > >>>> nizhi...@apache.org > >>>>>>>> : > >>>>>>>>>>>>> Alexey, are you sure, you are testing new service > >>>> framework? > >>>>>>>>>>>>> Is yes - you definitely should file a bug. > >>>>>>>>>>>>> > >>>>>>>>>>>>>> 23 дек. 2019 г., в 17:02, Alexey Goncharuk < > >>>>>>>>>>> alexey.goncha...@gmail.com> > >>>>>>>>>>>>> написал(а): > >>>>>>>>>>>>>> Igniters, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I have a question based on one of my recent tests > >>>> debugging. > >>>>>>>>>>>>>> The test is related to Ignite services. I noticed > >> that > >>>>>>> sometimes > >>>>>>>>> a > >>>>>>>>>>> proxy > >>>>>>>>>>>>>> invocation of a newly deployed service fails > >> because > >>>> the > >>>>>>> service > >>>>>>>>>>> cannot > >>>>>>>>>>>>> be > >>>>>>>>>>>>>> found. I managed to reduce the test to a simple > >> "start > >>>> two > >>>>>>> nodes, > >>>>>>>>>>> deploy > >>>>>>>>>>>>> a > >>>>>>>>>>>>>> service, create a proxy, invoke the proxy" > >> scenario. > >>>> The > >>>>>>> proxy > >>>>>>>>>>> invocation > >>>>>>>>>>>>>> fails in about ~80% of runs. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> As far as I remember, the new discovery-based > >> service > >>>>>>> deployment > >>>>>>>>>> was > >>>>>>>>>>>>>> supposed to be synchronous, so not only non-proxy > >>>> service > >>>>>>>>> instances > >>>>>>>>>>>>> should > >>>>>>>>>>>>>> work, but the proxies as well. Was my understanding > >>>> correct? > >>>>>>>>>> Should I > >>>>>>>>>>>>> file > >>>>>>>>>>>>>> a bug for the observed behavior? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> --AG > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> -- > >>>>>>>>>>> Best Regards, Vyacheslav D. > >>>>>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Best Regards, Vyacheslav D. > >>>>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> Best Regards, Vyacheslav D. > >>>> > >>>> > >>>> -- > >>>> Best Regards, Vyacheslav D. > >>>> > >> > >> > >> -- > >> Best Regards, Vyacheslav D. > >> > -- Best Regards, Vyacheslav D.