Mikhail, I merged your changes. Thanks for your contribution! On Tue, May 12, 2020 at 8:01 PM Vyacheslav Daradur <daradu...@gmail.com> wrote:
> Hi Mikhail, proposed changes make sense to me. > I left some comments to the pr. > Thank you! > > On Wed, May 6, 2020 at 2:28 PM Mikhail Petrov <pmgheap....@gmail.com> > wrote: > >> Hello, Igniters. >> >> I am working on IGNITE-12894 - [1]. It seems that it has the root cause >> which is similar to the problem described in this thread. >> >> To solve these problems, I propose to change the behavior of the >> IgniteServiceProcessor#serviceTopology if the timeout argument is 0. >> At the moment, IgniteServiceProcessor#serviceTopology returns the >> topology immediately, regardless of whether it was initialized or not in >> this case. I propose to wait for the service topology to be initialized >> if the requested service is already registered on local node, but the >> full message was not received from the coordinator yet. >> >> So the final behavior of IgniteServices#serviceProxy() will be: >> 1. If the timeout is specified - it waits for the topology over a >> specified timeout even if the requested service was not registered yet. >> As in current implementation. >> >> 2. If the timeout is not specified - if service was not registered it >> fails immediately, else it is waiting for the topology initialization >> (full message from the coordinator) if needed. >> >> Here is PR with the implementation of the described proposal - [2]. >> >> WDYT? >> >> [1] - https://issues.apache.org/jira/browse/IGNITE-12894 >> [2] - https://github.com/apache/ignite/pull/7771 >> >> On 30.12.2019 13:03, Alexey Goncharuk wrote: >> > Agree, sounds like a plan, thanks for taking over! >> > >> > пн, 30 дек. 2019 г. в 13:00, Vyacheslav Daradur <daradu...@gmail.com>: >> > >> >> Alexey, >> >> >> >> I would not make it default in the current implementation. >> >> >> >> Waiting of proxies on non-deployment-initiator nodes should be >> >> improved - additional checks are required: >> >> 1) We should not wait if requested service has not been submitted to >> >> deploy (when there is no info about such service) >> >> 2) If service deployment failed - getting proxy should be failed or >> >> interrupted as well (do not wait for all available timeout) >> >> >> >> Let's schedule this improvement to next release, I'll try to find a >> >> time to implement it. >> >> >> >> What do you think? >> >> >> >> On Mon, Dec 30, 2019 at 12:05 PM Alexey Goncharuk >> >> <alexey.goncha...@gmail.com> wrote: >> >>> Vyacheslav, thanks for the explanation, makes sense to me. >> >>> >> >>> I was thinking though, should we make the behavior with the timeout >> >> default >> >>> for all proxies? >> >>> >> >>> Just my opinion - I think for a user it would be hard to control which >> >> node >> >>> deploys the service, especially if multiple nodes deploy it >> concurrently. >> >>> Most likely users will end up always calling the second option of the >> >> proxy >> >>> (with the timeout), so, perhaps, make it default? >> >>> >> >>> вс, 29 дек. 2019 г. в 21:05, Vyacheslav Daradur <daradu...@gmail.com >> >: >> >>> >> >>>> Alexey, >> >>>> >> >>>> I've prepared pr [1] to show our proxy invocation guarantees and to >> >>>> avoid misunderstanding. >> >>>> >> >>>> Please, let me know if you think that we should improve our >> guaranties >> >>>> in some cases. >> >>>> >> >>>> [1] https://github.com/apache/ignite/pull/7213 >> >>>> >> >>>> On Tue, Dec 24, 2019 at 7:27 PM Vyacheslav Daradur < >> >> daradu...@gmail.com> >> >>>> wrote: >> >>>>>> even the local deployment looks broken: if a compute job >> >>>>>> is sent to a remote node after the service deployment >> >>>>> This is a different case and covered by retries: >> >>>>> * If you deploy a service from node A to node B, then take a proxy >> >>>>> from node A (deployment initiator) it should NOT fail even if node B >> >>>>> has not received yet a message that deployment finished >> successfully, >> >>>>> because of proxy invocation retries. >> >>>>> >> >>>>> Look like It's better to describe all these cases on the wiki. >> >>>>> >> >>>>>> Should we schedule this ticket for the further work on Services >> >> IEP? >> >>>>> If it is a frequent use-case we definitely should implement it. >> >>>>> >> >>>>> >> >>>>> On Tue, Dec 24, 2019 at 6:55 PM Alexey Goncharuk >> >>>>> <alexey.goncha...@gmail.com> wrote: >> >>>>>> Ok, got it. >> >>>>>> >> >>>>>> I agree that this is consistent with the old behavior, but this is >> >> the >> >>>> kind >> >>>>>> of errors we wanted to get rid of when we started the IEP. From the >> >>>>>> user perspective, even the local deployment looks broken: if a >> >> compute >> >>>> job >> >>>>>> is sent to a remote node after the service deployment, the job >> >>>> execution >> >>>>>> may fail due to this error. >> >>>>>> >> >>>>>> Should we schedule this ticket for the further work on Services >> >> IEP? >> >>>>>> вт, 24 дек. 2019 г. в 18:49, Vyacheslav Daradur < >> >> daradu...@gmail.com>: >> >>>>>>> Not sure that "user fallback" is the right definition, it is not >> >> new >> >>>>>>> behaviour in comparison with legacy implementation. >> >>>>>>> >> >>>>>>> Our synchronous deployment provides guaranties for a deployment >> >>>>>>> initiator to be able to start work with service immediately after >> >>>>>>> deployment finished successfully. >> >>>>>>> For not the deployment initiator we can't provide such guarantees >> >>>> now, >> >>>>>>> because of unknown deployment result and possibly fail. >> >>>>>>> >> >>>>>>> In this case, a reasonable timeout might be an acceptable >> >> solution. >> >>>>>>> We can improve guaranties in future releases, but there is an >> >> open >> >>>>>>> question: >> >>>>>>> - how long taking of proxy should wait? - deployment of "heavy" >> >>>>>>> service may take a while >> >>>>>>> >> >>>>>>> On Tue, Dec 24, 2019 at 6:19 PM Alexey Goncharuk >> >>>>>>> <alexey.goncha...@gmail.com> wrote: >> >>>>>>>> What should be the user fallback in this case? Retry >> >> infinitely? Is >> >>>>>>> there a >> >>>>>>>> way to wait for the proper deployment? >> >>>>>>>> >> >>>>>>>> вт, 24 дек. 2019 г. в 12:41, Vyacheslav Daradur < >> >>>> daradu...@gmail.com>: >> >>>>>>>>> I’ll take a look at the end of the week. >> >>>>>>>>> >> >>>>>>>>> There is one more use-case: >> >>>>>>>>> * if you initiate deployment from node A, but getting proxy >> >> on >> >>>> node B >> >>>>>>>>> (which isn’t deployment initiator) to call service on node A >> >> - >> >>>> it may >> >>>>>>> fail >> >>>>>>>>> with "service not found", this is expected behaviour because >> >> we >> >>>> didn't >> >>>>>>>>> provide such guarantees. >> >>>>>>>>> >> >>>>>>>>> API of getting proxy with timeout should be used in this >> >> case: >> >>>>>>>>> T serviceProxy(String name, Class<? super T> svcItf, boolean >> >>>> sticky, >> >>>>>>> long >> >>>>>>>>> timeout) >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> вт, 24 дек. 2019 г. в 12:11, Alexey Goncharuk < >> >>>>>>> alexey.goncha...@gmail.com >> >>>>>>>>>> : >> >>>>>>>>>> Well, this is exactly the case. The service is deployed >> >> from >> >>>> node A, >> >>>>>>> the >> >>>>>>>>>> proxy is created on node B, and "service not found" >> >> exception >> >>>> gets >> >>>>>>> thrown >> >>>>>>>>>> to a user anyway. Perhaps, the retry happens too fast? >> >>>>>>>>>> >> >>>>>>>>>> Created a ticket [1]. >> >>>>>>>>>> >> >>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12490 >> >>>>>>>>>> >> >>>>>>>>>> пн, 23 дек. 2019 г. в 22:08, Vyacheslav Daradur < >> >>>> daradu...@gmail.com >> >>>>>>>> : >> >>>>>>>>>>> Hi, Alexey >> >>>>>>>>>>> >> >>>>>>>>>>> Please attach a reproducer to the ticket. >> >>>>>>>>>>> >> >>>>>>>>>>> As far as I remember we have the following behaviour for >> >> the >> >>>>>>> proxies: >> >>>>>>>>>>> Let's assume you have deployed service from node A, then: >> >>>>>>>>>>> * if you invoke service locally from node A - it is >> >>>> guaranteed to >> >>>>>>>>>>> service to be deployed and ready to work >> >>>>>>>>>>> * if you take a proxy from node A to remote node B right >> >>>> after >> >>>>>>> deploy >> >>>>>>>>>>> - there is might be a race between disco-spi (a message >> >> which >> >>>>>>> releases >> >>>>>>>>>>> deployed service) and comm-spi (remote call works via >> >>>> Compute over >> >>>>>>>>>>> comm-spi), but it shouldn't affect end-users because the >> >>>> failed >> >>>>>>>>>>> request will be retried in this case >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> On Mon, Dec 23, 2019 at 6:55 PM Alexey Goncharuk >> >>>>>>>>>>> <alexey.goncha...@gmail.com> wrote: >> >>>>>>>>>>>> Nikolay, >> >>>>>>>>>>>> >> >>>>>>>>>>>> Yes, I've rechecked, the new service processor is being >> >>>> used. >> >>>>>>> I'll >> >>>>>>>>>> file a >> >>>>>>>>>>>> bug shortly. >> >>>>>>>>>>>> >> >>>>>>>>>>>> пн, 23 дек. 2019 г. в 17:33, Николай Ижиков < >> >>>> nizhi...@apache.org >> >>>>>>>> : >> >>>>>>>>>>>>> Alexey, are you sure, you are testing new service >> >>>> framework? >> >>>>>>>>>>>>> Is yes - you definitely should file a bug. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>>> 23 дек. 2019 г., в 17:02, Alexey Goncharuk < >> >>>>>>>>>>> alexey.goncha...@gmail.com> >> >>>>>>>>>>>>> написал(а): >> >>>>>>>>>>>>>> Igniters, >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> I have a question based on one of my recent tests >> >>>> debugging. >> >>>>>>>>>>>>>> The test is related to Ignite services. I noticed >> >> that >> >>>>>>> sometimes >> >>>>>>>>> a >> >>>>>>>>>>> proxy >> >>>>>>>>>>>>>> invocation of a newly deployed service fails >> >> because >> >>>> the >> >>>>>>> service >> >>>>>>>>>>> cannot >> >>>>>>>>>>>>> be >> >>>>>>>>>>>>>> found. I managed to reduce the test to a simple >> >> "start >> >>>> two >> >>>>>>> nodes, >> >>>>>>>>>>> deploy >> >>>>>>>>>>>>> a >> >>>>>>>>>>>>>> service, create a proxy, invoke the proxy" >> >> scenario. >> >>>> The >> >>>>>>> proxy >> >>>>>>>>>>> invocation >> >>>>>>>>>>>>>> fails in about ~80% of runs. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> As far as I remember, the new discovery-based >> >> service >> >>>>>>> deployment >> >>>>>>>>>> was >> >>>>>>>>>>>>>> supposed to be synchronous, so not only non-proxy >> >>>> service >> >>>>>>>>> instances >> >>>>>>>>>>>>> should >> >>>>>>>>>>>>>> work, but the proxies as well. Was my understanding >> >>>> correct? >> >>>>>>>>>> Should I >> >>>>>>>>>>>>> file >> >>>>>>>>>>>>>> a bug for the observed behavior? >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> --AG >> >>>>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> -- >> >>>>>>>>>>> Best Regards, Vyacheslav D. >> >>>>>>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> Best Regards, Vyacheslav D. >> >>>>>>> >> >>>>> >> >>>>> >> >>>>> -- >> >>>>> Best Regards, Vyacheslav D. >> >>>> >> >>>> >> >>>> -- >> >>>> Best Regards, Vyacheslav D. >> >>>> >> >> >> >> >> >> -- >> >> Best Regards, Vyacheslav D. >> >> >> > > > -- > Best Regards, > Vyacheslav D. > -- Best Regards, Vyacheslav D.