Re: Discovery-based services deployment guarantees question

Alexey Goncharuk Mon, 30 Dec 2019 02:04:32 -0800

Agree, sounds like a plan, thanks for taking over!

On Mon, Dec 30, 2019 at 13:00, Vyacheslav Daradur <daradu...@gmail.com> wrote:


> Alexey,
>
> I would not make it default in the current implementation.
>
> Waiting for proxies on non-deployment-initiator nodes should be
> improved; additional checks are required:
> 1) We should not wait if the requested service has not been submitted
> for deployment (when there is no info about such a service).
> 2) If the service deployment failed, getting the proxy should fail or
> be interrupted as well (rather than waiting for the whole timeout).
>
> Let's schedule this improvement for the next release; I'll try to
> find time to implement it.
>
> What do you think?
>
> On Mon, Dec 30, 2019 at 12:05 PM Alexey Goncharuk
> <alexey.goncha...@gmail.com> wrote:
> >
> > Vyacheslav, thanks for the explanation, makes sense to me.
> >
> > I was thinking though, should we make the behavior with the timeout
> > default for all proxies?
> >
> > Just my opinion - I think for a user it would be hard to control which
> > node deploys the service, especially if multiple nodes deploy it
> > concurrently. Most likely users will end up always calling the second
> > option of the proxy (with the timeout), so, perhaps, make it default?
> >
> > On Sun, Dec 29, 2019 at 21:05, Vyacheslav Daradur <daradu...@gmail.com> wrote:
> >
> > > Alexey,
> > >
> > > > I've prepared a PR [1] to show our proxy invocation guarantees and
> > > > to avoid misunderstanding.
> > > >
> > > > Please let me know if you think that we should improve our
> > > > guarantees in some cases.
> > >
> > > [1] https://github.com/apache/ignite/pull/7213
> > >
> > > > On Tue, Dec 24, 2019 at 7:27 PM Vyacheslav Daradur
> > > > <daradu...@gmail.com> wrote:
> > > >
> > > > > even the local deployment looks broken: if a compute job
> > > > > is sent to a remote node after the service deployment
> > > >
> > > > This is a different case and covered by retries:
> > > > > * If you deploy a service from node A to node B, then take a proxy
> > > > > from node A (deployment initiator), it should NOT fail even if node B
> > > > > has not yet received the message that deployment finished
> > > > > successfully, because of proxy invocation retries.
> > > >
> > > > > Looks like it's better to describe all these cases on the wiki.
> > > >
> > > > > > Should we schedule this ticket for the further work on Services IEP?
> > > >
> > > > If it is a frequent use-case we definitely should implement it.
> > > >
> > > >
> > > > On Tue, Dec 24, 2019 at 6:55 PM Alexey Goncharuk
> > > > <alexey.goncha...@gmail.com> wrote:
> > > > >
> > > > > Ok, got it.
> > > > >
> > > > > > I agree that this is consistent with the old behavior, but this is
> > > > > > the kind of error we wanted to get rid of when we started the IEP.
> > > > > > From the user perspective, even the local deployment looks broken:
> > > > > > if a compute job is sent to a remote node after the service
> > > > > > deployment, the job execution may fail due to this error.
> > > > >
> > > > > > Should we schedule this ticket for the further work on Services IEP?
> > > > >
> > > > > > On Tue, Dec 24, 2019 at 18:49, Vyacheslav Daradur
> > > > > > <daradu...@gmail.com> wrote:
> > > > >
> > > > > > > Not sure that "user fallback" is the right definition; it is not
> > > > > > > new behaviour in comparison with the legacy implementation.
> > > > > >
> > > > > > > Our synchronous deployment guarantees that the deployment
> > > > > > > initiator can start working with the service immediately after
> > > > > > > the deployment finishes successfully.
> > > > > > > For a node that is not the deployment initiator we can't provide
> > > > > > > such guarantees now, because the deployment result is unknown
> > > > > > > there and the deployment may fail.
> > > > > >
> > > > > > > In this case, a reasonable timeout might be an acceptable
> > > > > > > solution.
> > > > > > >
> > > > > > > We can improve guarantees in future releases, but there is an
> > > > > > > open question:
> > > > > > > - how long should taking a proxy wait? Deployment of a "heavy"
> > > > > > > service may take a while
> > > > > >
> > > > > > On Tue, Dec 24, 2019 at 6:19 PM Alexey Goncharuk
> > > > > > <alexey.goncha...@gmail.com> wrote:
> > > > > > >
> > > > > > > > What should be the user fallback in this case? Retry
> > > > > > > > infinitely? Is there a way to wait for the proper deployment?
> > > > > > >
> > > > > > > > On Tue, Dec 24, 2019 at 12:41, Vyacheslav Daradur
> > > > > > > > <daradu...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I’ll take a look at the end of the week.
> > > > > > > >
> > > > > > > > > There is one more use-case:
> > > > > > > > > * if you initiate deployment from node A, but get a proxy
> > > > > > > > > on node B (which isn't the deployment initiator) to call
> > > > > > > > > the service on node A, it may fail with "service not
> > > > > > > > > found"; this is expected behaviour because we didn't
> > > > > > > > > provide such guarantees.
> > > > > > > >
> > > > > > > > > The API for getting a proxy with a timeout should be used
> > > > > > > > > in this case:
> > > > > > > > > T serviceProxy(String name, Class<? super T> svcItf,
> > > > > > > > >     boolean sticky, long timeout)
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Tue, Dec 24, 2019 at 12:11, Alexey Goncharuk
> > > > > > > > > <alexey.goncha...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > > Well, this is exactly the case. The service is deployed
> > > > > > > > > > from node A, the proxy is created on node B, and
> > > > > > > > > > "service not found" exception gets thrown to a user
> > > > > > > > > > anyway. Perhaps, the retry happens too fast?
> > > > > > > > >
> > > > > > > > > Created a ticket [1].
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12490
> > > > > > > > >
> > > > > > > > > > On Mon, Dec 23, 2019 at 22:08, Vyacheslav Daradur
> > > > > > > > > > <daradu...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Alexey
> > > > > > > > > >
> > > > > > > > > > Please attach a reproducer to the ticket.
> > > > > > > > > >
> > > > > > > > > > > As far as I remember we have the following behaviour
> > > > > > > > > > > for the proxies:
> > > > > > > > > > >
> > > > > > > > > > > Let's assume you have deployed a service from node A,
> > > > > > > > > > > then:
> > > > > > > > > > > * if you invoke the service locally from node A, the
> > > > > > > > > > > service is guaranteed to be deployed and ready to work
> > > > > > > > > > > * if you take a proxy from node A to remote node B
> > > > > > > > > > > right after deploy, there might be a race between
> > > > > > > > > > > disco-spi (a message which releases the deployed
> > > > > > > > > > > service) and comm-spi (the remote call works via
> > > > > > > > > > > Compute over comm-spi), but it shouldn't affect
> > > > > > > > > > > end-users because the failed request will be retried
> > > > > > > > > > > in this case
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 23, 2019 at 6:55 PM Alexey Goncharuk
> > > > > > > > > > <alexey.goncha...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Nikolay,
> > > > > > > > > > >
> > > > > > > > > > > > Yes, I've rechecked, the new service processor is
> > > > > > > > > > > > being used. I'll file a bug shortly.
> > > > > > > > > > >
> > > > > > > > > > > > On Mon, Dec 23, 2019 at 17:33, Nikolay Izhikov
> > > > > > > > > > > > <nizhi...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > Alexey, are you sure you are testing the new
> > > > > > > > > > > > > service framework?
> > > > > > > > > > > > >
> > > > > > > > > > > > > If yes - you definitely should file a bug.
> > > > > > > > > > > >
> > > > > > > > > > > > > > On Dec 23, 2019, at 17:02, Alexey Goncharuk
> > > > > > > > > > > > > > <alexey.goncha...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Igniters,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I have a question based on debugging one of my
> > > > > > > > > > > > > > recent tests.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > The test is related to Ignite services. I noticed
> > > > > > > > > > > > > > that sometimes a proxy invocation of a newly
> > > > > > > > > > > > > > deployed service fails because the service cannot
> > > > > > > > > > > > > > be found. I managed to reduce the test to a simple
> > > > > > > > > > > > > > "start two nodes, deploy a service, create a
> > > > > > > > > > > > > > proxy, invoke the proxy" scenario. The proxy
> > > > > > > > > > > > > > invocation fails in about 80% of runs.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > As far as I remember, the new discovery-based
> > > > > > > > > > > > > > service deployment was supposed to be synchronous,
> > > > > > > > > > > > > > so not only non-proxy service instances should
> > > > > > > > > > > > > > work, but the proxies as well. Was my
> > > > > > > > > > > > > > understanding correct? Should I file a bug for the
> > > > > > > > > > > > > > observed behavior?
> > > > > > > > > > > > >
> > > > > > > > > > > > > --AG
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards, Vyacheslav D.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav D.
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav D.
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav D.
> > >
>
>
>
> --
> Best Regards, Vyacheslav D.
>
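
For readers of the archive, the two checks proposed in the quoted thread
(do not wait for a service that was never submitted for deployment, and
fail at once when the deployment failed) can be modelled roughly as
follows. This is only a sketch in plain Java under stated assumptions, not
Ignite code: the class ProxyWaitSketch and the methods onSubmitted,
onDeployed, onFailed and awaitDeployed are hypothetical names invented for
illustration, and how a real node would learn about submissions and
results is left out.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the proposed waiting rules; none of these names exist in Ignite.
class ProxyWaitSketch {
    // One future per service for which this node has seen a deployment request.
    private final Map<String, CompletableFuture<Void>> deployments = new ConcurrentHashMap<>();

    // A deployment request for the service became known on this node.
    public void onSubmitted(String name) {
        deployments.putIfAbsent(name, new CompletableFuture<>());
    }

    // The deployment finished successfully.
    public void onDeployed(String name) {
        deployments.computeIfAbsent(name, k -> new CompletableFuture<>()).complete(null);
    }

    // The deployment failed; waiters are released with the cause.
    public void onFailed(String name, Throwable cause) {
        deployments.computeIfAbsent(name, k -> new CompletableFuture<>()).completeExceptionally(cause);
    }

    // Returns true once the service is deployed and false if the timeout elapsed first.
    // Throws IllegalStateException without waiting for the full timeout when the
    // service is unknown (check 1) or its deployment failed (check 2).
    public boolean awaitDeployed(String name, long timeoutMs) {
        CompletableFuture<Void> fut = deployments.get(name);
        if (fut == null)
            throw new IllegalStateException("Service was never submitted: " + name);
        try {
            fut.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Service deployment failed: " + name, e.getCause());
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for: " + name, e);
        }
    }

    public static void main(String[] args) {
        ProxyWaitSketch node = new ProxyWaitSketch();
        node.onSubmitted("svc");
        System.out.println("deployed within 100 ms: " + node.awaitDeployed("svc", 100));
        node.onDeployed("svc");
        System.out.println("deployed within 100 ms: " + node.awaitDeployed("svc", 100));
    }
}
```

Until something along these lines exists, the thread's advice for a node
that is not the deployment initiator is the serviceProxy overload with a
timeout quoted above.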
