RE: EXT :Re: Jar Uploads in High Availability (Flink 1.7.2)

Martin, Nick J [US] (IS) Mon, 21 Oct 2019 08:57:43 -0700

So I think what you’re saying is if I use a DFS for web.upload.dir, my clients 
can send all their requests to any Job Manager instance and not worry or care 
which one is the leader. That definitely is an improvement, thanks.

From: Till Rohrmann [mailto:trohrm...@apache.org]
Sent: Friday, October 18, 2019 6:42 AM
To: Martin, Nick J [US] (IS) <nick.mar...@ngc.com>
Cc: Ravi Bhushan Ratnakar <ravibhushanratna...@gmail.com>; user 
<user@flink.apache.org>
Subject: Re: EXT :Re: Jar Uploads in High Availability (Flink 1.7.2)

Hi Martin,

Flink's web UI based job submission is not well suited to run behind a load 
balancer at the moment. The problem is that web-based job submission is 
actually a two-phase operation: uploading the jars and then starting the job. 
Since Flink's RestServer stores the uploaded files locally, the job must be 
started on the same RestServer that received the upload. Note, however, that 
CLI job submission is not affected by this, since the job graph upload and 
submission are a single request.
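
To make the two phases concrete, here is a minimal Python sketch of the REST calls involved, assuming the standard `POST /jars/upload` (multipart field `jarfile`) and `POST /jars/:jarid/run` endpoints. The helper names and the host used in the usage note are illustrative and do not come from this thread. If the two calls reach different RestServers that do not share an upload directory, the second one fails because the jar ID is unknown there.

```python
import json
import os
import posixpath
import urllib.request
import uuid


def jar_id_from_upload_response(body: str) -> str:
    """Phase 1 result: the upload response carries the server-side path in
    "filename"; the jar ID used by later requests is its last component."""
    filename = json.loads(body)["filename"]
    return posixpath.basename(filename)


def run_url(base_url: str, jar_id: str) -> str:
    """Phase 2 target: the run endpoint for a previously uploaded jar."""
    return f"{base_url.rstrip('/')}/jars/{jar_id}/run"


def upload_jar(base_url: str, jar_path: str) -> str:
    """Upload a jar as multipart/form-data and return its jar ID."""
    boundary = uuid.uuid4().hex
    name = os.path.basename(jar_path)
    with open(jar_path, "rb") as f:
        payload = f.read()
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="jarfile"; filename="{name}"\r\n'
        "Content-Type: application/x-java-archive\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/jars/upload",
        data=head + payload + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return jar_id_from_upload_response(resp.read().decode())


def run_jar(base_url: str, jar_id: str) -> str:
    """Start a job from an uploaded jar and return the job ID."""
    req = urllib.request.Request(
        run_url(base_url, jar_id),
        data=b"{}",  # empty JSON body: no entry class or program args set
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())["jobid"]
```

For example, `run_jar(base, upload_jar(base, "job.jar"))` with `base = "http://jobmanager-1:8081"` (a hypothetical address) sends both requests to the same RestServer; behind a load balancer that guarantee is lost.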

A workaround to make the uploads accessible to all RestServers is to configure 
a DFS for the `web.upload.dir` as Ravi suggested or to use Flink's CLI to 
submit jobs instead.

A quick note about the old behaviour with the redirects: they actually 
defeated the purpose of a load balancer because all requests were redirected 
to a single RestServer instance. Hence, running with or without a load 
balancer should not have made a big difference.

Cheers,
Till

On Wed, Oct 16, 2019 at 5:58 PM Martin, Nick J [US] (IS) 
<nick.mar...@ngc.com> wrote:
Yeah, I’ll do that if I have to. I’m hoping there’s a ‘right’ way to do it 
that’s easier. If I have to implement the zookeeper lookups in my load balancer 
myself, that feels like a definite step backwards from the pre-1.5 days when 
the cluster would give 307 redirects to the current leader.

From: Ravi Bhushan Ratnakar [mailto:ravibhushanratna...@gmail.com]
Sent: Tuesday, October 15, 2019 10:35 PM
To: Martin, Nick J [US] (IS) <nick.mar...@ngc.com>
Cc: user <user@flink.apache.org>
Subject: EXT :Re: Jar Uploads in High Availability (Flink 1.7.2)

Hi,

I was also experiencing similar behavior. I adopted the following approach:

  *   Used a distributed file system (in my case AWS EFS) and set the 
"web.upload.dir" attribute to it, so that both job managers share the same 
upload location.
  *   On the load balancer side (AWS ELB), I used a "readiness probe" based on 
the ZooKeeper entry for the active jobmanager address. This way the ELB always 
points to the active jobmanager, and if the active jobmanager changes, it 
automatically points to the new one. Because both use the same location on the 
distributed file system, the new active jobmanager is able to find the same 
jar.
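
The message does not say how the readiness probe reads the ZooKeeper entry, so the following is only a sketch of the decision logic under stated assumptions. The leader lookup is passed in as a hypothetical callable (`lookup_leader`), and the address format is assumed to be `host:port`. Each jobmanager's probe reports ready only when that jobmanager is the current leader, so the load balancer routes to the leader alone.

```python
from typing import Callable


def normalize(address: str) -> str:
    """Compare addresses case-insensitively, ignoring stray whitespace."""
    return address.strip().lower().rstrip("/")


def readiness_status(own_address: str, lookup_leader: Callable[[], str]) -> int:
    """Return the HTTP status a readiness probe could report.

    200 if this jobmanager is the current leader, 503 otherwise.
    `lookup_leader` is a hypothetical callable that returns the leader
    address (for instance, read from the ZooKeeper leader entry).
    """
    try:
        leader = lookup_leader()
    except Exception:
        # If the leader cannot be determined, report "not ready" so the
        # load balancer does not route to a possibly stale instance.
        return 503
    return 200 if normalize(leader) == normalize(own_address) else 503
```

Failing closed on lookup errors is a design choice here: it trades brief unavailability for never routing a jar request to a non-leader.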

Regards,
Ravi

On Wed, Oct 16, 2019 at 1:15 AM Martin, Nick J [US] (IS) 
<nick.mar...@ngc.com> wrote:
I’m seeing that when I upload a jar through the rest API, it looks like only 
the Jobmanager that received the upload request is aware of the newly uploaded 
jar. That worked fine for me in older versions where all clients were 
redirected to connect to the leader, but now that each Jobmanager accepts 
requests, if I send a jar upload request, it could end up on any one (and only 
one) of the Jobmanagers, not necessarily the leader. Further, each Jobmanager 
responds to a GET request on the /jars endpoint with its own local list of 
jars. If I try and use one of the Jar IDs from that request, my next request 
may not go to the same Jobmanager (requests are going through Docker and being 
load-balanced), and so the Jar ID isn’t found on the new Jobmanager handling 
that request.
