Yeah, just chiming in on this conversation as well. We rely heavily on multiple job
graphs to get isolation around retry logic and resource allocation across the
job graphs. Putting all these parallel flows into a single graph would mean
sharing TaskManagers across what was meant to be truly independ
4 INFO org.apache.flink.yarn.YarnResourceManager - Received new container: container_e22_1571837093169_78279_01_000947 - Remaining pending container requests: 0
2019-10-25 09:55:51,514 INFO org.apache.flink.yarn.YarnResourceManager -
From: Chan, Regina [Engineering]
Sent: Wednesday, Octo
AM
To: Yang Wang
Cc: Chan, Regina [Engineering]; user@flink.apache.org
Subject: Re: The RMClient's and YarnResourceManagers internal state about the number of pending container requests has diverged
Hi Regina,
When using the FLIP-6 mode, you can control how long it takes for an
There’s no explicit collect() call from me. The job has a coGroup operator before
writing to the DataSink.
From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Monday, May 07, 2018 6:31 AM
To: Chan, Regina [Tech]
Cc: user@flink.apache.org; Newport, Billy [Tech]
Subject: Re: Lost JobManager
Hi Regina,
I
Any updates on this one? I'm seeing similar issues with 1.3.3 and the batch API.
The main difference is that I have even more operators, ~850, mostly maps and
filters with one coGroup. I don't really want to set akka.client.timeout to
anything more than 10 minutes, seeing that it still fails with
ser/delp/.flink/application_1510733430616_2098853/log4j.properties
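(For reference, a minimal sketch of where that option usually lives, assuming it is picked up from flink-conf.yaml on the client side; the 600 s value is just the 10-minute figure mentioned above, not a recommendation:)

# flink-conf.yaml (client side) -- sketch only
akka.client.timeout: 600 s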
From: Chan, Regina [Tech]
Sent: Tuesday, December 12, 2017 1:56 AM
To: 'user@flink.apache.org'
Subject: ProgramInvocationException: Could not upload the jar files to the job manager / No space left on device
Hi,
I'm currently submitting 50 separate jobs to a 50-TaskManager, 1-slot-each setup. Each job
has a parallelism of 1. There's plenty of space left in my cluster and on that node.
It's not clear to me what's happening. Any pointers?
On the client side, when I try to execute, I see the following:
org.apache.flink.
Hi,
As I moved from Flink 1.2.0 to 1.3.2, I noticed that the TaskManager may have
all tasks in FINISHED state but then take about 2-3 minutes before the job
execution switches to FINISHED. What is it doing that takes this long? This
was a parallelism = 1 case...
Regina Chan
Goldman Sachs - Enter
Hi,
I was reading that I should avoid using dynamic classloading and instead copy the
job's jar into the /lib directory (RE: below).
1. How can I confirm that the jar was copied over? I only see the following
below:
2017-11-20 15:36:52,724 INFO org.apache.flink.yarn.Utils
To: Chan, Regina [Tech]
Cc: user@flink.apache.org
Subject: Re: Job Manager Configuration
I have an IO-dominated batch job with 471 distinct tasks (3786 tasks with
parallelism) running on 8 nodes with 12 GiB of memory and 4 CPUs each. I
haven’t had any problems adding additional tasks except for 1
JobManager doing that’s keeping it busy? It’s the same code across the
TaskManagers.
I’ll get you the logs shortly.
From: Till Rohrmann [mailto:trohrm...@apache.org]
Sent: Wednesday, November 08, 2017 10:17 AM
To: Chan, Regina [Tech]
Cc: Chesnay Schepler; user@flink.apache.org
Subject: Re: Job
ependently.
On 31.10.2017 16:25, Chan, Regina wrote:
Asking an additional question, what is the largest plan that the JobManager can
handle? Is there a limit? My flows don't need to run in parallel and can run
independently. I wanted them to run in one single job because it's part of one
logical commit on my side.
Thanks,
Regina
Flink Users,
I have about 300 parallel flows in one job, each with 2 inputs, 3 operators, and
1 sink, which makes for a large job. I keep getting the timeout exception below,
even though I've already set a 30-minute timeout and a 6 GB heap on the
JobManager. Is there a heuristic to better configure
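(For context, a minimal flink-conf.yaml sketch of the knobs that usually correspond to the settings mentioned above, assuming the exception is an Akka ask/client timeout while submitting a large plan; the exact keys and values are assumptions for the 1.3.x era, not confirmed from this thread:)

# flink-conf.yaml -- sketch only
jobmanager.heap.mb: 6144       # 6 GB JobManager heap
akka.ask.timeout: 1800 s       # the 30-minute timeout mentioned above
akka.client.timeout: 1800 s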
Hi folks,
Is Flink able to do impersonation using UserGroupInformation? How do we make
all the tasks run with this in a way that we wouldn't have to do it per task?
UserGroupInformation ugi = UserGroupInformation.createProxyUser(proxyUser,
UserGroupInformation.getLoginUser());
PrivilegedEx
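(A minimal, self-contained sketch of the Hadoop proxy-user pattern the truncated snippet above appears to be using; the proxyUser value and the body of the action are assumptions, and whether this identity propagates to the tasks on the TaskManagers is exactly the open question being asked:)

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserExample {
    public static void main(String[] args) throws Exception {
        String proxyUser = args[0]; // assumed: the user to impersonate

        // Wrap the currently logged-in (real) user so that 'proxyUser' is impersonated.
        UserGroupInformation ugi = UserGroupInformation.createProxyUser(
                proxyUser, UserGroupInformation.getLoginUser());

        // Code inside doAs() runs with the proxy user's identity on the client side;
        // it does not by itself apply to code that runs remotely in each task.
        ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
            // e.g. build and submit the Flink job here (assumption)
            return null;
        });
    }
}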
Hi,
I was trying to understand why it takes about 9 minutes between the last attempt to
start a container and when the YarnApplicationMasterRunner finally gets the
SIGTERM that kills it.
Client:
Calc Engine: 2017-08-28 12:39:23,596 INFO org.apache.flink.yarn.YarnClusterClient