Re: "Job submission to the JobManager timed out" on EMR YARN cluster with multiple jobs

2017-02-17 Thread Geoffrey Mon
’m asking just to make sure of the limitations Robert mentioned. > > Cheers, > Gordon > > > On February 17, 2017 at 3:37:27 AM, Geoffrey Mon (geof...@gmail.com) > wrote: > > Hi Robert, > > Thanks for your reply. I've done some further testing and (hopefully) >

Re: "Job submission to the JobManager timed out" on EMR YARN cluster with multiple jobs

2017-02-16 Thread Geoffrey Mon
function submitting multiple jobs. If you have a yarn session + regular "flink run" it should work. On Mon, Feb 13, 2017 at 5:37 PM, Geoffrey Mon wrote: Just to clarify, is Flink designed to allow submitting multiple jobs from a single program class when using a YARN cluster? I wasn

Re: "Job submission to the JobManager timed out" on EMR YARN cluster with multiple jobs

2017-02-13 Thread Geoffrey Mon
Just to clarify, is Flink designed to allow submitting multiple jobs from a single program class when using a YARN cluster? I wasn't sure based on the documentation. Cheers, Geoffrey On Thu, Feb 9, 2017 at 6:34 PM Geoffrey Mon wrote: > Hello all, > > I'm running a Fl

"Job submission to the JobManager timed out" on EMR YARN cluster with multiple jobs

2017-02-09 Thread Geoffrey Mon
.scala:167) I have tried increasing akka.client.timeout to large values such as 1200s (20 minutes), but even then Flink does not acknowledge or execute any other jobs and there is the same timeout error. Does anyone know how I can get Flink to execute all of the jobs properly? Cheers, Geoffrey Mon

Re: Many operations cause StackOverflowError with AWS EMR YARN cluster

2017-01-26 Thread Geoffrey Mon
eration of the for-loop as a separate job and save the result in a file. Note that right now the Python API can't execute multiple jobs from the same file; we would need some modifications in the PythonPlanBinder to allow this. Regards, Chesnay On 20.11.2016 23:54, Geoffrey Mon wrote: Hell

Re: Many operations cause StackOverflowError with AWS EMR YARN cluster

2016-11-20 Thread Geoffrey Mon
In addition, I don't think the problem is YARN-specific anymore because I have been able to reproduce it on a local machine. Cheers, Geoffrey On Mon, Nov 14, 2016 at 11:38 AM Geoffrey Mon wrote: > Hi Ufuk, > > The master instance of the cluster was also a m3.xlarge instance wit

Re: Many operations cause StackOverflowError with AWS EMR YARN cluster

2016-11-14 Thread Geoffrey Mon
. for() { ... > bulk iteration Flink program }? > > – Ufuk > > On 14 November 2016 at 08:02:26, Geoffrey Mon (geof...@gmail.com) wrote: > > Hello all, > > > > I have a pretty complicated plan file using the Flink Python API running > on > > an AWS EMR cluste

Many operations cause StackOverflowError with AWS EMR YARN cluster

2016-11-13 Thread Geoffrey Mon
Hello all, I have a pretty complicated plan file using the Flink Python API running on an AWS EMR cluster of m3.xlarge instances using YARN. The plan is for a dictionary learning algorithm and has to run a sequence of operations many times; each sequence involves bulk iterations with join operation

Re: Issue with running Flink Python jobs on cluster

2016-07-18 Thread Geoffrey Mon
cluster you *must* have a filesystem > that is accessible by all workers (like HDFS) to which the files can be > copied. From there they can be distributed to the nodes via the DC. > > > On 17.07.2016 17:33, Geoffrey Mon wrote: > > I haven't yet figured out how to write a Jav

Re: Issue with running Flink Python jobs on cluster

2016-07-17 Thread Geoffrey Mon
lying on using a Flink cluster to run a Python job for some scientific data that needs to be completed soon. Thanks for your assistance, Geoffrey On Sun, Jul 17, 2016 at 4:04 AM Chesnay Schepler wrote: > Please also post the job you're trying to run. > > > On 17.07.2016 08:43, Ge

Re: Issue with running Flink Python jobs on cluster

2016-07-16 Thread Geoffrey Mon
nt in Flink. Cheers, Geoffrey On Fri, Jul 15, 2016 at 11:28 AM Geoffrey Mon wrote: > I wrote a simple Java plan that reads a file in the distributed cache and > uses the first line from that file in a map operation. Sure enough, it > works locally, but fails when the job is sent to a ta

Re: Issue with running Flink Python jobs on cluster

2016-07-15 Thread Geoffrey Mon
cache. Cheers, Geoffrey On Fri, Jul 15, 2016 at 4:15 AM Chesnay Schepler wrote: > Could you write a java job that uses the Distributed cache to distribute > files? > > If this fails then the DC is faulty, if it doesn't something in the Python > API is wrong. > > > On 15

Re: Issue with running Flink Python jobs on cluster

2016-07-14 Thread Geoffrey Mon
n Wed, Jul 13, 2016 at 1:15 PM Geoffrey Mon wrote: > Hello, > > Here is the TaskManager log on pastebin: > http://pastebin.com/XAJ56gn4 > > I will look into whether the files were created. > > By the way, the cluster is made with virtual machines running on BlueData > EPI

Re: Issue with running Flink Python jobs on cluster

2016-07-13 Thread Geoffrey Mon
e TaskManager of which the exception occurs could be of > interest too; could you send them to me? > > Regards, > Chesnay > > > On 13.07.2016 04:11, Geoffrey Mon wrote: > > Hello all, > > I've set up Flink on a very small cluster of one master node and five >

Issue with running Flink Python jobs on cluster

2016-07-12 Thread Geoffrey Mon
Hello all, I've set up Flink on a very small cluster of one master node and five worker nodes, following the instructions in the documentation (https://ci.apache.org/projects/flink/flink-docs-master/setup/cluster_setup.html). I can run the included examples like WordCount and PageRank across the