Re: Timeout while requesting InputSplit

2015-01-30 Thread Till Rohrmann
I've updated the corresponding JIRA ticket. On Fri, Jan 30, 2015 at 5:46 PM, Till Rohrmann wrote: > I looked into the problem and the problem is a deserialization issue on > the TaskManager side. Somehow the system is not capable to send InputSplits > around whose classes are contained in the us

Re: Timeout while requesting InputSplit

2015-01-30 Thread Till Rohrmann
I looked into the problem: it is a deserialization issue on the TaskManager side. Somehow the system cannot send InputSplits whose classes are contained in the user code jars. A similar issue was already observed by Fabian in FLINK-1438. I used his test program and the
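
For readers unfamiliar with this class of problem, here is a minimal sketch of how a receiver can deserialize an object whose class lives only in a user jar. It is not Flink's actual code. The names `UserCodeDeserializationSketch` and `ClassLoaderObjectInputStream` are hypothetical, and the sketch assumes plain Java serialization. The idea is to resolve classes against the user code class loader instead of the loader that `ObjectInputStream` would pick by default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;

// Hypothetical helper for illustration only; not taken from the Flink code base.
final class UserCodeDeserializationSketch {

    /** ObjectInputStream that tries a caller-supplied class loader before the default one. */
    static final class ClassLoaderObjectInputStream extends ObjectInputStream {
        private final ClassLoader userCodeClassLoader;

        ClassLoaderObjectInputStream(InputStream in, ClassLoader userCodeClassLoader)
                throws IOException {
            super(in);
            this.userCodeClassLoader = userCodeClassLoader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                // Look the class up in the user code class loader first.
                return Class.forName(desc.getName(), false, userCodeClassLoader);
            } catch (ClassNotFoundException e) {
                // Fall back to the default resolution, which also handles primitive types.
                return super.resolveClass(desc);
            }
        }
    }

    /** Serializes a value with standard Java serialization. */
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Deserializes a value, resolving its classes against the given class loader. */
    static Object deserialize(byte[] data, ClassLoader userCodeClassLoader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ClassLoaderObjectInputStream(new ByteArrayInputStream(data), userCodeClassLoader)) {
            return in.readObject();
        }
    }
}
```

On the receiving side one would call `deserialize(bytes, userCodeClassLoader)` with the class loader that has the job's jars on its path. Whether this matches the eventual fix for the issue discussed in this thread is not stated here.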

Re: Timeout while requesting InputSplit

2015-01-29 Thread Bruecke, Christoph
@Till: Yes, I’m running the job on cloud-11, or more precisely, I’m using the YARN cluster and the flink-yarn package. I’m using flink-0.9-SNAPSHOT from the following commit [1] together with Timo’s patch [2]. I’ll send you a separate email with instructions on where you can find the jars on cloud-11.

Re: Timeout while requesting InputSplit

2015-01-29 Thread Till Rohrmann
Yes, actually the timeouts should not really matter. However, an exception in the InputSplitAssigner should happen in the actor thread and thus cause the actor to stop. This should be logged by the supervisor. I just checked and the method InputSplitAssigner.getNextInputSplit is not supposed to thr

Re: Timeout while requesting InputSplit

2015-01-28 Thread Stephan Ewen
@Till: The default timeouts are high enough that such a timeout should actually not occur, right? Increasing the timeouts cannot really be the issue. Might it be something different? What happens if there is an error in the code that produces the input split? Is that properly handled, or is the re

Re: Timeout while requesting InputSplit

2015-01-28 Thread Till Rohrmann
I think that the machines have lost connection. That is most likely connected to the heartbeat interval of the watch or transport failure detector. The transport failure detector should actually be set to a heartbeat interval of 1000 s and consequently it should not cause any problems. Which versi

Re: Timeout while requesting InputSplit

2015-01-28 Thread Stephan Ewen
I see the following line: 11:14:32,603 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp:// fl...@cloud-26.dima.tu-berlin.de:51449] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. Does that mean that the machines have lost co

Re: Timeout while requesting InputSplit

2015-01-28 Thread Bruecke, Christoph
I might add that the error only occurs when running with the RemoteExecutor, regardless of the number of TaskManagers. Starting the job in IntelliJ with the LocalExecutor and a degree of parallelism (DOP) of 1 works just fine. Best, Christoph On 28 Jan 2015, at 12:17, Bruecke, Christoph wrote: > Hi Robert, > > thanks for the qui

Re: Timeout while requesting InputSplit

2015-01-28 Thread Bruecke, Christoph
Hi Robert, thanks for the quick response. Here is the jobmanager-main.log: PS: I’m subscribed now. 11:09:16,144 INFO org.apache.flink.yarn.ApplicationMaster$ - YARN daemon runs as hadoop setting user to execute Flink ApplicationMaster/JobManager to hadoop 11:09:16,199 INF

Re: Timeout while requesting InputSplit

2015-01-28 Thread Robert Metzger
Hi, it seems that you are not subscribed to our mailing list, so I had to manually accept your mail. It would be good if you could subscribe. Can you also send us the log output of the JobManager? If your YARN cluster has log aggregation activated, you can retrieve the logs of a stopped YARN session

Timeout while requesting InputSplit

2015-01-28 Thread Bruecke, Christoph
Hi, I have written a job that reads a SequenceFile from HDFS using the Hadoop-Compatibility add-on. Doing so results in a TimeoutException. I’m using flink-0.9-SNAPSHOT with PR 342 ( https://github.com/apache/flink/pull/342 ). Furthermore, I’m running Flink on YARN with two TaskManagers using flink-yarn-