I've updated the corresponding jira ticket.
On Fri, Jan 30, 2015 at 5:46 PM, Till Rohrmann wrote:
I looked into it, and the problem is a deserialization issue on the
TaskManager side. Somehow the system is not capable of sending InputSplits
around whose classes are contained in the user code jars. A similar issue
was already observed by Fabian in FLINK-1438. I used his test program and
the
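The kind of fix Till describes usually comes down to resolving classes against the user-code classloader during deserialization. Below is a minimal sketch of that idea in plain Java; the class and field names are illustrative and are not Flink's actual API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// Sketch: an ObjectInputStream that first tries a supplied ClassLoader
// (e.g. one that contains the user-code jars) when resolving classes,
// so that classes like custom InputSplits shipped in user jars are found.
class UserCodeObjectInputStream extends ObjectInputStream {

    private final ClassLoader userCodeClassLoader;

    UserCodeObjectInputStream(InputStream in, ClassLoader userCodeClassLoader)
            throws IOException {
        super(in);
        this.userCodeClassLoader = userCodeClassLoader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Look in the user-code jars first ...
            return Class.forName(desc.getName(), false, userCodeClassLoader);
        } catch (ClassNotFoundException e) {
            // ... and fall back to the default resolution otherwise.
            return super.resolveClass(desc);
        }
    }
}
```

Without such a hook, ObjectInputStream resolves classes against the classloader that loaded the runtime classes, which does not see the user jars; that matches the symptom of InputSplits failing to deserialize on the TaskManager.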
@Till: Yes, I’m running the job on cloud-11, or rather, I’m using the YARN
cluster and the flink-yarn package. I’m using flink-0.9-SNAPSHOT from the
following commit [1] together with Timo’s patch [2]. I’ll send you a separate
email with instructions on where you can find the jars on cloud-11.
Yes, actually the timeouts should not really matter. However, an exception
in the InputSplitAssigner would happen in the actor thread and thus cause
the actor to stop. This should be logged by the supervisor.
I just checked and the method InputSplitAssigner.getNextInputSplit is not
supposed to thr
@Till: The default timeouts are high enough that such a timeout should
actually not occur, right? Increasing the timeouts cannot really be the
issue.
Might it be something different? What happens if there is an error in the
code that produces the input split? Is that properly handled, or is the
re
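Christoph's question — what happens when the code producing the split throws — can be made concrete with a small defensive wrapper. This is a hypothetical sketch under an invented interface, not Flink's actual assigner API:

```java
// Hypothetical stand-in for an input split assigner whose
// getNextInputSplit() may execute user code that throws.
interface SplitAssigner {
    Object getNextInputSplit(String host);
}

// Sketch of defensive handling: catch the failure, remember it, and
// return "no split" instead of letting the exception escape into the
// calling (actor) thread and silently kill the message handler.
class SafeSplitAssigner implements SplitAssigner {

    private final SplitAssigner delegate;
    private volatile Throwable lastError;

    SafeSplitAssigner(SplitAssigner delegate) {
        this.delegate = delegate;
    }

    @Override
    public Object getNextInputSplit(String host) {
        try {
            return delegate.getNextInputSplit(host);
        } catch (RuntimeException e) {
            lastError = e;   // surface this to the caller / logs
            return null;     // signal "no split" explicitly
        }
    }

    Throwable getLastError() {
        return lastError;
    }
}
```

Whether "no split" or an explicit job failure is the right reaction is a design choice; the point is that an unhandled exception here would otherwise terminate the actor, which would look exactly like the silent timeouts being discussed.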
I think that the machines have lost connection. That is most likely
connected to the heartbeat interval of the watch or the transport failure
detector. The transport failure detector should actually be set to a
heartbeat interval of 1000 s, and consequently it should not cause any
problems.
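For reference, the failure-detector intervals Till mentions are configurable in flink-conf.yaml. The key names and values below follow the Flink configuration documentation of that era; treat them as an example and verify them against your Flink version:

```yaml
# Heartbeat interval/pause of Akka's transport failure detector.
# A very high interval (1000 s) effectively disables it.
akka.transport.heartbeat.interval: 1000 s
akka.transport.heartbeat.pause: 6000 s

# Heartbeat settings of Akka's death watch failure detector.
akka.watch.heartbeat.interval: 10 s
akka.watch.heartbeat.pause: 60 s
```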
Which versi
I see the following line:
11:14:32,603 WARN  akka.remote.ReliableDeliverySupervisor
  - Association with remote system [akka.tcp://fl...@cloud-26.dima.tu-berlin.de:51449] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
Does that mean that the machines have lost connection?
I might add that the error only occurs when running with the RemoteExecutor,
regardless of the number of TMs. Starting the job in IntelliJ with the
LocalExecutor with dop 1 works just fine.
Best,
Christoph
On 28 Jan 2015, at 12:17, Bruecke, Christoph wrote:
Hi Robert,
thanks for the quick response. Here is the jobmanager-main.log:
PS: I’m subscribed now.
11:09:16,144 INFO  org.apache.flink.yarn.ApplicationMaster$
  - YARN daemon runs as hadoop setting user to execute Flink ApplicationMaster/JobManager to hadoop
11:09:16,199 INF
Hi,
it seems that you are not subscribed to our mailing list, so I had to
manually accept your mail. It would be good if you could subscribe.
Can you also send us the log output of the JobManager?
If your YARN cluster has log aggregation activated, you can retrieve the
logs of a stopped YARN session
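With log aggregation enabled, the standard Hadoop CLI can retrieve the logs of a finished YARN application; the application id below is a placeholder:

```shell
# List applications (including finished ones) to find the id:
yarn application -list -appStates FINISHED

# Fetch the aggregated container logs of the stopped session:
yarn logs -applicationId application_1422000000000_0001
```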
Hi,
I have written a job that reads a SequenceFile from HDFS using the
Hadoop-Compatibility add-on. Doing so results in a TimeoutException. I’m using
flink-0.9-SNAPSHOT with PR 342 ( https://github.com/apache/flink/pull/342 ).
Furthermore, I’m running Flink on YARN with two TMs using
flink-yarn-