Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Robert Metzger
So is the TaskManager JVM still running after the JM detected that the TM has gone? If not, can you check the kernel log (dmesg) to see whether the Linux OOM killer stopped the process? (If it's a kill, the JVM might not be able to log anything anymore.) On Thu, Oct 29, 2015 at 9:27 PM, Stephan Ewen w
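
A rough sketch of the dmesg check suggested above, written in Python. It only scans kernel log text that has already been captured (for example the saved output of dmesg). The function name and the sample line are hypothetical, and the exact kernel wording differs between kernel versions, so the pattern is an assumption to adjust for the system at hand.

```python
import re
from typing import List, Tuple

# Assumed wording: older kernels log "Out of memory: Kill process <pid> (<name>) ...",
# newer ones "Out of memory: Killed process <pid> (<name>) ...".
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, process name) for each OOM-killer victim found in the log text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_PATTERN.finditer(kernel_log)]


# Hypothetical sample lines, not taken from the thread.
SAMPLE_LOG = (
    "[12345.678901] Out of memory: Kill process 4242 (java) score 912 or sacrifice child\n"
    "[12345.679000] Killed process 4242 (java) total-vm:1000kB, anon-rss:500kB, file-rss:0kB\n"
)

victims = find_oom_kills(SAMPLE_LOG)
```

If a java process shows up among the victims at roughly the time the JobManager reported the TaskManager as lost, that would be consistent with the OOM-killer explanation. An empty result points back toward the connection-level causes discussed elsewhere in the thread.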

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Stephan Ewen
Thanks for sharing the logs, Greg! Okay, so the TaskManager does not crash, but the Remote Failure Detector of Akka marks the connection between JobManager and TaskManager as broken. The TaskManager is not doing much GC, so it is not a long JVM freeze that causes heartbeats to time out... I am wo

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Aljoscha Krettek
Could it be a problem that there are two TaskManagers running per machine? > On 29 Oct 2015, at 19:04, Greg Hogan wrote: > > I have memory logging enabled. Tail of TaskManager log on 10.0.88.140: > > 17:35:26,415 INFO > org.apache.flink.runtime.taskmanager.TaskManager - Garbage > c

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Greg Hogan
I have memory logging enabled. Tail of TaskManager log on 10.0.88.140: 17:35:26,415 INFO org.apache.flink.runtime.taskmanager.TaskManager - Garbage collector stats: [PS Scavenge, GC TIME (ms): 341, GC COUNT: 3], [PS MarkSweep, GC TIME (ms): 974, GC COUNT: 1] 17:35:27,415 INFO org.apac
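
For anyone who wants to check the GC numbers across a whole log rather than by eye, here is a minimal Python sketch that pulls the collector name, GC time and GC count out of a "Garbage collector stats" line in the format quoted above. The function name is hypothetical and the regular expression assumes exactly the layout shown in this message.

```python
import re
from typing import Dict, Tuple

# One bracketed entry looks like: [PS Scavenge, GC TIME (ms): 341, GC COUNT: 3]
GC_ENTRY = re.compile(r"\[([^,\]]+), GC TIME \(ms\): (\d+), GC COUNT: (\d+)\]")


def parse_gc_stats(line: str) -> Dict[str, Tuple[int, int]]:
    """Map each collector name to (GC time in ms, GC count) for one log line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in GC_ENTRY.finditer(line)
    }


# The first log line quoted in the message above.
LINE = (
    "17:35:26,415 INFO org.apache.flink.runtime.taskmanager.TaskManager - "
    "Garbage collector stats: [PS Scavenge, GC TIME (ms): 341, GC COUNT: 3], "
    "[PS MarkSweep, GC TIME (ms): 974, GC COUNT: 1]"
)

stats = parse_gc_stats(LINE)
total_gc_ms = sum(time_ms for time_ms, _count in stats.values())
```

Comparing total_gc_ms between consecutive log lines gives a rough view of how much time went into collection in each interval, which is the figure that matters when ruling out a long GC pause.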

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Till Rohrmann
What does the log of the failed TaskManager 10.0.88.140 say? On Thu, Oct 29, 2015 at 6:44 PM, Greg Hogan wrote: > I removed the use of numactl but left in starting two TaskManagers and am > still seeing TaskManagers crash. > From the JobManager log: > > 17:36:06,412 WARN > akka.remote.ReliableDe

Fwd: neo4j - Flink connector

2015-10-29 Thread Vasiliki Kalavri
Forwarding these here to keep dev@ in the loop :) -- Forwarded message -- From: Martin Junghanns Date: 29 October 2015 at 18:37 Subject: Re: neo4j - Flink connector To: Martin Liesenberg , Vasia Kalavri < vasilikikala...@gmail.com> Cc: Alexander Keller , Martin Neumann My idea

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Greg Hogan
I removed the use of numactl but left in starting two TaskManagers and am still seeing TaskManagers crash. From the JobManager log: 17:36:06,412 WARN akka.remote.ReliableDeliverySupervisor- Association with remote system [akka.tcp://flink@10.0.88.140:45742] has failed, add

[jira] [Created] (FLINK-2943) Confusing Bytes/Records "read" and "write" labels in WebUI job view

2015-10-29 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2943: Summary: Confusing Bytes/Records "read" and "write" labels in WebUI job view Key: FLINK-2943 URL: https://issues.apache.org/jira/browse/FLINK-2943 Project: Flink

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Stephan Ewen
Hi Greg! Interesting... When you say the TaskManagers are dropping, are the TaskManager processes crashing, or are they losing connection to the JobManager? Greetings, Stephan On Thu, Oct 29, 2015 at 9:56 AM, Greg Hogan wrote: > I recently discovered that AWS uses NUMA for its largest nodes.

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Greg Hogan
I recently discovered that AWS uses NUMA for its largest nodes. An example c4.8xlarge: $ numactl --hardware available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 4 5 6 7 8 18 19 20 21 22 23 24 25 26 node 0 size: 29813 MB node 0 free: 24537 MB node 1 cpus: 9 10 11 12 13 14 15 16 17 27 28 29 30 31 32 33 34

[jira] [Created] (FLINK-2942) Dangling operators in web UI's program visualization

2015-10-29 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2942: Summary: Dangling operators in web UI's program visualization Key: FLINK-2942 URL: https://issues.apache.org/jira/browse/FLINK-2942 Project: Flink Issue Type

Re: New JobManager web frontend

2015-10-29 Thread Maximilian Michels
Hi Matthias, There is currently no cancel button in the web frontend. Just filed this ticket today: https://issues.apache.org/jira/browse/FLINK-2939 Cheers, Max On Thu, Oct 29, 2015 at 4:49 PM, Matthias J. Sax wrote: > Hi, > > I was just playing with the new JobManager web frontend and missing

New JobManager web frontend

2015-10-29 Thread Matthias J. Sax
Hi, I was just playing with the new JobManager web frontend and am missing a button to cancel a running job. Is there no such button, or is it hidden somewhere? -Matthias

neo4j - Flink connector

2015-10-29 Thread Vasiliki Kalavri
Hello everyone, Martin, Martin, Alex (cc'ed) and I have started discussing the implementation of a neo4j-Flink connector. I've opened a corresponding JIRA (FLINK-2941) containing an initial document [1], but we'd also like to share our ideas here to engage the community and get your feedback. W

Re: Caching information from a stream

2015-10-29 Thread Andra Lungu
Thanks Max ^^ On Wed, Oct 28, 2015 at 8:41 PM, Maximilian Michels wrote: > Oups, forgot the mapper :) > > static class StatefulMapper extends RichMapFunction Long>, Tuple2> { > >private OperatorState counter; > >@Override >public Tuple2 map(Tuple2 value) throws > Exception { >

Re: Diagnosing TaskManager disappearance

2015-10-29 Thread Maximilian Michels
Hi Greg, Thanks for reporting. You wrote you didn't see any output in the .out files of the task managers. What about the .log files of these instances? Where and when did you produce the thread dump you included? Thanks, Max On Thu, Oct 29, 2015 at 1:46 PM, Greg Hogan wrote: > I am testing a

Diagnosing TaskManager disappearance

2015-10-29 Thread Greg Hogan
I am testing again on a 64 node cluster (the JobManager is running fine having reduced some operator's parallelism and fixed the string conversion performance). I am seeing TaskManagers drop like flies every other job or so. I am not seeing any output in the .out log files corresponding to the cra

[jira] [Created] (FLINK-2941) Implement a neo4j - Flink/Gelly connector

2015-10-29 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-2941: Summary: Implement a neo4j - Flink/Gelly connector Key: FLINK-2941 URL: https://issues.apache.org/jira/browse/FLINK-2941 Project: Flink Issue Type: New Featu

Re: Scala 2.10/2.11 Maven dependencies

2015-10-29 Thread Maximilian Michels
Seems like we agree that we need artifacts for different versions of Scala on Maven. There also seems to be a preference for including the version in the artifact name. I've created an issue and marked it to be resolved for 1.0. For the 0.10 release, we will have binaries but no Maven artifacts. T

[jira] [Created] (FLINK-2940) Deploy multiple Scala versions for Maven artifacts

2015-10-29 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-2940: - Summary: Deploy multiple Scala versions for Maven artifacts Key: FLINK-2940 URL: https://issues.apache.org/jira/browse/FLINK-2940 Project: Flink Is

[jira] [Created] (FLINK-2939) Add button to cancel jobs in new web frontend

2015-10-29 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-2939: - Summary: Add button to cancel jobs in new web frontend Key: FLINK-2939 URL: https://issues.apache.org/jira/browse/FLINK-2939 Project: Flink Issue T

[jira] [Created] (FLINK-2938) Streaming docs not in sink with latest state changes

2015-10-29 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-2938: - Summary: Streaming docs not in sink with latest state changes Key: FLINK-2938 URL: https://issues.apache.org/jira/browse/FLINK-2938 Project: Flink

[jira] [Created] (FLINK-2937) Typo in Quickstart->Scala API->Alternative Build Tools: SBT

2015-10-29 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2937: -- Summary: Typo in Quickstart->Scala API->Alternative Build Tools: SBT Key: FLINK-2937 URL: https://issues.apache.org/jira/browse/FLINK-2937 Project: Flink