Re: Streaming job failure due to loss of Taskmanagers

2016-03-21 Thread Ufuk Celebi
Hey Ravinder, can you please share the JobManager logs as well? The logs say that the TaskManager disconnects from the JobManager, because that one is not reachable anymore. At this point, the running shuffles are cancelled and you see the follow up RemoteTransportExceptions. – Ufuk On Mon, Ma

Re: Unable to run the batch examples after running stream examples

2016-03-21 Thread subash basnet
Hello Chesnay Schepler, I am running the latest flink examples from here in eclipse. I had used the WikipediaAnalysis pom properties/dependency in batch pom as shown below: UTF-8 1.0.0 org.apache.flink flink-java *1.0.0* org.ap

Re: [DISCUSS] Improving Trigger/Window API and Semantics

2016-03-21 Thread Aljoscha Krettek
Hi, my previous message might be a bit hard to parse for people that are not very deep into the Trigger implementation. So I’ll try to give a bit more explanation right in the mail. The basic idea is that we observed some basic problems that keep coming up for people on the mailing lists and I

[DISCUSS] Improving Trigger/Window API and Semantics

2016-03-21 Thread Aljoscha Krettek
Hi, I’m also sending this to @user because the Trigger API concerns users directly. There are some things in the Trigger API that I think require some improvements. The issues are trigger testability, fire semantics and composite triggers and lateness. I started a document to keep track of thing

Streaming job failure due to loss of Taskmanagers

2016-03-21 Thread Ravinder Kaur
Hello All, I'm running the WordCount example streaming job and it fails because of loss of Taskmanagers. When gone through the logs of the taskmanager it has the following messages 15:14:26,592 INFO org.apache.flink.streaming.runtime.tasks.StreamTask - State backend is set to heap memory

Re: Access to S3 from YARN on EC2

2016-03-21 Thread Ashutosh Kumar
Adding required jar files in flink/lib resolves the issue. On Mar 21, 2016 12:57 PM, "Balaji Rajagopalan" < balaji.rajagopa...@olacabs.com> wrote: > This kind of class not found exception is a little bit misleading, it is > not the class is not found is the real problem rather than the combination

Re: Unable to run the batch examples after running stream examples

2016-03-21 Thread Chesnay Schepler
where did you run the batch/streaming jobs? did both job use the same flink version? On 21.03.2016 17:06, subash basnet wrote: Hello all, The scenario is I am working on Kafka Read/Write examples. It works fine, but now when I try to run the batch examples such as PiEstimation or any other,

Re: Not enough free slots to run the job

2016-03-21 Thread Ovidiu-Cristian MARCU
Thanks, very clear :)! best, Ovidiu > On 21 Mar 2016, at 16:31, Robert Metzger wrote: > > Hi, > > lets say you have 10 TaskManagers with 2 slots each. In total you have 20 > slots available. > Starting a job with parallelism=18 allows you to restart immediately if one > TaskManager fails. > N

Unable to run the batch examples after running stream examples

2016-03-21 Thread subash basnet
Hello all, The scenario is I am working on Kafka Read/Write examples. It works fine, but now when I try to run the batch examples such as PiEstimation or any other, I get the following error: Exception in thread "main" *java.lang.NoSuchMethodError: org.apache.flink.api.common.Plan.getRestartStrat

Re: Not enough free slots to run the job

2016-03-21 Thread Robert Metzger
Hi, lets say you have 10 TaskManagers with 2 slots each. In total you have 20 slots available. Starting a job with parallelism=18 allows you to restart immediately if one TaskManager fails. Now, regarding your questions: Q1: yes, using fewer slots than available reduces the likelihood of running i

Re: define no. of nodes via source code

2016-03-21 Thread Till Rohrmann
Hi Subash, you can use ExecutionEnvironment env = ...; env.setParallelism(dop) for that. Cheers, Till ​ On Mon, Mar 21, 2016 at 3:42 PM, subash basnet wrote: > Hello all, > > Using the flink-webclient we have the options to define no. of parallelism > and the same no. i.e. taskmanager.numberO

define no. of nodes via source code

2016-03-21 Thread subash basnet
Hello all, Using the flink-webclient we have the options to define no. of parallelism and the same no. i.e. taskmanager.numberOfTaskSlots is given in flink-conf.yaml. But, where can I define this no. of parallel task when running the examples from IDE. Best Regards, Subash Basnet

Re: Not enough free slots to run the job

2016-03-21 Thread Ovidiu-Cristian MARCU
Hi Robert, I am not sure I understand so please confirm if I understand correctly your suggestions: - to use less slots than available slots capacity to avoid issues like when a TaskManager is not giving its slots because of some problems registering the TM; (This means I will lose some performa

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-21 Thread Sourigna Phetsarath
Fabian, I'll try extending InputFormat as you suggested and will create a JIRA issue as well. I also have an AvroGenericRecordInput format class that I would like to contribute once I have time to clean it up and get it into your code base. -Gna On Mon, Mar 21, 2016 at 6:35 AM, Fabian Hueske w

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-21 Thread Sourigna Phetsarath
Thanks Ufuk, I'm already using the recursive traversal feature. On Mon, Mar 21, 2016 at 8:39 AM, Ufuk Celebi wrote: > If you want all sub directories under data/2016/01, then this could help: > https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#recursive-traversal-of-th

Re: Output from Beam (on Flink) to Kafka

2016-03-21 Thread Maximilian Michels
Sorry. Wrong mailing list... On Mon, Mar 21, 2016 at 11:47 AM, Maximilian Michels wrote: > FYI: The Runner registration has been fixed. The Flink runner > explicitly registers as of [1]. Also, the SDK tries to look up the > PipelineRunner class in case it has not been registered [2]. > > [1] http

Oracle 11g number serialization: classcast problem

2016-03-21 Thread Stefano Bortoli
Hi squirrels, I working on a flink job connecting to a Oracle DB. I started from the JDBC example for Derby, and used the TupleTypeInfo to configure the fields of the tuple as it is read. The record of the example has 2 INT, 1 FLOAT and 2 VARCHAR. Apparently, using Oracle, all the numbers are rea

Re: Not enough free slots to run the job

2016-03-21 Thread Robert Metzger
Hi Ovidiu, right now the scheduler in Flink will not use more slots than requested. To avoid issues on recovery, we usually recommend users to have some spare slots (run job with p=15 on a cluster with slots=20). I agree that it would make sense to add a flag which allows a job to grab more slots

RE: Read a given list of HDFS folder

2016-03-21 Thread Gwenhael Pasquiers
Hi and thanks, i'm not sure that recurive traversal is what I need. Let's say I have the following dir tree : /data/2016_03_21_13/.gz /data/2016_03_21_12/.gz /data/2016_03_21_11/.gz /data/2016_03_21_10/.gz /data/2016_03_21_09/.gz /data/2016_03_21_08/.gz /data/2016_03_21_07/.gz I want my DataSet

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-21 Thread Ufuk Celebi
If you want all sub directories under data/2016/01, then this could help: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#recursive-traversal-of-the-input-path-directory On Mon, Mar 21, 2016 at 11:35 AM, Fabian Hueske wrote: > Hi, > > no, this is currently not suppor

Re: Read a given list of HDFS folder

2016-03-21 Thread Ufuk Celebi
Hey Gwenhaël, see here for recursive traversal of input paths: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/index.html#recursive-traversal-of-the-input-path-directory Regarding the phases: the best way to exchange data between batch jobs is via files. You can then execute two

modify solution set within the delta iteration

2016-03-21 Thread Riccardo Diomedi
I try to explain my situation I’m doing a delta iteration in which my solution set in something like: DataSet> Depending on a different dataset that i retrieve during the iteration, i need to update the hashMap of the solution set in order to keep it up to date. But i noticed that i cannot modi

Read a given list of HDFS folder

2016-03-21 Thread Gwenhael Pasquiers
Hello, Sorry if this has been already asked or is already in the docs, I did not find the answer : Is there a way to read a given set of folders in Flink batch ? Let's say we have one folder per hour of data, written by flume, and we'd like to read only the N last hours (or any other pattern o

Re: Output from Beam (on Flink) to Kafka

2016-03-21 Thread Maximilian Michels
FYI: The Runner registration has been fixed. The Flink runner explicitly registers as of [1]. Also, the SDK tries to look up the PipelineRunner class in case it has not been registered [2]. [1] https://github.com/apache/incubator-beam/pull/40 [2] https://github.com/apache/incubator-beam/pull/61 O

Re: Flink 1.0.0 reading files from multiple directory with wildcards

2016-03-21 Thread Fabian Hueske
Hi, no, this is currently not supported. However, I agree this would be a very valuable addition to the FileInputFormat. Would you mind opening a JIRA issue with your suggestions? Until this is added to Flink, it can be implemented as a custom InputFormat based on FileInputFormat by overriding th

Re: Setting taskmanager.network.numberOfBuffers does not seem to have an affect - Flink 0.10.2

2016-03-21 Thread Fabian Hueske
Hi, right now there is no way to sequentially execute the input tasks. Flink's FileInputFormat does also not support multiple paths out of the box. However, it is certainly possible to extend the FileInputFormat such that this is possible. You would need to override / extend the createInputSplits(

Re: TumblingProcessingTimeWindow and ContinuousProcessingTimeTrigger

2016-03-21 Thread Aljoscha Krettek
Hi, I’m afraid you discovered a bug in the ContinuousProcessingTimeTrigger. The timer is not correctly set. You can try it with this fixed version, that I will also update in the Flink code: https://gist.github.com/aljoscha/cbdbd62932b6dd2d1930 One more thing, the ContinuousProcessingTimeTrigge

Re: Access to S3 from YARN on EC2

2016-03-21 Thread Balaji Rajagopalan
This kind of class not found exception is a little bit misleading, it is not the class is not found is the real problem rather than the combination of the different libraries that are using there is a version compatibility mismatch, so you will have to go back and check if there is any version mism

Re: Java I/O exception

2016-03-21 Thread Chesnay Schepler
I quickly went through the code: Flink gathers some data about the hardware available, like numberOfCPUCores / available physical memory. Now the physical memory part is apparently only used for logging / metrics display in the dashboard, so its not a problem that you got it, it is simply not r