Re: Streaming job on YARN - ClassNotFoundException

2016-05-06 Thread Theofilos Kakantousis
Hi Robert, Thank you for the prompt reply. You're right, it was a leftover from a previous build. With the fixed dependencies I get the same error, though. I have a question on job submission as well. I use the following code to submit the job: InetSocketAddress jobManagerAddress = cluster…
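
For reference, a minimal sketch of this kind of programmatic submission, assuming the 1.0.x client API (the jar path and entry class below are placeholders, and the exact Client signatures are worth checking against the javadocs):

    import java.io.File;
    import java.net.InetSocketAddress;
    import org.apache.flink.client.program.Client;
    import org.apache.flink.client.program.PackagedProgram;
    import org.apache.flink.configuration.ConfigConstants;
    import org.apache.flink.configuration.Configuration;

    // Point the client at the JobManager that the YARN session reported.
    Configuration config = new Configuration();
    config.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, jobManagerAddress.getHostName());
    config.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, jobManagerAddress.getPort());

    // Package the user jar; Flink ships it to the cluster with the job graph.
    PackagedProgram program = new PackagedProgram(
        new File("target/my-job.jar"), "org.example.StreamingJob", new String[0]);

    Client client = new Client(config);
    client.runBlocking(program, 1); // blocks until the job finishes, parallelism 1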

Re: Streaming job on YARN - ClassNotFoundException

2016-05-06 Thread Robert Metzger
Hi Theo, you can't mix different Flink versions in your dependencies. Please use 1.0.2 for the flink_yarn client as well, or 1.1-SNAPSHOT everywhere. On Fri, May 6, 2016 at 7:02 PM, Theofilos Kakantousis wrote: > Hi everyone, > Flink 1.0.2 > Hadoop 2.4.0 > > I am running Flink on YARN by using FlinkYarnClient…
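
One way to keep the versions aligned is a single Maven property that every Flink artifact references; a sketch, where the Scala-2.10 artifact names are an assumption about the build in question:

    <properties>
      <flink.version>1.0.2</flink.version>
    </properties>

    <dependencies>
      <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.10</artifactId>
        <version>${flink.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-yarn_2.10</artifactId>
        <version>${flink.version}</version>
      </dependency>
    </dependencies>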

How to measure Flink performance

2016-05-06 Thread prateekarora
Hi, I am new to Apache Flink and am using Flink 1.0.1. I have a streaming program that fetches data from Kafka, performs some computation, and sends the result back to Kafka. I want to compare results between Flink and Spark. I have the information below from Spark. Can I get similar information from…
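
One low-tech way to get a throughput number out of the pipeline itself is a per-task meter chained after the source; a minimal sketch, assuming a String stream (the class name is made up):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class ThroughputMeter extends RichMapFunction<String, String> {
        private transient long count;
        private transient long startMillis;

        @Override
        public void open(Configuration parameters) {
            count = 0;
            startMillis = System.currentTimeMillis();
        }

        @Override
        public String map(String value) {
            count++;
            if (count % 100000 == 0) { // log every 100k records per parallel task
                double elapsedSec = (System.currentTimeMillis() - startMillis) / 1000.0;
                System.out.printf("subtask %d: %.0f records/s%n",
                        getRuntimeContext().getIndexOfThisSubtask(), count / elapsedSec);
            }
            return value;
        }
    }

Chaining it right after the Kafka source, e.g. stream.map(new ThroughputMeter()), gives a rough ingestion rate per subtask.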

Streaming job on YARN - ClassNotFoundException

2016-05-06 Thread Theofilos Kakantousis
Hi everyone, Flink 1.0.2 Hadoop 2.4.0 I am running Flink on YARN by using FlinkYarnClient to launch a Flink cluster and Flink Client to submit a PackagedProgram. To keep it simple, for batch jobs I use the WordCount example and for streaming the IterateExample and IncrementalLearning ones with…

Re: s3 checkpointing issue

2016-05-06 Thread Ufuk Celebi
OK, thanks for reporting back. Thanks to Igor as well. I just updated the docs with a note about this. On Thu, May 5, 2016 at 3:16 AM, Chen Qin wrote: > Ufuk & Igor, > > Thanks for helping out! Yup, it fixed my issue. > > Chen > > > > On Wed, May 4, 2016 at 12:57 PM, Igor Berman wrote: >> >> I…

NoSuchMethodError when using the Flink Gelly library with Scala

2016-05-06 Thread Adrian Bartnik
Hi, I am trying to run the code examples from the Gelly documentation, in particular this code: import org.apache.flink.api.scala._ import org.apache.flink.graph.generator.GridGraph object SampleObject { def main(args: Array[String]) { val env = ExecutionEnvironment.getExecutionEnvironment…
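
A NoSuchMethodError at runtime usually points at binary-incompatible versions on the classpath (for instance mixed Scala 2.10 and 2.11 artifacts). A minimal Java Gelly job can help rule the Scala toolchain in or out; a sketch with made-up edge data:

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.graph.Edge;
    import org.apache.flink.graph.Graph;
    import org.apache.flink.types.NullValue;

    public class GellyCheck {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            // Two edges are enough to verify that the Gelly jars resolve and run.
            Graph<Long, NullValue, NullValue> graph = Graph.fromDataSet(
                env.fromElements(
                    new Edge<>(1L, 2L, NullValue.getInstance()),
                    new Edge<>(2L, 3L, NullValue.getInstance())),
                env);
            System.out.println(graph.numberOfVertices());
        }
    }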

Re: Discussion about a Flink DataSource repository

2016-05-06 Thread Fabian Hueske
Yes, you can transform the broadcast set when it is accessed with RuntimeContext.getBroadcastVariableWithInitializer() and a BroadcastVariableInitializer. 2016-05-06 14:07 GMT+02:00 Flavio Pompermaier : > That was more or less what I was thinking. The only thing I'm not sure about is > the usage of the…
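
A minimal sketch of that pattern, assuming the metadata arrives as Tuple2<sourceId, value> pairs and that records is a DataSet<Tuple2<String, String>> (all names are made up):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.flink.api.common.functions.BroadcastVariableInitializer;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;

    DataSet<Tuple2<String, String>> meta = env.fromElements(
            Tuple2.of("src1", "schema-a"), Tuple2.of("src2", "schema-b"));

    records.map(new RichMapFunction<Tuple2<String, String>, String>() {
        private Map<String, String> metaById;

        @Override
        public void open(Configuration parameters) {
            // Build the sourceId -> metadata lookup once per task from the broadcast data.
            metaById = getRuntimeContext().getBroadcastVariableWithInitializer("meta",
                new BroadcastVariableInitializer<Tuple2<String, String>, Map<String, String>>() {
                    @Override
                    public Map<String, String> initializeBroadcastVariable(
                            Iterable<Tuple2<String, String>> data) {
                        Map<String, String> m = new HashMap<>();
                        for (Tuple2<String, String> t : data) {
                            m.put(t.f0, t.f1);
                        }
                        return m;
                    }
                });
        }

        @Override
        public String map(Tuple2<String, String> record) {
            return record.f1 + " / " + metaById.get(record.f0);
        }
    }).withBroadcastSet(meta, "meta");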

Re: OutputFormat in streaming job

2016-05-06 Thread Andrea Sella
Hi Fabian, So I misunderstood the behaviour of configure(), thank you. Andrea 2016-05-06 14:17 GMT+02:00 Fabian Hueske : > Hi Andrea, > > actually, OutputFormat.configure() will also be invoked per task. So you > would also end up with 16 ActorSystems. > However, I think you can use a synchronized…

Re: OutputFormat in streaming job

2016-05-06 Thread Fabian Hueske
Hi Andrea, actually, OutputFormat.configure() will also be invoked per task, so you would also end up with 16 ActorSystems. However, I think you can use a synchronized singleton object to start one ActorSystem per TM (each TM and all its tasks run in a single JVM). So from the point of view of configure…
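
A sketch of such a singleton, assuming Akka is on the classpath (the class and system names are made up):

    import akka.actor.ActorSystem;

    public final class ActorSystemHolder {
        private static ActorSystem system;

        private ActorSystemHolder() {}

        // All parallel tasks of a TaskManager run in one JVM, so every task
        // that calls this (e.g. from configure() or open()) shares one instance.
        public static synchronized ActorSystem get() {
            if (system == null) {
                system = ActorSystem.create("sink-actor-system");
            }
            return system;
        }
    }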

Re: Where to put live model and business logic in Hadoop/Flink BigData system

2016-05-06 Thread Fabian Hueske
Hi Palle, this sounds indeed like a good use case for Flink. Depending on the complexity of the aggregated historical views, you can implement a Flink DataStream program which builds the views on the fly, i.e., you do not need to periodically trigger MR/Flink/Spark batch jobs to compute the views…
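
A sketch of the on-the-fly variant, assuming the log lines arrive as a DataStream<String> and 5-minute tumbling counts per key are enough (extractKey is hypothetical):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    DataStream<String> lines = ...; // e.g. the Kafka source fed by Logstash

    DataStream<Tuple2<String, Long>> views = lines
        .map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String line) {
                return Tuple2.of(extractKey(line), 1L); // extractKey is made up
            }
        })
        .keyBy(0)
        .timeWindow(Time.minutes(5)) // tumbling processing-time window
        .sum(1);

    views.print(); // or a Mongo/HBase sink that upserts the aggregates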

Re: OutputFormat in streaming job

2016-05-06 Thread Andrea Sella
Hi Fabian, ATM I am not interested in guaranteeing exactly-once processing, thank you for the clarification. As far as I know, there is no method similar to OutputFormat's configure() for RichSinkFunction, correct? So I am not able to instantiate an ActorSystem per TM but have to instantiate…
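
Since open() also runs once per task in the same JVM, a RichSinkFunction can pull the shared system from a per-JVM holder like the one sketched above; a minimal sketch:

    import akka.actor.ActorSystem;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class ActorSink extends RichSinkFunction<String> {
        private transient ActorSystem system;

        @Override
        public void open(Configuration parameters) {
            system = ActorSystemHolder.get(); // deduplicated per TaskManager JVM
        }

        @Override
        public void invoke(String value) {
            // hand the record to the remote actor, e.g. via system.actorSelection(...)
        }
    }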

Re: Discussion about a Flink DataSource repository

2016-05-06 Thread Flavio Pompermaier
That was more or less what I was thinking. The only thing I'm not sure about is the usage of the broadcasted dataset, since I'd need to access the MetaData dataset by sourceId (so I'd need a Map). Probably I'd do: Map meta = ...; // prepare metadata lookup table ... ds.map(MetaMapFunctionWrapper(new…

Re: OutputFormat in streaming job

2016-05-06 Thread Fabian Hueske
Hi Andrea, you can use any OutputFormat to emit data from a DataStream using the writeUsingOutputFormat() method. However, this method does not guarantee exactly-once processing. In case of a failure, it might emit some records a second time. Hence the results will be written at least once. Hope…
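
For example, with the stock TextOutputFormat (the target path is a placeholder):

    import org.apache.flink.api.java.io.TextOutputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;

    DataStream<String> results = ...; // some upstream stream
    results.writeUsingOutputFormat(
        new TextOutputFormat<String>(new Path("hdfs:///tmp/results")));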

Re: Where to put live model and business logic in Hadoop/Flink BigData system

2016-05-06 Thread Deepak Sharma
I see the flow to be as below: Logstash -> Log Stream -> Flink -> Kafka -> Live Model | Mongo/HBase. The Live Model will again be Flink streaming data sets from Kafka. There you analyze the incoming stream for a certain value and…

general design questions when using flink

2016-05-06 Thread Igor Berman
1. Suppose I have a stream of different events (A, B, C) and each event needs its own processing pipeline. What is the recommended approach to splitting the pipelines per event type? I can apply a filter operator at the beginning. I can set up a different job per event. I can hold every such event in different…
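
A sketch of the single-job variant using split()/select() from the 1.0 DataStream API (the Event type and its getType() are made up):

    import java.util.Collections;
    import org.apache.flink.streaming.api.collector.selector.OutputSelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SplitStream;

    SplitStream<Event> split = events.split(new OutputSelector<Event>() {
        @Override
        public Iterable<String> select(Event e) {
            return Collections.singletonList(e.getType()); // "A", "B" or "C"
        }
    });

    // Each selected stream gets its own downstream pipeline within one job.
    DataStream<Event> aEvents = split.select("A");
    DataStream<Event> bEvents = split.select("B");
    DataStream<Event> cEvents = split.select("C");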

RE: Where to put live model and business logic in Hadoop/Flink BigData system

2016-05-06 Thread Lohith Samaga M
Hi Palle, I am a beginner in Flink. However, I can say something about your other questions: 1. It is better to use Spark to create aggregate views; it is a lot faster than MR. You could use either batch or streaming mode in Spark, based on your needs. 2. If your a…

Where to put live model and business logic in Hadoop/Flink BigData system

2016-05-06 Thread palle
Hi there. We are putting together some BigData components for handling a large amount of incoming data from different log files and performing some analysis on the data. All data being fed into the system will go into HDFS. We plan on using Logstash, Kafka and Flink for bringing data from the log…

OutputFormat in streaming job

2016-05-06 Thread Andrea Sella
Hi, I created a custom OutputFormat to send data to a remote actor. Are there any issues with using an OutputFormat in a streaming job, or will it be treated like a Sink? I prefer to use it in order to create a custom ActorSystem per TM in the configure() method. Cheers, Andrea

question regarding windowed stream

2016-05-06 Thread Balaji Rajagopalan
I have a requirement where I want to do aggregation on one data stream every 5 minutes and on a different data stream every 1 minute. I wrote some example code to test this out, but the behavior is different from what I expected: I expected window2 to be called 5 times and window1 to be called once, but…
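
Two independent tumbling windows should behave that way; a minimal sketch, assuming keyed Tuple2<String, Long> streams and the processing-time default of 1.0:

    import org.apache.flink.streaming.api.windowing.time.Time;

    // stream1 and stream2 are DataStream<Tuple2<String, Long>> (placeholders)
    stream1.keyBy(0).timeWindow(Time.minutes(5)).sum(1).print(); // one result per key per 5 min
    stream2.keyBy(0).timeWindow(Time.minutes(1)).sum(1).print(); // five results in the same span

Note that a window only fires for keys that actually received records during that span, which can make the observed invocation counts differ from the expected ones.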

Re: reading from latest kafka offset when flink starts

2016-05-06 Thread Balaji Rajagopalan
Thanks Robert, appreciate your help. On Fri, May 6, 2016 at 3:07 PM, Robert Metzger wrote: > Hi, > > yes, you can use Kafka's configuration setting for that. It's called > "auto.offset.reset". Setting it to "latest" will change the restart > behavior to the current offset ("earliest" is the opposite…

Re: Discussion about a Flink DataSource repository

2016-05-06 Thread Fabian Hueske
Hi Flavio, I'll open a JIRA for de/serializing TableSource to textual JSON. Would something like this work for you? main() { ExecutionEnvironment env = ... TableEnvironment tEnv = ... // accessing an external catalog YourTableSource ts = Catalog.getTableSource("someIdentifier"); tEnv.…

Re: reading from latest kafka offset when flink starts

2016-05-06 Thread Robert Metzger
Hi, yes, you can use Kafka's configuration setting for that. It's called "auto.offset.reset". Setting it to "latest" will change the restart behavior to the current offset ("earliest" is the opposite). How heavy is the processing you are doing? 4500 events/second does not sound like a lot of throughput…
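
A minimal sketch with the 0.9 consumer (broker address, group id, and topic are placeholders):

    import java.util.Properties;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
    import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "my-group");
    props.setProperty("auto.offset.reset", "latest"); // start from the current offset

    DataStream<String> stream = env.addSource(
        new FlinkKafkaConsumer09<>("my-topic", new SimpleStringSchema(), props));

With the 0.8 consumer the same Kafka setting takes the older value names ("largest"/"smallest").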