Re: Compiler Exception

2015-11-19 Thread Till Rohrmann
Hi Kien Truong, could you share the problematic code with us? Cheers, Till On Nov 18, 2015 9:54 PM, "Truong Duc Kien" wrote: > Hi, > > I'm hitting a CompilerException with some of my data sets, but not all of > them. > > Exception in thread "main" org.apache.flink.optimizer.CompilerException: > N

Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Madhukar Thota
Hi I am very new to Avro. Currently I am using the Confluent Kafka distribution and I am able to write an Avro message to Kafka by storing the schema in the schema registry. Now I need to consume those messages using the Flink Kafka Consumer and I am having a hard time deserializing them. I am looking for an exam

Re: YARN High Availability

2015-11-19 Thread Aljoscha Krettek
I think we should find a way to randomize the paths where the HA stuff stores data. If users don’t realize that they store data in the same paths this could lead to problems. > On 19 Nov 2015, at 08:50, Till Rohrmann wrote: > > Hi Gwenhaël, > > good to hear that you could resolve the problem.

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Maximilian Michels
Hi Madhukar, Thanks for your question. When you instantiate the FlinkKafkaConsumer, you supply a DeserializationSchema in the constructor. You simply create a class which implements DeserializationSchema and contains the KafkaAvroDecoder with the schema registry. Like so: public class MyAvroDese
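The pattern Max describes can be sketched in plain Java. This is not Flink's actual interface: the `DeserializationSchema` below only mimics the shape of Flink's `org.apache.flink.streaming.util.serialization.DeserializationSchema`, and `FakeAvroDecoder` is a stand-in for Confluent's `KafkaAvroDecoder`, so the whole snippet is a self-contained illustration rather than working Flink code.

```java
import java.nio.charset.StandardCharsets;

// Mimics the shape of Flink's DeserializationSchema interface (assumption:
// the real interface has more methods, e.g. getProducedType()).
interface DeserializationSchema<T> {
    T deserialize(byte[] message);
    boolean isEndOfStream(T nextElement);
}

// Stand-in for Confluent's KafkaAvroDecoder, which would talk to the
// schema registry; here it just decodes bytes as UTF-8 for illustration.
class FakeAvroDecoder {
    Object fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// The wrapper class Max suggests: implement the schema interface and
// delegate to the decoder inside deserialize().
class MyAvroDeserializer implements DeserializationSchema<Object> {
    private final FakeAvroDecoder decoder = new FakeAvroDecoder();

    @Override
    public Object deserialize(byte[] message) {
        return decoder.fromBytes(message);
    }

    @Override
    public boolean isEndOfStream(Object nextElement) {
        return false; // a Kafka topic is an unbounded stream
    }
}

public class Main {
    public static void main(String[] args) {
        MyAvroDeserializer schema = new MyAvroDeserializer();
        System.out.println(schema.deserialize("hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

In real code the implementation would be passed to the `FlinkKafkaConsumer` constructor, which then invokes `deserialize()` for each Kafka record.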

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-19 Thread Aljoscha Krettek
Hmm, that’s very strange. I’ll continue looking into it. > On 17 Nov 2015, at 21:42, Konstantin Knauf > wrote: > > Hi Aljoscha, > > Are you sure? I am running the job from my IDE at the moment. > > If I set > > StreamExecutionEnvironment.setParallelism(1); > > It works with the old TimestampE

Re: YARN High Availability

2015-11-19 Thread Robert Metzger
I agree with Aljoscha. Many companies install Flink (and its config) in a central directory and users share that installation. On Thu, Nov 19, 2015 at 10:45 AM, Aljoscha Krettek wrote: > I think we should find a way to randomize the paths where the HA stuff > stores data. If users don’t realize

Re: Published test artifacts for flink streaming

2015-11-19 Thread lofifnc
Hi, I'm currently working on improving the testing process of flink streaming applications. I have written a test runtime that takes care of execution, collecting the output and applying a verifier to it. The runtime is able to provide test sources and sinks that run in parallel. On top of that i

Re: YARN High Availability

2015-11-19 Thread Till Rohrmann
I agree that this would make the configuration easier. However, it entails also that the user has to retrieve the randomized path from the logs if he wants to restart jobs after the cluster has crashed or intentionally restarted. Furthermore, the system won't be able to clean up old checkpoint and

flink yarn-session failure

2015-11-19 Thread Stefanos Antaris
Hi to all, I am trying to use Flink with Hadoop YARN but I am facing an exception while trying to create a yarn-session. First of all, I have a Hadoop cluster with 20 VMs that uses YARN. I can start the Hadoop cluster and run Hadoop jobs without any problem. Furthermore, I am trying to deploy

Re: flink yarn-session failure

2015-11-19 Thread Robert Metzger
The exception is thrown even before Flink code is executed, so I assume that your YARN setup is not properly working. Did you try running any other YARN application on the setup? I suspect that other systems like MapReduce or Spark will also not run on the environment. Maybe the yarn-site.xml on t

Re: YARN High Availability

2015-11-19 Thread Aljoscha Krettek
Maybe we could add a user parameter to specify a cluster name that is used to make the paths unique. On Thu, Nov 19, 2015, 11:24 Till Rohrmann wrote: > I agree that this would make the configuration easier. However, it entails > also that the user has to retrieve the randomized path from the log

Re: YARN High Availability

2015-11-19 Thread Till Rohrmann
You mean an additional start-up parameter for the `start-cluster.sh` script for the HA case? That could work. On Thu, Nov 19, 2015 at 11:54 AM, Aljoscha Krettek wrote: > Maybe we could add a user parameter to specify a cluster name that is used > to make the paths unique. > > On Thu, Nov 19, 201

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Madhukar Thota
Hi Max Thanks for the example. Based on your example here is what I did: public class Streamingkafka { public static void main(String[] args) throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.enableCheckpointing(500)

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Stephan Ewen
The KafkaAvroDecoder is not serializable, and Flink uses serialization to distribute the code to the TaskManagers in the cluster. I think you need to "lazily" initialize the decoder in the first invocation of "deserialize()". That should do it. Stephan On Thu, Nov 19, 2015 at 12:10 PM, Madhuka

Re: YARN High Availability

2015-11-19 Thread Aljoscha Krettek
Yes, that’s what I meant. > On 19 Nov 2015, at 12:08, Till Rohrmann wrote: > > You mean an additional start-up parameter for the `start-cluster.sh` script > for the HA case? That could work. > > On Thu, Nov 19, 2015 at 11:54 AM, Aljoscha Krettek > wrote: > Maybe we could add a user parameter

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Maximilian Michels
Stephan is right, this should do it in deserialize(): if (decoder == null) { decoder = new KafkaAvroDecoder(vProps); } Further, you might have to specify the correct return type for getProducedType(). You may use public TypeInformation getProducedType() { return TypeE

Re: YARN High Availability

2015-11-19 Thread Ufuk Celebi
I’ve added a note about this to the docs and asked Max to trigger a new build of them. Regarding Aljoscha’s idea: I like it. It is essentially a shortcut for configuring the root path. In any case, it is orthogonal to Till’s proposals. That one we need to address as well (see FLINK-2929). The
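For reference, the ZooKeeper root path Ufuk mentions lives in `flink-conf.yaml`. A sketch of a ZooKeeper-backed HA section using the Flink 0.10-era key names (the keys were renamed in later releases, so verify them against the docs for your version; the host names and the per-cluster suffix are placeholders):

```yaml
# Sketch of a ZooKeeper-based HA setup in flink-conf.yaml (Flink 0.10-era keys).
recovery.mode: zookeeper
recovery.zookeeper.quorum: zkhost1:2181,zkhost2:2181,zkhost3:2181
# Giving each cluster its own root path avoids the path clash discussed
# in this thread when several clusters share one ZooKeeper ensemble.
recovery.zookeeper.path.root: /flink/my-cluster-name
```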

Re: flink yarn-session failure

2015-11-19 Thread Stefanos Antaris
Yes. You are right. I cannot run any YARN application. However, I have no localhost in my yarn-site.xml.

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Re: YARN High Availability

2015-11-19 Thread Maximilian Michels
The docs have been updated. On Thu, Nov 19, 2015 at 12:36 PM, Ufuk Celebi wrote: > I’ve added a note about this to the docs and asked Max to trigger a new build > of them. > > Regarding Aljoscha’s idea: I like it. It is essentially a shortcut for > configuring the root path. > > In any case, it

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Madhukar Thota
Hi Max/Ewen, Thank you for the inputs. I was able to solve the serialization issues. Now I am seeing a NullPointerException. public class MyAvroDeserializer implements DeserializationSchema { private transient KafkaAvroDecoder decoder; public MyAvroDeserializer(VerifiableProperties vP

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Till Rohrmann
The constructor of a Java class is not necessarily called after deserialization. Thus, you should move the check if (this.decoder == null) { this.decoder = new KafkaAvroDecoder(vProps); } into the deserialize method of MyAvroDeserializer. Cheers, Till On Thu, Nov 19, 2015 at 1:50 PM, Madh
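Till's point can be demonstrated with nothing but the JDK: when Java deserializes an object, the constructor of the serializable class does not run, so a transient field initialized there comes back as null. The `Decoder` class below is a toy stand-in for the non-serializable `KafkaAvroDecoder`, with the lazy null check inside the method that uses the field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Toy stand-in for a non-serializable decoder held in a transient field.
class Decoder implements Serializable {
    private transient StringBuilder buffer; // like the KafkaAvroDecoder

    Decoder() {
        buffer = new StringBuilder(); // NOT re-run when deserialized
    }

    String decode(String s) {
        if (buffer == null) {         // lazy re-initialization, as Till suggests
            buffer = new StringBuilder();
        }
        buffer.setLength(0);
        return buffer.append(s).reverse().toString();
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        Decoder original = new Decoder();

        // Serialize and deserialize, as Flink does when shipping user code
        // to the TaskManagers.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(original);
        oos.close();
        ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        Decoder copy = (Decoder) in.readObject();

        // Without the null check this call would throw a NullPointerException,
        // which is exactly the failure reported in this thread.
        System.out.println(copy.decode("abc"));
    }
}
```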

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Maximilian Michels
You need to initialize the decoder in the deserialize method instead of in the constructor. On Thu, Nov 19, 2015 at 1:50 PM, Madhukar Thota wrote: > Hi Max/Ewen, > > Thank you for the inputs. I was able to solve the serialization issues. Now > i am seeing the NullPoint Exceptions. > > public clas

Re: Compiler Exception

2015-11-19 Thread Truong Duc Kien
Hi Till, I have narrowed down a minimal test case; you will need the flink-gelly-scala package to run this. import org.apache.flink.api.common.functions.MapFunction import org.apache.flink.api.scala._ import org.apache.flink.graph._ import org.apache.flink.graph.scala.Graph import org.apache.flink

Re: flink yarn-session failure

2015-11-19 Thread Robert Metzger
Hi Stefanos, the pasted yarn-site.xml file looks fine on the first sight. You don't need a yarn-site.xml file for Namenodes or DataNodes, these belong to HDFS. In YARN these components are called ResourceManager and NodeManager. You can usually create one yarn-site.xml file and copy it to all mac

Re: Flink execution time benchmark.

2015-11-19 Thread Robert Metzger
Hi Saleh, The new web interface in Flink 0.10 has also a REST API that you can use for querying job information. GET http://localhost:8081/jobs/83547b683ad5b388355a49911168fbc7 will give you the following JSON object: { "jid":"83547b683ad5b388355a49911168fbc7", "name":"com.dataartisans.
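To show the shape of the response Robert quotes, here is a deliberately naive sketch that pulls the `jid` and `name` fields out of a sample JSON string. The response body is hard-coded (the job name "MyJob" is a placeholder, not from the original mail), and the string slicing is only for illustration; real code should issue the HTTP GET and parse the body with a proper JSON library.

```java
public class Main {
    // Naive extraction of a string-valued field from flat JSON; for
    // illustration only, not a general JSON parser.
    static String field(String json, String key) {
        int start = json.indexOf("\"" + key + "\":\"") + key.length() + 4;
        return json.substring(start, json.indexOf('"', start));
    }

    public static void main(String[] args) {
        // Sample of the object returned by GET /jobs/<jobid> on the
        // monitoring REST API; "MyJob" is a placeholder job name.
        String response =
                "{\"jid\":\"83547b683ad5b388355a49911168fbc7\",\"name\":\"MyJob\"}";
        System.out.println(field(response, "jid"));
        System.out.println(field(response, "name"));
    }
}
```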

Re: Error handling

2015-11-19 Thread Robert Metzger
Hi Nick, regarding the Kafka example: What happens is that the FlinkKafkaConsumer will throw an exception. The JobManager then cancels the entire job and restarts it. It will then try to continue reading from the last valid checkpoint or the consumer offset in zookeeper. Since the data in the topi

Re: Fold vs Reduce in DataStream API

2015-11-19 Thread Ron Crocker
Hi Fabian - Thanks, that is a helpful description. That document WAS my source of information and it seems to also be the source of my confusion. Further, it appears to be wrong - there is a FoldFunction ( https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/fli

Re: Fold vs Reduce in DataStream API

2015-11-19 Thread Stephan Ewen
Hi Ron! You are right, there is a copy/paste error in the docs, it should be a FoldFunction that is passed to fold(), not a ReduceFunction. In Flink-0.10, the FoldFunction is only available on - KeyedStream ( https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flin
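The semantic difference behind this thread can be illustrated outside of Flink. The sketch below is plain Java, not the Flink API: fold() starts from an explicit initial value and may produce a different type than its input elements, while reduce() combines elements of one type into that same type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class Main {
    // fold: the accumulator type R may differ from the element type T,
    // and an initial value is required.
    static <T, R> R fold(List<T> in, R initial, BiFunction<R, T, R> folder) {
        R acc = initial;
        for (T t : in) {
            acc = folder.apply(acc, t);
        }
        return acc;
    }

    // reduce: input and output share one type T, seeded from the first element.
    static <T> T reduce(List<T> in, BinaryOperator<T> reducer) {
        T acc = in.get(0);
        for (int i = 1; i < in.size(); i++) {
            acc = reducer.apply(acc, in.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3);
        // fold: Integer elements folded into a String accumulator
        System.out.println(fold(nums, "", (acc, n) -> acc + n)); // "123"
        // reduce: Integer elements reduced to an Integer
        System.out.println(reduce(nums, Integer::sum));          // 6
    }
}
```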

Re: Fold vs Reduce in DataStream API

2015-11-19 Thread Ron Crocker
Thanks Stephan, that helps quite a bit. Looks like another one of those API changes that I'll be struggling with for a little bit. On Thu, Nov 19, 2015 at 10:40 AM, Stephan Ewen wrote: > Hi Ron! > > You are right, there is a copy/paste error in the docs, it should be a > FoldFunction that is pas

Re: Fold vs Reduce in DataStream API

2015-11-19 Thread Stephan Ewen
Hi Ron! Yes, we had to change a few things in the API between 0.9 and 0.10. The API in 0.9 had quite a few problems. This one now looks good, we are confident that it will stay. Greetings, Stephan On Thu, Nov 19, 2015 at 8:15 PM, Ron Crocker wrote: > Thanks Stephan, that helps quite a bit. Lo