Re: Submit Flink Jobs to YARN running on AWS

2016-04-29 Thread Bajaj, Abhinav
Hi Robert, Thanks for your reply. I am using the Public DNS for the EC2 machines in the yarn and hdfs configuration files. It looks like "ec2-203-0-113-25.compute-1.amazonaws.com". You should be able to connect then. I have hadoop installed locally and the YARN_CONF_DIR is pointing to it. The ya

Re: Creating a custom operator

2016-04-29 Thread Fabian Hueske
Hi Simone, the GraphCreatingVisitor transforms the common operator plan into a representation that is translated by the optimizer. You have to implement an OptimizerNode and OperatorDescriptor to describe the operator. Depending on the semantics of the operator, there are a few more places to make

Flink on Azure HDInsight

2016-04-29 Thread Brig Lamoreaux
Hi All, Are there any issues with Flink on Azure HDInsight? Thanks, Brig Lamoreaux Data Solution Architect US Desert/Mountain Tempe

Re: join performance

2016-04-29 Thread Henry Cai
So is the window defined as hour-window or second-window? If I am using hour-window, I guess I need to modify the trigger to fire early (e.g. every minute)? But I don't want to repeatedly emit the same joined records for every minute (i.e. on 2nd minute, I only want to emit the changes introduced

Creating a custom operator

2016-04-29 Thread Simone Robutti
Hello, I'm trying to create a custom operator to explore the internals of Flink. Actually the one I'm working on is rather similar to Union and I'm trying to mimic it for now. When I run my job though, this error arises: Exception in thread "main" java.lang.IllegalArgumentException: Unknown opera

Re: Checking actual config values used by TaskManager

2016-04-29 Thread Ken Krugler
Hi Timur, > On Apr 28, 2016, at 10:40pm, Timur Fayruzov wrote: > > If you're talking about parameters that were set on JVM startup then `ps > aux|grep flink` on an EMR slave node should do the trick, that'll give you > the full command line. No, I’m talking about values that come from flink-c

Re: Anyone going to ApacheCon Big Data in Vancouver?

2016-04-29 Thread Trevor Grant
Hey Ken, I'll be there doing a talk on Monday afternoon on using Zeppelin for a Data Science environment with a couple of Flink and Spark Examples. I'm doing a tutorial Wednesday morning (I think for ApacheCon) that is about setting up Zeppelin with Flink and Spark in cluster mode. Would love to

Re: Requesting the next InputSplit failed

2016-04-29 Thread Stefano Bortoli
We could successfully run the job without issues. Thanks a lot everyone for the support. FYI: with Flink we completed in 3h28m the job that was planned to run for 15 days 24/7 relying on our legacy customer approach. :-) saluti, Stefano 2016-04-28 14:50 GMT+02:00 Fabian Hueske : > Yes, assignin

Re: Count on grouped keys

2016-04-29 Thread Punit Naik
Yeah no problem. It's not an optimised solution but I think it gives enough understanding of how reduceGroup works. On 29-Apr-2016 5:17 PM, "Stefano Baghino" wrote: > Thanks for sharing the solution, Punit. > > On Fri, Apr 29, 2016 at 1:40 PM, Punit Naik > wrote: > >> Anyways, I managed to do it.

Re: Count on grouped keys

2016-04-29 Thread Stefano Baghino
Thanks for sharing the solution, Punit. On Fri, Apr 29, 2016 at 1:40 PM, Punit Naik wrote: > Anyways, I managed to do it. you should attach the following code block to > your groupBy > .reduceGroup { > (in, out: org.apache.flink.util.Collector[(Map[String,String], > Int)]) => // Map[String

Re: Count on grouped keys

2016-04-29 Thread Punit Naik
Anyways, I managed to do it. You should attach the following code block to your groupBy .reduceGroup { (in, out: org.apache.flink.util.Collector[(Map[String,String], Int)]) => // Map[String,String] is the data type I want to output along with the count as Int in a Tuple var v:Int = 0;

Re: join performance

2016-04-29 Thread Aljoscha Krettek
Hi, you are right, everything will be emitted in a huge burst at the end of the hour. If you want to experiment a bit you can write a custom Trigger based on EventTimeTrigger that will delay firing of windows. You would change onEventTime() to not fire but instead register a processing-time timer a

Re: Discarding header from CSV file

2016-04-29 Thread nsengupta
Hello Chiwan, Sorry for the late reply. I have been into other things for some time. Yes, you are right. I have been assuming that field to be Integer, wrongly. I will fix it and give it a go again. Many thanks again. -- Nirmalya

Discussion about a Flink DataSource repository

2016-04-29 Thread Flavio Pompermaier
Hi to all, as discussed briefly with Fabian, for our products in Okkam we need a central repository of DataSources processed by Flink. With respect to existing external catalogs, such as Hive or Confluent's SchemaRegistry, whose objective is to provide necessary metadata to read/write the register

Count on grouped keys

2016-04-29 Thread Punit Naik
I have a dataset which has maps. I have performed a groupBy on a key and I want to count all the elements in a particular group. How do I do this? -- Thank You Regards Punit Naik

Fwd: TypeVariable problems

2016-04-29 Thread Martin Neumann
Hej, I have a construct of different generic classes stacked on each other to create a library (so the type variables get handed on). And I have some trouble getting it to work. The current offender is a class with 3 type variables; internally it calls: .fold(new Tuple3<>(keyInit ,new Tuple2(0d,0