"Failed to retrieve JobManager address" in Flink 1.1.1 with JM HA

2016-08-22 Thread Hironori Ogibayashi
Hello, After I upgraded to 1.1.1, I am getting an error when submitting a job with "flink run". The command and result are like this. It had been working with Flink 1.0.3. --- % FLINK_CONF_DIR=~/opt/flink/conf ~/opt/flink/flink-1.1.1/bin/flink run -c MyJob target/my-flink-job.jar
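(For reference, a minimal sketch of the ZooKeeper HA entries the client reads from FLINK_CONF_DIR — key names as documented for Flink 1.1; the quorum hosts and storage path below are placeholders:)

    # flink-conf.yaml (Flink 1.1 key names; adjust hosts and paths)
    recovery.mode: zookeeper
    recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
    recovery.zookeeper.path.root: /flink
    recovery.zookeeper.storageDir: hdfs:///flink/recovery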

Re: Error joining with Python API

2016-08-22 Thread davis k
Awesome, thanks Chesnay! On Wed, Aug 17, 2016 at 2:58 AM, Chesnay Schepler wrote: > Found the issue; there was a missing tab in the chaining method... > > > On 16.08.2016 12:12, Chesnay Schepler wrote: > >> looks like a bug, will look into it. :) >> >> On 16.08.2016 10:29, Ufuk Celebi wrote: >>

Re: how to get rid of duplicate rows group by in DataStream

2016-08-22 Thread Kostas Kloudas
Hi Subash, You should also split your elements into windows. If not, Flink emits an updated element for each incoming record. That is why you have: (1,1) (1,2) (1,3) … Kostas > On Aug 22, 2016, at 5:58 PM, subash basnet wrote: > > Hello all, > > I grouped the input by its id to count the n
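(A minimal sketch of the windowed variant Kostas describes, in the Java DataStream API; the stream name, key position, and the 10-second window size are assumptions:)

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // one aggregated count per key and window, instead of an update per record
    DataStream<Tuple2<String, Integer>> counts = gridStream  // hypothetical input
        .keyBy(0)                       // group by the id field
        .timeWindow(Time.seconds(10))   // bound the aggregation in time
        .sum(1);                        // emit a single count per window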

how to get rid of duplicate rows group by in DataStream

2016-08-22 Thread subash basnet
Hello all, I grouped the input by its id to count the number of elements in each group. DataStream<…> gridWithCount; Upon printing the above datastream it shows duplicate rows: Output: (1,1) (1,2) (2,1) (1,3) (2,2)... Whereas I wanted the distinct rows with the final count: Needed O
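(For contrast, a sketch of the unwindowed pattern that produces such running counts — every incoming record triggers an updated partial aggregate; names are assumptions:)

    // rolling aggregation on an unbounded stream: each record for key 1
    // emits a new partial count, hence (1,1) (1,2) (1,3) ...
    DataStream<Tuple2<String, Integer>> gridWithCount = gridStream
        .keyBy(0)
        .sum(1);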

Re: Complex batch workflow needs (too) much time to create executionPlan

2016-08-22 Thread Fabian Hueske
Hi Markus, you might be right that a lot of time is spent in optimization. The optimizer enumerates all alternatives and chooses the plan with the least estimated cost. The degrees of freedom of the optimizer are rather restricted (execution strategies and the used partitioning & sorting keys). Th
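(To inspect what the optimizer chose — and to time plan generation separately from execution — the DataSet API can dump the plan as JSON; a minimal sketch:)

    import org.apache.flink.api.java.ExecutionEnvironment;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // ... build the batch workflow on env ...
    // triggers the optimizer and prints the chosen plan as JSON,
    // which can be pasted into Flink's plan visualizer
    System.out.println(env.getExecutionPlan());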

Complex batch workflow needs (too) much time to create executionPlan

2016-08-22 Thread Markus Nentwig
Hello Flink community, I created a slightly long batch workflow for my use case of clustering vertices using Flink and Gelly. Executing each of the workflow parts individually (and writing intermediate results to disk) works as expected. When combining workflow parts into longer jobs, I noticed that

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-22 Thread Miroslav Gajdoš
Here is the log from the yarn application - run on another cluster (this time cdh5.7.0, but with a similar configuration). Check the hostnames; in the configuration, aliases are used, and the difference from the FQDN may be the cause, judging by the log (exception at line 87)... http://pastebin.com/iimPVbX

Re: Contain topic name in the stream

2016-08-22 Thread Robert Metzger
Hi Sendoh, the KeyedDeserializationSchema allows you to access the topic name, partition id, and offset of each message. So you need to implement a custom deserialization schema based on the KeyedDeserializationSchema to get this information. Regards, Robert On Mon, Aug 22, 2016 at 10:58 AM, Se
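(A minimal sketch of such a schema that tags each message with its topic, assuming String payloads and (topic, payload) output pairs:)

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

    // emits (topic, payload) pairs so downstream operators can keyBy the topic
    public class TopicTaggingSchema
            implements KeyedDeserializationSchema<Tuple2<String, String>> {

        @Override
        public Tuple2<String, String> deserialize(byte[] messageKey, byte[] message,
                String topic, int partition, long offset) throws IOException {
            return new Tuple2<>(topic, new String(message, StandardCharsets.UTF_8));
        }

        @Override
        public boolean isEndOfStream(Tuple2<String, String> nextElement) {
            return false; // consume indefinitely
        }

        @Override
        public TypeInformation<Tuple2<String, String>> getProducedType() {
            return TypeInformation.of(new TypeHint<Tuple2<String, String>>() {});
        }
    }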

Re: How to share text file across tasks at run time in flink.

2016-08-22 Thread Kostas Kloudas
Hello Baswaraj, Are you using the DataSet (batch) or the DataStream API? If you are using the former, you can use a broadcast variable for your task. If you are using the DataStream one, then there is
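(A minimal sketch of the DataSet broadcast-variable approach, with hypothetical names for the environment, input, and file path:)

    import java.util.List;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.configuration.Configuration;

    // the shared file, read once and shipped to every task
    DataSet<String> sharedLines = env.readTextFile("hdfs:///path/to/shared.txt");

    DataSet<String> result = input
        .map(new RichMapFunction<String, String>() {
            private List<String> shared;

            @Override
            public void open(Configuration parameters) {
                // fetched from the broadcast set registered below
                shared = getRuntimeContext().getBroadcastVariable("sharedFile");
            }

            @Override
            public String map(String value) {
                return value + " (" + shared.size() + " shared lines)";
            }
        })
        .withBroadcastSet(sharedLines, "sharedFile");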

Contain topic name in the stream

2016-08-22 Thread Sendoh
Hi Flink users, Would it be possible for the current Kafka connector to contain the topic name in the stream (each event)? For example, when reading a list of topics, we often want to count the events in each topic using keyBy(). I remember having already seen the implementation mentioned somewhere,
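(Until such support lands, a sketch of per-topic counting on a stream of (topic, payload) pairs produced by a topic-aware schema like the one above; the one-minute window is an assumption:)

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // key by the topic field (f0) and count events per topic and minute
    DataStream<Tuple2<String, Integer>> perTopicCounts = taggedStream
        .map(new MapFunction<Tuple2<String, String>, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(Tuple2<String, String> in) {
                return new Tuple2<>(in.f0, 1); // one unit per event
            }
        })
        .keyBy(0)
        .timeWindow(Time.minutes(1))
        .sum(1);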

Re: Checking for existance of output directory/files before running a batch job

2016-08-22 Thread Niels Basjes
Yes, that did the trick. Thanks. I was using a relative path without any FS specification. So my path was "foo", and on the cluster this resolved to "hdfs:///user/nbasjes/foo". Locally this resolved to "file:///home/nbasjes/foo", and hence the mismatch I was looking at. For now I can work with this f
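(A minimal sketch of such a pre-flight check with Flink's FileSystem API, using a fully qualified path so the scheme mismatch above cannot occur; the path is hypothetical and both calls throw IOException:)

    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    Path out = new Path("hdfs:///user/nbasjes/foo");
    FileSystem fs = out.getFileSystem();  // resolves the FS matching the scheme
    if (fs.exists(out)) {
        // bail out, delete, or pick another directory before submitting the job
    }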