Arrays values in keyBy

2016-06-10 Thread Elias Levy
It would be useful if the documentation warned what type of equality it expected of values used as keys in keyBy. I just got bit in the ass by converting a field from a string to a byte array. All of a sudden the windows were no longer aggregating. So it seems Flink is not doing a deep compare
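
A minimal sketch of the Java behavior behind this report, assuming the grouping relies on the key's own equals() and hashCode(). It uses a plain HashMap rather than Flink itself; the class and method names are hypothetical. Two byte arrays with identical contents end up in separate groups, while the same bytes wrapped in a ByteBuffer (which compares by content) share one group.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; it does not use any Flink API.
class ArrayKeyDemo {

    // Counts elements per key, relying on the key's own equals()/hashCode().
    static <K> Map<K, Integer> countByKey(Iterable<K> keys) {
        Map<K, Integer> counts = new HashMap<>();
        for (K key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Raw arrays inherit identity-based equals()/hashCode() from Object,
    // so two arrays with the same contents are treated as different keys.
    static int groupsWithRawArrays() {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        return countByKey(Arrays.asList(a, b)).size();
    }

    // ByteBuffer compares its remaining bytes by content,
    // so equal contents map to the same key.
    static int groupsWithWrappedArrays() {
        ByteBuffer a = ByteBuffer.wrap(new byte[] {1, 2, 3});
        ByteBuffer b = ByteBuffer.wrap(new byte[] {1, 2, 3});
        return countByKey(Arrays.asList(a, b)).size();
    }

    public static void main(String[] args) {
        System.out.println("groups with byte[] keys: " + groupsWithRawArrays());
        System.out.println("groups with ByteBuffer keys: " + groupsWithWrappedArrays());
    }
}
```

Under that assumption, converting the key to a type with value-based equality (a String, or a wrapper around the bytes) is one way to get the aggregation back.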

Re: Application log on Yarn FlinkCluster

2016-06-10 Thread Robert Metzger
Hi Theofilos, how exactly are you writing the application output? Are you using a logging framework? Are you writing the log statements from the open(), map(), invoke() methods or from some constructors? (I'm asking since different parts are executed on the cluster and locally). On Fri, Jun 10, 2

Re: Join two streams using a count-based window

2016-06-10 Thread Nikos R. Katsipoulakis
Thank you very much Matthias! Also, the link you provided is very helpful. Cheers, Nikos On Fri, Jun 10, 2016 at 3:16 AM, Matthias J. Sax wrote: > I just put an answer to SO. > > About the other questions: Flink processes tuple-by-tuple and does some > internal buffering. You might be intereste

Re: Leader not found

2016-06-10 Thread Robert Metzger
Hi, sorry for dropping this one. I totally forgot this thread. Did you find a solution? One reason for this to happen could be that you specified only one broker in the list of bootstrap servers? On Wed, Apr 20, 2016 at 7:08 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > Robert

Re: Kafka exception "Unable to find a leader for partitions"

2016-06-10 Thread Robert Metzger
Hi Shannon, Some questions: which Flink version are you using? Can you provide me with some more logs, in particular the log entries before this event from the Kafka connector? Also, is it possible that the Kafka broker was in an erroneous state? Did the error happen after weeks of data consump

Re: Result comparison from 2 DataStream Sources

2016-06-10 Thread Konstantin Knauf
Hi again, and again sorry for the late response. Regarding your first question: You can use a Key Selector Function [1]. Regarding your second question: If I understand your requirement correctly, this is already happening in my gist. By taking the union of both streams the local and away max a

Application log on Yarn FlinkCluster

2016-06-10 Thread Theofilos Kakantousis
Hi all, Flink 1.0.3 Hadoop 2.4.0 When running a job on a Flink Cluster on Yarn, the application output is not included in the Yarn log. Instead, it is only printed in the stdout from where I run my program. For the jobmanager, I'm using the log4j.properties file from the flink/conf directory

Re: Reading whole files (from S3)

2016-06-10 Thread Robert Metzger
Hi, setting the unsplittable attribute in the constructor is fine. The field's value will be sent to the cluster. So what happens is that you initialize the input format in your client program. Then, it's serialized, sent over the network to the machines and deserialized again. So the value you've s
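
The round trip described here can be sketched with plain Java serialization. This is only an illustration: WholeFileFormat is a hypothetical stand-in for an input format, not a Flink class, and the in-memory byte array stands in for the network transfer. A field set in the constructor survives the trip, while a transient field does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; it does not use any Flink API.
class SerializationRoundTrip {

    // Hypothetical stand-in for an input format configured on the client.
    static class WholeFileFormat implements Serializable {
        private static final long serialVersionUID = 1L;

        final boolean unsplittable;      // set in the constructor, shipped with the object
        transient String clientOnlyNote; // transient: skipped by serialization

        WholeFileFormat(boolean unsplittable) {
            this.unsplittable = unsplittable;
            this.clientOnlyNote = "only visible on the client";
        }
    }

    // Serializes the value to bytes and reads it back, as a client-to-worker
    // transfer would; checked exceptions are wrapped to keep callers simple.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        WholeFileFormat copy = roundTrip(new WholeFileFormat(true));
        System.out.println("unsplittable after round trip: " + copy.unsplittable);
        System.out.println("transient field after round trip: " + copy.clientOnlyNote);
    }
}
```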

Re: Join two streams using a count-based window

2016-06-10 Thread Matthias J. Sax
I just put an answer to SO. About the other questions: Flink processes tuple-by-tuple and does some internal buffering. You might be interested in https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks -Matthias On 06/09/2016 08:13 PM, Nikos R. Katsipoulakis wrote: > Hello

Re: Reading whole files (from S3)

2016-06-10 Thread Andrea Cisternino
Hi, I am replying to myself for the record and to provide an update on what I am trying to do. I have looked into Mahout's XmlInputFormat class but unfortunately it doesn't solve my problem. My exploratory work with Flink tries to reproduce the key steps that we already perform in a quite large

Re: Does Flink allows for encapsulation of transformations?

2016-06-10 Thread Chesnay Schepler
Q1: Whether one of your classes requires the *env* parameter depends on whether you want to create a new Source or set an ExecutionEnvironment parameter inside the class. If you don't, you can of course not pass it :) I can't see anything that would prevent it from running on a cluster. Q2: Us