Re: Finding the average temperature

2016-02-15 Thread Nirmalya Sengupta
Hello Stefano, sorry for the late reply. Many thanks for taking the effort to write and share an example code snippet. I have been playing with the countWindow behaviour for some weeks now and I am generally aware of the functionality of countWindowAll(). For my use case, where I _have to observe_ the
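For reference, a minimal sketch of the kind of count-based averaging discussed in this thread, using countWindowAll() on a non-keyed stream (a sketch only: names, the window size, and the sample values are illustrative, assuming the Flink 0.10/1.0 DataStream API is on the classpath):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<Double> temperatures = env.fromElements(21.5, 22.0, 20.8, 23.1);

// Average every 2 readings across all partitions (non-keyed, count-based window).
temperatures
    .countWindowAll(2)
    .apply(new AllWindowFunction<Double, Double, GlobalWindow>() {
        @Override
        public void apply(GlobalWindow window, Iterable<Double> values, Collector<Double> out) {
            double sum = 0;
            int count = 0;
            for (Double v : values) { sum += v; count++; }
            out.collect(sum / count);
        }
    })
    .print();

env.execute("average-temperature");
```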

Re: Read once input data?

2016-02-15 Thread Saliya Ekanayake
Thanks, I'll check this. Saliya On Mon, Feb 15, 2016 at 4:08 PM, Fabian Hueske wrote: > I would have a look at the example programs in our code base: > > > https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java > > Best, Fabi

Re: Read once input data?

2016-02-15 Thread Fabian Hueske
I would have a look at the example programs in our code base: https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java Best, Fabian 2016-02-15 22:03 GMT+01:00 Saliya Ekanayake : > Thank you, Fabian. > > Any chance you might hav

Re: Read once input data?

2016-02-15 Thread Saliya Ekanayake
Thank you, Fabian. Any chance you might have an example on how to define a data flow with Flink? On Mon, Feb 15, 2016 at 3:58 PM, Fabian Hueske wrote: > It is not possible to "pin" data sets in memory, yet. > However, you can stream the same data set through two different mappers at > the sam

Re: Read once input data?

2016-02-15 Thread Fabian Hueske
It is not possible to "pin" data sets in memory, yet. However, you can stream the same data set through two different mappers at the same time. For instance you can have a job like:

           /---> Map 1 --> Sink1
Source --<
           \---> Map 2 --> Sink2

and execute it at once. F
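Spelled out in the DataSet API, the forked plan Fabian describes might look like this (a sketch; the input path and map logic are illustrative):

```java
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Illustrative path: both mappers consume the same source,
// so the input is read only once for the whole job.
DataSet<String> source = env.readTextFile("hdfs:///path/to/input");

source.map(s -> s.toUpperCase()).writeAsText("hdfs:///path/to/out1");
source.map(s -> s.toLowerCase()).writeAsText("hdfs:///path/to/out2");

// A single execute() runs the entire forked plan as one job.
env.execute("forked-plan");
```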

Re: Read once input data?

2016-02-15 Thread Saliya Ekanayake
Fabian, count() was just an example. What I would like to do is say run two map operations on the dataset (ds). Each map will have its own reduction, so is there a way to avoid creating two jobs for such a scenario? The reason is, reading these binary matrices is expensive. In our current MPI imp

Re: Read once input data?

2016-02-15 Thread Fabian Hueske
Hi, it looks like you are executing two distinct Flink jobs. DataSet.count() triggers the execution of a new job. If you have an execute() call in your program, this will lead to two Flink jobs being executed. It is not possible to share state among these jobs. Maybe you should add a custom count
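A custom counting accumulator, as Fabian suggests, counts records inside the single job rather than calling count(), which triggers a separate job execution. A sketch, assuming the classic Accumulator API (the accumulator name is hypothetical):

```java
public class CountingMapper extends RichMapFunction<String, String> {
    private final IntCounter counter = new IntCounter();

    @Override
    public void open(Configuration parameters) {
        // Register under an illustrative name; the value is read
        // from the JobExecutionResult after the job finishes.
        getRuntimeContext().addAccumulator("num-records", counter);
    }

    @Override
    public String map(String value) {
        counter.add(1);
        return value;
    }
}

// After the single job run:
// JobExecutionResult result = env.execute();
// int numRecords = result.getAccumulatorResult("num-records");
```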

Read once input data?

2016-02-15 Thread Saliya Ekanayake
Hi, I see that an InputFormat's open() and nextRecord() methods get called for each terminal operation on a given dataset using that particular InputFormat. Is it possible to avoid this - possibly using some caching technique in Flink? For example, I've some code like below and I see for both the

Flink 1.0.0 Release Candidate 0: Please help testing

2016-02-15 Thread Robert Metzger
Hi, I've now created a "preview RC" for the upcoming 1.0.0 release. There are still some blocking issues and important pull requests to be merged but nevertheless I would like to start testing Flink for the release. In past major releases, we needed to create many release candidates, often for fi

Re: 1.0-SNAPSHOT downloads

2016-02-15 Thread Maximilian Michels
Hi Zach, Here you go: http://flink.apache.org/contribute-code.html#snapshots-nightly-builds Cheers, Max On Mon, Feb 15, 2016 at 6:29 PM, Zach Cox wrote: > Hi - are there binary downloads of the Flink 1.0-SNAPSHOT tarballs, like > there are for 0.10.2 [1]? I'm testing out an application built a

1.0-SNAPSHOT downloads

2016-02-15 Thread Zach Cox
Hi - are there binary downloads of the Flink 1.0-SNAPSHOT tarballs, like there are for 0.10.2 [1]? I'm testing out an application built against the 1.0-SNAPSHOT dependencies from Maven central, and want to make sure I run them on a Flink 1.0-SNAPSHOT cluster that matches up with those jars. Thanks
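For Maven users, SNAPSHOT artifacts resolve from the Apache snapshot repository; a pom.xml fragment along these lines makes them available (a sketch; check the repository details against the Flink docs for the release in question):

```xml
<repositories>
  <repository>
    <id>apache.snapshots</id>
    <name>Apache Snapshot Repository</name>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <releases><enabled>false</enabled></releases>
  </repository>
</repositories>
```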

RE: events eviction

2016-02-15 Thread Radu Tudoran
Hi, Thanks Aljoscha for the details! The warning about performance and evictors is useful, but I am not sure it can always be put into practice. Take for example a GlobalWindow that you would use to aggregate data from multiple partitions. A GlobalWindow does not come with a trigger - would

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
You can find the log of the recovering job manager here: https://gist.github.com/stefanobaghino/ae28f00efb6bdd907b42 Basically, what Ufuk said happened: the job manager tried to fill in for the lost one but couldn't find the actual data because it looked it up locally whereas due to my configurati

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Maximilian Michels
Hi Stefano, A correction from my side: You don't need to set the execution retries for HA because a new JobManager will automatically take over and resubmit all jobs which were recovered from the storage directory you set up. The number of execution retries applies only to jobs which are restarted

Re: Regarding Concurrent Modification Exception

2016-02-15 Thread Till Rohrmann
But isn't that a normal stack trace which you see when you submit a job to the cluster via the CLI and somewhere in the compilation process something fails? Anyway, it would be helpful to see the program which causes this problem. Cheers, Till On Mon, Feb 15, 2016 at 12:25 PM, Fabian Hueske wro

Re: Flink packaging makes life hard for SBT fat jar's

2016-02-15 Thread shikhar
Stephan Ewen wrote > Do you know why you are getting conflicts on the FashHashMap class, even > though the core Flink dependencies are "provided"? Does adding the Kafka > connector pull in all the core Flink dependencies? Yes, the core Flink dependencies are being pulled in transitively from the K
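The setup under discussion, with core Flink marked provided and the Kafka connector bundled into the fat jar, looks roughly like this in SBT (a sketch; the version numbers and connector artifact name are illustrative and should be checked against the release actually used):

```scala
libraryDependencies ++= Seq(
  // Core Flink is on the cluster's classpath, so keep it out of the fat jar.
  "org.apache.flink" %% "flink-streaming-scala" % "1.0.0" % "provided",
  // The Kafka connector is not part of the distribution and must be bundled,
  // but note that it pulls core Flink in transitively, causing the conflicts
  // described in this thread unless those transitive dependencies are excluded.
  "org.apache.flink" %% "flink-connector-kafka" % "1.0.0"
)
```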

Re: Flink packaging makes life hard for SBT fat jar's

2016-02-15 Thread Stephan Ewen
Hi! Looks like that experience should be improved. Do you know why you are getting conflicts on the FashHashMap class, even though the core Flink dependencies are "provided"? Does adding the Kafka connector pull in all the core Flink dependencies? Concerning the Kafka connector: We did not inclu

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Ufuk Celebi
> On 15 Feb 2016, at 13:40, Stefano Baghino > wrote: > > Hi Ufuk, thanks for replying. > > Regarding the masters file: yes, I've specified all the masters and checked > out that they were actually running after the start-cluster.sh. I'll gladly > share the logs as soon as I get to see them.

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Maximilian Michels
Hi Stefano, That is true. The documentation doesn't mention that. Just wanted to point you to the documentation if anything else needs to be configured. We will update it. Instead of setting the number of execution retries on the StreamExecutionEnvironment, you may also set "execution-retries.def
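The configuration key Max mentions is truncated in the preview above; presumably it is the global retry default in flink-conf.yaml. A fragment, assuming the 0.10/1.0-era key name:

```yaml
# Assumed key name: number of times a failed job is restarted (0 disables restarts).
execution-retries.default: 3
```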

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
Hi Maximilian, thank you for the reply. I've checked out the documentation before running my tests (I'm not expert enough to not read the docs ;)) but it doesn't mention any specific requirement regarding the execution retries. I'll check it out, thanks! On Mon, Feb 15, 2016 at 12:51 PM, Maximili

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
Hi Ufuk, thanks for replying. Regarding the masters file: yes, I've specified all the masters and checked out that they were actually running after the start-cluster.sh. I'll gladly share the logs as soon as I get to see them. Regarding the state backend: how does having a non-distributed storage

Re: IOException when trying flink-twitter example

2016-02-15 Thread ram kumar
org.apache.flink.streaming.connectors.twitter.TwitterFilterSource - Initializing Twitter Streaming API connection 12:27:32,134 INFO com.twitter.hbc.httpclient.BasicClient - New connection executed: twitterSourceClient, endpoint: /1.1/statuses/filter.json?delimited=length 12:

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Maximilian Michels
Hi Stefano, The Job should stop temporarily but then be resumed by the new JobManager. Have you increased the number of execution retries? AFAIK, it is set to 0 by default. This will not re-run the job, even in HA mode. You can enable it on the StreamExecutionEnvironment. Otherwise, you have prob
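Enabling retries on the environment, as described, is a one-liner (API as of Flink 0.10/1.0):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Allow the job to be restarted after a failure; the default of 0 means no restart,
// even in HA mode.
env.setNumberOfExecutionRetries(3);
```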

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Ufuk Celebi
Using the local file system as state backend only works if all job managers run on the same machine. Is that the case? Have you specified all job managers in the masters file? With the local file system state backend only something like host-X host-X host-X will be a valid masters configuration.
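The masters file Ufuk describes is one hostname per line; with a local-filesystem state backend, all entries must name the same machine:

```
host-X
host-X
host-X
```

With a distributed state backend (e.g. HDFS), the entries can be distinct hosts.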

Re: Problem with KeyedStream 1.0-SNAPSHOT

2016-02-15 Thread Fabian Hueske
Hi Javier, Keys is an internal class and was recently moved to a different package. So it appears like your Flink dependencies are not aligned to the same version. We also added Scala version identifiers to all our dependencies which depend on Scala 2.10. For instance, flink-scala became flink-sc
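With the Scala version suffixes Fabian mentions, a Maven dependency block would look like this (a sketch; the module and version shown are illustrative):

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <!-- Note the _2.10 Scala version suffix introduced for the 1.0 line -->
  <artifactId>flink-scala_2.10</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>
```

Mixing suffixed and unsuffixed artifacts (or different Flink versions) is the kind of misalignment that produces the "cannot access Keys" error in this thread.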

Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Stefano Baghino
Hello everyone, last week I ran some tests with Apache ZooKeeper to get a grip on Flink's HA features. My tests have gone badly so far and I can't sort out the reason. My latest tests involved Flink 0.10.2, run as a standalone cluster with 3 masters and 4 slaves. The 3 masters are also the ZooKeeper (3

Re: schedule tasks `inside` Flink

2016-02-15 Thread Fabian Hueske
Hi Michal, If I got your requirements right, you could try to solve this issue by serving the updates through a regular DataStream. You could add a SourceFunction which periodically emits a new version of the cache and a CoFlatMap operator which receives on the first input the regular streamed inp
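The pattern Fabian outlines, a periodic cache source connected to the main stream via a CoFlatMap, could be sketched like this (all type and function names are hypothetical; assumes the DataStream API):

```java
DataStream<Event> events = ...;  // the regular streamed input
// Hypothetical SourceFunction that periodically emits a fresh cache version.
DataStream<Cache> cacheUpdates = env.addSource(new CacheRefreshSource());

events.connect(cacheUpdates)
    .flatMap(new CoFlatMapFunction<Event, Cache, Result>() {
        private Cache cache; // latest cache version, held per parallel instance

        @Override
        public void flatMap1(Event e, Collector<Result> out) {
            if (cache != null) {
                out.collect(enrich(e, cache)); // hypothetical enrichment logic
            }
        }

        @Override
        public void flatMap2(Cache c, Collector<Result> out) {
            cache = c; // swap in the newly emitted cache version
        }
    });
```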

Re: Regarding Concurrent Modification Exception

2016-02-15 Thread Fabian Hueske
Hi, This stacktrace looks really suspicious. It includes classes from the submission client (CLIClient), optimizer (JobGraphGenerator), and runtime (KryoSerializer). Is it possible that you try to start a new Flink job inside another job? This would not work. Best, Fabian

Re: writeAsCSV with partitionBy

2016-02-15 Thread Fabian Hueske
Hi Srikanth, DataSet.partitionBy() will partition the data on the declared partition fields. If you append a DataSink with the same parallelism as the partition operator, the data will be written out with the defined partitioning. It should be possible to achieve the behavior you described using D
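A sketch of the approach Fabian describes, partitioning before a sink of matching parallelism (field positions and paths are illustrative; partitionByHash stands in for the partitioning call, which may differ by version):

```java
DataSet<Tuple2<String, Integer>> data = ...; // illustrative input

// Hash-partition on field 0; with the sink at the same parallelism as the
// partition operator, each output file holds one partition's records.
data.partitionByHash(0)
    .writeAsCsv("hdfs:///path/to/output");
```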

Re: events eviction

2016-02-15 Thread Aljoscha Krettek
Hi, you are right, the logic is in EvictingNonKeyedWindowOperator.emitWindow() for non-parallel (non-keyed) windows and in EvictingWindow.processTriggerResult() in the case of keyed windows. You are also right about the contract of the Evictor, it returns the number of elements to be evicted fr
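The Evictor contract described here, returning the number of elements to drop from the front of the window buffer, could be sketched as follows (interface shape assumed from the Flink 0.10-era API; details hedged):

```java
public class KeepLastNEvictor<T, W extends Window> implements Evictor<T, W> {
    private final int n;

    public KeepLastNEvictor(int n) { this.n = n; }

    @Override
    public int evict(Iterable<StreamRecord<T>> elements, int size, W window) {
        // Return how many elements to evict from the start of the buffer,
        // keeping only the last n.
        return Math.max(0, size - n);
    }
}
```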

events eviction

2016-02-15 Thread Radu Tudoran
Hello, I am looking over the mechanisms of evicting events in Flink. I saw that either using a default evictor or building a custom one the logic is that the evictor will provide the number of events to be discarded. Could you please provide me with some additional pointers regarding the mechan

Problem with KeyedStream 1.0-SNAPSHOT

2016-02-15 Thread Lopez, Javier
Hi guys, I'm running a small test with the SNAPSHOT version in order to be able to use Kafka 0.9 and I'm getting the following error: "cannot access org.apache.flink.api.java.operators.Keys [ERROR] class file for org.apache.flink.api.java.operators.Keys not found". The code I'm using is as foll