Checkpoint and restore states

2016-04-19 Thread Jack Huang
Hi all, I am doing a simple word count example and want to checkpoint the accumulated word counts. I am not having any luck getting the counts saved and restored. Can someone help? env.enableCheckpointing(1000) env.setStateBackend(new MemoryStateBackend()) > ... inStream > .keyBy({s =>
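For illustration, here is a plain-Java sketch of the per-key accumulation such a word-count job maintains; this is the keyed state that enabling checkpointing would snapshot. The class name and input data are made up, not from the thread:

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountState {
    public static void main(String[] args) {
        // the per-key counts below correspond to the keyed state that
        // env.enableCheckpointing(1000) would periodically snapshot in a real job
        Map<String, Long> counts = new HashMap<>();
        for (String word : new String[]{"to", "be", "or", "not", "to", "be"}) {
            counts.merge(word, 1L, Long::sum);
        }
        System.out.println(counts.get("to") + " " + counts.get("or"));
    }
}
```

Note that MemoryStateBackend keeps these snapshots on the JobManager heap, so restores only work while the cluster keeps running.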

Re: Flink + S3

2016-04-19 Thread Michael-Keith Bernard
Hey Till & Ufuk, We're running on self-managed EC2 instances (and we'll eventually have a mirror cluster in our colo). The provided documentation notes that for Hadoop 2.6, we'd need such-and-such version of hadoop-aws and guice on the CP. If I wanted to instead use Hadoop 2.7, which versions o

Re: Flink on Yarn - ApplicationMaster command

2016-04-19 Thread Theofilos Kakantousis
Hi Max, Thank you for your reply. Exactly, I want to setup the Yarn cluster and submit a job through code and not using cmd client. I had done what you suggested, I used part of the deploy method to write my own code that starts up the cluster which seems to be working fine. Could you point m

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Till Rohrmann
If the data exceeds the main memory of your machine, then you should use the RocksDBStateBackend as a state backend. It allows you to store state (including windows) on disk. Thus, the size of state you can store is then limited by your hard disk capacity. If the expected data size can be kept in
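As a non-runnable sketch of what switching the backend looks like in the 1.0.x API (the checkpoint URI is an assumption, and the RocksDB backend needs its own dependency on the classpath):

```java
// sketch only: requires the flink-statebackend-rocksdb artifact;
// the checkpoint directory URI here is illustrative
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
```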

Re: YARN session application attempts

2016-04-19 Thread Till Rohrmann
Hi Stefano, Hadoop supports this feature since version 2.6.0. You can define a time interval for the maximum number of application attempts. This means that you have to observe this number of application failures within that time interval before the application ultimately fails. Flink will activate thi
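For reference, the YARN side of this is configured roughly as below; the property name is from Hadoop's yarn-site.xml and the value is illustrative. The attempt-failures validity interval itself is set per application by the submitting client, not in yarn-site.xml:

```xml
<!-- yarn-site.xml: cap AM attempts; with an attempt-failures validity
     interval set on the application, only failures inside that window
     count toward this cap -->
<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>
```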

Re: FoldFunction accumulator checkpointing

2016-04-19 Thread Ron Crocker
Aljoscha - I want to use a RichFoldFunction to get the open() hook. I cheat and use this structure instead with a (non-Rich) FoldFunction: public class InfinitResponseFilterFolder implements FoldFunction, String> { private BackingStore backingStore; @Override public String fold(Infini
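The workaround being described, lazily initializing the backing store on first use instead of in an open() hook, can be sketched in plain Java. BackingStore is stubbed here and BiFunction stands in for FoldFunction; only the lazy-init pattern is the point:

```java
import java.util.function.BiFunction;

public class LazyInitFold {
    // stand-in for the thread's BackingStore
    static class BackingStore {
        void record(String s) { /* no-op for the sketch */ }
    }

    static class Folder implements BiFunction<String, String, String> {
        private transient BackingStore backingStore;

        @Override
        public String apply(String acc, String value) {
            if (backingStore == null) {
                // lazy init replaces the open() hook a RichFoldFunction would offer
                backingStore = new BackingStore();
            }
            backingStore.record(value);
            return acc + value;
        }
    }

    public static void main(String[] args) {
        Folder f = new Folder();
        System.out.println(f.apply(f.apply("", "a"), "b"));
    }
}
```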

Re: Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-19 Thread Till Rohrmann
Hi Norman, sorry for the late reply. I finally found time and could, thanks to you, reproduce the problem. The problem was that the window borders were treated differently in two parts of the code. Now the left border of a window is inclusive and the right border (late elements) is exclusive. I've

Re: Master (1.1-SNAPSHOT) Can't run on YARN

2016-04-19 Thread Ufuk Celebi
Hey Stefano, Flink's resource management has been refactored for 1.1 recently. This could be a regression introduced by that refactoring. Max can probably help you with more details. Is this currently a blocker for you? – Ufuk On Tue, Apr 19, 2016 at 6:31 PM, Stefano Baghino wrote: > Hi everyone, > > I'm c

Master (1.1-SNAPSHOT) Can't run on YARN

2016-04-19 Thread Stefano Baghino
Hi everyone, I'm currently experiencing a weird situation; I hope you can help me out with this. I've cloned and built from master, then edited the default config file by adding my Hadoop config path, exported the HADOOP_CONF_DIR env var and ran bin/yarn-session.sh -n 1 -s 2 -jm 2048 -tm

Re: ClassNotFound when submitting job from command line

2016-04-19 Thread Flavio Pompermaier
I use maven to generate the shaded jar (and the classes are inside it), but when the job starts it can't load those classes using Class.forName() (required to instantiate the JDBC connections). I think it's probably a problem related to class loading in Flink. On Tue, Apr 19, 2016 at 6:02 PM, Balaji R

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Yifei Li
Hi Till and Aljoscha, Thank you so much for your suggestions and I'll try them out. I have another question. Since S2 may be days delayed, there may be lots of windows and a large amount of data stored in memory waiting for computation. How does Flink deal with that? Thanks, Yifei On Tue,

Re: Leader not found

2016-04-19 Thread Balaji Rajagopalan
Flink version : 1.0.0 Kafka version : 0.8.2.1 Try using a topic that has no messages posted to it at the time Flink starts. On Tue, Apr 19, 2016 at 5:41 PM, Robert Metzger wrote: > Can you provide me with the exact Flink and Kafka versions you are using > and the steps to reproduce the issue?

Re: ClassNotFound when submitting job from command line

2016-04-19 Thread Balaji Rajagopalan
In your pom.xml, add the Maven plugin like this; you will have to add all the dependent artifacts. This works for me: if you run mvn clean compile package, the created jar is a fat jar. org.apache.maven.plugins maven-dependency-plugin 2.9
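A common alternative to the dependency plugin for building a fat jar is maven-shade-plugin; a minimal pom.xml sketch is below (the plugin version is illustrative, and real configurations usually also relocate or filter conflicting classes):

```xml
<!-- build section of pom.xml: bundle all dependencies into one shaded jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```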

RE: ClassNotFound when submitting job from command line

2016-04-19 Thread Radu Tudoran
Hi, In my case the root cause for this was mainly that I was using Eclipse to package the jar. Try using mvn instead. Additionally, you can copy the dependency jars into the lib directory of the task managers and restart them. Dr. Radu Tudoran Research Engineer - Big Data Expert IT R&D Division

ClassNotFound when submitting job from command line

2016-04-19 Thread Flavio Pompermaier
Hi to all, I just tried to submit my application to the Flink cluster (1.0.1) but I get ClassNotFound exceptions for classes inside my shaded jar (like oracle.jdbc.OracleDriver or org.apache.commons.pool2.PooledObjectFactory). Those classes are in the shaded jar but aren't found. If I put the jars

Re: logback.xml and logback-yarn.xml rollingpolicy configuration

2016-04-19 Thread Till Rohrmann
Have you made sure that Flink is using logback [1]? [1] https://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#using-logback-instead-of-log4j Cheers, Till On Tue, Apr 19, 2016 at 2:01 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > There are two files in

Re: adding source not serializable exception in streaming implementation

2016-04-19 Thread Till Rohrmann
I assume that the provided FetchStock code is not complete. As the exception indicates, you somehow store a LocalStreamEnvironment in you source function. The StreamExecutionEnvironments are not serializable and cannot be part of the source function’s closure. Cheers, Till ​ On Tue, Apr 19, 2016
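The failure mode described above can be reproduced with plain JDK serialization; the class names here are stand-ins, not Flink's. Any serializable function object that holds a reference to a non-serializable object fails the same way:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureDemo {
    // stand-in for StreamExecutionEnvironment, which is not serializable
    static class Environment {}

    // a source function must be serializable; holding the environment breaks that
    static class Source implements Serializable {
        Environment env = new Environment();
    }

    public static void main(String[] args) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new Source());
            System.out.println("serialized");
        } catch (NotSerializableException e) {
            // serializing Source drags in the Environment field and fails here
            System.out.println("NotSerializableException");
        }
    }
}
```

The fix is the one Till describes: don't keep the environment (or anything derived from it) as a field of the source function.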

Re: Sink Parallelism

2016-04-19 Thread Chesnay Schepler
The picture you reference does not really show how dataflows are connected. For a better picture, visit this link: https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#parallel-dataflows Let me know if this doesn't answer your question. On 19.04.2016 14:22, Ravind

Re: Flink on Yarn - ApplicationMaster command

2016-04-19 Thread Maximilian Michels
Hi Theofilos, I'm not sure whether I understand correctly what you are trying to do. I'm assuming you don't want to use the command-line client. You can setup the Yarn cluster in your code manually using the FlinkYarnClient class. The deploy() method will give you a FlinkYarnCluster which you can

Flink on Yarn - ApplicationMaster command

2016-04-19 Thread Theofilos Kakantousis
Hi everyone, I'm using Flink 0.10.1 and hadoop 2.4.0 to implement a client that submits a flink application to Yarn. To keep it simple I use the ConnectedComponents app from flink examples. I set the required properties (Resources, AM ContainerLaunchContext etc.) on the YARN client interface

adding source not serializable exception in streaming implementation

2016-04-19 Thread subash basnet
Hello all, My requirement is to re-read the csv file from a file path at certain time intervals and process the csv data. The csv file gets updated at regular intervals. Below is my code: StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment(); DataStream dataStream
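The re-read-at-intervals requirement can be sketched in plain Java; in a Flink source function this loop would live inside run(), emitting each line through the source context instead of printing. File name, poll count, and interval are all illustrative:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PollingCsvReader {
    public static void main(String[] args) throws Exception {
        // a throwaway CSV file standing in for the path that gets updated
        Path csv = Files.createTempFile("data", ".csv");
        csv.toFile().deleteOnExit();
        Files.write(csv, Arrays.asList("a,1", "b,2"));
        // re-read the file on every poll so updates between polls are picked up
        for (int poll = 0; poll < 2; poll++) {
            for (String line : Files.readAllLines(csv)) {
                System.out.println(line);
            }
            Thread.sleep(50); // illustrative poll interval
        }
    }
}
```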

Sink Parallelism

2016-04-19 Thread Ravinder Kaur
Hello All, Considering the following streaming dataflow of the example WordCount, I want to understand how the Sink is parallelised. Source --> flatMap --> groupBy(), sum() --> Sink If I set the parallelism at runtime using -p, as shown here https://ci.apache.org/projects/flink/flink-docs-release-1

Re: Leader not found

2016-04-19 Thread Robert Metzger
Can you provide me with the exact Flink and Kafka versions you are using and the steps to reproduce the issue? On Tue, Apr 19, 2016 at 2:06 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > It does not seem to fully work if there is no data in the kafka stream, > the flink applica

Re: Leader not found

2016-04-19 Thread Balaji Rajagopalan
It does not seem to fully work if there is no data in the kafka stream; the flink application emits this error and bails. Could this be a missed use case in the fix? On Tue, Apr 19, 2016 at 3:58 PM, Robert Metzger wrote: > Hi, > > I'm sorry, the documentation in the JIRA issue is a bit incorrect.

logback.xml and logback-yarn.xml rollingpolicy configuration

2016-04-19 Thread Balaji Rajagopalan
There are two files in the /usr/share/flink/conf directory, and I was trying to do the rolling of application logs which go to the following directory on task nodes. /var/log/hadoop-yarn/containers/application_*/container_*/taskmanager.log out err Changing the logback.xml and logback-yarn.xml has no

jobmanager.web.* properties for long running yarn session

2016-04-19 Thread Konstantin Knauf
Hi everyone, we are using a long running yarn session and changed jobmanager.web.checkpoints.history to 20. On the dashboard's job manager panel I can see the changed config, but the checkpoint history for the job still has only 10 entries. Are these properties only supported in stand-alone mode?

Re: Leader not found

2016-04-19 Thread Robert Metzger
Hi, I'm sorry, the documentation in the JIRA issue is a bit incorrect. The issue has been fixed in all versions including and after 1.0.0. Earlier releases (0.10, 0.9) will fail when the leader changes. However, you don't necessarily need to upgrade to Flink 1.0.0 to resolve the issue: With checkp

Leader not found

2016-04-19 Thread Balaji Rajagopalan
I am facing this exception repeatedly while trying to consume from kafka topic. It seems it was reported in 1.0.0 and fixed in 1.0.0, how can I be sure that is fixed in the version of flink that I am using, does it require me to install patch updates ? Caused by: java.lang.RuntimeException: Unabl

Re: class java.util.UUID is not a valid POJO type

2016-04-19 Thread Till Rohrmann
Hi Leonard, the UUID class cannot be treated as a POJO by Flink, because it is lacking the public getters and setters for mostSigBits and leastSigBits. However, it should be possible to treat it as a generic type. I think the difference is that you cannot use key expressions and key indices to def
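The explanation above can be checked directly with reflection: Flink's POJO rules require public getters and setters for non-public fields, and java.util.UUID, being immutable, has no setters at all. The class name below is made up:

```java
import java.lang.reflect.Method;
import java.util.UUID;

public class UuidPojoCheck {
    public static void main(String[] args) {
        // scan UUID's public methods for any setter
        boolean hasSetter = false;
        for (Method m : UUID.class.getMethods()) {
            if (m.getName().startsWith("set")) {
                hasSetter = true;
            }
        }
        // no setters means UUID fails Flink's POJO analysis and falls
        // back to being treated as a generic type
        System.out.println("UUID has setters: " + hasSetter);
    }
}
```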

Re: Flink + Kafka + Scalabuff issue

2016-04-19 Thread Robert Metzger
Hi Alex, I suspect it's a GC issue with the code generated by ScalaBuff. Can you maybe try to do something like a standalone test where you use a while(true) loop to see how fast you can deserialize elements from your Foo type? Maybe you'll find that the JVM is growing all the time. Then there's pro

Re: Turn off logging in Flink

2016-04-19 Thread Sendoh
Thank you! Totally works. Best, Sendoh

Re: Hash tables - joins, cogroup, deltaIteration

2016-04-19 Thread Fabian Hueske
Hi Ovidiu, Hash tables are currently used for joins (inner & outer) and the solution set of delta iterations. There is a pending PR that implements a hash table for partial aggregations (combiner) [1] which should be added soon. Joins (inner & outer) are already implemented as Hybrid Hash joins t

Re: Turn off logging in Flink

2016-04-19 Thread Till Rohrmann
Hi Sendoh, you have to edit your log4j.properties file to set log4j.rootLogger=OFF in order to turn off the logger. Depending on how you run Flink and where you wanna turn off the logging, you either have to edit the log4j.properties file in the FLINK_HOME/conf directory or the one in your project whi
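The change Till describes is a one-line edit to the properties file:

```properties
# conf/log4j.properties (or the copy shipped with your project)
log4j.rootLogger=OFF
```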

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Till Rohrmann
Hi Yifei, if you don't wanna implement your own join operator, then you could also chain two join operations. I created a small example to demonstrate that: https://gist.github.com/tillrohrmann/c074b4eedb9deaf9c8ca2a5e124800f3. However, bear in mind that for this approach you will construct two wi

Turn off logging in Flink

2016-04-19 Thread Sendoh
Hi, Can I ask how to turn off Flink logging to avoid seeing INFO? I have tried StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.getConfig().disableSysoutLogging(); env.execute() and Configuration env_config = new Configuration(); env_config.setBoo

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Aljoscha Krettek
Hi, right now, there is no built-in support for n-ary joins. I am working on this, however. For now you can simulate n-ary joins by using a tagged union and doing the join yourself in a WindowFunction. I created a small example that demonstrates this: https://gist.github.com/aljoscha/a2a213d90c7c1
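The tagged-union idea, collecting elements from each stream per key and emitting a join result only when every stream contributed, can be sketched with plain maps standing in for the per-window contents of three streams. The keys, values, and combining logic here are made up:

```java
import java.util.HashMap;
import java.util.Map;

public class TaggedUnionJoin {
    public static void main(String[] args) {
        // one map per input stream: key -> value seen in the current window
        Map<String, Integer> s1 = new HashMap<>();
        Map<String, Integer> s2 = new HashMap<>();
        Map<String, Integer> s3 = new HashMap<>();
        s1.put("k1", 1); s1.put("k2", 2);
        s2.put("k1", 10);
        s3.put("k1", 100);
        // emulate the WindowFunction: join only keys present in all three streams
        for (String key : s1.keySet()) {
            if (s2.containsKey(key) && s3.containsKey(key)) {
                System.out.println(key + "=" + (s1.get(key) + s2.get(key) + s3.get(key)));
            }
        }
    }
}
```

Here only k1 produces output, since k2 never arrives on the other two streams, which mirrors an inner n-ary join.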

Re: Flink + Kafka + Scalabuff issue

2016-04-19 Thread Ufuk Celebi
Hey Alex, (1) Which Flink version are you using for this? (2) Can you also get a heap dump after the job slows down? Slowdowns like this are often caused by some component leaking memory, maybe in Flink, maybe the Scalabuff deserializer. Can you also share the Foo code? – Ufuk On Mon, Apr 18,

Re: Flink + S3

2016-04-19 Thread Ufuk Celebi
Hey Michael-Keith, are you running self-managed EC2 instances or EMR? In addition to what Till said: We tried to document this here as well: https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html#provide-s3-filesystem-dependency Does this help? You don't really need to install Ha

Re: Flink + S3

2016-04-19 Thread Till Rohrmann
Hi Michael-Keith, you can use S3 as the checkpoint directory for the filesystem state backend. This means that whenever a checkpoint is performed the state data will be written to this directory. The same holds true for the zookeeper recovery storage directory. This directory will contain the sub