How to read from a Kafka topic from the beginning

2015-11-17 Thread Miaoyongqiang (Will)
Hi, How can I tell a "FlinkKafkaConsumer" that I want to read from a topic from the beginning? Thanks, Will

Re: Apache Flink Operator State as Query Cache

2015-11-17 Thread Aljoscha Krettek
Hi, yes, it will be in 1.0. I’m working on adding it to master right now. Cheers, Aljoscha > On 17 Nov 2015, at 02:46, Welly Tambunan wrote: > > Hi Stephan, > > So that will be in Flink 1.0 right ? > > Cheers > > On Mon, Nov 16, 2015 at 9:06 PM, Stephan Ewen wrote: > Hi Anwar! > > 0.10.0 w

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-17 Thread Aljoscha Krettek
Hi, yes, unfortunately, there is a bug in the timestamp extraction operator that sets the “last-seen watermark” to Long.MIN_VALUE even though it should not when calling getCurrentWatermark(). I’m opening an Issue and adding a fix to the latest master and the branch for the 0.10.x bugfix release

Re: How to read from a Kafka topic from the beginning

2015-11-17 Thread Robert Metzger
Hi Will, In Kafka's consumer configuration [1] there is a configuration parameter called "auto.offset.reset". Setting it to "smallest" will tell the consumer to start reading a topic from the smallest available offset. You can pass the configuration using the properties of the Kafka consumer. [
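A minimal sketch of what Robert describes, assuming the 0.10.x FlinkKafkaConsumer082 and a string-valued topic; the topic name, group id, and connection settings below are placeholders:

import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ReadTopicFromBeginning {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("zookeeper.connect", "localhost:2181"); // placeholder
        props.setProperty("group.id", "my-consumer-group");       // placeholder
        // Start reading from the earliest available offset (Kafka 0.8-style naming)
        props.setProperty("auto.offset.reset", "smallest");

        env.addSource(new FlinkKafkaConsumer082<String>("my-topic", new SimpleStringSchema(), props))
           .print();

        env.execute("read topic from beginning");
    }
}

Note that, as with Kafka's own high-level consumer, "smallest" typically only takes effect when no offset has been committed yet for that group id.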

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-17 Thread Aljoscha Krettek
Hi, actually, the bug is more subtle. Normally, it is not a problem that the TimestampExtractor sometimes emits a watermark that is lower than the one before. (This is the result of the bug with Long.MIN_VALUE I mentioned before). The stream operators wait for watermarks from all upstream operat

RE: MaxPermSize on yarn

2015-11-17 Thread Gwenhael Pasquiers
We tried to add: -yD env.java.opts="-XX:MaxPermSize=256m" But we’ve got to investigate, since we got the following error: Improperly specified VM option 'MaxPermSize' Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. Either is the option

Re: MaxPermSize on yarn

2015-11-17 Thread Robert Metzger
You can also put the configuration option into the flink-conf.yaml file. On Tue, Nov 17, 2015 at 12:01 PM, Gwenhael Pasquiers < gwenhael.pasqui...@ericsson.com> wrote: > We tried to add : -yD env.java.opts="-XX:MaxPermSize=256m" > > > > But we’ve got to investigate since we have to following erro

Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Vladimir Stoyak
Got stuck a bit with CoFlatMapFunction. It seems to work fine if I place it on the DataStream before the window, but it fails if placed after the window's “apply” function. I was testing two streams: the main “Features” stream on flatMap1 constantly ingesting data, and the control stream “Model” on flatMap2 changing the mod

Re: Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Stephan Ewen
Hi! Can you give us a bit more context? For example, share the structure of the program (what streams get windowed and connected in what way)? I would guess that the following is the problem: When you connect one stream to another, then partition n of the first stream connects with partition n of

Re: Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Vladimir Stoyak
My model DataStream is not keyed and does not have any windows; only the main stream has windows and an apply function. I have two Kafka streams, one for events and one for the model: DataStream model_stream = env.addSource(new FlinkKafkaConsumer082(model_topic, new AvroDeserializationSchema(Model.class)

Re: Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Stephan Ewen
Is the CoFlatMapFunction intended to be executed in parallel? If yes, you need some way to deterministically assign which record goes to which parallel instance. In some way the CoFlatMapFunction does a parallel (partitioned) join between the model and the result of the session windows, so you need
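A self-contained sketch of the partitioned variant Stephan hints at (not the original job): both streams are keyed by the same field, so an event and the model update for the same key always reach the same parallel instance of the CoFlatMapFunction. The tuple types and values are stand-ins for the original Avro types:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class KeyedConnectSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Long>> events =
                env.fromElements(new Tuple2<String, Long>("user-1", 10L), new Tuple2<String, Long>("user-2", 20L));
        DataStream<Tuple2<String, Double>> models =
                env.fromElements(new Tuple2<String, Double>("user-1", 0.5), new Tuple2<String, Double>("user-2", 0.9));

        events.keyBy(0)
              .connect(models.keyBy(0)) // both sides hash-partitioned by the same key
              .flatMap(new CoFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Double>, String>() {
                  private Double model; // latest model seen by this parallel instance

                  @Override
                  public void flatMap1(Tuple2<String, Long> event, Collector<String> out) {
                      if (model != null) {
                          out.collect(event.f0 + " scored against model " + model);
                      }
                  }

                  @Override
                  public void flatMap2(Tuple2<String, Double> update, Collector<String> out) {
                      model = update.f1;
                  }
              })
              .print();

        env.execute("keyed connect sketch");
    }
}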

Re: Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Vladimir Stoyak
Perfect! It does explain my problem. Thanks a lot On Tuesday, November 17, 2015 1:43 PM, Stephan Ewen wrote: Is the CoFlatMapFunction intended to be executed in parallel? If yes, you need some way to deterministically assign which record goes to which parallel instance. In some way

Re: Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Vladimir Stoyak
Not that I necessarily need that for this particular example, but is there a global state available? I.e., how can I make a state available across all parallel instances of an operator? On Tuesday, November 17, 2015 1:49 PM, Vladimir Stoyak wrote: Perfect! It does explain my proble

Re: Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Vladimir Stoyak
I know I can use broadcast, but was wondering if there is a better way: DataStream control_stream = env.addSource(new FlinkKafkaConsumer082(control_topic, new AvroDeserializationSchema(Model.class), properties)).broadcast(); On Tuesday, November 17, 2015 2:45 PM, Vladimir Stoyak wrote:

Re: Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Stephan Ewen
A global state that all can access read-only is doable via broadcast(). A global state that is available to all for read and update is currently not available. Consistent operations on that would be quite costly, requiring some form of distributed communication/consensus. Instead, I would encourage
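A sketch of the broadcast() pattern Stephan describes for read-only global state: every parallel instance of the CoFlatMapFunction receives every model update and keeps its own local copy. The numeric "model" below is a stand-in for the real model type:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class BroadcastModelSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> events = env.generateSequence(0, 1000);          // stand-in for the event stream
        DataStream<Double> modelUpdates = env.fromElements(100.0, 500.0); // stand-in for the model stream

        events
            .connect(modelUpdates.broadcast()) // every parallel instance receives every model update
            .flatMap(new CoFlatMapFunction<Long, Double, String>() {
                private Double threshold; // read-only copy held by this parallel instance

                @Override
                public void flatMap1(Long event, Collector<String> out) {
                    if (threshold != null && event > threshold) {
                        out.collect("event " + event + " exceeds " + threshold);
                    }
                }

                @Override
                public void flatMap2(Double newThreshold, Collector<String> out) {
                    threshold = newThreshold; // update the local copy
                }
            })
            .print();

        env.execute("broadcast model sketch");
    }
}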

Re: Issue with sharing state in CoFlatMapFunction

2015-11-17 Thread Anwar Rizal
Broadcast is indeed what we use for the same type of problem as your initial one. In another thread, Stephan mentioned the possibility of using OperatorState in a ConnectedStream. I think the approach using OperatorState does the job as well. In my understanding, the approach using broadcast will requi

Re: Crash in a simple "mapper style" streaming app likely due to a memory leak ?

2015-11-17 Thread Stephan Ewen
Hi Arnaud! Java direct memory is tricky to debug. You can turn on the memory logging or check the TaskManager tab in the web dashboard - both report on direct memory consumption. One thing you can look for is forgetting to close streams. That means the streams consume native resources until the J

Re: Join Stream with big ref table

2015-11-17 Thread Stephan Ewen
I think this pattern may be common, so some tooling that shares such a table across multiple tasks may make sense. It would be nice to add a handler that you give an "initializer" which reads the data and builds the shared lookup map. The first to acquire the handler actually initializes the data set (re

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-17 Thread Konstantin Knauf
Hi Aljoscha, are you sure? I am running the job from my IDE at the moment. If I set StreamExecutionEnvironment.setParallelism(1), it works with the old TimestampExtractor (returning Long.MIN_VALUE from getCurrentWatermark() and emitting a watermark at every record). If I set StreamExecutionEnvi
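For reference, a sketch of an extractor that avoids returning Long.MIN_VALUE from getCurrentWatermark(), assuming the 0.10.x TimestampExtractor interface and Tuple2<String, Long> records whose second field is the event timestamp:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.TimestampExtractor;

public class MaxSeenTimestampExtractor implements TimestampExtractor<Tuple2<String, Long>> {
    private long maxSeenTimestamp = Long.MIN_VALUE;

    @Override
    public long extractTimestamp(Tuple2<String, Long> element, long currentTimestamp) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, element.f1);
        return element.f1;
    }

    @Override
    public long extractWatermark(Tuple2<String, Long> element, long currentTimestamp) {
        return Long.MIN_VALUE; // no per-element watermark
    }

    @Override
    public long getCurrentWatermark() {
        return maxSeenTimestamp; // highest event time seen so far, not Long.MIN_VALUE
    }
}

It would be attached with something like stream.assignTimestamps(new MaxSeenTimestampExtractor()) before the windowing.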

Re: Flink on EC2

2015-11-17 Thread Robert Metzger
Hi Thomas, I'm sorry that nobody responded anymore. As you've probably noticed, there is a lot of traffic on the mailing lists and sometimes stuff gets lost. Were you able to get the S3 file system running with Flink? If not, let's try to figure out why it is not picking up the config correctly

Re: Session Based Windows

2015-11-17 Thread Konstantin Knauf
Hi Aljoscha, sorry to bother you again (this time with this old thread), just a short question about the caveat you mention in your answer. You wrote that events of different sessions cannot be intermingled. Isn't the idea of the keyBy expression below exactly not to have intermingled sessions by fi