Hi,
How can I tell a "FlinkKafkaConsumer" that I want to read from a topic from the
beginning?
Thanks,
Will
Hi,
yes, it will be in 1.0. I’m working on adding it to master right now.
Cheers,
Aljoscha
> On 17 Nov 2015, at 02:46, Welly Tambunan wrote:
>
> Hi Stephan,
>
> So that will be in Flink 1.0 right ?
>
> Cheers
>
> On Mon, Nov 16, 2015 at 9:06 PM, Stephan Ewen wrote:
> Hi Anwar!
>
> 0.10.0 w
Hi,
yes, unfortunately, there is a bug in the timestamp extraction operator that
sets the “last-seen watermark” to Long.MIN_VALUE even though it should not when
calling getCurrentWatermark().
I’m opening an issue and adding a fix to the latest master and the branch for
the 0.10.x bugfix release.
Hi Will,
In Kafka's consumer configuration [1] there is a configuration parameter
called "auto.offset.reset". Setting it to "smallest" will tell the consumer
to start reading a topic from the smallest available offset.
You can pass the configuration using the properties of the Kafka consumer.
[
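A minimal sketch of building such a properties object (the broker address and group id below are placeholders, not values from this thread); the resulting `Properties` would then be handed to the `FlinkKafkaConsumer` constructor:

```java
import java.util.Properties;

public class ConsumerProps {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("group.id", "my-consumer-group");       // placeholder group id
        // "smallest" tells the (old) Kafka consumer to start from the
        // earliest available offset; newer Kafka clients call this "earliest".
        props.setProperty("auto.offset.reset", "smallest");

        System.out.println(props.getProperty("auto.offset.reset"));
    }
}
```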
Hi,
actually, the bug is more subtle. Normally, it is not a problem that the
TimestampExtractor sometimes emits a watermark that is lower than the one
before. (This is the result of the bug with Long.MIN_VALUE I mentioned before).
The stream operators wait for watermarks from all upstream operat
We tried to add : -yD env.java.opts="-XX:MaxPermSize=256m"
But we’ve got to investigate since we have the following error:
Improperly specified VM option 'MaxPermSize'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
You can either pass the option on the command line as above, or put the
configuration option into the flink-conf.yaml file.
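The flink-conf.yaml equivalent would look roughly like the sketch below (the MaxPermSize value is carried over from the command line above; note that the JVM only accepts -XX:MaxPermSize on Java 7 and earlier, which is consistent with the "Improperly specified VM option" error on a Java 8 JVM):

```yaml
# flink-conf.yaml
# Extra JVM options passed to the Flink processes.
# -XX:MaxPermSize exists only on Java 7 and earlier; Java 8+ rejects it.
env.java.opts: "-XX:MaxPermSize=256m"
```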
On Tue, Nov 17, 2015 at 12:01 PM, Gwenhael Pasquiers <
gwenhael.pasqui...@ericsson.com> wrote:
> We tried to add : -yD env.java.opts="-XX:MaxPermSize=256m"
>
>
>
> But we’ve got to investigate since we have the following erro
Got stuck a bit with CoFlatMapFunction. It seems to work fine if I place it on
the DataStream before the window, but fails if placed after the window's “apply”
function. I was testing two streams: the main “Features” stream on flatMap1
constantly ingesting data, and the control stream “Model” on flatMap2 changing the mod
Hi!
Can you give us a bit more context? For example, share the structure of the
program (what streams get windowed and connected in what way)?
I would guess that the following is the problem:
When you connect one stream to another, then partition n of the first
stream connects with partition n of
My model DataStream is not keyed and does not have any windows; only the main
stream has windows and an apply function.
I have two Kafka Streams, one for events and one for model
DataStream<Model> model_stream = env.addSource(new
FlinkKafkaConsumer082<>(model_topic, new
AvroDeserializationSchema<>(Model.class), properties));
Is the CoFlatMapFunction intended to be executed in parallel?
If yes, you need some way to deterministically assign which record goes to
which parallel instance. In some way the CoFlatMapFunction does a parallel
(partitions) join between the model and the result of the session windows,
so you need
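The deterministic assignment Stephan describes boils down to routing records with the same key to the same parallel instance. A plain-Java sketch of that idea (the key type, parallelism, and method names are made up for illustration, not Flink API):

```java
public class KeyRouting {
    // Mimics how a keyed partitioner picks a parallel subtask for a record:
    // equal keys always map to the same instance.
    static int subtaskFor(String key, int parallelism) {
        // hashCode() % parallelism has magnitude < parallelism, so abs is safe
        return Math.abs(key.hashCode() % parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        // Records with equal keys land on the same subtask, so a model update
        // and the windowed events it should apply to can meet in one instance.
        System.out.println(subtaskFor("session-42", parallelism)
                == subtaskFor("session-42", parallelism));
    }
}
```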
Perfect! It does explain my problem.
Thanks a lot
On Tuesday, November 17, 2015 1:43 PM, Stephan Ewen
wrote:
Is the CoFlatMapFunction intended to be executed in parallel?
If yes, you need some way to deterministically assign which record goes to
which parallel instance. In some way
Not that I necessarily need it for this particular example, but is there a
global state available?
I.e., how can I make state available across all parallel instances of an
operator?
On Tuesday, November 17, 2015 1:49 PM, Vladimir Stoyak
wrote:
Perfect! It does explain my proble
I know I can use broadcast, but was wondering if there is a better way
DataStream<Model> control_stream = env.addSource(new
FlinkKafkaConsumer082<>(control_topic, new
AvroDeserializationSchema<>(Model.class), properties)).broadcast();
On Tuesday, November 17, 2015 2:45 PM, Vladimir Stoyak
wrote:
A global state that all can access read-only is doable via broadcast().
A global state that is available to all for read and update is currently
not available. Consistent operations on it would be quite costly, requiring
some form of distributed communication/consensus.
Instead, I would encourage
Broadcast is indeed what we use for the same type of problem as your initial one.
In another thread, Stephan mentioned a possibility of using OperatorState
in ConnectedStream. I think this approach using OperatorState does the
business as well.
In my understanding, the approach using broadcast will requi
Hi Arnaud!
Java direct memory is tricky to debug. You can turn on the memory logging
or check the TaskManager tab in the web dashboard; both report on direct
memory consumption.
One thing you can look for is forgetting to close streams. That means the
streams consume native resources until the J
I think this pattern may be common, so some tools that share such a table
across multiple tasks may make sense.
Would be nice to add a handler that you give an "initializer" which reads
the data and builds the shared lookup map. The first to acquire the handler
actually initializes the data set (re
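The "first to acquire the handler initializes the data set" idea amounts to lazy, once-only initialization shared by all tasks in the same JVM. A sketch under that assumption (class and method names here are hypothetical, not an existing Flink API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class SharedLookup {
    // One shared table per JVM; volatile + double-checked locking gives
    // safe once-only initialization across tasks in the same task manager.
    private static volatile Map<String, String> table;

    static Map<String, String> get(Supplier<Map<String, String>> initializer) {
        if (table == null) {
            synchronized (SharedLookup.class) {
                if (table == null) {
                    // Only the first caller actually runs the initializer.
                    table = initializer.get();
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> t = get(() -> {
            Map<String, String> m = new HashMap<>();
            m.put("key", "value"); // placeholder for the real data load
            return m;
        });
        System.out.println(t.get("key"));
    }
}
```

Later callers get the already-built map back without re-running the load, which is the sharing behavior the paragraph above asks for.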
Hi Aljoscha,
Are you sure? I am running the job from my IDE at the moment.
If I set
StreamExecutionEnvironment.setParallelism(1);
It works with the old TimestampExtractor (returning Long.MIN_VALUE from
getCurrentWatermark() and emitting a watermark at every record)
If I set
StreamExecutionEnvi
Hi Thomas,
I'm sorry that nobody responded. As you've probably noticed, there is a lot
of traffic on the mailing lists and sometimes stuff gets lost.
Were you able to get the S3 file system running with Flink?
If not, let's try to figure out why it is not picking up the config
correctly
Hi Aljoscha,
sorry to bother you again (this time with this old thread), just a short
question about the caveat you mention in your answer. You wrote that
events of different sessions cannot be intermingled. Isn't the idea of the
keyBy expression below exactly not to have intermingled sessions by
fi