Hi
When 'RollingSink' tries to initialize state, it first checks whether the current file
system supports the truncate method. If the file system does not support it, it uses
another work-around solution, which means you should not hit the problem.
Otherwise, 'RollingSink' looks up the truncate method via reflection.
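For illustration, a reflection probe along those lines might look like this (a sketch only, not the exact RollingSink code; Hadoop's FileSystem only gained truncate(Path, long) in 2.7):

import java.lang.reflect.Method;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateProbe {
    // Returns the truncate method if this Hadoop version supports it,
    // or null so the caller can fall back to the work-around.
    static Method findTruncate() {
        try {
            return FileSystem.class.getMethod("truncate", Path.class, long.class);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }
}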
Hi,
I would not use a window for that.
Implementing the logic with a ProcessFunction seems more straightforward.
The function simply collects all events between 00:00 and 01:00 in a
ListState and emits them when the time passes 01:00.
All other records are simply forwarded.
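A minimal sketch of that approach, assuming a keyed stream (String keys here) of events with event-time timestamps assigned; the Event POJO and the UTC day boundary are illustrative:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Event is a stand-in POJO for whatever records are being processed.
public class HoldbackFunction extends KeyedProcessFunction<String, Event, Event> {

    private static final long HOUR_MS = 60 * 60 * 1000L;
    private static final long DAY_MS = 24 * HOUR_MS;

    private transient ListState<Event> buffered;

    @Override
    public void open(Configuration parameters) {
        buffered = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffered", Event.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Event> out) throws Exception {
        long ts = ctx.timestamp();                 // requires assigned timestamps
        long startOfDay = ts - (ts % DAY_MS);      // UTC day boundary; adjust for your timezone
        if (ts < startOfDay + HOUR_MS) {
            // event falls between 00:00 and 01:00: buffer it and ask to be
            // called back once the watermark passes 01:00
            buffered.add(event);
            ctx.timerService().registerEventTimeTimer(startOfDay + HOUR_MS);
        } else {
            out.collect(event);                    // all other records are simply forwarded
        }
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) throws Exception {
        for (Event e : buffered.get()) {           // time passed 01:00: emit the buffer
            out.collect(e);
        }
        buffered.clear();
    }
}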
Best, Fabian
On Fri.,
Hi Chirag
I think the doc is outdated; the comment in CheckpointedFunction.java on master
now [1] is `get the state data structure for the per-partition state`.
[1]
https://github.com/apache/flink/blob/00fe8a01192a523544d3868360a924863a69d8f8/flink-streaming-java/src/main/java/org/apache/flink/stre
Hi, simpleusr
Maybe a custom trigger [1] can be helpful; see the skeleton below.
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/windows.html#triggers
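As a starting point, a custom trigger skeleton could mirror the built-in EventTimeTrigger and add the skipping rule where marked (a sketch only; the class name is illustrative):

import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class SkipWindowTrigger extends Trigger<Object, TimeWindow> {

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window,
                                   TriggerContext ctx) throws Exception {
        // decide here whether this element (or the whole window) should be skipped
        ctx.registerEventTimeTimer(window.maxTimestamp());
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
    }
}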
Best, Congxian
On Feb 15, 2019, 13:15 +0800, simpleusr wrote:
> Hi,
>
> My ultimate requirement is to stop processing of certain eve
Hi,
I was going through the Javadoc for CheckpointedFunction.java, it says that:
* // get the state data structure for the per-key state
* countPerKey = context.getKeyedStateStore().getReducingState(
* new ReducingStateDescriptor<>("perKeyCount", new AddFunction<>(), Long.class));
Hi,
My ultimate requirement is to stop processing of certain events between
00:00:00 and 01:00:00 each day (times are in HH:mm:ss format).
I am a Flink newbie, and I thought the only option to delay elements is to collect
them in a window between 00:00:00 and 01:00:00 each day.
TumblingEventTime
Hi Stephan,
Thanks for the clarification! You are right, we have never initiated a
discussion about supporting OVER windows on DataStream; we can discuss that in
a separate thread. I agree with you that we can add the item once that
discussion moves forward.
+1 for putting the roadmap on the website.
+1 for peri
Thanks Stephan for this proposal and I totally agree with it.
It is very necessary to summarize the overall features/directions the community
is pursuing or planning to pursue. Although I check the mailing list almost
every day, it still seems difficult to trace everything. In addition, I think this
w
Hey Jayant. I'm getting the same using Gradle; my metrics reporter and my
application both use the flink-metrics-dropwizard dependency for reporting
Meters. How should I solve it?
Thank you Gordon and Ken.
My Flink job is now running well with 1.7.2 RC1, with failed ES requests
retried successfully.
One more question I have on this is how to limit the number of retries for
different types of errors with ES bulk requests. Is there any guideline on
that?
My temporary solution
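For reference, per-error-type handling is typically done with the connector's ActionRequestFailureHandler; below is a hedged sketch (the class name is illustrative, and bounding the number of retries per request needs extra bookkeeping that is not shown):

import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.util.ExceptionUtils;
import org.elasticsearch.action.ActionRequest;
import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

public class SelectiveRetryHandler implements ActionRequestFailureHandler {
    @Override
    public void onFailure(ActionRequest action, Throwable failure,
                          int restStatusCode, RequestIndexer indexer) throws Throwable {
        if (ExceptionUtils.findThrowable(failure, EsRejectedExecutionException.class).isPresent()) {
            indexer.add(action);   // transient back-pressure: re-queue the request
        } else {
            throw failure;         // any other error: fail the sink
        }
    }
}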
Hi all,
I'm very excited to announce that the community is planning the next meetup
in the Bay Area on March 25, 2019. The event has just been announced on
Meetup.com [1].
To make the event successful, your participation and help will be needed.
Currently, we are looking for an organization that can host t
Hi Ajay,
Yes, Andrey is right. I was actually missing the first basic but important
point: If your process function is stuck, it will immediately block that
thread.
From your description, it sounds like not all the messages you
consume from Kafka actually trigger the processing logi
And yes, it cannot work with RollingFileSink for the Hadoop 2.6 build of 1.7.1
because of this.
java.lang.UnsupportedOperationException: Recoverable writers on Hadoop
are only supported for HDFS and for Hadoop version 2.7 or newer
at
org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.(Hadoo
The job uses a RollingFileSink to push data to HDFS. We run an HA standalone
cluster on k8s:
* get the job running
* kill the pod.
The k8s deployment relaunches the pod but fails with
java.io.IOException: Missing data in tmp file:
hdfs://nn-crunchy:8020/tmp/kafka-to-hdfs/ls_kraken_events/dt=2019-0
Thanks Gary. Understood the behavior.
I am leaning towards running 7 TMs on each machine (8 cores). I have 4 nodes, so
that will end up as 28 TaskManagers and 1 JobManager. I was wondering if this can
bring additional burden on the JobManager? Is it recommended?
Thanks,
Jins George
On 2/14/19 8:49 AM, Ga
Hi Averell,
The TM containers fetch the Flink binaries and config files from HDFS (or
another DFS if configured) [1]. I think you should be able to change the log
level by patching the logback configuration in HDFS, and kill all Flink
containers on all hosts. If you are running an HA setup, your c
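For reference, the level change itself is a standard logback edit, e.g. (the "file" appender name is an assumption):

<configuration>
    <!-- lower or raise the root level as needed -->
    <root level="DEBUG">
        <appender-ref ref="file"/>
    </root>
</configuration>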
Hi Jins George,
This has been asked before [1]. The bottom line is that you currently cannot
pre-allocate TMs and distribute your tasks evenly. You might be able to
achieve a better distribution across hosts by configuring fewer slots in
your
TMs.
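For example, in flink-conf.yaml (the value is illustrative; pick it so tasks have to spread across hosts):

# fewer slots per TM forces the scheduler onto more TMs, and thus more hosts
taskmanager.numberOfTaskSlots: 2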
Best,
Gary
[1]
http://apache-flink-user-mailing-
Hi Stephan,
Thanks for the clarification, yes, I think these issues have already been
discussed in previous mailing list threads [1,2,3].
I also agree that updating the "official" roadmap every release is a very
good idea to avoid frequent updates.
One question I might've been a bit confused about is: ar
My team and I are keen to help out with testing and review as soon as there is
a pull request.
-H
> On Feb 11, 2019, at 00:26, Till Rohrmann wrote:
>
> Hi Heath,
>
> I just learned that people from Alibaba already made some good progress with
> FLINK-9953. I'm currently talking to them in or
Hi,
We have a Flink job where we are trying to window-join two datastreams
originating from two different Kafka topics, where one topic contains a lot
more data per time instance than the other one.
We use event time processing, and this all works fine when running our pipeline
live, i.e. data i
Thank you Rong and Andrey. The blog and your explanation was very useful.
In my use case, source stream (kafka based) contains messages that capture some
“work” that needs to be done for a tenant. It’s a multi-tenant source stream.
I need to queue up (and execute) this work per tenant in the or
The related job manager log is
https://gist.github.com/Ethanlm/86a10e786ad9025ddaa27c113c536da8
> On Feb 14, 2019, at 9:40 AM, Ethan Li wrote:
>
> Hello,
>
> I have a standalone flink-1.4.2 cluster with one JobManager, one TaskManager,
> and zookeeper. I first started JM and TM and waited fo
Hello,
I have a standalone flink-1.4.2 cluster with one JobManager, one TaskManager,
and zookeeper. I first started JM and TM and waited for them to be stable.
Then I restarted the JM. That's when the TM got confused.
The TM was notified that the leader node had changed, and it tried to register
with the new
Hi Gordon,
This class is used for states, in/out parameters, and as a key. As you wrote,
there is no problem with usage in states; I just specify the TypeInformation in
the descriptor. With the return value of a process function, I tried
.process(new MyProcessFunction())
.returns(MyTypeInformation), and it works, bu
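For context, a sketch of the working variant described above (MyType, MyProcessFunction, and the supplied TypeInformation stand in for the poster's own classes):

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class ReturnsExample {
    static DataStream<MyType> apply(KeyedStream<MyType, String> input,
                                    TypeInformation<MyType> myTypeInfo) {
        return input
                .process(new MyProcessFunction())   // erasure loses the output type here
                .returns(myTypeInfo);               // so it is supplied explicitly
    }
}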
Awesome, thanks! Will open a new thread. But yes, the in-progress file was
helpful.
On Thu, Feb 14, 2019, 7:50 AM Kostas Kloudas wrote:
> Hi Vishal,
>
> For the StreamingFileSink vs Rolling/BucketingSink:
> - you can use the StreamingFileSink instead of the Rolling/BucketingSink.
> You can see the Streami
Hi Vishal,
For the StreamingFileSink vs Rolling/BucketingSink:
- you can use the StreamingFileSink instead of the Rolling/BucketingSink.
You can see the StreamingFileSink as an evolution of the previous two.
In the StreamingFileSink the files in Pending state are not renamed, but
they keep their
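A minimal sketch of attaching a StreamingFileSink (row format; the output path and encoder are illustrative):

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class FileSinkExample {
    static void attach(DataStream<String> stream) {
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("hdfs:///tmp/output"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();
        stream.addSink(sink);   // files move through in-progress -> pending -> finished
    }
}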
Hi Juho,
* does the output of the streaming job contain any data, which is not
>> contained in the batch
>
>
> No.
>
> * do you know if all lost records are contained in the last savepoint you
>> took before the window fired? This would mean that no records are lost
>> after the last restore.
>
>
Hi,
I'm using a simple streaming app with processing time and without state.
The app reads from Kafka, transforms the data, and writes it to the
storage (Redis).
But I see an interesting behavior: a few records are getting through very
slowly.
Do you have any idea why this could be?
Best,
Marke
Thanks a lot :D
From: Konstantin Knauf
Date: Thursday, 14 February 2019 at 5:38 PM
To: Harshith Kumar Bolar , user
Subject: [External] Re: Re: How to clear state immediately after a keyed window
is processed?
Yes, for processing-time windows the clean up time is exactly the end time of
the wi
Yes, for processing-time windows the clean-up time is exactly the end time
of the window, because by definition there is no late data and state does
not need to be kept around.
On Thu, Feb 14, 2019 at 1:03 PM Kumar Bolar, Harshith
wrote:
> Thanks Konstantin,
>
>
>
> But I’m using processing time,
Thanks Konstantin!
I'll try to see if I can prepare code & conf to be shared as fully as
possible.
In the meantime:
* does the output of the streaming job contain any data, which is not
> contained in the batch
No.
* do you know if all lost records are contained in the last savepoint you
> to
Hi Harshith,
when you use Flink's Windowing API, the state of an event time window is
cleared once the watermark passes the end time of the window (that's when
the window fires) + the allowed lateness. So, as long as you don't
configure additional allowed lateness (default=0), Flink will already
b
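In code, that clean-up horizon is controlled like this (a sketch; the Event type, window size, lateness, and reduce function are all illustrative):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class LatenessExample {
    static DataStream<Event> windowed(DataStream<Event> input) {
        return input
                .keyBy(e -> e.getKey())
                .window(TumblingEventTimeWindows.of(Time.hours(1)))
                // default lateness is 0: state is cleared when the watermark
                // passes the window end; any value here keeps it around longer
                .allowedLateness(Time.minutes(5))
                .reduce((a, b) -> a.merge(b));
    }
}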
Thanks Fabian,
more questions
1. I had a k8s standalone job with
env.getCheckpointConfig().setFailOnCheckpointingErrors(true) // the default.
The job failed on a checkpoint, and I would have imagined that under HA the job
would restore from the last checkpoint, but it did not (the UI showed the
job had res
Hi all,
My application uses a keyed window that is keyed by a function of the timestamp.
This means that once a particular window has been fired and processed, there is
no use in keeping that key active, because there is no way that particular key
will appear again. Because this use case involves cont
I think the website is better as well.
I agree with Fabian that the wiki is not so visible, and visibility is the
main motivation.
This type of roadmap overview would not be updated by everyone - letting
committers update the roadmap means the listed threads are actually
happening at the moment.
No effort in this direction, then?
I had a try using SQL on the Table API, but I fear that the generated plan is
not the optimal one. I'm looking for an efficient way to implement a
describe() method on a table or dataset/datasource.
On Fri, Feb 8, 2019 at 10:35 AM Flavio Pompermaier
wrote:
> Hi to all,
Hi,
I like the idea of putting the roadmap on the website because it is much
more visible (and, IMO, more credible and binding) there.
However, I share the concerns about frequent updates.
I think it would be great to update the "official" roadmap on the website
once per release (excluding bugfix releases)
Hi Ajay,
Technically, it will immediately block the thread of the MyKeyedProcessFunction
subtask scheduled to some slot, and basically block processing of the key
range assigned to that subtask.
Practically, I agree with Rong's answer. Depending on the topology of your
inputStream, it can eventually bl
Hi Stephan,
Thanks for this proposal. It is a good idea to track the roadmap. One
suggestion is that it might be better to put it on a wiki page first,
because it is easier to update the roadmap on the wiki than on the Flink
website. And I guess we may need to update the roadmap very often at the
Thanks Jincheng and Rong Rong!
I am not deciding a roadmap or making a call on what features should be
developed or not. I was only collecting broader issues that are already
happening or that have an active FLIP/design discussion plus committer support.
Do we have that for the suggested issues as we
Hi Juho,
you are right, the problem has actually been narrowed down quite a bit over
time. Nevertheless, sharing the code (incl. flink-conf.yaml) might be a
good idea. Maybe something strikes the eye that we have not thought about
so far. If you don't feel comfortable sharing the code on the ML, f
Hi Chirag,
Broadcast state is checkpointed, hence the savepoint would contain it.
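For context, broadcast state is the state declared via a MapStateDescriptor when creating the broadcast stream, e.g. (Rule is a stand-in type):

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;

public class BroadcastExample {
    static BroadcastStream<Rule> broadcast(DataStream<Rule> rules) {
        MapStateDescriptor<String, Rule> descriptor = new MapStateDescriptor<>(
                "rules", BasicTypeInfo.STRING_TYPE_INFO, TypeInformation.of(Rule.class));
        // the contents of this state are part of checkpoints and savepoints
        return rules.broadcast(descriptor);
    }
}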
Best,
Konstantin
On Wed, Feb 13, 2019 at 4:04 PM Chirag Dewan
wrote:
> Hi Konstantin,
>
> For the second solution, would savepoint persist the Broadcast state in
> State backend? Because I am aware that Broadcas