The Apache Flink community is very happy to announce the release of Apache
Flink Stateful Functions 3.2.0.
Stateful Functions is an API that simplifies building distributed stateful
applications.
It's based on functions with persistent state that can interact dynamically
with strong consistency guarantees.
This is really great news. Thanks a lot for all the work Dong, Yun, Zhipeng
and others!
Cheers,
Till
On Fri, Jan 7, 2022 at 2:36 PM David Morávek wrote:
> Great job! <3 Thanks Dong and Yun for managing the release and big thanks
> to everyone who has contributed!
>
> Best,
> D.
>
> On Fri, Jan
+1 for dropping the MapR FS.
Cheers,
Till
On Wed, Jan 5, 2022 at 10:11 AM Martijn Visser
wrote:
> Hi everyone,
>
> Thanks for your input. I've checked the MapR implementation and it has no
> annotation at all. Given the circumstances that we thought that MapR was
> already dropped, I would prop
I haven't seen any changes or requests to/for Gelly in ages. Hence, I would
assume that it is not really used and can be removed.
+1 for dropping Gelly.
Cheers,
Till
On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser wrote:
> Hi everyone,
>
> Flink is bundled with Gelly, a Graph API library [1]. Th
Thanks a lot for being our release manager and swiftly addressing the log4j
CVE Igal!
Cheers,
Till
On Wed, Dec 22, 2021 at 5:41 PM Igal Shilman wrote:
> The Apache Flink community is very happy to announce the release of Apache
> Flink Stateful Functions (StateFun) 3.1.1.
>
> This is a bugfix r
If there are no users strongly objecting to dropping Hadoop support for <
2.8, then I am +1 for this since otherwise we won't gain a lot as Xintong
said.
Cheers,
Till
On Wed, Dec 22, 2021 at 10:33 AM David Morávek wrote:
> Agreed, if we drop the CI for lower versions, there is actually no point
Thanks a lot for driving these releases Chesnay! This is super helpful for
the community.
For the synchronization problem, I guess we just have to wait a bit longer.
Cheers,
Till
On Fri, Dec 17, 2021 at 7:39 AM Leonard Xu wrote:
> I guess this is related to publishers everywhere are updating t
As part of this FLIP, does it make sense to also extend the documentation
for the sort shuffle [1] to include a tuning guide? I am thinking of a more
in depth description of what things you might observe and how to influence
them with the configuration options.
[1]
https://nightlies.apache.org/fli
Thanks for starting this discussion Yingjie,
How will our tests be affected by these changes? Will Flink require more
resources and, thus, will it risk destabilizing our testing infrastructure?
I would propose to create a FLIP for these changes since you propose to
change the default behaviour. I
Great news, Yingjie. Thanks a lot for sharing this information with the
community and kudos to all the contributors of the external shuffle service
:-)
Cheers,
Till
On Tue, Nov 30, 2021 at 2:32 PM Yingjie Cao wrote:
> Hi dev & users,
>
> We are happy to announce the open source of remote shuffl
I'm not sure about ZK compatibility, but we'd also upgrade Curator to
> 5.x, which doesn't support ZooKeeper 3.4 anymore.
>
> On 25/11/2021 21:56, Till Rohrmann wrote:
> > Should we ask on the user mailing list whether anybody is still using
> > ZooKeeper 3.4 and
a lot working fine now. Also, can you explain how to pass
> parameters to the job. In the session cluster I am passing arguments using
> api.
>
> How can I pass the arguments to the job here?
>
> Regards,
>
> Ravi Sankar Reddy.
>
am.readOrdinaryObject(ObjectInputStream.java:2186)
> ~[?:1.8.0_312]
>
> at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
> ~[?:1.8.0_312]
>
> at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2404)
> ~[?:1.8.0_312
Hi Ravi,
The reactive mode shouldn't do things differently compared to a normal
application cluster deployment. Maybe you can show us exactly how you
submit a job, the contents of the bundled jar, how you build the fat jar
and the logs of the failed Flink run.
Moving this discussion to the user M
We are getting the attached exception when running the
> application written in Flink version 1.13.2. AKS is 1.21. Please let us
> know if you need additional information.
>
>
>
> Regards,
>
> Sundar.
>
> *From:* Till Rohrmann
> *Sent:* Monday, 1 Nov
Hi Deepa,
I cannot see the attached image. Maybe you can post the description of the
problem.
I am also moving this discussion to the user ML.
Cheers,
Till
On Mon, Nov 1, 2021 at 9:05 AM Sekaran, Sreedeepa <
sreedeepa.seka...@westpac.com.au> wrote:
> Hi Team,
>
>
>
> We are working with Flink
Thanks Chesnay and Martijn for managing this release and to everyone who
contributed to it.
Cheers,
Till
On Fri, Oct 22, 2021 at 11:04 AM Yangze Guo wrote:
> Thank Chesnay, Martijn, and everyone involved!
>
> Best,
> Yangze Guo
>
> On Fri, Oct 22, 2021 at 4:25 PM Yun Tang wrote:
> >
> > Thanks
Thanks for the hint with the managed search engines Matthias. I think this
is quite helpful.
Cheers,
Till
On Wed, Sep 15, 2021 at 4:27 PM Matthias Pohl
wrote:
> Thanks Leonard for the announcement. I guess that is helpful.
>
> @Robert is there any way we can change the default setting to someth
Forwarding the discussion back to the user mailing list.
On Thu, Sep 2, 2021 at 12:25 PM Till Rohrmann wrote:
> The stack trace looks ok. This happens whenever the leader loses
> leadership and this can have different reasons. What's more interesting is
> what happens before
Hi Xiangyu,
Can you provide us with more information about your job, which state
backend you are using and how you've configured the checkpointing? Can you
also provide some information about the problematic checkpoints (e.g.
alignment time, async/sync duration) that you find on the checkpoint
det
Hi Xiangyu,
Do you have the logs of the problematic test run available? Ideally, we can
enable the DEBUG log level to get some more information. I think this
information would be needed to figure out the problem.
Cheers,
Till
On Thu, Sep 2, 2021 at 11:47 AM Xiangyu Su wrote:
> Hello Everyone,
Great news! Thanks a lot for all your work on the new release :-)
Cheers,
Till
On Wed, Sep 1, 2021 at 9:07 AM Johannes Moser wrote:
> Congratulations, great job. 🎉
>
> On 31.08.2021, at 17:09, Igal Shilman wrote:
>
> The Apache Flink community is very happy to announce the release of Apache
>
Cool, thanks for letting us know Jeff. Hopefully, many users use Zeppelin
together with Flink.
Cheers,
Till
On Thu, Aug 26, 2021 at 4:47 AM Leonard Xu wrote:
> Thanks Jeff for the great work !
>
> Best,
> Leonard
>
> On Aug 25, 2021, at 22:48, Jeff Zhang wrote:
>
> Hi Flink users,
>
> We (Zeppelin commun
Hi Yanjie,
The observed exception in the logs is just a side effect of the shut down
procedure. It is a bug that shutting down the Dispatcher will result in a
fatal exception coming from the ApplicationDispatcherBootstrap. I've
created a ticket in order to fix it [1].
The true reason for stopping
This is great news. Thanks a lot for being our release manager Godfrey and
also to everyone who made this release possible.
Cheers,
Till
On Tue, Aug 10, 2021 at 11:09 AM godfrey he wrote:
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.11.4, which is the f
This is great news. Thanks a lot for being our release manager Jingsong and
also to everyone who made this release possible.
Cheers,
Till
On Tue, Aug 10, 2021 at 10:57 AM Jingsong Lee
wrote:
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.12.5, which is th
Thanks Yun Tang for being our release manager and the great work! Also
thanks a lot to everyone who contributed to this release.
Cheers,
Till
On Mon, Aug 9, 2021 at 9:48 AM Yu Li wrote:
> Thanks Yun Tang for being our release manager and everyone else who made
> the release possible!
>
> Best R
Hi Rainie,
It looks to me as if Yarn is causing this problem. Which Yarn node are you
terminating? Have you configured your Yarn cluster to be highly available
in case you are terminating the ResourceManager?
Flink should retry the operation of starting a new container in case it
fails. If this i
(I
> understand that normally, as user code is not a JVM-blocking activity such
> as a GC, it should have no impact on heartbeats, but from experience, it
> really does)
>
>
>
> Cheers,
>
> Arnaud
>
> *From:* Gen Luo
> *Sent:* Thursday, July 22, 20
about long pauses of unresponsiveness of Flink.
[1] https://issues.apache.org/jira/browse/FLINK-23216
[2] https://issues.apache.org/jira/browse/FLINK-22483
Cheers,
Till
On Wed, Jul 21, 2021 at 4:58 AM Yang Wang wrote:
> Thanks @Till Rohrmann for starting this discussion
>
> Firstly,
Hi everyone,
Since Flink 1.5 we have the same heartbeat timeout and interval default
values that are defined as heartbeat.timeout: 50s and heartbeat.interval:
10s. These values were mainly chosen to compensate for lengthy GC pauses
and blocking operations that were executed in the main threads of
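The defaults described above map onto flink-conf.yaml entries like these (a sketch; the duration syntax is the one used in the thread, older releases take plain millisecond values):

```yaml
# flink-conf.yaml — the defaults discussed above, shown for reference
heartbeat.timeout: 50s
heartbeat.interval: 10s
```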
lism settings have not changed.
>>
>>
>>
>> On Tue, Jul 13, 2021 at 12:11 PM Dan Hill wrote:
>>
>>> Hey. I just hit a similar error in production when trying to
>>> savepoint. We also use protobufs.
>>>
>>> Has anyone found a b
y helpful to find the lost container quickly. In our inner
>>> flink version, we optimize it by task's report and jobmaster's probe. When
>>> a task fails because of the connection, it reports to the jobmaster. The
>>> jobmaster will try to confirm the liveness
double check should be good
> enough. But since I'm not experienced with an unstable environment, I can't
> tell whether that's also enough for it.
>
> On Mon, Jul 5, 2021 at 5:59 PM Till Rohrmann wrote:
>
>> I think for RPC communication there are retry strategie
ations for
> fault tolerance of heartbeat, unless we also introduce some retry
> strategies for netty connections.
>
>
> On Fri, Jul 2, 2021 at 9:34 PM Till Rohrmann wrote:
>
>> Could you share the full logs with us for the second experiment, Lu? I
>> cannot tell
wrote:
>>>
>>>> Thanks TIll and Yang for help! Also Thanks Till for a quick fix!
>>>>
>>>> I did another test yesterday. In this test, I intentionally throw
>>>> exception from the source operator:
>>>> ```
>>>> if (ru
nt.
> As a result, Flink will not deploy tasks onto the dead TaskManagers.
>
>
> I think most of the time cost in Phase 1 might be cancelling the tasks on
> the dead TaskManagers.
>
>
> Best,
> Yang
>
>
> On Thu, Jul 1, 2021 at 4:49 PM, Till Rohrmann wrote:
>
>> The
The analysis of Gen is correct. Flink currently uses its heartbeat as the
primary means to detect dead TaskManagers. This means that Flink will take
at least `heartbeat.timeout` time before the system recovers. Even if the
cancellation happens fast (e.g. by having configured a low
akka.ask.timeout)
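The timing argument above can be sketched in plain Python, under the simplifying assumption that the timeout timer restarts on every received heartbeat:

```python
def detection_bounds(timeout_s: float, interval_s: float):
    """Best/worst case time until a dead TaskManager is noticed.

    Best case: the TaskManager dies right when a heartbeat was due, so the
    full timeout still has to elapse. Worst case: it dies just after
    answering, adding up to one heartbeat interval of extra slack.
    """
    return timeout_s, timeout_s + interval_s

# With the defaults discussed in this thread (50s timeout, 10s interval),
# recovery starts no earlier than ~50s and possibly ~60s after the failure.
best, worst = detection_bounds(50.0, 10.0)
```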
Thanks for starting this discussion. I do see the benefit of dynamically
configuring your Flink job and the cluster running it. Some of the use
cases which were mentioned here are already possible. E.g. adjusting the
log level dynamically can be done by configuring an appropriate logging
backend an
Hi Li,
Roman is right about Flink's behavior and what you can do about it. The
idea behind its current behavior is the following: If Flink cannot recover
a job, it is very hard for it to tell whether it is due to an intermittent
problem or a permanent one. No matter how often you retry, you can al
Great :-)
On Tue, Jun 8, 2021 at 1:11 PM Yingjie Cao wrote:
> Hi Till,
>
> Thanks for the suggestion. The blog post is already on the way.
>
> Best,
> Yingjie
>
> On Tue, Jun 8, 2021 at 5:30 PM, Till Rohrmann wrote:
>
>> Thanks for the update Yingjie. Would it make sense to
Thanks for the update Yingjie. Would it make sense to write a short blog
post about this feature including some performance improvement numbers? I
think this could be interesting to our users.
Cheers,
Till
On Mon, Jun 7, 2021 at 4:49 AM Jingsong Li wrote:
> Thanks Yingjie for the great effort!
tead of
>
>> this.producerIdAndEpoch = ProducerIdAndEpoch.NONE;
>
>
> On Thu, Jun 3, 2021 at 12:11 AM Till Rohrmann
> wrote:
>
>> Thanks for the update. Skimming over the code it looks indeed that we are
>> overwriting the values of the static value ProducerIdAndEpoch.NONE.
more familiar could double check this part of the code.
Concerning the required changing of the UID of an operator Piotr, is this a
known issue and documented somewhere? I find this rather surprising from a
user's point of view.
Cheers,
Till
On Thu, Jun 3, 2021 at 8:53 AM Till Rohrmann
enqueueRequest(handler);
return handler.result;
}, State.INITIALIZING);
}
On Wed, Jun 2, 2021 at 3:55 PM Piotr Nowojski wrote:
> Hi,
>
> I think there is no generic way. If this error has happened indeed after
> starting a second job from the same savepoint, or som
> FLINK-18450 (https://issues.apache.org/jira/browse/FLINK-18450)
>
> Thanks,
> Alexey
> --
> *From:* Till Rohrmann
> *Sent:* Tuesday, June 1, 2021 6:24 AM
> *To:* Alexey Trenikhun
> *Cc:* Flink User Mail List
> *Subject:* Re: How
Hi Angelo,
what Svend has written is very good advice. Additionally, you could give us
a bit more context by posting the exact stack trace and the exact
configuration you use to deploy the Flink cluster. To me this looks like a
configuration/setup problem in combination with AWS.
Cheers,
Till
On
Hi Min,
Usually, you should be able to provide type information and thereby a
serializer via the StateDescriptors which you create to access the state.
If this is not working, then you need to give us a bit more context to
understand what's not working.
I am also pulling in Seth who is the origin
Responded as part of the following discussion
https://lists.apache.org/x/thread.html/re85a1fb3f17b5ef1af2844fc1a07b6d2f5e6237bf4c33059e13890ee@%3Cuser.flink.apache.org%3E.
Let's continue the discussion there.
Cheers,
Till
On Mon, May 31, 2021 at 11:02 AM 周瑞 wrote:
> HI:
> When "sink.seman
Hi Lingfeng,
Youngwoo is right. Flink currently officially supports Java 8 and Java 11.
Cheers,
Till
On Mon, May 31, 2021 at 9:00 AM Youngwoo Kim (김영우) wrote:
> Hi Lingfeng,
>
> I believe Java 8 or 11 is appropriate for the Flink cluster at this point.
> I'm not sure that Flink 1.13 supports J
Responded as part of the following discussion
https://lists.apache.org/x/thread.html/re85a1fb3f17b5ef1af2844fc1a07b6d2f5e6237bf4c33059e13890ee@%3Cuser.flink.apache.org%3E.
Let's continue the discussion there.
Cheers,
Till
On Sun, May 30, 2021 at 2:32 PM 周瑞 wrote:
>
> The program is used to test Flink Kafka exactly
Hi Tao,
This looks like a bug to me. Could it be that this problem is
covered by [1, 2]? Maybe you can review this PR and check whether it solves
the problem. If yes, then let's quickly get it in.
[1] https://issues.apache.org/jira/browse/FLINK-21445
[2] https://github.com/apache/flink/pu
Hi Kai,
The rejection you are seeing should not be serious. The way this can happen
is the following: If Yarn restarts the application master, Flink will try
to recover previously started containers. If this is not possible or Yarn
only tells about a subset of the previously allocated containers,
Hi Mason,
The idea is that a metric is not uniquely identified by its name alone but
instead by its path. The groups in which it is defined specify this path
(similar to directories). That's why it is valid to specify two metrics
with the same name if they reside in different groups.
I think Prom
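A toy illustration of that identification scheme (plain Python, not Flink's MetricGroup API): a metric's identity is its full path, so equal names under different groups coexist, while a true duplicate is rejected.

```python
class ToyMetricRegistry:
    """Metrics are keyed by their full path: group segments plus name."""

    def __init__(self):
        self._metrics = {}

    def register(self, group_path, name, metric):
        key = (*group_path, name)
        if key in self._metrics:
            raise ValueError("duplicate metric path: " + ".".join(key))
        self._metrics[key] = metric

reg = ToyMetricRegistry()
# Same metric name under different groups: both registrations are valid.
reg.register(("job", "operatorA"), "numRecordsIn", 0)
reg.register(("job", "operatorB"), "numRecordsIn", 0)
```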
Hi Alexey,
looking at KafkaTopicPartitionStatus, it looks like it does not contain
this information. In a nutshell, what you probably have to do is to
aggregate the watermarks across all partitions and then pause the
consumption of a partition if its watermark advances too much wrt to the
minimum
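That aggregation can be sketched in isolation (plain Python, not the connector API; the drift threshold is an illustrative parameter):

```python
def partitions_to_pause(watermarks, max_drift_ms):
    """watermarks: partition id -> current watermark in ms.

    Returns the partitions whose watermark has advanced more than
    max_drift_ms past the minimum across all partitions; consumption of
    those would be paused until the slow partitions catch up.
    """
    low = min(watermarks.values())
    return {p for p, wm in watermarks.items() if wm - low > max_drift_ms}

paused = partitions_to_pause(
    {"partition-0": 1_000, "partition-1": 9_000, "partition-2": 2_500}, 5_000)
```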
The error message says that we are trying to reuse a transaction id that is
currently being used or has expired.
I am not 100% sure how this can happen. My suspicion is that you have
resumed a job multiple times from the same savepoint. Have you checked that
there is no other job which has been re
Hi Dipanjan,
this type of question is best sent to Flink's user mailing list because
there are a lot more people using Flink who could help you. The dev mailing
list is intended to be used for development discussions.
Cheers,
Till
On Tue, Jun 1, 2021 at 1:31 PM Dipanjan Mazumder wrote:
> Hi ,
Hi Dipanjan,
I am assuming that you are using the flink-siddhi library [1]. I am not an
expert but it looks as if the AbstractSiddhiOperator overrides the
snapshotState [2] method to store the Siddhi state in Flink.
[1] https://github.com/haoch/flink-siddhi
[2]
https://github.com/haoch/flink-sidd
Thanks for the great work Dawid and to everyone who has contributed to this
release.
Cheers,
Till
On Mon, May 31, 2021 at 10:25 AM Yangze Guo wrote:
> Thanks, Dawid for the great work, thanks to everyone involved.
>
> Best,
> Yangze Guo
>
> On Mon, May 31, 2021 at 4:14 PM Youngwoo Kim (김영우)
>
Hi Dipanjan,
Please double check whether the libraries are really contained in the job
jar you are submitting because if the library is contained in this jar,
then it should be on the classpath and you should be able to load it.
Cheers,
Till
On Thu, May 20, 2021 at 3:43 PM Dipanjan Mazumder wro
;bucket-states",new
> JavaSerializer());
>
> Hope this is helpful.
>
> Yours sincerely
> Josh
>
>
>
> On Tue, May 18, 2021 at 2:54 PM, Till Rohrmann wrote:
>
>> Hi Joshua,
>>
>> could you try whether the job also fails when not using the gzip format?
>>
>>> be triggered now by newer versions of Flink.
>>>
>>> I found this on StackOverflow, which looks like it could be related:
>>> https://stackoverflow.com/questions/38326183/jvm-crashed-in-java-util-zip-zipfile-getentry
>>> Can you try the suggested option
Hi Adi,
To me, this looks like a version conflict of some kind. Maybe you use
different Avro versions for your user program and on your Flink cluster.
Could you check that you don't have conflicting versions on your classpath?
It would also be helpful to have a minimal example that allows
reproduc
One small addition: the old mapping appears to use
SubtaskStateMapper.RANGE whereas the new mapping appears to use
SubtaskStateMapper.ROUND_ROBIN.
On Mon, May 17, 2021 at 11:56 AM Till Rohrmann wrote:
> Hi ChangZhuo Chen,
>
> This looks like a bug in Flink. Could you provide us
Hi ChangZhuo Chen,
This looks like a bug in Flink. Could you provide us with the logs of the
run and more information about your job? In particular, what does your
topology look like?
My suspicion is the following: You have an operator with two inputs. One
input is keyed whereas the other input is
> ticket do?
>
> Thanks,
> Sonam
> --
> *From:* Till Rohrmann
> *Sent:* Monday, April 26, 2021 10:24 AM
> *To:* dev
> *Cc:* user@flink.apache.org ; Sonam Mandal <
> soman...@linkedin.com>
> *Subject:* Re: Task Local Recovery with mou
Yes, exposing an API to adjust the parallelism of individual operators is
definitely a good step towards the auto-scaling feature which we will
consider. The missing piece is persisting this information so that in case
of recovery you don't recover with a completely different parallelism.
I also a
Hi Vishal,
thanks a lot for all your feedback on the new reactive mode. I'll try to
answer your questions.
0. In order to avoid confusion let me quickly explain a bit of terminology:
The reactive mode is the new feature that allows Flink to react to newly
available resources and to make use of th
Hi Prasanna,
in the latest Flink version (1.13.0) I couldn't find these dependencies.
Which version of Flink are you looking at? What you could check is whether
one of these dependencies is contained in one of Flink's shaded
dependencies [1].
[1] https://github.com/apache/flink-shaded
Cheers,
Ti
Somewhere the system retrieves the address x.x.x.x:43092 which cannot be
connected to. Can you check that this points towards a valid Flink process?
Maybe it is some leftover information in the ZooKeeper from a previous run?
Maybe you can check what's written in the Znodes for
/leader/resource_mana
This is great news. Thanks a lot for being our release managers Dawid and
Guowei! And also thanks to everyone who has made this release possible :-)
Cheers,
Till
On Mon, May 3, 2021 at 5:46 PM vishalovercome wrote:
> This is a very big release! Many thanks to the flink developers for their
> co
only
> set this once at the first event. But when testing it does work as I want
> and fires every ten seconds and the fires and purges only after no events
> have been received for 2 minutes (as specified in the SessionWindow). Is
> the processingTimeTimer being updated every
apache.flink.streaming.api.operators.SourceOperator.snapshotState(SourceOperator.java:288)
> at
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:205)
> ... 20 more
>
>
>
> --
> *From:*
Great to hear. Thanks a lot for being our release manager Arvid and to
everyone who has contributed to this release!
Cheers,
Till
On Thu, Apr 29, 2021 at 4:11 PM Arvid Heise wrote:
> Dear all,
>
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.12.3, which i
ssionWindows.withGap(Time.seconds(60))).aggregate(new
> SensorEventAggregator)
>
> L
>
>
> --
> *From:* Till Rohrmann
> *Sent:* Thursday, April 29, 2021 09:16
> *To:* Lars Skjærven ; Becket Qin <
> becket@gmail.com>
> *Cc:* user@fl
ueStates in both
> cases?
>
> On Thu, 29 Apr 2021 at 10:25, Till Rohrmann wrote:
>
>> Hi Tim,
>>
>> I think you could use Flink's trigger API [1] to implement a trigger
>> which fires when it sees a certain event or after some time.
>>
>> [1]
ython
> code on a Flink cluster.
>
> Thanks,
> Sumeet
>
>
>
> On Thu, Apr 29, 2021 at 12:37 PM Till Rohrmann
> wrote:
>
>> Hi Sumeet,
>>
>> Is there a problem with the documented approaches on how to submit the
>> Python program (not working) o
will update you.
>
> Regards
> Sambaran
>
> On Wed, Apr 28, 2021 at 5:33 PM Till Rohrmann
> wrote:
>
>> Hi Sambaran,
>>
>> could you also share the cause why the checkpoints could not be discarded
>> with us?
>>
>> With Flink 1.10, we introduced
Hi Lars,
I think this is a duplicate message. Let's continue the discussion on your
original message.
Cheers,
Till
On Wed, Apr 28, 2021 at 8:50 PM Lars Skjærven wrote:
> Hello,
> I ran into an issue when using the new KafkaSourceBuilder (running Flink
> 1.12.2, scala 2.12.13, on ververica plat
Hi Sandeep,
I don't fully understand the problematic scenario yet. What exactly is the
HA state maintained by Kubernetes in S3?
Queryable state works by asking for the current state of an operator. If
you use asQueryableState, then you create a reducing state which appends
all stream elements. Th
Hi Tim,
I think you could use Flink's trigger API [1] to implement a trigger which
fires when it sees a certain event or after some time.
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#triggers
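The decision logic of such a trigger can be sketched on its own (plain Python, not the Flink Trigger interface; the marker value and timeout are illustrative):

```python
def should_fire(event, window_start_ms, now_ms,
                end_marker="SESSION_END", timeout_ms=60_000):
    """Fire when the special event arrives or the window has been open
    longer than timeout_ms, whichever happens first."""
    return event == end_marker or (now_ms - window_start_ms) >= timeout_ms
```

In Flink terms, the event check would live in Trigger.onElement, and the time-based part would be a timer registered there that fires via onProcessingTime or onEventTime.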
Cheers,
Till
On Wed, Apr 28, 2021 at 5:25 PM Tim Josefs
Hi Zachary,
How did you configure the Kafka connector to commit the offsets
(periodically, on checkpoint)? One explanation for the graphs you showed is
that you enabled periodic committing of the offsets. If this
automatic commit happens between two checkpoints and you later fall back to
the earli
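A toy timeline of that scenario (illustrative offsets, not connector code): an auto-commit lands between two checkpoints, then recovery rewinds to the checkpoint, so the externally committed offset appears to jump backwards.

```python
checkpoint_offset = 100   # offset captured by the last completed checkpoint
auto_commit_offset = 130  # periodic auto-commit that ran after the checkpoint

# On failure, Flink restores from the checkpoint, not the committed offset,
# so these records are read a second time and the committed offset regresses.
resume_offset = checkpoint_offset
replayed = auto_commit_offset - resume_offset
```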
Hi Yegor,
If you want to use Flink's keyed windowing logic, then you need to insert a
keyBy/shuffle operation because Flink currently cannot simply use the
partitioning of the Kinesis shards. The reason is that Flink needs to group
the keys into the correct key groups in order to support rescaling
Hi Lars,
The KafkaSourceBuilder constructs the new KafkaSource which has not been
fully hardened in 1.12.2. In fact, it should not be documented yet. I think
you are running into an instability/bug in it. The new Kafka source should be
hardened a lot more in the 1.13.0 release.
Could you tell us exa
Hi Sumeet,
Is there a problem with the documented approaches on how to submit the
Python program (not working) or are you asking in general? Given the
documentation, I would assume that you can configure the requirements.txt
via `set_python_requirements`.
I am also pulling in Dian who might be ab
Hi Jacob,
one of the contracts Flink has is that if a UDF throws an exception then
this means that it has failed and that it needs recovery. Hence, it is the
responsibility of the user to make sure that tolerable exceptions do not
bubble up. If you have dirty input data then it might make sense to
Hi Sambaran,
could you also share the cause why the checkpoints could not be discarded
with us?
With Flink 1.10, we introduced a stricter memory model for the
TaskManagers. That could be a reason why you see more TaskManagers being
killed by the underlying resource management system. You could ma
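The relevant knobs look roughly like this in flink-conf.yaml (option names from Flink 1.10+; the values are purely illustrative and need tuning per deployment):

```yaml
taskmanager.memory.process.size: 4g       # total budget, incl. JVM overhead
taskmanager.memory.managed.fraction: 0.4  # share reserved for managed memory
```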
Hi Sonam,
sorry for the late reply. We were a bit caught in the midst of the feature
freeze for the next major Flink release.
In general, I think it is a very good idea to disaggregate the local state
storage to make it reusable across TaskManager failures. However, it is
also not trivial to do.
I should always be the first choice.
> Could you help me to understand why the FlinkYarnSessionCli can be
> activated?
>
>
> Best,
> Yangze Guo
>
> On Mon, Apr 26, 2021 at 4:48 PM Till Rohrmann
> wrote:
> >
> > Hi Tony,
> >
> > I think you are ri
Hi Tony,
I think you are right that Flink's cli does not behave super consistently at
the moment. Case 2. should definitely work because `-t yarn-application`
should overwrite what is defined in the Flink configuration. The problem
seems to be that we don't resolve the configuration wrt the specifie
hts on a good minimum state size we should experiment
> with to check recovery time differences between the two modes?
>
> Thanks,
> Sonam
> --
> *From:* dhanesh arole
> *Sent:* Wednesday, April 7, 2021 3:43:11 AM
> *To:* Till Rohrmann
>
Hi Dhanesh,
The way local state works in Flink currently is the following: The user
configures a `taskmanager.state.local.root-dirs` or the tmp directory is
used where Flink creates a "localState" directory. This is the base
directory for all local state. Within this directory a TaskManager create
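As a sketch, the options involved in that setup look like this (the path is illustrative):

```yaml
# flink-conf.yaml — enable local recovery and pick the local state base dir
state.backend.local-recovery: true
taskmanager.state.local.root-dirs: /mnt/flink/local-state
```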
I actually think that the logging problem is caused by Hadoop 2.7.3 which
pulls in the slf4j-log4j12-1.7.10.jar. This binding is then used but there
is no proper configuration file for log4j because Flink actually uses
log4j2.
Cheers,
Till
On Fri, Apr 9, 2021 at 12:05 PM Till Rohrmann wrote
Hi Yik San,
to me it looks as if there is a problem with the job and the deployment.
Unfortunately, the logging seems not to have worked. Could you check that
you have a valid log4j.properties file in your conf directory?
Cheers,
Till
On Wed, Apr 7, 2021 at 4:57 AM Yik San Chan
wrote:
> *The q
Hi,
What you could also do is to create several heap dumps [1] whenever you
submit a new job. This could allow us to analyze whether there is something
increasing the heap memory consumption. Additionally, you could try to
upgrade your cluster to Flink 1.12.2 since we fixed some problems Maciek
me
cluster" part -
> does it always require a cluster restart whenever the /lib directory
> changes?
>
> Thanks.
>
> Best,
> Yik San
>
> On Fri, Apr 9, 2021 at 1:07 AM Till Rohrmann wrote:
>
>> Hi Yik San,
>>
>> for future reference, I copy my an
Hi Yik San,
for future reference, I copy my answer from the SO here:
The reason for this difference is that for Hive it is recommended to start
the cluster with the respective Hive dependencies. The documentation [1]
states that it's best to put the dependencies into the lib directory before
you
Hi Flavio,
I tried to execute the code snippet you have provided and I could not
reproduce the problem.
Concretely I am running this code:
final EnvironmentSettings envSettings = EnvironmentSettings.newInstance()
        .useBlinkPlanner()
        .inStreamingMode()
        .build();
final TableEnvironment
Hi Kevin,
when decreasing the TaskManager count I assume that you also decrease the
parallelism of the Flink job. There are three aspects which can then cause
a slower recovery.
1) Each Task gets a larger key range assigned. Therefore, each TaskManager
has to download more data in order to restar