Hi community,
I would like to ask how one can clear the Flink Kubernetes operator's state.
I have a sandbox k8s cluster with the Flink Kubernetes operator, where I've deployed
a Flink session cluster with a few session jobs. After playing around, and
breaking a few things here and there, I see this log:
023-0
Hi Krzysztof,
Sorry, there is indeed no document saying that the operator state is only kept in
memory, but based on the
current implementation it is indeed the case.
And I also need to correct one point: the SplitEnumerator is executed
on the JM side, inside the OperatorCoordinator.
Thank you both,
yes, it seems that the only option on a non-keyed operator would be list state,
my bad.
Yun Gao,
I'm wondering where you got the information that "Flink only supports
in-memory operator state"; can you point me to the documentation that says
that?
I cannot find any.
Hi Krzysztof,
If I understand right, I think managed operator state might not help here, since
currently Flink
only supports in-memory operator state.
Would it be possible, for now, to have a customized SplitEnumerator that skips the
processed files
in some other way? For example, if these files
Hi Krzysztof,
Non-keyed operator state only supports list-like state [1], as there exists no
primary key in operator state. That is to say, you cannot use map state in a
source operator.
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/fault-tolerance/state
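For illustration, a minimal sketch of what that restriction looks like in code, assuming a CheckpointedFunction (class and state names are invented for the example): the operator state store only hands out list-style state.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

    public class ProcessedFilesTracker extends RichMapFunction<String, String>
            implements CheckpointedFunction {

        // hypothetical non-keyed state: file paths this subtask has processed
        private transient ListState<String> processedFiles;

        @Override
        public void initializeState(FunctionInitializationContext ctx) throws Exception {
            // the operator state store offers only list-like state
            // (getListState / getUnionListState); there is no general map state here
            processedFiles = ctx.getOperatorStateStore().getListState(
                    new ListStateDescriptor<>("processed-files", String.class));
        }

        @Override
        public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
            // the list state is snapshotted as-is; nothing extra to do
        }

        @Override
        public String map(String path) throws Exception {
            processedFiles.add(path);
            return path;
        }
    }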
Hi,
Is it possible to use managed operator state, like MapState, in an
implementation of the new unified source interface [1]? I'm especially
interested in using managed state in a SplitEnumerator implementation.
I have a use case that is a variation of the File Source, where I will have a
great number
Just for the sake of experimenting and learning: let's assume that we have a
keyed process function using keyed state and we want to rewrite it using
operator state. The question is, would it be possible to keep the exact
same behaviour? For example, one could use operator union list state
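To make the union-list-state idea concrete, a hedged sketch (all names invented; note it only mimics keyed behaviour if the upstream partitioning routes keys to subtasks consistently, which keyBy's key-group assignment does not guarantee across rescalings): snapshot every (key, value) pair into union state, and on restore let each subtask keep only the keys it owns.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    public class UnionStateCounter extends ProcessFunction<Tuple2<String, Long>, Long>
            implements CheckpointedFunction {

        private transient ListState<Tuple2<String, Long>> unionState;
        private final Map<String, Long> counts = new HashMap<>();

        @Override
        public void initializeState(FunctionInitializationContext ctx) throws Exception {
            unionState = ctx.getOperatorStateStore().getUnionListState(
                    new ListStateDescriptor<>("counts",
                            TypeInformation.of(new TypeHint<Tuple2<String, Long>>() {})));
            if (ctx.isRestored()) {
                int subtask = getRuntimeContext().getIndexOfThisSubtask();
                int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();
                // union restore hands EVERY subtask the full list;
                // keep only the keys this subtask is responsible for
                for (Tuple2<String, Long> e : unionState.get()) {
                    if (Math.abs(e.f0.hashCode() % parallelism) == subtask) {
                        counts.put(e.f0, e.f1);
                    }
                }
            }
        }

        @Override
        public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
            unionState.clear();
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                unionState.add(Tuple2.of(e.getKey(), e.getValue()));
            }
        }

        @Override
        public void processElement(Tuple2<String, Long> in, Context ctx,
                Collector<Long> out) {
            out.collect(counts.merge(in.f0, in.f1, Long::sum));
        }
    }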
Hi Kristoff,
I'm not aware of any concrete plans for such a feature.
Best,
Fabian
On Sun, Apr 5, 2020 at 10:33 PM, KristoffSC <
krzysiek.chmielew...@gmail.com> wrote:
Hi,
according to [1], operator state and broadcast state (which is a "special"
type of operator state) are not stored in RocksDB at runtime when
RocksDB is chosen as the state backend.
Are there any plans to change this?
I'm working on a case where I will have a quite large
Hi
Do you use UNION state in your scenario? When using UNION state, the JM
may encounter OOM, because each TDD will contain the state of all
subtasks [1].
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#using-managed-operator-state
Best,
Congxian
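To make the distinction concrete, a short sketch of the two accessors (imports elided; the descriptor name is invented, and a real function would register one or the other, not both):

    // inside CheckpointedFunction#initializeState(FunctionInitializationContext ctx)
    ListStateDescriptor<Long> desc = new ListStateDescriptor<>("offsets", Long.class);

    // split redistribution: on restore each subtask receives a slice of the list
    ListState<Long> split = ctx.getOperatorStateStore().getListState(desc);

    // union redistribution: on restore EVERY subtask receives the complete list,
    // which is why large union state can blow up the JM-side task deployment
    // descriptors (TDDs) mentioned above
    ListState<Long> union = ctx.getOperatorStateStore().getUnionListState(desc);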
file for the checkpoint was 1.3GB (usually 11MB). Inside
the `_metadata` file we saw `- 1402496 offsets:
com.stripe.flink.backfill.kafka-archive-file-progress`. This happened
to be the operator state we were no longer initializing or
snapshotting after the refactoring.
Before I dig further into this
Hi Aaron,
I don't think we have such fine-grained metrics on per-operator state
size, but from your description that "YARN kills containers that are
exceeding their memory limits", I think the root cause is not the state
size but rather the memory consumption of the state backend.
My guess is
…when configuring the source [1, 2]. When resuming your job, pass the
> --allowNonRestoredState flag and your offsets will reset, but all other state
> will be retained.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/upgrading.html#matching-operator-state
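The matching described in [1] keys operator state to each operator's uid, so the usual recipe (imports elided; names invented) is to pin uids in the job and then resume with something like flink run -s <savepointPath> --allowNonRestoredState job.jar:

    // give stateful operators stable uids so savepoint state can be matched --
    // or, as in the advice above, left unmatched so it is dropped on restore
    DataStream<String> events = env
            .addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
            .uid("kafka-source-v2");  // changing the uid orphans the old offsets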
Hi,
We are running Flink 1.7, and recently, due to a Kafka cluster migration, we
needed to find a way to modify the Kafka offsets in FlinkKafkaConsumer's state.
We found that Flink 1.9's State Processor API is exactly the tool we need;
we are able to modify the operator state via the State Processor
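For later readers, a rough sketch of the shape of that 1.9 DataSet-based API, using a simplified Long-valued state instead of the Kafka source's real internal types (uids, state names, and paths are invented, and MyBootstrapFn stands for a StateBootstrapFunction<Long> that re-registers and fills the state; treat this as the shape of the API, not working code for the Kafka connector):

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.state.api.ExistingSavepoint;
    import org.apache.flink.state.api.OperatorTransformation;
    import org.apache.flink.state.api.Savepoint;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // read one operator's (union) list state out of the old savepoint
    ExistingSavepoint old =
            Savepoint.load(env, "hdfs:///savepoints/sp-1", new MemoryStateBackend());
    DataSet<Long> offsets = old.readUnionState("source-uid", "offsets", Types.LONG);

    // rewrite the values and bootstrap them into a new savepoint
    DataSet<Long> shifted = offsets.map(o -> o + 1000L);
    Savepoint.create(new MemoryStateBackend(), 128)
            .withOperator("source-uid",
                    OperatorTransformation.bootstrapWith(shifted)
                            .transform(new MyBootstrapFn()))
            .write("hdfs:///savepoints/sp-1-modified");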
Hi,
I'm not sure if there is some simple way of doing that (maybe other
contributors will know more).
There are two potential ideas worth exploring:
- use periodically triggered savepoints for monitoring? If I remember
correctly, savepoints are never incremental
- use the savepoint input/out
Hey Flink Community,
I'm working on a Flink application where we are implementing operators that
extend the RichFlatMapFunction and RichCoFlatMapFunction interfaces. As a result,
we are working directly with Flink's state API (ValueState, ListState, MapState).
Something that appears to be extremely valuable is having
Hi All,
Is there a way to share operator state among operators? For example, I have
an operator which has union state and the same state data is required in a
downstream operator. If not, is there a recommended way to share the state?
Thanks
The stack trace shows that the state is being restored, which has probably
already happened after the job restart. I am wondering why it restarted
after running for some time. Could you share the full job/task manager
logs?
On Thu, May 16, 2019 at 6:26 AM, anaray wrote:
Thank you, Andrey. The arity of the job has not changed. The issue here is that the job
will run for some time (with checkpointing enabled) and then, after a while,
run into the above exception. The job keeps restarting afterwards.
One thing that I want to point out here is that we have a custom *serialization
schema
…arity of some Tuple type changed in
some operator state. The number of tuple fields could have increased after
the job restart. In that case, Flink expects tuples with more fields stored in
the checkpoint and fails. Such a change would require an explicit state
migration. Could that be the case? When did the failure
…options.
Please advise.
java.lang.IllegalStateException: Could not initialize operator state
backend.
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initOperatorState(AbstractStreamOperator.java:301)
at
Hi,
I do not use custom serializers, and my job contains only a source and a
sink (BucketingSink). What causes this phenomenon in general?
> I suggest that you also update to a newer version, at least the latest
> bugfix release
Which version does this sentence refer to? And could you please help l
Hi,
that is correct. If you are using custom serializers, you should double-check
their correctness, maybe using our test base for type serializers. Another
reason could be that you modified the job in a way that silently changed the
schema somehow. Concurrent use of serializers across different
Hi Qingxiang,
Several days ago, Stefan described the causes of this anomaly in a problem
similar to this:
Typically, these problems have been observed when something was wrong with
a serializer or a stateful serializer was used from multiple threads.
Thanks, vino.
On Fri, Sep 21, 2018 at 3:20 PM, Marvin777 wrote:
Hi all,
When a Flink (1.4.2) job starts, it can find the checkpoint files on HDFS, but
an exception occurs during deserialization:
[image: image.png]
Do you have any insight on this?
Thanks,
Qingxiang Ma
Solved it by using a key selector that returns a constant, which creates a
"pseudo" keyed stream with only one partition.
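For later readers, the trick in code (imports elided; Event and DependencySorter are invented):

    // keyBy with a constant key selector sends every record to the same key
    // group -- a "pseudo" keyed stream with one active partition -- which makes
    // keyed state and timers available in the downstream function
    DataStream<Event> ordered = events
            .keyBy(e -> 0)                      // constant key
            .process(new DependencySorter());   // hypothetical KeyedProcessFunction

The trade-off is that all records then flow through a single parallel instance.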
Hi All,
I'm sorry if I'm double posting, but I posted this before subscribing and I
don't see it in my post list, so I'm posting it again.
The Flink app we are trying to build is as follows: read from Kafka, sort the
messages according to some dependency rules, and only send messages that
have satisfied
Hello Gyula & Stefan,
Below I attach a similar situation that I am trying to resolve [1].
I am also using *managed operator state*, but I have some trouble with the
Flink documentation; I believe it is not that clear.
So, I have the following questions:
1 -- Can I concatenate all the par
The documentation at
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/state.html#using-managed-operator-state
refers to the CheckpointedRestoring interface. Which jar defines this
interface? I can't find it.
…(e.g. snapshot and restore) to achieve this functionality? I think I don't
have enough knowledge about Flink's internals to implement this easily.
Gerard
Hello,
Is it possible to have managed operator state where all the parallel
operators know that they have the same state? Then it would only be
snapshotted by one of them, and in case of failure (or rescaling) all would
use that snapshot.
For the rescaling case I guess I could use union redistribution
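One hedged way to approximate "snapshot once, restore everywhere" with union redistribution (imports and field declarations elided; Model is invented): let only subtask 0 contribute to the union list, so every subtask restores that single copy.

    // inside a CheckpointedFunction with a replicated in-memory field `model`
    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        unionState.clear();
        // only one subtask writes, so the union list holds exactly one copy
        if (getRuntimeContext().getIndexOfThisSubtask() == 0) {
            unionState.add(model);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        unionState = ctx.getOperatorStateStore().getUnionListState(
                new ListStateDescriptor<>("model", Model.class));
        // union restore hands every subtask the full list (here: one element)
        for (Model m : unionState.get()) {
            model = m;
        }
    }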
…results in correct behavior.
Can I rely on Flink distributing state evenly, and in the order I return it
in the list?
Best regards,
Dmitry
On Tue, Feb 14, 2017 at 9:33 AM, Stefan Richter wrote:
Hi,
there is something that we call "raw keyed" operator state, which might exactly
serve your purpose. It is already used internally by Flink's window operator,
but there currently exists no public API for this feature. The way it works
currently is that you obtain input and output streams
…to the user and making it
configurable.
Looping in Stefan (in cc), who mostly worked on this part, to see if he can
provide more info.
- Gordon
On February 14, 2017 at 2:30:27 AM, Dmitry Golubets (dgolub...@gmail.com) wrote:
Hi,
It looks impossible to implement a keyed state with operator state now.
I know it sounds like "just use a keyed state", but the latter requires
updating it on every value change, as opposed to operator state, and thus can
be expensive (especially if you have to deal with mutable structures
Hi Dominik,
as Gordon's response only covers keyed state, I will briefly explain what
happens for non-keyed operator state. In contrast to Flink 1.1, Flink 1.2
checkpointing does not write a single black-box object (e.g., ONE object that is
the set of all Kafka offsets), but a list of
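Concretely, a sketch with invented types (not the actual Kafka source code): because the snapshot is a list of independent elements, Flink can hand different slices of it to different subtasks when the parallelism changes.

    // one (partition, offset) pair per list element; on rescaling Flink
    // redistributes the elements instead of moving one opaque blob
    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        offsetState.clear();
        for (Map.Entry<Integer, Long> e : currentOffsets.entrySet()) {
            offsetState.add(Tuple2.of(e.getKey(), e.getValue()));
        }
    }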
Hi everyone,
In the case of scaling out a Flink cluster, how does Flink handle operator
state partitioning of a staged topology?
Regards,
Dominik
Hi Aljoscha,
Thanks for this one. Looking forward to the 0.10 release.
Cheers
On Thu, Nov 12, 2015 at 5:34 PM, Aljoscha Krettek wrote:
Hi,
I don't know yet when the operator state will be transitioned to managed memory,
but it could happen for 1.0 (which will come after 0.10). The good thing is
that the interfaces won't change, so state can be used as it is now.
For 0.10, the release vote is winding down right now, so you can
Hi Stephan,
> Storing the model in OperatorState is a good idea, if you can. On the
> roadmap is to migrate the operator state to managed memory as well, so that
> should take care of the GC issues.
Is this using off-heap memory? Which version do we expect this to be
available in?
Another
…better throughput/latency/consistency and have one less system to worry about
(external k/v store). State outside means that the Flink processes can be
slimmer and need fewer resources, and as such recover a bit faster. There are
use cases for that as well.
Storing the model in OperatorState is a good idea, if you can. On the
roadmap is to migrate the operator state to managed memory as well, so that
should take care of the GC issues.
We are just adding functionality to make the Key/Value operator state
usable in CoMap/CoFlatMap as
…so I'm thinking that instead of
using an in-memory cache like Redis, Ignite, etc., I can just use operator state
for this one.
I just want to gauge whether I need a memory cache, or whether operator state
would be just fine.
However, I'm concerned about Gen 2 garbage collection for caching our own
state
Let me understand your case better here. You have a stream of models and a
stream of data. To process the data, you will need a way to access your
model from the subsequent stream operations (map, filter, flatMap, …).
I'm not sure in which case operator state is a good choice, but I think you
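For the archive, a sketch of that shape (written against today's DataStream API, imports elided; Data, Model, Result and score() are invented): connect the model stream to the data stream and cache the latest model where the data path can see it.

    DataStream<Result> results = dataStream
            .connect(modelStream)
            .flatMap(new CoFlatMapFunction<Data, Model, Result>() {
                private Model current;  // cached model, updated by stream 2

                @Override
                public void flatMap1(Data d, Collector<Result> out) {
                    if (current != null) {
                        out.collect(current.score(d));  // hypothetical scoring call
                    }
                }

                @Override
                public void flatMap2(Model m, Collector<Result> out) {
                    current = m;  // a new model arrived; replace the cache
                }
            });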
…subsequent query.
We are considering using Flink operator state for that one.
Is that the right approach for a memory cache? Or is it preferable to use a
memory cache like Redis, etc.?
Any comments will be appreciated.
Cheers
--
Welly Tambunan
Triplelands
http://weltam.wordpress.com
http