As Yangze stated, the ticket cache will expire after its lifespan.
Please be aware that when a keytab is used, Flink obtains delegation
tokens which are never actually used.
The fact that delegation token handling is not functioning is a known issue
and we are working on a fix.
w/o delegation toke
Hi Raul,
On all systems the keystore is normally needed on the server side and the
truststore on the client side.
As a result it's highly advised to use different config files in these
places.
It's easy to see why it would be a security leak if the keystore were
available on the client side (the client can fake a
export SSL_PASSWORD=secret
flink run -yDsecurity.ssl.rest.*-password=$SSL_PASSWORD ... app.jar
This way the code which starts the workload can store the passwords in a
centrally protected area.
This can still be hacked, but at least the password is not stored in a
plain text file.
BR,
G
On Tue, Jan 18, 2022 at 10
Thanks for pinging me!
Yes, finishing this feature is my main target, however there are major
code parts which are still missing.
Please have a look at the umbrella JIRA to get a better understanding:
https://issues.apache.org/jira/browse/FLINK-21232
In general it's not advised to use it for pro
Flink tried to create the following dir: tm_localhost:50329-fc0146
A colon is allowed on Linux but not on Windows, and that's the reason for
the exception.
BR,
G
On Tue, Jul 12, 2022 at 11:30 AM wrote:
> ...
> 2022-07-12 11:25:08,448 INFO
> akka.remote.Remoting
In order to provide a hotfix please set "taskmanager.resource-id" to
something which doesn't contain any special characters.
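For example, a minimal sketch when running a local environment from the IDE
(the value is just an example; in a standalone setup the same key can go
into flink-conf.yaml):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

Configuration conf = new Configuration();
// Hypothetical value: anything without ':' or other characters that are
// invalid in Windows directory names works
conf.setString("taskmanager.resource-id", "tm_localhost_50329-fc0146");
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironment(conf);
```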
G
On Tue, Jul 12, 2022 at 11:59 AM Gabor Somogyi
wrote:
> Flink tried to create the following dir: tm_localhost:50329-fc0146
> Colon is allowe
The minimum supported Hadoop version was 2.8.5, but in 1.17 it's going to
be 2.10.1, so one can downgrade.
G
On Fri, Jan 27, 2023, 20:42 Leon Xu wrote:
> Thank you Mate.
> Yeah this looks like the root cause. A follow-up question, do you know if
> Flink 1.16 will have a hard dependency on Hadoop 3.3.x? or can
Hi Arthur,
Delegation tokens have been enabled the whole time; this hasn't been
changed since it would be a breaking change. I would personally turn it off
by default but it's important to keep the original behavior.
The manager loads providers at the very beginning of the init process.
It loads and initial
Hi Sriram,
This has been fixed in https://issues.apache.org/jira/browse/FLINK-31839
G
On Thu, Apr 20, 2023 at 4:57 PM Sriram Ganesh wrote:
> Hi Team,
>
> I am using S3 as FileSystem to write data from Flink. I am getting the
> below error in Flink 1.17. The same code works in Flink 1.16. Coul
hi Anuj,
As Martijn said, IAM is the preferred option, but if you've no other way
than access keys then environment variables are a better choice.
In that case the conf doesn't contain plain text keys.
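For example (a sketch; these are the standard AWS variable names picked up
by the S3A credential chain, and they must be visible to the JobManager and
TaskManager processes):

```
export AWS_ACCESS_KEY_ID=<access-key>
export AWS_SECRET_ACCESS_KEY=<secret-key>
flink run ... app.jar
```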
Just a side note: putting `s3a.access.key` into the Flink conf file is not
configuring Hadoop S3. The way how
hi Chirag,
Flink now supports 2 ways to obtain a TGT, which is a Kerberos ticket and
has nothing to do with the "renewable for up to 7 days" HDFS TGS ticket
(with default config).
* Keytab: if one mounts a keytab for at least the JobManager pod then it
can create TGTs indefinitely (or until the user's passwo
o applicable in Flink 1.16?
>
> Thanks
>
> On Tuesday, 5 September, 2023 at 07:15:07 pm IST, Gabor Somogyi <
> gabor.g.somo...@gmail.com> wrote:
>
>
> hi Chirag,
>
> Flink now supports 2 ways to have TGT which is a Kerberos ticket and has
> nothing to do with
Hi Chirag,
A couple of things can be done to reduce the attack surface (including but
not limited to):
* Use delegation tokens where only JM needs the keytab file:
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/security/security-delegation-token/
* Limit the access rights of the k
Hi Kirti,
Not sure what the exact issue is here, but I'm not convinced that
having FlinkSecurityManager is going to solve it.
Here are the conditions, however:
* cluster.intercept-user-system-exit != DISABLED (this must be changed)
* cluster.processes.halt-on-fatal-error == false (this is good by default)
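A minimal flink-conf.yaml sketch satisfying both conditions (THROW is one
of the non-DISABLED values; LOG is the other):

```
cluster.intercept-user-system-exit: THROW
cluster.processes.halt-on-fatal-error: false
```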
Hi Salva,
Just wondering, why is it not good to set the uid like this?
```
output.sinkTo(outputSink).uid("my-human-readable-sink-uid");
```
From the mentioned UID Flink is going to derive the hash, which is
consistent from the UID -> HASH transformation perspective.
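To see the actual hash, a sketch assuming the state processor API's
OperatorIdentifier helper is available in your version:

```java
import org.apache.flink.state.api.OperatorIdentifier;

// Deterministic: the same uid always yields the same operator id hash
OperatorIdentifier id =
        OperatorIdentifier.forUid("my-human-readable-sink-uid");
System.out.println(id.getOperatorId().toHexString());
```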
BR,
G
On Fri, Jun 7, 2024 at 7:54 AM Sa
ply the
> same strategy for generating uids to compute the corresponding uidHash for
> each suboperator. Maybe you can further investigate it and fire a JIRA
> issue on it.
>
> Best,
> Zhanghao Chen
> --
> *From:* Salva Alcántara
> *Sent:* Sunday, June
YW, ping me back on whether it works because it's a nifty feature.
G
On Mon, Jun 10, 2024 at 9:26 AM Salva Alcántara
wrote:
> Thanks Gabor, I will give it a try!
>
> On Mon, Jun 10, 2024 at 12:01 AM Gabor Somogyi
> wrote:
>
>> Now I see the intention and then you
> Salva
>
> On Mon, Jun 10, 2024 at 9:49 AM Gabor Somogyi
> wrote:
>
>> YW, ping me back whether it works because it's a nifty feature.
>>
>> G
>>
>> On Mon, Jun 10, 2024 at 9:26 AM Salva Alcántara
>> wrote:
>>
>>> Thanks Gabor,
Hi Elakiya,
I've just double-checked the story and it seems like the latest gosu
release, 1.17, is not vulnerable.
Can you please try it out on your side? Alexis has written down how you can
bump the gosu version in the Docker image locally:
---CUT-HERE---
ENV GOSU_VERSION 1.17
---CUT-HERE---
Please report back and we
Hi Xiao,
I'm not quite convinced that the Azure plugin broke your workload; I would
take a look at the dependency graph you have in the pom.
Adding multiple deps can cause conflicts in terms of class loader services
(src/main/resources/META-INF/services).
As an example, if you have 2 such dependencies where
or
> And like my reply in stackoverflow, I found the hadoop-common file :
> https://github.com/apache/hadoop/blob/release-3.3.4-RC1/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L3374
Hi Alexis,
It depends. When one uses SavepointLoader to read only the metadata then
it's non-parallel.
SavepointReader however is basically a normal batch job with all its
features.
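A minimal sketch of both paths (paths and state backend are placeholders):

```java
import org.apache.flink.runtime.checkpoint.metadata.CheckpointMetadata;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.state.api.SavepointReader;
import org.apache.flink.state.api.runtime.SavepointLoader;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Metadata only, non-parallel:
CheckpointMetadata metadata =
        SavepointLoader.loadSavepointMetadata("/path/to/savepoint");

// Full state read, runs as a normal batch job:
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
SavepointReader savepoint =
        SavepointReader.read(env, "/path/to/savepoint", new HashMapStateBackend());
```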
G
On Fri, Jul 5, 2024 at 5:21 PM Alexis Sarda-Espinosa <
sarda.espin...@gmail.com> wrote:
> Hello,
>
> Really quick
I've double-checked and I think that CollectSinkOperatorFactory is
initialized in DataStream.collectAsync without the MAX_BATCH_SIZE
and SOCKET_TIMEOUT values coming from the Flink config.
Could you please share the whole stacktrace so I can double-check my assumption?
G
On Tue, Jul 23, 2024 at 12:46 PM Salv
This hasn't moved for a while, so I've assigned it to you.
G
On Mon, Jul 15, 2024 at 9:06 AM Zhongyou Lee
wrote:
> Hello everyone:
>
> Up to now, the only way to adjust the RocksDB flush threads is for the user
> to implement ConfigurableRocksDBOptionsFactory#setMaxBackgroundFlushes. I found
> FLINK-22059
Hi Bjarke,
It's normal to see longer recovery time as the state size grows. There are
developments in progress to mitigate this problem.
> Any ideas as to what could be causing this?
I think there is no easy way to tell; I can just give some pointers.
First I would take a look at the state fi
Hi,
As a general suggestion, please use the official releases, since we're not
able to analyze custom code with cherry-picks and potential
hand-made conflict resolution.
When you state "needs to be restarted due to an exception", what kind
of exception are you referring to?
I me
ken*
>
> *(*FLIP-211: Kerberos delegation token framework - Apache Flink - Apache
> Software Foundation
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-211%3A+Kerberos+delegation+token+framework>
> *)*
>
>
> You can verify this Kerberos scenario, thank you.
>
e is a Yarn parameter called
> `yarn.resourcemanager.proxy-user-privileges.enabled` that seems to be
> suitable for long-running applications. Are you familiar with this
> parameter? Can it be used to solve Flink's issue?
>
>
rocksDB tuning and
reducing state size if possible (better eviction, etc...).
>
> Best regards,
> Bjarke
>
> On Thu, Aug 8, 2024 at 11:16 AM Gabor Somogyi
> wrote:
>
>> Hi Bjarke,
>>
>> It's normal to see longer recovery time as the state size grows. There
>
fs.core.windows.net/test/checkpointing
> state.savepoints.dir: abfss://@.
> dfs.core.windows.net/test/savepoints
> taskmanager.network.memory.buffer-debloat.enabled: "true"
> taskmanager.numberOfTaskSlots: "1"
>
> Best regards,
> Bjarke
>
Hi Dominic,
The issue has nothing to do with DynamicKafkaSource.
The scenario here is clear:
* At some point in time you used the 3.2 Kafka connector, which writes
out its state with the v2 serializer
* Later you fell back to a version which is not able to read that (pre 3.1.0
because t
to reload the state with an old
> dependency (which I cannot even find on the classpath) 😲
> *Dominik Bünzli *Data, Analytics & AI Engineer III
>
>
>
> *From: *Gabor Somogyi
> *Date: *Tuesday, 3 September 2024 at 12:59
> *To: *Bünzli
[] -
> Sending Request: GET
> https://.../savepoints/flink-compression/.../savepoint-...
> Range: bytes=0-9223372036854775806.
> ```
>
> For uncompressed state could you please let us know how the change from
> your PR eliminates the multiple calls to S3
as a
> byte[] and do the reads from memory ... but it seems the state compression
> is doing the same. We are in the process of testing state compression under
> production volumes ... can't say how that will actually work for us.
>
> Thank you again for looking into this.
Hi Alex,
Please see my comment here [1].
[1] https://lists.apache.org/thread/h5mv6ld4l2g4hsjszfdos9f365nh7ctf
BR,
G
On Mon, Sep 2, 2024 at 11:02 AM Alex K. wrote:
> We have an issue where a savepoint file containing Kafka topic partitions
> offsets is requested millions of times from AWS S3.
Hi William,
It's a bit of an old question, but I think now we know why this is happening.
Please see [1] for further details.
It's an important requirement to use uncompressed state because even with
the fix, compressed state is still problematic.
We've already tested the PR under load, but if you can repo
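Until the fix lands everywhere, a minimal sketch of making sure snapshot
compression stays off (it is the default, shown here via the programmatic
switch):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
// Snapshot compression is what triggers the problematic read pattern
env.getConfig().setUseSnapshotCompression(false);
```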
some time since our checkpoints can reach
>> more than 10GB (uncompressed).
>>
>> Let me know if you have any updates. I will let you know if I observe
>> anything else.
>> Please let us know if you have any new findings.
>>
>> Thank you.
>>
>> On Tue, Oct
ation performing well after applying my PR [1]
After it's merged we can say both compressed and uncompressed
state are safe to use.
Thanks to everybody for the efforts to sort this out!
[1] https://github.com/apache/flink/pull/25509
G
On Thu, Oct 17, 2024 at 11:06 PM Gabor Somo
(uidHash))
.findFirst();
assertEquals(uidHash, pair.get().getUserDefinedOperatorID().get().toString());
Please load your savepoint with SavepointLoader and analyze what kind of
operators are inside and why your app is blowing up.
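A sketch of such an analysis (assuming the internal SavepointLoader
utility):

```java
import org.apache.flink.runtime.checkpoint.OperatorState;
import org.apache.flink.runtime.checkpoint.metadata.CheckpointMetadata;
import org.apache.flink.state.api.runtime.SavepointLoader;

CheckpointMetadata metadata =
        SavepointLoader.loadSavepointMetadata("/path/to/savepoint");
for (OperatorState operatorState : metadata.getOperatorStates()) {
    // Prints the operator id hash of every operator in the savepoint
    System.out.println(operatorState.getOperatorID());
}
```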
G
On Tue, Sep 24, 2024 at 9:29 AM Gabor Somogy
s/egress
> 4) How can I create a docker image of flink stateful APIs, I was unable to
> decouple the source code and build a docker image. I used the playground
> below.
>
> https://github.com/apache/flink-statefun-playground/tree/release-3.3/playground-internal
>
> Best Regards
Hi Nitin,
Flink applications can be started locally (by running the main method), for
example from IntelliJ or any other similar IDE.
An important note: in that case the execution path is different, but it's
convenient for business logic debugging.
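A minimal sketch of what "run the main" means in practice (started from the
IDE this spins up an embedded mini cluster):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalDebugJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Replace with your real sources/operators to debug business logic
        env.fromElements("a", "b", "c").print();
        env.execute("local-debug");
    }
}
```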
BR,
G
On Tue, Oct 1, 2024 at 12:21 PM Nitin Chauhan
wrote:
> H
Hi Salva,
Which version is this?
BR,
G
On Mon, Sep 23, 2024 at 8:46 PM Salva Alcántara
wrote:
> I have a pipeline where I'm trying to add a new operator. The problem is
> that originally I forgot to specify the uid for one source. To remedy this,
> I'm using setUidHash, providing the operator
Hi Jad,
There is no low-hanging fruit here if you really want to find this out.
In that case the memory manager tries to allocate and deallocate the total
memory it is prepared for.
When not all the memory is available, the allocation is not going to succeed
and you see the mentioned exception.
I wo
I intend to file a JIRA + PR.
G
On Thu, Dec 5, 2024 at 2:40 PM Salva Alcántara
wrote:
> So what's next? Do you want me to do something Gabor?
>
> Regards,
>
> Salva
>
> On Thu, Dec 5, 2024, 13:48 Gabor Somogyi
> wrote:
>
>> Based on t
op(MailboxProcessor.java:231)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807)
> at
> org.apache.flink.runtime.taskmanager.Task.ru
test
> your changes locally. If still necessary, let me know and I'll find the
> time!
>
>
> On Fri, Dec 6, 2024 at 1:49 PM Gabor Somogyi
> wrote:
>
>> Salva, I've tested the code by myself but do you have the possibility to
>> test this[1] fix?
>>
Salva, I've tested the code by myself, but do you have the possibility to
test this [1] fix?
[1] https://github.com/apache/flink/pull/25755
G
On Thu, Dec 5, 2024 at 3:15 PM Gabor Somogyi
wrote:
> I'm intended to file a jira + PR.
>
> G
>
>
> On Thu, Dec 5, 202
ch entry and extract the bit we
> want in that scenario.
>
> Never mind
>
> Thank you for the insight. Saves me a lot of hunting for nothing.
>
> JM
>
> On Tue, Feb 4, 2025 at 10:45 AM Gabor Somogyi
> wrote:
>
>> Hi Jean-Marc,
>>
>> We've a
d all other
aspects.
G
On Tue, Feb 4, 2025 at 2:26 PM Gabor Somogyi
wrote:
> Please report back on how the patch behaves including any side effects.
>
> Now I'm in testing the state reading with processor API vs the mentioned
> job where we control the keys.
> The differen
Hi Jean-Marc,
We've already realized that the RocksDB approach is not reaching the
performance criteria it should. There is an open issue for it [1].
The hashmap-based approach always has and always will require more memory. So if
a low memory footprint is a hard requirement then RocksDB is the on
b performance in an acceptable range for us we
> might go that way. I really like the light memory consumption of RockDb for
> that kind of side job.
>
> Thanks
>
> JM
>
> On Tue, Feb 4, 2025 at 12:23 PM Gabor Somogyi
> wrote:
>
>> What I could imagine is to
> controlled environment. but I think it's conclusive enough. I will need to
> run further tests but I think we will patch our Flink. using a system
> property to configure it.
>
> Hope this help
>
> JM
>
> On Tue, Feb 4, 2025 at 4:01 PM Gabor Somogyi
> wrote:
>
sm 1, and only 1 task manager) . I also know
> it has been created by a job using a HashMap backend. And I do not care
> about duplicates.
>
> I should still be good, right? from what I saw I never read any duplicate
> keys.
>
> Thanks
>
> JM
>
>
> On Wed, Feb 5,
as
> wondering if you could share more in - depth details about this
> configuration.
> Your insights would be of great help to me in resolving this issue. Thank
> you very much in advance.
> Best regards,Leilinee
>
> Get Outlook for iOS <https://aka.ms/o0ukef>
> --
Hi Nicolas,
This is only supported by the operator, not by Flink itself. It's on my list
but I haven't yet gotten around to actually doing it.
BR,
G
On Mon, Feb 10, 2025 at 10:41 AM Nicolas Fraison via user <
user@flink.apache.org> wrote:
> Hi,
>
> We are looking into enabling SSL on our flink jobs.
> Follo
rator and 2 value states
- Old execution time: 25M27.126737S
- New execution time: 1M19.602042S
In short: ~95% performance gain.
G
On Thu, Feb 6, 2025 at 9:06 AM Gabor Somogyi
wrote:
> In short, when you don't care about
> multiple KeyedStateReaderFunction.readKey calls then yo
ere any plans on a migration guide or something for users to adapt
> their QS observers (beyond the current docs)? (State-)Observability-wise
> Flink has some room for improvement I would say.
>
> Regards,
>
> Salva
>
>
> On Wed, Feb 5, 2025 at 9:36 AM Gabor Somogyi
>
Hi Jean-Marc,
FYI, I've just opened this [1] PR to address the issue in a clean way.
May I ask you to test it on your side?
[1] https://github.com/apache/flink/pull/26134
BR,
G
On Fri, Feb 7, 2025 at 6:14 PM Gabor Somogyi
wrote:
> Just a little update on this. We've made our f
utFormatSourceFunction.run(InputFormatSourceFunction.java:90)
> ~[flink-runtime-2.1-SNAPSHOT.jar:2.1-SNAPSHOT]
> at
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:107)
> ~[flink-runtime-2.1-SNAPSHOT.jar:2.1-SNAPSHOT]
> at
> org.apache.flink.strea
s with your patch, which is exactly what I am
> expecting. If you have more specific concerns I suppose I can instrument my
> code further. I have no expectation regarding the key ordering.
>
> JM
>
>
> On Wed, Feb 12, 2025 at 11:29 AM Gabor Somogyi
> wrote:
>
>> I t
>>> at
>>> com.ibm.aiops.lifecycle.policy.execution.state.RequestExecutionStateManager.(RequestExecutionStateManager.java:52)
>>> ~[classes/:?]
>>> at
>>> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>> Meth
package -DskipTests
>
> runs fine in my terminal.
>
> Regards,
>
> Salva
>
> On Sun, Dec 8, 2024 at 12:06 PM Gabor Somogyi
> wrote:
>
>> It would be good to report back whether it's solved your issue, but since
>> we've added automated test + I
Hi Patricia,
Different S3 buckets can be accessed with different S3A client
configurations.
This allows for different endpoints, data read and write strategies, as
well as login details.
A bucket s3a://nightly/ used for nightly data can then be given a session
key:
fs.s3a.bucket.nightly.access.key
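For example, a sketch of per-bucket settings (values are placeholders; the
same pattern works for endpoints and other S3A options):

```
fs.s3a.bucket.nightly.access.key: <access-key>
fs.s3a.bucket.nightly.secret.key: <secret-key>
fs.s3a.bucket.nightly.session.token: <session-token>
```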
learning
> exercise for us.
>
> Kind regards
>
> JM
>
>
>
> On Wed, Feb 12, 2025 at 12:57 PM Gabor Somogyi
> wrote:
>
>> Hi Jean-Marc,
>>
>> Thanks for your efforts! We've done quite extensive tests inside and they
>> are all passing. Goo
> *From: *Gabor Somogyi
> *Date: *Friday, F
Hi leilinee,
Since Flink SQL does not support operator uid configuration, there are only
hard ways.
Changing the state operator uid +
`execution.savepoint.ignore-unclaimed-state=true` would be the easiest way.
Depending on your constraints, either you use the state processor API and
remove the
operator whic
BTW, do you have a bare minimum application which can reproduce the issue?
On Fri, Feb 14, 2025 at 5:07 PM Gabor Somogyi
wrote:
> Hi Peter,
>
> I've had a look on this and seems like this problem or a similar one still
> exists in 1.20.
> Since I've not done in-dept
Hi Peter,
I've had a look at this and it seems like this problem or a similar one still
exists in 1.20.
Since I've not done an in-depth analysis I can only say this by taking a look
at the last comments, like:
> Faced an issue on Flink 1.20.0
but there are more of those...
So all in all I've the feelin
n
>
>
> On Fri, Feb 21, 2025 at 3:37 PM Gabor Somogyi
> wrote:
>
>> The UID must match in the Flink app `events.uid("my-uid")` and in the
>> reader app `forUid("my-uid")`.
>> In general it's a good practice to set uid in the Flink app for each
The UID must match in the Flink app `events.uid("my-uid")` and in the
reader app `forUid("my-uid")`.
In general it's a good practice to set a uid in the Flink app for each and
every operator, otherwise Flink generates an almost random number for it.
When you don't know the generated uid then no worr
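A sketch of the pairing (assuming a SavepointReader named `savepoint`; the
reader function name is hypothetical):

```java
import org.apache.flink.state.api.OperatorIdentifier;

// In the Flink application:
events.uid("my-uid");

// In the reader application (state processor API):
savepoint.readKeyedState(
        OperatorIdentifier.forUid("my-uid"),
        new MyKeyedStateReaderFunction());
```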
Hi Sachin,
You're on the right track in general and this should work :)
I know that a version upgrade can be complex, but it's worth doing because
we've just added a ~95% performance boost to the state processor API in
1.20 [1].
As you mentioned, the SQL connector is on 2.x, which is not yet feasible
te("Analysis");
>
>
> Any idea as to what could be going wrong.
> Also note this is my checkpoint config:
> state.backend.type: rocksdb
> state.checkpoints.dir: s3://{bucket}/flink-checkpoints
> state.backend.incremental: 'true'
> state.backend.local-recovery: 'tr
ate one of my operators writes to, to
> understand if there is a state leak and why this state becomes very huge.
>
> Thanks
> Sachin
>
>
> On Wed, Feb 26, 2025 at 5:01 PM Gabor Somogyi
> wrote:
>
>> State processor API is now not supporting changelog. Plea
Hi Vadim,
In short, state eviction is not applied to incremental checkpoints from
a storage perspective, but there are techniques to configure Flink to do it.
I think you're looking for the state TTL feature, which can resolve your problem [1].
[1]
https://flink.apache.org/2019/05/17/state-ttl-in-flink-1.8.0-h
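A minimal sketch of enabling TTL on a state descriptor (the 7-day value and
state name are just examples):

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.days(7))
        // Reset the timer on every write
        .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
        // Expired entries are dropped during RocksDB compaction
        .cleanupInRocksdbCompactFilter(1000)
        .build();

ValueStateDescriptor<String> descriptor =
        new ValueStateDescriptor<>("my-state", String.class);
descriptor.enableTimeToLive(ttlConfig);
```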
e that couldn't be
> restored by the operator ID. Finally, I used the API to rewrite the new
> savepoint file, replacing the old one, and the new job can restore state and
> run well.
>
> Best regards, leilinee
>
> ------
> *From:* Gabor Somogyi
> *
> Best Regards,
>
> On Wed, Mar 26, 2025 at 07:18, Gabor Somogyi
> wrote:
>
>> In short it's encouraged to use savepoint because of the following
>> situation:
>> * You start processing data from offset 0
>> * 2 savepoints created, one with offset 10, a
possible to merge it to Flink and how I should proceed (just push a PR
> to flink repo or should I need a FLIP for such change)?
>
> Regards,
> Nicolas
>
> On Mon, Feb 10, 2025 at 11:27 AM Gabor Somogyi
> wrote:
>
>> Hi Nicolas,
>>
>> This is only supported b
Hi Or,
Is the link in the official place [1] not working?
[1] https://flink.apache.org/what-is-flink/community/#slack
BR,
G
On Wed, Mar 26, 2025 at 9:44 AM Or Keren wrote:
> Hey,
>
> Can I get an invitation to your slack channel please?
>
> Thanks,
> Or.
>
In short, it's encouraged to use savepoints because of the following
situation:
* You start processing data from offset 0
* 2 savepoints are created, one with offset 10, another with offset 20
* In this timeframe Kafka has stored offset 20 since that's the last processed
* At offset 30 you realize that data processed bet
I don't have access to fix this; I hope the appropriate person will pick it
up...
G
On Wed, Mar 26, 2025 at 9:57 AM Or Keren wrote:
> Hi Gabor,
>
> Unfortunately not.
>
>
> Thanks for the quick reply!
>
> On 26 Mar 2025, at 10:54,