[Schema Evolution] Cannot restore from savepoint after deleting field from POJO

2021-03-11 Thread Alexis Sarda-Espinosa
Hi everyone, It seems I'm having either the same problem, or a problem similar to the one mentioned here: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-when-restoring-from-savepoint-with-missing-state-amp-POJO-modification-td39945.html I have a POJO class that is u

DataStream in batch mode - handling (un)ordered bounded data

2021-03-12 Thread Alexis Sarda-Espinosa
Hello, Regarding the new BATCH mode of the data stream API, I see that the documentation states that some operators will process all data for a given key before moving on to the next one. However, I don't see how Flink is supposed to know whether the input will provide all data for a given key

Re: DataStream in batch mode - handling (un)ordered bounded data

2021-03-13 Thread Alexis Sarda-Espinosa
From: Dawid Wysakowicz Sent: Friday, March 12, 2021 4:10 PM To: Alexis Sarda-Espinosa; user@flink.apache.org Subject: Re: DataStream in batch mode - handling (un)ordered bounded data Hi Alexis, As of now there is no such feature in the DataStream API. The Batch mode in DataStream API is a new fe

Clarification about Flink's managed memory and metric monitoring

2021-04-13 Thread Alexis Sarda-Espinosa
Hello, I have a Flink TM configured with taskmanager.memory.managed.size: 1372m. There is a streaming job using RocksDB for checkpoints, so I assume some of this memory will indeed be used. I was looking at the metrics exposed through the REST interface, and I queried some of them: /taskmanag

RE: Clarification about Flink's managed memory and metric monitoring

2021-04-13 Thread Alexis Sarda-Espinosa
:30 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Clarification about Flink's managed memory and metric monitoring Hi Alexis, First of all, I strongly recommend not to look into the JVM metrics. These metrics are fetched directly from JVM and do not well correspond to Fl

Re: [Schema Evolution] Cannot restore from savepoint after deleting field from POJO

2021-04-29 Thread Alexis Sarda-Espinosa
, March 12, 2021 6:22 PM To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: [Schema Evolution] Cannot restore from savepoint after deleting field from POJO Hi Alexis, This looks like a bug, I've created a Jira ticket to address it [1]. Please feel free to provide any addit

Data loss when connecting keyed streams

2021-05-21 Thread Alexis Sarda-Espinosa
Hello everyone, I just experienced something weird and I'd like to know if anyone has any idea of what could have happened. I have a simple Flink cluster version 1.11.3 running on Kubernetes with a single worker. I was testing a pipeline that connects 2 keyed streams and processes the result w

Using Flink's Kubernetes API inside Java

2021-07-01 Thread Alexis Sarda-Espinosa
Hello everyone, I'm testing a custom Kubernetes operator that should fulfill some specific requirements I have for Flink. I know of this WIP project: https://github.com/wangyang0918/flink-native-k8s-operator I can see that it uses some classes that aren't publicly documented, and I believe it

Re: Using Flink's Kubernetes API inside Java

2021-07-02 Thread Alexis Sarda-Espinosa
resource. Regards, Alexis. From: Roman Khachatryan Sent: Friday, July 2, 2021 9:19 PM To: Alexis Sarda-Espinosa ; Yang Wang Cc: user@flink.apache.org Subject: Re: Using Flink's Kubernetes API inside Java Hi Alexis, Have you looked at flink-on-k8s-ope

RE: Using Flink's Kubernetes API inside Java

2021-07-07 Thread Alexis Sarda-Espinosa
Thanks Roman and Yang, I understand. I’ll have a look and ask on the developer list depending on what I find. Regards, Alexis. From: Yang Wang Sent: Mittwoch, 7. Juli 2021 05:14 To: ro...@apache.org Cc: Alexis Sarda-Espinosa ; user@flink.apache.org Subject: Re: Using Flink's Kubernete

OOM Metaspace after multiple jobs

2021-07-07 Thread Alexis Sarda-Espinosa
Hello, I am currently testing a scenario where I would run the same job multiple times in a loop with different inputs each time. I am testing with a local Flink cluster v1.12.4. I initially got an OOM - Metaspace error, so I increased the corresponding memory in the TM's JVM (to 512m), but it

Re: OOM Metaspace after multiple jobs

2021-07-07 Thread Alexis Sarda-Espinosa
cleaned up? In this scenario only my jobs would be running in the cluster, so I can have a bit more control. Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Thursday, July 8, 2021 12:14 AM To: user@flink.apache.org Subject: OOM Metaspace after multiple jobs

Re: OOM Metaspace after multiple jobs

2021-07-16 Thread Alexis Sarda-Espinosa
d see multiple instances (one per job, it seems) of some of my job's classes (e.g. sources), and their GC roots were the Flink User Class Loader. I haven't figured out why they would remain across different jobs. Regards, Alexis. ____ From: Alexis Sarda-Espi

Using POJOs with the table API

2021-08-05 Thread Alexis Sarda-Espinosa
Hi everyone, I had been using the DataSet API until now, but since that's been deprecated, I started looking into the Table API. In my DataSet job I have a lot of POJOs, all of which are even annotated with @TypeInfo and provide the corresponding factories. The Table API documentation talks abo

Keystore format limitations for TLS

2021-08-16 Thread Alexis Sarda-Espinosa
Hello, I am trying to configure TLS communication for a Flink cluster running on Kubernetes. I am currently using the BCFKS format and setting that as default via javax.net.ssl.keystoretype and javax.net.ssl.truststoretype (which are injected in the environment variable FLINK_ENV_JAVA_OPTS). Th

RE: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-08-26 Thread Alexis Sarda-Espinosa
I think it would be nice if the task manager pods get their values from the configuration file only if the pod templates don’t specify any resources. That was the goal of supporting pod templates, right? Allowing more custom scenarios without letting the configuration options get bloated. Regar

Re: logback variable substitution in kubernetes

2021-09-01 Thread Alexis Sarda-Espinosa
I'm fairly certain you need the curly braces surrounding the variable, the substitution is not done by the shell, it's just similar syntax (as mentioned in the doc http://logback.qos.ch/manual/configuration.html#variableSubstitution). Chapter 3: Logback configuration - QOS.ch

RE: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-02 Thread Alexis Sarda-Espinosa
; Alexis Sarda-Espinosa ; matth...@ververica.com; user@flink.apache.org Subject: Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests Hi Yang, I agree with you, but I think the limit-factor should be greater than or equal to 1, and default to 1 is a better

RE: Deploying Flink on Kubernetes with fractional CPU and different limits and requests

2021-09-03 Thread Alexis Sarda-Espinosa
. September 2021 08:09 To: Alexis Sarda-Espinosa Cc: spoon_lz ; Denis Cosmin NUTIU ; matth...@ververica.com; user@flink.apache.org Subject: Re: Deploying Flink on Kubernetes with fractional CPU and different limits and requests Hi Alexis Thanks for your valuable inputs. First, I want to share

Re: TaskManagers OOM'ing for Flink App with very large state only when restoring from checkpoint

2021-09-13 Thread Alexis Sarda-Espinosa
I'm not very knowledgeable when it comes to Linux memory management, but do note that Linux (and by extension Kubernetes) takes disk IO into account when deciding whether a process is using more memory than it's allowed to, see e.g. https://faun.pub/how-much-is-too-much-the-linux-oomkiller-and-u

RE: Fast serialization for Kotlin data classes

2021-09-16 Thread Alexis Sarda-Espinosa
Someone please correct me if I’m wrong but, until FLINK-16686 [1] is fixed, a class must be a POJO to be used in managed state with RocksDB, right? That’s not to say that the approach with TypeInfoFactory won’t work, just that even then it will mean none of the data classes can be used for manag

Kubernetes HA - Reusing storage dir for different clusters

2021-10-04 Thread Alexis Sarda-Espinosa
Hello, If I deploy a Flink-Kubernetes application with HA, I need to set high-availability.storageDir. If my application is a batch job that may run multiple times with the same configuration, do I need to manually clean up the storage dir between each execution? Regards, Alexis.

Re: Kubernetes HA - Reusing storage dir for different clusters

2021-10-08 Thread Alexis Sarda-Espinosa
. Regards, Alexis. From: Yang Wang Sent: Friday, October 8, 2021 5:24 AM To: Alexis Sarda-Espinosa Cc: Flink ML Subject: Re: Kubernetes HA - Reusing storage dir for different clusters When the Flink job reached to global terminal state(FAILED, CANCELED, FINISHED),

Troubleshooting checkpoint timeout

2021-10-19 Thread Alexis Sarda-Espinosa
Hello everyone, I am doing performance tests for one of our streaming applications and, after increasing the throughput a bit (~500 events per minute), it has started failing because checkpoints cannot be completed within 10 minutes. The Flink cluster is not exactly under my control and is runn

RE: Troubleshooting checkpoint timeout

2021-10-20 Thread Alexis Sarda-Espinosa
(keySelector) .connect(DataStreamUtils.reinterpretAsKeyedStream(openedEventsTimestamped, keySelector)) .process(...) Could this lead to delays or alignment issues? Regards, Alexis. From: Parag Somani Sent: Mittwoch, 20. Oktober 2021 09:22 To: Caizhi Weng Cc: Alexis Sarda-Espinosa

RE: Troubleshooting checkpoint timeout

2021-10-21 Thread Alexis Sarda-Espinosa
, Alexis. From: Alexis Sarda-Espinosa Sent: Mittwoch, 20. Oktober 2021 09:43 To: Parag Somani ; Caizhi Weng Cc: Flink ML Subject: RE: Troubleshooting checkpoint timeout Currently the windows are 10 minutes in size with a 1-minute slide time. The approximate 500 event/minute throughput is already

RE: Troubleshooting checkpoint timeout

2021-10-25 Thread Alexis Sarda-Espinosa
eam operator has lower parallelism? Regards, Alexis. From: Piotr Nowojski Sent: Montag, 25. Oktober 2021 09:59 To: Alexis Sarda-Espinosa Cc: Parag Somani ; Caizhi Weng ; Flink ML Subject: Re: Troubleshooting checkpoint timeout Hi Alexis, You can read about those metrics in the documentation

RE: Troubleshooting checkpoint timeout

2021-10-25 Thread Alexis Sarda-Espinosa
new checkpoint barriers created after the first restart are behind more data than before it restarted, no? Regards, Alexis. From: Piotr Nowojski Sent: Montag, 25. Oktober 2021 13:35 To: Alexis Sarda-Espinosa Cc: Parag Somani ; Caizhi Weng ; Flink ML Subject: Re: Troubleshooting checkpoint

RE: Troubleshooting checkpoint timeout

2021-10-25 Thread Alexis Sarda-Espinosa
checkpoints. Thanks again for all the info. Regards, Alexis. From: Piotr Nowojski Sent: Montag, 25. Oktober 2021 15:51 To: Alexis Sarda-Espinosa Cc: Parag Somani ; Caizhi Weng ; Flink ML Subject: Re: Troubleshooting checkpoint timeout Hi Alexis, > Should I understand these metrics as a prope

RE: Using POJOs with the table API

2021-10-26 Thread Alexis Sarda-Espinosa
20787 Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Donnerstag, 5. August 2021 15:49 To: user@flink.apache.org Subject: Using POJOs with the table API Hi everyone, I had been using the DataSet API until now, but since that's been deprecated, I started looking into the Table API. In my Dat

Watermark behavior when connecting streams

2021-12-01 Thread Alexis Sarda-Espinosa
Hi everyone, Based on what I know, a single operator with parallelism > 1 checks the watermarks from all its streams and uses the smallest one out of the non-idle streams. My first question is whether watermarks are forwarded as long as a different watermark strategy is not applied downstream?

Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
Hello, I have a use case with event-time processing that ends up with a DAG roughly like this: source -> filter -> keyBy -> watermark -> window -> process -> keyBy_2 -> connect (KeyedCoProcessFunction) -> sink | / (s

RE: Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
n second 17, and my windows are evaluated every minute, so it wasn’t a race condition. Regards, Alexis. From: David Morávek Sent: Donnerstag, 2. Dezember 2021 14:52 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Buffering when connecting streams Hi Alexis, I'm not s

RE: Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
reaches processElement1, even when considering watermarks. Regards, Alexis. From: David Morávek Sent: Donnerstag, 2. Dezember 2021 15:45 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Buffering when connecting streams You can not rely on order of the two streams that easily. In

RE: Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
ávek Sent: Donnerstag, 2. Dezember 2021 16:59 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Buffering when connecting streams I think this would require using lower level API and implementing a custom `TwoInputStreamOperator`. Then you can hook to `processWatemark{1,2}` metho

RE: Buffering when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
. Dezember 2021 17:18 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Buffering when connecting streams Even with the TwoInputStreamOperator you can not "halt" the processing. You need to buffer these elements for example in the ListState for later processing. At the t

RE: Watermark behavior when connecting streams

2021-12-02 Thread Alexis Sarda-Espinosa
://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#consecutive-windowed-operations Regards, Alexis. From: David Morávek Sent: Donnerstag, 2. Dezember 2021 17:26 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Watermark behavior when connecting streams

RE: Buffering when connecting streams

2022-01-18 Thread Alexis Sarda-Espinosa
a (Process)JoinFunction? The join needs keys, but I don’t know if the resulting stream counts as keyed from the state’s point of view. Regards, Alexis. From: Piotr Nowojski Sent: Montag, 6. Dezember 2021 08:43 To: David Morávek Cc: Alexis Sarda-Espinosa ; user@flink.apache.org Subject: Re

Re: Flink native k8s integration vs. operator

2022-01-20 Thread Alexis Sarda-Espinosa
rtainly doesn't mean an operator is a bad idea, it's just something that other users might want to keep in mind. Regards, Alexis. From: Robert Metzger Sent: Thursday, January 20, 2022 7:06 PM To: Alexis Sarda-Espinosa Cc: dev ; user Subject: Re: Flink

Determinism of interval joins

2022-01-27 Thread Alexis Sarda-Espinosa
Hi everyone, I'm seeing a lack of determinism in unit tests when using an interval join. I am testing with both Flink 1.11.3 and 1.14.3 and the relevant bits of my pipeline look a bit like this: keySelector1 = ... keySelector2 = ... rightStream = stream1 .flatMap(...) .keyBy(keySelector1)

Re: Determinism of interval joins

2022-01-27 Thread Alexis Sarda-Espinosa
marks should be deterministic, the input file is sorted, and the watermark strategies should essentially behave like the monotonous generator. [1] https://issues.apache.org/jira/browse/FLINK-24466 Regards, Alexis. ____ From: Alexis Sarda-Espinosa Sent: Thursday, January 2

RE: Determinism of interval joins

2022-01-29 Thread Alexis Sarda-Espinosa
mantics are commonly expected when handling multiple streams that need joins and so on. What do you think? Regards, Alexis. From: Robert Metzger Sent: Freitag, 28. Januar 2022 14:49 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Determinism of interval joins Instead of using

RE: Determinism of interval joins

2022-02-02 Thread Alexis Sarda-Espinosa
ption { if (mark.getTimestamp() > maxTimestamp2) { maxTimestamp2 = mark.getTimestamp(); } maybeProcessWatermark2(mark, maxTimestamp1, maxTimestamp1); } } Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Samstag, 29. Januar 2022 13:47 To: Robert Metzger Cc:

RE: Pojo State Migration - NPE with field deletion

2022-02-02 Thread Alexis Sarda-Espinosa
Hello, Happened to me too, here’s the JIRA ticket: https://issues.apache.org/jira/browse/FLINK-21752 Regards, Alexis. From: bastien dine Sent: Mittwoch, 2. Februar 2022 16:01 To: user Subject: Pojo State Migration - NPE with field deletion Hello, I have some trouble restoring a state (pojo)

Max parallelism and reactive mode

2022-03-03 Thread Alexis Sarda-Espinosa
Hi everyone, I have some questions regarding max parallelism and how interacts with deployment modes. The documentation states that max parallelism should be "set on a per-job and per-operator granularity" but doesn't provide more details. Is it possible to have different values of max parallel

Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
Hello, I'm in the process of updating from Flink 1.11.3 to 1.14.3, and it seems the interval join in my pipeline is no longer working. More specifically, I have a sliding window after the interval join, and the window isn't firing. After many tests, I ended up creating a custom operator that ex

RE: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
I found [1] and [2], which are closed, but could be related? [1] https://issues.apache.org/jira/browse/FLINK-23698 [2] https://issues.apache.org/jira/browse/FLINK-18934 Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Donnerstag, 10. März 2022 19:27 To: user@flink.apache.org Subject

Re: Interval join operator is not forwarding watermarks correctly

2022-03-10 Thread Alexis Sarda-Espinosa
. [1] https://github.com/asardaes/flink-interval-join-test Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Thursday, March 10, 2022 7:47 PM To: user@flink.apache.org Cc: pnowoj...@apache.org Subject: RE: Interval join operator is not forwarding water

Re: Interval join operator is not forwarding watermarks correctly

2022-03-15 Thread Alexis Sarda-Espinosa
For completeness, this still happens with Flink 1.14.4 Regards, Alexis. From: Alexis Sarda-Espinosa Sent: Friday, March 11, 2022 12:21 AM To: user@flink.apache.org Cc: pnowoj...@apache.org Subject: Re: Interval join operator is not forwarding watermarks

RE: Interval join operator is not forwarding watermarks correctly

2022-03-17 Thread Alexis Sarda-Espinosa
Hi Dawid, Thanks for the update, I also managed to work around it by adding another watermark assignment operator between the join and the window. I’ll have to see if it’s possible to assign watermarks at the source, but even if it is, I worry that the different partitions created by all my key

Using state processor API to read state defined with a TypeHint

2022-04-08 Thread Alexis Sarda-Espinosa
Hi everyone, I have a ProcessWindowFunction that uses Global window state. It uses MapState with a descriptor defined like this: MapStateDescriptor> msd = new MapStateDescriptor<>( "descriptorName", TypeInformation.of(Long.class), TypeInformation.of(new TypeHint>() {}) );

RE: Using state processor API to read state defined with a TypeHint

2022-04-08 Thread Alexis Sarda-Espinosa
ginal Message- From: Roman Khachatryan Sent: Freitag, 8. April 2022 11:48 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: Using state processor API to read state defined with a TypeHint Hi Alexis, I think your setup is fine, but probably Java type erasure makes Flink con

RE: RocksDB state not cleaned up

2022-04-08 Thread Alexis Sarda-Espinosa
May I ask if anyone tested RocksDB’s periodic compaction in the meantime? And if yes, if it helped with this case. Regards, Alexis. From: tao xiao Sent: Samstag, 18. September 2021 05:01 To: David Morávek Cc: Yun Tang ; user Subject: Re: RocksDB state not cleaned up Thanks for the feedback!

RocksDB's state size discrepancy with what's seen with state processor API

2022-04-08 Thread Alexis Sarda-Espinosa
Hello, I have a streaming job running on Flink 1.14.4 that uses managed state with RocksDB with incremental checkpoints as backend. I've been monitoring a dev environment that has been running for the last week and I noticed that state size and end-to-end duration have been increasing steadily.

Re: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-08 Thread Alexis Sarda-Espinosa
with parallelism=4, this doesn't affect the result, does it? Regards, Alexis. From: Roman Khachatryan Sent: Friday, April 8, 2022 11:06 PM To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: RocksDB's state size discrepancy with what's

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-11 Thread Alexis Sarda-Espinosa
Alexis. From: Roman Khachatryan mailto:ro...@apache.org>> Sent: Friday, April 8, 2022 11:06 PM To: Alexis Sarda-Espinosa mailto:alexis.sarda-espin...@microfocus.com>> Cc: user@flink.apache.org<mailto:user@flink.apache.org> mailto:user@flink.ap

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-12 Thread Alexis Sarda-Espinosa
s, Alexis. -Original Message- From: Roman Khachatryan Sent: Dienstag, 12. April 2022 12:37 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: RocksDB's state size discrepancy with what's seen with state processor API Hi Alexis, Thanks a lot for sharing this. I t

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-12 Thread Alexis Sarda-Espinosa
rs save some information in the state as well. Regards, Alexis. -Original Message- From: Roman Khachatryan Sent: Dienstag, 12. April 2022 14:06 To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: RocksDB's state size discrepancy with what's seen with state proces

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-14 Thread Alexis Sarda-Espinosa
o MANIFEST File State; that accounts for almost 1.5GB. I believe that is one of the files RocksDB uses internally, but is that related to managed state used by my functions? Or does that indicate size growth is elsewhere? Regards, Alexis. -Original Message----- From: Alexis Sarda-Espin

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-19 Thread Alexis Sarda-Espinosa
constructors, i.e. before they are serialized to the TM, but the descriptors are Serializable, so I imagine that's not relevant. [1] https://stackoverflow.com/a/50510054/5793905 Regards, Alexis. -Original Message- From: Roman Khachatryan Sent: Dienstag, 19. April 2022 11:48 To: A

Re: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-19 Thread Alexis Sarda-Espinosa
ts? Regards, Alexis. From: Roman Khachatryan Sent: Tuesday, April 19, 2022 5:51 PM To: Alexis Sarda-Espinosa Cc: user@flink.apache.org Subject: Re: RocksDB's state size discrepancy with what's seen with state processor API > I assume that when you say &qu

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-21 Thread Alexis Sarda-Espinosa
9, 2022 at 10:52 PM Alexis Sarda-Espinosa wrote: > > I can look into RocksDB metrics, I need to configure Prometheus at some point > anyway. However, going back to the original question, is there no way to gain > more insight into this with the state processor API? You've me

Re: RocksDB's state size discrepancy with what's seen with state processor API

2022-04-22 Thread Alexis Sarda-Espinosa
letting the job run and wait? I've been reading through RocksDB documentation as well, but that might not be enough because I don't know how Flink handles its own framework state internally. Regards, Alexis. From: David Anderson Sent: Friday, April 2

RE: RocksDB's state size discrepancy with what's seen with state processor API

2022-05-02 Thread Alexis Sarda-Espinosa
Ok Regards, Alexis. From: Peter Brucia Sent: Freitag, 22. April 2022 15:31 To: Alexis Sarda-Espinosa Subject: Re: RocksDB's state size discrepancy with what's seen with state processor API No Sent from my iPhone

RE: Schema Evolution of POJOs fails on Field Removal

2022-05-18 Thread Alexis Sarda-Espinosa
Hi David, Please refer to https://issues.apache.org/jira/browse/FLINK-21752 Regards, Alexis. -Original Message- From: David Jost Sent: Mittwoch, 18. Mai 2022 15:07 To: user@flink.apache.org Subject: Schema Evolution of POJOs fails on Field Removal Hi, we currently have an issue, wher

Did the semantics of Kafka's earliest offset change with the new source API?

2022-07-13 Thread Alexis Sarda-Espinosa
Hello, I have a job running with Flink 1.15.0 that consumes from Kafka with the new KafkaSource API, setting a group ID explicitly and specifying OffsetsInitializer.earliest() as a starting offset. Today I restarted the job ignoring both savepoint and checkpoint, and the consumer started reading f

Making Kafka source respect offset changed externally

2022-07-14 Thread Alexis Sarda-Espinosa
Hello, Regarding the new Kafka source (configure with a consumer group), I found out that if I manually change the group's offset with Kafka's admin API independently of Flink (while the job is running), the Flink source will ignore that and reset it to whatever it stored internally. Is there any

Re: Making Kafka source respect offset changed externally

2022-07-14 Thread Alexis Sarda-Espinosa
ets? In this case, it should get the offsets from Kafka and > not the state. > > On Thu, Jul 14, 2022 at 11:18 AM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello, >> >> Regarding the new Kafka source (configure with a consumer group), I f

Re: Did the semantics of Kafka's earliest offset change with the new source API?

2022-07-17 Thread Alexis Sarda-Espinosa
> *setStartFromEarliest()** / **setStartFromLatest()**: Start from the >> earliest / latest record. Under these modes, committed offsets in Kafka >> will be ignored and not used as starting positions.* >> >> On 13/07/2022 18:53, Alexis Sarda-Espinosa wrote: >> >>

Re: Making Kafka source respect offset changed externally

2022-07-20 Thread Alexis Sarda-Espinosa
pic \ --group our-group \ --command-config kafka-preprod.properties \ --reset-offsets --to-datetime '2022-07-20T00:01:00.000' \ --execute Regards, Alexis. Am Do., 14. Juli 2022 um 22:56 Uhr schrieb Alexis Sarda-Espinosa < sarda.espin...@gmail.com>: > Hi Yaroslav, > > The

Re: Making Kafka source respect offset changed externally

2022-07-21 Thread Alexis Sarda-Espinosa
he job is initially started > from a clear slate. > Once checkpoints are involved it only relies on offsets stored in the > state. > > On 20/07/2022 14:51, Alexis Sarda-Espinosa wrote: > > Hello again, > > I just performed a test > using OffsetsInitializer.committedOffsets

Serialization in window contents and network buffers

2022-09-27 Thread Alexis Sarda-Espinosa
Hi everyone, I know the low level details of this are likely internal, but at a high level we can say that operators usually have some state associated with them. Particularly for error handling and job restarts, I imagine windows must persist state, and operators in general probably persist netwo

Window state size with global window and custom trigger

2022-10-07 Thread Alexis Sarda-Espinosa
Hello, I found an SO thread that clarifies some details of window state size [1]. I would just like to confirm that this also applies when using a global window with a custom trigger. The reason I ask is that the TriggerResult API is meant to cover all supported scenarios, so FIRE vs FIRE_AND_PUR

Re: Window state size with global window and custom trigger

2022-10-10 Thread Alexis Sarda-Espinosa
. > Hope these can help you. > > > On Sat, Oct 8, 2022 at 4:49 AM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello, >> >> I found an SO thread that clarifies some details of window state size >> [1]. I would just like to confirm that this

Partial broadcast/keyed connected streams

2022-10-11 Thread Alexis Sarda-Espinosa
Hi everyone, I am currently thinking about a use case for a streaming job and, while I'm fairly certain it cannot be done with the APIs that Flink currently provides, I figured I'd put it out there in case other users think something like this would be useful to a wider audience. The current broa

Re: Partial broadcast/keyed connected streams

2022-10-11 Thread Alexis Sarda-Espinosa
tate as normal keyedstream. > > > > Best Regards! > > 从 Windows 版邮件 <https://go.microsoft.com/fwlink/?LinkId=550986>发送 > > > > *发件人: *Alexis Sarda-Espinosa > *发送时间: *2022年10月12日 4:11 > *收件人: *user > *主题: *Partial broadcast/keyed connected streams > > >

Broadcast state restoration for BroadcastProcessFunction

2022-10-14 Thread Alexis Sarda-Espinosa
Hello, I wrote a test for a broadcast function to check how it handles broadcast state during retries [1] (the gist only shows a subset of the test in Kotlin, but it's hopefully understandable). The test will not pass unless my function also implements CheckpointedFunction, although those interfac

Broadcast state and job restarts

2022-10-27 Thread Alexis Sarda-Espinosa
Hello, The documentation for broadcast state specifies that it is always kept in memory. My assumptions based on this statement are: 1. If a job restarts in the same Flink cluster (i.e. using a restart strategy), the tasks' attempt number increases and the broadcast state is restored since it's n

Owner reference with the Kubernetes operator

2022-11-16 Thread Alexis Sarda-Espinosa
Hello, Is there a particular reason the operator doesn't set owner references for the Deployments it creates as a result of a FlinkDeployment CR? This makes tracking in the Argo CD UI impossible. (To be clear, I mean a reference from the Deployment to the FlinkDeployment). Regards, Alexis.

Savepoint restore mode for the Kubernetes operator

2022-11-16 Thread Alexis Sarda-Espinosa
Hello, Is there a recommended configuration for the restore mode of jobs managed by the operator? Since the documentation states that the operator keeps a savepoint history to perform cleanup, I imagine restore mode should always be NO_CLAIM, but I just want to confirm. Regards, Alexis.

Re: Owner reference with the Kubernetes operator

2022-11-16 Thread Alexis Sarda-Espinosa
at 3:32 PM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello, >> >> Is there a particular reason the operator doesn't set owner references >> for the Deployments it creates as a result of a FlinkDeployment CR? This >> makes tracking

Kubernetes operator and jobs with last-state upgrades

2022-11-16 Thread Alexis Sarda-Espinosa
Hello, I am doing some tests with the operator and, if I'm not mistaken, using last-state upgrade means that, when something is changed in the CR, no savepoint is taken and the pods are simply terminated. Is that a requirement from Flink HA? I would have thought last-state would still use savepoin

Re: Savepoint restore mode for the Kubernetes operator

2022-11-29 Thread Alexis Sarda-Espinosa
> job. > In other words, savepoint cleanup should not clean the savepoint from the > old job which should only be controlled by restore mode. > So I think you could also set restore mode according to your needs. > > > On Wed, Nov 16, 2022 at 10:41 PM Alexis Sarda-Espinosa &l

Re: Savepoint restore mode for the Kubernetes operator

2022-11-29 Thread Alexis Sarda-Espinosa
oint cleanup should not clean the savepoint from >>> the old job which should only be controlled by restore mode. >>> So I think you could also set restore mode according to your needs. >>> >>> >>> On Wed, Nov 16, 2022 at 10:41 PM Alexis Sarda-Espinosa &

Clarification on checkpoint cleanup with RETAIN_ON_CANCELLATION

2022-12-05 Thread Alexis Sarda-Espinosa
Hello, I have a doubt about a very particular scenario with this configuration: - Flink HA enabled (Kubernetes). - ExternalizedCheckpointCleanup set to RETAIN_ON_CANCELLATION. - Savepoint restore mode left as default NO_CLAIM. During an upgrade, a stop-job-with-savepoint is triggered, and then t

Cleanup for high-availability.storageDir

2022-12-05 Thread Alexis Sarda-Espinosa
Hello, I see the number of entries in the directory configured for HA increases over time, particularly in the context of job upgrades in a Kubernetes environment managed by the operator. Would it be safe to assume that any files that haven't been updated in a while can be deleted? Assuming the ch

Re: Cleanup for high-availability.storageDir

2022-12-05 Thread Alexis Sarda-Espinosa
in the HA dir that > need to be cleaned up by the user: > > > https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/#jobresultstore-resource-leak > > > Hope this helps > Gyula > > On Mon, 5 Dec 2022 at 11:56, Alexis Sarda-Espinosa &

Re: Cleanup for high-availability.storageDir

2022-12-06 Thread Alexis Sarda-Espinosa
Alexis Sarda-Espinosa < sarda.espin...@gmail.com>: > Hi Gyula, > > that certainly helps, but to set up automatic cleanup (in my case, of > azure blob storage), the ideal option would be to be able to set a simple > policy that deletes blobs that haven't been updated in s

Re: Cleanup for high-availability.storageDir

2022-12-07 Thread Alexis Sarda-Espinosa
- job_name/submittedJobGraphX >> - job_name/submittedJobGraphY >> >> Is it safe to clean these up when the job is in a healthy state? >> >> Regards, >> Alexis. >> >> Am Mo., 5. Dez. 2022 um 20:09 Uhr schrieb Alexis Sarda-Espinosa < >> sarda.es

Re: Cleanup for high-availability.storageDir

2022-12-07 Thread Alexis Sarda-Espinosa
pdates *without* savepoints"? Are you > referring to replacing the job's business logic without stopping the job? > > On Wed, Dec 7, 2022 at 3:17 PM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hi Matthias, >> >> Then the explanation

Re: Cleanup for high-availability.storageDir

2022-12-08 Thread Alexis Sarda-Espinosa
;s probably not what you want. > > @Gyula: Please correct me if I misunderstood something here. > > I hope that helped. > Matthias > > On Wed, Dec 7, 2022 at 4:19 PM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> I see, thanks for the details.

Re: Deterministic co-results

2022-12-08 Thread Alexis Sarda-Espinosa
Hi Salva, Just to give you further thoughts from another user, I think the "temporal join" semantics are very critical in this use case, and what you implement for that may not easily generalize to other cases. Because of that, I'm not sure if you can really define best practices that apply in gen

Re: Clarification on checkpoint cleanup with RETAIN_ON_CANCELLATION

2022-12-12 Thread Alexis Sarda-Espinosa
deletion? > In this case, the checkpoint will be cleaned and not retained and the > savepoint will remain. So you still could use savepoint to restore. > > On Mon, Dec 5, 2022 at 6:33 PM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello,

Backpressure due to busy sub-tasks

2022-12-13 Thread Alexis Sarda-Espinosa
Hello, I have a Kafka source (the new one) in Flink 1.15 that's followed by a process function with parallelism=2. Some days, I see long periods of backpressure in the source. During those times, the pool-usage metrics of all tasks stay between 0 and 1%, but the process function appears 100% busy.

Re: Backpressure due to busy sub-tasks

2022-12-16 Thread Alexis Sarda-Espinosa
t; martijnvis...@apache.org>: > Hi, > > Backpressure implies that it's actually a later operator that is busy. So > in this case, that would be your process function that can't handle the > incoming load from your Kafka source. > > Best regards, > > Martijn >

Could savepoints contain in-flight data?

2023-02-10 Thread Alexis Sarda-Espinosa
Hello, One feature of unaligned checkpoints is that the checkpoint barriers can overtake in-flight data, so the buffers are persisted as part of the state. The documentation for savepoints doesn't mention anything explicitly, so just to be sure, will savepoints always wait for in-flight data to b

Re: Could savepoints contain in-flight data?

2023-02-13 Thread Alexis Sarda-Espinosa
ink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/ >> [2] >> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/#checkpointing >> >> Alexis Sarda-Espinosa 于2023年2月11日周六 06:00写道: >> >>> Hello,

Calculation of UI's maximum non-heap memory

2023-02-20 Thread Alexis Sarda-Espinosa
Hello, I have configured a job manager with the following settings (Flink 1.16.1): jobmanager.memory.process.size: 1024m jobmanager.memory.jvm-metaspace.size: 150m jobmanager.memory.off-heap.size: 64m jobmanager.memory.jvm-overhead.min: 168m jobmanager.memory.jvm-overhead.max: 168m jobmanager.mem

Re: Calculation of UI's maximum non-heap memory

2023-02-20 Thread Alexis Sarda-Espinosa
52980629/runtime-getruntime-maxmemory-calculate-method > > Best, > Weihua > > > On Tue, Feb 21, 2023 at 12:15 AM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hello, >> >> I have configured a job manager with the following settin

Re: Calculation of UI's maximum non-heap memory

2023-02-21 Thread Alexis Sarda-Espinosa
; So, the maximum non-heap is 150+142+240 = 532m. > > > Best, > Weihua > > > On Tue, Feb 21, 2023 at 2:33 PM Alexis Sarda-Espinosa < > sarda.espin...@gmail.com> wrote: > >> Hi Weihua, >> >> Thanks for your response, I am familiar with those calculations,

  1   2   >