[jira] [Created] (FLINK-37327) Debezium Avro Format: Add FormatOption to Optionally Skip emitting UPDATE_BEFORE Rows

2025-02-14 Thread Kevin Lam (Jira)
Kevin Lam created FLINK-37327: - Summary: Debezium Avro Format: Add FormatOption to Optionally Skip emitting UPDATE_BEFORE Rows Key: FLINK-37327 URL: https://issues.apache.org/jira/browse/FLINK-37327

[jira] [Created] (FLINK-36017) Support Passing Metadata to Formats in KafkaDynamicSink

2024-08-08 Thread Kevin Lam (Jira)
Kevin Lam created FLINK-36017: - Summary: Support Passing Metadata to Formats in KafkaDynamicSink Key: FLINK-36017 URL: https://issues.apache.org/jira/browse/FLINK-36017 Project: Flink Issue Type

[Flink SQL Format] How to Process Metadata Columns in KafkaDynamicSink compatible EncodingFormat

2024-08-07 Thread Kevin Lam
Hi there, I'm looking to add some functionality to a custom Format that reads the `topic` metadata column in the context of serialization using the Flink SQL Kafka Connector. I'm c

[FLINK-22748] Update for "Allow dynamic target topic selection in SQL Kafka sinks"

2024-07-23 Thread Kevin Lam
Hi there, I've spent some time picking up the existing work for FLINK-22748 , and created a PR, building on Nicholas Jiang's work: https://github.com/apache/flink-connector-kafka/pull/109 I saw the issue is marked stale, so I'm starting this mai

Re: Potential Kafka Connector FLIP: Large Message Handling

2024-07-19 Thread Kevin Lam
ts then taken from external source and > put into the Kafka body or the other way round for serialization. This is > different to my Apicurio work that is to handle format specific headers. > > I assume you would want to add configuration to define where the external > storage l

Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-07-19 Thread Kevin Lam
t; In general, I think it would be good to be more explicit about the > > > schemas > > > > used ( > > > > > > > > > > > > > > https://docs.confluent.io/platform/curren/schema-registry/schema_registry_onprem_tutorial.html#auto-schema-r

[jira] [Created] (FLINK-35808) Let ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG be overridable by user in KafkaSourceBuilder

2024-07-10 Thread Kevin Lam (Jira)
Kevin Lam created FLINK-35808: - Summary: Let ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG be overridable by user in KafkaSourceBuilder Key: FLINK-35808 URL: https://issues.apache.org/jira/browse/FLINK-35808

Re: Potential Kafka Connector FLIP: Large Message Handling

2024-07-10 Thread Kevin Lam
the external > storage lives and authentication. Limitations around stack and heap sizes > would be worth considering. > > Am I understanding your intent correctly? > Kind regards, David. > > > From: Kevin Lam > Date: Wednesday, 10 July 2024 at 14:35 > T

Re: Potential Kafka Connector FLIP: Large Message Handling

2024-07-10 Thread Kevin Lam
23c216/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/KafkaSourceBuilder.java#L464> to make it overridable. I'm planning to start a separate thread to propose making `value.serializer` overridable. On Mon, Jul 8, 2024 at 11:18 AM Kevin Lam wrote: > Hi Fa

Re: [DISCUSS] FLIP-XXX Apicurio-avro format

2024-07-08 Thread Kevin Lam
Hi David, Any updates on the Kafka Message Header support? I am also interested in supporting headers with the Flink SQL Formats: https://lists.apache.org/thread/spl88o63sjm2dv4l5no0ym632d2yt2o6 On Fri, Jun 14, 2024 at 6:10 AM David Radley wrote: > Hi everyone, > I have talked with Chesnay and

Re: Potential Kafka Connector FLIP: Large Message Handling

2024-07-08 Thread Kevin Lam
/apache/flink/connector/kafka/source/reader/deserializer/KafkaRecordDeserializationSchema.java#L107 > [3] > > https://github.com/bakdata/kafka-large-message-serde/blob/09eae933afaf8a1970b1b1bebcdffe934c368cb9/large-message-serde/src/main/java/com/bakdata/kafka/LargeMessageDeserializ

Re: Potential Kafka Connector FLIP: Large Message Handling

2024-07-08 Thread Kevin Lam
same time. > > > > It's not totally clear how Conduktor solved these issues, but IMO they > are > > worth keeping in mind. For Kroxylicious we decided these problems meant > it > > wasn't practical for us to implement this, but I'd be curious to know if >

Potential Kafka Connector FLIP: Large Message Handling

2024-07-05 Thread Kevin Lam
Hi all, Writing to see if the community would be open to exploring a FLIP for the Kafka Table Connectors. The FLIP would allow for storing Kafka Messages beyond a Kafka cluster's message limit (1 MB by default) out of band in cloud object storage or another backend. During serialization the messa

Re: Poor Load Balancing across TaskManagers for Multiple Kafka Sources

2024-06-06 Thread Kevin Lam
here. > > Best, > Zhanghao Chen > ________ > From: Kevin Lam > Sent: Thursday, June 6, 2024 2:32 > To: dev@flink.apache.org > Subject: Poor Load Balancing across TaskManagers for Multiple Kafka Sources > > Hey all, > > I'm seeing an issue with poor load bala

Re: Poor Load Balancing across TaskManagers for Multiple Kafka Sources

2024-06-05 Thread Kevin Lam
I've also just found https://issues.apache.org/jira/browse/FLINK-31762 which tracks the Kafka specific issue. On Wed, Jun 5, 2024 at 3:05 PM Kevin Lam wrote: > cc. panyuep...@apache.org as related to FLIP-370 > > On Wed, Jun 5, 2024 at 2:32 PM Kevin Lam wrote: > >> Hey a

Re: Poor Load Balancing across TaskManagers for Multiple Kafka Sources

2024-06-05 Thread Kevin Lam
cc. panyuep...@apache.org as related to FLIP-370 On Wed, Jun 5, 2024 at 2:32 PM Kevin Lam wrote: > Hey all, > > I'm seeing an issue with poor load balancing across TaskManagers for Kafka > Sources using the Flink SQL API and wondering if FLIP-370 will help with > it, or i

Poor Load Balancing across TaskManagers for Multiple Kafka Sources

2024-06-05 Thread Kevin Lam
Hey all, I'm seeing an issue with poor load balancing across TaskManagers for Kafka Sources using the Flink SQL API and wondering if FLIP-370 will help with it, or if not, interested in any ideas the community has to mitigate the issue. The Kafka SplitEnumerator uses the following logic to assign

Best Practices? Fault Isolation for Processing Large Number of Same-Shaped Input Kafka Topics in a Big Flink Job

2024-05-08 Thread Kevin Lam
Hi everyone, I'm currently prototyping on a project where we need to process a large number of Kafka input topics (say, a couple of hundred), all of which share the same DataType/Schema. Our objective is to run the same Flink SQL on all of the input topics, but I am concerned about doing this in

Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-03-26 Thread Kevin Lam
f FlinkToProtoSchemaConverter we might > end up overwriting the field-Ids. > If we are able to locate a prior schema, the approach you outlined makes a > lot of sense. > Let me explore this a bit further and get back(in terms of feasibility). > > Thanks again! > - Anupam > >

Re: Flink Kubernetes Operator Failing Over FlinkDeployments to a New Cluster

2024-03-22 Thread Kevin Lam
Thanks for sharing this work Gyula! That's great to see the FLIP covers some of the limitations already. I will follow the FLIP and associated JIRA ticket. Hi Matthias Pohl. I'd be interested to learn if there has been any progress on the FLIP-360 or associated JIRA issue FLINK-31709. On Fri, Mar

Re: Flink Kubernetes Operator Failing Over FlinkDeployments to a New Cluster

2024-03-21 Thread Kevin Lam
No worries, thanks for the reply Gyula. Ah yes, I see how those points you raised make the feature tricky to implement. Could this be considered for a FLIP (or two) in the future? On Wed, Mar 20, 2024 at 2:21 PM Gyula Fóra wrote: > Sorry for the late reply Kevin. > > I think what you are sugges

Efficient Use of Disk Resources (Only Attach Disks to Stateful Task Managers)

2024-03-14 Thread Kevin Lam
Hi all, I was wondering if anyone has any ideas or advice when it comes to being efficient about the use of disks with the RocksDB StateBackend. In general not all operators in a Flink Job will be stateful and require persistent disks to use with RocksDB, and so any stateless operators running on

Re: Flink Kubernetes Operator Failing Over FlinkDeployments to a New Cluster

2024-03-14 Thread Kevin Lam
e of course many many others. > > There may come a time when the Flink community decides to take on such a > scope but it feels a bit too much at this point to try to standardize this. > > Cheers, > Gyula > > On Wed, Mar 13, 2024 at 9:18 PM Kevin Lam > wrote: > >

Re: Flink Kubernetes Operator Failing Over FlinkDeployments to a New Cluster

2024-03-13 Thread Kevin Lam
e. > > Probably something worth trying :) > > -Max > > > > On Wed, Mar 6, 2024 at 9:09 PM Kevin Lam > wrote: > > > > Another thought could be modifying the operator to have a behaviour where > > upon first deploy, it optionally (flag/param enabled) finds

Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-03-12 Thread Kevin Lam
s/protobuf/AbstractKafkaProtobufSerializer.java#L157 > > > (|Deserializer). > Please let me know if this makes sense / or in case you have any other > feedback. > > Thanks > Anupam > > On Thu, Feb 29, 2024 at 8:54 PM Kevin Lam > wrote: > > > Hey Robert, >

Re: Flink Kubernetes Operator Failing Over FlinkDeployments to a New Cluster

2024-03-06 Thread Kevin Lam
Another thought could be modifying the operator to have a behaviour where upon first deploy, it optionally (flag/param enabled) finds the most recent snapshot and uses that as the initialSavepointPath to restore and run the Flink job. On Wed, Mar 6, 2024 at 2:07 PM Kevin Lam wrote: > Hi th

Flink Kubernetes Operator Failing Over FlinkDeployments to a New Cluster

2024-03-06 Thread Kevin Lam
Hi there, We use the Flink Kubernetes Operator, and I am investigating how we can easily support failing over a FlinkDeployment from one Kubernetes Cluster to another in the case of an outage that requires us to migrate a large number of FlinkDeployments from one K8s cluster to another. I underst

Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-02-29 Thread Kevin Lam
o that you can start reviewing and maybe also contributing to the > PR. > I hope this timeline works for you! > > Let's also decide if we need a FLIP once the code is public. > We will look into the field ids. > > > On Tue, Feb 27, 2024 at 8:56 PM Kevin Lam > w

Re: [DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-02-27 Thread Kevin Lam
ke sense to have the Confluent schema lookup in common > > code, which is part of the SchemaCoder readSchema logic. > > * I assume the ProtobufSchemaCoder readSchema would return a Protobuf > > Schema object. > > > > > > > > I also wondered whet

[DISCUSS] FLINK-34440 Support Debezium Protobuf Confluent Format

2024-02-21 Thread Kevin Lam
I would love to get some feedback from the community on this JIRA issue: https://issues.apache.org/jira/projects/FLINK/issues/FLINK-34440 I am looking into creating a PR and would appreciate some review on the approach. In terms of design I think we can mirror the `debezium-avro-confluent` and `a

[jira] [Created] (FLINK-34440) Support Debezium Protobuf Confluent Format

2024-02-13 Thread Kevin Lam (Jira)
Kevin Lam created FLINK-34440: - Summary: Support Debezium Protobuf Confluent Format Key: FLINK-34440 URL: https://issues.apache.org/jira/browse/FLINK-34440 Project: Flink Issue Type: New Feature

Re: Impact of redacting UPDATE_BEFORE fields?

2024-02-08 Thread Kevin Lam
ectly support > retractions (in case of updates and deletes). > > Great talk on this topic: > https://www.youtube.com/watch?v=iRlLaY-P6iE&ab_channel=PlainSchwarz (the > middle part is the most relevant). > > > On Wed, Feb 7, 2024 at 12:13 PM Kevin Lam > wrote: > >

Impact of redacting UPDATE_BEFORE fields?

2024-02-07 Thread Kevin Lam
Hi there! I have a question about Changelog Stream Processing with Flink SQL and the Flink Table API. I would like to better understand how UPDATE_BEFORE fields are used by Flink. Our team uses Debezium to extract Change Data Capture events from MySQL databases. We currently redact the `before` f

Re: Zero-Downtime Deployments with Flink Operator

2023-05-24 Thread Kevin Lam
ourse I > can be convinced otherwise if there is a general requirement / interest in > the community. In any case we should have confidence that this will > actually provide production value to many use-cases and it would require a > FLIP for sure. > > Cheers, > Gyula > > &

Zero-Downtime Deployments with Flink Operator

2023-05-24 Thread Kevin Lam
Hi, Is there any interest or ongoing work around supporting zero-downtime deployments with Flink using the Flink Operator? I saw that https://issues.apache.org/jira/browse/FLINK-24257 existed, but it looks a little stale. I'm interested in learning more about the current state of things. There i

Re: [DISCUSS] Release Flink 1.17.1

2023-05-16 Thread Kevin Lam
Hi! Thanks for doing this release. I'm looking forward to some of the bug fixes, is there a date set for the release of 1.17.1? On Mon, May 15, 2023 at 6:10 AM Lijie Wang wrote: > +1 for the release. > > Best, > Lijie > > Jing Ge 于2023年5月15日周一 17:07写道: > > > +1 for releasing 1.17.1 > > > > Best

Re: Flink Operator - Supporting Recovery from Snapshot

2023-02-10 Thread Kevin Lam
t; > This approach is indeed irreversible, but so far it's been working well. > > On Fri, Feb 10, 2023 at 8:17 AM Kevin Lam > wrote: > > > Thanks for the response Gyula! Those caveats make sense, and I see, > there's > > a bit of a complexity to consider if

Re: Flink Operator - Supporting Recovery from Snapshot

2023-02-10 Thread Kevin Lam
hen the operator > loses track of the latest checkpoint, mostly due to "incorrect" error > handling on the Flink side that also deletes the HA metadata. I think we > should strive to improve and eliminate most of these cases (as we have > already done for many of these problems)

Flink Operator - Supporting Recovery from Snapshot

2023-02-07 Thread Kevin Lam
Hello, I was reading the Flink Kubernetes Operator documentation and noticed that if you want to redeploy a Flink job from a specific snapshot, you must follow these manual recovery steps. Are there plans to streamline this process? Deploying from a specific snapshot is a relatively common operati

RE: [VOTE] FLIP-265 Deprecate and remove Scala API support

2022-10-19 Thread Kevin Lam
Non-binding -1. Aligned with the objections raised by Yaroslav in https://lists.apache.org/thread/d3borhdzj496nnggohq42fyb6zkwob3h and agree with his recommendation to support Java 17 as a prerequisite. We use the Scala APIs extensively and we would like to have an alternative to Scala so that we