Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-27 Thread Hongshun Wang
Hi Raman Verma, Generally, Kafka responds quickly. However, since this is an asynchronous operation, we cannot guarantee that nothing abnormal will happen, such as a temporary network issue. The two cases I mentioned are special situations meant to illustrate the concept. Best Hongshun On Fri,

RE: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-27 Thread Raman Verma
Hello Hongshun Wang, You have mentioned that the first partition discovery can be very slow (section: Why do we need initialDiscoveryFinished?). Do you mean that Kafka can be slow to respond? If so, any idea under what conditions Kafka would be slow? Or, is it just a matter of bad timing, where

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-21 Thread Tzu-Li (Gordon) Tai
> I have already modified FLIP-288 to provide a newDiscoveryOffsetsInitializer in the KafkaSourceBuilder and KafkaSourceEnumerator. Users can use KafkaSourceBuilder#setNewDiscoveryOffsets to change the strategy for new partitions. Thanks for addressing my comment Hongshun. > Considering these rea

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-18 Thread Leonard Xu
Thanks, Hongshun, for the deeper analysis of the existing KafkaSource implementation details. Cool! There’s no specific use case for a future TIMESTAMP or SPECIFIC-OFFSET for newly discovered partitions. The existing SpecifiedOffsetsInitializer will use the EARLIEST offset for unspecified partitions

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-18 Thread Hongshun Wang
Hi Shammon, I agree with you. Since only EARLIEST is used, it's better not to mislead users through the interface. Yours Hongshun On Tue, Apr 18, 2023 at 7:12 PM Shammon FY wrote: > Hi Hongshun > > Thanks for your explanation, I have got your point. I reviewed the FLIP again > and only have on

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-18 Thread Shammon FY
Hi Hongshun, Thanks for your explanation, I have got your point. I reviewed the FLIP again and only have one minor comment which won't block this FLIP: do we need `OffsetsInitializer newDiscoveryOffsetsInitializer` in the constructor of `KafkaSourceEnumerator`? I think we can remove it if we

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-18 Thread Hongshun Wang
Hi Shammon, Thank you for your advice. I have carefully considered whether to expose this in SQL DDL, and I recently studied whether it is feasible. However, after reading the corresponding code more thoroughly, it appears that SpecifiedOffsetsInitializer and TimestampOffsetsInitial

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-14 Thread Shammon FY
Hi Hongshun, Thanks for updating the FLIP, it totally sounds good to me. I just have one comment: How does a SQL job set the new discovery offsets initializer? I found `DataStream` jobs can set different offsets initializers for newly discovered partitions in `KafkaSourceBuilder.setNewDiscoveryOffsets`. Do

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-04-11 Thread Hongshun Wang
Hi everyone, I have already modified FLIP-288 to provide a newDiscoveryOffsetsInitializer in the KafkaSourceBuilder and KafkaSourceEnumerator. Users can use KafkaSourceBuilder#setNewDiscoveryOffsets to change the strategy for new partitions. Surely, enabling the partition discovery strategy by de
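To make the idea in this message concrete, here is a minimal, self-contained sketch of an enumerator that applies one strategy to the partitions found by the first discovery and another to partitions discovered afterwards. It is only a model under stated assumptions: `DiscoverySketch`, `Strategy`, and `onDiscovery` are hypothetical names, and the real `OffsetsInitializer` and `KafkaSourceEnumerator` classes are not used. Only the `firstDiscoveryDone` flag and the split between a startup strategy and a new-discovery strategy come from the thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DiscoverySketch {
    /** Hypothetical stand-in for an offsets strategy; not the real OffsetsInitializer. */
    enum Strategy { EARLIEST, LATEST }

    private final Strategy startingStrategy;      // applied to partitions seen at startup
    private final Strategy newDiscoveryStrategy;  // applied to partitions found later
    private boolean firstDiscoveryDone = false;
    private final Map<String, Strategy> assigned = new LinkedHashMap<>();

    DiscoverySketch(Strategy startingStrategy, Strategy newDiscoveryStrategy) {
        this.startingStrategy = startingStrategy;
        this.newDiscoveryStrategy = newDiscoveryStrategy;
    }

    /** Records a strategy for every partition that has not been seen before. */
    void onDiscovery(List<String> partitions) {
        Strategy strategy = firstDiscoveryDone ? newDiscoveryStrategy : startingStrategy;
        for (String partition : partitions) {
            assigned.putIfAbsent(partition, strategy);
        }
        // Set the flag even if the first discovery returned nothing, so that
        // an empty first result is not confused with "not discovered yet".
        firstDiscoveryDone = true;
    }

    Map<String, Strategy> assigned() {
        return assigned;
    }

    public static void main(String[] args) {
        DiscoverySketch enumerator = new DiscoverySketch(Strategy.LATEST, Strategy.EARLIEST);
        enumerator.onDiscovery(List.of("topic-0", "topic-1"));            // startup
        enumerator.onDiscovery(List.of("topic-0", "topic-1", "topic-2")); // later discovery
        System.out.println(enumerator.assigned());
        // prints {topic-0=LATEST, topic-1=LATEST, topic-2=EARLIEST}
    }
}
```

In this model, a partition that appears after startup is read from its beginning, so records written to it before it was discovered are not skipped.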

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-30 Thread Hongshun Wang
Hi everyone, Thanks for your participation. @Gordon, I looked at the questions you raised: 1. Should we use the firstDiscovery flag or two separate OffsetsInitializers? Actually, I have considered the latter. If we follow my initial idea, we can provide a default earliest OffsetsIniti

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-29 Thread Shammon FY
Thanks Gordon and Leonard. I'm sorry that there is no specific case from my side, but I see the issue as follows: 1. Users may set an offset later than the current time because Flink does not limit it. 2. If we use EARLIEST for a newly discovered partition with different OFFSETs, which may be d

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-29 Thread Leonard Xu
Thanks Hongshun and Shammon for driving the FLIP! > *2. Clarification on "future-time" TIMESTAMP OffsetsInitializer* > *3. Clarification on coupling SPECIFIC-OFFSET startup with SPECIFIC-OFFSET > post-startup* Gordon raised a good point about the future TIMESTAMP and SPECIFIC-OFFSET, the timest

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-29 Thread Tzu-Li (Gordon) Tai
Hi Hongshun, Thank you for drafting the FLIP for this. Overall, the intent of the FLIP makes sense to me. I'm actually surprised that the partition discovery feature works as it is in the KafkaSource today - in older versions of the Kafka source connector (implemented on source v1), newly discove

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-27 Thread Shammon FY
Hi Hongshun, Thanks for updating, the FLIP's pretty good and looks good to me! Best, Shammon FY On Tue, Mar 28, 2023 at 11:16 AM Hongshun Wang wrote: > Hi Shammon, > > > Thanks a lot for your advice. I agree with your opinion now. It seems that > I forgot to consider that it may be at a certai

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-27 Thread Hongshun Wang
Hi Shammon, Thanks a lot for your advice. I agree with your opinion now. It seems that I forgot to consider that it may be at a certain point in the future. I will modify OffsetsInitializer to provide a different strategy for later discovered partitions, by which users can also customize strate

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-27 Thread Shammon FY
Hi Hongshun, Thanks for your answer. I think the startup offset of Kafka, such as timestamp or specific_offset, has no relationship with `Window Operator`. Users can freely set the starting position according to their needs; it may be before the latest Kafka data, or it may be at a certain point in

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-23 Thread Hongshun Wang
"If all new messages in old partitions should be consumed, all new messages in new partitions should also be consumed." Sorry, I wrote the last sentence incorrectly. On Fri, Mar 24, 2023 at 11:15 AM Hongshun Wang wrote: > Hi Shammon, > > Thanks for your advice! I learned a lot about TIMESTAMP/SP

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-23 Thread Hongshun Wang
Hi Shammon, Thanks for your advice! I learned a lot about TIMESTAMP/SPECIFIC_OFFSET. That's interesting. However, I have a different opinion. If a user employs the SPECIFIC_OFFSET strategy and enables auto-discovery, they will be able to find new partitions beyond the specified offset. Otherwise,

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-23 Thread Shammon FY
Hi Hongshun, Thanks for driving this discussion. Automatically discovering partitions without losing data sounds great! Currently, Flink supports the Kafka source with different startup modes, such as EARLIEST, LATEST, TIMESTAMP, SPECIFIC_OFFSETS and GROUP_OFFSET. If I understand correctly, you will s

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-21 Thread Hongshun Wang
Hi, Hang, Thanks for your advice. When will the second case occur? Currently, there are three ways to specify partitions in Kafka: by topic, by partition, and by matching the topic with a regular expression. Currently, if the initial partition number is 0, an error will occur for the first two me

Re: [DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-21 Thread Hang Ruan
Hi, Hongshun, Thank you for starting this discussion. I have some questions about the field `firstDiscoveryDone`. In the FLIP, the reason we need firstDiscoveryDone is given as follows. > Why do we need firstDiscoveryDone? Only relying on the unAssignedInitialPartitons attribute cannot distinguish between the

[DISCUSS] FLIP-288:Enable Dynamic Partition Discovery by Default in Kafka Source

2023-03-17 Thread Hongshun Wang
Hi everyone, I would like to start a discussion on FLIP-288: Enable Dynamic Partition Discovery by Default in Kafka Source [1]. As described in the mail thread [2], dynamic partition discovery is disabled by default and users have to explicitly specify the discovery interval in order to turn it on. B
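For readers unfamiliar with the current behavior, the sketch below shows what "explicitly specify the discovery interval" amounts to at the configuration level. It assumes the Kafka source reads the interval from the `partition.discovery.interval.ms` property and treats a missing or non-positive value as "discovery disabled". The class and helper names (`DiscoveryConfigSketch`, `withDiscovery`, `discoveryEnabled`) are hypothetical and use only `java.util.Properties`, not the connector itself.

```java
import java.util.Properties;

public class DiscoveryConfigSketch {
    // Property key assumed to control the discovery interval in the Kafka source.
    static final String DISCOVERY_INTERVAL_KEY = "partition.discovery.interval.ms";

    /** Builds source properties with periodic discovery turned on. */
    static Properties withDiscovery(long intervalMs) {
        Properties props = new Properties();
        props.setProperty(DISCOVERY_INTERVAL_KEY, Long.toString(intervalMs));
        return props;
    }

    /** Assumption: an absent or non-positive interval means discovery is disabled. */
    static boolean discoveryEnabled(Properties props) {
        long intervalMs = Long.parseLong(props.getProperty(DISCOVERY_INTERVAL_KEY, "-1"));
        return intervalMs > 0;
    }

    public static void main(String[] args) {
        System.out.println(discoveryEnabled(new Properties()));    // prints false
        System.out.println(discoveryEnabled(withDiscovery(10_000L))); // prints true
    }
}
```

In a real job, properties like these would typically be passed to the source builder; the proposal in this thread is about changing what happens when the user sets nothing.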