Re: [Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-08 Thread xiangyu feng
Hi Rui and Yong,

Thanks for your reply.

My initial concern here is that, for short-lived jobs under high QPS, a
fixed-delay retry strategy wastes resources and is not flexible enough,
while an exponential-backoff strategy might significantly increase the
query latency since the interval grows too fast. An incremental-delay
strategy could strike a balance between resource consumption and
short-query latency.

On second thought, an exponential-delay retry strategy with a
configurable multiplier option can also achieve this goal. By setting the
default multiplier to 1, we stay consistent with the original behavior
while reducing the number of configuration items.
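
As an illustration, here is a minimal sketch (illustrative parameter names,
not the FLIP's actual code) of why a multiplier of 1 collapses the
exponential-delay strategy into fixed delay:

import java.time.Duration;

public class RetryDelayDemo {
    // delay for the n-th attempt: initial * multiplier^n, capped at max
    static Duration delayForAttempt(Duration initial, Duration max, double multiplier, int attempt) {
        double delayMs = initial.toMillis() * Math.pow(multiplier, attempt);
        return Duration.ofMillis((long) Math.min(delayMs, max.toMillis()));
    }

    public static void main(String[] args) {
        // multiplier = 1: 100ms, 100ms, 100ms, ... i.e. the old fixed-delay behavior
        // multiplier = 2: 100ms, 200ms, 400ms, ... capped at 5s
        for (double multiplier : new double[] {1.0, 2.0}) {
            for (int attempt = 0; attempt < 4; attempt++) {
                System.out.printf("multiplier=%.0f attempt=%d delay=%s%n", multiplier, attempt,
                        delayForAttempt(Duration.ofMillis(100), Duration.ofSeconds(5), multiplier, attempt));
            }
        }
    }
}

With multiplier = 1 every attempt waits the initial delay, so a single
exponential-delay strategy can subsume the fixed-delay behavior.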

I've updated the FLIP accordingly; looking forward to your feedback.

Regards,
Xiangyu Feng


Rui Fan <1996fan...@gmail.com> wrote on Mon, Jan 8, 2024 at 15:29:

> Only one strategy is fine to me.
>
> When the multiplier is set to 1, the exponential-delay will become
> fixed-delay.
> So fixed-delay may not be needed.
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 2:17 PM Yong Fang  wrote:
>
> > I agree with @Rui that the current configuration for Flink Client is a
> > little complex. Can we just provide one strategy with less configuration
> > items for all scenarios?
> >
> > Best,
> > Fang Yong
> >
> > On Mon, Jan 8, 2024 at 11:19 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Thanks xiangyu for driving this proposal! And sorry for the
> > > late reply.
> > >
> > > Overall looks good to me, I only have some minor questions:
> > >
> > > 1. Do we need to introduce 3 collect strategies in the first version?
> > >
> > > Large and comprehensive configuration items will bring
> > > additional learning costs and usage costs to users. I tend to
> > > provide users with out-of-the-box parameters and 2 collect
> > > strategies may be enough for users.
> > >
> > > IIUC, there is no big difference between exponential-delay and
> > > incremental-delay, especially the default parameters provided.
> > > I wonder whether we could provide a multiplier for the exponential-delay
> > > strategy and remove the incremental-delay strategy?
> > >
> > > Of course, if you think multiplier option is not needed based on
> > > your production experience, it's totally fine for me. Simple is better.
> > >
> > > 2. Which strategy do you think is best in mass production?
> > >
> > > I'm working on FLIP-364[1], it's related to Flink failover restart
> > > strategy. IIUC, when one cluster only has a few flink jobs,
> > > fixed-delay is fine. It guarantees minimal latency without too
> > > much stress. But if one cluster has too many jobs, fixed-delay
> > > may not be stable.
> > >
> > > Do you think exponential-delay is better than fixed delay in this
> > > scenario? And which strategy is used in your production for now?
> > > Would you mind sharing it?
> > >
> > > Looking forward to your opinion~
> > >
> > > Best,
> > > Rui
> > >
> > > On Sat, Jan 6, 2024 at 5:54 PM xiangyu feng 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for the comments.
> > > >
> > > > If there is no further comment, we will open the voting thread next
> > week.
> > > >
> > > > Regards,
> > > > Xiangyu
> > > >
> > > > Zhanghao Chen wrote on Wed, Jan 3, 2024 at 16:46:
> > > >
> > > > > Thanks for driving this effort on improving the interactive use
> > > > experience
> > > > > of Flink. The proposal overall looks good to me.
> > > > >
> > > > > Best,
> > > > > Zhanghao Chen
> > > > > 
> > > > > From: xiangyu feng 
> > > > > Sent: Tuesday, December 26, 2023 16:51
> > > > > To: dev@flink.apache.org 
> > > > > Subject: [Discuss] FLIP-407: Improve Flink Client performance in
> > > > > interactive scenarios
> > > > >
> > > > > Hi devs,
> > > > >
> > > > > I'm opening this thread to discuss FLIP-407: Improve Flink Client
> > > > > performance in interactive scenarios. The POC test results and design
> > > > > doc can be found at: FLIP-407
> > > > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+when+interacting+with+dedicated+Flink+Session+Clusters>.
> > > > >
> > > > > Currently, Flink Client is mainly designed for one-time interaction
> > > > > with the Flink Cluster. All the resources (http connections, threads,
> > > > > ha services) and instances (ClusterDescriptor, ClusterClient,
> > > > > RestClient) are created and recycled for each interaction. This works
> > > > > well when users do not need to interact frequently with the Flink
> > > > > Cluster, and it also saves resource usage since resources are recycled
> > > > > immediately after each usage.
> > > > >
> > > > > However, in OLAP or StreamingWarehouse scenarios, users might submit
> > > > > interactive jobs to a dedicated Flink Session Cluster very often. In
> > > > > this case, we find that even short queries that can finish in less
> > > > > than 1s in the Flink Cluster will still have an E2E latency
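
The per-interaction churn described above is the motivation for the FLIP. A
minimal sketch of the kind of client reuse it argues for (a hypothetical
caching pattern around Flink's existing RestClusterClient, not the FLIP's
actual API):

import java.util.List;

import org.apache.flink.api.common.JobID;
import org.apache.flink.client.program.rest.RestClusterClient;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.runtime.jobgraph.JobGraph;

public class SessionClusterSubmitter {
    // Submit many jobs through one long-lived client instead of creating and
    // tearing down a client (and its HTTP connections) per submission.
    public static void submitAll(List<JobGraph> jobs) throws Exception {
        Configuration conf = new Configuration();
        conf.set(RestOptions.ADDRESS, "session-cluster-host"); // assumed hostname
        conf.set(RestOptions.PORT, 8081);

        try (RestClusterClient<String> client =
                new RestClusterClient<>(conf, "session-cluster")) {
            for (JobGraph job : jobs) {
                JobID jobId = client.submitJob(job).get();
                System.out.println("Submitted " + jobId);
            }
        }
    }
}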

Re: [DISCUSS] FLIP-402: Extend ZooKeeper Curator configurations

2024-01-08 Thread Alex Nitavsky
Thanks Zhanghao Chen for the feedback.

In general, right now there is no way to authorize Flink via the Curator
framework, which is probably an important missing feature. Since a public
configuration change requires a FLIP, the current FLIP proposes to add
several more potentially important Curator configurations.

Kind regards
Oleksandr

On Thu, Jan 4, 2024 at 4:57 AM Zhanghao Chen 
wrote:

> Thanks for driving this. I'm not familiar with the listed advanced Curator
> configs, but the previously added switch for disabling ensemble tracking [1]
> saved us when deploying Flink in a cloud env where ZK could only be
> accessed via URLs. That being said, +1 for the overall idea; these
> configs may help users in certain scenarios sooner or later.
>
> [1] https://issues.apache.org/jira/browse/FLINK-31780
>
> Best,
> Zhanghao Chen
> 
> From: Alex Nitavsky 
> Sent: Thursday, December 14, 2023 21:20
> To: dev@flink.apache.org 
> Subject: [DISCUSS] FLIP-402: Extend ZooKeeper Curator configurations
>
> Hi all,
>
> I would like to start a discussion thread for: *FLIP-402: Extend ZooKeeper
> Curator configurations *[1]
>
> * Problem statement *
> Currently, Flink is missing several Apache Curator configurations that could
> be useful for Flink deployments with ZooKeeper as the HA provider.
>
> * Proposed solution *
> We have inspected all possible options for Apache Curator and proposed
> those which could be valuable for Flink users:
>
> - high-availability.zookeeper.client.authorization [2]
> - high-availability.zookeeper.client.maxCloseWaitMs [3]
> - high-availability.zookeeper.client.simulatedSessionExpirationPercent [4]
>
> The proposed way is to reflect those properties into Flink configuration
> options for Apache ZooKeeper.
>
> Looking forward to your feedback and suggestions.
>
> Kind regards
> Oleksandr
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-402%3A+Extend+ZooKeeper+Curator+configurations
> [2]
>
> https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#authorization(java.lang.String,byte%5B%5D)
> [3]
>
> https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#maxCloseWaitMs(int)
> [4]
>
> https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#simulatedSessionExpirationPercent(int)
>
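
For reference, the three proposed options map onto existing Curator builder
calls (per the apidocs linked above); a sketch with illustrative connection
details and values:

import java.nio.charset.StandardCharsets;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorConfigDemo {
    public static void main(String[] args) {
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString("zk-host:2181")
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                // would back high-availability.zookeeper.client.authorization
                .authorization("digest", "user:password".getBytes(StandardCharsets.UTF_8))
                // would back high-availability.zookeeper.client.maxCloseWaitMs
                .maxCloseWaitMs(2000)
                // would back high-availability.zookeeper.client.simulatedSessionExpirationPercent
                .simulatedSessionExpirationPercent(100)
                .build();
        client.start();
        client.close();
    }
}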


Re: [DISCUSS] FLIP-403: High Availability Services for OLAP Scenarios

2024-01-08 Thread Yangze Guo
Thanks for the pointer, Rui!

I have reviewed FLIP-383, and based on my understanding, this feature
should be enabled by default for batch jobs in the future. Therefore,
+1 for checking the parameters and issuing log warnings when the user
explicitly configures execution.batch.job-recovery.enabled to true.
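
For concreteness, such a check could look like the following sketch (the
option keys are the proposals from the two FLIPs, not released Flink
options):

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class JobRecoveryConfigCheck {
    private static final Logger LOG = LoggerFactory.getLogger(JobRecoveryConfigCheck.class);

    // Proposed keys from FLIP-403 and FLIP-383 (not released options).
    static final ConfigOption<Boolean> HA_JOB_RECOVERY =
            ConfigOptions.key("high-availability.job-recovery.enabled").booleanType().defaultValue(true);
    static final ConfigOption<Boolean> BATCH_JOB_RECOVERY =
            ConfigOptions.key("execution.batch.job-recovery.enabled").booleanType().defaultValue(false);

    // Warn when batch job recovery is requested but HA-level recovery is off.
    static void checkJobRecoveryOptions(Configuration conf) {
        if (conf.get(BATCH_JOB_RECOVERY) && !conf.get(HA_JOB_RECOVERY)) {
            LOG.warn("'{}' is set to true but has no effect because '{}' is false.",
                    BATCH_JOB_RECOVERY.key(), HA_JOB_RECOVERY.key());
        }
    }
}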

+1 for high-availability.job-recovery.enabled, which would be more
suitable with YAML hierarchy.


Best,
Yangze Guo

On Mon, Jan 8, 2024 at 3:43 PM Rui Fan <1996fan...@gmail.com> wrote:
>
> Thanks to Yangze for driving this proposal!
>
> Overall looks good to me! This proposal is useful for
> performance when the job doesn't need failover.
>
> I have some minor questions:
>
> 1. How does it work with FLIP-383[1]?
>
> This FLIP introduces a high-availability.enable-job-recovery,
> and FLIP-383 introduces a execution.batch.job-recovery.enabled.
>
> IIUC, when high-availability.enable-job-recovery is false, the job
> cannot recover even if execution.batch.job-recovery.enabled = true,
> right?
>
> If so, could we check the parameters and log some warnings? Or
> disable execution.batch.job-recovery.enabled directly when
> high-availability.enable-job-recovery = false.
>
> 2. Could we rename it to high-availability.job-recovery.enabled to unify
> the naming?
>
> WDYT?
>
> [1] https://cwiki.apache.org/confluence/x/QwqZE
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 2:04 PM Yangze Guo  wrote:
>
> > Thanks for your comment, Yong.
> >
> > Here are my thoughts on the splitting of HighAvailableServices:
> > Firstly, I would treat this separation as a result of technical debt
> > and a side effect of the FLIP. In order to achieve a cleaner interface
> > hierarchy for High Availability before Flink 2.0, the design decision
> > should not be limited to OLAP scenarios.
> > I agree that the current HAServices can be divided based on either the
> > actual target (cluster & job) or the type of functionality (leader
> > election & persistence). From a conceptual perspective, I do not see
> > one approach being better than the other. However, I have chosen the
> > current separation for a clear separation of concerns. After FLIP-285,
> > each process has a dedicated LeaderElectionService responsible for
> > leader election of all the components within it. This
> > LeaderElectionService has its own lifecycle management. If we were to
> > split the HAServices into 'ClusterHighAvailabilityService' and
> > 'JobHighAvailabilityService', we would need to couple the lifecycle
> > management of these two interfaces, as they both rely on the
> > LeaderElectionService and other relevant classes. This coupling and
> > implicit design assumption will increase the complexity and testing
> > difficulty of the system. WDYT?
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Jan 8, 2024 at 12:08 PM Yong Fang  wrote:
> > >
> > > Thanks Yangze for starting this discussion. I have one comment: why do we
> > > need to abstract two services as `LeaderServices` and
> > > `PersistenceServices`?
> > >
> > > From the content, the purpose of this FLIP is to make job failover more
> > > lightweight, so it would be more appropriate to abstract two services as
> > > `ClusterHighAvailabilityService` and `JobHighAvailabilityService` instead
> > > of `LeaderServices` and `PersistenceServices` based on leader and store.
> > In
> > > this way, we can create a `JobHighAvailabilityService` that has a leader
> > > service and store for the job that meets the requirements based on the
> > > configuration in the zk/k8s high availability service.
> > >
> > > WDYT?
> > >
> > > Best,
> > > Fang Yong
> > >
> > > On Fri, Dec 29, 2023 at 8:10 PM xiangyu feng 
> > wrote:
> > >
> > > > Thanks Yangze for restart this discussion.
> > > >
> > > > +1 for the overall idea. By splitting the HighAvailabilityServices into
> > > > LeaderServices and PersistenceServices, we may support configuring
> > > > different storage behind them in the future.
> > > >
> > > > > We did run into real problems in production where too much job
> > > > > metadata was being stored on ZK, causing system instability.
> > > >
> > > >
> > > > Yangze Guo wrote on Fri, Dec 29, 2023 at 10:21:
> > > >
> > > > > Thanks for the response, Zhanghao.
> > > > >
> > > > > PersistenceServices sounds good to me.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Wed, Dec 27, 2023 at 11:30 AM Zhanghao Chen
> > > > >  wrote:
> > > > > >
> > > > > > Thanks for driving this effort, Yangze! The proposal overall LGTM.
> > > > > > Apart from the throughput enhancement in the OLAP scenario, the
> > > > > > separation of leader election/discovery services and the metadata
> > > > > > persistence services will also make the HA impl clearer and easier
> > > > > > to maintain. Just a minor comment on naming: would it be better to
> > > > > > rename PersistentServices to PersistenceServices, as we usually put
> > > > > > a noun before Services?
> > > > > >
> > > > > > Best,
> > > > > > Zhanghao Chen
> > > > > > ___

Re: [Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-08 Thread Rui Fan
Thanks Xiangyu for the quick update!

LGTM

Best,
Rui

On Mon, Jan 8, 2024 at 4:27 PM xiangyu feng  wrote:

> Hi Rui and Yong,
>
> Thanks for your reply.
>
> My initial concern here is that, for short-lived jobs under high QPS, a
> fixed-delay retry strategy wastes resources and is not flexible enough,
> while an exponential-backoff strategy might significantly increase the
> query latency since the interval grows too fast. An incremental-delay
> strategy could strike a balance between resource consumption and
> short-query latency.
>
> On second thought, an exponential-delay retry strategy with a
> configurable multiplier option can also achieve this goal. By setting the
> default multiplier to 1, we stay consistent with the original behavior
> while reducing the number of configuration items.
>
> I've updated the FLIP accordingly; looking forward to your feedback.
>
> Regards,
> Xiangyu Feng
>
>
> Rui Fan <1996fan...@gmail.com> wrote on Mon, Jan 8, 2024 at 15:29:
>
>> Only one strategy is fine to me.
>>
>> When the multiplier is set to 1, the exponential-delay will become
>> fixed-delay.
>> So fixed-delay may not be needed.
>>
>> Best,
>> Rui
>>
>> On Mon, Jan 8, 2024 at 2:17 PM Yong Fang  wrote:
>>
>> > I agree with @Rui that the current configuration for Flink Client is a
>> > little complex. Can we just provide one strategy with less configuration
>> > items for all scenarios?
>> >
>> > Best,
>> > Fang Yong
>> >
>> > On Mon, Jan 8, 2024 at 11:19 AM Rui Fan <1996fan...@gmail.com> wrote:
>> >
>> > > Thanks xiangyu for driving this proposal! And sorry for the
>> > > late reply.
>> > >
>> > > Overall looks good to me, I only have some minor questions:
>> > >
>> > > 1. Do we need to introduce 3 collect strategies in the first version?
>> > >
>> > > Large and comprehensive configuration items will bring
>> > > additional learning costs and usage costs to users. I tend to
>> > > provide users with out-of-the-box parameters and 2 collect
>> > > strategies may be enough for users.
>> > >
>> > > IIUC, there is no big difference between exponential-delay and
>> > > incremental-delay, especially the default parameters provided.
>> > > I wonder whether we could provide a multiplier for the exponential-delay
>> > > strategy and remove the incremental-delay strategy?
>> > >
>> > > Of course, if you think multiplier option is not needed based on
>> > > your production experience, it's totally fine for me. Simple is
>> better.
>> > >
>> > > 2. Which strategy do you think is best in mass production?
>> > >
>> > > I'm working on FLIP-364[1], it's related to Flink failover restart
>> > > strategy. IIUC, when one cluster only has a few flink jobs,
>> > > fixed-delay is fine. It guarantees minimal latency without too
>> > > much stress. But if one cluster has too many jobs, fixed-delay
>> > > may not be stable.
>> > >
>> > > Do you think exponential-delay is better than fixed delay in this
>> > > scenario? And which strategy is used in your production for now?
>> > > Would you mind sharing it?
>> > >
>> > > Looking forward to your opinion~
>> > >
>> > > Best,
>> > > Rui
>> > >
>> > > On Sat, Jan 6, 2024 at 5:54 PM xiangyu feng 
>> > wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > Thanks for the comments.
>> > > >
>> > > > If there is no further comment, we will open the voting thread next
>> > week.
>> > > >
>> > > > Regards,
>> > > > Xiangyu
>> > > >
>> > > > Zhanghao Chen wrote on Wed, Jan 3, 2024 at 16:46:
>> > > >
>> > > > > Thanks for driving this effort on improving the interactive use
>> > > > experience
>> > > > > of Flink. The proposal overall looks good to me.
>> > > > >
>> > > > > Best,
>> > > > > Zhanghao Chen
>> > > > > 
>> > > > > From: xiangyu feng 
>> > > > > Sent: Tuesday, December 26, 2023 16:51
>> > > > > To: dev@flink.apache.org 
>> > > > > Subject: [Discuss] FLIP-407: Improve Flink Client performance in
>> > > > > interactive scenarios
>> > > > >
>> > > > > Hi devs,
>> > > > >
>> > > > > I'm opening this thread to discuss FLIP-407: Improve Flink Client
>> > > > > performance in interactive scenarios. The POC test results and design
>> > > > > doc can be found at: FLIP-407
>> > > > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+when+interacting+with+dedicated+Flink+Session+Clusters>.
>> > > > >
>> > > > > Currently, Flink Client is mainly designed for one-time interaction
>> > > > > with the Flink Cluster. All the resources (http connections, threads,
>> > > > > ha services) and instances (ClusterDescriptor, ClusterClient,
>> > > > > RestClient) are created and recycled for each interaction. This works
>> > > > > well when users do not need to interact frequently with the Flink
>> > > > > Cluster, and it also saves resource usage since resources are recycled
>> > > > > immediately after each usage.
>> > > > >
>> > > > > However, in OLAP or StreamingWarehouse scenarios, users might
>> > > > > submit interactive jobs to a dedicated Flink Session Cluster very often.

[jira] [Created] (FLINK-34017) Connectors docs with 404 link in dynamodb.md

2024-01-08 Thread Zhongqiang Gong (Jira)
Zhongqiang Gong created FLINK-34017:
---

 Summary: Connectors docs with 404 link in dynamodb.md
 Key: FLINK-34017
 URL: https://issues.apache.org/jira/browse/FLINK-34017
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Zhongqiang Gong
 Attachments: image-2024-01-08-17-21-16-434.png

Docs link:
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/dynamodb/

Screenshot:

!image-2024-01-08-17-21-16-434.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-shaded 18.0, release candidate #1

2024-01-08 Thread Piotr Nowojski
Hi!

+1 (binding)

Best,
Piotrek

On Fri, Jan 5, 2024 at 00:58, Sergey Nuyanzin wrote:

> Bumping this up; we need more votes, especially from PMC members.
>
> On Thu, Dec 28, 2023 at 1:29 PM Martijn Visser 
> wrote:
>
> > Hi,
> >
> > +1 (binding)
> >
> > - Validated hashes
> > - Verified signature
> > - Verified that no binaries exist in the source archive
> > - Build the source with Maven
> > - Verified licenses
> > - Verified web PRs
> >
> > Best regards,
> >
> > Martijn
> >
> > On Mon, Dec 11, 2023 at 12:11 AM Sergey Nuyanzin 
> > wrote:
> > >
> > > Hey everyone,
> > >
> > > The vote for flink-shaded 18.0 is still open. Please test and vote for
> > > rc1, so that we can release it.
> > >
> > > On Thu, Nov 30, 2023 at 4:03 PM Jing Ge 
> > wrote:
> > >
> > > > +1(not binding)
> > > >
> > > > - validate checksum
> > > > - validate hash
> > > > - checked the release notes
> > > > - verified that no binaries exist in the source archive
> > > > - build the source with Maven 3.8.6 and jdk11
> > > > - checked repo
> > > > - checked tag
> > > > - verified web PR
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Thu, Nov 30, 2023 at 11:39 AM Sergey Nuyanzin <
> snuyan...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > - Downloaded all the resources
> > > > > - Validated checksum hash
> > > > > - Build the source with Maven and jdk8
> > > > > - Build Flink master with new flink-shaded and check that all the
> > tests
> > > > are
> > > > > passing
> > > > >
> > > > > One minor thing that I noticed during the release: CI uses Maven
> > > > > 3.8.6, while the release profile has an enforcer plugin checking
> > > > > that the Maven version is less than 3.3. I created a JIRA issue [1]
> > > > > for that. I made the release with Maven 3.2.5 (I suppose the previous
> > > > > version was also done with 3.2.5 because of the same issue).
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-33703
> > > > >
> > > > > On Wed, Nov 29, 2023 at 11:41 AM Matthias Pohl <
> > matthias.p...@aiven.io>
> > > > > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > * Downloaded all resources
> > > > > > * Extracts sources and compilation on these sources
> > > > > > * Diff of git tag checkout with downloaded sources
> > > > > > * Verifies SHA512 checksums & GPG certification
> > > > > > * Checks that all POMs have the right expected version
> > > > > > * Generated diffs to compare pom file changes with NOTICE files:
> > > > Nothing
> > > > > > suspicious found except for a minor (non-blocking) typo [1]
> > > > > >
> > > > > > Thanks for driving this effort, Sergey. :)
> > > > > >
> > > > > > [1]
> > https://github.com/apache/flink-shaded/pull/126/files#r1409080162
> > > > > >
> > > > > > On Wed, Nov 29, 2023 at 10:25 AM Rui Fan <1996fan...@gmail.com>
> > wrote:
> > > > > >
> > > > > >> Sorry, it's non-binding.
> > > > > >>
> > > > > >> On Wed, Nov 29, 2023 at 5:19 PM Rui Fan <1996fan...@gmail.com>
> > wrote:
> > > > > >>
> > > > > >> > Thanks Matthias for the clarification!
> > > > > >> >
> > > > > >> > After I import the latest KEYS, it works fine.
> > > > > >> >
> > > > > >> > +1(binding)
> > > > > >> >
> > > > > >> > - Validated checksum hash
> > > > > >> > - Verified signature
> > > > > >> > - Verified that no binaries exist in the source archive
> > > > > >> > - Build the source with Maven and jdk8
> > > > > >> > - Verified licenses
> > > > > >> > - Verified web PRs, and left a comment
> > > > > >> >
> > > > > >> > Best,
> > > > > >> > Rui
> > > > > >> >
> > > > > >> > On Wed, Nov 29, 2023 at 5:05 PM Matthias Pohl
> > > > > >> >  wrote:
> > > > > >> >
> > > > > >> >> The key is the last key in the KEYS file. It just has a
> > > > > >> >> different format with spaces added (due to different gpg
> > > > > >> >> versions?): F752 9FAE 2481 1A5C 0DF3  CA74 1596 BBF0 7268 35D8
> > > > > >> >>
> > > > > >> >> On Wed, Nov 29, 2023 at 9:41 AM Rui Fan <
> 1996fan...@gmail.com>
> > > > > wrote:
> > > > > >> >>
> > > > > >> >> > Hey Sergey,
> > > > > >> >> >
> > > > > >> >> > Thank you for driving this release.
> > > > > >> >> >
> > > > > >> >> > I tried to check this signature; the whole key is
> > > > > >> >> > F7529FAE24811A5C0DF3CA741596BBF0726835D8,
> > > > > >> >> > which matches your 1596BBF0726835D8, but I cannot
> > > > > >> >> > find it in the Flink KEYS [1].
> > > > > >> >> >
> > > > > >> >> > Please correct me if my operation is wrong, thanks~
> > > > > >> >> >
> > > > > >> >> > [1] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > >> >> >
> > > > > >> >> > Best,
> > > > > >> >> > Rui
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Wed, Nov 29, 2023 at 6:09 AM Sergey Nuyanzin <
> > > > > snuyan...@gmail.com
> > > > > >> >
> > > > > >> >> > wrote:
> > > > > >> >> >
> > > > > >> >> > > Hi everyone,
> > > > > >> >> > > Please review and vote on the release candidate #1

Re: FLIP-413: Enable unaligned checkpoints by default

2024-01-08 Thread Zakelly Lan
Hi Piotr,

Thanks for driving this! Generally, I support enabling the alignment timeout
for aligned checkpoints. And I second Rui's opinion: 30s seems a reasonable
value.

However, I'm worried that some operators may not support unaligned CP,
which could cause data accuracy problems (like the one described in the
docs [1]). How about providing a mechanism for users to declare support
for unaligned CP, and checking it before enabling it automatically?


[1]
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#interplay-with-watermarks

Best,
Zakelly

On Mon, Jan 8, 2024 at 3:02 PM Rui Fan <1996fan...@gmail.com> wrote:

> Thanks to Piotr for driving this proposal!
>
> Enabling unaligned checkpoints with an aligned checkpoint timeout
> is fine for me. I'm not sure if an aligned checkpoint timeout of 5s is
> too aggressive. If the unaligned checkpoint is enabled by default
> for all jobs, I recommend that the aligned checkpoint timeout be
> at least 30s.
>
> If 30s is too big for some Flink jobs, users can turn
> it down by themselves.
>
> To David, Ken and Zhanghao:
>
> Unaligned checkpoints indeed have some limitations compared to aligned
> checkpoints, but if we set the aligned checkpoint timeout to 30s or 60s,
> then when a checkpoint can complete within 30s or 60s, the job still
> uses the aligned checkpoint (it doesn't introduce any extra cost).
> When the checkpoint cannot be completed within the aligned checkpoint
> timeout, the aligned checkpoint is switched to an unaligned checkpoint.
> The unaligned checkpoint can be completed even when backpressure is severe.
>
> In brief, when backpressure is low, enabling them costs nothing;
> when backpressure is high, enabling them has real benefits.
>
> So I think it doesn't carry too many risks when the aligned checkpoint
> timeout is set to 30s or above. WDYT?
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen 
> wrote:
>
> > Hi Piotr,
> >
> > As a platform administrator who runs thousands of Flink jobs, I'd be
> > against the idea of enabling unaligned cp by default for our jobs. It may help a
> > significant portion of the users, but the subtle issues around unaligned
> CP
> > for a few jobs will probably raise a lot more on-calls and incidents.
> From
> > my point of view, we'd better not enable it by default before removing
> all
> > the limitations listed in
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
> > .
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Piotr Nowojski 
> > Sent: Friday, January 5, 2024 21:41
> > To: dev 
> > Subject: FLIP-413: Enable unaligned checkpoints by default
> >
> > Hi!
> >
> > I would like to propose enabling unaligned checkpoints by default and
> > simultaneously increasing the aligned checkpoint timeout from 0ms to 5s. I
> > think this change is the right one for the majority of Flink users.
> >
> > For more rationale please take a look into the short FLIP-413 [1].
> >
> > What do you all think?
> >
> > Best,
> > Piotrek
> >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
> >
>


Re: FLIP-413: Enable unaligned checkpoints by default

2024-01-08 Thread Piotr Nowojski
Hi, thanks for the responses,

Thanks also for pointing out the job upgrade issue. Indeed, that had
slipped my mind. I was mistakenly thinking that we support all upgrades
only via savepoints. Anyway, maybe in that case we should guide users
towards exactly that: using savepoints for upgrades? That would be even
easier for users to understand:
- use unaligned checkpoints for checkpoints
- use savepoints for any changes in the job/version upgrades

There is a downside: savepoints are always full, while aligned
checkpoints can be incremental.

WDYT?

Regarding the value for the timeout, I would also be fine with 30s. Indeed
that's a safer default.

> On a separate point, in the sentence below it seems to me it would be
> clearer to say that in the unlikely scenario you've described, the change
> would "significantly increase checkpoint sizes" -- assuming I understand
> things correctly.

I've reworded that paragraph.

Best,
Piotrek
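
For readers following along, the proposal corresponds to settings that can
already be applied explicitly today; a sketch using Flink's CheckpointConfig
(the 60s checkpoint interval is illustrative):

import java.time.Duration;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointSetup {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // checkpoint every 60s (illustrative)
        // Start each checkpoint aligned; fall back to unaligned only after
        // barriers have been stuck in alignment for 30s.
        env.getCheckpointConfig().enableUnalignedCheckpoints(true);
        env.getCheckpointConfig().setAlignedCheckpointTimeout(Duration.ofSeconds(30));
    }
}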



On Mon, Jan 8, 2024 at 08:02, Rui Fan <1996fan...@gmail.com> wrote:

> Thanks to Piotr for driving this proposal!
>
> Enabling unaligned checkpoints with an aligned checkpoint timeout
> is fine for me. I'm not sure if an aligned checkpoint timeout of 5s is
> too aggressive. If the unaligned checkpoint is enabled by default
> for all jobs, I recommend that the aligned checkpoint timeout be
> at least 30s.
>
> If 30s is too big for some Flink jobs, users can turn
> it down by themselves.
>
> To David, Ken and Zhanghao:
>
> Unaligned checkpoints indeed have some limitations compared to aligned
> checkpoints, but if we set the aligned checkpoint timeout to 30s or 60s,
> then when a checkpoint can complete within 30s or 60s, the job still
> uses the aligned checkpoint (it doesn't introduce any extra cost).
> When the checkpoint cannot be completed within the aligned checkpoint
> timeout, the aligned checkpoint is switched to an unaligned checkpoint.
> The unaligned checkpoint can be completed even when backpressure is severe.
>
> In brief, when backpressure is low, enabling them costs nothing;
> when backpressure is high, enabling them has real benefits.
>
> So I think it doesn't carry too many risks when the aligned checkpoint
> timeout is set to 30s or above. WDYT?
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen 
> wrote:
>
> > Hi Piotr,
> >
> > As a platform administrator who runs thousands of Flink jobs, I'd be
> > against the idea of enabling unaligned cp by default for our jobs. It may help a
> > significant portion of the users, but the subtle issues around unaligned
> CP
> > for a few jobs will probably raise a lot more on-calls and incidents.
> From
> > my point of view, we'd better not enable it by default before removing
> all
> > the limitations listed in
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
> > .
> >
> > Best,
> > Zhanghao Chen
> > 
> > From: Piotr Nowojski 
> > Sent: Friday, January 5, 2024 21:41
> > To: dev 
> > Subject: FLIP-413: Enable unaligned checkpoints by default
> >
> > Hi!
> >
> > I would like to propose enabling unaligned checkpoints by default and
> > simultaneously increasing the aligned checkpoint timeout from 0ms to 5s. I
> > think this change is the right one for the majority of Flink users.
> >
> > For more rationale please take a look into the short FLIP-413 [1].
> >
> > What do you all think?
> >
> > Best,
> > Piotrek
> >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
> >
>


[jira] [Created] (FLINK-34018) Add flink 1.19 version in ci for flink-connector-aws

2024-01-08 Thread Zhongqiang Gong (Jira)
Zhongqiang Gong created FLINK-34018:
---

 Summary: Add flink 1.19 version in ci for flink-connector-aws
 Key: FLINK-34018
 URL: https://issues.apache.org/jira/browse/FLINK-34018
 Project: Flink
  Issue Type: Improvement
Reporter: Zhongqiang Gong


Add flink 1.19 version in ci for flink-connector-aws



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [jira] [Created] (FLINK-34018) Add flink 1.19 version in ci for flink-connector-aws

2024-01-08 Thread yangpmldl
Unsubscribe











At 2024-01-08 17:49:00, "Zhongqiang Gong (Jira)"  wrote:
>Zhongqiang Gong created FLINK-34018:
>---
>
> Summary: Add flink 1.19 version in ci for flink-connector-aws
> Key: FLINK-34018
> URL: https://issues.apache.org/jira/browse/FLINK-34018
> Project: Flink
>  Issue Type: Improvement
>Reporter: Zhongqiang Gong
>
>
>Add flink 1.19 version in ci for flink-connector-aws
>
>
>
>--
>This message was sent by Atlassian Jira
>(v8.20.10#820010)


Re:Re: FLIP-413: Enable unaligned checkpoints by default

2024-01-08 Thread yangpmldl
Unsubscribe











At 2024-01-08 17:45:01, "Piotr Nowojski"  wrote:
>Hi thanks for the responses,
>
>Thanks also for pointing out the job upgrade issue. Indeed, that had
>slipped my mind. I was mistakenly thinking that we support all upgrades
>only via savepoints. Anyway, maybe in that case we should guide users
>towards exactly that: using savepoints for upgrades? That would be even
>easier for users to understand:
>- use unaligned checkpoints for checkpoints
>- use savepoints for any changes in the job/version upgrades
>
>There is a downside: savepoints are always full, while aligned
>checkpoints can be incremental.
>
>WDYT?
>
>Regarding the value for the timeout, I would also be fine with 30s. Indeed
>that's a safer default.
>
>> On a separate point, in the sentence below it seems to me it would be
>> clearer to say that in the unlikely scenario you've described, the change
>> would "significantly increase checkpoint sizes" -- assuming I understand
>> things correctly.
>
>I've reworded that paragraph.
>
>Best,
>Piotrek
>
>
>
>On Mon, Jan 8, 2024 at 08:02, Rui Fan <1996fan...@gmail.com> wrote:
>
>> Thanks to Piotr for driving this proposal!
>>
>> Enabling unaligned checkpoints with an aligned checkpoint timeout
>> is fine for me. I'm not sure if an aligned checkpoint timeout of 5s is
>> too aggressive. If the unaligned checkpoint is enabled by default
>> for all jobs, I recommend that the aligned checkpoint timeout be
>> at least 30s.
>>
>> If 30s is too big for some Flink jobs, users can turn
>> it down by themselves.
>>
>> To David, Ken and Zhanghao:
>>
>> Unaligned checkpoints indeed have some limitations compared to aligned
>> checkpoints, but if we set the aligned checkpoint timeout to 30s or 60s,
>> then when a checkpoint can complete within 30s or 60s, the job still
>> uses the aligned checkpoint (it doesn't introduce any extra cost).
>> When the checkpoint cannot be completed within the aligned checkpoint
>> timeout, the aligned checkpoint is switched to an unaligned checkpoint.
>> The unaligned checkpoint can be completed even when backpressure is severe.
>>
>> In brief, when backpressure is low, enabling them costs nothing;
>> when backpressure is high, enabling them has real benefits.
>>
>> So I think it doesn't carry too many risks when the aligned checkpoint
>> timeout is set to 30s or above. WDYT?
>>
>> Best,
>> Rui
>>
>> On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen 
>> wrote:
>>
>> > Hi Piotr,
>> >
>> > As a platform administrator who runs thousands of Flink jobs, I'd be
>> > against the idea of enabling unaligned cp by default for our jobs. It may help a
>> > significant portion of the users, but the subtle issues around unaligned
>> CP
>> > for a few jobs will probably raise a lot more on-calls and incidents.
>> From
>> > my point of view, we'd better not enable it by default before removing
>> all
>> > the limitations listed in
>> >
>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
>> > .
>> >
>> > Best,
>> > Zhanghao Chen
>> > 
>> > From: Piotr Nowojski 
>> > Sent: Friday, January 5, 2024 21:41
>> > To: dev 
>> > Subject: FLIP-413: Enable unaligned checkpoints by default
>> >
>> > Hi!
>> >
>> > I would like to propose enabling unaligned checkpoints by default and
>> > simultaneously increasing the aligned checkpoint timeout from 0ms to 5s. I
>> > think this change is the right one for the majority of Flink users.
>> >
>> > For more rationale please take a look into the short FLIP-413 [1].
>> >
>> > What do you all think?
>> >
>> > Best,
>> > Piotrek
>> >
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
>> >
>>


[jira] [Created] (FLINK-34019) Bump com.rabbitmq:amqp-client from 5.13.1 to 5.20.0

2024-01-08 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-34019:
--

 Summary: Bump com.rabbitmq:amqp-client from 5.13.1 to 5.20.0
 Key: FLINK-34019
 URL: https://issues.apache.org/jira/browse/FLINK-34019
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors/ RabbitMQ
Reporter: Martijn Visser
Assignee: Martijn Visser






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-406: Reorganize State & Checkpointing & Recovery Configuration

2024-01-08 Thread Zakelly Lan
Hi Yun,

Thanks for your comments!

 1.  We should not describe the configuration in terms of its implementation
> for the 'execution.checkpointing.local-copy.*' options: for the hashmap
> state backend it would write two streams, while for the RocksDB state
> backend it would use hard links for backup. Thus, I think
> 'execution.checkpointing.local-backup.*' looks better.

I agree that we'd better name the option from the user's perspective rather
than the implementation's; that is why I named it after the idea of a copy of
the checkpoint on the local disk, regardless of how it is generated. The word
'backup' is also suitable for this case, so I agree to change it to
'execution.checkpointing.local-backup.*' if no one objects.

 2.  What does the 'execution.checkpointing.data-inline-threshold' mean? It
> seems not so easy to understand.

The 'execution.checkpointing.data-inline-threshold' option (originally
'state.storage.fs.memory-threshold') is the size threshold below which state
chunks are stored inline with the metadata, hence the name
'data-inline-threshold'.
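
A sketch of what setting these options would look like under the proposed
names (the keys are the FLIP's proposals, not yet released option names):

import org.apache.flink.configuration.Configuration;

public class CheckpointingOptionsDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // State chunks smaller than this threshold are stored inline in the
        // checkpoint metadata instead of as separate files.
        conf.setString("execution.checkpointing.data-inline-threshold", "20kb");
        // Keep a local backup of the checkpoint for fast local recovery.
        conf.setString("execution.checkpointing.local-backup.enabled", "true");
        System.out.println(conf);
    }
}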


Best,
Zakelly

On Mon, Jan 8, 2024 at 10:09 AM Yun Tang  wrote:

> Hi Zakelly,
>
> Thanks for driving this topic. I have two concerns here:
>
>   1.  We should not describe the configuration in terms of its implementation
> for the 'execution.checkpointing.local-copy.*' options: for the hashmap
> state backend it would write two streams, while for the RocksDB state
> backend it would use hard links for backup. Thus, I think
> 'execution.checkpointing.local-backup.*' looks better.
>   2.  What does the 'execution.checkpointing.data-inline-threshold' mean?
> It seems not so easy to understand.
>
> Best
> Yun Tang
> 
> From: Piotr Nowojski 
> Sent: Thursday, January 4, 2024 22:37
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-406: Reorganize State & Checkpointing &
> Recovery Configuration
>
> Hi,
>
> Thanks for trying to clean this up! I don't have strong opinions on the
> topics discussed here, so generally speaking +1 from my side!
>
> Best,
> Piotrek
>
> On Wed, Jan 3, 2024 at 04:16, Rui Fan <1996fan...@gmail.com> wrote:
>
> > Thanks for the feedback!
> >
> > Using the `execution.checkpointing.incremental.enabled`,
> > and enabling it by default sounds good to me.
> >
> > Best,
> > Rui
> >
> > On Wed, Jan 3, 2024 at 11:10 AM Zakelly Lan 
> wrote:
> >
> > > Hi Rui,
> > >
> > > Thanks for your comments!
> > >
> > > IMO, given that the state backend can be pluggably loaded (as you can
> > > specify a state backend factory), I prefer not to provide
> > > state-backend-specific options in the framework.
> > >
> > > Secondly, the incremental checkpoint is actually a file-sharing strategy
> > > across checkpoints, which means the state backend *could* reuse files
> > > from a previous cp but is not *required* to do so. When the state backend
> > > cannot reuse the files, it is reasonable to fall back to a full checkpoint.
> > >
> > > Thus, I suggest we make it `execution.checkpointing.incremental` and
> > > enable it by default. For those state backends not supporting this, they
> > > perform full checkpoints and print a warning to inform users. Users do
> > > not need to pay special attention to different options to control this
> > > across different state backends. This is more user-friendly in my
> > > opinion. WDYT?
> > >
> > > On Tue, Jan 2, 2024 at 10:49 AM Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > > Hi Zakelly,
> > > >
> > > > I'm not sure whether we could add the state backend type in the
> > > > new key name of state.backend.incremental. It means we use
> > > > `execution.checkpointing.rocksdb-incremental` or
> > > > `execution.checkpointing.rocksdb-incremental.enabled`.
> > > >
> > > > So far, state.backend.incremental only works for rocksdb state
> backend.
> > > > And this feature or optimization is very valuable and huge for large
> > > > state flink jobs. I believe it's enabled for most production flink
> jobs
> > > > with large rocksdb state.
> > > >
> > > > If this option isn't generic for all state backend types, I guess we
> > > > can enable `execution.checkpointing.rocksdb-incremental.enabled`
> > > > by default in Flink 2.0.
> > > >
> > > > But if it works for all state backends, it's hard to enable it by
> > > default.
> > > > Enabling great and valuable features or improvements are useful
> > > > for users, especially a lot of new flink users. Out-of-the-box
> options
> > > > are good for users.
> > > >
> > > > WDYT?
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Fri, Dec 29, 2023 at 1:45 PM Zakelly Lan 
> > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Thanks all for your comments!
> > > > >
> > > > > As many of you have questions about the names for boolean options,
> I
> > > > > suggest we make a naming rule for them. For now I could think of
> > three
> > > > > options:
> > > > >
> > > > > Option 1: Use enumeration options if possible. But this may cause
> > some
> > > > name
> > > > > collisions or confusion as 

[jira] [Created] (FLINK-34020) Bump CI flink version on flink-connector-rabbitmq

2024-01-08 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-34020:
--

 Summary: Bump CI flink version on flink-connector-rabbitmq
 Key: FLINK-34020
 URL: https://issues.apache.org/jira/browse/FLINK-34020
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors/ RabbitMQ
Reporter: Martijn Visser
Assignee: Martijn Visser






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-shaded 18.0, release candidate #1

2024-01-08 Thread Dawid Wysakowicz
+1 (binding)

- Validated hashes
- Verified signature
- Verified that no binaries exist in the source archive

Best,
Dawid

On Tue, 28 Nov 2023 at 23:10, Sergey Nuyanzin  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 18.0, as
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which are signed with the key with fingerprint 1596BBF0726835D8 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-18.0-rc1" [5],
> * website pull request listing the new release [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Sergey
>
> [1] https://issues.apache.org/jira/projects/FLINK/versions/12353081
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-18.0-rc1
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1676/
> [5] https://github.com/apache/flink-shaded/releases/tag/release-18.0-rc1
> [6] https://github.com/apache/flink-web/pull/701
>


[jira] [Created] (FLINK-34021) Print jobKey in the Autoscaler standalone log

2024-01-08 Thread Rui Fan (Jira)
Rui Fan created FLINK-34021:
---

 Summary: Print jobKey in the Autoscaler standalone log
 Key: FLINK-34021
 URL: https://issues.apache.org/jira/browse/FLINK-34021
 Project: Flink
  Issue Type: Sub-task
  Components: Autoscaler
Affects Versions: kubernetes-operator-1.8.0
Reporter: Rui Fan
Assignee: Rui Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34022) A Duration that cannot be expressed as long in nanoseconds causes an ArithmeticException in TimeUtils.formatWithHighestUnit

2024-01-08 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34022:
-

 Summary: A Duration that cannot be expressed as long in 
nanoseconds causes an ArithmeticException in TimeUtils.formatWithHighestUnit
 Key: FLINK-34022
 URL: https://issues.apache.org/jira/browse/FLINK-34022
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Configuration
Affects Versions: 1.17.2, 1.18.0, 1.19.0
Reporter: Matthias Pohl


The following code fails with an {{ArithmeticException}} because the duration 
cannot be represented as a long value in nanoseconds:
{code:java}
TimeUtils.formatWithHighestUnit(Duration.ofMillis(Long.MAX_VALUE));
{code}

This method is used to pretty-print Duration values and is most often used with 
configuration values. We might want a reasonable fallback here instead of 
throwing a {{RuntimeException}}.

E.g. it can cause the error in the 
[ClusterConfigHandler|https://github.com/apache/flink/blob/ebdde651edae8db6b2ac740f07d97124dc01fea4/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/cluster/ClusterConfigHandler.java#L56]
 if a Duration-based configuration parameter is set too high (e.g. 
[JobManagerOptions.RESOURCE_WAIT_TIMEOUT|https://github.com/apache/flink/blob/3ff225c5f993282d6dfc7726fc08cc00058d9a7f/flink-core/src/main/java/org/apache/flink/configuration/JobManagerOptions.java#L540]).
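
A small reproduction plus one possible fallback (saturating instead of 
throwing; this is an assumption for illustration, not the fix chosen for this 
ticket):
{code:java}
import java.time.Duration;

public class DurationOverflowDemo {
    // One possible fallback: saturate at Long.MAX_VALUE/MIN_VALUE instead of throwing.
    static long toNanosSaturated(Duration d) {
        try {
            return d.toNanos();
        } catch (ArithmeticException e) {
            return d.isNegative() ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        Duration d = Duration.ofMillis(Long.MAX_VALUE);
        // Long.MAX_VALUE ms is ~9.2e24 ns, far beyond Long.MAX_VALUE (~9.2e18),
        // so Duration#toNanos overflows:
        try {
            d.toNanos();
        } catch (ArithmeticException e) {
            System.out.println("toNanos overflows: " + e.getMessage());
        }
        System.out.println("saturated nanos: " + toNanosSaturated(d));
    }
}
{code}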



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-403: High Availability Services for OLAP Scenarios

2024-01-08 Thread Zhu Zhu
Thanks for creating the FLIP and starting the discussion, Yangze. It makes
sense to me to improve the job submission performance in OLAP scenarios.

I have a few questions regarding the proposed changes:

1. How about skipping the job graph persistence if the proposed config
'high-availability.enable-job-recovery' is set to false? In this way,
we do not need to do the refactoring work.

2. Instead of using different HA services for the Dispatcher and JobMaster,
can we leverage the work of FLINK-24038 to eliminate the leader election
time cost of each job? Honestly, I had thought this was already the case,
but it seems it is not. This improvement can also benefit non-OLAP jobs.

Thanks,
Zhu

Yangze Guo wrote on Mon, Jan 8, 2024 at 17:11:

> Thanks for the pointer, Rui!
>
> I have reviewed FLIP-383, and based on my understanding, this feature
> should be enabled by default for batch jobs in the future. Therefore,
> +1 for checking the parameters and issuing log warnings when the user
> explicitly configures execution.batch.job-recovery.enabled to true.
>
> +1 for high-availability.job-recovery.enabled, which would be more
> suitable with YAML hierarchy.
>
>
> Best,
> Yangze Guo
>
> On Mon, Jan 8, 2024 at 3:43 PM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > Thanks to Yangze for driving this proposal!
> >
> > Overall looks good to me! This proposal is useful for
> > performance when the job doesn't need failover.
> >
> > I have some minor questions:
> >
> > 1. How does it work with FLIP-383[1]?
> >
> > This FLIP introduces a high-availability.enable-job-recovery,
> > and FLIP-383 introduces a execution.batch.job-recovery.enabled.
> >
> > IIUC, when high-availability.enable-job-recovery is false, the job
> > cannot recover even if execution.batch.job-recovery.enabled = true,
> > right?
> >
> > If so, could we check the parameters and log some warnings? Or
> > disable execution.batch.job-recovery.enabled directly when
> > high-availability.enable-job-recovery = false.
> >
> > 2. Could we rename it to high-availability.job-recovery.enabled to unify
> > the naming?
> >
> > WDYT?
> >
> > [1] https://cwiki.apache.org/confluence/x/QwqZE
> >
> > Best,
> > Rui
> >
> > On Mon, Jan 8, 2024 at 2:04 PM Yangze Guo  wrote:
> >
> > > Thanks for your comment, Yong.
> > >
> > > Here are my thoughts on the splitting of HighAvailableServices:
> > > Firstly, I would treat this separation as a result of technical debt
> > > and a side effect of the FLIP. In order to achieve a cleaner interface
> > > hierarchy for High Availability before Flink 2.0, the design decision
> > > should not be limited to OLAP scenarios.
> > > I agree that the current HAServices can be divided based on either the
> > > actual target (cluster & job) or the type of functionality (leader
> > > election & persistence). From a conceptual perspective, I do not see
> > > one approach being better than the other. However, I have chosen the
> > > current separation for a clear separation of concerns. After FLIP-285,
> > > each process has a dedicated LeaderElectionService responsible for
> > > leader election of all the components within it. This
> > > LeaderElectionService has its own lifecycle management. If we were to
> > > split the HAServices into 'ClusterHighAvailabilityService' and
> > > 'JobHighAvailabilityService', we would need to couple the lifecycle
> > > management of these two interfaces, as they both rely on the
> > > LeaderElectionService and other relevant classes. This coupling and
> > > implicit design assumption will increase the complexity and testing
> > > difficulty of the system. WDYT?
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Jan 8, 2024 at 12:08 PM Yong Fang  wrote:
> > > >
> > > > Thanks Yangze for starting this discussion. I have one comment: why
> do we
> > > > need to abstract two services as `LeaderServices` and
> > > > `PersistenceServices`?
> > > >
> > > > From the content, the purpose of this FLIP is to make job failover
> more
> > > > lightweight, so it would be more appropriate to abstract two
> services as
> > > > `ClusterHighAvailabilityService` and `JobHighAvailabilityService`
> instead
> > > > of `LeaderServices` and `PersistenceServices` based on leader and
> store.
> > > In
> > > > this way, we can create a `JobHighAvailabilityService` that has a
> leader
> > > > service and store for the job that meets the requirements based on
> the
> > > > configuration in the zk/k8s high availability service.
> > > >
> > > > WDYT?
> > > >
> > > > Best,
> > > > Fang Yong
> > > >
> > > > On Fri, Dec 29, 2023 at 8:10 PM xiangyu feng 
> > > wrote:
> > > >
> > > > > Thanks Yangze for restart this discussion.
> > > > >
> > > > > +1 for the overall idea. By splitting the HighAvailabilityServices
> into
> > > > > LeaderServices and PersistenceServices, we may support configuring
> > > > > different storage behind them in the future.
> > > > >
> > > > > We did run into real problems in production where too much job
> > > > > metadata was being stored on ZK, causing system instability.

Re: [VOTE] Release 1.18.1, release candidate #2

2024-01-08 Thread Jark Wu
Thanks Jing for driving this.

+1 (binding)

- Build and compile the source code locally: *OK*
- Verified signatures and hashes: *OK*
- Checked no missing artifacts in the staging area: *OK*
- Reviewed the website release PR: *OK*
- Went through the quick start: *OK*
  * Started a cluster and ran the examples
  * Verified web ui and log output, nothing unexpected

Best,
Jark

On Thu, 28 Dec 2023 at 20:59, Yun Tang  wrote:

> Thanks Jing for driving this release.
>
> +1 (non-binding)
>
>
>   *
> Download artifacts and verify the signatures.
>   *
> Verified the web PR
>   *
> Verified the number of Python packages is 11
>   *
> Started a local cluster and verified FLIP-291 to see the rescale results.
>   *
> Verified the jar packages were built with JDK8
>
> Best
> Yun Tang
>
>
> 
> From: Rui Fan <1996fan...@gmail.com>
> Sent: Thursday, December 28, 2023 10:54
> To: dev@flink.apache.org 
> Subject: Re: [VOTE] Release 1.18.1, release candidate #2
>
> Thanks Jing for driving this release!
>
> +1(non-binding)
>
> - Downloaded artifacts
> - Verified signatures and sha512
> - The source archives do not contain any binaries
> - Verified web PR
> - Build the source with Maven 3 and java8 (Checked the license as well)
> - bin/start-cluster.sh with java8: it works fine with no unexpected logs
> - Ran a demo, it's fine: bin/flink
> run examples/streaming/StateMachineExample.jar
>
> Best,
> Rui
>
> On Wed, Dec 27, 2023 at 8:45 PM Martijn Visser 
> wrote:
>
> > Hi Jing,
> >
> > Thanks for driving this.
> >
> > +1 (binding)
> >
> > - Validated hashes
> > - Verified signature
> > - Verified that no binaries exist in the source archive
> > - Build the source with Maven via mvn clean install
> > -Pcheck-convergence -Dflink.version=1.18.1
> > - Verified licenses
> > - Verified web PR
> > - Started a cluster and the Flink SQL client, successfully read and
> > wrote with the Kafka connector to Confluent Cloud with AVRO and Schema
> > Registry enabled
> > - Started a cluster and submitted a job that checkpoints to GCS without
> > problems
> >
> > Best regards,
> >
> > Martijn
> >
> > On Thu, Dec 21, 2023 at 4:55 AM gongzhongqiang
> >  wrote:
> > >
> > > Thanks Jing Ge for driving this release.
> > >
> > > +1 (non-binding), I have checked:
> > > [✓] The checksums and signatures are validated
> > > [✓] The tag checked is fine
> > > [✓] Built from source is passed
> > > [✓] The flink-web PR is reviewed and checked
> > >
> > >
> > > Best,
> > > Zhongqiang Gong
> >
>


Re: [VOTE] FLIP-397: Add config options for administrator JVM options

2024-01-08 Thread Zhanghao Chen
Thank you all! Closing the vote. The result will be sent in a separate email.

Best,
Zhanghao Chen

From: Zhanghao Chen 
Sent: Thursday, January 4, 2024 10:29
To: dev 
Subject: [VOTE] FLIP-397: Add config options for administrator JVM options

Hi everyone,

Thanks for all the feedback on FLIP-397 [1], which proposes to add a set of 
default JVM options for administrator use that are prepended to the user-set 
extra JVM options, for easier platform-wide JVM pre-tuning. It has been 
discussed in [2].

I'd like to start a vote. The vote will be open for at least 72 hours (until 
January 8th 12:00 GMT) unless there is an objection or insufficient votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-397%3A+Add+config+options+for+administrator+JVM+options
[2] https://lists.apache.org/thread/cflonyrfd1ftmyrpztzj3ywckbq41jzg

Best,
Zhanghao Chen
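
For context, a sketch of the precedence the FLIP proposes (the option keys
are the FLIP's proposed names; the assembly below is an illustration, not
Flink's actual startup code):

public class JvmOptionsAssembly {
    public static void main(String[] args) {
        String adminDefaults = "-XX:+UseZGC";  // env.java.default-opts.all (set by the platform)
        String userExtras = "-Xmx4g";          // env.java.opts.all (set by the user)
        // Administrator defaults are prepended, so user-set options appear
        // later on the command line and win where flags conflict.
        String jvmArgs = (adminDefaults + " " + userExtras).trim();
        System.out.println(jvmArgs); // -XX:+UseZGC -Xmx4g
    }
}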


[jira] [Created] (FLINK-34023) Expose Kinesis client retry config in sink

2024-01-08 Thread Brad Atcheson (Jira)
Brad Atcheson created FLINK-34023:
-

 Summary: Expose Kinesis client retry config in sink
 Key: FLINK-34023
 URL: https://issues.apache.org/jira/browse/FLINK-34023
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kinesis
Reporter: Brad Atcheson


The consumer side exposes client retry configuration like 
[flink.shard.getrecords.maxretries|https://nightlies.apache.org/flink/flink-docs-stable/api/java/org/apache/flink/streaming/connectors/kinesis/config/ConsumerConfigConstants.html#SHARD_GETRECORDS_RETRIES]
 but the producer side lacks similar config for PutRecords.

The KinesisStreamsSinkWriter constructor calls 
{code}
this.httpClient = AWSGeneralUtil.createAsyncHttpClient(kinesisClientProperties);
this.kinesisClient = buildClient(kinesisClientProperties, this.httpClient);
{code}
But those methods only refer to these values (aside from endpoint/region/creds) 
in the kinesisClientProperties:
* aws.http-client.max-concurrency
* aws.http-client.read-timeout
* aws.trust.all.certificates
* aws.http.protocol.version

Without control over retry, users can observe exceptions like {code}Request 
attempt 2 failure: Unable to execute HTTP request: connection timed out after 
2000 ms: kinesis.us-west-2.amazonaws.com{code}
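
For reference, per-client retry configuration in the underlying AWS SDK v2 
looks like the following; the request here is for the sink to surface 
something like this through kinesisClientProperties (a sketch, not the 
connector's current API):
{code:java}
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;

public class KinesisRetryDemo {
    public static void main(String[] args) {
        KinesisAsyncClient client = KinesisAsyncClient.builder()
                .region(Region.US_WEST_2)
                .overrideConfiguration(ClientOverrideConfiguration.builder()
                        // e.g. allow more attempts before a PutRecords call gives up
                        .retryPolicy(RetryPolicy.builder().numRetries(10).build())
                        .build())
                .build();
        client.close();
    }
}
{code}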



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[RESULT][VOTE] FLIP-397: Add config options for administrator JVM options

2024-01-08 Thread Zhanghao Chen
Hi everyone,

I'm happy to announce that FLIP-397: Add config options for administrator JVM 
options, has been accepted with 4 approving votes (3 binding) [1]:

 - Benchao Li (binding)
 - Rui Fan (binding)
 - Xiangyu Feng (non-binding)
 - Yong Fang (binding)

There're no disapproving votes.

Thanks again to everyone who participated in the discussion and voting.

[1] https://lists.apache.org/thread/5qlr0xl0oyc9dnvcjr0q39pcrzyx4ohb

Best,
Zhanghao Chen


[jira] [Created] (FLINK-34024) Update connector release process for Python connectors

2024-01-08 Thread Danny Cranmer (Jira)
Danny Cranmer created FLINK-34024:
-

 Summary: Update connector release process for Python connectors
 Key: FLINK-34024
 URL: https://issues.apache.org/jira/browse/FLINK-34024
 Project: Flink
  Issue Type: Sub-task
Reporter: Danny Cranmer


Work out how to release the Python libs for Flink connectors. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34025) Show data skew score on Flink Dashboard

2024-01-08 Thread Emre Kartoglu (Jira)
Emre Kartoglu created FLINK-34025:
-

 Summary: Show data skew score on Flink Dashboard
 Key: FLINK-34025
 URL: https://issues.apache.org/jira/browse/FLINK-34025
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Web Frontend
Affects Versions: 1.19.0
Reporter: Emre Kartoglu
 Attachments: skew_proposal.png, skew_tab.png

*Problem:* Currently users have to click on every operator and check how much 
data each subtask is processing to see if there is data skew. This is 
particularly cumbersome and error-prone for jobs with big job graphs. Data skew 
is an important metric that should be more visible.

 

*Proposed solution:*
 * Show a data skew score on each operator (see screenshot below). This would 
be an improvement, but would not be sufficient, as it would still not be easy 
to see the data skew score for jobs with very large job graphs (it would 
require a lot of zooming in/out).
 * Show data skew score for each operator under a new tab. See screenshot below 
!skew_tab.png! .

 

!skew_proposal.png!
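
For concreteness, one plausible scoring function (the proposal does not fix a 
formula, so this is only a sketch): compare the busiest subtask with the mean.
{code:java}
public class DataSkewScore {
    /**
     * Hypothetical skew score: how far the busiest subtask is above the mean.
     * 0.0 means perfectly balanced; 1.0 means the hottest subtask processes
     * twice the mean number of records.
     */
    public static double skewScore(long[] recordsPerSubtask) {
        long max = 0;
        long sum = 0;
        for (long n : recordsPerSubtask) {
            max = Math.max(max, n);
            sum += n;
        }
        if (sum == 0) {
            return 0.0;
        }
        double mean = (double) sum / recordsPerSubtask.length;
        return max / mean - 1.0;
    }

    public static void main(String[] args) {
        System.out.println(skewScore(new long[] {100, 100, 100, 100})); // 0.0
        System.out.println(skewScore(new long[] {400, 50, 50, 100}));   // ~1.67
    }
}
{code}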



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Flink pending record metric weird after autoscaler rescaling

2024-01-08 Thread Yang LI
Dear Flink Community,

I've encountered an issue during the testing of my Flink autoscaler. It
appears that Flink is losing track of specific Kafka partitions, leading to
a persistent increase in lag on these partitions. The logs indicate a
'kafka connector metricGroup name collision exception.' Notably, the
consumption on these Kafka partitions returns to normal after restarting
the Kafka broker. For context, I have enabled in-place rescaling support
with 'jobmanager.scheduler: Adaptive.'

I suspect the problem may stem from:

1. The in-place rescaling support triggering restarts of some taskmanagers.
This process might not be restarting the metric groups registered by the
Kafka source connector correctly, leading to a name collision exception and
preventing Flink from accurately reporting metrics related to Kafka
consumption.
2. A potential edge case in the metric for pending records, especially when
different partitions exhibit varying lags. This discrepancy might be
causing the pending record metric to malfunction.

I would appreciate your insights on these observations.

Best regards,
Yang LI


[DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Danny Cranmer
Hello all,

I have been working with Péter and Marton on externalizing python
connectors [1] from the main repo to the connector repositories. We have
the code moved and the CI running tests for Kafka and AWS Connectors. I am
now looking into the release process.

When we undertake a Flink release we perform the following steps [2],
regarding Python: 1/ run python build on CI, 2/ download Wheels artifacts,
3/ upload artifacts to the dist and 4/ deploy to pypi. The plan is to
follow the same steps for connectors, using Github actions instead of Azure
pipeline.

Today we have a single pypi project for pyflink that contains all the Flink
libs, apache-flink [3]. I propose we create a new pypi project per
connector using the existing connector version, and following naming
convention: apache-<connector-name>, for example:
apache-flink-connector-aws, apache-flink-connector-kafka. Therefore to use
a DataStream API connector in python, users would need to first install the
lib, for example "python -m pip install apache-flink-connector-aws".

Once we have consensus I will update the release process and perform a
release of the flink-connector-aws project to test it end-to-end. I look
forward to any feedback.

Thanks,
Danny

[1] https://issues.apache.org/jira/browse/FLINK-33528
[2]
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
[3] https://pypi.org/project/apache-flink/


[jira] [Created] (FLINK-34026) Azure Pipelines not running for master and the release branches since 4a852fe

2024-01-08 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34026:
-

 Summary: Azure Pipelines not running for master and the release 
branches since 4a852fe
 Key: FLINK-34026
 URL: https://issues.apache.org/jira/browse/FLINK-34026
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Reporter: Matthias Pohl


It appears that [no Azure CI 
builds|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=1&_a=summary]
 are triggered for {{master}} (and possibly the release branches) since 
4a852fee28f2d87529dc05f5ba2e79202a0e00b6.

The [PR CI 
workflows|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2]
 appear to be not affected. I suspect some problem with the repo-sync process.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Péter Váry
Thanks Danny for working on this!

It would be good to do this in a way that the different connectors could
reuse as much code as possible, so if possible put most of the code to the
flink connector shared utils repo [1]

+1 from me for the general direction (non-binding)

Thanks,
Peter

[1] https://github.com/apache/flink-connector-shared-utils


Danny Cranmer  ezt írta (időpont: 2024. jan. 8.,
H, 17:31):

> Hello all,
>
> I have been working with Péter and Marton on externalizing python
> connectors [1] from the main repo to the connector repositories. We have
> the code moved and the CI running tests for Kafka and AWS Connectors. I am
> now looking into the release process.
>
> When we undertake a Flink release we perform the following steps [2],
> regarding Python: 1/ run python build on CI, 2/ download Wheels artifacts,
> 3/ upload artifacts to the dist and 4/ deploy to pypi. The plan is to
> follow the same steps for connectors, using Github actions instead of Azure
> pipeline.
>
> Today we have a single pypi project for pyflink that contains all the Flink
> libs, apache-flink [3]. I propose we create a new pypi project per
> connector using the existing connector version, and following naming
> convention: apache-<connector-name>, for example:
> apache-flink-connector-aws, apache-flink-connector-kafka. Therefore to use
> a DataStream API connector in python, users would need to first install the
> lib, for example "python -m pip install apache-flink-connector-aws".
>
> Once we have consensus I will update the release process and perform a
> release of the flink-connector-aws project to test it end-to-end. I look
> forward to any feedback.
>
> Thanks,
> Danny
>
> [1] https://issues.apache.org/jira/browse/FLINK-33528
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
> [3] https://pypi.org/project/apache-flink/
>


Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Martijn Visser
Thanks for investigating Danny. It looks like the best direction to go :)

On Mon, Jan 8, 2024 at 5:56 PM Péter Váry  wrote:
>
> Thanks Danny for working on this!
>
> It would be good to do this in a way that the different connectors could
> reuse as much code as possible, so if possible put most of the code to the
> flink connector shared utils repo [1]
>
> +1 from me for the general direction (non-binding)
>
> Thanks,
> Peter
>
> [1] https://github.com/apache/flink-connector-shared-utils
>
>
> Danny Cranmer  ezt írta (időpont: 2024. jan. 8.,
> H, 17:31):
>
> > Hello all,
> >
> > I have been working with Péter and Marton on externalizing python
> > connectors [1] from the main repo to the connector repositories. We have
> > the code moved and the CI running tests for Kafka and AWS Connectors. I am
> > now looking into the release process.
> >
> > When we undertake a Flink release we perform the following steps [2],
> > regarding Python: 1/ run python build on CI, 2/ download Wheels artifacts,
> > 3/ upload artifacts to the dist and 4/ deploy to pypi. The plan is to
> > follow the same steps for connectors, using Github actions instead of Azure
> > pipeline.
> >
> > Today we have a single pypi project for pyflink that contains all the Flink
> > libs, apache-flink [3]. I propose we create a new pypi project per
> > connector using the existing connector version, and following naming
> > convention: apache-<connector-name>, for example:
> > apache-flink-connector-aws, apache-flink-connector-kafka. Therefore to use
> > a DataStream API connector in python, users would need to first install the
> > lib, for example "python -m pip install apache-flink-connector-aws".
> >
> > Once we have consensus I will update the release process and perform a
> > release of the flink-connector-aws project to test it end-to-end. I look
> > forward to any feedback.
> >
> > Thanks,
> > Danny
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-33528
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
> > [3] https://pypi.org/project/apache-flink/
> >


Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Márton Balassi
+1

Thanks, Danny - I really appreciate you taking the time for the in-depth
investigation. Please proceed, looking forward to your experience.

On Mon, Jan 8, 2024 at 6:04 PM Martijn Visser 
wrote:

> Thanks for investigating Danny. It looks like the best direction to go
> :)
>
> On Mon, Jan 8, 2024 at 5:56 PM Péter Váry 
> wrote:
> >
> > Thanks Danny for working on this!
> >
> > It would be good to do this in a way that the different connectors could
> > reuse as much code as possible, so if possible put most of the code to
> the
> > flink connector shared utils repo [1]
> >
> > +1 from me for the general direction (non-binding)
> >
> > Thanks,
> > Peter
> >
> > [1] https://github.com/apache/flink-connector-shared-utils
> >
> >
> > Danny Cranmer  ezt írta (időpont: 2024. jan.
> 8.,
> > H, 17:31):
> >
> > > Hello all,
> > >
> > > I have been working with Péter and Marton on externalizing python
> > > connectors [1] from the main repo to the connector repositories. We
> have
> > > the code moved and the CI running tests for Kafka and AWS Connectors.
> I am
> > > now looking into the release process.
> > >
> > > When we undertake a Flink release we perform the following steps [2],
> > > regarding Python: 1/ run python build on CI, 2/ download Wheels
> artifacts,
> > > 3/ upload artifacts to the dist and 4/ deploy to pypi. The plan is to
> > > follow the same steps for connectors, using Github actions instead of
> Azure
> > > pipeline.
> > >
> > > Today we have a single pypi project for pyflink that contains all the
> Flink
> > > libs, apache-flink [3]. I propose we create a new pypi project per
> > > connector using the existing connector version, and following naming
> > > convention: apache-<connector-name>, for example:
> > > apache-flink-connector-aws, apache-flink-connector-kafka. Therefore to
> use
> > > a DataStream API connector in python, users would need to first
> install the
> > > lib, for example "python -m pip install apache-flink-connector-aws".
> > >
> > > Once we have consensus I will update the release process and perform a
> > > release of the flink-connector-aws project to test it end-to-end. I
> look
> > > forward to any feedback.
> > >
> > > Thanks,
> > > Danny
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-33528
> > > [2]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
> > > [3] https://pypi.org/project/apache-flink/
> > >
>


[jira] [Created] (FLINK-34028) Allow multiple parallel async calls in a single operator

2024-01-08 Thread Alan Sheinberg (Jira)
Alan Sheinberg created FLINK-34028:
--

 Summary: Allow multiple parallel async calls in a single operator
 Key: FLINK-34028
 URL: https://issues.apache.org/jira/browse/FLINK-34028
 Project: Flink
  Issue Type: Sub-task
Reporter: Alan Sheinberg






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34027) Create initial PR for FLIP-400

2024-01-08 Thread Alan Sheinberg (Jira)
Alan Sheinberg created FLINK-34027:
--

 Summary: Create initial PR for FLIP-400
 Key: FLINK-34027
 URL: https://issues.apache.org/jira/browse/FLINK-34027
 Project: Flink
  Issue Type: Sub-task
Reporter: Alan Sheinberg






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-329: Add operator attribute to specify support for object-reuse

2024-01-08 Thread Lu Niu
Sounds good. Is the requirement to send an email thread about the voting?
What else is needed? What are the passing criteria?

Best
Lu

On Sun, Jan 7, 2024 at 5:41 PM Xuannan Su  wrote:

> Hi Lu,
>
> The voting thread has been open for a long time. We may want to start
> a new voting thread. WDYT?
>
> Best,
> Xuannan
>
> On Sat, Jan 6, 2024 at 1:51 AM Lu Niu  wrote:
> >
> > Thank you Dong and Xuannan!
> >
> > Yes. We can take on this task. Any help during bootstrapping would be
> greatly appreciated! I realize there is already a voting thread "[VOTE]
> FLIP-329: Add operator attribute to specify support for object-reuse". What
> else do we need?
> >
> > Best
> > Lu
> >
> > On Fri, Jan 5, 2024 at 12:46 AM Xuannan Su 
> wrote:
> >>
> >> Hi Lu,
> >>
> >> I believe this feature is very useful. However, I currently lack the
> >> capacity to work on it in the near future. I think it would be great
> >> if you could take on the task. I am willing to offer assistance if
> >> there are any questions about the FLIP, or to review the PR if needed.
> >>
> >> Please let me know if you are interested in taking over this task. I
> >> also think that we should start the voting thread if there are no further
> >> comments on this FLIP.
> >>
> >> Best,
> >> Xuannan
> >>
> >>
> >>
> >> On Fri, Jan 5, 2024 at 2:23 PM Dong Lin  wrote:
> >> >
> >> > Hi Lu,
> >> >
> >> > I am not actively working on Flink and this JIRA recently. If Xuannan
> does not plan to work on this anytime soon, I personally think it will be
> great if you can help work on this FLIP. Maybe we can start the voting
> thread if there is no further comment on this FLIP.
> >> >
> >> > Xuannan, what do you think?
> >> >
> >> > Thanks,
> >> > Dong
> >> >
> >> >
> >> > On Fri, Jan 5, 2024 at 2:03 AM Lu Niu  wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Is this still under active development? I notice
> https://issues.apache.org/jira/browse/FLINK-32476 is labeled as
> deprioritized. If this is the case, would it be acceptable for us to take
> on the task?
> >> >>
> >> >> Best
> >> >> Lu
> >> >>
> >> >>
> >> >>
> >> >> On Thu, Oct 19, 2023 at 4:26 PM Ken Krugler <
> kkrugler_li...@transpac.com> wrote:
> >> >>>
> >> >>> Hi Dong,
> >> >>>
> >> >>> Sorry for not seeing this initially. I did have one question about
> the description of the issue in the FLIP:
> >> >>>
> >> >>> > However, in cases where the upstream and downstream operators do
> not store or access references to the input or output records, this
> deep-copy overhead becomes unnecessary
> >> >>>
> >> >>> I was interested in getting clarification as to what you meant by
> “or access references…”, to see if it covered this situation:
> >> >>>
> >> >>> StreamX —forward--> operator1
> >> >>> StreamX —forward--> operator2
> >> >>>
> >> >>> If operator1 modifies the record, and object re-use is enabled,
> then operator2 will see the modified version, right?
> >> >>>
> >> >>> Thanks,
> >> >>>
> >> >>> — Ken
> >> >>>
> >> >>> > On Jul 2, 2023, at 7:24 PM, Xuannan Su 
> wrote:
> >> >>> >
> >> >>> > Hi all,
> >> >>> >
> >> >>> > Dong(cc'ed) and I are opening this thread to discuss our proposal
> to
> >> >>> > add operator attribute to allow operator to specify support for
> >> >>> > object-reuse [1].
> >> >>> >
> >> >>> > Currently, the default configuration for pipeline.object-reuse is
> set
> >> >>> > to false to avoid data corruption, which can result in suboptimal
> >> >>> > performance. We propose adding APIs that operators can utilize to
> >> >>> > inform the Flink runtime whether it is safe to reuse the emitted
> >> >>> > records. This enhancement would enable Flink to maximize its
> >> >>> > performance using the default configuration.
> >> >>> >
> >> >>> > Please refer to the FLIP document for more details about the
> proposed
> >> >>> > design and implementation. We welcome any feedback and opinions on
> >> >>> > this proposal.
> >> >>> >
> >> >>> > Best regards,
> >> >>> >
> >> >>> > Dong and Xuannan
> >> >>> >
> >> >>> > [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749
> >> >>>
> >> >>> --
> >> >>> Ken Krugler
> >> >>> http://www.scaleunlimited.com
> >> >>> Custom big data solutions
> >> >>> Flink & Pinot
> >> >>>
> >> >>>
> >> >>>
>
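
To make Ken's scenario concrete, here is a sketch of two consumers forwarding from
the same stream with object reuse enabled. Whether operator2 actually observes the
mutation depends on chaining and task placement, which is exactly the subtlety the
proposed operator attribute is meant to capture.
{code:java}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ObjectReuseHazardSketch {
    /** Simple mutable POJO. */
    public static class Event {
        public long value;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().enableObjectReuse(); // pipeline.object-reuse = true

        DataStream<Event> streamX = env.fromElements(new Event());

        // operator1 mutates the incoming record in place.
        streamX.map(new MapFunction<Event, Event>() {
            @Override
            public Event map(Event e) {
                e.value = 42;
                return e;
            }
        }).print();

        // operator2 consumes the same upstream record. With object reuse on and
        // both consumers chained to the source, it can observe operator1's
        // mutation, because no defensive copy sits between them.
        streamX.map(new MapFunction<Event, Event>() {
            @Override
            public Event map(Event e) {
                return e;
            }
        }).print();

        env.execute();
    }
}
{code}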


Re: [VOTE] Release flink-shaded 18.0, release candidate #1

2024-01-08 Thread Sergey Nuyanzin
Thanks all. This vote is now closed, I'll announce the results in a
separate thread.

On Mon, Jan 8, 2024 at 11:25 AM Dawid Wysakowicz 
wrote:

> +1 (binding)
>
> - Validated hashes
> - Verified signature
> - Verified that no binaries exist in the source archive
>
> Best,
> Dawid
>
> On Tue, 28 Nov 2023 at 23:10, Sergey Nuyanzin  wrote:
>
> > Hi everyone,
> > Please review and vote on the release candidate #1 for the version 18.0,
> as
> > follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2],
> > which are signed with the key with fingerprint 1596BBF0726835D8 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "release-18.0-rc1" [5],
> > * website pull request listing the new release [6].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Sergey
> >
> > [1] https://issues.apache.org/jira/projects/FLINK/versions/12353081
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-18.0-rc1
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1676/
> > [5] https://github.com/apache/flink-shaded/releases/tag/release-18.0-rc1
> > [6] https://github.com/apache/flink-web/pull/701
> >
>


-- 
Best regards,
Sergey


[RESULT][VOTE] Release flink-shaded 18.0, release candidate #1

2024-01-08 Thread Sergey Nuyanzin
I'm happy to announce that we have unanimously approved this release.

There are 7 approving votes, 4 of which are binding:
* Rui Fan
* Matthias Pohl (binding)
* Sergey Nuyanzin
* Jing Ge
* Martijn Visser (binding)
* Piotr Nowojski (binding)
* Dawid Wysakowicz (binding)

There are no disapproving votes.

Thanks everyone!

Best regards,
Sergey


Re: Re: FLIP-413: Enable unaligned checkpoints by default

2024-01-08 Thread Mason Chen
Hi Piotr,

I also agree with Zhanghao's assessment on the limitations of unaligned
checkpoints. Some of them are already handled properly by Flink, but in the
case of the "Interplay with watermarks" limitation, it is quite confusing
for a new user to find that their code doesn't generate consistent results
with the default checkpoint configuration. Is there a way for Flink to
detect and handle this situation correctly?

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations

Best,
Mason
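
For reference, the proposed default corresponds to the following in today's public
API (a sketch only; the 30s timeout mirrors Rui's suggestion quoted below, while the
FLIP itself proposes 5s, and neither value is final):
{code:java}
import java.time.Duration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedByDefaultSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // checkpoint every 60s

        // What FLIP-413 would turn on by default: unaligned checkpoints, plus an
        // aligned-checkpoint timeout so that unstressed jobs still complete
        // aligned (and thus potentially incremental) checkpoints.
        env.getCheckpointConfig().enableUnalignedCheckpoints(true);
        env.getCheckpointConfig().setAlignedCheckpointTimeout(Duration.ofSeconds(30));
    }
}
{code}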

On Mon, Jan 8, 2024 at 2:01 AM yangpmldl  wrote:

> Unsubscribe
>
>
>
>
>
>
>
>
>
>
>
> At 2024-01-08 17:45:01, "Piotr Nowojski"  wrote:
> >Hi thanks for the responses,
> >
> >And thanks for pointing out the jobs upgrade issue. Indeed that has
> >slipped my mind. I was mistakenly
> >thinking that we are supporting all upgrades only via savepoint. Anyway,
> >maybe in that case we should
> >guide users towards that? Using savepoints for upgrades? That would be
> even
> >easier to understand
> >for the users:
> >- use unaligned checkpoints for checkpoints
> >- use savepoints for any changes in the job/version upgrades
> >
> >There is a downside, that savepoints are always full, while aligned
> >checkpoints can be incremental.
> >
> >WDYT?
> >
> >Regarding the value for the timeout, I would also be fine with 30s. Indeed
> >that's a safer default.
> >
> >> On a separate point, in the sentence below it seems to me it would be
> >> clearer to say that in the unlikely scenario you've described, the
> change
> >> would "significantly increase checkpoint sizes" -- assuming I understand
> >> things correctly.
> >
> >I've reworded that paragraph.
> >
> >Best,
> >Piotrek
> >
> >
> >
> >pon., 8 sty 2024 o 08:02 Rui Fan <1996fan...@gmail.com> napisał(a):
> >
> >> Thanks to Piotr for driving this proposal!
> >>
> >> Enabling unaligned checkpoints with an aligned checkpoints timeout
> >> is fine for me. I'm not sure whether an aligned checkpoints timeout of 5s
> >> is too aggressive. If the unaligned checkpoint is enabled by default
> >> for all jobs, I recommend that the aligned checkpoints timeout be
> >> at least 30s.
> >>
> >> If 30s is too big for some of the flink jobs, flink users can turn
> >> it down by themselves.
> >>
> >> To David, Ken and Zhanghao:
> >>
> >> Unaligned checkpoints indeed have more limitations than aligned
> checkpoints,
> >> but if we set the aligned checkpoints timeout to 30s or 60s, it means
> >> that when a job can complete its checkpoint within 30s or 60s, it still
> >> uses the aligned checkpoint (it doesn't introduce any extra effort).
> >> When the checkpoint cannot be completed within the aligned checkpoints
> timeout,
> >> the aligned checkpoint will be switched to an unaligned checkpoint.
> >> The unaligned checkpoint can be completed even when backpressure is severe.
> >>
> >> In brief, when backpressure is low, enabling them costs no extra effort;
> >> when backpressure is high, enabling them has some benefits.
> >>
> >> So I think it doesn't carry too many risks when the aligned checkpoints
> timeout
> >> is set to 30s or above. WDYT?
> >>
> >> Best,
> >> Rui
> >>
> >> On Mon, Jan 8, 2024 at 12:57 PM Zhanghao Chen <
> zhanghao.c...@outlook.com>
> >> wrote:
> >>
> >> > Hi Piotr,
> >> >
> >> > As a platform administer who runs kilos of Flink jobs, I'd be against
> the
> >> > idea to enable unaligned cp by default for our jobs. It may help a
> >> > significant portion of the users, but the subtle issues around
> unaligned
> >> CP
> >> > for a few jobs will probably raise a lot more on-calls and incidents.
> >> From
> >> > my point of view, we'd better not enable it by default before removing
> >> all
> >> > the limitations listed in
> >> >
> >>
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpointing_under_backpressure/#limitations
> >> > .
> >> >
> >> > Best,
> >> > Zhanghao Chen
> >> > 
> >> > From: Piotr Nowojski 
> >> > Sent: Friday, January 5, 2024 21:41
> >> > To: dev 
> >> > Subject: FLIP-413: Enable unaligned checkpoints by default
> >> >
> >> > Hi!
> >> >
> >> > I would like to propose by default to enable unaligned checkpoints and
> >> also
> >> > simultaneously increase the aligned checkpoints timeout from 0ms to
> 5s. I
> >> > think this change is the right one to do for the majority of Flink
> users.
> >> >
> >> > For more rationale please take a look into the short FLIP-413 [1].
> >> >
> >> > What do you all think?
> >> >
> >> > Best,
> >> > Piotrek
> >> >
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
> >> >
> >>
>


[jira] [Created] (FLINK-34029) Support different profiling mode on Flink WEB

2024-01-08 Thread Yu Chen (Jira)
Yu Chen created FLINK-34029:
---

 Summary: Support different profiling mode on Flink WEB
 Key: FLINK-34029
 URL: https://issues.apache.org/jira/browse/FLINK-34029
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Affects Versions: 1.19.0
Reporter: Yu Chen






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34030) Avoid using negative value for periodic-materialize.interval

2024-01-08 Thread Hangxiang Yu (Jira)
Hangxiang Yu created FLINK-34030:


 Summary: Avoid using negative value for 
periodic-materialize.interval
 Key: FLINK-34030
 URL: https://issues.apache.org/jira/browse/FLINK-34030
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Reporter: Hangxiang Yu
Assignee: Hangxiang Yu


Similar to FLINK-32023, a negative value doesn't work for the Duration type.
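
A sketch of the guard this implies (the method and message are hypothetical; the
actual fix might instead treat a sentinel value as "disabled"):
{code:java}
import java.time.Duration;

public class MaterializationIntervalCheck {
    static Duration validateInterval(Duration interval) {
        if (interval.isNegative()) {
            throw new IllegalArgumentException(
                    "periodic-materialize.interval must not be negative, was: " + interval);
        }
        return interval;
    }
}
{code}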



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-403: High Availability Services for OLAP Scenarios

2024-01-08 Thread Yangze Guo
Thank you for your comments, Zhu!

1. I would treat the refactoring as technical debt and a side effect of
this FLIP. The idea is inspired by Matthias' comments in [1]. It
suggests having a single implementation of HighAvailabilityServices
that requires a factory method for persistence services and leader
services. After this, we will achieve a clearer class hierarchy for
HAServices and eliminate code duplication.

2. While FLINK-24038 does eliminate the leader election time cost for
each job, it still involves creating a znode or writing to the
configmap for each job, which can negatively impact performance under
higher workloads. This also applies to all other persistence services
such as checkpoint and blob storage except for the job graph store.

WDYT?

[1] 
https://issues.apache.org/jira/browse/FLINK-31816?focusedCommentId=17741054&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17741054

Best,
Yangze Guo
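
As a side note, the cross-option check suggested in the quoted exchange below could
be as small as the following sketch (both keys are still FLIP proposals and may be
renamed):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class JobRecoveryOptionsCheck {
    private static final Logger LOG = LoggerFactory.getLogger(JobRecoveryOptionsCheck.class);

    static void warnOnConflict(boolean haJobRecoveryEnabled, boolean batchJobRecoveryEnabled) {
        if (batchJobRecoveryEnabled && !haJobRecoveryEnabled) {
            // Batch job recovery cannot work without HA job recovery underneath it.
            LOG.warn("execution.batch.job-recovery.enabled is true, but "
                    + "high-availability.job-recovery.enabled is false; "
                    + "batch job recovery will not take effect.");
        }
    }
}
{code}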

On Mon, Jan 8, 2024 at 7:37 PM Zhu Zhu  wrote:
>
> Thanks for creating the FLIP and starting the discussion, Yangze. It makes
> sense to me to improve the job submission performance in OLAP scenarios.
>
> I have a few questions regarding the proposed changes:
>
> 1. How about skipping the job graph persistence if the proposed config
> 'high-availability.enable-job-recovery' is set to false? In this way,
> we do not need to do the refactoring work.
>
> 2. Instead of using different HA services for Dispatcher and JobMaster.
> Can we leverage the work of FLINK-24038 to eliminate the leader election
> time cost of each job? Honestly I had thought it was already the truth but
> seems it is not. This improvement can also benefit non-OLAP jobs.
>
> Thanks,
> Zhu
>
> Yangze Guo  于2024年1月8日周一 17:11写道:
>
> > Thanks for the pointer, Rui!
> >
> > I have reviewed FLIP-383, and based on my understanding, this feature
> > should be enabled by default for batch jobs in the future. Therefore,
> > +1 for checking the parameters and issuing log warnings when the user
> > explicitly configures execution.batch.job-recovery.enabled to true.
> >
> > +1 for high-availability.job-recovery.enabled, which would be more
> > suitable with YAML hierarchy.
> >
> >
> > Best,
> > Yangze Guo
> >
> > On Mon, Jan 8, 2024 at 3:43 PM Rui Fan <1996fan...@gmail.com> wrote:
> > >
> > > Thanks to Yangze for driving this proposal!
> > >
> > > Overall looks good to me! This proposal is useful for
> > > the performance when the job doesn't need the failover.
> > >
> > > I have some minor questions:
> > >
> > > 1. How does it work with FLIP-383[1]?
> > >
> > > This FLIP introduces a high-availability.enable-job-recovery,
> > > and FLIP-383 introduces a execution.batch.job-recovery.enabled.
> > >
> > > IIUC, when high-availability.enable-job-recovery is false, the job
> > > cannot recover even if execution.batch.job-recovery.enabled = true,
> > > right?
> > >
> > > If so, could we check some parameters and warn some logs? Or
> > > disable the execution.batch.job-recovery.enabled directly when
> > > high-availability.enable-job-recovery = false.
> > >
> > > 2. Could we rename it to high-availability.job-recovery.enabled to unify
> > > the naming?
> > >
> > > WDYT?
> > >
> > > [1] https://cwiki.apache.org/confluence/x/QwqZE
> > >
> > > Best,
> > > Rui
> > >
> > > On Mon, Jan 8, 2024 at 2:04 PM Yangze Guo  wrote:
> > >
> > > > Thanks for your comment, Yong.
> > > >
> > > > Here are my thoughts on the splitting of HighAvailabilityServices:
> > > > Firstly, I would treat this separation as a result of technical debt
> > > > and a side effect of the FLIP. In order to achieve a cleaner interface
> > > > hierarchy for High Availability before Flink 2.0, the design decision
> > > > should not be limited to OLAP scenarios.
> > > > I agree that the current HAServices can be divided based on either the
> > > > actual target (cluster & job) or the type of functionality (leader
> > > > election & persistence). From a conceptual perspective, I do not see
> > > > one approach being better than the other. However, I have chosen the
> > > > current separation for a clear separation of concerns. After FLIP-285,
> > > > each process has a dedicated LeaderElectionService responsible for
> > > > leader election of all the components within it. This
> > > > LeaderElectionService has its own lifecycle management. If we were to
> > > > split the HAServices into 'ClusterHighAvailabilityService' and
> > > > 'JobHighAvailabilityService', we would need to couple the lifecycle
> > > > management of these two interfaces, as they both rely on the
> > > > LeaderElectionService and other relevant classes. This coupling and
> > > > implicit design assumption will increase the complexity and testing
> > > > difficulty of the system. WDYT?
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Mon, Jan 8, 2024 at 12:08 PM Yong Fang  wrote:
> > > > >
> > > > > Thanks Yangze for starting this discussion. I have one comment: why
> > do we
> > > > > need to 

Re: [DISCUSS] FLIP-406: Reorganize State & Checkpointing & Recovery Configuration

2024-01-08 Thread Hangxiang Yu
Hi, Zakelly.
Thanks for driving this. Overall LGTM as we discussed offline.

Some comments/suggestions just came to mind:
1. Could execution.recovery also cover some other recovery-related behaviors,
e.g. restart-strategy?
2. Could we also remove some legacy configuration values, e.g. the LEGACY mode
for execution.savepoint-restore-mode/execution.recovery.claim-mode?
3. Could the local checkpoint be cleaned up
if execution.checkpointing.local-copy.enabled is true and
execution.recovery.from-local is false? I found it's also an issue when
local-recovery is switched from enabled to disabled. Maybe another ticket is
needed.
4. +1 for enabling execution.checkpointing.incremental by default, which is
basically the default configuration in our production environment (see the
config sketch below).
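
A minimal config sketch with the proposed keys (both are FLIP-406 proposals and may
still change before release):
{code:java}
import org.apache.flink.configuration.Configuration;

public class Flip406ConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Proposed replacement for state.backend.incremental:
        conf.setBoolean("execution.checkpointing.incremental", true);
        // Proposed replacement for the local-copy / local-recovery options:
        conf.setBoolean("execution.checkpointing.local-backup.enabled", true);
    }
}
{code}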


On Mon, Jan 8, 2024 at 6:06 PM Zakelly Lan  wrote:

> Hi Yun,
>
> Thanks for your comments!
>
>  1.  We shall not describe the configuration by its implementation for the
> > 'execution.checkpointing.local-copy.*' options: for the hashmap state-backend,
> > it would write two streams, and for the RocksDB state-backend it would use
> > hard links for backup. Thus, I think
> > 'execution.checkpointing.local-backup.*' looks better.
>
> I agree that we'd better name the option from the user's perspective instead of
> the implementation; thus I named it as a copy of the checkpoint on the
> local disk, regardless of how it is generated. The word 'backup' is
> also suitable for this case, so I agree to change to
> 'execution.checkpointing.local-backup.*' if no one objects.
>
>  2.  What does the 'execution.checkpointing.data-inline-threshold' mean? It
> > seems not so easy to understand.
>
> The 'execution.checkpointing.data-inline-threshold' (originally
> 'state.storage.fs.memory-threshold') stands for the size threshold below
> which state chunks are stored inline with the metadata, hence the name
> 'data-inline-threshold'.
>
>
> Best,
> Zakelly
>
> On Mon, Jan 8, 2024 at 10:09 AM Yun Tang  wrote:
>
> > Hi Zakelly,
> >
> > Thanks for driving this topic. I have two concerns here:
> >
> >   1.  We shall not describe the configuration by its implementation for the
> > 'execution.checkpointing.local-copy.*' options: for the hashmap state-backend,
> > it would write two streams, and for the RocksDB state-backend it would use
> > hard links for backup. Thus, I think
> > 'execution.checkpointing.local-backup.*' looks better.
> >   2.  What does the 'execution.checkpointing.data-inline-threshold' mean?
> > It seems not so easy to understand.
> >
> > Best
> > Yun Tang
> > 
> > From: Piotr Nowojski 
> > Sent: Thursday, January 4, 2024 22:37
> > To: dev@flink.apache.org 
> > Subject: Re: [DISCUSS] FLIP-406: Reorganize State & Checkpointing &
> > Recovery Configuration
> >
> > Hi,
> >
> > Thanks for trying to clean this up! I don't have strong opinions on the
> > topics discussed here, so generally speaking +1 from my side!
> >
> > Best,
> > Piotrek
> >
> > śr., 3 sty 2024 o 04:16 Rui Fan <1996fan...@gmail.com> napisał(a):
> >
> > > Thanks for the feedback!
> > >
> > > Using the `execution.checkpointing.incremental.enabled`,
> > > and enabling it by default sounds good to me.
> > >
> > > Best,
> > > Rui
> > >
> > > On Wed, Jan 3, 2024 at 11:10 AM Zakelly Lan 
> > wrote:
> > >
> > > > Hi Rui,
> > > >
> > > > Thanks for your comments!
> > > >
> > > > IMO, given that the state backend can be plugably loaded (as you can
> > > > specify a state backend factory), I prefer not providing state
> backend
> > > > specified options in the framework.
> > > >
> > > > Secondly, the incremental checkpoint is actually a sharing file
> > strategy
> > > > across checkpoints, which means the state backend *could* reuse files
> > > from
> > > > previous cp but not *must* do so. When the state backend could not
> > reuse
> > > > the files, it is reasonable to fallback to a full checkpoint.
> > > >
> > > > Thus, I suggest we make it `execution.checkpointing.incremental` and
> > > enable
> > > > it by default. For those state backends not supporting this, they
> > perform
> > > > full checkpoints and print a warning to inform users. Users do not
> need
> > > to
> > > > pay special attention to different options to control this across
> > > different
> > > > state backends. This is more user-friendly in my opinion. WDYT?
> > > >
> > > > On Tue, Jan 2, 2024 at 10:49 AM Rui Fan <1996fan...@gmail.com>
> wrote:
> > > >
> > > > > Hi Zakelly,
> > > > >
> > > > > I'm not sure whether we could add the state backend type in the
> > > > > new key name of state.backend.incremental. It means we use
> > > > > `execution.checkpointing.rocksdb-incremental` or
> > > > > `execution.checkpointing.rocksdb-incremental.enabled`.
> > > > >
> > > > > So far, state.backend.incremental only works for rocksdb state
> > backend.
> > > > > And this feature or optimization is very valuable and huge for
> large
> > > > > state flink jobs. I believe it's enabled for most production flink
> > jobs
> > > > > with large state.

[jira] [Created] (FLINK-34031) Hive table sources report statistics in various formats

2024-01-08 Thread hanjie (Jira)
hanjie created FLINK-34031:
--

 Summary: Hive table sources report statistics in various formats
 Key: FLINK-34031
 URL: https://issues.apache.org/jira/browse/FLINK-34031
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: hanjie


Now, hive table sources only support reporting statistics for the Orc and Parquet 
formats.

Currently, we have some text format hive tables. Sometimes a text format hive table 
is used as a dimension table and the task should use a broadcast join, but table 
stats cannot be obtained for text format tables.

So, hive table sources should report statistics for more formats, such as `text`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34032) Cleanup local-recovery dir when switching local-recovery from enabled to disabled

2024-01-08 Thread Hangxiang Yu (Jira)
Hangxiang Yu created FLINK-34032:


 Summary: Cleanup local-recovery dir when switching local-recovery 
from enabled to disabled
 Key: FLINK-34032
 URL: https://issues.apache.org/jira/browse/FLINK-34032
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Reporter: Hangxiang Yu


When switching local-recovery from enabled to disabled, the local-recovery dir 
is not cleaned up.

In particular, for a job that has switched multiple times, lots of historical local 
checkpoints will be retained forever.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-405: Migrate string configuration key to ConfigOption

2024-01-08 Thread Xintong Song
+1 (binding)

Best,

Xintong



On Mon, Jan 8, 2024 at 1:48 PM Hang Ruan  wrote:

> +1(non-binding)
>
> Best,
> Hang
>
> Rui Fan <1996fan...@gmail.com> 于2024年1月8日周一 13:04写道:
>
> > +1(binding)
> >
> > Best,
> > Rui
> >
> > On Mon, Jan 8, 2024 at 1:00 PM Xuannan Su  wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for all the feedback about the FLIP-405: Migrate string
> > > configuration key to ConfigOption [1] [2].
> > >
> > > I'd like to start a vote for it. The vote will be open for at least 72
> > > hours(excluding weekends,until Jan 11, 12:00AM GMT) unless there is an
> > > objection or an insufficient number of votes.
> > >
> > >
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-405%3A+Migrate+string+configuration+key+to+ConfigOption
> > > [2] https://lists.apache.org/thread/zfw1b1g3679yn0ppjbsokfrsx9k7ybg0
> > >
> > >
> > > Best,
> > > Xuannan
> > >
> >
>


[jira] [Created] (FLINK-34033) flink json supports raw type

2024-01-08 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-34033:
-

 Summary: flink json supports raw type 
 Key: FLINK-34033
 URL: https://issues.apache.org/jira/browse/FLINK-34033
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.19.0
Reporter: Jacky Lau
 Fix For: 1.19.0


When users use the ES types nested/object, they can use complex types like 
ROW/ARRAY/MAP.

However, this is not convenient when the object type does not have a fixed 
structure like ROW. For example, a user may use a UDF to produce such data and 
insert it into ES. We can support the RAW type:
{code:java}
CREATE TABLE es_sink (
 `string` VARCHAR, 
  nested RAW('java.lang.Object', 
'AEdvcmcuYXBhY2hlLmZsaW5rLmFwaS5qYXZhLnR5cGV1dGlscy5ydW50aW1lLmtyeW8uS3J5b1NlcmlhbGl6ZXJTbmFwc2hvdAIAEGphdmEubGFuZy5PYmplY3QAAATyxpo9cAIAEGphdmEubGFuZy5PYmplY3QBEgAQamF2YS5sYW5nLk9iamVjdAEWABBqYXZhLmxhbmcuT2JqZWN0AAApb3JnLmFwYWNoZS5hdnJvLmdlbmVyaWMuR2VuZXJpY0RhdGEkQXJyYXkBKwApb3JnLmFwYWNoZS5hdnJvLmdlbmVyaWMuR2VuZXJpY0RhdGEkQXJyYXkBtgBVb3JnLmFwYWNoZS5mbGluay5hcGkuamF2YS50eXBldXRpbHMucnVudGltZS5rcnlvLlNlcmlhbGl6ZXJzJER1bW15QXZyb1JlZ2lzdGVyZWRDbGFzcwEAWW9yZy5hcGFjaGUuZmxpbmsuYXBpLmphdmEudHlwZXV0aWxzLnJ1bnRpbWUua3J5by5TZXJpYWxpemVycyREdW1teUF2cm9LcnlvU2VyaWFsaXplckNsYXNzAAAE8saaPXAE8saaPXAA'),
  object RAW('java.lang.Object', 
'AEdvcmcuYXBhY2hlLmZsaW5rLmFwaS5qYXZhLnR5cGV1dGlscy5ydW50aW1lLmtyeW8uS3J5b1NlcmlhbGl6ZXJTbmFwc2hvdAIAEGphdmEubGFuZy5PYmplY3QAAATyxpo9cAIAEGphdmEubGFuZy5PYmplY3QBEgAQamF2YS5sYW5nLk9iamVjdAEWABBqYXZhLmxhbmcuT2JqZWN0AAApb3JnLmFwYWNoZS5hdnJvLmdlbmVyaWMuR2VuZXJpY0RhdGEkQXJyYXkBKwApb3JnLmFwYWNoZS5hdnJvLmdlbmVyaWMuR2VuZXJpY0RhdGEkQXJyYXkBtgBVb3JnLmFwYWNoZS5mbGluay5hcGkuamF2YS50eXBldXRpbHMucnVudGltZS5rcnlvLlNlcmlhbGl6ZXJzJER1bW15QXZyb1JlZ2lzdGVyZWRDbGFzcwEAWW9yZy5hcGFjaGUuZmxpbmsuYXBpLmphdmEudHlwZXV0aWxzLnJ1bnRpbWUua3J5by5TZXJpYWxpemVycyREdW1teUF2cm9LcnlvU2VyaWFsaXplckNsYXNzAAAE8saaPXAE8saaPXAA'),
  PRIMARY KEY (`string`) NOT ENFORCED
) WITH
('connector'='elasticsearch'); {code}
ES currently depends on flink-json, so we can make flink-json support the 
RAW type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Externalized Python Connector Release/Dependency Process

2024-01-08 Thread Leonard Xu
+1

Thanks Danny for driving this.

Best,
Leonard


> 2024年1月9日 上午2:01,Márton Balassi  写道:
> 
> +1
> 
> Thanks, Danny - I really appreciate you taking the time for the in-depth
> investigation. Please proceed, looking forward to your experience.
> 
> On Mon, Jan 8, 2024 at 6:04 PM Martijn Visser 
> wrote:
> 
>> Thanks for investigating Danny. It looks like the best direction to go
>> :)
>> 
>> On Mon, Jan 8, 2024 at 5:56 PM Péter Váry 
>> wrote:
>>> 
>>> Thanks Danny for working on this!
>>> 
>>> It would be good to do this in a way that the different connectors could
>>> reuse as much code as possible, so if possible put most of the code to
>> the
>>> flink connector shared utils repo [1]
>>> 
>>> +1 from me for the general direction (non-binding)
>>> 
>>> Thanks,
>>> Peter
>>> 
>>> [1] https://github.com/apache/flink-connector-shared-utils
>>> 
>>> 
>>> Danny Cranmer  ezt írta (időpont: 2024. jan.
>> 8.,
>>> H, 17:31):
>>> 
 Hello all,
 
 I have been working with Péter and Marton on externalizing python
 connectors [1] from the main repo to the connector repositories. We
>> have
 the code moved and the CI running tests for Kafka and AWS Connectors.
>> I am
 now looking into the release process.
 
 When we undertake a Flink release we perform the following steps [2],
 regarding Python: 1/ run python build on CI, 2/ download Wheels
>> artifacts,
 3/ upload artifacts to the dist and 4/ deploy to pypi. The plan is to
 follow the same steps for connectors, using Github actions instead of
>> Azure
 pipeline.
 
 Today we have a single pypi project for pyflink that contains all the
>> Flink
 libs, apache-flink [3]. I propose we create a new pypi project per
 connector using the existing connector version, and following naming
 convention: apache-<connector-name>, for example:
 apache-flink-connector-aws, apache-flink-connector-kafka. Therefore to
>> use
 a DataStream API connector in python, users would need to first
>> install the
 lib, for example "python -m pip install apache-flink-connector-aws".
 
 Once we have consensus I will update the release process and perform a
 release of the flink-connector-aws project to test it end-to-end. I
>> look
 forward to any feedback.
 
 Thanks,
 Danny
 
 [1] https://issues.apache.org/jira/browse/FLINK-33528
 [2]
 
>> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release
 [3] https://pypi.org/project/apache-flink/
 
>> 



[RESULT] [VOTE] FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-08 Thread Yong Fang
Hi devs,

I'm happy to announce that FLIP-398: Improve Serialization Configuration
And Usage In Flink[1] has been accepted with 4 approving votes (3 binding)
[2]:

 - Xintong Song (binding)
 - Zhanghao Chen (non-binding)
 - Zhu Zhu (binding)
 - weijie guo (binding)

There are no disapproving votes.

Thanks again to everyone who participated in the discussion and voting.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
[2] https://lists.apache.org/thread/2xmcxs67xxzwool554fglrnklyvw348h

Best,
Fang Yong


Re: [Discuss] FLIP-407: Improve Flink Client performance in interactive scenarios

2024-01-08 Thread Weihua Hu
Thanks for proposing this FLIP.

Experiments have shown that it significantly enhances the real-time query
experience.
+1 for this.

Best,
Weihua
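
For readers skimming the thread: the degenerate case discussed below (a multiplier
of 1 turning exponential-delay into fixed-delay) is visible directly in the delay
sequence. A sketch with hypothetical parameter names:
{code:java}
import java.time.Duration;

public class CollectRetryDelaysSketch {
    static Duration delayForAttempt(Duration initial, double multiplier, Duration max, int attempt) {
        double millis = initial.toMillis() * Math.pow(multiplier, attempt);
        return Duration.ofMillis(Math.min((long) millis, max.toMillis()));
    }

    public static void main(String[] args) {
        // multiplier = 2.0 yields 100ms, 200ms, 400ms, ... capped at 5s;
        // multiplier = 1.0 yields 100ms for every attempt, i.e. fixed-delay.
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println(
                    delayForAttempt(Duration.ofMillis(100), 1.0, Duration.ofSeconds(5), attempt));
        }
    }
}
{code}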


On Mon, Jan 8, 2024 at 5:19 PM Rui Fan <1996fan...@gmail.com> wrote:

> Thanks Xiangyu for the quick update!
>
> LGTM
>
> Best,
> Rui
>
> On Mon, Jan 8, 2024 at 4:27 PM xiangyu feng  wrote:
>
> > Hi Rui and Yong,
> >
> > Thx for ur reply.
> >
> > My initial attention here is that for short-lived jobs under high QPS: a
> > fixed delay retry strategy will cause extra resource waste and not
> flexible
> > enough, an exponential-backoff strategy might significantly increase the
> > query latency since the interval time grows too fast. An
> incremental-delay
> > strategy could be balanced between resource consumption and short-query
> > latency.
> >
> > With a second thought,  an exponential-delay retry strategy with a
> > configurable multiplier option can also achieve this goal. By setting the
> > default value of multiplier to 1, we can be consistent with the original
> > behavior and reduce the configuration items at the same time.
> >
> > I've updated this FLIP accordingly, look forward to your feedback.
> >
> > Regards,
> > Xiangyu Feng
> >
> >
> > Rui Fan <1996fan...@gmail.com> 于2024年1月8日周一 15:29写道:
> >
> >> Only one strategy is fine to me.
> >>
> >> When the multiplier is set to 1, the exponential-delay will become
> >> fixed-delay.
> >> So fixed-delay may not be needed.
> >>
> >> Best,
> >> Rui
> >>
> >> On Mon, Jan 8, 2024 at 2:17 PM Yong Fang  wrote:
> >>
> >> > I agree with @Rui that the current configuration for Flink Client is a
> >> > little complex. Can we just provide one strategy with less
> configuration
> >> > items for all scenarios?
> >> >
> >> > Best,
> >> > Fang Yong
> >> >
> >> > On Mon, Jan 8, 2024 at 11:19 AM Rui Fan <1996fan...@gmail.com> wrote:
> >> >
> >> > > Thanks xiangyu for driving this proposal! And sorry for the
> >> > > late reply.
> >> > >
> >> > > Overall looks good to me, I only have some minor questions:
> >> > >
> >> > > 1. Do we need to introduce 3 collect strategies in the first
> version?
> >> > >
> >> > > Large and comprehensive configuration items will bring
> >> > > additional learning costs and usage costs to users. I tend to
> >> > > provide users with out-of-the-box parameters and 2 collect
> >> > > strategies may be enough for users.
> >> > >
> >> > > IIUC, there is no big difference between exponential-delay and
> >> > > incremental-delay, especially the default parameters provided.
> >> > > I wonder could we provide a multiplier for exponential-delay
> strategy
> >> > > and removing the incremental-delay strategy?
> >> > >
> >> > > Of course, if you think multiplier option is not needed based on
> >> > > your production experience, it's totally fine for me. Simple is
> >> better.
> >> > >
> >> > > 2. Which strategy do you think is best in mass production?
> >> > >
> >> > > I'm working on FLIP-364[1], it's related to Flink failover restart
> >> > > strategy. IIUC, when one cluster only has a few flink jobs,
> >> > > fixed-delay is fine. It guarantees minimal latency without too
> >> > > much stress. But if one cluster has too many jobs, fixed-delay
> >> > > may not be stable.
> >> > >
> >> > > Do you think exponential-delay is better than fixed delay in this
> >> > > scenario? And which strategy is used in your production for now?
> >> > > Would you mind sharing it?
> >> > >
> >> > > Looking forwarding to your opinion~
> >> > >
> >> > > Best,
> >> > > Rui
> >> > >
> >> > > On Sat, Jan 6, 2024 at 5:54 PM xiangyu feng 
> >> > wrote:
> >> > >
> >> > > > Hi all,
> >> > > >
> >> > > > Thanks for the comments.
> >> > > >
> >> > > > If there is no further comment, we will open the voting thread
> next
> >> > week.
> >> > > >
> >> > > > Regards,
> >> > > > Xiangyu
> >> > > >
> >> > > > Zhanghao Chen  于2024年1月3日周三 16:46写道:
> >> > > >
> >> > > > > Thanks for driving this effort on improving the interactive use
> >> > > > experience
> >> > > > > of Flink. The proposal overall looks good to me.
> >> > > > >
> >> > > > > Best,
> >> > > > > Zhanghao Chen
> >> > > > > 
> >> > > > > From: xiangyu feng 
> >> > > > > Sent: Tuesday, December 26, 2023 16:51
> >> > > > > To: dev@flink.apache.org 
> >> > > > > Subject: [Discuss] FLIP-407: Improve Flink Client performance in
> >> > > > > interactive scenarios
> >> > > > >
> >> > > > > Hi devs,
> >> > > > >
> >> > > > > I'm opening this thread to discuss FLIP-407: Improve Flink
> Client
> >> > > > > performance in interactive scenarios. The POC test results and
> >> design
> >> > > doc
> >> > > > > can be found at: FLIP-407
> >> > > > > <
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+when+interacting+with+dedicated+Flink+Session+Clusters
> >> > > > > >
> >> > > > > .
> >> > > > >
> >> > > > > Currently, Flink Client is mainly designed for one time
> >> interacti

[jira] [Created] (FLINK-34034) When kv hint and list hint handle duplicate query hints, the results are different.

2024-01-08 Thread xuyang (Jira)
xuyang created FLINK-34034:
--

 Summary: When kv hint and list hint handle duplicate query hints, 
the results are different.
 Key: FLINK-34034
 URL: https://issues.apache.org/jira/browse/FLINK-34034
 Project: Flink
  Issue Type: Bug
Reporter: xuyang


When there are duplicate keys in the kv hint, calcite will overwrite the 
previous value with the later value.
{code:java}
@TestTemplate
def test(): Unit = {
  val sql =
"SELECT /*+ LOOKUP('table'='D', 'retry-predicate'='lookup_miss', 
'retry-strategy'='fixed_delay', 'fixed-delay'='10s','max-attempts'='3', 
'max-attempts'='4') */ * FROM MyTable AS T JOIN LookupTable " +
  "FOR SYSTEM_TIME AS OF T.proctime AS D ON T.a = D.id"
  util.verifyExecPlan(sql)
} {code}
{code:java}
Calc(select=[a, b, c, PROCTIME_MATERIALIZE(proctime) AS proctime, rowtime, id, 
name, age]) 
  +- LookupJoin(table=[default_catalog.default_database.LookupTable], 
joinType=[InnerJoin], lookup=[id=a], select=[a, b, c, proctime, rowtime, id, 
name, age], retry=[lookup_miss, FIXED_DELAY, 1ms, 4]) 
+- DataStreamScan(table=[[default_catalog, default_database, MyTable]], 
fields=[a, b, c, proctime, rowtime])
{code}
But when a list hint is duplicated (such as a join hint), we will choose the 
first one as the effective hint.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Leonard Xu
Hello all,

This is the official vote whether to accept the Flink CDC code contribution
 to Apache Flink.

The current Flink CDC code, documentation, and website can be
found here:
code: https://github.com/ververica/flink-cdc-connectors 

docs: https://ververica.github.io/flink-cdc-connectors/ 


This vote should capture whether the Apache Flink community is interested
in accepting, maintaining, and evolving Flink CDC.

Regarding my original proposal[1] in the dev mailing list, I firmly believe
that this initiative aligns perfectly with Flink. For the Flink community,
it represents an opportunity to bolster Flink's competitive edge in streaming
data integration, fostering the robust growth and prosperity of the Apache Flink
ecosystem. For the Flink CDC project, becoming a sub-project of Apache Flink
means becoming an integral part of a neutral open-source community, capable of 
attracting a more diverse pool of contributors.

All Flink CDC maintainers are dedicated to continuously contributing to achieve 
seamless integration with Flink. Additionally, PMC members like Jark, Qingsheng, 
and I are willing to facilitate the expansion of contributors and committers to 
effectively maintain this new sub-project.

This is a "Adoption of a new Codebase" vote as per the Flink bylaws [2].
Only PMC votes are binding. The vote will be open at least 7 days
(excluding weekends), meaning until Thursday January 18 12:00 UTC, or until we
achieve the 2/3rd majority. We will follow the instructions in the Flink Bylaws
in the case of insufficient active binding voters:

> 1. Wait until the minimum length of the voting passes.
> 2. Publicly reach out via personal email to the remaining binding voters in 
> the
voting mail thread for at least 2 attempts with at least 7 days between two 
attempts.
> 3. If the binding voter being contacted still failed to respond after all the 
> attempts,
the binding voter will be considered as inactive for the purpose of this 
particular voting.

Welcome voting !

Best,
Leonard
[1] https://lists.apache.org/thread/o7klnbsotmmql999bnwmdgo56b6kxx9l 
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026

Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread tison
+1 non-binding

Best,
tison.

Leonard Xu  于2024年1月9日周二 15:05写道:
>
> Hello all,
>
> This is the official vote whether to accept the Flink CDC code contribution
>  to Apache Flink.
>
> The current Flink CDC code, documentation, and website can be
> found here:
> code: https://github.com/ververica/flink-cdc-connectors 
> 
> docs: https://ververica.github.io/flink-cdc-connectors/ 
> 
>
> This vote should capture whether the Apache Flink community is interested
> in accepting, maintaining, and evolving Flink CDC.
>
> Regarding my original proposal[1] in the dev mailing list, I firmly believe
> that this initiative aligns perfectly with Flink. For the Flink community,
> it represents an opportunity to bolster Flink's competitive edge in streaming
> data integration, fostering the robust growth and prosperity of the Apache 
> Flink
> ecosystem. For the Flink CDC project, becoming a sub-project of Apache Flink
> means becoming an integral part of a neutral open-source community, capable of
> attracting a more diverse pool of contributors.
>
> All Flink CDC maintainers are dedicated to continuously contributing to 
> achieve
> seamless integration with Flink. Additionally, PMC members like Jark, 
> Qingsheng,
> and I are willing to facilitate the expansion of contributors and 
> committers to
> effectively maintain this new sub-project.
>
> This is a "Adoption of a new Codebase" vote as per the Flink bylaws [2].
> Only PMC votes are binding. The vote will be open at least 7 days
> (excluding weekends), meaning until Thursday January 18 12:00 UTC, or until we
> achieve the 2/3rd majority. We will follow the instructions in the Flink 
> Bylaws
> in the case of insufficient active binding voters:
>
> > 1. Wait until the minimum length of the voting passes.
> > 2. Publicly reach out via personal email to the remaining binding voters in 
> > the
> voting mail thread for at least 2 attempts with at least 7 days between two 
> attempts.
> > 3. If the binding voter being contacted still failed to respond after all 
> > the attempts,
> the binding voter will be considered as inactive for the purpose of this 
> particular voting.
>
> Welcome voting !
>
> Best,
> Leonard
> [1] https://lists.apache.org/thread/o7klnbsotmmql999bnwmdgo56b6kxx9l
> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Qingsheng Ren
+1 (binding)

Thanks for driving this, Leonard!

Best,
Qingsheng

On Tue, Jan 9, 2024 at 3:06 PM tison  wrote:

> +1 non-binding
>
> Best,
> tison.
>
> Leonard Xu  于2024年1月9日周二 15:05写道:
> >
> > Hello all,
> >
> > This is the official vote whether to accept the Flink CDC code
> contribution
> >  to Apache Flink.
> >
> > The current Flink CDC code, documentation, and website can be
> > found here:
> > code: https://github.com/ververica/flink-cdc-connectors <
> https://github.com/ververica/flink-cdc-connectors>
> > docs: https://ververica.github.io/flink-cdc-connectors/ <
> https://ververica.github.io/flink-cdc-connectors/>
> >
> > This vote should capture whether the Apache Flink community is interested
> > in accepting, maintaining, and evolving Flink CDC.
> >
> > Regarding my original proposal[1] in the dev mailing list, I firmly
> believe
> > that this initiative aligns perfectly with Flink. For the Flink
> community,
> > it represents an opportunity to bolster Flink's competitive edge in
> streaming
> > data integration, fostering the robust growth and prosperity of the
> Apache Flink
> > ecosystem. For the Flink CDC project, becoming a sub-project of Apache
> Flink
> > means becoming an integral part of a neutral open-source community,
> capable of
> > attracting a more diverse pool of contributors.
> >
> > All Flink CDC maintainers are dedicated to continuously contributing to
> achieve
> > seamless integration with Flink. Additionally, PMC members like Jark,
> Qingsheng,
> > and I are willing to facilitate the expansion of contributors and
> committers to
> > effectively maintain this new sub-project.
> >
> > This is a "Adoption of a new Codebase" vote as per the Flink bylaws [2].
> > Only PMC votes are binding. The vote will be open at least 7 days
> > (excluding weekends), meaning until Thursday January 18 12:00 UTC, or
> until we
> > achieve the 2/3rd majority. We will follow the instructions in the Flink
> Bylaws
> > in the case of insufficient active binding voters:
> >
> > > 1. Wait until the minimum length of the voting passes.
> > > 2. Publicly reach out via personal email to the remaining binding
> voters in the
> > voting mail thread for at least 2 attempts with at least 7 days between
> two attempts.
> > > 3. If the binding voter being contacted still failed to respond after
> all the attempts,
> > the binding voter will be considered as inactive for the purpose of this
> particular voting.
> >
> > Welcome voting !
> >
> > Best,
> > Leonard
> > [1] https://lists.apache.org/thread/o7klnbsotmmql999bnwmdgo56b6kxx9l
> > [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
>


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Yuan Mei
+1

Best,
Yuan

On Tue, Jan 9, 2024 at 3:06 PM tison  wrote:

> +1 non-binding
>
> Best,
> tison.
>
> On Tue, Jan 9, 2024 at 3:05 PM Leonard Xu  wrote:
> > [full vote text snipped; quoted in full earlier in the thread]


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Yuan Mei
+1 binding



On Tue, Jan 9, 2024 at 3:21 PM Yuan Mei  wrote:

> +1
>
> Best,
> Yuan
>
> On Tue, Jan 9, 2024 at 3:06 PM tison  wrote:
>
>> +1 non-binding
>>
>> Best,
>> tison.
>>
>> On Tue, Jan 9, 2024 at 3:05 PM Leonard Xu  wrote:
>> > [full vote text snipped; quoted in full earlier in the thread]
>


[jira] [Created] (FLINK-34035) Errors appear in jobmanager.log when Flink SQL runs GROUP BY on a partitioned table

2024-01-08 Thread hansonhe (Jira)
hansonhe created FLINK-34035:


 Summary: Errors appear in jobmanager.log when Flink SQL runs 
GROUP BY on a partitioned table
 Key: FLINK-34035
 URL: https://issues.apache.org/jira/browse/FLINK-34035
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.17.1
Reporter: hansonhe






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Jingsong Li
+1

On Tue, Jan 9, 2024 at 3:23 PM Yuan Mei  wrote:
>
> +1 binding
>
>
>
> On Tue, Jan 9, 2024 at 3:21 PM Yuan Mei  wrote:
>
> > +1
> >
> > Best,
> > Yuan
> >
> > On Tue, Jan 9, 2024 at 3:06 PM tison  wrote:
> >
> >> +1 non-binding
> >>
> >> Best,
> >> tison.
> >>
> >> On Tue, Jan 9, 2024 at 3:05 PM Leonard Xu  wrote:
> >> > [full vote text snipped; quoted in full earlier in the thread]
> >


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Benchao Li
+1 (non-binding)

On Tue, Jan 9, 2024 at 3:29 PM Feng Wang  wrote:
>
> +1 non-binding
> Regards,
> Feng
>
> On Tue, Jan 9, 2024 at 3:05 PM Leonard Xu  wrote:
>
> > [full vote text snipped; quoted in full earlier in the thread]



-- 

Best,
Benchao Li


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Feng Wang
+1 non-binding
Regards,
Feng

On Tue, Jan 9, 2024 at 3:05 PM Leonard Xu  wrote:

> [full vote text snipped; quoted in full earlier in the thread]


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Xintong Song
+1 (binding)

Best,

Xintong



On Tue, Jan 9, 2024 at 3:31 PM Benchao Li  wrote:

> +1 (non-binding)
>
> On Tue, Jan 9, 2024 at 3:29 PM Feng Wang  wrote:
> >
> > +1 non-binding
> > Regards,
> > Feng
> >
> > On Tue, Jan 9, 2024 at 3:05 PM Leonard Xu  wrote:
> >
> > > [full vote text snipped; quoted in full earlier in the thread]
>
>
>
> --
>
> Best,
> Benchao Li
>


Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-08 Thread Jark Wu
+1 (binding)

Best,
Jark

On Tue, 9 Jan 2024 at 15:31, Benchao Li  wrote:

> +1 (non-binding)
>
> On Tue, Jan 9, 2024 at 3:29 PM Feng Wang  wrote:
> >
> > +1 non-binding
> > Regards,
> > Feng
> >
> > On Tue, Jan 9, 2024 at 3:05 PM Leonard Xu  wrote:
> >
> > > [full vote text snipped; quoted in full earlier in the thread]
>
>
>
> --
>
> Best,
> Benchao Li
>


[jira] [Created] (FLINK-34036) Various HiveDialectQueryITCase tests fail in GitHub Actions workflow with Hadoop 3.1.3 enabled

2024-01-08 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-34036:
-

 Summary: Various HiveDialectQueryITCase tests fail in GitHub 
Actions workflow with Hadoop 3.1.3 enabled
 Key: FLINK-34036
 URL: https://issues.apache.org/jira/browse/FLINK-34036
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hadoop Compatibility, Connectors / Hive
Affects Versions: 1.19.0
Reporter: Matthias Pohl


The following {{HiveDialectQueryITCase}} tests fail consistently in the 
FLINK-27075 GitHub Actions nightly workflow of Flink on {{master}} (but not 
{{release-1.18}}!):
* {{testInsertDirectory}}
* {{testCastTimeStampToDecimal}}
* {{testNullLiteralAsArgument}}
{code}
Error: 03:38:45 03:38:45.661 [ERROR] Tests run: 22, Failures: 1, Errors: 2, 
Skipped: 0, Time elapsed: 379.0 s <<< FAILURE! -- in 
org.apache.flink.connectors.hive.HiveDialectQueryITCase
Error: 03:38:45 03:38:45.662 [ERROR] 
org.apache.flink.connectors.hive.HiveDialectQueryITCase.testNullLiteralAsArgument
 -- Time elapsed: 0.069 s <<< ERROR!
Jan 09 03:38:45 java.lang.NoSuchMethodError: 
org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(Ljava/lang/Object;Lorg/apache/hadoop/hive/serde2/objectinspector/PrimitiveObjectInspector;)Ljava/sql/Timestamp;
Jan 09 03:38:45 at 
org.apache.flink.connectors.hive.HiveDialectQueryITCase.testNullLiteralAsArgument(HiveDialectQueryITCase.java:959)
Jan 09 03:38:45 at java.lang.reflect.Method.invoke(Method.java:498)
Jan 09 03:38:45 
Error: 03:38:45 03:38:45.662 [ERROR] 
org.apache.flink.connectors.hive.HiveDialectQueryITCase.testCastTimeStampToDecimal
 -- Time elapsed: 0.007 s <<< ERROR!
Jan 09 03:38:45 org.apache.flink.table.api.ValidationException: Table with 
identifier 'test-catalog.default.t1' does not exist.
Jan 09 03:38:45 at 
org.apache.flink.table.catalog.CatalogManager.dropTableInternal(CatalogManager.java:1266)
Jan 09 03:38:45 at 
org.apache.flink.table.catalog.CatalogManager.dropTable(CatalogManager.java:1206)
Jan 09 03:38:45 at 
org.apache.flink.table.operations.ddl.DropTableOperation.execute(DropTableOperation.java:74)
Jan 09 03:38:45 at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1107)
Jan 09 03:38:45 at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:735)
Jan 09 03:38:45 at 
org.apache.flink.connectors.hive.HiveDialectQueryITCase.testCastTimeStampToDecimal(HiveDialectQueryITCase.java:835)
Jan 09 03:38:45 at java.lang.reflect.Method.invoke(Method.java:498)
Jan 09 03:38:45 
Error: 03:38:45 03:38:45.663 [ERROR] 
org.apache.flink.connectors.hive.HiveDialectQueryITCase.testInsertDirectory -- 
Time elapsed: 7.326 s <<< FAILURE!
Jan 09 03:38:45 org.opentest4j.AssertionFailedError: 
Jan 09 03:38:45 
Jan 09 03:38:45 expected: "A:english=90#math=100#history=85"
Jan 09 03:38:45  but was: "A:english=90math=100history=85"
Jan 09 03:38:45 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Jan 09 03:38:45 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
Jan 09 03:38:45 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Jan 09 03:38:45 at 
org.apache.flink.connectors.hive.HiveDialectQueryITCase.testInsertDirectory(HiveDialectQueryITCase.java:498)
Jan 09 03:38:45 at java.lang.reflect.Method.invoke(Method.java:498)
{code}
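
For context: a {{NoSuchMethodError}} like the one above for 
{{PrimitiveObjectInspectorUtils.getTimestamp}} is the classic symptom of a 
binary incompatibility: the test code was compiled against one version of the 
Hive serde2 library, while the Hadoop 3.1.3 profile appears to put a build 
with a changed method descriptor on the runtime classpath (the return type is 
part of the descriptor the JVM links against). A small, hypothetical 
diagnostic sketch (illustration only, not part of the test suite) that lists 
which {{getTimestamp}} overloads are actually available:
{code}
// Hypothetical probe, not part of HiveDialectQueryITCase: prints the
// getTimestamp overloads present on the runtime classpath, which shows
// whether the descriptor the test was compiled against still exists.
public class GetTimestampProbe {
    public static void main(String[] args) throws Exception {
        Class<?> utils = Class.forName(
                "org.apache.hadoop.hive.serde2.objectinspector.primitive."
                        + "PrimitiveObjectInspectorUtils");
        for (java.lang.reflect.Method m : utils.getMethods()) {
            if (m.getName().equals("getTimestamp")) {
                // The JVM links calls by the exact method descriptor, so a
                // changed return type alone is enough for NoSuchMethodError.
                System.out.println(m);
            }
        }
    }
}
{code}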

The most recent GHA workflow build failures are:
* 
https://github.com/XComp/flink/actions/runs/7455836411/job/20285758541#step:12:23332
* 
https://github.com/XComp/flink/actions/runs/7447254277/job/20259593089#step:12:23378
* 
https://github.com/XComp/flink/actions/runs/7442459819/job/20246101021#step:12:23332
* 
https://github.com/XComp/flink/actions/runs/7438111934/job/20236674470#step:12:23375
* 
https://github.com/XComp/flink/actions/runs/7435499743/job/20231030744#step:12:23367
Interestingly, the failure doesn't appear in the Azure Pipelines nightlies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34037) FLIP-398: Improve Serialization Configuration And Usage In Flink

2024-01-08 Thread Fang Yong (Jira)
Fang Yong created FLINK-34037:
-

 Summary: FLIP-398: Improve Serialization Configuration And Usage 
In Flink
 Key: FLINK-34037
 URL: https://issues.apache.org/jira/browse/FLINK-34037
 Project: Flink
  Issue Type: Improvement
  Components: API / Type Serialization System, Runtime / Configuration
Affects Versions: 1.19.0
Reporter: Fang Yong


Improve serialization configuration and usage in Flink, as described in FLIP-398: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-398%3A+Improve+Serialization+Configuration+And+Usage+In+Flink
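
For readers skimming the digest: today such serializer choices are typically 
hard-coded against {{ExecutionConfig}}, and FLIP-398 proposes expressing them 
through configuration instead. A minimal sketch of the existing code-based 
registration that the FLIP aims to supersede ({{MyEvent}} is a placeholder 
type, not part of the FLIP):
{code}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SerializationRegistrationSketch {
    // Placeholder POJO standing in for any user type.
    public static class MyEvent {
        public long id;
        public String payload;
    }

    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Existing, code-based registration: telling Flink about the type up
        // front lets it use the PojoSerializer rather than falling back to
        // generic Kryo serialization for unregistered types.
        env.getConfig().registerPojoType(MyEvent.class);
    }
}
{code}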



--
This message was sent by Atlassian Jira
(v8.20.10#820010)