[jira] [Created] (FLINK-33345) Sql gateway only single statement supported

2023-10-24 Thread hunter (Jira)
hunter created FLINK-33345:
--

 Summary: Sql gateway only single statement supported
 Key: FLINK-33345
 URL: https://issues.apache.org/jira/browse/FLINK-33345
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Gateway
Reporter: hunter


When using the SQL Gateway, submitting multiple SQL statements at one time 
reports the error "only single statement supported". I don't quite understand 
why only a single statement is accepted 
[here|https://github.com/apache/flink/blob/fa0dd3559e9697e21795a2634d8d99a0b7efdcf3/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/delegation/ParserImpl.java#L101].
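
A minimal sketch of the behavior (assuming, for illustration, that a local 
TableEnvironment goes through the same ParserImpl check as the gateway; the 
table name and schema are made up):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SingleStatementDemo {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Submitting two statements in one call fails (the reported
        // "only single statement supported" error):
        // tEnv.executeSql(
        //         "CREATE TABLE t (a INT) WITH ('connector' = 'datagen'); "
        //                 + "SELECT * FROM t");

        // Works: submit the statements one at a time.
        tEnv.executeSql(
                "CREATE TABLE t (a INT) WITH ("
                        + " 'connector' = 'datagen',"
                        + " 'number-of-rows' = '5')");
        tEnv.executeSql("SELECT * FROM t").print();
    }
}
{code}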



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Support configuration to disable filter pushdown for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Thanks Xuyang,

If we only add a configuration option without adding an enableFilterPushDown 
method to the SupportsFilterPushDown interface, 
each connector would have to repeat the same logic in its applyFilters method 
to determine whether filter pushdown is needed. 
This would increase complexity and deviate from the original contract of the 
applyFilters method. 

In contrast, we would only need to read the configuration in the newly 
added enableFilterPushDown method 
to decide whether to perform predicate pushdown. 

I think this approach would be clearer and simpler.
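
To make that concrete, here is a rough sketch of the proposed shape (the 
enableFilterPushDown method does not exist today; its name and default are 
only this proposal):

import java.util.List;
import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown.Result;
import org.apache.flink.table.expressions.ResolvedExpression;

public interface SupportsFilterPushDownProposal {

    // Proposed new method: the planner would consult this flag before
    // calling applyFilters, so connectors don't repeat the check.
    // Defaulting to true keeps today's behavior.
    default boolean enableFilterPushDown() {
        return true;
    }

    // Existing method, unchanged.
    Result applyFilters(List<ResolvedExpression> filters);
}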

Best,
Jiabao


> On 2023-10-24 14:34, Jiabao Sun  wrote:
> 
> Thanks Xuyang,
> 
> The table.optimizer.source.predicate-pushdown-enabled option does not provide 
> fine-grained configuration for each source.
> 
> Suppose we have an SQL query with two sources: Kafka and a database (CDC). 
> The database is sensitive to pressure, and we want to configure it to not 
> perform filter pushdown to the database source. 
> However, we still want to perform filter pushdown to the Kafka source to 
> decrease network IO.
> 
> 
> Best,
> Jiabao 
> 
> 
>> On 2023-10-24 14:24, Xuyang  wrote:
>> 
>> Hi, the existing configuration 
>> 'table.optimizer.source.predicate-pushdown-enabled' seems to do what you 
>> want. 
>> Can you describe more clearly the difference between what you want and this 
>> configuration ?
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> 
>>   Best!
>>   Xuyang
>> 
>> 
>> 
>> 
>> 
>> At 2023-10-24 14:12:14, "Jiabao Sun"  wrote:
>>> Hi Devs,
>>> 
>>> I would like to start a discussion on supporting configuration to disable 
>>> filter pushdown for Table/SQL Sources[1].
>>> 
>>> Currently, Flink SQL does not support the ability for users to enable or 
>>> disable filter pushdown. 
>>> However, filter pushdown has some side effects, such as additional 
>>> computational pressure on external systems. 
>>> Moreover, improper queries can lead to issues such as full table scans, 
>>> which in turn can impact the stability of external systems.
>>> 
>>> I propose supporting configuration to disable filter push down for 
>>> Table/SQL sources to let users decide whether to perform filter pushdown.
>>> 
>>> Looking forward to your feedback.
>>> 
>>> [1] 
>>> https://docs.google.com/document/d/1QsbOi9InvmfwFr8YbrnnXOKLPnb8JnqhXIMbGd68SFU/edit?usp=sharing
>>> 
>>> Best,
>>> Jiabao
> 



Re: [VOTE] Release 1.18.0, release candidate #3

2023-10-24 Thread Hang Ruan
+1(non-binding)

- verified signatures & hash
- build from the source code succeed with jdk 8
- Reviewed release note
- Started a standalone cluster and submitted a Flink SQL job that read and
wrote with Kafka connector and JSON format

Best,
Hang

On Tue, Oct 24, 2023 at 14:06, Samrat Deb  wrote:

> +1(non-binding)
>
> - Downloaded artifacts from dist[1]
> - Verified SHA512 checksums
> - Verified GPG signatures
> - Build the source with java 8 and 11
>
> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.0-rc3/
>
> Bests,
> Samrat
>
> On Tue, Oct 24, 2023 at 10:44 AM Jingsong Li 
> wrote:
>
> > +1 (binding)
> >
> > - verified signatures & hash
> > - built from source code succeeded
> > - started SQL Client, used Paimon connector to write and read, the
> > result is expected
> >
> > Best,
> > Jingsong
> >
> > On Tue, Oct 24, 2023 at 12:15 PM Yuxin Tan 
> wrote:
> > >
> > > +1(non-binding)
> > >
> > > - Verified checksum
> > > - Build from source code
> > > - Verified signature
> > > - Started a local cluster and run Streaming & Batch wordcount job, the
> > > result is expected
> > > - Verified web PR
> > >
> > > Best,
> > > Yuxin
> > >
> > >
On Tue, Oct 24, 2023 at 11:19, Qingsheng Ren  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > - Verified checksums and signatures
> > > > - Built from source with Java 8
> > > > - Started a standalone cluster and submitted a Flink SQL job that
> read
> > and
> > > > wrote with Kafka connector and CSV / JSON format
> > > > - Reviewed web PR and release note
> > > >
> > > > Best,
> > > > Qingsheng
> > > >
> > > > On Mon, Oct 23, 2023 at 10:40 PM Leonard Xu 
> wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > - verified signatures
> > > > > - verified hashsums
> > > > > - built from source code succeeded
> > > > > - checked all dependency artifacts are 1.18
> > > > > - started SQL Client, used MySQL CDC connector to read changelog
> from
> > > > > database , the result is expected
> > > > > - reviewed the web PR, left minor comments
> > > > > - reviewed the release notes PR, left minor comments
> > > > >
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > > > On 2023-10-21 7:28 PM, Rui Fan <1996fan...@gmail.com> wrote:
> > > > > >
> > > > > > +1(non-binding)
> > > > > >
> > > > > > - Downloaded artifacts from dist[1]
> > > > > > - Verified SHA512 checksums
> > > > > > - Verified GPG signatures
> > > > > > - Build the source with java-1.8 and verified the licenses
> together
> > > > > > - Verified web PR
> > > > > >
> > > > > > [1]
> https://dist.apache.org/repos/dist/dev/flink/flink-1.18.0-rc3/
> > > > > >
> > > > > > Best,
> > > > > > Rui
> > > > > >
> > > > > > On Fri, Oct 20, 2023 at 10:31 PM Martijn Visser <
> > > > > martijnvis...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> +1 (binding)
> > > > > >>
> > > > > >> - Validated hashes
> > > > > >> - Verified signature
> > > > > >> - Verified that no binaries exist in the source archive
> > > > > >> - Build the source with Maven
> > > > > >> - Verified licenses
> > > > > >> - Verified web PR
> > > > > >> - Started a cluster and the Flink SQL client, successfully read
> > and
> > > > > >> wrote with the Kafka connector to Confluent Cloud with AVRO and
> > Schema
> > > > > >> Registry enabled
> > > > > >>
> > > > > >> On Fri, Oct 20, 2023 at 2:55 PM Matthias Pohl
> > > > > >>  wrote:
> > > > > >>>
> > > > > >>> +1 (binding)
> > > > > >>>
> > > > > >>> * Downloaded artifacts
> > > > > >>> * Built Flink from sources
> > > > > >>> * Verified SHA512 checksums GPG signatures
> > > > > >>> * Compared checkout with provided sources
> > > > > >>> * Verified pom file versions
> > > > > >>> * Verified that there are no pom/NOTICE file changes since RC1
> > > > > >>> * Deployed standalone session cluster and ran WordCount example
> > in
> > > > > batch
> > > > > >>> and streaming: Nothing suspicious in log files found
> > > > > >>>
> > > > > >>> On Thu, Oct 19, 2023 at 3:00 PM Piotr Nowojski <
> > pnowoj...@apache.org
> > > > >
> > > > > >> wrote:
> > > > > >>>
> > > > >  +1 (binding)
> > > > > 
> > > > >  Best,
> > > > >  Piotrek
> > > > > 
> > > > >  On Thu, 19 Oct 2023 at 09:55, Yun Tang 
> > wrote:
> > > > > 
> > > > > > +1 (non-binding)
> > > > > >
> > > > > >
> > > > > >  *   Build from source code
> > > > > >  *   Verify the pre-built jar packages were built with JDK8
> > > > > >  *   Verify FLIP-291 with a standalone cluster, and it works
> > fine
> > > > > >> with
> > > > > > StateMachine example.
> > > > > >  *   Checked the signature
> > > > > >  *   Viewed the PRs.
> > > > > >
> > > > > > Best
> > > > > > Yun Tang
> > > > > > 
> > > > > > From: Cheng Pan 
> > > > > > Sent: Thursday, October 19, 2023 14:38
> > > > > > To: dev@flink.apache.org 
> > > > > > Subject: RE: [VOTE] Release 1.18.0, release candidate #3
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > 

[jira] [Created] (FLINK-33346) DispatcherResourceCleanupTest.testFatalErrorIfJobCannotBeMarkedDirtyInJobResultStore fails on AZP

2023-10-24 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-33346:
---

 Summary: 
DispatcherResourceCleanupTest.testFatalErrorIfJobCannotBeMarkedDirtyInJobResultStore
 fails on AZP
 Key: FLINK-33346
 URL: https://issues.apache.org/jira/browse/FLINK-33346
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.19.0
Reporter: Sergey Nuyanzin


This build 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53905&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=6800

failed with 
{noformat}
Oct 22 00:59:32 Caused by: java.io.IOException: Expected IOException.
Oct 22 00:59:32 at 
org.apache.flink.runtime.dispatcher.DispatcherResourceCleanupTest.lambda$testFatalErrorIfJobCannotBeMarkedDirtyInJobResultStore$6(DispatcherResourceCleanupTest.java:558)
Oct 22 00:59:32 at 
org.apache.flink.runtime.testutils.TestingJobResultStore.createDirtyResultAsync(TestingJobResultStore.java:81)
Oct 22 00:59:32 at 
org.apache.flink.runtime.dispatcher.Dispatcher.createDirtyJobResultEntryAsync(Dispatcher.java:1441)
Oct 22 00:59:32 at 
org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createDirtyJobResultEntryIfMissingAsync$45(Dispatcher.java:1422)
Oct 22 00:59:32 at 
java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:995)
Oct 22 00:59:32 ... 39 more

{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re:Re: [DISCUSS] Support configuration to disable filter pushdown for Table/SQL Sources

2023-10-24 Thread Xuyang
+1. How to set the configuration value so that it can be picked up by the 
specific source still needs to be considered.




--

Best!
Xuyang





At 2023-10-24 15:05:03, "Jiabao Sun"  wrote:
>Thanks Xuyang,
>
>If we only add configuration without adding the enableFilterPushDown method in 
>the SupportsFilterPushDown interface, 
>each connector would have to handle the same logic in the applyFilters method 
>to determine whether filter pushdown is needed. 
>This would increase complexity and violate the original behavior of the 
>applyFilters method. 
>
>On the contrary, we only need to pass the configuration parameter in the newly 
>added enableFilterPushDown method 
>to decide whether to perform predicate pushdown. 
>
>I think this approach would be clearer and simpler.
>
>Best,
>Jiabao
>
>
>> On 2023-10-24 14:34, Jiabao Sun  wrote:
>> 
>> Thanks Xuyang,
>> 
>> The table.optimizer.source.predicate-pushdown-enabled options do not provide 
>> fine-grained configuration for each source.
>> 
>> Suppose we have an SQL query with two sources: Kafka and a database (CDC). 
>> The database is sensitive to pressure, and we want to configure it to not 
>> perform filter pushdown to the database source. 
>> However, we still want to perform filter pushdown to the Kafka source to 
>> decrease network IO.
>> 
>> 
>> Best,
>> Jiabao 
>> 
>> 
>>> On 2023-10-24 14:24, Xuyang  wrote:
>>> 
>>> Hi, the existing configuration 
>>> 'table.optimizer.source.predicate-pushdown-enabled' seems to do what you 
>>> want. 
>>> Can you describe more clearly the difference between what you want and this 
>>> configuration ?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> 
>>>   Best!
>>>   Xuyang
>>> 
>>> 
>>> 
>>> 
>>> 
>>> At 2023-10-24 14:12:14, "Jiabao Sun"  
>>> wrote:
 Hi Devs,
 
 I would like to start a discussion on support configuration to disable 
 filter pushdown for Table/SQL Sources[1].
 
 Currently, Flink SQL does not support the ability for users to enable or 
 disable filter pushdown. 
 However, filter pushdown has some side effects, such as additional 
 computational pressure on external systems. 
 Moreover, Improper queries can lead to issues such as full table scans, 
 which in turn can impact the stability of external systems.
 
 I propose to support configuration to disable filter push down for 
 Table/SQL sources to let user decide whether to perform filter pushdown.
 
 Looking forward to your feedback.
 
 [1] 
 https://docs.google.com/document/d/1QsbOi9InvmfwFr8YbrnnXOKLPnb8JnqhXIMbGd68SFU/edit?usp=sharing
 
 Best,
 Jiabao
>> 


Re: [VOTE] Release 1.18.0, release candidate #3

2023-10-24 Thread Sergey Nuyanzin
+1 (non-binding)

 - Downloaded artifacts
 - Built Flink from sources
 - Verified SHA512 checksums GPG signatures
 - Compared checkout with provided sources
 - Verified pom/NOTICE
 - Deployed standalone session cluster and ran WordCount example in batch
and streaming

On Tue, Oct 24, 2023 at 9:09 AM Hang Ruan  wrote:

> +1(non-binding)
>
> - verified signatures & hash
> - build from the source code succeed with jdk 8
> - Reviewed release note
> - Started a standalone cluster and submitted a Flink SQL job that read and
> wrote with Kafka connector and JSON format
>
> Best,
> Hang
>
> On Tue, Oct 24, 2023 at 14:06, Samrat Deb  wrote:
>
> > +1(non-binding)
> >
> > - Downloaded artifacts from dist[1]
> > - Verified SHA512 checksums
> > - Verified GPG signatures
> > - Build the source with java 8 and 11
> >
> > [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.18.0-rc3/
> >
> > Bests,
> > Samrat
> >
> > On Tue, Oct 24, 2023 at 10:44 AM Jingsong Li 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > - verified signatures & hash
> > > - built from source code succeeded
> > > - started SQL Client, used Paimon connector to write and read, the
> > > result is expected
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, Oct 24, 2023 at 12:15 PM Yuxin Tan 
> > wrote:
> > > >
> > > > +1(non-binding)
> > > >
> > > > - Verified checksum
> > > > - Build from source code
> > > > - Verified signature
> > > > - Started a local cluster and run Streaming & Batch wordcount job,
> the
> > > > result is expected
> > > > - Verified web PR
> > > >
> > > > Best,
> > > > Yuxin
> > > >
> > > >
> > > > On Tue, Oct 24, 2023 at 11:19, Qingsheng Ren  wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > - Verified checksums and signatures
> > > > > - Built from source with Java 8
> > > > > - Started a standalone cluster and submitted a Flink SQL job that
> > read
> > > and
> > > > > wrote with Kafka connector and CSV / JSON format
> > > > > - Reviewed web PR and release note
> > > > >
> > > > > Best,
> > > > > Qingsheng
> > > > >
> > > > > On Mon, Oct 23, 2023 at 10:40 PM Leonard Xu 
> > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > - verified signatures
> > > > > > - verified hashsums
> > > > > > - built from source code succeeded
> > > > > > - checked all dependency artifacts are 1.18
> > > > > > - started SQL Client, used MySQL CDC connector to read changelog
> > from
> > > > > > database , the result is expected
> > > > > > - reviewed the web PR, left minor comments
> > > > > > - reviewed the release notes PR, left minor comments
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Leonard
> > > > > >
> > > > > > > On 2023-10-21 7:28 PM, Rui Fan <1996fan...@gmail.com> wrote:
> > > > > > >
> > > > > > > +1(non-binding)
> > > > > > >
> > > > > > > - Downloaded artifacts from dist[1]
> > > > > > > - Verified SHA512 checksums
> > > > > > > - Verified GPG signatures
> > > > > > > - Build the source with java-1.8 and verified the licenses
> > together
> > > > > > > - Verified web PR
> > > > > > >
> > > > > > > [1]
> > https://dist.apache.org/repos/dist/dev/flink/flink-1.18.0-rc3/
> > > > > > >
> > > > > > > Best,
> > > > > > > Rui
> > > > > > >
> > > > > > > On Fri, Oct 20, 2023 at 10:31 PM Martijn Visser <
> > > > > > martijnvis...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> +1 (binding)
> > > > > > >>
> > > > > > >> - Validated hashes
> > > > > > >> - Verified signature
> > > > > > >> - Verified that no binaries exist in the source archive
> > > > > > >> - Build the source with Maven
> > > > > > >> - Verified licenses
> > > > > > >> - Verified web PR
> > > > > > >> - Started a cluster and the Flink SQL client, successfully
> read
> > > and
> > > > > > >> wrote with the Kafka connector to Confluent Cloud with AVRO
> and
> > > Schema
> > > > > > >> Registry enabled
> > > > > > >>
> > > > > > >> On Fri, Oct 20, 2023 at 2:55 PM Matthias Pohl
> > > > > > >>  wrote:
> > > > > > >>>
> > > > > > >>> +1 (binding)
> > > > > > >>>
> > > > > > >>> * Downloaded artifacts
> > > > > > >>> * Built Flink from sources
> > > > > > >>> * Verified SHA512 checksums GPG signatures
> > > > > > >>> * Compared checkout with provided sources
> > > > > > >>> * Verified pom file versions
> > > > > > >>> * Verified that there are no pom/NOTICE file changes since
> RC1
> > > > > > >>> * Deployed standalone session cluster and ran WordCount
> example
> > > in
> > > > > > batch
> > > > > > >>> and streaming: Nothing suspicious in log files found
> > > > > > >>>
> > > > > > >>> On Thu, Oct 19, 2023 at 3:00 PM Piotr Nowojski <
> > > pnowoj...@apache.org
> > > > > >
> > > > > > >> wrote:
> > > > > > >>>
> > > > > >  +1 (binding)
> > > > > > 
> > > > > >  Best,
> > > > > >  Piotrek
> > > > > > 
> > > > > >  On Thu, 19 Oct 2023 at 09:55, Yun Tang 
> > > wrote:
> > > > > > 
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > >
> > > > > > >  *   Build from source code
> > > > > > >  *   Verify the pre-built jar packages were built with JDK8

Re: Re: [DISCUSS] Support configuration to disable filter pushdown for Table/SQL Sources

2023-10-24 Thread Martijn Visser
Hi,

Please convert the Google Doc into a FLIP and start a FLIP discussion.

Best regards,

Martijn

On Tue, Oct 24, 2023 at 9:20 AM Xuyang  wrote:
>
> +1. How to set the configuration value so that it can be picked up by the 
> specific source still needs to be considered.
>
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
>
> At 2023-10-24 15:05:03, "Jiabao Sun"  wrote:
> >Thanks Xuyang,
> >
> >If we only add configuration without adding the enableFilterPushDown method 
> >in the SupportsFilterPushDown interface,
> >each connector would have to handle the same logic in the applyFilters 
> >method to determine whether filter pushdown is needed.
> >This would increase complexity and violate the original behavior of the 
> >applyFilters method.
> >
> >On the contrary, we only need to pass the configuration parameter in the 
> >newly added enableFilterPushDown method
> >to decide whether to perform predicate pushdown.
> >
> >I think this approach would be clearer and simpler.
> >
> >Best,
> >Jiabao
> >
> >
> >> On 2023-10-24 14:34, Jiabao Sun  wrote:
> >>
> >> Thanks Xuyang,
> >>
> >> The table.optimizer.source.predicate-pushdown-enabled options do not 
> >> provide fine-grained configuration for each source.
> >>
> >> Suppose we have an SQL query with two sources: Kafka and a database (CDC).
> >> The database is sensitive to pressure, and we want to configure it to not 
> >> perform filter pushdown to the database source.
> >> However, we still want to perform filter pushdown to the Kafka source to 
> >> decrease network IO.
> >>
> >>
> >> Best,
> >> Jiabao
> >> 
> >>
> >>> On 2023-10-24 14:24, Xuyang  wrote:
> >>>
> >>> Hi, the existing configuration 
> >>> 'table.optimizer.source.predicate-pushdown-enabled' seems to do what you 
> >>> want.
> >>> Can you describe more clearly the difference between what you want and 
> >>> this configuration ?
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>>   Best!
> >>>   Xuyang
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> At 2023-10-24 14:12:14, "Jiabao Sun"  
> >>> wrote:
>  Hi Devs,
> 
>  I would like to start a discussion on support configuration to disable 
>  filter pushdown for Table/SQL Sources[1].
> 
>  Currently, Flink SQL does not support the ability for users to enable or 
>  disable filter pushdown.
>  However, filter pushdown has some side effects, such as additional 
>  computational pressure on external systems.
>  Moreover, Improper queries can lead to issues such as full table scans, 
>  which in turn can impact the stability of external systems.
> 
>  I propose to support configuration to disable filter push down for 
>  Table/SQL sources to let user decide whether to perform filter pushdown.
> 
>  Looking forward to your feedback.
> 
>  [1] 
>  https://docs.google.com/document/d/1QsbOi9InvmfwFr8YbrnnXOKLPnb8JnqhXIMbGd68SFU/edit?usp=sharing
> 
>  Best,
>  Jiabao
> >>


Re: Apply for FLIP Wiki Edit Permission

2023-10-24 Thread Martijn Visser
Hi Dan Zou,

I've updated the permission for you, you should now be able to add/edit FLIPs.

Best regards,

Martijn

On Mon, Oct 23, 2023 at 11:55 AM Dan Zou  wrote:
>
> Hi ,
> I want to apply for FLIP Wiki Edit Permission, I am working on [FLINK-33267] 
> https://issues.apache.org/jira/browse/FLINK-33267 , and I would like to 
> create a FLIP for it.
> My Jira ID is Dan Zou (zou...@apache.org).
>
> Best,
> Dan Zou
>
>
>
>
>


[RESULT][VOTE] Release 1.18.0, release candidate #3

2023-10-24 Thread Jing Ge
Hi everyone,

I'm pleased to announce that we have unanimously approved this release
candidate:

There are 14 approving votes, 6 of which are binding:

- Cheng Pan (non-binding)
- Yun Tang (non-binding)
- Piotr Nowojski (binding)
- Matthias Pohl (binding)
- Martijn Visser (binding)
- Rui Fan (non-binding)
- Leonard Xu (binding)
- Qingsheng Ren (binding)
- Yuxin Tan (non-binding)
- Jingsong Li (binding)
- Samrat Deb (non-binding)
- Jiabao Sun (non-binding)
- Hang Ruan (non-binding)
- Sergey Nuyanzin (non-binding)


There are no disapproving votes.

Thank you for verifying the release candidate. We will now proceed
to finalize the release and announce it once everything is published.

Best regards,
Sergey, Qingsheng, Konstantin, and Jing


[DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Hi Devs,

I would like to start a discussion on FLIP-377: Support configuration to 
disable filter push down for Table/SQL Sources[1].

Currently, Flink Table/SQL does not expose fine-grained control for users to 
enable or disable filter pushdown.
However, filter pushdown has some side effects, such as additional 
computational pressure on external systems. 
Moreover, improper queries can lead to issues such as full table scans, which 
in turn can impact the stability of external systems.

Suppose we have an SQL query with two sources: Kafka and a database. 
The database is sensitive to pressure, and we want to configure it to not 
perform filter pushdown to the database source.
However, we still want to perform filter pushdown to the Kafka source to 
decrease network IO.
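
As a concrete sketch of that scenario (the option key follows the proposed 
`scan.filter-push-down.enabled`; connector options and schemas here are 
illustrative only, not part of the FLIP text):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PerSourcePushdownExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Kafka source: leave filter pushdown on to cut network IO.
        tEnv.executeSql(
                "CREATE TABLE kafka_source (id INT, msg STRING) WITH ("
                        + " 'connector' = 'kafka',"
                        + " 'topic' = 'events',"
                        + " 'properties.bootstrap.servers' = 'localhost:9092',"
                        + " 'format' = 'json',"
                        + " 'scan.filter-push-down.enabled' = 'true')");

        // Pressure-sensitive database: opt this table out of pushdown.
        tEnv.executeSql(
                "CREATE TABLE db_source (id INT, val STRING) WITH ("
                        + " 'connector' = 'mysql-cdc',"
                        + " 'hostname' = 'localhost',"
                        + " 'username' = 'flink',"
                        + " 'password' = 'secret',"
                        + " 'database-name' = 'db',"
                        + " 'table-name' = 'val_table',"
                        + " 'scan.filter-push-down.enabled' = 'false')");
    }
}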

I propose supporting configuration to disable filter push down for Table/SQL 
sources, letting users decide whether to perform filter pushdown.

Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768

Best,
Jiabao

Re: [DISCUSS] Support configuration to disable filter pushdown for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Thanks Martijn and Xuyang,

I opened a new discussion thread about FLIP-377.
https://lists.apache.org/thread/nvxx8sp9jm009yywm075hoffr632tm7j

Best,
Jiabao


> On 2023-10-24 15:39, Martijn Visser  wrote:
> 
> Hi,
> 
> Please convert the Google Doc into a FLIP and start a FLIP discussion.
> 
> Best regards,
> 
> Martijn
> 
> On Tue, Oct 24, 2023 at 9:20 AM Xuyang  wrote:
>> 
>> +1. How to set the configuration value so that it can be picked up by the 
>> specific source still needs to be considered.
>> 
>> 
>> 
>> 
>> --
>> 
>>Best!
>>Xuyang
>> 
>> 
>> 
>> 
>> 
>> At 2023-10-24 15:05:03, "Jiabao Sun"  wrote:
>>> Thanks Xuyang,
>>> 
>>> If we only add configuration without adding the enableFilterPushDown method 
>>> in the SupportsFilterPushDown interface,
>>> each connector would have to handle the same logic in the applyFilters 
>>> method to determine whether filter pushdown is needed.
>>> This would increase complexity and violate the original behavior of the 
>>> applyFilters method.
>>> 
>>> On the contrary, we only need to pass the configuration parameter in the 
>>> newly added enableFilterPushDown method
>>> to decide whether to perform predicate pushdown.
>>> 
>>> I think this approach would be clearer and simpler.
>>> 
>>> Best,
>>> Jiabao
>>> 
>>> 
 On 2023-10-24 14:34, Jiabao Sun  wrote:
 
 Thanks Xuyang,
 
 The table.optimizer.source.predicate-pushdown-enabled options do not 
 provide fine-grained configuration for each source.
 
 Suppose we have an SQL query with two sources: Kafka and a database (CDC).
 The database is sensitive to pressure, and we want to configure it to not 
 perform filter pushdown to the database source.
 However, we still want to perform filter pushdown to the Kafka source to 
 decrease network IO.
 
 
 Best,
 Jiabao
 
 
> On 2023-10-24 14:24, Xuyang  wrote:
> 
> Hi, the existing configuration 
> 'table.optimizer.source.predicate-pushdown-enabled' seems to do what you 
> want.
> Can you describe more clearly the difference between what you want and 
> this configuration ?
> 
> 
> 
> 
> 
> 
> 
> --
> 
>  Best!
>  Xuyang
> 
> 
> 
> 
> 
> At 2023-10-24 14:12:14, "Jiabao Sun"  
> wrote:
>> Hi Devs,
>> 
>> I would like to start a discussion on support configuration to disable 
>> filter pushdown for Table/SQL Sources[1].
>> 
>> Currently, Flink SQL does not support the ability for users to enable or 
>> disable filter pushdown.
>> However, filter pushdown has some side effects, such as additional 
>> computational pressure on external systems.
>> Moreover, Improper queries can lead to issues such as full table scans, 
>> which in turn can impact the stability of external systems.
>> 
>> I propose to support configuration to disable filter push down for 
>> Table/SQL sources to let user decide whether to perform filter pushdown.
>> 
>> Looking forward to your feedback.
>> 
>> [1] 
>> https://docs.google.com/document/d/1QsbOi9InvmfwFr8YbrnnXOKLPnb8JnqhXIMbGd68SFU/edit?usp=sharing
>> 
>> Best,
>> Jiabao



Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Leonard Xu
Thanks @Jiabao for kicking off this discussion.

Could you add a section to explain the difference between the proposed 
connector-level config `scan.filter-push-down.enabled` and the existing 
query-level config `table.optimizer.source.predicate-pushdown-enabled`?

Best,
Leonard

> On 2023-10-24 4:18 PM, Jiabao Sun  wrote:
> 
> Hi Devs,
> 
> I would like to start a discussion on FLIP-377: support configuration to 
> disable filter pushdown for Table/SQL Sources[1].
> 
> Currently, Flink Table/SQL does not expose fine-grained control for users to 
> enable or disable filter pushdown.
> However, filter pushdown has some side effects, such as additional 
> computational pressure on external systems. 
> Moreover, Improper queries can lead to issues such as full table scans, which 
> in turn can impact the stability of external systems.
> 
> Suppose we have an SQL query with two sources: Kafka and a database. 
> The database is sensitive to pressure, and we want to configure it to not 
> perform filter pushdown to the database source.
> However, we still want to perform filter pushdown to the Kafka source to 
> decrease network IO.
> 
> I propose to support configuration to disable filter push down for Table/SQL 
> sources to let user decide whether to perform filter pushdown.
> 
> Looking forward to your feedback.
> 
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> 
> Best,
> Jiabao



RE: FLIP-233

2023-10-24 Thread David Radley
Thanks Leonard,
Hopefully this will be reopened, as we would very much like this capability. We 
want to take over the FLIP, continue the discussion to get a consensus, and 
then implement it.
   Kind regards, David

From: Leonard Xu 
Date: Tuesday, 24 October 2023 at 02:56
To: dev 
Cc: jd...@amazon.com 
Subject: [EXTERNAL] Re: FLIP-233
+1 to reopening the FLIP; it has been stalled for more than a year due to 
the author's limited availability.

Glad to see the developers from IBM would like to take over the FLIP, we can 
continue the discussion in FLIP-233 discussion thread [1]

Best,
Leonard

[1] https://lists.apache.org/thread/cd60ln4pjgml7sv4kh23o1fohcfwvjcz

> On 2023-10-24 12:41 AM, David Radley  wrote:
>
> Hi,
> I notice 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-233%3A+Introduce+HTTP+Connector
>   has been abandoned due to lack of capacity. I work for IBM and my team is 
> interested in helping to get this connector contributed into Flink. Can we 
> open this FLIP again and look to get agreement in the discussion 
> thread, please?
>
> Kind regards, David.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Thanks Leonard,

I have added sub-sections under the "Motivation" chapter to describe the 
differences between them.
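
In short: the existing switch is query-level and turns predicate pushdown off 
for every source at once, e.g. (a sketch; tEnv is an existing 
TableEnvironment):

// Existing optimizer option: applies to all sources of the session,
// with no per-table granularity.
tEnv.getConfig().getConfiguration()
        .setString("table.optimizer.source.predicate-pushdown-enabled", "false");

The proposed `scan.filter-push-down.enabled` is instead a per-table connector 
option set in each table's WITH clause.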

Best,
Jiabao


> On 2023-10-24 16:38, Leonard Xu  wrote:
> 
> Thanks @Jiabao for kicking off this discussion.
> 
> Could you add a section to explain the difference between proposed connector 
> level config `scan.filter-push-down.enabled` and existing query level config 
> `table.optimizer.source.predicate-pushdown-enabled` ?
> 
> Best,
> Leonard
> 
>> On 2023-10-24 4:18 PM, Jiabao Sun  wrote:
>> 
>> Hi Devs,
>> 
>> I would like to start a discussion on FLIP-377: support configuration to 
>> disable filter pushdown for Table/SQL Sources[1].
>> 
>> Currently, Flink Table/SQL does not expose fine-grained control for users to 
>> enable or disable filter pushdown.
>> However, filter pushdown has some side effects, such as additional 
>> computational pressure on external systems. 
>> Moreover, Improper queries can lead to issues such as full table scans, 
>> which in turn can impact the stability of external systems.
>> 
>> Suppose we have an SQL query with two sources: Kafka and a database. 
>> The database is sensitive to pressure, and we want to configure it to not 
>> perform filter pushdown to the database source.
>> However, we still want to perform filter pushdown to the Kafka source to 
>> decrease network IO.
>> 
>> I propose to support configuration to disable filter push down for Table/SQL 
>> sources to let user decide whether to perform filter pushdown.
>> 
>> Looking forward to your feedback.
>> 
>> [1] 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>> 
>> Best,
>> Jiabao
> 



Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jark Wu
Hi Jiabao,

I think the current interface can already satisfy your requirements.
The connector can reject all the filters by returning the input filters
as `Result#remainingFilters`.

So maybe we don't need to introduce a new method to disable
pushdown, but just introduce an option for the specific connector.
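
A minimal sketch of that under the current interface (the class name and 
option wiring are made up; a real source would also implement 
DynamicTableSource):

import java.util.Collections;
import java.util.List;
import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.ResolvedExpression;

public class PushdownAwareSource implements SupportsFilterPushDown {

    // Populated from a connector-specific option at table creation time.
    private final boolean filterPushDownEnabled;

    public PushdownAwareSource(boolean filterPushDownEnabled) {
        this.filterPushDownEnabled = filterPushDownEnabled;
    }

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        if (!filterPushDownEnabled) {
            // Reject everything: no filter is accepted, all of them stay
            // in the plan and are evaluated by Flink after the scan.
            return Result.of(Collections.emptyList(), filters);
        }
        // The connector's real pushdown logic would run here; for the
        // sketch, accept every filter.
        return Result.of(filters, Collections.emptyList());
    }
}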

Best,
Jark

On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:

> Thanks @Jiabao for kicking off this discussion.
>
> Could you add a section to explain the difference between proposed
> connector level config `scan.filter-push-down.enabled` and existing query
> level config `table.optimizer.source.predicate-pushdown-enabled` ?
>
> Best,
> Leonard
>
> > On 2023-10-24 4:18 PM, Jiabao Sun  wrote:
> >
> > Hi Devs,
> >
> > I would like to start a discussion on FLIP-377: support configuration to
> disable filter pushdown for Table/SQL Sources[1].
> >
> > Currently, Flink Table/SQL does not expose fine-grained control for
> users to enable or disable filter pushdown.
> > However, filter pushdown has some side effects, such as additional
> computational pressure on external systems.
> > Moreover, Improper queries can lead to issues such as full table scans,
> which in turn can impact the stability of external systems.
> >
> > Suppose we have an SQL query with two sources: Kafka and a database.
> > The database is sensitive to pressure, and we want to configure it to
> not perform filter pushdown to the database source.
> > However, we still want to perform filter pushdown to the Kafka source to
> decrease network IO.
> >
> > I propose to support configuration to disable filter push down for
> Table/SQL sources to let user decide whether to perform filter pushdown.
> >
> > Looking forward to your feedback.
> >
> > [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >
> > Best,
> > Jiabao
>
>


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Thanks Jark,

If we only add a configuration option without adding an enableFilterPushDown 
method to the SupportsFilterPushDown interface,
each connector would have to repeat the same logic in its applyFilters method 
to determine whether filter pushdown is needed.
This would increase complexity and deviate from the original contract of the 
applyFilters method.

In contrast, we would only need to read the configuration in the newly 
added enableFilterPushDown method
to decide whether to perform predicate pushdown.

I think this approach would be clearer and simpler.
WDYT?

Best,
Jiabao


> On 2023-10-24 16:58, Jark Wu  wrote:
> 
> Hi Jiabao,
> 
> I think the current interface can already satisfy your requirements.
> The connector can reject all the filters by returning the input filters
> as `Result#remainingFilters`.
> 
> So maybe we don't need to introduce a new method to disable
> pushdown, but just introduce an option for the specific connector.
> 
> Best,
> Jark
> 
> On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> 
>> Thanks @Jiabao for kicking off this discussion.
>> 
>> Could you add a section to explain the difference between proposed
>> connector level config `scan.filter-push-down.enabled` and existing query
>> level config `table.optimizer.source.predicate-pushdown-enabled` ?
>> 
>> Best,
>> Leonard
>> 
>>> On 2023-10-24 4:18 PM, Jiabao Sun  wrote:
>>> 
>>> Hi Devs,
>>> 
>>> I would like to start a discussion on FLIP-377: support configuration to
>> disable filter pushdown for Table/SQL Sources[1].
>>> 
>>> Currently, Flink Table/SQL does not expose fine-grained control for
>> users to enable or disable filter pushdown.
>>> However, filter pushdown has some side effects, such as additional
>> computational pressure on external systems.
>>> Moreover, Improper queries can lead to issues such as full table scans,
>> which in turn can impact the stability of external systems.
>>> 
>>> Suppose we have an SQL query with two sources: Kafka and a database.
>>> The database is sensitive to pressure, and we want to configure it to
>> not perform filter pushdown to the database source.
>>> However, we still want to perform filter pushdown to the Kafka source to
>> decrease network IO.
>>> 
>>> I propose to support configuration to disable filter push down for
>> Table/SQL sources to let user decide whether to perform filter pushdown.
>>> 
>>> Looking forward to your feedback.
>>> 
>>> [1]
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>>> 
>>> Best,
>>> Jiabao
>> 
>> 



Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jane Chan
Hi Jiabao,

Thanks for driving this discussion. I have a small question: will
"scan.filter-push-down.enabled" take precedence over
"table.optimizer.source.predicate-pushdown-enabled" when the two parameters
conflict with each other?

Best,
Jane

On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun 
wrote:

> Thanks Jark,
>
> If we only add configuration without adding the enableFilterPushDown
> method in the SupportsFilterPushDown interface,
> each connector would have to handle the same logic in the applyFilters
> method to determine whether filter pushdown is needed.
> This would increase complexity and violate the original behavior of the
> applyFilters method.
>
> On the contrary, we only need to pass the configuration parameter in the
> newly added enableFilterPushDown method
> to decide whether to perform predicate pushdown.
>
> I think this approach would be clearer and simpler.
> WDYT?
>
> Best,
> Jiabao
>
>
> > On 2023-10-24 16:58, Jark Wu  wrote:
> >
> > Hi Jiabao,
> >
> > I think the current interface can already satisfy your requirements.
> > The connector can reject all the filters by returning the input filters
> > as `Result#remainingFilters`.
> >
> > So maybe we don't need to introduce a new method to disable
> > pushdown, but just introduce an option for the specific connector.
> >
> > Best,
> > Jark
> >
> > On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> >
> >> Thanks @Jiabao for kicking off this discussion.
> >>
> >> Could you add a section to explain the difference between proposed
> >> connector level config `scan.filter-push-down.enabled` and existing
> query
> >> level config `table.optimizer.source.predicate-pushdown-enabled` ?
> >>
> >> Best,
> >> Leonard
> >>
> >>> On 2023-10-24 4:18 PM, Jiabao Sun  wrote:
> >>>
> >>> Hi Devs,
> >>>
> >>> I would like to start a discussion on FLIP-377: support configuration
> to
> >> disable filter pushdown for Table/SQL Sources[1].
> >>>
> >>> Currently, Flink Table/SQL does not expose fine-grained control for
> >> users to enable or disable filter pushdown.
> >>> However, filter pushdown has some side effects, such as additional
> >> computational pressure on external systems.
> >>> Moreover, Improper queries can lead to issues such as full table scans,
> >> which in turn can impact the stability of external systems.
> >>>
> >>> Suppose we have an SQL query with two sources: Kafka and a database.
> >>> The database is sensitive to pressure, and we want to configure it to
> >> not perform filter pushdown to the database source.
> >>> However, we still want to perform filter pushdown to the Kafka source
> to
> >> decrease network IO.
> >>>
> >>> I propose to support configuration to disable filter push down for
> >> Table/SQL sources to let user decide whether to perform filter pushdown.
> >>>
> >>> Looking forward to your feedback.
> >>>
> >>> [1]
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >>>
> >>> Best,
> >>> Jiabao
> >>
> >>
>
>


Re: [VOTE] FLIP-329: Add operator attribute to specify support for object-reuse

2023-10-24 Thread Jing Ge
+1 (binding)

Thanks!

Best Regards,
Jing

On Thu, Oct 19, 2023 at 4:48 AM Dong Lin  wrote:

> Thanks for the FLIP!
>
> +1 (binding)
>
> On Thu, Oct 19, 2023 at 10:30, Xuannan Su wrote:
>
> > Hi all,
> >
> > We would like to start the vote for FLIP-329: Add operator attribute
> > to specify support for object-reuse[1]. This FLIP was discussed in
> > this thread [2].
> >
> > The vote will be open until at least Oct 22nd (at least 72 hours),
> > following the consensus voting process.
> >
> > Cheers,
> > Xuannan
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749
> > [2] https://lists.apache.org/thread/2h2r68m7bwsnvd8w1m50rktd7w6mr5n4
> >
>


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Martijn Visser
Hi Jiabao,

I'm in favour of Jark's approach: while I can see the need for a
generic flag, I can also foresee the situation where users actually
want to be able to control it per connector. So why not go directly
for that approach?

Best regards,

Martijn

On Tue, Oct 24, 2023 at 11:37 AM Jane Chan  wrote:
>
> Hi Jiabao,
>
> Thanks for driving this discussion. I have a small question that will
> "scan.filter-push-down.enabled" take precedence over
> "table.optimizer.source.predicate" when the two parameters might conflict
> each other?
>
> Best,
> Jane
>
> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun 
> wrote:
>
> > Thanks Jark,
> >
> > If we only add configuration without adding the enableFilterPushDown
> > method in the SupportsFilterPushDown interface,
> > each connector would have to handle the same logic in the applyFilters
> > method to determine whether filter pushdown is needed.
> > This would increase complexity and violate the original behavior of the
> > applyFilters method.
> >
> > On the contrary, we only need to pass the configuration parameter in the
> > newly added enableFilterPushDown method
> > to decide whether to perform predicate pushdown.
> >
> > I think this approach would be clearer and simpler.
> > WDYT?
> >
> > Best,
> > Jiabao
> >
> >
> > > On 2023-10-24 16:58, Jark Wu  wrote:
> > >
> > > Hi Jiabao,
> > >
> > > I think the current interface can already satisfy your requirements.
> > > The connector can reject all the filters by returning the input filters
> > > as `Result#remainingFilters`.
> > >
> > > So maybe we don't need to introduce a new method to disable
> > > pushdown, but just introduce an option for the specific connector.
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> > >
> > >> Thanks @Jiabao for kicking off this discussion.
> > >>
> > >> Could you add a section to explain the difference between proposed
> > >> connector level config `scan.filter-push-down.enabled` and existing
> > query
> > >> level config `table.optimizer.source.predicate-pushdown-enabled` ?
> > >>
> > >> Best,
> > >> Leonard
> > >>
> > >>> On 2023-10-24 4:18 PM, Jiabao Sun  wrote:
> > >>>
> > >>> Hi Devs,
> > >>>
> > >>> I would like to start a discussion on FLIP-377: support configuration
> > to
> > >> disable filter pushdown for Table/SQL Sources[1].
> > >>>
> > >>> Currently, Flink Table/SQL does not expose fine-grained control for
> > >> users to enable or disable filter pushdown.
> > >>> However, filter pushdown has some side effects, such as additional
> > >> computational pressure on external systems.
> > >>> Moreover, Improper queries can lead to issues such as full table scans,
> > >> which in turn can impact the stability of external systems.
> > >>>
> > >>> Suppose we have an SQL query with two sources: Kafka and a database.
> > >>> The database is sensitive to pressure, and we want to configure it to
> > >> not perform filter pushdown to the database source.
> > >>> However, we still want to perform filter pushdown to the Kafka source
> > to
> > >> decrease network IO.
> > >>>
> > >>> I propose to support configuration to disable filter push down for
> > >> Table/SQL sources to let user decide whether to perform filter pushdown.
> > >>>
> > >>> Looking forward to your feedback.
> > >>>
> > >>> [1]
> > >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> > >>>
> > >>> Best,
> > >>> Jiabao
> > >>
> > >>
> >
> >


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Thanks Jane,

I believe that the configuration "table.optimizer.source.predicate-pushdown-enabled" 
has a higher priority at the planner level than the configuration at the source 
level, and it seems easy to implement now.

Best,
Jiabao


> On 2023-10-24 17:36, Jane Chan  wrote:
> 
> Hi Jiabao,
> 
> Thanks for driving this discussion. I have a small question that will
> "scan.filter-push-down.enabled" take precedence over
> "table.optimizer.source.predicate" when the two parameters might conflict
> each other?
> 
> Best,
> Jane
> 
> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun 
> wrote:
> 
>> Thanks Jark,
>> 
>> If we only add configuration without adding the enableFilterPushDown
>> method in the SupportsFilterPushDown interface,
>> each connector would have to handle the same logic in the applyFilters
>> method to determine whether filter pushdown is needed.
>> This would increase complexity and violate the original behavior of the
>> applyFilters method.
>> 
>> On the contrary, we only need to pass the configuration parameter in the
>> newly added enableFilterPushDown method
>> to decide whether to perform predicate pushdown.
>> 
>> I think this approach would be clearer and simpler.
>> WDYT?
>> 
>> Best,
>> Jiabao
>> 
>> 
>>> On 2023-10-24 16:58, Jark Wu  wrote:
>>> 
>>> Hi Jiabao,
>>> 
>>> I think the current interface can already satisfy your requirements.
>>> The connector can reject all the filters by returning the input filters
>>> as `Result#remainingFilters`.
>>> 
>>> So maybe we don't need to introduce a new method to disable
>>> pushdown, but just introduce an option for the specific connector.
>>> 
>>> Best,
>>> Jark
>>> 
>>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
>>> 
 Thanks @Jiabao for kicking off this discussion.
 
 Could you add a section to explain the difference between proposed
 connector level config `scan.filter-push-down.enabled` and existing
>> query
 level config `table.optimizer.source.predicate-pushdown-enabled` ?
 
 Best,
 Leonard
 
> On 2023-10-24 4:18 PM, Jiabao Sun  wrote:
> 
> Hi Devs,
> 
> I would like to start a discussion on FLIP-377: support configuration
>> to
 disable filter pushdown for Table/SQL Sources[1].
> 
> Currently, Flink Table/SQL does not expose fine-grained control for
 users to enable or disable filter pushdown.
> However, filter pushdown has some side effects, such as additional
 computational pressure on external systems.
> Moreover, Improper queries can lead to issues such as full table scans,
 which in turn can impact the stability of external systems.
> 
> Suppose we have an SQL query with two sources: Kafka and a database.
> The database is sensitive to pressure, and we want to configure it to
 not perform filter pushdown to the database source.
> However, we still want to perform filter pushdown to the Kafka source
>> to
 decrease network IO.
> 
> I propose to support configuration to disable filter push down for
 Table/SQL sources to let user decide whether to perform filter pushdown.
> 
> Looking forward to your feedback.
> 
> [1]
 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> 
> Best,
> Jiabao
 
 
>> 
>> 



Re:Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Xuyang
Just like our discussion in thread 
https://lists.apache.org/thread/whh75f6rtwdyqxt47gb39j6m6m0cpphq , +1 for this 
FLIP.







--

Best!
Xuyang





At 2023-10-24 18:03:36, "Jiabao Sun"  wrote:
>Thanks Jane,
>
>I believe that the configuration "table.optimizer.source.predicate" has a 
>higher priority at the planner level than the configuration at the source 
>level,
>and it seems easy to implement now.
>
>Best,
>Jiabao
>
>
>> On 2023-10-24 17:36, Jane Chan  wrote:
>> 
>> Hi Jiabao,
>> 
>> Thanks for driving this discussion. I have a small question that will
>> "scan.filter-push-down.enabled" take precedence over
>> "table.optimizer.source.predicate" when the two parameters might conflict
>> each other?
>> 
>> Best,
>> Jane
>> 
>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun 
>> wrote:
>> 
>>> Thanks Jark,
>>> 
>>> If we only add configuration without adding the enableFilterPushDown
>>> method in the SupportsFilterPushDown interface,
>>> each connector would have to handle the same logic in the applyFilters
>>> method to determine whether filter pushdown is needed.
>>> This would increase complexity and violate the original behavior of the
>>> applyFilters method.
>>> 
>>> On the contrary, we only need to pass the configuration parameter in the
>>> newly added enableFilterPushDown method
>>> to decide whether to perform predicate pushdown.
>>> 
>>> I think this approach would be clearer and simpler.
>>> WDYT?
>>> 
>>> Best,
>>> Jiabao
>>> 
>>> 
 On 2023-10-24 16:58, Jark Wu  wrote:
 
 Hi Jiabao,
 
 I think the current interface can already satisfy your requirements.
 The connector can reject all the filters by returning the input filters
 as `Result#remainingFilters`.
 
 So maybe we don't need to introduce a new method to disable
 pushdown, but just introduce an option for the specific connector.
 
 Best,
 Jark
 
 On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
 
> Thanks @Jiabao for kicking off this discussion.
> 
> Could you add a section to explain the difference between proposed
> connector level config `scan.filter-push-down.enabled` and existing
>>> query
> level config `table.optimizer.source.predicate-pushdown-enabled` ?
> 
> Best,
> Leonard
> 
>> On 2023-10-24 4:18 PM, Jiabao Sun  wrote:
>> 
>> Hi Devs,
>> 
>> I would like to start a discussion on FLIP-377: support configuration
>>> to
> disable filter pushdown for Table/SQL Sources[1].
>> 
>> Currently, Flink Table/SQL does not expose fine-grained control for
> users to enable or disable filter pushdown.
>> However, filter pushdown has some side effects, such as additional
> computational pressure on external systems.
>> Moreover, Improper queries can lead to issues such as full table scans,
> which in turn can impact the stability of external systems.
>> 
>> Suppose we have an SQL query with two sources: Kafka and a database.
>> The database is sensitive to pressure, and we want to configure it to
> not perform filter pushdown to the database source.
>> However, we still want to perform filter pushdown to the Kafka source
>>> to
> decrease network IO.
>> 
>> I propose to support configuration to disable filter push down for
> Table/SQL sources to let user decide whether to perform filter pushdown.
>> 
>> Looking forward to your feedback.
>> 
>> [1]
> 
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>> 
>> Best,
>> Jiabao
> 
> 
>>> 
>>> 


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Thanks Martijn,

Indeed, implementing such a check in the applyFilters method can fulfill the 
functionality of disabling filter pushdown. 
My concern is that the same check would need to be repeated in each 
source.

public Result applyFilters(List<ResolvedExpression> filters) {
    if (supportsFilterPushDown) {
        return applyFiltersInternal(filters);
    } else {
        return Result.of(Collections.emptyList(), filters);
    }
}


If we define enough generic configurations, we can also pass these 
configurations uniformly in the abstract source superclass 
and provide a default implementation to determine whether to allow filter 
pushdown based on the options.

public abstract class FilterableDynamicTableSource
        implements DynamicTableSource, SupportsFilterPushDown {

    private Configuration sourceConfig;

    @Override
    public boolean enableFilterPushDown() {
        return sourceConfig.get(ENABLE_FILTER_PUSH_DOWN);
    }
}


Best,
Jiabao


> On 2023-10-24 17:59, Martijn Visser  wrote:
> 
> Hi Jiabao,
> 
> I'm in favour of Jark's approach: while I can see the need for a
> generic flag, I can also foresee the situation where users actually
> want to be able to control it per connector. So why not go directly
> for that approach?
> 
> Best regards,
> 
> Martijn
> 
> On Tue, Oct 24, 2023 at 11:37 AM Jane Chan  wrote:
>> 
>> Hi Jiabao,
>> 
>> Thanks for driving this discussion. I have a small question that will
>> "scan.filter-push-down.enabled" take precedence over
>> "table.optimizer.source.predicate" when the two parameters might conflict
>> each other?
>> 
>> Best,
>> Jane
>> 
>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun 
>> wrote:
>> 
>>> Thanks Jark,
>>> 
>>> If we only add configuration without adding the enableFilterPushDown
>>> method in the SupportsFilterPushDown interface,
>>> each connector would have to handle the same logic in the applyFilters
>>> method to determine whether filter pushdown is needed.
>>> This would increase complexity and violate the original behavior of the
>>> applyFilters method.
>>> 
>>> On the contrary, we only need to pass the configuration parameter in the
>>> newly added enableFilterPushDown method
>>> to decide whether to perform predicate pushdown.
>>> 
>>> I think this approach would be clearer and simpler.
>>> WDYT?
>>> 
>>> Best,
>>> Jiabao
>>> 
>>> 
 On 2023-10-24 16:58, Jark Wu  wrote:
 
 Hi Jiabao,
 
 I think the current interface can already satisfy your requirements.
 The connector can reject all the filters by returning the input filters
 as `Result#remainingFilters`.
 
 So maybe we don't need to introduce a new method to disable
 pushdown, but just introduce an option for the specific connector.
 
 Best,
 Jark
 
 On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
 
> Thanks @Jiabao for kicking off this discussion.
> 
> Could you add a section to explain the difference between proposed
> connector level config `scan.filter-push-down.enabled` and existing
>>> query
> level config `table.optimizer.source.predicate-pushdown-enabled` ?
> 
> Best,
> Leonard
> 
>> On 2023-10-24 4:18 PM, Jiabao Sun  wrote:
>> 
>> Hi Devs,
>> 
>> I would like to start a discussion on FLIP-377: support configuration
>>> to
> disable filter pushdown for Table/SQL Sources[1].
>> 
>> Currently, Flink Table/SQL does not expose fine-grained control for
> users to enable or disable filter pushdown.
>> However, filter pushdown has some side effects, such as additional
> computational pressure on external systems.
>> Moreover, Improper queries can lead to issues such as full table scans,
> which in turn can impact the stability of external systems.
>> 
>> Suppose we have an SQL query with two sources: Kafka and a database.
>> The database is sensitive to pressure, and we want to configure it to
> not perform filter pushdown to the database source.
>> However, we still want to perform filter pushdown to the Kafka source
>>> to
> decrease network IO.
>> 
>> I propose to support configuration to disable filter push down for
> Table/SQL sources to let user decide whether to perform filter pushdown.
>> 
>> Looking forward to your feedback.
>> 
>> [1]
> 
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>> 
>> Best,
>> Jiabao
> 
> 
>>> 



Re: FLIP-233

2023-10-24 Thread Jing Ge
Hi David,

Thanks for picking this up. I was wondering if you have any concrete plan
like upgrading the FLIP or directly starting a new discussion with the
current FLIP as it is? Looking forward to having this connector.

Best regards,
Jing

On Tue, Oct 24, 2023 at 10:55 AM David Radley 
wrote:

> Thanks Leonard,
> Hopefully this will be reopened, as we would very much like this
> capability and want to take over the FLIP, continue the discussion to get a
> consensus, then implement,
>Kind regards, David
>
> From: Leonard Xu 
> Date: Tuesday, 24 October 2023 at 02:56
> To: dev 
> Cc: jd...@amazon.com 
> Subject: [EXTERNAL] Re: FLIP-233
> +1 to reopen the FLIP, the FLIP  has been stalled for more than a year due
> to the author's time slot.
>
> Glad to see the developers from IBM would like to take over the FLIP, we
> can continue the discussion in FLIP-233 discussion thread [1]
>
> Best,
> Leonard
>
> [1] https://lists.apache.org/thread/cd60ln4pjgml7sv4kh23o1fohcfwvjcz
>
> > On 2023-10-24 12:41 AM, David Radley  wrote:
> >
> > Hi,
> > I notice
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-233%3A+Introduce+HTTP+Connector
> has been abandoned , due to lack of capacity. I work for IBM and my team is
> interested in helping to get this connector contributed into Flink. Can we
> open this Flip again and we can look to get agreement in the discussion
> thread please,
> >
> > Kind regards, David.
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>


Re: FLIP-233

2023-10-24 Thread Martijn Visser
Hi,

There is already https://github.com/getindata/flink-http-connector -
Why do we need to create another one, instead of improving the
existing one?

Best regards,

Martijn

On Tue, Oct 24, 2023 at 12:28 PM Jing Ge  wrote:
>
> Hi David,
>
> Thanks for picking this up. I was wondering if you have any concrete plan
> like upgrading the FLIP or directly starting a new discussion with the
> current FLIP as it is? Looking forward to having this connector.
>
> Best regards,
> Jing
>
> On Tue, Oct 24, 2023 at 10:55 AM David Radley 
> wrote:
>
> > Thanks Leonard,
> > Hopefully this will be reopened, as we would very much like this
> > capability and want to take over the FLIP, continue the discussion to get a
> > consensus, then implement,
> >Kind regards, David
> >
> > From: Leonard Xu 
> > Date: Tuesday, 24 October 2023 at 02:56
> > To: dev 
> > Cc: jd...@amazon.com 
> > Subject: [EXTERNAL] Re: FLIP-233
> > +1 to reopen the FLIP; it has been stalled for more than a year due
> > to the author's limited availability.
> >
> > Glad to see that developers from IBM would like to take over the FLIP; we
> > can continue the discussion in the FLIP-233 discussion thread [1]
> >
> > Best,
> > Leonard
> >
> > [1] https://lists.apache.org/thread/cd60ln4pjgml7sv4kh23o1fohcfwvjcz
> >
> > > On Oct 24, 2023 at 12:41 AM, David Radley wrote:
> > >
> > > Hi,
> > > I notice
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-233%3A+Introduce+HTTP+Connector
> > has been abandoned due to lack of capacity. I work for IBM and my team is
> > interested in helping to get this connector contributed into Flink. Can we
> > open this FLIP again and look to get agreement in the discussion
> > thread, please?
> > >
> > > Kind regards, David.
> > >
> > > Unless otherwise stated above:
> > >
> > > IBM United Kingdom Limited
> > > Registered in England and Wales with number 741598
> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Hang Ruan
Hi, Jiabao.

Thanks for driving this discussion.

IMO, if there are many connectors containing the same logic, I think this
FLIP is useful.
We do not know how many connectors need to add the same code.

Best,
Hang

On Tue, Oct 24, 2023 at 18:26, Jiabao Sun wrote:

> Thanks Martijn,
>
> Indeed, implementing the logic check in the applyFilters method can
> fulfill the functionality of disabling filter pushdown.
> My concern is that the same logic check may need to be implemented in each
> source.
>
> public Result applyFilters(List<ResolvedExpression> filters) {
>     if (supportsFilterPushDown) {
>         return applyFiltersInternal(filters);
>     } else {
>         return Result.of(Collections.emptyList(), filters);
>     }
> }
>
>
> If we define enough generic configurations, we can also pass these
> configurations uniformly in the abstract source superclass
> and provide a default implementation to determine whether to allow filter
> pushdown based on the options.
>
> public abstract class FilterableDynamicTableSource
>         implements DynamicTableSource, SupportsFilterPushDown {
>
>     private Configuration sourceConfig;
>
>     @Override
>     public boolean enableFilterPushDown() {
>         return sourceConfig.get(ENABLE_FILTER_PUSH_DOWN);
>     }
> }
>
>
> Best,
> Jiabao
>
>
> > On Oct 24, 2023 at 17:59, Martijn Visser wrote:
> >
> > Hi Jiabao,
> >
> > I'm in favour of Jark's approach: while I can see the need for a
> > generic flag, I can also foresee the situation where users actually
> > want to be able to control it per connector. So why not go directly
> > for that approach?
> >
> > Best regards,
> >
> > Martijn
> >
> > On Tue, Oct 24, 2023 at 11:37 AM Jane Chan 
> wrote:
> >>
> >> Hi Jiabao,
> >>
> >> Thanks for driving this discussion. I have a small question: will
> >> "scan.filter-push-down.enabled" take precedence over
> >> "table.optimizer.source.predicate" when the two parameters might
> conflict
> >> with each other?
> >>
> >> Best,
> >> Jane
> >>
> >> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun  .invalid>
> >> wrote:
> >>
> >>> Thanks Jark,
> >>>
> >>> If we only add configuration without adding the enableFilterPushDown
> >>> method in the SupportsFilterPushDown interface,
> >>> each connector would have to handle the same logic in the applyFilters
> >>> method to determine whether filter pushdown is needed.
> >>> This would increase complexity and violate the original behavior of the
> >>> applyFilters method.
> >>>
> >>> On the contrary, we only need to pass the configuration parameter in
> the
> >>> newly added enableFilterPushDown method
> >>> to decide whether to perform predicate pushdown.
> >>>
> >>> I think this approach would be clearer and simpler.
> >>> WDYT?
> >>>
> >>> Best,
> >>> Jiabao
> >>>
> >>>
>  On Oct 24, 2023 at 16:58, Jark Wu wrote:
> 
>  Hi Jiabao,
> 
>  I think the current interface can already satisfy your requirements.
>  The connector can reject all the filters by returning the input
> filters
>  as `Result#remainingFilters`.
> 
>  So maybe we don't need to introduce a new method to disable
>  pushdown, but just introduce an option for the specific connector.
> 
>  Best,
>  Jark
> 
>  On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> 
> > Thanks @Jiabao for kicking off this discussion.
> >
> > Could you add a section to explain the difference between proposed
> > connector level config `scan.filter-push-down.enabled` and existing
> >>> query
> > level config `table.optimizer.source.predicate-pushdown-enabled` ?
> >
> > Best,
> > Leonard
> >
> >> On Oct 24, 2023 at 4:18 PM, Jiabao Sun wrote:
> >>
> >> Hi Devs,
> >>
> >> I would like to start a discussion on FLIP-377: support
> configuration
> >>> to
> > disable filter pushdown for Table/SQL Sources[1].
> >>
> >> Currently, Flink Table/SQL does not expose fine-grained control for
> > users to enable or disable filter pushdown.
> >> However, filter pushdown has some side effects, such as additional
> > computational pressure on external systems.
> >> Moreover, improper queries can lead to issues such as full table
> scans,
> > which in turn can impact the stability of external systems.
> >>
> >> Suppose we have an SQL query with two sources: Kafka and a database.
> >> The database is sensitive to pressure, and we want to configure it
> to
> > not perform filter pushdown to the database source.
> >> However, we still want to perform filter pushdown to the Kafka
> source
> >>> to
> > decrease network IO.
> >>
> >> I propose to support configuration to disable filter push down for
> > Table/SQL sources to let users decide whether to perform filter
> pushdown.
> >>
> >> Looking forward to your feedback.
> >>
> >> [1]
> >
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >>
> >> Best,
> >> Jiabao
> >
> >
> >>>
>
>
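
To make the option-based approach concrete, here is a minimal sketch of a source
that declines pushdown when a per-connector option is disabled. Only
`SupportsFilterPushDown` and its nested `Result` are actual Flink API; the class,
field, and helper names are illustrative, and the option name follows the
FLIP-377 proposal:

```java
// A minimal sketch (illustrative names): when pushdown is disabled, the source
// accepts no filters and returns all of them as remaining filters, so the
// planner keeps evaluating them itself.
import java.util.Collections;
import java.util.List;

import org.apache.flink.table.connector.source.DynamicTableSource;
import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.ResolvedExpression;

public abstract class ExampleFilterableSource
        implements DynamicTableSource, SupportsFilterPushDown {

    // Assumed to be populated from the proposed per-connector option
    // 'scan.filter-push-down.enabled'.
    private final boolean filterPushDownEnabled;

    protected ExampleFilterableSource(boolean filterPushDownEnabled) {
        this.filterPushDownEnabled = filterPushDownEnabled;
    }

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        if (!filterPushDownEnabled) {
            // Accept nothing; every filter stays in the plan.
            return Result.of(Collections.emptyList(), filters);
        }
        return applyFiltersInternal(filters);
    }

    // Connector-specific pushdown logic goes here.
    protected abstract Result applyFiltersInternal(List<ResolvedExpression> filters);
}
```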


[jira] [Created] (FLINK-33347) CLONE - CLONE - Build Release Candidate: 1.18.0-rc3

2023-10-24 Thread Jing Ge (Jira)
Jing Ge created FLINK-33347:
---

 Summary: CLONE - CLONE - Build Release Candidate: 1.18.0-rc3
 Key: FLINK-33347
 URL: https://issues.apache.org/jira/browse/FLINK-33347
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.18.0
Reporter: Jing Ge
Assignee: Jing Ge


The core of the release process is the build-vote-fix cycle. Each cycle 
produces one release candidate. The Release Manager repeats this cycle until 
the community approves one release candidate, which is then finalized.
h4. Prerequisites

Set up a few environment variables to simplify Maven commands that follow. This 
identifies the release candidate being built. Start with {{RC_NUM}} equal to 1 
and increment it for each candidate:
{code:java}
RC_NUM="2"
TAG="release-${RELEASE_VERSION}-rc${RC_NUM}"
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33350) CLONE - CLONE - Propose a pull request for website updates

2023-10-24 Thread Jing Ge (Jira)
Jing Ge created FLINK-33350:
---

 Summary: CLONE - CLONE - Propose a pull request for website updates
 Key: FLINK-33350
 URL: https://issues.apache.org/jira/browse/FLINK-33350
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.18.0
Reporter: Jing Ge
Assignee: Jing Ge


The final step of building the candidate is to propose a website pull request 
containing the following changes:
 # update 
[apache/flink-web:_config.yml|https://github.com/apache/flink-web/blob/asf-site/_config.yml]
 ## update {{FLINK_VERSION_STABLE}} and {{FLINK_VERSION_STABLE_SHORT}} as 
required
 ## update version references in quickstarts ({{{}q/{}}} directory) as required
 ## (major only) add a new entry to {{flink_releases}} for the release binaries 
and sources
 ## (minor only) update the entry for the previous release in the series in 
{{flink_releases}}
 ### Please pay attention to the ids assigned to the download entries. They should 
be unique and reflect their corresponding version number.
 ## add a new entry to {{release_archive.flink}}
 # add a blog post announcing the release in _posts
 # add an organized release notes page under docs/content/release-notes and 
docs/content.zh/release-notes (like 
[https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/]).
 The page is based on the non-empty release notes collected from the issues, 
and only the issues that affect existing users should be included (e.g., 
instead of new functionality). It should be in a separate PR since it would be 
merged to the flink project.

(!) Don’t merge the PRs before finalizing the release.

 

h3. Expectations
 * Website pull request proposed to list the 
[release|http://flink.apache.org/downloads.html]
 * (major only) Check {{docs/config.toml}} to ensure that
 ** the version constants refer to the new version
 ** the {{baseurl}} does not point to {{flink-docs-master}}  but 
{{flink-docs-release-X.Y}} instead



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33351) CLONE - CLONE - Vote on the release candidate

2023-10-24 Thread Jing Ge (Jira)
Jing Ge created FLINK-33351:
---

 Summary: CLONE - CLONE - Vote on the release candidate
 Key: FLINK-33351
 URL: https://issues.apache.org/jira/browse/FLINK-33351
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.18.0
Reporter: Jing Ge
Assignee: Jing Ge


Once you have built and individually reviewed the release candidate, please 
share it for the community-wide review. Please review foundation-wide [voting 
guidelines|http://www.apache.org/foundation/voting.html] for more information.

Start the review-and-vote thread on the dev@ mailing list. Here’s an email 
template; please adjust as you see fit.
{quote}From: Release Manager
To: dev@flink.apache.org
Subject: [VOTE] Release 1.2.3, release candidate #3

Hi everyone,
Please review and vote on the release candidate #3 for the version 1.2.3, as 
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release and binary convenience releases to be 
deployed to dist.apache.org [2], which are signed with the key with fingerprint 
 [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "release-1.2.3-rc3" [5],
 * website pull request listing the new release and adding announcement blog 
post [6].

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] link
[2] link
[3] [https://dist.apache.org/repos/dist/release/flink/KEYS]
[4] link
[5] link
[6] link
{quote}
*If there are any issues found in the release candidate, reply on the vote 
thread to cancel the vote.* There’s no need to wait 72 hours. Proceed to the 
Fix Issues step below and address the problem. However, some issues don’t 
require cancellation. For example, if an issue is found in the website pull 
request, just correct it on the spot and the vote can continue as-is.

For cancelling a release, the release manager needs to send an email to the 
release candidate thread, stating that the release candidate is officially 
cancelled. Next, all artifacts created specifically for the RC in the previous 
steps need to be removed:
 * Delete the staging repository in Nexus
 * Remove the source / binary RC files from dist.apache.org
 * Delete the source code tag in git

*If there are no issues, reply on the vote thread to close the voting.* Then, 
tally the votes in a separate email. Here’s an email template; please adjust as 
you see fit.
{quote}From: Release Manager
To: dev@flink.apache.org
Subject: [RESULT] [VOTE] Release 1.2.3, release candidate #3

I'm happy to announce that we have unanimously approved this release.

There are XXX approving votes, XXX of which are binding:
 * approver 1
 * approver 2
 * approver 3
 * approver 4

There are no disapproving votes.

Thanks everyone!
{quote}
 

h3. Expectations
 * Community votes to release the proposed candidate, with at least three 
approving PMC votes

Any issues that are raised till the vote is over should be either resolved or 
moved into the next release (if applicable).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33348) CLONE - CLONE - Build and stage Java and Python artifacts

2023-10-24 Thread Jing Ge (Jira)
Jing Ge created FLINK-33348:
---

 Summary: CLONE - CLONE - Build and stage Java and Python artifacts
 Key: FLINK-33348
 URL: https://issues.apache.org/jira/browse/FLINK-33348
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Jing Ge
 Fix For: 1.18.0


# Create a local release branch ((!) this step cannot be skipped for minor 
releases):
{code:bash}
$ cd ./tools
tools/ $ OLD_VERSION=$CURRENT_SNAPSHOT_VERSION NEW_VERSION=$RELEASE_VERSION 
RELEASE_CANDIDATE=$RC_NUM releasing/create_release_branch.sh
{code}
 # Tag the release commit:
{code:bash}
$ git tag -s ${TAG} -m "${TAG}"
{code}
 # We now need to do several things:
 ## Create the source release archive
 ## Deploy jar artefacts to the [Apache Nexus 
Repository|https://repository.apache.org/], which is the staging area for 
deploying the jars to Maven Central
 ## Build PyFlink wheel packages
You might want to create a directory on your local machine for collecting the 
various source and binary releases before uploading them. Creating the binary 
releases is a lengthy process but you can do this on another machine (for 
example, in the "cloud"). When doing this, you can skip signing the release 
files on the remote machine, download them to your local machine and sign them 
there.
 # Build the source release:
{code:bash}
tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_source_release.sh
{code}
 # Stage the maven artifacts:
{code:bash}
tools $ releasing/deploy_staging_jars.sh
{code}
Review all staged artifacts ([https://repository.apache.org/]). They should 
contain all relevant parts for each module, including pom.xml, jar, test jar, 
source, test source, javadoc, etc. Carefully review any new artifacts.
 # Close the staging repository on Apache Nexus. When prompted for a 
description, enter “Apache Flink, version X, release candidate Y”.
Then, you need to build the PyFlink wheel packages (since 1.11):
 # Set up an azure pipeline in your own Azure account. You can refer to [Azure 
Pipelines|https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository]
 for more details on how to set up azure pipeline for a fork of the Flink 
repository. Note that a google cloud mirror in Europe is used for downloading 
maven artifacts, therefore it is recommended to set your [Azure organization 
region|https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/change-organization-location]
 to Europe to speed up the downloads.
 # Push the release candidate branch to your forked personal Flink repository, 
e.g.
{code:bash}
tools $ git push <remote> 
refs/heads/release-${RELEASE_VERSION}-rc${RC_NUM}:release-${RELEASE_VERSION}-rc${RC_NUM}
{code}
 # Trigger the Azure Pipelines manually to build the PyFlink wheel packages
 ## Go to your Azure Pipelines Flink project → Pipelines
 ## Click the "New pipeline" button on the top right
 ## Select "GitHub" → your GitHub Flink repository → "Existing Azure Pipelines 
YAML file"
 ## Select your branch → Set path to "/azure-pipelines.yaml" → click on 
"Continue" → click on "Variables"
 ## Then click "New Variable" button, fill the name with "MODE", and the value 
with "release". Click "OK" to set the variable and the "Save" button to save 
the variables, then back on the "Review your pipeline" screen click "Run" to 
trigger the build.
 ## You should now see a build where only the "CI build (release)" is running
 # Download the PyFlink wheel packages from the build result page after the 
jobs of "build_wheels mac" and "build_wheels linux" have finished.
 ## Download the PyFlink wheel packages
 ### Open the build result page of the pipeline
 ### Go to the {{Artifacts}} page (build_wheels linux -> 1 artifact)
 ### Click {{wheel_Darwin_build_wheels mac}} and {{wheel_Linux_build_wheels 
linux}} separately to download the zip files
 ## Unzip these two zip files
{code:bash}
$ cd /path/to/downloaded_wheel_packages
$ unzip wheel_Linux_build_wheels\ linux.zip
$ unzip wheel_Darwin_build_wheels\ mac.zip{code}
 ## Create directory {{./dist}} under the directory of {{{}flink-python{}}}:
{code:bash}
$ cd <flink-root-dir>
$ mkdir flink-python/dist{code}
 ## Move the unzipped wheel packages to the directory of 
{{{}flink-python/dist{}}}:
{code:java}
$ mv /path/to/wheel_Darwin_build_wheels\ mac/* flink-python/dist/
$ mv /path/to/wheel_Linux_build_wheels\ linux/* flink-python/dist/
$ cd tools{code}

Finally, we create the binary convenience release files:
{code:bash}
tools $ RELEASE_VERSION=$RELEASE_VERSION releasing/create_binary_release.sh
{code}
If you want to run this step in parallel on a remote machine you have to make 
the release commit available there (for example by pushing to a repository). 
*This is important: the commit inside the binary builds has to match the commit 
of the source builds and the tagged release commit.* 
When

[jira] [Created] (FLINK-33349) CLONE - CLONE - Stage source and binary releases on dist.apache.org

2023-10-24 Thread Jing Ge (Jira)
Jing Ge created FLINK-33349:
---

 Summary: CLONE - CLONE - Stage source and binary releases on 
dist.apache.org
 Key: FLINK-33349
 URL: https://issues.apache.org/jira/browse/FLINK-33349
 Project: Flink
  Issue Type: Sub-task
Reporter: Jing Ge
Assignee: Jing Ge
 Fix For: 1.18.0


Copy the source release to the dev repository of dist.apache.org:
# If you have not already, check out the Flink section of the dev repository on 
dist.apache.org via Subversion. In a fresh directory:
{code:bash}
$ svn checkout https://dist.apache.org/repos/dist/dev/flink --depth=immediates
{code}
# Make a directory for the new release and copy all the artifacts (Flink 
source/binary distributions, hashes, GPG signatures and the python 
subdirectory) into that newly created directory:
{code:bash}
$ mkdir flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
$ mv <flink-dir>/tools/releasing/release/* 
flink/flink-${RELEASE_VERSION}-rc${RC_NUM}
{code}
# Add and commit all the files.
{code:bash}
$ cd flink
flink $ svn add flink-${RELEASE_VERSION}-rc${RC_NUM}
flink $ svn commit -m "Add flink-${RELEASE_VERSION}-rc${RC_NUM}"
{code}
# Verify that files are present under 
[https://dist.apache.org/repos/dist/dev/flink|https://dist.apache.org/repos/dist/dev/flink].
# Push the release tag if not done already (the following command assumes to be 
called from within the apache/flink checkout):
{code:bash}
$ git push <remote> refs/tags/release-${RELEASE_VERSION}-rc${RC_NUM}
{code}

 

h3. Expectations
 * Maven artifacts deployed to the staging repository of 
[repository.apache.org|https://repository.apache.org/content/repositories/]
 * Source distribution deployed to the dev repository of 
[dist.apache.org|https://dist.apache.org/repos/dist/dev/flink/]
 * Check hashes (e.g. shasum -c *.sha512)
 * Check signatures (e.g. {{{}gpg --verify 
flink-1.2.3-source-release.tar.gz.asc flink-1.2.3-source-release.tar.gz{}}})
 * {{grep}} for legal headers in each file.
 * If time allows check the NOTICE files of the modules whose dependencies have 
been changed in this release in advance, since the license issues from time to 
time pop up during voting. See [Verifying a Flink 
Release|https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Release]
 "Checking License" section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Martijn Visser
Hi Jiabao,

I don't see that as a concern, but something that would be in general
preferred (because it gives more flexibility to users when to enable /
disable pushdown).

Best regards,

Martijn

On Tue, Oct 24, 2023 at 1:41 PM Hang Ruan  wrote:
>
> Hi, Jiabao.
>
> Thanks for driving this discussion.
>
> IMO, if there are many connectors containing the same logic, I think this
> FLIP is useful.
> We do not know how many connectors need to add the same code.
>
> Best,
> Hang
>
> On Tue, Oct 24, 2023 at 18:26, Jiabao Sun wrote:
>
> > Thanks Martijn,
> >
> > Indeed, implementing the logic check in the applyFilters method can
> > fulfill the functionality of disabling filter pushdown.
> > My concern is that the same logic check may need to be implemented in each
> > source.
> >
> > public Result applyFilters(List<ResolvedExpression> filters) {
> >     if (supportsFilterPushDown) {
> >         return applyFiltersInternal(filters);
> >     } else {
> >         return Result.of(Collections.emptyList(), filters);
> >     }
> > }
> >
> >
> > If we define enough generic configurations, we can also pass these
> > configurations uniformly in the abstract source superclass
> > and provide a default implementation to determine whether to allow filter
> > pushdown based on the options.
> >
> > public abstract class FilterableDynamicTableSource
> >         implements DynamicTableSource, SupportsFilterPushDown {
> >
> >     private Configuration sourceConfig;
> >
> >     @Override
> >     public boolean enableFilterPushDown() {
> >         return sourceConfig.get(ENABLE_FILTER_PUSH_DOWN);
> >     }
> > }
> >
> >
> > Best,
> > Jiabao
> >
> >
> > > On Oct 24, 2023 at 17:59, Martijn Visser wrote:
> > >
> > > Hi Jiabao,
> > >
> > > I'm in favour of Jark's approach: while I can see the need for a
> > > generic flag, I can also foresee the situation where users actually
> > > want to be able to control it per connector. So why not go directly
> > > for that approach?
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Tue, Oct 24, 2023 at 11:37 AM Jane Chan 
> > wrote:
> > >>
> > >> Hi Jiabao,
> > >>
> > >> Thanks for driving this discussion. I have a small question: will
> > >> "scan.filter-push-down.enabled" take precedence over
> > >> "table.optimizer.source.predicate" when the two parameters might
> > conflict
> > >> with each other?
> > >>
> > >> Best,
> > >> Jane
> > >>
> > >> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun  > .invalid>
> > >> wrote:
> > >>
> > >>> Thanks Jark,
> > >>>
> > >>> If we only add configuration without adding the enableFilterPushDown
> > >>> method in the SupportsFilterPushDown interface,
> > >>> each connector would have to handle the same logic in the applyFilters
> > >>> method to determine whether filter pushdown is needed.
> > >>> This would increase complexity and violate the original behavior of the
> > >>> applyFilters method.
> > >>>
> > >>> On the contrary, we only need to pass the configuration parameter in
> > the
> > >>> newly added enableFilterPushDown method
> > >>> to decide whether to perform predicate pushdown.
> > >>>
> > >>> I think this approach would be clearer and simpler.
> > >>> WDYT?
> > >>>
> > >>> Best,
> > >>> Jiabao
> > >>>
> > >>>
> >  On Oct 24, 2023 at 16:58, Jark Wu wrote:
> > 
> >  Hi Jiabao,
> > 
> >  I think the current interface can already satisfy your requirements.
> >  The connector can reject all the filters by returning the input
> > filters
> >  as `Result#remainingFilters`.
> > 
> >  So maybe we don't need to introduce a new method to disable
> >  pushdown, but just introduce an option for the specific connector.
> > 
> >  Best,
> >  Jark
> > 
> >  On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> > 
> > > Thanks @Jiabao for kicking off this discussion.
> > >
> > > Could you add a section to explain the difference between proposed
> > > connector level config `scan.filter-push-down.enabled` and existing
> > >>> query
> > > level config `table.optimizer.source.predicate-pushdown-enabled` ?
> > >
> > > Best,
> > > Leonard
> > >
> > >> On Oct 24, 2023 at 4:18 PM, Jiabao Sun wrote:
> > >>
> > >> Hi Devs,
> > >>
> > >> I would like to start a discussion on FLIP-377: support
> > configuration
> > >>> to
> > > disable filter pushdown for Table/SQL Sources[1].
> > >>
> > >> Currently, Flink Table/SQL does not expose fine-grained control for
> > > users to enable or disable filter pushdown.
> > >> However, filter pushdown has some side effects, such as additional
> > > computational pressure on external systems.
> > >> Moreover, improper queries can lead to issues such as full table
> > scans,
> > > which in turn can impact the stability of external systems.
> > >>
> > >> Suppose we have an SQL query with two sources: Kafka and a database.
> > >> The database is sensitive to pressure, and we want to configure it
> > to
> > > not perform filter pushdown 

Re: [DISCUSS] FLIP-360: Merging ExecutionGraphInfoStore and JobResultStore into a single component

2023-10-24 Thread Matthias Pohl
Hi Gyula,
thanks for joining the discussion. Yeah, I might have gone a bit too far
with it. I guess I was eager to create separate implementations to make use
of the two different interfaces/purposes on a class level to allow more
purpose-centric testing.

I am fine with cutting it back and having separate interfaces being
implemented by the same class as @Shammon proposed. I updated the FLIP
accordingly.

Matthias

On Mon, Oct 23, 2023 at 2:45 PM Gyula Fóra  wrote:

> I am a bit confused by the split in the CompletedJobStore /
> JobDetailsStore.
> Seems like JobDetailsStore is simply a view on top of CompletedJobStore:
>  - Maybe we should not call it a store? Is it storing anything?
>  - Why couldn't the cleanup triggering be the responsibility of the
> CompletedJobStore, wouldn't it make it simpler to have the storage/cleanup
> related logic in a simple place?
>  - Ideally the JobDetailsStore / JobDetailsProvider could be a very thin
> interface exposed by the CompletedJobStore
>
> Gyula
>
> On Sat, Sep 30, 2023 at 2:18 AM Matthias Pohl
>  wrote:
>
> > Thanks for sharing your thoughts, Shammon FY. I kind of see your point.
> >
> > Initially, I didn't put much thought into splitting up the interface into
> > two because the dispatcher would have been the only component dealing
> with
> > the two interfaces. Having two interfaces wouldn't have added much value
> > (in terms of testability and readability, I thought).
> >
> > But I iterated over the idea once more and came up with a new proposal
> that
> > involves the two components CompletedJobStore and JobDetailsStore. It's
> not
> > 100% what you suggested (because the retrieval of the ExecutionGraphInfo
> > still lives in the CompletedJobStore) but the separation makes sense in
> my
> > opinion:
> > - The CompletedJobStore deals with the big data that might require
> > accessing the disk.
> > - The JobDetailsStore handles the small-footprint data that lives in
> memory
> > all the time. Additionally, it takes care of actually deleting the
> metadata
> > of the completed job in both stores if a TTL is configured.
> >
> > See FLIP-360 [1] with the newly added class and sequence diagrams and
> > additional content. I only updated the Interfaces & Methods section (see
> > diff [2]).
> >
> > I'm looking forward to feedback.
> >
> > Best,
> > Matthias
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-360%3A+merging+the+executiongraphinfostore+and+the+jobresultstore+into+a+single+component+completedjobstore
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=263428420&selectedPageVersions=14&selectedPageVersions=13
> >
> > On Mon, Sep 18, 2023 at 1:20 AM Shammon FY  wrote:
> >
> > > Hi Matthias,
> > >
> > > Thanks for initiating this discussion, and +1 for overall from my side.
> > > It's really strange to have two different places to store completed
> jobs,
> > > which also adds complexity to Flink. I agree with using a
> > > unified instance to store the completed job information.
> > >
> > > In terms of ability, `ExecutionGraphInfoStore` and `JobResultStore` are
> > > different: one is mainly used for information display, and the other is
> > for
> > > failover. So after unifying storage, can we use different interfaces to
> > > meet the different requirements for jobs? Adding all these methods for
> > > different components into one interface such as `CompletedJobStore` may
> > be
> > > a little strange. What do you think?
> > >
> > > Best,
> > > Shammon FY
> > >
> > >
> > >
> > > On Fri, Sep 8, 2023 at 8:08 PM Gyula Fóra 
> wrote:
> > >
> > > > Hi Matthias!
> > > >
> > > > Thank you for the detailed proposal, overall I am in favor of making
> > this
> > > > unification to simplify the logic and make the integration for
> external
> > > > components more straightforward.
> > > > I will try to read through the proposal more carefully next week and
> > > > provide some detailed feedback.
> > > >
> > > > +1
> > > >
> > > > Thanks
> > > > Gyula
> > > >
> > > > On Fri, Sep 8, 2023 at 8:36 AM Matthias Pohl  > > > .invalid>
> > > > wrote:
> > > >
> > > > > Just a bit more elaboration on the question that we need to answer
> > > here:
> > > > Do
> > > > > we want to expose the internal ArchivedExecutionGraph data
> structure
> > > > > through JSON?
> > > > >
> > > > > - The JSON approach allows the user to have (almost) full access to
> > the
> > > > > information (that would be otherwise derived from the REST API).
> > > > Therefore,
> > > > > there's no need to spin up a cluster to access this information.
> > > > > Any information that shall be exposed through the REST API needs to
> > be
> > > > > well-defined in this JSON structure, though. Large parts of the
> > > > > ArchivedExecutionGraph data structure (essentially anything that
> > shall
> > > be
> > > > > used to populate the REST API) become public domain, though, which
> > puts
> > > > > more constraints on this data stru
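
To illustrate the shape being discussed, a rough sketch of "two purpose-specific
interfaces backed by one implementation" could look as follows. All names are
hypothetical, the in-memory maps are only for illustration (the real design also
has to handle disk spilling and TTL-based cleanup), and the
`JobDetails.createDetailsForJob` helper is assumed from the existing runtime:

```java
// A rough sketch (hypothetical names): recovery code talks to CompletedJobStore,
// while the REST layer only sees the thin JobDetailsProvider view.
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.flink.api.common.JobID;
import org.apache.flink.runtime.messages.webmonitor.JobDetails;
import org.apache.flink.runtime.scheduler.ExecutionGraphInfo;

interface CompletedJobStore {
    void put(ExecutionGraphInfo executionGraphInfo);
    ExecutionGraphInfo get(JobID jobId);
}

interface JobDetailsProvider {
    Collection<JobDetails> getAvailableJobDetails();
}

final class InMemoryCompletedJobStore implements CompletedJobStore, JobDetailsProvider {

    private final Map<JobID, ExecutionGraphInfo> graphs = new ConcurrentHashMap<>();
    private final Map<JobID, JobDetails> details = new ConcurrentHashMap<>();

    @Override
    public void put(ExecutionGraphInfo executionGraphInfo) {
        JobID jobId = executionGraphInfo.getJobId();
        graphs.put(jobId, executionGraphInfo);
        // Keep only the small-footprint job details in memory for the REST view.
        details.put(jobId,
                JobDetails.createDetailsForJob(executionGraphInfo.getArchivedExecutionGraph()));
    }

    @Override
    public ExecutionGraphInfo get(JobID jobId) {
        return graphs.get(jobId);
    }

    @Override
    public Collection<JobDetails> getAvailableJobDetails() {
        return details.values();
    }
}
```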

Re: [DISCUSS] Removal of unused e2e tests

2023-10-24 Thread Robert Metzger
I left a comment in FLINK-5.

On Mon, Oct 23, 2023 at 5:18 AM Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

> FLINK-17375 [1] removed [2] run-pre-commit-tests.sh in Flink 1.12. Since
> then the following tests are not executed anymore:
> test_state_migration.sh
> test_state_evolution.sh
> test_streaming_kinesis.sh
> test_streaming_classloader.sh
> test_streaming_distributed_cache_via_blob.sh
>
> Certain classes that were previously used for classloading and state evolution
> testing only via the aforementioned scripts are still in the project. I
> would like to understand whether the removal was deliberate and whether it is
> OK to do a cleanup [3].
>
> [1] https://issues.apache.org/jira/browse/FLINK-17375
> [2]
>
> https://github.com/apache/flink/pull/12268/files#diff-39f0aea40d2dd3f026544bb4c2502b2e9eab4c825df5f2b68c6d4ca8c39d7b5e
> [3] https://issues.apache.org/jira/browse/FLINK-5
>
> Best,
> Alexander Fedulov
>


RE: FLIP-233

2023-10-24 Thread David Radley
Hi Martijn,
Yes, I am happy to continue to improve the existing FLIP.

Hi Jing,
I was looking to continue the discussion and update the FLIP content.

Are we OK to reopen FLIP-233? I will update it and the discussion thread 
there.

  Kind regards, David.


From: Martijn Visser 
Date: Tuesday, 24 October 2023 at 11:56
To: dev@flink.apache.org 
Subject: [EXTERNAL] Re: FLIP-233
Hi,

There is already https://github.com/getindata/flink-http-connector -
Why do we need to create another one, instead of improving the
existing one?

Best regards,

Martijn

On Tue, Oct 24, 2023 at 12:28 PM Jing Ge  wrote:
>
> Hi David,
>
> Thanks for picking this up. I was wondering whether you have a concrete plan,
> like updating the FLIP or directly starting a new discussion with the
> current FLIP as it is? Looking forward to having this connector.
>
> Best regards,
> Jing
>
> On Tue, Oct 24, 2023 at 10:55 AM David Radley 
> wrote:
>
> > Thanks Leonard,
> > Hopefully this will be reopened, as we would very much like this
> > capability and want to take over the FLIP, continue the discussion to get a
> > consensus, then implement,
> >Kind regards, David
> >
> > From: Leonard Xu 
> > Date: Tuesday, 24 October 2023 at 02:56
> > To: dev 
> > Cc: jd...@amazon.com 
> > Subject: [EXTERNAL] Re: FLIP-233
> > +1 to reopen the FLIP; it has been stalled for more than a year due
> > to the author's limited availability.
> >
> > Glad to see that developers from IBM would like to take over the FLIP; we
> > can continue the discussion in the FLIP-233 discussion thread [1]
> >
> > Best,
> > Leonard
> >
> > [1] https://lists.apache.org/thread/cd60ln4pjgml7sv4kh23o1fohcfwvjcz
> >
> > > On Oct 24, 2023 at 12:41 AM, David Radley wrote:
> > >
> > > Hi,
> > > I notice
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-233%3A+Introduce+HTTP+Connector
> > has been abandoned due to lack of capacity. I work for IBM and my team is
> > interested in helping to get this connector contributed into Flink. Can we
> > open this FLIP again and look to get agreement in the discussion
> > thread, please?
> > >
> > > Kind regards, David.
> > >
> > > Unless otherwise stated above:
> > >
> > > IBM United Kingdom Limited
> > > Registered in England and Wales with number 741598
> > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


Request to release flink 1.6.3

2023-10-24 Thread vikas patil
Hello All,

Facing this FLINK-28185 
issue for one of the flink jobs. We are running flink version 1.6.1 but it
looks like the backport  to 1.6
was never released as 1.6.3. The latest that was released is 1.6.2
. By any chance we can get
the 1.6.3 released ?

Also we use the official flink docker 
image. Not sure if that needs to be updated as well manually. Thanks.

-Vikas


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Thanks Jark, Martijn, Xuyang for the valuable feedback.

Adding only the "scan.filter-push-down.enabled" configuration option would be 
great for me as well.
Optimization for this public behavior can be added later.

I made some modifications to the FLIP document and added the approach of adding 
a new method to the Rejected Alternatives section. 

Looking forward to your feedback again.

Best,
Jiabao


> On Oct 24, 2023 at 16:58, Jark Wu wrote:
> 
> the current interface can already satisfy your requirements.
> The connector can reject all the filters by returning the input filters
> as `Result#remainingFilters`.
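
For completeness, the connector-side option itself could be declared roughly as
in the following sketch. `ConfigOption` and `ConfigOptions` are the standard
Flink classes; the holder class is hypothetical, and the key and default value
follow the FLIP. A `DynamicTableSourceFactory` would then list this option in
`optionalOptions()` and read it (e.g. via `FactoryUtil.createTableFactoryHelper`)
before building the source.

```java
// A rough sketch of declaring the FLIP-377 per-connector option.
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public final class ExampleSourceOptions {

    public static final ConfigOption<Boolean> SCAN_FILTER_PUSH_DOWN_ENABLED =
            ConfigOptions.key("scan.filter-push-down.enabled")
                    .booleanType()
                    .defaultValue(true)
                    .withDescription(
                            "Whether this source accepts filters pushed down by the planner.");

    private ExampleSourceOptions() {}
}
```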



Re: Request to release flink 1.6.3

2023-10-24 Thread Rui Fan
Hi Vikas,

Thanks for your feedback!

Do you mean flink 1.16.3 instead of 1.6.3?

The 1.16.2 and 1.17.1 were released on 2023-05-25,
it’s been 5 months. And the flink community has fixed
many bugs in the past 5 months. Usually, there is a
fix(minor) version every three or four months, so I propose
to release 1.16.3 and 1.17.2 now.

If the community agrees to create this new patch release, I could
volunteer as the release manager for one of the 1.16.3 or 1.17.2.

Looking forward to feedback from the community, thank you

[1]
https://flink.apache.org/2023/05/25/apache-flink-1.16.2-release-announcement/
[2]
https://flink.apache.org/2023/05/25/apache-flink-1.17.1-release-announcement/

Best,
Rui

On Tue, Oct 24, 2023 at 9:50 PM vikas patil  wrote:

> Hello All,
>
> Facing this FLINK-28185  >
> issue for one of the flink jobs. We are running flink version 1.6.1 but it
> looks like the backport  to
> 1.6
> was never released as 1.6.3. The latest that was released is 1.6.2
> . By any chance we can get
> the 1.6.3 released ?
>
> Also we use the official flink docker 
> image. Not sure if that needs to be updated as well manually. Thanks.
>
> -Vikas
>


[jira] [Created] (FLINK-33352) OpenAPI spec is lacking mappings for discriminator properties

2023-10-24 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-33352:


 Summary: OpenAPI spec is lacking mappings for discriminator 
properties
 Key: FLINK-33352
 URL: https://issues.apache.org/jira/browse/FLINK-33352
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / REST
Affects Versions: 1.17.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.17.2, 1.19.0, 1.18.1






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Request to release flink 1.6.3

2023-10-24 Thread vikas patil
Thanks Rui for the response. Yes, I indeed meant 1.16.3. Sorry about the
typo. Thanks for taking it up.

-Vikas

On Tue, Oct 24, 2023 at 9:27 AM Rui Fan <1996fan...@gmail.com> wrote:

> Hi Vikas,
>
> Thanks for your feedback!
>
> Do you mean flink 1.16.3 instead of 1.6.3?
>
> The 1.16.2 and 1.17.1 were released on 2023-05-25,
> it’s been 5 months. And the flink community has fixed
> many bugs in the past 5 months. Usually, there is a
> fix(minor) version every three or four months, so I propose
> to release 1.16.3 and 1.17.2 now.
>
> If the community agrees to create this new patch release, I could
> volunteer as the release manager for one of the 1.16.3 or 1.17.2.
>
> Looking forward to feedback from the community, thank you
>
> [1]
>
> https://flink.apache.org/2023/05/25/apache-flink-1.16.2-release-announcement/
> [2]
>
> https://flink.apache.org/2023/05/25/apache-flink-1.17.1-release-announcement/
>
> Best,
> Rui
>
> On Tue, Oct 24, 2023 at 9:50 PM vikas patil 
> wrote:
>
> > Hello All,
> >
> > Facing this FLINK-28185 <
> https://issues.apache.org/jira/browse/FLINK-28185
> > >
> > issue for one of the flink jobs. We are running flink version 1.6.1 but
> it
> > looks like the backport  to
> > 1.6
> > was never released as 1.6.3. The latest that was released is 1.6.2
> > . By any chance we can get
> > the 1.6.3 released ?
> >
> > Also we use the official flink docker 
> > image. Not sure if that needs to be updated as well manually. Thanks.
> >
> > -Vikas
> >
>


[jira] [Created] (FLINK-33353) SQL fails because "TimestampType.kind" is not serialized

2023-10-24 Thread Ferenc Csaky (Jira)
Ferenc Csaky created FLINK-33353:


 Summary: SQL fails because "TimestampType.kind" is not serialized 
 Key: FLINK-33353
 URL: https://issues.apache.org/jira/browse/FLINK-33353
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.18.0
Reporter: Ferenc Csaky


We have a custom persistent catalog store, which stores tables, views, etc. in a 
DB. In our application, it is required to utilize the serialized formats of 
entities; the same applies to Hive, as it also functions as a persistent 
catalog.

Take the following example SQL:

{code:sql}
CREATE TABLE IF NOT EXISTS `txn_gen` (
  `txn_id` INT,
  `amount` INT,
  `ts` TIMESTAMP(3),
   WATERMARK FOR `ts` AS `ts` - INTERVAL '1' SECOND
) WITH (
  'connector' = 'datagen',
  'fields.txn_id.min' = '1',
  'fields.txn_id.max' = '5',
  'rows-per-second' = '1'
);

CREATE VIEW IF NOT EXISTS aggr_ten_sec AS
  SELECT txn_id,
 TUMBLE_ROWTIME(`ts`, INTERVAL '10' SECOND) AS w_row_time,
 COUNT(txn_id) AS txn_count
FROM txn_gen
GROUP BY txn_id, TUMBLE(`ts`, INTERVAL '10' SECOND);

SELECT txn_id,
   SUM(txn_count),
   TUMBLE_START(w_row_time, INTERVAL '20' SECOND) AS total_txn_count
  FROM aggr_ten_sec
  GROUP BY txn_id, TUMBLE(w_row_time, INTERVAL '20' SECOND);
{code}

This will work without any problems when we simply execute it in a 
{{TableEnvironment}}, but it fails with the below error when we try to execute 
the query based on the serialized table metadata.
{code}
org.apache.flink.table.api.TableException: Window aggregate can only be defined 
over a time attribute column, but TIMESTAMP(3) encountered.
{code}

If there is a view that requires ROWTIME, it will be lost, and we 
cannot recreate the same query from the serialized entities.

Currently, in {{TimestampType}} the "kind" field is deliberately annotated as 
{{@Internal}} and is not serialized, which breaks this functionality.
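
A minimal snippet illustrating the round-trip loss (assumed API behavior; the 
types are from {{org.apache.flink.table.types.logical}}):

{code:java}
// The rowtime kind shows up in the summary string but not in the serializable
// string that a persistent catalog stores, so the time attribute is lost.
import org.apache.flink.table.types.logical.TimestampKind;
import org.apache.flink.table.types.logical.TimestampType;

public class TimestampKindLossExample {
    public static void main(String[] args) {
        TimestampType rowtime = new TimestampType(true, TimestampKind.ROWTIME, 3);
        System.out.println(rowtime.asSummaryString());      // TIMESTAMP(3) *ROWTIME*
        System.out.println(rowtime.asSerializableString()); // TIMESTAMP(3)
    }
}
{code}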




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Request to release flink 1.6.3

2023-10-24 Thread Matthias Pohl
1.17.2 would cover roughly 78 issues [1]. Flink 1.16.3 would cover 50
fixed issues [2]. There would be a discussion anyway, in the context of
the ongoing 1.18.0 release, on whether we should do a 1.16.3
flush-out-all-the-leftover-issues minor release.

+1 I think it makes sense to prepare new releases. Thanks for volunteering,
Rui Fan

Matthias

[1]
https://issues.apache.org/jira/browse/FLINK-33352?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.17.2%20ORDER%20BY%20updatedDate
[2]
https://issues.apache.org/jira/browse/FLINK-33116?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.16.3%20ORDER%20BY%20updated%20


On Tue, Oct 24, 2023 at 4:53 PM vikas patil  wrote:

> Thanks Rui for the response. Yes, I indeed meant 1.16.3. Sorry about the
> typo. Thanks for taking it up.
>
> -Vikas
>
> On Tue, Oct 24, 2023 at 9:27 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> > Hi Vikas,
> >
> > Thanks for your feedback!
> >
> > Do you mean flink 1.16.3 instead of 1.6.3?
> >
> > The 1.16.2 and 1.17.1 were released on 2023-05-25,
> > it’s been 5 months. And the flink community has fixed
> > many bugs in the past 5 months. Usually, there is a
> > fix(minor) version every three or four months, so I propose
> > to release 1.16.3 and 1.17.2 now.
> >
> > If the community agrees to create this new patch release, I could
> > volunteer as the release manager for one of the 1.16.3 or 1.17.2.
> >
> > Looking forward to feedback from the community, thank you
> >
> > [1]
> >
> >
> https://flink.apache.org/2023/05/25/apache-flink-1.16.2-release-announcement/
> > [2]
> >
> >
> https://flink.apache.org/2023/05/25/apache-flink-1.17.1-release-announcement/
> >
> > Best,
> > Rui
> >
> > On Tue, Oct 24, 2023 at 9:50 PM vikas patil 
> > wrote:
> >
> > > Hello All,
> > >
> > > Facing this FLINK-28185 <
> > https://issues.apache.org/jira/browse/FLINK-28185
> > > >
> > > issue for one of the flink jobs. We are running flink version 1.6.1 but
> > it
> > > looks like the backport 
> to
> > > 1.6
> > > was never released as 1.6.3. The latest that was released is 1.6.2
> > > . By any chance we can
> get
> > > the 1.6.3 released ?
> > >
> > > Also we use the official flink docker  >
> > > image. Not sure if that needs to be updated as well manually. Thanks.
> > >
> > > -Vikas
> > >
> >
>


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jane Chan
>
> I believe that the configuration "table.optimizer.source.predicate" has a
> higher priority at the planner level than the configuration at the source
> level,
> and it seems easy to implement now.
>

Correct me if I'm wrong, but I think the fine-grained configuration
"scan.filter-push-down.enabled" should have a higher priority because the
default value of "table.optimizer.source.predicate" is true. As a result,
turning off filter push-down for a specific source will not take effect
unless the default value of "table.optimizer.source.predicate" is changed
to false, or, alternatively, let users manually set
"table.optimizer.source.predicate" to false first and then selectively
enable filter push-down for the desired sources, which is less intuitive.
WDYT?

Best,
Jane

On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun 
wrote:

> Thanks Jane,
>
> I believe that the configuration "table.optimizer.source.predicate" has a
> higher priority at the planner level than the configuration at the source
> level,
> and it seems easy to implement now.
>
> Best,
> Jiabao
>
>
> > On Oct 24, 2023 at 17:36, Jane Chan wrote:
> >
> > Hi Jiabao,
> >
> > Thanks for driving this discussion. I have a small question: will
> > "scan.filter-push-down.enabled" take precedence over
> > "table.optimizer.source.predicate" when the two parameters might conflict
> > with each other?
> >
> > Best,
> > Jane
> >
> > On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun  .invalid>
> > wrote:
> >
> >> Thanks Jark,
> >>
> >> If we only add configuration without adding the enableFilterPushDown
> >> method in the SupportsFilterPushDown interface,
> >> each connector would have to handle the same logic in the applyFilters
> >> method to determine whether filter pushdown is needed.
> >> This would increase complexity and violate the original behavior of the
> >> applyFilters method.
> >>
> >> On the contrary, we only need to pass the configuration parameter in the
> >> newly added enableFilterPushDown method
> >> to decide whether to perform predicate pushdown.
> >>
> >> I think this approach would be clearer and simpler.
> >> WDYT?
> >>
> >> Best,
> >> Jiabao
> >>
> >>
> >>> On Oct 24, 2023 at 16:58, Jark Wu wrote:
> >>>
> >>> Hi Jiabao,
> >>>
> >>> I think the current interface can already satisfy your requirements.
> >>> The connector can reject all the filters by returning the input filters
> >>> as `Result#remainingFilters`.
> >>>
> >>> So maybe we don't need to introduce a new method to disable
> >>> pushdown, but just introduce an option for the specific connector.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> >>>
>  Thanks @Jiabao for kicking off this discussion.
> 
>  Could you add a section to explain the difference between proposed
>  connector level config `scan.filter-push-down.enabled` and existing
> >> query
>  level config `table.optimizer.source.predicate-pushdown-enabled` ?
> 
>  Best,
>  Leonard
> 
> > On Oct 24, 2023 at 4:18 PM, Jiabao Sun wrote:
> >
> > Hi Devs,
> >
> > I would like to start a discussion on FLIP-377: support configuration
> >> to
>  disable filter pushdown for Table/SQL Sources[1].
> >
> > Currently, Flink Table/SQL does not expose fine-grained control for
>  users to enable or disable filter pushdown.
> > However, filter pushdown has some side effects, such as additional
>  computational pressure on external systems.
> > Moreover, improper queries can lead to issues such as full table
> scans,
>  which in turn can impact the stability of external systems.
> >
> > Suppose we have an SQL query with two sources: Kafka and a database.
> > The database is sensitive to pressure, and we want to configure it to
>  not perform filter pushdown to the database source.
> > However, we still want to perform filter pushdown to the Kafka source
> >> to
>  decrease network IO.
> >
> > I propose to support configuration to disable filter push down for
>  Table/SQL sources to let users decide whether to perform filter
> pushdown.
> >
> > Looking forward to your feedback.
> >
> > [1]
> 
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
> >
> > Best,
> > Jiabao
> 
> 
> >>
> >>
>
>


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Thanks Jane for the feedback.

The default value of "table.optimizer.source.predicate" is true, which means 
that by default 
predicate pushdown is permitted for all sources.

Therefore, disabling filter pushdown for individual sources can still take effect.


Best,
Jiabao


> On Oct 24, 2023 at 23:52, Jane Chan wrote:
> 
>> 
>> I believe that the configuration "table.optimizer.source.predicate" has a
>> higher priority at the planner level than the configuration at the source
>> level,
>> and it seems easy to implement now.
>> 
> 
> Correct me if I'm wrong, but I think the fine-grained configuration
> "scan.filter-push-down.enabled" should have a higher priority because the
> default value of "table.optimizer.source.predicate" is true. As a result,
> turning off filter push-down for a specific source will not take effect
> unless the default value of "table.optimizer.source.predicate" is changed
> to false, or, alternatively, let users manually set
> "table.optimizer.source.predicate" to false first and then selectively
> enable filter push-down for the desired sources, which is less intuitive.
> WDYT?
> 
> Best,
> Jane
> 
> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun 
> wrote:
> 
>> Thanks Jane,
>> 
>> I believe that the configuration "table.optimizer.source.predicate" has a
>> higher priority at the planner level than the configuration at the source
>> level,
>> and it seems easy to implement now.
>> 
>> Best,
>> Jiabao
>> 
>> 
>>> On Oct 24, 2023 at 17:36, Jane Chan wrote:
>>> 
>>> Hi Jiabao,
>>> 
>>> Thanks for driving this discussion. I have a small question: will
>>> "scan.filter-push-down.enabled" take precedence over
>>> "table.optimizer.source.predicate" when the two parameters might conflict
>>> with each other?
>>> 
>>> Best,
>>> Jane
>>> 
>>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun > .invalid>
>>> wrote:
>>> 
 Thanks Jark,
 
 If we only add configuration without adding the enableFilterPushDown
 method in the SupportsFilterPushDown interface,
 each connector would have to handle the same logic in the applyFilters
 method to determine whether filter pushdown is needed.
 This would increase complexity and violate the original behavior of the
 applyFilters method.
 
 On the contrary, we only need to pass the configuration parameter in the
 newly added enableFilterPushDown method
 to decide whether to perform predicate pushdown.
 
 I think this approach would be clearer and simpler.
 WDYT?
 
 Best,
 Jiabao
 
 
> On Oct 24, 2023 at 16:58, Jark Wu wrote:
> 
> Hi Jiabao,
> 
> I think the current interface can already satisfy your requirements.
> The connector can reject all the filters by returning the input filters
> as `Result#remainingFilters`.
> 
> So maybe we don't need to introduce a new method to disable
> pushdown, but just introduce an option for the specific connector.
> 
> Best,
> Jark
> 
> On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> 
>> Thanks @Jiabao for kicking off this discussion.
>> 
>> Could you add a section to explain the difference between proposed
>> connector level config `scan.filter-push-down.enabled` and existing
 query
>> level config `table.optimizer.source.predicate-pushdown-enabled` ?
>> 
>> Best,
>> Leonard
>> 
>>> On Oct 24, 2023 at 4:18 PM, Jiabao Sun wrote:
>>> 
>>> Hi Devs,
>>> 
>>> I would like to start a discussion on FLIP-377: support configuration
 to
>> disable filter pushdown for Table/SQL Sources[1].
>>> 
>>> Currently, Flink Table/SQL does not expose fine-grained control for
>> users to enable or disable filter pushdown.
>>> However, filter pushdown has some side effects, such as additional
>> computational pressure on external systems.
>>> Moreover, improper queries can lead to issues such as full table
>> scans,
>> which in turn can impact the stability of external systems.
>>> 
>>> Suppose we have an SQL query with two sources: Kafka and a database.
>>> The database is sensitive to pressure, and we want to configure it to
>> not perform filter pushdown to the database source.
>>> However, we still want to perform filter pushdown to the Kafka source
 to
>> decrease network IO.
>>> 
>>> I propose to support configuration to disable filter push down for
>> Table/SQL sources to let users decide whether to perform filter
>> pushdown.
>>> 
>>> Looking forward to your feedback.
>>> 
>>> [1]
>> 
 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=276105768
>>> 
>>> Best,
>>> Jiabao
>> 
>> 
 
 
>> 
>> 
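
To make the intended interplay concrete, the effective decision can be sketched
as a simple conjunction of the two flags. This is a sketch of the assumed
semantics, not actual planner code; only `OptimizerConfigOptions` and its
`TABLE_OPTIMIZER_SOURCE_PREDICATE_PUSHDOWN_ENABLED` option are real Flink API:

```java
// Pushdown is applied only when the planner-level flag and the per-connector
// option (defaulting to true) are both enabled.
import org.apache.flink.configuration.ReadableConfig;
import org.apache.flink.table.api.config.OptimizerConfigOptions;

final class FilterPushDownDecision {

    static boolean shouldPushDown(ReadableConfig plannerConfig, boolean sourceOptionEnabled) {
        boolean plannerEnabled =
                plannerConfig.get(
                        OptimizerConfigOptions.TABLE_OPTIMIZER_SOURCE_PREDICATE_PUSHDOWN_ENABLED);
        return plannerEnabled && sourceOptionEnabled;
    }
}
```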



Re: [DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-10-24 Thread Tzu-Li (Gordon) Tai
Hi Peter,

Thanks a lot for starting this FLIP!

I agree that the current TwoPhaseCommittingSink interface is limiting in
that it assumes 1) committers have the same parallelism as writers, and 2)
writers immediately produce finalized committables. This FLIP captures the
problem pretty well, and I do think there are use cases for a more general
flexible interface outside of the Iceberg connector as well.

In terms of the abstraction layering, I was wondering if you've also
considered this approach which I've quickly sketched in my local fork:
https://github.com/tzulitai/flink/commit/e84e3ac57ce023c35037a8470fefdfcad877bcae

With this approach, the sink translator always expects that 2PC sink
implementations extend `TwoPhaseCommittingSinkBase` and therefore
assumes that a pre-commit topology always exists. For simple 2PC sinks that
do not require transforming committables, we would ship (for convenience)
an additional `SimpleTwoPhaseCommittingSinkBase` where the pre-commit
topology is a no-op passthrough. With that we avoid some of the
"boilerplate" where 2PC sinks with a pre-commit topology require
implementing two interfaces, as proposed in the FLIP.

Quick thought: regarding the awkwardness you mention in the end with sinks
that have post commit topologies, but no pre-commit topologies -
Alternative to the mixin approach in the FLIP, it might make sense to
consider a builder approach for constructing 2PC sinks, which should also
give users type-safety at compile time while not having the awkwardness
with all the types involved. Something along the lines of:

```
new TwoPhaseCommittingSinkBuilder(writerSupplier, committerSupplier)
    .withPreCommitTopology(writerResultsStream -> ...)  // Function<DataStream<WriterResultT>, DataStream<CommT>>
    .withPostCommitTopology(committablesStream -> ...)  // Consumer<DataStream<CommT>>
    .withPreWriteTopology(inputStream -> ...)           // Function<DataStream<InputT>, DataStream<InputT>>
    .build();
```

We could probably do some validation in the build() method, e.g. if writer
/ committer have different types, then clearly a pre-commit topology should
have been defined to transform intermediate writer results.

Obviously, this would take generalization of the TwoPhaseCommittingSink
interface to the extreme, where we just have one interface with all of the
pre-commit / pre-write / post-commit methods built-in, and users would use
the builder as the entrypoint to opt-in / opt-out as needed. The upside is
that the SinkTransformationTranslator logic will become much less cluttered.

I'll need to experiment the builder approach a bit more to see if it makes
sense at all, but wanted to throw out the idea earlier to see what you
think.

On Mon, Oct 9, 2023 at 6:59 AM Péter Váry 
wrote:

> Hi Team,
>
> Did some experimenting and found the originally proposed solution to be a
> bit awkward for cases where WithPostCommitTopology was needed but we do not
> need the WithPreCommitTopology transformation.
> The flexibility of the new API would be better if we used a mixin-like
> approach. The interfaces would only be used to define the specific required
> methods, and they would not need to extend the original
> TwoPhaseCommittingSink interface too.
>
> Since the interfaces WithPreCommitTopology and the WithPostCommitTopology
> interfaces are still Experimental, after talking to Gyula offline, I have
> updated the FLIP to use this new approach.
>
> Any comments, thoughts are welcome.
>
> Thanks,
> Peter
>
> Péter Váry  ezt írta (időpont: 2023. okt. 5.,
> Cs, 16:04):
>
> > Hi Team,
> >
> > In my previous email[1] I have described our challenges migrating the
> > existing Iceberg SinkFunction based implementation, to the new SinkV2
> based
> > implementation.
> >
> > As a result of the discussion around that topic, I have created the
> > FLIP-371 [2] to address the Committer related changes, and now I have
> > created a companion FLIP-372 [3] to address the WithPreCommitTopology
> > related issues.
> >
> > FLIP-372: Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the
> > type of the Committable
> >
> > The main goal of the FLIP-372 is to extend the currently existing
> > TwoPhaseCommittingSink API by adding the possibility to have a
> > PreCommitTopology where the input of and the output types of the pre
> commit
> > transformation are different.
> >
> > Here is the FLIP: FLIP-372: Allow TwoPhaseCommittingSink
> > WithPreCommitTopology to alter the type of the Committable
> > <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Allow+TwoPhaseCommittingSink+WithPreCommitTopology+to+alter+the+type+of+the+Committable
> >
> >
> > Please use this thread to discuss the FLIP related questions, proposals,
> > and feedback.
> >
> > Thanks,
> > Peter
> >
> > - [1] https://lists.apache.org/thread/h3kg7jcgjstpvwlhnofq093vk93ylgsn
> > - [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-371%3A+Provide+initialization+context+for+Committer+creation+in+TwoPhaseCommittingSink
> > - [3]
> >
> https://cwiki.apache.
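
For illustration, a type-changing pre-commit mixin along these lines might look
like the following sketch; the interface name and generics are illustrative and
not the final FLIP-372 API, while `CommittableMessage` and `DataStream` are
existing Flink classes:

```java
// A hypothetical mixin whose pre-commit topology may change the committable type,
// e.g. aggregating per-writer results into a single per-checkpoint committable.
import org.apache.flink.streaming.api.connector.sink2.CommittableMessage;
import org.apache.flink.streaming.api.datastream.DataStream;

public interface WithTypeChangingPreCommitTopology<WriterResultT, CommT> {

    // Transforms writer results (e.g. data files) into the committables
    // (e.g. a manifest) that the committer will eventually receive.
    DataStream<CommittableMessage<CommT>> addPreCommitTopology(
            DataStream<CommittableMessage<WriterResultT>> writerResults);
}
```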

Re: [DISCUSS] Allow TwoPhaseCommittingSink WithPreCommitTopology to alter the type of the Committable

2023-10-24 Thread Tzu-Li (Gordon) Tai
Also one more meta comment regarding interplay of interface changes across
FLIP-371 and FLIP-372:

With FLIP-371, we're already thinking about some signature changes on the
TwoPhaseCommittingSink interface, such as 1)  introducing
`createCommitter(CommitterInitContext)`, and 2) renaming the writer side
`InitContext` to `WriterInitContext` to align on naming conventions with
the new `CommitterInitContext`.

Instead of trying to adapt the existing TwoPhaseCommittingSink interface,
since we're likely going to have to introduce a completely new interface with
FLIP-372 anyway and deprecate the existing TwoPhaseCommittingSink, what do you
think about introducing the above signature changes only in the new
interface and just leaving the old one as is? Having access to the new
features (transforming committables / runtime context for committer
initialization) would be motivation for implementations to migrate as soon
as possible.

On Tue, Oct 24, 2023 at 5:00 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi Peter,
>
> Thanks a lot for starting this FLIP!
>
> I agree that the current TwoPhaseCommittingSink interfaces is limiting in
> that it assumes 1) committers have the same parallelism as writers, and 2)
> writers immediately produce finalized committables. This FLIP captures the
> problem pretty well, and I do think there are use cases for a more general
> flexible interface outside of the Iceberg connector as well.
>
> In terms of the abstraction layering, I was wondering if you've also
> considered this approach which I've quickly sketched in my local fork:
> https://github.com/tzulitai/flink/commit/e84e3ac57ce023c35037a8470fefdfcad877bcae
>
> With this approach, the sink translator always expect that 2PC sink
> implementations should extend `TwoPhaseCommittingSinkBase` and therefore
> assumes that a pre-commit topology always exist. For simple 2PC sinks that
> do not require transforming committables, we would ship (for convenience)
> an additional `SimpleTwoPhaseCommittingSinkBase` where the pre-commit
> topology is a no-op passthrough. With that we avoid some of the
> "boilerplates" where 2PC sinks with pre-commit topology requires
> implementing two interfaces, as proposed in the FLIP.
>
> Quick thought: regarding the awkwardness you mention in the end with sinks
> that have post commit topologies, but no pre-commit topologies -
> Alternative to the mixin approach in the FLIP, it might make sense to
> consider a builder approach for constructing 2PC sinks, which should also
> give users type-safety at compile time while not having the awkwardness
> with all the types involved. Something along the lines of:
>
> ```
> new TwoPhaseCommittingSinkBuilder(writerSupplier, committerSupplier)
>     .withPreCommitTopology(writerResultsStream -> ...)   // Function<DataStream<WriterResultT>, DataStream<CommT>>
>     .withPostCommitTopology(committablesStream -> ...)   // Consumer<DataStream<CommT>>
>     .withPreWriteTopology(inputStream -> ...)            // Function<DataStream<InputT>, DataStream<InputT>>
>     .build();
> ```
>
> We could probably do some validation in the build() method, e.g. if writer
> / committer have different types, then clearly a pre-commit topology should
> have been defined to transform intermediate writer results.
>
> Obviously, this would take generalization of the TwoPhaseCommittingSink
> interface to the extreme, where we just have one interface with all of the
> pre-commit / pre-write / post-commit methods built-in, and users would use
> the builder as the entrypoint to opt-in / opt-out as needed. The upside is
> that the SinkTransformationTranslator logic will become much less cluttered.
>
> I'll need to experiment the builder approach a bit more to see if it makes
> sense at all, but wanted to throw out the idea earlier to see what you
> think.
>
> On Mon, Oct 9, 2023 at 6:59 AM Péter Váry 
> wrote:
>
>> Hi Team,
>>
>> Did some experimenting and found the originally proposed solution to be a
>> bit awkward for cases where WithPostCommitTopology was needed but we do
>> not
>> need the WithPreCommitTopology transformation.
>> The flexibility of the new API would be better if we would use a mixin
>> like
>> approach. The interfaces would only be used to define the specific
>> required
>> methods, and they would not need to extend the original
>> TwoPhaseCommittingSink interface too.
>>
>> Since the interfaces WithPreCommitTopology and the WithPostCommitTopology
>> interfaces are still Experimental, after talking to Gyula offline, I have
>> updated the FLIP to use this new approach.
>>
>> Any comments, thoughts are welcome.
>>
>> Thanks,
>> Peter
>>
>> On Thu, Oct 5, 2023 at 16:04, Péter Váry wrote:
>>
>> > Hi Team,
>> >
>> > In my previous email[1] I have described our challenges migrating the
>> > existing Iceberg SinkFunction based implementation, to the new SinkV2
>> based
>> > implementation.
>> >
>> > As a result of the discussion around that topic, I have created the
>> > FLIP-371 [2] to address the Committer related changes, and now I have
>> > create

Re: [VOTE] Apache Flink Kubernetes Operator Release 1.6.1, release candidate #1

2023-10-24 Thread Thomas Weise
+1 (binding)

- Verified checksums, signatures, source release content
- Run unit tests

Side note: `mvn clean verify` fails with the Java 17 compiler. While the
build target version may be 11, it should preferably be possible to build the
source with a higher JDK version as well.

 Caused by: java.lang.IllegalAccessError: class
com.google.googlejavaformat.java.RemoveUnusedImports (in unnamed module
@0x44f433db) cannot access class com.sun.tools.javac.util.Context (in
module jdk.compiler) because module jdk.compiler does not export
com.sun.tools.javac.util to unnamed module @0x44f433db

at
com.google.googlejavaformat.java.RemoveUnusedImports.removeUnusedImports(RemoveUnusedImports.java:187)

Thanks,
Thomas


On Sat, Oct 21, 2023 at 7:35 AM Rui Fan <1996fan...@gmail.com> wrote:

> Hi Everyone,
>
> Please review and vote on the release candidate #1 for the version 1.6.1 of
> Apache Flink Kubernetes Operator,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> **Release Overview**
>
> As an overview, the release consists of the following:
> a) Kubernetes Operator canonical source distribution (including the
> Dockerfile), to be deployed to the release repository at dist.apache.org
> b) Kubernetes Operator Helm Chart to be deployed to the release repository
> at dist.apache.org
> c) Maven artifacts to be deployed to the Maven Central Repository
> d) Docker image to be pushed to dockerhub
>
> **Staging Areas to Review**
>
> The staging areas containing the above mentioned artifacts are as follows,
> for your review:
> * All artifacts for a,b) can be found in the corresponding dev repository
> at dist.apache.org [1]
> * All artifacts for c) can be found at the Apache Nexus Repository [2]
> * The docker image for d) is staged on github [3]
>
> All artifacts are signed with the
> key B2D64016B940A7E0B9B72E0D7D0528B28037D8BC [4]
>
> Other links for your review:
> * source code tag "release-1.6.1-rc1" [5]
> * PR to update the website Downloads page to
> include Kubernetes Operator links [6]
> * PR to update the doc version of flink-kubernetes-operator[7]
>
> **Vote Duration**
>
> The voting time will run for at least 72 hours.
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> **Note on Verification**
>
> You can follow the basic verification guide here[8].
> Note that you don't need to verify everything yourself, but please make
> note of what you have tested together with your +- vote.
>
> [1]
>
> https://dist.apache.org/repos/dist/dev/flink/flink-kubernetes-operator-1.6.1-rc1/
> [2]
> https://repository.apache.org/content/repositories/orgapacheflink-1663/
> [3]
>
> https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/139454270?tag=51eeae1
> [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> [5]
> https://github.com/apache/flink-kubernetes-operator/tree/release-1.6.1-rc1
> [6] https://github.com/apache/flink-web/pull/690
> [7] https://github.com/apache/flink-kubernetes-operator/pull/687
> [8]
>
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Kubernetes+Operator+Release
>
> Best,
> Rui
>


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jark Wu
Thank you for updating the FLIP, Jiabao.

The FLIP looks good to me.

Best,
Jark

On Wed, 25 Oct 2023 at 00:42, Jiabao Sun 
wrote:

> Thanks Jane for the feedback.
>
> The default value of "table.optimizer.source.predicate" is true that means
> by default,
> allowing predicate pushdown to all sources is permitted.
>
> Therefore, disabling the pushdown filter for individual sources can take
> effect.
>
>
> Best,
> Jiabao
>
>
> > > On 2023-10-24 23:52, Jane Chan wrote:
> >
> >>
> >> I believe that the configuration "table.optimizer.source.predicate" has
> a
> >> higher priority at the planner level than the configuration at the
> source
> >> level,
> >> and it seems easy to implement now.
> >>
> >
> > Correct me if I'm wrong, but I think the fine-grained configuration
> > "scan.filter-push-down.enabled" should have a higher priority because the
> > default value of "table.optimizer.source.predicate" is true. As a result,
> > turning off filter push-down for a specific source will not take effect
> > unless the default value of "table.optimizer.source.predicate" is changed
> > to false, or, alternatively, let users manually set
> > "table.optimizer.source.predicate" to false first and then selectively
> > enable filter push-down for the desired sources, which is less intuitive.
> > WDYT?
> >
> > Best,
> > Jane
> >
> > > On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun
> > wrote:
> >
> >> Thanks Jane,
> >>
> >> I believe that the configuration "table.optimizer.source.predicate" has
> a
> >> higher priority at the planner level than the configuration at the
> source
> >> level,
> >> and it seems easy to implement now.
> >>
> >> Best,
> >> Jiabao
> >>
> >>
> > >>> On 2023-10-24 17:36, Jane Chan wrote:
> >>>
> >>> Hi Jiabao,
> >>>
> >>> Thanks for driving this discussion. I have a small question that will
> >>> "scan.filter-push-down.enabled" take precedence over
> >>> "table.optimizer.source.predicate" when the two parameters might
> conflict
> >>> each other?
> >>>
> >>> Best,
> >>> Jane
> >>>
> >>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun
> >>> wrote:
> >>>
>  Thanks Jark,
> 
>  If we only add configuration without adding the enableFilterPushDown
>  method in the SupportsFilterPushDown interface,
>  each connector would have to handle the same logic in the applyFilters
>  method to determine whether filter pushdown is needed.
>  This would increase complexity and violate the original behavior of
> the
>  applyFilters method.
> 
>  On the contrary, we only need to pass the configuration parameter in
> the
>  newly added enableFilterPushDown method
>  to decide whether to perform predicate pushdown.
> 
>  I think this approach would be clearer and simpler.
>  WDYT?
> 
>  Best,
>  Jiabao
> 
> 
> > On 2023-10-24 16:58, Jark Wu wrote:
> >
> > Hi JIabao,
> >
> > I think the current interface can already satisfy your requirements.
> > The connector can reject all the filters by returning the input
> filters
> > as `Result#remainingFilters`.
> >
> > So maybe we don't need to introduce a new method to disable
> > pushdown, but just introduce an option for the specific connector.
> >
> > Best,
> > Jark
> >
> > On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> >
> >> Thanks @Jiabao for kicking off this discussion.
> >>
> >> Could you add a section to explain the difference between proposed
> >> connector level config `scan.filter-push-down.enabled` and existing
>  query
> >> level config `table.optimizer.source.predicate-pushdown-enabled` ?
> >>
> >> Best,
> >> Leonard
> >>
> > >>> On 2023-10-24 16:18, Jiabao Sun wrote:
> >>>
> >>> Hi Devs,
> >>>
> >>> I would like to start a discussion on FLIP-377: support
> configuration
>  to
> >> disable filter pushdown for Table/SQL Sources[1].
> >>>
> >>> Currently, Flink Table/SQL does not expose fine-grained control for
> >> users to enable or disable filter pushdown.
> >>> However, filter pushdown has some side effects, such as additional
> >> computational pressure on external systems.
> >>> Moreover, Improper queries can lead to issues such as full table
> >> scans,
> >> which in turn can impact the stability of external systems.
> >>>
> >>> Suppose we have an SQL query with two sources: Kafka and a
> database.
> >>> The database is sensitive to pressure, and we want to configure it
> to
> >> not perform filter pushdown to the database source.
> >>> However, we still want to perform filter pushdown to the Kafka
> source
>  to
> >> decrease network IO.
> >>>
> >>> I propose to support configuration to disable filter push down for
> >> Table/SQL sources to let user decide whether to perform filter
> >> pushdown.
> >>>
> >>> Looking forward to your feedback.
> >>>
> >>> [1]
> >>
> 
> >>
> https

[jira] [Created] (FLINK-33354) Reuse the TaskInformation for multiple slots

2023-10-24 Thread Rui Fan (Jira)
Rui Fan created FLINK-33354:
---

 Summary: Reuse the TaskInformation for multiple slots
 Key: FLINK-33354
 URL: https://issues.apache.org/jira/browse/FLINK-33354
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Affects Versions: 1.17.1, 1.18.0
Reporter: Rui Fan
Assignee: Rui Fan


The background is similar to FLINK-33315.

Consider a Hive table with a lot of data, where HiveSource#partitionBytes is 281 MB.
When slotPerTM = 4, one TM will run 4 HiveSources at the same time.

 

How does the TaskExecutor submit a large task?
 # TaskExecutor#loadBigData will read all bytes from the file into a 
SerializedValue<TaskInformation>
 ** The SerializedValue<TaskInformation> holds a byte[]
 ** It will cost heap memory
 ** It will be greater than 281 MB, because it stores not only 
HiveSource#partitionBytes but also the other information in TaskInformation.
 # Generate the TaskInformation from the SerializedValue<TaskInformation>
 ** TaskExecutor#submitTask calls 
tdd.getSerializedTaskInformation().deserializeValue()
 ** tdd.getSerializedTaskInformation() is a SerializedValue<TaskInformation>
 ** It will generate the TaskInformation
 ** TaskInformation includes the Configuration taskConfiguration
 ** The taskConfiguration includes StreamConfig#SERIALIZEDUDF

 

Based on the above process, TM memory will hold two big byte arrays for each task:
 * The SerializedValue<TaskInformation>
 * The TaskInformation

When one TM runs 4 HiveSources at the same time, it will have 8 big byte arrays.

In our production environment, this situation also frequently leads to TM OOMs.
h2. Solution:

This data is identical across slots because the PermanentBlobKey is the same. We 
can add a cache for it to reduce the memory and CPU cost.
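
A minimal sketch of such a cache (hypothetical class, keyed by the PermanentBlobKey; a real implementation would also need reference counting and eviction tied to task release):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch: share one deserialized value per blob key across all slots of a TM. */
final class BlobValueCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    /** Deserializes at most once per key; subsequent tasks reuse the same instance. */
    V getOrLoad(K blobKey, Function<K, V> deserializer) {
        return cache.computeIfAbsent(blobKey, deserializer);
    }
}
{code}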



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33355) can't reduce the parallelism from 'n' to '1' when recovering through a savepoint.

2023-10-24 Thread zhang (Jira)
zhang created FLINK-33355:
-

 Summary: can't reduce the parallelism from 'n' to '1' when 
recovering through a savepoint.
 Key: FLINK-33355
 URL: https://issues.apache.org/jira/browse/FLINK-33355
 Project: Flink
  Issue Type: Bug
  Components: API / Core
 Environment: flink 1.17.1
Reporter: zhang


If the program includes windowed operators, it is not possible to reduce the 
parallelism of those operators from n to 1 when restarting from a savepoint; 
the restore fails with the following error:
{code:java}
IllegalStateException: Failed to rollback to checkpoint/savepoint Checkpoint 
Metadata. Max parallelism mismatch between checkpoint/savepoint state and new 
program. Cannot map operator 0e059b9f403cf6f35592ab773c9408d4 with max 
parallelism 128 to new program with max parallelism 1. This indicates that the 
program has been changed in a non-compatible way after the 
checkpoint/savepoint. {code}
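
A possible workaround (an assumption based on the error message, not a confirmed fix) is to pin the max parallelism explicitly so that it matches the value recorded in the savepoint:
{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PinnedMaxParallelism {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Pin max parallelism to the value recorded in the savepoint (128 here),
        // so operator parallelism can later be rescaled anywhere within [1, 128].
        env.setMaxParallelism(128);
        env.setParallelism(1);
        // ... build the windowed pipeline ...
    }
}
{code}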



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-370: Support Balanced Tasks Scheduling

2023-10-24 Thread Yuepeng Pan
+1 (non-binding)

Regards,
Yuepeng Pan

On 2023/10/23 08:25:30 xiangyu feng wrote:
> Thanks for driving that.
> +1 (non-binding)
> 
> Regards,
> Xiangyu
> 
On Mon, Oct 23, 2023 at 15:19, Yu Chen wrote:
> 
> > +1 (non-binding)
> >
> > We deeply need this capability to balance Tasks at the Taskmanager level in
> > production, which helps to make a more sufficient usage of Taskmanager
> > resources. Thanks for driving that.
> >
> > Best,
> > Yu Chen
> >
> On Mon, Oct 23, 2023 at 15:08, Yangze Guo wrote:
> >
> > > +1 (binding)
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Mon, Oct 23, 2023 at 12:00 PM Rui Fan <1996fan...@gmail.com> wrote:
> > > >
> > > > +1(binding)
> > > >
> > > > Thanks to Yuepeng and to everyone who participated in the discussion!
> > > >
> > > > Best,
> > > > Rui
> > > >
> > > > On Mon, Oct 23, 2023 at 11:55 AM Roc Marshal  wrote:
> > > >>
> > > >> Hi all,
> > > >>
> > > >> Thanks for all the feedback on FLIP-370[1][2].
> > > >> I'd like to start a vote for  FLIP-370. The vote will last for at
> > least
> > > 72 hours (Oct. 26th at 10:00 A.M. GMT) unless there is an objection or
> > > insufficient votes.
> > > >>
> > > >> Thanks,
> > > >> Yuepeng Pan
> > > >>
> > > >> [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling
> > > >> [2] https://lists.apache.org/thread/mx3ot0fmk6zr02ccdby0s8oqxqm2pn1x
> > >
> >
> 


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jingsong Li
+1 for this FLIP.

BTW, I think we can add an option for projection push down too.

Yes, we can do all of this in the connector, but a common
implementation helps a lot! And it can introduce a unified option!

Best,
Jingsong

On Wed, Oct 25, 2023 at 10:07 AM Jark Wu  wrote:
>
> Thank you for updating Jiabao,
>
> The FLIP looks good to me.
>
> Best,
> Jark
>
> On Wed, 25 Oct 2023 at 00:42, Jiabao Sun 
> wrote:
>
> > Thanks Jane for the feedback.
> >
> > The default value of "table.optimizer.source.predicate" is true that means
> > by default,
> > allowing predicate pushdown to all sources is permitted.
> >
> > Therefore, disabling the pushdown filter for individual sources can take
> > effect.
> >
> >
> > Best,
> > Jiabao
> >
> >
> > > On 2023-10-24 23:52, Jane Chan wrote:
> > >
> > >>
> > >> I believe that the configuration "table.optimizer.source.predicate" has
> > a
> > >> higher priority at the planner level than the configuration at the
> > source
> > >> level,
> > >> and it seems easy to implement now.
> > >>
> > >
> > > Correct me if I'm wrong, but I think the fine-grained configuration
> > > "scan.filter-push-down.enabled" should have a higher priority because the
> > > default value of "table.optimizer.source.predicate" is true. As a result,
> > > turning off filter push-down for a specific source will not take effect
> > > unless the default value of "table.optimizer.source.predicate" is changed
> > > to false, or, alternatively, let users manually set
> > > "table.optimizer.source.predicate" to false first and then selectively
> > > enable filter push-down for the desired sources, which is less intuitive.
> > > WDYT?
> > >
> > > Best,
> > > Jane
> > >
> > > On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun
> > > wrote:
> > >
> > >> Thanks Jane,
> > >>
> > >> I believe that the configuration "table.optimizer.source.predicate" has
> > a
> > >> higher priority at the planner level than the configuration at the
> > source
> > >> level,
> > >> and it seems easy to implement now.
> > >>
> > >> Best,
> > >> Jiabao
> > >>
> > >>
> > >>> On 2023-10-24 17:36, Jane Chan wrote:
> > >>>
> > >>> Hi Jiabao,
> > >>>
> > >>> Thanks for driving this discussion. I have a small question that will
> > >>> "scan.filter-push-down.enabled" take precedence over
> > >>> "table.optimizer.source.predicate" when the two parameters might
> > conflict
> > >>> each other?
> > >>>
> > >>> Best,
> > >>> Jane
> > >>>
> > >>> On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun
> > >>> wrote:
> > >>>
> >  Thanks Jark,
> > 
> >  If we only add configuration without adding the enableFilterPushDown
> >  method in the SupportsFilterPushDown interface,
> >  each connector would have to handle the same logic in the applyFilters
> >  method to determine whether filter pushdown is needed.
> >  This would increase complexity and violate the original behavior of
> > the
> >  applyFilters method.
> > 
> >  On the contrary, we only need to pass the configuration parameter in
> > the
> >  newly added enableFilterPushDown method
> >  to decide whether to perform predicate pushdown.
> > 
> >  I think this approach would be clearer and simpler.
> >  WDYT?
> > 
> >  Best,
> >  Jiabao
> > 
> > 
> > > On 2023-10-24 16:58, Jark Wu wrote:
> > >
> > > Hi JIabao,
> > >
> > > I think the current interface can already satisfy your requirements.
> > > The connector can reject all the filters by returning the input
> > filters
> > > as `Result#remainingFilters`.
> > >
> > > So maybe we don't need to introduce a new method to disable
> > > pushdown, but just introduce an option for the specific connector.
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> > >
> > >> Thanks @Jiabao for kicking off this discussion.
> > >>
> > >> Could you add a section to explain the difference between proposed
> > >> connector level config `scan.filter-push-down.enabled` and existing
> >  query
> > >> level config `table.optimizer.source.predicate-pushdown-enabled` ?
> > >>
> > >> Best,
> > >> Leonard
> > >>
> > >>> On 2023-10-24 16:18, Jiabao Sun wrote:
> > >>>
> > >>> Hi Devs,
> > >>>
> > >>> I would like to start a discussion on FLIP-377: support
> > configuration
> >  to
> > >> disable filter pushdown for Table/SQL Sources[1].
> > >>>
> > >>> Currently, Flink Table/SQL does not expose fine-grained control for
> > >> users to enable or disable filter pushdown.
> > >>> However, filter pushdown has some side effects, such as additional
> > >> computational pressure on external systems.
> > >>> Moreover, Improper queries can lead to issues such as full table
> > >> scans,
> > >> which in turn can impact the stability of external systems.
> > >>>
> > >>> Suppose we have an SQL query with two sources: K

[jira] [Created] (FLINK-33356) The navigation bar on Flink’s official website is messed up.

2023-10-24 Thread Junrui Li (Jira)
Junrui Li created FLINK-33356:
-

 Summary: The navigation bar on Flink’s official website is messed 
up.
 Key: FLINK-33356
 URL: https://issues.apache.org/jira/browse/FLINK-33356
 Project: Flink
  Issue Type: Bug
  Components: Project Website
Reporter: Junrui Li
 Attachments: image-2023-10-25-11-55-52-653.png

The side navigation bar on the Flink official website at the following link: 
[https://nightlies.apache.org/flink/flink-docs-master/] is rendered 
incorrectly, as shown in the attached screenshot.

!image-2023-10-25-11-55-52-653.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Benchao Li
I agree with Jane that fine-grained configurations should have higher
priority than job level configurations.

For current proposal, we can achieve that:
- Set "table.optimizer.source.predicate" = "true" to enable by
default, and set ""scan.filter-push-down.enabled" = "false" to disable
it per table source
- Set "table.optimizer.source.predicate" = "false" to disable by
default, and set ""scan.filter-push-down.enabled" = "true" to enable
it per table source
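
For illustration, the two levels could be set like this (a sketch: the
job-level key is the existing "table.optimizer.source.predicate-pushdown-enabled",
while "scan.filter-push-down.enabled" is hypothetical until the FLIP lands):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FilterPushdownConfigSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Job-level default: keep planner-side predicate pushdown enabled.
        tEnv.getConfig().getConfiguration()
                .setString("table.optimizer.source.predicate-pushdown-enabled", "true");

        // Per-source opt-out via the proposed (hypothetical) connector option.
        tEnv.executeSql(
                "CREATE TABLE db_source (id INT, name STRING) WITH ("
                        + " 'connector' = 'jdbc',"
                        + " 'url' = 'jdbc:mysql://localhost:3306/db',"
                        + " 'table-name' = 'users',"
                        + " 'scan.filter-push-down.enabled' = 'false'"
                        + ")");
    }
}
```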

On Tue, Oct 24, 2023 at 23:55, Jane Chan wrote:
>
> >
> > I believe that the configuration "table.optimizer.source.predicate" has a
> > higher priority at the planner level than the configuration at the source
> > level,
> > and it seems easy to implement now.
> >
>
> Correct me if I'm wrong, but I think the fine-grained configuration
> "scan.filter-push-down.enabled" should have a higher priority because the
> default value of "table.optimizer.source.predicate" is true. As a result,
> turning off filter push-down for a specific source will not take effect
> unless the default value of "table.optimizer.source.predicate" is changed
> to false, or, alternatively, let users manually set
> "table.optimizer.source.predicate" to false first and then selectively
> enable filter push-down for the desired sources, which is less intuitive.
> WDYT?
>
> Best,
> Jane
>
> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun 
> wrote:
>
> > Thanks Jane,
> >
> > I believe that the configuration "table.optimizer.source.predicate" has a
> > higher priority at the planner level than the configuration at the source
> > level,
> > and it seems easy to implement now.
> >
> > Best,
> > Jiabao
> >
> >
> > > On 2023-10-24 17:36, Jane Chan wrote:
> > >
> > > Hi Jiabao,
> > >
> > > Thanks for driving this discussion. I have a small question that will
> > > "scan.filter-push-down.enabled" take precedence over
> > > "table.optimizer.source.predicate" when the two parameters might conflict
> > > each other?
> > >
> > > Best,
> > > Jane
> > >
> > > On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun
> > > wrote:
> > >
> > >> Thanks Jark,
> > >>
> > >> If we only add configuration without adding the enableFilterPushDown
> > >> method in the SupportsFilterPushDown interface,
> > >> each connector would have to handle the same logic in the applyFilters
> > >> method to determine whether filter pushdown is needed.
> > >> This would increase complexity and violate the original behavior of the
> > >> applyFilters method.
> > >>
> > >> On the contrary, we only need to pass the configuration parameter in the
> > >> newly added enableFilterPushDown method
> > >> to decide whether to perform predicate pushdown.
> > >>
> > >> I think this approach would be clearer and simpler.
> > >> WDYT?
> > >>
> > >> Best,
> > >> Jiabao
> > >>
> > >>
> > >>> On 2023-10-24 16:58, Jark Wu wrote:
> > >>>
> > >>> Hi JIabao,
> > >>>
> > >>> I think the current interface can already satisfy your requirements.
> > >>> The connector can reject all the filters by returning the input filters
> > >>> as `Result#remainingFilters`.
> > >>>
> > >>> So maybe we don't need to introduce a new method to disable
> > >>> pushdown, but just introduce an option for the specific connector.
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>> On Tue, 24 Oct 2023 at 16:38, Leonard Xu  wrote:
> > >>>
> >  Thanks @Jiabao for kicking off this discussion.
> > 
> >  Could you add a section to explain the difference between proposed
> >  connector level config `scan.filter-push-down.enabled` and existing
> > >> query
> >  level config `table.optimizer.source.predicate-pushdown-enabled` ?
> > 
> >  Best,
> >  Leonard
> > 
> > > On 2023-10-24 16:18, Jiabao Sun wrote:
> > >
> > > Hi Devs,
> > >
> > > I would like to start a discussion on FLIP-377: support configuration
> > >> to
> >  disable filter pushdown for Table/SQL Sources[1].
> > >
> > > Currently, Flink Table/SQL does not expose fine-grained control for
> >  users to enable or disable filter pushdown.
> > > However, filter pushdown has some side effects, such as additional
> >  computational pressure on external systems.
> > > Moreover, Improper queries can lead to issues such as full table
> > scans,
> >  which in turn can impact the stability of external systems.
> > >
> > > Suppose we have an SQL query with two sources: Kafka and a database.
> > > The database is sensitive to pressure, and we want to configure it to
> >  not perform filter pushdown to the database source.
> > > However, we still want to perform filter pushdown to the Kafka source
> > >> to
> >  decrease network IO.
> > >
> > > I propose to support configuration to disable filter push down for
> >  Table/SQL sources to let user decide whether to perform filter
> > pushdown.
> > >
> > > Looking forward to your feedback.
> > >
> > > [1]
> > 
> > >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action

[jira] [Created] (FLINK-33357) add Apache Software License 2

2023-10-24 Thread Jira
蔡灿材 created FLINK-33357:
---

 Summary: add Apache Software License 2
 Key: FLINK-33357
 URL: https://issues.apache.org/jira/browse/FLINK-33357
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.6.0
Reporter: 蔡灿材
 Fix For: kubernetes-operator-1.5.0
 Attachments: 2023-10-25 12-08-58屏幕截图.png

Currently, flinkdeployments.flink.apache.org-v1.yml and 
flinksessionjobs.flink.apache.org-v1.yml do not contain the Apache Software 
License 2 header; it should be added.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33358) Flink SQL Client fails to start in Flink on YARN

2023-10-24 Thread Prabhu Joseph (Jira)
Prabhu Joseph created FLINK-33358:
-

 Summary: Flink SQL Client fails to start in Flink on YARN
 Key: FLINK-33358
 URL: https://issues.apache.org/jira/browse/FLINK-33358
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN, Table SQL / Client
Affects Versions: 1.18.0
Reporter: Prabhu Joseph


Flink SQL Client fails to start in Flink on YARN with the error below:
{code:java}
flink-yarn-session -tm 2048 -s 2 -d

/usr/lib/flink/bin/sql-client.sh 

Exception in thread "main" org.apache.flink.table.client.SqlClientException: 
Could not read from command line.
at 
org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:221)
at 
org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:179)
at 
org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:121)
at 
org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:114)
at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:169)
at org.apache.flink.table.client.SqlClient.start(SqlClient.java:118)
at 
org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:228)
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:179)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.flink.table.client.config.SqlClientOptions
at 
org.apache.flink.table.client.cli.parser.SqlClientSyntaxHighlighter.highlight(SqlClientSyntaxHighlighter.java:59)
at 
org.jline.reader.impl.LineReaderImpl.getHighlightedBuffer(LineReaderImpl.java:3633)
at 
org.jline.reader.impl.LineReaderImpl.getDisplayedBufferWithPrompts(LineReaderImpl.java:3615)
at 
org.jline.reader.impl.LineReaderImpl.redisplay(LineReaderImpl.java:3554)
at 
org.jline.reader.impl.LineReaderImpl.doCleanup(LineReaderImpl.java:2340)
at 
org.jline.reader.impl.LineReaderImpl.cleanup(LineReaderImpl.java:2332)
at 
org.jline.reader.impl.LineReaderImpl.readLine(LineReaderImpl.java:626)
at 
org.apache.flink.table.client.cli.CliClient.getAndExecuteStatements(CliClient.java:194)
... 7 more
{code}
The issue is due to the old jline jar from the Hadoop classpath 
(/usr/lib/hadoop-yarn/lib/jline-3.9.0.jar) taking precedence. Flink 1.18 
requires jline-3.21.0.jar.

Placing flink-sql-client.jar (bundled with jline-3.21) before the Hadoop 
classpath fixes the issue.
{code:java}
diff --git a/flink-table/flink-sql-client/bin/sql-client.sh 
b/flink-table/flink-sql-client/bin/sql-client.sh
index 24746c5dc8..4ab8635de2 100755
--- a/flink-table/flink-sql-client/bin/sql-client.sh
+++ b/flink-table/flink-sql-client/bin/sql-client.sh
@@ -89,7 +89,7 @@ if [[ "$CC_CLASSPATH" =~ .*flink-sql-client.*.jar ]]; then
 elif [ -n "$FLINK_SQL_CLIENT_JAR" ]; then
 
 # start client with jar
-exec "$JAVA_RUN" $FLINK_ENV_JAVA_OPTS $JVM_ARGS "${log_setting[@]}" 
-classpath "`manglePathList 
"$CC_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS:$FLINK_SQL_CLIENT_JAR"`" 
org.apache.flink.table.client.SqlClient "$@" --jar "`manglePath 
$FLINK_SQL_CLIENT_JAR`"
+exec "$JAVA_RUN" $FLINK_ENV_JAVA_OPTS $JVM_ARGS "${log_setting[@]}" 
-classpath "`manglePathList 
"$CC_CLASSPATH:$FLINK_SQL_CLIENT_JAR:$INTERNAL_HADOOP_CLASSPATHS`" 
org.apache.flink.table.client.SqlClient "$@" --jar "`manglePath 
$FLINK_SQL_CLIENT_JAR`"
 
 # write error message to stderr
 else
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jiabao Sun
Thanks Benchao for the feedback.

For the current proposal, we recommend keeping the default value of 
"table.optimizer.source.predicate" as true, 
and setting the default value of the newly introduced option 
"scan.filter-push-down.enabled" to true as well.

The main purpose of doing this is to maintain consistency with previous 
versions, as in the old version whether to perform 
filter pushdown depended solely on the 
"table.optimizer.source.predicate" option.
That means that by default, as long as a TableSource implements the 
SupportsFilterPushDown interface, filter pushdown is allowed.
There also seems to be little benefit in changing the default value of 
"table.optimizer.source.predicate" to false.

Regarding the priority of these two configurations, I believe that 
"table.optimizer.source.predicate" 
takes precedence over "scan.filter-push-down.enabled" and it exhibits the 
following behavior.

1. "table.optimizer.source.predicate" = "true" and 
"scan.filter-push-down.enabled" = "true"
This is the default behavior, allowing filter pushdown for sources.

2. "table.optimizer.source.predicate" = "true" and 
"scan.filter-push-down.enabled" = "false"
Allow the planner to perform predicate pushdown, but individual sources do not 
enable filter pushdown.

3. "table.optimizer.source.predicate" = "false"
Predicate pushdown is not allowed for the planner.
Regardless of the value of the "scan.filter-push-down.enabled" configuration, 
filter pushdown is disabled.
In this scenario, the behavior remains consistent with the old version as well.
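
For illustration, a source honoring "scan.filter-push-down.enabled" could simply
return all filters as remaining when the option is false. A minimal sketch
(String stands in for ResolvedExpression; this is not the FLIP's final interface):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: when pushdown is disabled for this source, report all filters as
// "remaining" so the planner keeps evaluating them itself.
class FilterAwareSource {
    private final boolean filterPushDownEnabled; // from scan.filter-push-down.enabled

    FilterAwareSource(boolean filterPushDownEnabled) {
        this.filterPushDownEnabled = filterPushDownEnabled;
    }

    /** Returns the filters the planner must still apply (cf. Result#remainingFilters). */
    List<String> applyFilters(List<String> filters) {
        if (!filterPushDownEnabled) {
            return new ArrayList<>(filters); // reject everything
        }
        // ... push supported predicates to the external system, return the rest ...
        return Collections.emptyList();
    }
}
```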


From an implementation perspective, giving "scan.filter-push-down.enabled" a 
higher priority than "table.optimizer.source.predicate" 
is difficult to achieve now, 
because PushFilterIntoSourceScanRuleBase at the planner level takes 
precedence over the source-level FilterPushDownSpec: 
only when PushFilterIntoSourceScanRuleBase is enabled will 
source-level filter pushdown be performed.

Additionally, in my opinion, there doesn't seem to be much benefit in setting a 
higher priority for "scan.filter-push-down.enabled".
It may instead affect compatibility and increase implementation complexity.

WDYT?

Best,
Jiabao


> On 2023-10-25 11:56, Benchao Li wrote:
> 
> I agree with Jane that fine-grained configurations should have higher
> priority than job level configurations.
> 
> For current proposal, we can achieve that:
> - Set "table.optimizer.source.predicate" = "true" to enable by
> default, and set ""scan.filter-push-down.enabled" = "false" to disable
> it per table source
> - Set "table.optimizer.source.predicate" = "false" to disable by
> default, and set ""scan.filter-push-down.enabled" = "true" to enable
> it per table source
> 
> On Tue, Oct 24, 2023 at 23:55, Jane Chan wrote:
>> 
>>> 
>>> I believe that the configuration "table.optimizer.source.predicate" has a
>>> higher priority at the planner level than the configuration at the source
>>> level,
>>> and it seems easy to implement now.
>>> 
>> 
>> Correct me if I'm wrong, but I think the fine-grained configuration
>> "scan.filter-push-down.enabled" should have a higher priority because the
>> default value of "table.optimizer.source.predicate" is true. As a result,
>> turning off filter push-down for a specific source will not take effect
>> unless the default value of "table.optimizer.source.predicate" is changed
>> to false, or, alternatively, let users manually set
>> "table.optimizer.source.predicate" to false first and then selectively
>> enable filter push-down for the desired sources, which is less intuitive.
>> WDYT?
>> 
>> Best,
>> Jane
>> 
>> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun 
>> wrote:
>> 
>>> Thanks Jane,
>>> 
>>> I believe that the configuration "table.optimizer.source.predicate" has a
>>> higher priority at the planner level than the configuration at the source
>>> level,
>>> and it seems easy to implement now.
>>> 
>>> Best,
>>> Jiabao
>>> 
>>> 
 On 2023-10-24 17:36, Jane Chan wrote:
 
 Hi Jiabao,
 
 Thanks for driving this discussion. I have a small question that will
 "scan.filter-push-down.enabled" take precedence over
 "table.optimizer.source.predicate" when the two parameters might conflict
 each other?
 
 Best,
 Jane
 
 On Tue, Oct 24, 2023 at 5:05 PM Jiabao Sun
 wrote:
 
> Thanks Jark,
> 
> If we only add configuration without adding the enableFilterPushDown
> method in the SupportsFilterPushDown interface,
> each connector would have to handle the same logic in the applyFilters
> method to determine whether filter pushdown is needed.
> This would increase complexity and violate the original behavior of the
> applyFilters method.
> 
> On the contrary, we only need to pass the configuration parameter in the
> newly added enableFilterPushDown method
> to decide whether to perform predicate pushdown.
> 
> I think this approach would be clearer and simpler.
>>

Re: [DISCUSS] FLIP-377: Support configuration to disable filter push down for Table/SQL Sources

2023-10-24 Thread Jane Chan
Hi Jiabao,

Thanks for clarifying this. By "scan.filter-push-down.enabled takes a
higher priority", I meant that this value should be respected whenever it is
set explicitly.

The conclusion that

2. "table.optimizer.source.predicate" = "true" and
> "scan.filter-push-down.enabled" = "false"
> Allow the planner to perform predicate pushdown, but individual sources do
> not enable filter pushdown.
>

This indicates that the option "scan.filter-push-down.enabled = false" for
an individual source connector does indeed override the global-level
planner setting, and thus "has a higher priority".

While for

3. "table.optimizer.source.predicate" = "false"
> Predicate pushdown is not allowed for the planner.
> Regardless of the value of the "scan.filter-push-down.enabled"
> configuration, filter pushdown is disabled.
> In this scenario, the behavior remains consistent with the old version as
> well.
>

I still think "scan.filter-push-down.enabled" should also be respected if
it is enabled for individual connectors. WDYT?

Best,
Jane

On Wed, Oct 25, 2023 at 1:27 PM Jiabao Sun 
wrote:

> Thanks Benchao for the feedback.
>
> For the current proposal, we recommend keeping the default value of
> "table.optimizer.source.predicate" as true,
> and setting the the default value of newly introduced option
> "scan.filter-push-down.enabled" to true as well.
>
> The main purpose of doing this is to maintain consistency with previous
> versions, as whether to perform
> filter pushdown in the old version solely depends on the
> "table.optimizer.source.predicate" option.
> That means by default, as long as a TableSource implements the
> SupportsFilterPushDown interface, filter pushdown is allowed.
> And it seems that we don't have much benefit in changing the default value
> of "table.optimizer.source.predicate" to false.
>
> Regarding the priority of these two configurations, I believe that
> "table.optimizer.source.predicate"
> takes precedence over "scan.filter-push-down.enabled" and it exhibits the
> following behavior.
>
> 1. "table.optimizer.source.predicate" = "true" and
> "scan.filter-push-down.enabled" = "true"
> This is the default behavior, allowing filter pushdown for sources.
>
> 2. "table.optimizer.source.predicate" = "true" and
> "scan.filter-push-down.enabled" = "false"
> Allow the planner to perform predicate pushdown, but individual sources do
> not enable filter pushdown.
>
> 3. "table.optimizer.source.predicate" = "false"
> Predicate pushdown is not allowed for the planner.
> Regardless of the value of the "scan.filter-push-down.enabled"
> configuration, filter pushdown is disabled.
> In this scenario, the behavior remains consistent with the old version as
> well.
>
>
> From an implementation perspective, setting the priority of
> "scan.filter-push-down.enabled" higher than
> "table.optimizer.source.predicate" is difficult to achieve now.
> Because the PushFilterIntoSourceScanRuleBase at the planner level takes
> precedence over the source-level FilterPushDownSpec.
> Only when the PushFilterIntoSourceScanRuleBase is enabled, will the
> Source-level filter pushdown be performed.
>
> Additionally, in my opinion, there doesn't seem to be much benefit in
> setting a higher priority for "scan.filter-push-down.enabled".
> It may instead affect compatibility and increase implementation complexity.
>
> WDYT?
>
> Best,
> Jiabao
>
>
> > On 2023-10-25 11:56, Benchao Li wrote:
> >
> > I agree with Jane that fine-grained configurations should have higher
> > priority than job level configurations.
> >
> > For current proposal, we can achieve that:
> > - Set "table.optimizer.source.predicate" = "true" to enable by
> > default, and set ""scan.filter-push-down.enabled" = "false" to disable
> > it per table source
> > - Set "table.optimizer.source.predicate" = "false" to disable by
> > default, and set ""scan.filter-push-down.enabled" = "true" to enable
> > it per table source
> >
> > On Tue, Oct 24, 2023 at 23:55, Jane Chan wrote:
> >>
> >>>
> >>> I believe that the configuration "table.optimizer.source.predicate"
> has a
> >>> higher priority at the planner level than the configuration at the
> source
> >>> level,
> >>> and it seems easy to implement now.
> >>>
> >>
> >> Correct me if I'm wrong, but I think the fine-grained configuration
> >> "scan.filter-push-down.enabled" should have a higher priority because
> the
> >> default value of "table.optimizer.source.predicate" is true. As a
> result,
> >> turning off filter push-down for a specific source will not take effect
> >> unless the default value of "table.optimizer.source.predicate" is
> changed
> >> to false, or, alternatively, let users manually set
> >> "table.optimizer.source.predicate" to false first and then selectively
> >> enable filter push-down for the desired sources, which is less
> intuitive.
> >> WDYT?
> >>
> >> Best,
> >> Jane
> >>
> >> On Tue, Oct 24, 2023 at 6:05 PM Jiabao Sun
> >> wrote:
> >>
> >>> Thanks Jane,
> >>>
> >>> I beli