Re: [VOTE] Apache Flink Table Store 0.2.1, release candidate #2

2022-10-12 Thread Jark Wu
Thanks for driving the release, Jingsong!

I only have one comment about the versions on the GitHub source code
which I don't think needs another RC.

- Checked all POM files point to the same version: *Action Required*
  * Source code under release-0.2.1-rc2 tag on GitHub[1] is still
0.2-SNAPSHOT
  * Source code in staging has appropriate versions
- Build and compile the source code locally: *OK*
- Verified signatures and hashes: *OK*
- Checked no missing artifacts in the staging area: *OK*
- Reviewed the release PR: *OK*
- Ran the quick start with both Flink 1.14 and 1.15: *OK*
- Verified the diff pom files between 0.2.0 vs 0.2.1 to check dependencies:
*OK*


Best,
Jark

[1]:
https://github.com/apache/flink-table-store/blob/release-0.2.1-rc2/pom.xml

On Tue, 11 Oct 2022 at 16:16, Yu Li  wrote:

> +1 (binding)
>
> - Checked release notes: *Action Required*
>   * The fix version of FLINK-29554 is 0.2.1 but still open, please confirm
> whether this should be included or we should move it out of 0.2.1
> - Checked sums and signatures: *OK*
> - Checked the jars in the staging repo: *OK*
> - Checked source distribution doesn't include binaries: *OK*
> - Maven clean install from source: *OK*
> - Checked version consistency in pom files: *OK*
> - Went through the quick start: *OK*
>   * Verified with both flink 1.14.5 and 1.15.1
> - Checked the website updates: *OK*
>   * Minor: left some comments in the PR, please check
>
> Thanks for driving this release, Jingsong!
>
> Best Regards,
> Yu
>
>
> On Sat, 8 Oct 2022 at 10:25, Jingsong Li  wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #2 for the version
> > 0.2.1 of Apache Flink Table Store, as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > **Release Overview**
> >
> > As an overview, the release consists of the following:
> > a) Table Store canonical source distribution to be deployed to the
> > release repository at dist.apache.org
> > b) Table Store binary convenience releases to be deployed to the
> > release repository at dist.apache.org
> > c) Maven artifacts to be deployed to the Maven Central Repository
> >
> > **Staging Areas to Review**
> >
> > The staging areas containing the above mentioned artifacts are as
> follows,
> > for your review:
> > * All artifacts for a) and b) can be found in the corresponding dev
> > repository at dist.apache.org [2]
> > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> >
> > All artifacts are signed with the key
> > 2C2B6A653B07086B65E4369F7C76245E0A318150 [4]
> >
> > Other links for your review:
> > * JIRA release notes [5]
> > * source code tag "release-0.2.1-rc2" [6]
> > * PR to update the website Downloads page to include Table Store links
> [7]
> >
> > **Vote Duration**
> >
> > The voting time will run for at least 72 hours.
> > It is adopted by majority approval, with at least 3 PMC affirmative
> votes.
> >
> > Best,
> > Jingsong Lee
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Table+Store+Release
> > [2]
> >
> https://dist.apache.org/repos/dist/dev/flink/flink-table-store-0.2.1-rc2/
> > [3]
> > https://repository.apache.org/content/repositories/orgapacheflink-1539/
> > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [5]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352257
> > [6] https://github.com/apache/flink-table-store/tree/release-0.2.1-rc2
> > [7] https://github.com/apache/flink-web/pull/571
> >
>


[jira] [Created] (FLINK-29592) Add Transformer and Estimator for RobustScaler

2022-10-12 Thread Jiang Xin (Jira)
Jiang Xin created FLINK-29592:
-

 Summary: Add Transformer and Estimator for RobustScaler
 Key: FLINK-29592
 URL: https://issues.apache.org/jira/browse/FLINK-29592
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Reporter: Jiang Xin


Add Transformer and Estimator for RobustScaler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


RE: [DISCUSS] FLIP-208: Update KafkaSource to detect EOF based on de-serialized record

2022-10-12 Thread Sergey Troshkov
Hi everyone! What is the status of this FLIP? I am really interested
in moving this forward.

Best regards,
Sergey

On 2022/01/04 07:04:01 Dong Lin wrote:
> Hi all,
>
> We created FLIP-208: Update KafkaSource to detect EOF based on
> de-serialized records. Please find the KIP wiki in the link
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-208%3A+Update+KafkaSource+to+detect+EOF+based+on+de-serialized+records
> .
>
> This FLIP aims to address the use-case where users need to stop a Flink job
> gracefully based on the content of de-serialized records observed in the
> KafkaSource. This feature is needed by users who currently depend on
> KafkaDeserializationSchema::isEndOfStream() to migrate their Flink job from
> FlinkKafkaConsumer to KafkaSource.
>
> Could you help review this FLIP when you get time? Your comments are
> appreciated!
>
> Cheers,
> Dong
>
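For context, the legacy pattern that the FLIP aims to replace looks roughly like the
minimal sketch below (class and topic names are illustrative; it uses the classic
FlinkKafkaConsumer API and the KafkaDeserializationSchema#isEndOfStream hook mentioned above):

import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class EndOfStreamSchema implements KafkaDeserializationSchema<String> {

    @Override
    public String deserialize(ConsumerRecord<byte[], byte[]> record) {
        return new String(record.value(), StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        // Stop consuming once a (hypothetical) end-of-stream marker record arrives.
        return "EOF".equals(nextElement);
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return Types.STRING;
    }

    // Usage: env.addSource(new FlinkKafkaConsumer<>("my-topic", new EndOfStreamSchema(), new java.util.Properties()))
}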


Re: Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-12 Thread Márton Balassi
+1 from me, thanks for the clarification.

On Wed, Oct 12, 2022 at 7:57 AM Péter Váry 
wrote:

> Thanks Abid,
>
> Count me in, and drop a note, if I can help in any way.
>
> Thanks,
> Peter
>
> On Tue, Oct 11, 2022, 20:13  wrote:
>
> > Hi Martijn,
> >
> > Yes catalog integration exists and catalogs can be created using Flink
> > SQL.
> >
> >
> https://iceberg.apache.org/docs/latest/flink/#creating-catalogs-and-using-catalogs
> > has more details.
> > We may need some discussion within Iceberg community but based on the
> > current iceberg-flink code structure we are looking to externalize this
> as
> > well.
> >
> > Thanks
> > Abid
> >
> >
> > On 2022/10/11 08:24:44 Martijn Visser wrote:
> > > Hi Abid,
> > >
> > > Thanks for the FLIP. I have a question about Iceberg's Catalog: has
> that
> > > integration between Flink and Iceberg been created already and are you
> > > looking to externalize that as well?
> > >
> > > Thanks,
> > >
> > > Martijn
> > >
> > > On Tue, Oct 11, 2022 at 12:14 AM  wrote:
> > >
> > > > Hi Marton,
> > > >
> > > > Yes, we are initiating this as part of the Externalize Flink
> Connectors
> > > > effort. Plan is to externalize the existing Flink connector from
> > Iceberg
> > > > repo into a separate repo under the Flink umbrella.
> > > >
> > > > Sorry about the doc permissions! I was able to create a FLIP-267:
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > > <
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > > >
> > > > Lets use that to discuss.
> > > >
> > > > Thanks
> > > > Abid
> > > >
> > > > On 2022/10/10 12:57:32 Márton Balassi wrote:
> > > > > Hi Abid,
> > > > >
> > > > > Just to clarify does your suggestion mean that the Iceberg
> community
> > > > would
> > > > > like to remove the iceberg-flink connector from the Iceberg
> codebase
> > and
> > > > > maintain it under Flink instead? A new separate repo under the
> Flink
> > > > > project umbrella given the current existing effort to extract
> > connectors
> > > > to
> > > > > their individual repos (externalize) makes sense to me.
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo
> > > > >
> > > > > Best,
> > > > > Marton
> > > > >
> > > > >
> > > > > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li 
> wrote:
> > > > >
> > > > > > Thanks Abid for driving.
> > > > > >
> > > > > > +1 for this.
> > > > > >
> > > > > > Can you open the permissions for
> > > > > >
> > > > > >
> > > >
> >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > > > ?
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I would like to start a discussion about contributing Iceberg
> > Flink
> > > > > > Connector to Flink.
> > > > > > >
> > > > > > > I created a doc <
> > > > > >
> > > >
> >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > >
> > > > > > with all the details following the Flink Connector template as I
> > don’t
> > > > have
> > > > > > permissions to create a FLIP yet.
> > > > > > > High level details are captured below:
> > > > > > >
> > > > > > > Motivation:
> > > > > > >
> > > > > > > This FLIP aims to contribute the existing Apache Iceberg Flink
> > > > Connector
> > > > > > to Flink.
> > > > > > >
> > > > > > > Apache Iceberg is an open table format for huge analytic
> > datasets.
> > > > > > Iceberg adds tables to compute engines including Spark, Trino,
> > > > PrestoDB,
> > > > > > Flink, Hive and Impala using a high-performance table format that
> > works
> > > > > > just like a SQL table.
> > > > > > > Iceberg avoids unpleasant surprises. Schema evolution works and
> > won’t
> > > > > > inadvertently un-delete data. Users don’t need to know about
> > > > partitioning
> > > > > > to get fast queries. Iceberg was designed to solve correctness
> > > > problems in
> > > > > > eventually-consistent cloud object stores.
> > > > > > >
> > > > > > > Iceberg supports both Flink’s DataStream API and Table API.
> > Based on
> > > > the
> > > > > > guideline of the Flink community, only the latest 2 minor
> versions
> > are
> > > > > > actively maintained. See the Multi-Engine Support#apache-flink
> for
> > > > further
> > > > > > details.
> > > > > > >
> > > > > > >
> > > > > > > Iceberg connector supports:
> > > > > > >
> > > > > > > • Source: detailed Source design <
> > > > > >
> > > >
> >
> https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#
> > > > >,
> > > > > > based on FLIP-27
> > > > > > > • Sink: detailed Sink design and interfaces used <
> > > > > >
> > > >
> >
> https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit#
> > > > > > >
> > > > > > > • Usa

[jira] [Created] (FLINK-29593) Add Approximate Quantile Util

2022-10-12 Thread Jiang Xin (Jira)
Jiang Xin created FLINK-29593:
-

 Summary: Add Approximate Quantile Util
 Key: FLINK-29593
 URL: https://issues.apache.org/jira/browse/FLINK-29593
 Project: Flink
  Issue Type: Sub-task
  Components: Library / Machine Learning
Reporter: Jiang Xin


Add a helper to compute an approximate quantile summary of distributed or 
streaming datasets. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29594) RMQSourceITCase.testMessageDelivery timed out

2022-10-12 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-29594:
-

 Summary: RMQSourceITCase.testMessageDelivery timed out
 Key: FLINK-29594
 URL: https://issues.apache.org/jira/browse/FLINK-29594
 Project: Flink
  Issue Type: Bug
  Components: Connectors / RabbitMQ
Affects Versions: 1.17.0
Reporter: Matthias Pohl


[This 
build|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41843&view=logs&j=fc7981dc-d266-55b0-5fff-f0d0a2294e36&t=1a9b228a-3e0e-598f-fc81-c321539dfdbf&l=38211]
 failed (not exclusively) due to {{RMQSourceITCase.testMessageDelivery}} timing 
out.

I wasn't able to reproduce it locally with 200 test runs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Flink falls back on to kryo serializer for GenericTypes

2022-10-12 Thread Chesnay Schepler
There's no alternative to Kryo for generic types, apart from 
implementing your own Flink serializer (but technically at that point the 
type is no longer treated as a generic type).


enableForceAvro() only forces Avro to be used for POJO types.
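For Avro GenericRecord specifically, one option is to attach explicit type information
from flink-avro so the type is no longer classified as generic. A minimal sketch, assuming
the flink-avro dependency is on the classpath and the Avro schema is at hand (the map is
only a placeholder to show where .returns(...) goes):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;

public class AvroTypeInfoExample {
    // Attaches Avro-backed type information so records are serialized with Avro
    // instead of falling back to Kryo as a generic type.
    public static DataStream<GenericRecord> withAvroTypeInfo(
            DataStream<GenericRecord> input, Schema schema) {
        SingleOutputStreamOperator<GenericRecord> mapped =
                input.map(record -> record); // placeholder transformation
        return mapped.returns(new GenericRecordAvroTypeInfo(schema));
    }
}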

On 11/10/2022 09:29, Sucheth S wrote:

Hello,

How can I avoid Flink's Kryo serializer for GenericTypes? Kryo is causing 
some performance issues for us.


Tried the following, but no luck:
env.getConfig().disableForceKryo();
env.getConfig().enableForceAvro();

Also tried env.getConfig().disableGenericTypes(); and got: "Generic types have been 
disabled in the ExecutionConfig and type 
org.apache.avro.generic.GenericRecord is treated as a generic type"


Regards,
Sucheth Shivakumar
website: https://sucheths.com
mobile : +1(650)-576-8050
San Mateo, United States




[jira] [Created] (FLINK-29595) Add Estimator and Transformer for ChiSqSelector

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29595:


 Summary: Add Estimator and Transformer for ChiSqSelector
 Key: FLINK-29595
 URL: https://issues.apache.org/jira/browse/FLINK-29595
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add the Estimator and Transformer for ChiSqSelector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29596) Add Estimator and Transformer for RFormula

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29596:


 Summary: Add Estimator and Transformer for RFormula
 Key: FLINK-29596
 URL: https://issues.apache.org/jira/browse/FLINK-29596
 Project: Flink
  Issue Type: New Feature
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add Estimator and Transformer for RFormula.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-208: Update KafkaSource to detect EOF based on de-serialized record

2022-10-12 Thread 阮 航
Hi, community.

I have done some research on FLIP-208 before. I would be glad to help
complete this FLIP and take this ticket.

Best regards,
Hang


From: Sergey Troshkov
Sent: October 12, 2022 17:04
To: dev@flink.apache.org
Subject: RE: [DISCUSS] FLIP-208: Update KafkaSource to detect EOF based on 
de-serialized record

Hi everyone! What is the status of this FLIP? I am really interested
in moving this forward

Best regards,
Sergey

On 2022/01/04 07:04:01 Dong Lin wrote:
> Hi all,
>
> We created FLIP-208: Update KafkaSource to detect EOF based on
> de-serialized records. Please find the KIP wiki in the link
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-208%3A+Update+KafkaSource+to+detect+EOF+based+on+de-serialized+records
> .
>
> This FLIP aims to address the use-case where users need to stop a Flink job
> gracefully based on the content of de-serialized records observed in the
> KafkaSource. This feature is needed by users who currently depend on
> KafkaDeserializationSchema::isEndOfStream() to migrate their Flink job from
> FlinkKafkaConsumer to KafkaSource.
>
> Could you help review this FLIP when you get time? Your comments are
> appreciated!
>
> Cheers,
> Dong
>


[jira] [Created] (FLINK-29597) Add Estimator and Transformer for QuantileDiscretizer

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29597:


 Summary: Add Estimator and Transformer for QuantileDiscretizer
 Key: FLINK-29597
 URL: https://issues.apache.org/jira/browse/FLINK-29597
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add Estimator and Transformer for QuantileDiscretizer



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29598) Add Estimator and Transformer for Imputer

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29598:


 Summary: Add Estimator and Transformer for Imputer
 Key: FLINK-29598
 URL: https://issues.apache.org/jira/browse/FLINK-29598
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add Estimator and Transformer for Imputer



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29599) Add Estimator and Transformer for MinHashLSH

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29599:


 Summary: Add Estimator and Transformer for MinHashLSH
 Key: FLINK-29599
 URL: https://issues.apache.org/jira/browse/FLINK-29599
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add Estimator and Transformer for MinHashLSH.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29601) Add Estimator and Transformer for UnivariateFeatureSelector

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29601:


 Summary: Add Estimator and Transformer for 
UnivariateFeatureSelector
 Key: FLINK-29601
 URL: https://issues.apache.org/jira/browse/FLINK-29601
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add Estimator and Transformer for UnivariateFeatureSelector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29600) Add Estimator and Transformer for BucketedRandomProjectionLSH

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29600:


 Summary: Add Estimator and Transformer for 
BucketedRandomProjectionLSH
 Key: FLINK-29600
 URL: https://issues.apache.org/jira/browse/FLINK-29600
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add Estimator and Transformer for BucketedRandomProjectionLSH.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29603) Add Transformer for StopWordsRemover

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29603:


 Summary: Add Transformer for StopWordsRemover
 Key: FLINK-29603
 URL: https://issues.apache.org/jira/browse/FLINK-29603
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add Transformer for StopWordsRemover.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29602) Add Transformer for SQLTransformer

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29602:


 Summary: Add Transformer for SQLTransformer
 Key: FLINK-29602
 URL: https://issues.apache.org/jira/browse/FLINK-29602
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add Transformer for SQLTransformer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29604) Add Estimator and Transformer for CountVectorizer

2022-10-12 Thread Yunfeng Zhou (Jira)
Yunfeng Zhou created FLINK-29604:


 Summary: Add Estimator and Transformer for CountVectorizer
 Key: FLINK-29604
 URL: https://issues.apache.org/jira/browse/FLINK-29604
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Affects Versions: ml-2.2.0
Reporter: Yunfeng Zhou
 Fix For: ml-2.2.0


Add Estimator and Transformer for CountVectorizer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Apache Flink Table Store 0.2.1, release candidate #2

2022-10-12 Thread Jingsong Li
Thanks Yu,

FLINK-29554 has been adjusted.

Thanks Jark,

I updated the tag of rc2. This is a process problem. I have updated [1].

[1] 
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Table+Store+Release

Best,
Jingsong

On Wed, Oct 12, 2022 at 3:10 PM Jark Wu  wrote:
>
> Thanks for driving the release, Jingsong!
>
> I only have one comment about the versions on the GitHub source code
> which I don't think needs another RC.
>
> - Checked all POM files point to the same version: *Action Required*
>   * Source code under release-0.2.1-rc2 tag on GitHub[1] is still
> 0.2-SNAPSHOT
>   * Source code in staging has appropriate versions
> - Build and compile the source code locally: *OK*
> - Verified signatures and hashes: *OK*
> - Checked no missing artifacts in the staging area: *OK*
> - Reviewed the release PR: *OK*
> - Ran the quick start with both Flink 1.14 and 1.15: *OK*
> - Verified the diff pom files between 0.2.0 vs 0.2.1 to check dependencies:
> *OK*
>
>
> Best,
> Jark
>
> [1]:
> https://github.com/apache/flink-table-store/blob/release-0.2.1-rc2/pom.xml
>
> On Tue, 11 Oct 2022 at 16:16, Yu Li  wrote:
>
> > +1 (binding)
> >
> > - Checked release notes: *Action Required*
> >   * The fix version of FLINK-29554 is 0.2.1 but still open, please confirm
> > whether this should be included or we should move it out of 0.2.1
> > - Checked sums and signatures: *OK*
> > - Checked the jars in the staging repo: *OK*
> > - Checked source distribution doesn't include binaries: *OK*
> > - Maven clean install from source: *OK*
> > - Checked version consistency in pom files: *OK*
> > - Went through the quick start: *OK*
> >   * Verified with both flink 1.14.5 and 1.15.1
> > - Checked the website updates: *OK*
> >   * Minor: left some comments in the PR, please check
> >
> > Thanks for driving this release, Jingsong!
> >
> > Best Regards,
> > Yu
> >
> >
> > On Sat, 8 Oct 2022 at 10:25, Jingsong Li  wrote:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #2 for the version
> > > 0.2.1 of Apache Flink Table Store, as follows:
> > >
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > **Release Overview**
> > >
> > > As an overview, the release consists of the following:
> > > a) Table Store canonical source distribution to be deployed to the
> > > release repository at dist.apache.org
> > > b) Table Store binary convenience releases to be deployed to the
> > > release repository at dist.apache.org
> > > c) Maven artifacts to be deployed to the Maven Central Repository
> > >
> > > **Staging Areas to Review**
> > >
> > > The staging areas containing the above mentioned artifacts are as
> > follows,
> > > for your review:
> > > * All artifacts for a) and b) can be found in the corresponding dev
> > > repository at dist.apache.org [2]
> > > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> > >
> > > All artifacts are signed with the key
> > > 2C2B6A653B07086B65E4369F7C76245E0A318150 [4]
> > >
> > > Other links for your review:
> > > * JIRA release notes [5]
> > > * source code tag "release-0.2.1-rc2" [6]
> > > * PR to update the website Downloads page to include Table Store links
> > [7]
> > >
> > > **Vote Duration**
> > >
> > > The voting time will run for at least 72 hours.
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > votes.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > [1]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Table+Store+Release
> > > [2]
> > >
> > https://dist.apache.org/repos/dist/dev/flink/flink-table-store-0.2.1-rc2/
> > > [3]
> > > https://repository.apache.org/content/repositories/orgapacheflink-1539/
> > > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [5]
> > >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352257
> > > [6] https://github.com/apache/flink-table-store/tree/release-0.2.1-rc2
> > > [7] https://github.com/apache/flink-web/pull/571
> > >
> >


Re: [VOTE] Apache Flink Table Store 0.2.1, release candidate #2

2022-10-12 Thread Jingsong Li
+1 (binding)

- Checked sums and signatures: *OK*
- Checked the jars in the staging repo: *OK*
- Checked source distribution doesn't include binaries: *OK*
- Maven clean install from source: *OK*

Best,
Jingsong

On Wed, Oct 12, 2022 at 6:10 PM Jingsong Li  wrote:
>
> Thanks Yu,
>
> FLINK-29554 has been adjusted.
>
> Thanks Jark,
>
> I updated the tag of rc2. This is a process problem. I have updated [1].
>
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Table+Store+Release
>
> Best,
> Jingsong
>
> On Wed, Oct 12, 2022 at 3:10 PM Jark Wu  wrote:
> >
> > Thanks for driving the release, Jingsong!
> >
> > I only have one comment about the versions on the GitHub source code
> > which I don't think needs another RC.
> >
> > - Checked all POM files point to the same version: *Action Required*
> >   * Source code under release-0.2.1-rc2 tag on GitHub[1] is still
> > 0.2-SNAPSHOT
> >   * Source code in staging has appropriate versions
> > - Build and compile the source code locally: *OK*
> > - Verified signatures and hashes: *OK*
> > - Checked no missing artifacts in the staging area: *OK*
> > - Reviewed the release PR: *OK*
> > - Ran the quick start with both Flink 1.14 and 1.15: *OK*
> > - Verified the diff pom files between 0.2.0 vs 0.2.1 to check dependencies:
> > *OK*
> >
> >
> > Best,
> > Jark
> >
> > [1]:
> > https://github.com/apache/flink-table-store/blob/release-0.2.1-rc2/pom.xml
> >
> > On Tue, 11 Oct 2022 at 16:16, Yu Li  wrote:
> >
> > > +1 (binding)
> > >
> > > - Checked release notes: *Action Required*
> > >   * The fix version of FLINK-29554 is 0.2.1 but still open, please confirm
> > > whether this should be included or we should move it out of 0.2.1
> > > - Checked sums and signatures: *OK*
> > > - Checked the jars in the staging repo: *OK*
> > > - Checked source distribution doesn't include binaries: *OK*
> > > - Maven clean install from source: *OK*
> > > - Checked version consistency in pom files: *OK*
> > > - Went through the quick start: *OK*
> > >   * Verified with both flink 1.14.5 and 1.15.1
> > > - Checked the website updates: *OK*
> > >   * Minor: left some comments in the PR, please check
> > >
> > > Thanks for driving this release, Jingsong!
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Sat, 8 Oct 2022 at 10:25, Jingsong Li  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Please review and vote on the release candidate #2 for the version
> > > > 0.2.1 of Apache Flink Table Store, as follows:
> > > >
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > > **Release Overview**
> > > >
> > > > As an overview, the release consists of the following:
> > > > a) Table Store canonical source distribution to be deployed to the
> > > > release repository at dist.apache.org
> > > > b) Table Store binary convenience releases to be deployed to the
> > > > release repository at dist.apache.org
> > > > c) Maven artifacts to be deployed to the Maven Central Repository
> > > >
> > > > **Staging Areas to Review**
> > > >
> > > > The staging areas containing the above mentioned artifacts are as
> > > follows,
> > > > for your review:
> > > > * All artifacts for a) and b) can be found in the corresponding dev
> > > > repository at dist.apache.org [2]
> > > > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> > > >
> > > > All artifacts are signed with the key
> > > > 2C2B6A653B07086B65E4369F7C76245E0A318150 [4]
> > > >
> > > > Other links for your review:
> > > > * JIRA release notes [5]
> > > > * source code tag "release-0.2.1-rc2" [6]
> > > > * PR to update the website Downloads page to include Table Store links
> > > [7]
> > > >
> > > > **Vote Duration**
> > > >
> > > > The voting time will run for at least 72 hours.
> > > > It is adopted by majority approval, with at least 3 PMC affirmative
> > > votes.
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > [1]
> > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Table+Store+Release
> > > > [2]
> > > >
> > > https://dist.apache.org/repos/dist/dev/flink/flink-table-store-0.2.1-rc2/
> > > > [3]
> > > > https://repository.apache.org/content/repositories/orgapacheflink-1539/
> > > > [4] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > [5]
> > > >
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352257
> > > > [6] https://github.com/apache/flink-table-store/tree/release-0.2.1-rc2
> > > > [7] https://github.com/apache/flink-web/pull/571
> > > >
> > >


Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-12 Thread Qingsheng Ren
Thanks Chesnay for the reply. +1 for making a unified and clearer
metric definition distinguishing internal and external data transfers.
As you described, having IO in operators is quite common such as
dimension tables in Table/SQL API. This definitely deserves a FLIP and
an overall design.

However I think it's necessary to change the metric back to
numRecordsOut instead of sticking with numRecordsSend in 1.15 and
1.16. The most important argument is for compatibility as I mentioned
in my previous email, otherwise all users have to modify their configs
of metric systems after upgrading to Flink 1.15+, and all custom
connectors have to change their implementations to migrate to the new
metric name. I believe other ones participating and approving this
proposal share the same concern about compatibility too. Also
considering this issue is blocking the release of 1.16, maybe we could
fix this asap, and as for defining a new metric for internal data
transfers we can have an in-depth discussion later. WDYT?

Best,
Qingsheng

On Tue, Oct 11, 2022 at 6:06 PM Chesnay Schepler  wrote:
>
> Currently I think that would be a mistake.
>
> Ultimately what we have here is the culmination of us never really 
> considering how the numRecordsOut metric should behave for operators that 
> emit data to other operators _and_ external systems. This goes beyond sinks.
> This even applies to numRecordsIn, for cases where functions query/write data 
> from/to the outside, (e.g., Async IO).
>
> Having 2 separate metrics for that, 1 exclusively for internal data 
> transfers, and 1 exclusively for external data transfers, is the only way to 
> get a consistent metric definition in the long-run.
> We can jump back-and-forth now or just commit to it.
>
> I don't think we can really judge this based on FLIP-33. It was IIRC written 
> before the two phase sinks were added, which heavily blurred the lines of 
> what a sink even is. Because it definitely is _not_ the last operator in a 
> chain anymore.
>
> What I would suggest is to stick with what we got (although I despise the 
> name numRecordsSend), and alias the numRecordsOut metric for all 
> non-TwoPhaseCommittingSink.
>
> On 11/10/2022 05:54, Qingsheng Ren wrote:
>
> Thanks for the details Chesnay!
>
> By “alias” I mean to respect the original definition made in FLIP-33 for 
> numRecordsOut, which is the number of records written to the external system, 
> and keep numRecordsSend as the same value as numRecordsOut for compatibility.
>
> I think keeping numRecordsOut for the output to the external system is more 
> intuitive to end users because in most cases the metric of data flow output 
> is more essential. I agree with you that a new metric is required, but 
> considering compatibility and users’ intuition I prefer to keep the initial 
> definition of numRecordsOut in FLIP-33 and name a new metric for sink 
> writer’s output to downstream operators. This might be against consistency 
> with metrics in other operators in Flink but maybe it’s acceptable to have 
> the sink as a special case.
>
> Best,
> Qingsheng
> On Oct 10, 2022, 19:13 +0800, Chesnay Schepler , wrote:
>
> > I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut
>
> But that's not possible. If it were that simple there would have never been a 
> need to introduce another metric in the first place.
>
> It's a rather fundamental issue with how the new sinks work, in that they 
> emit data to the external system (usually considered as "numRecordsOut" of 
> sinks) while _also_ sending data to a downstream operator (usually considered 
> as "numRecordsOut" of tasks).
> The original issue was that the numRecordsOut of the sink counted both (which 
> is completely wrong).
>
> A new metric was always required; otherwise you inevitably end up breaking 
> some semantic.
> Adding a new metric for what the sink writes to the external system is, for 
> better or worse, more consistent with how these metrics usually work in Flink.
>
> On 10/10/2022 12:45, Qingsheng Ren wrote:
>
> Thanks everyone for joining the discussion!
>
> > Do you have any idea what has happened in the process here?
>
> The discussion in this PR [1] shows some details and could be helpful to 
> understand the original motivation of the renaming. We do have a test case 
> for guarding metrics but unfortunately the case was also modified so the 
> defense was broken.
>
> I think the reason why both the developer and the reviewer forgot to trigger 
> a discussion and gave a green pass on the change is that metrics are quite 
> “trivial” to be noticed as public APIs. As mentioned by Martijn I couldn’t 
> find a place noting that metrics are public APIs and should be treated 
> carefully while contributing and reviewing.
>
> IMHO three actions could be made to prevent this kind of changes in the 
> future:
>
> a. Add test case for metrics (which we already have in SinkMetricsITCase)
> b. We emphasize that any public-interface breaking changes 

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-12 Thread Qingsheng Ren
As a supplement: since redefining internal and external traffic and
touching metric names in almost all operators could be a big
reconstruction, it requires a lot of discussion and we might only do
it in Flink 2.0. I think compatibility is the bigger blocker in front
of us, as the output of the sink is a metric that users care a lot
about.

Thanks,
Qingsheng

On Wed, Oct 12, 2022 at 6:20 PM Qingsheng Ren  wrote:
>
> Thanks Chesnay for the reply. +1 for making a unified and clearer
> metric definition distinguishing internal and external data transfers.
> As you described, having IO in operators is quite common such as
> dimension tables in Table/SQL API. This definitely deserves a FLIP and
> an overall design.
>
> However I think it's necessary to change the metric back to
> numRecordsOut instead of sticking with numRecordsSend in 1.15 and
> 1.16. The most important argument is for compatibility as I mentioned
> in my previous email, otherwise all users have to modify their configs
> of metric systems after upgrading to Flink 1.15+, and all custom
> connectors have to change their implementations to migrate to the new
> metric name. I believe other ones participating and approving this
> proposal share the same concern about compatibility too. Also
> considering this issue is blocking the release of 1.16, maybe we could
> fix this asap, and as for defining a new metric for internal data
> transfers we can have an in-depth discussion later. WDYT?
>
> Best,
> Qingsheng
>
> On Tue, Oct 11, 2022 at 6:06 PM Chesnay Schepler  wrote:
> >
> > Currently I think that would be a mistake.
> >
> > Ultimately what we have here is the culmination of us never really 
> > considering how the numRecordsOut metric should behave for operators that 
> > emit data to other operators _and_ external systems. This goes beyond sinks.
> > This even applies to numRecordsIn, for cases where functions query/write 
> > data from/to the outside, (e.g., Async IO).
> >
> > Having 2 separate metrics for that, 1 exclusively for internal data 
> > transfers, and 1 exclusively for external data transfers, is the only way 
> > to get a consistent metric definition in the long-run.
> > We can jump back-and-forth now or just commit to it.
> >
> > I don't think we can really judge this based on FLIP-33. It was IIRC 
> > written before the two phase sinks were added, which heavily blurred the 
> > lines of what a sink even is. Because it definitely is _not_ the last 
> > operator in a chain anymore.
> >
> > What I would suggest is to stick with what we got (although I despise the 
> > name numRecordsSend), and alias the numRecordsOut metric for all 
> > non-TwoPhaseCommittingSink.
> >
> > On 11/10/2022 05:54, Qingsheng Ren wrote:
> >
> > Thanks for the details Chesnay!
> >
> > By “alias” I mean to respect the original definition made in FLIP-33 for 
> > numRecordsOut, which is the number of records written to the external 
> > system, and keep numRecordsSend as the same value as numRecordsOut for 
> > compatibility.
> >
> > I think keeping numRecordsOut for the output to the external system is more 
> > intuitive to end users because in most cases the metric of data flow output 
> > is more essential. I agree with you that a new metric is required, but 
> > considering compatibility and users’ intuition I prefer to keep the initial 
> > definition of numRecordsOut in FLIP-33 and name a new metric for sink 
> > writer’s output to downstream operators. This might be against consistency 
> > with metrics in other operators in Flink but maybe it’s acceptable to have 
> > the sink as a special case.
> >
> > Best,
> > Qingsheng
> > On Oct 10, 2022, 19:13 +0800, Chesnay Schepler , wrote:
> >
> > > I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut
> >
> > But that's not possible. If it were that simple there would have never been 
> > a need to introduce another metric in the first place.
> >
> > It's a rather fundamental issue with how the new sinks work, in that they 
> > emit data to the external system (usually considered as "numRecordsOut" of 
> > sinks) while _also_ sending data to a downstream operator (usually 
> > considered as "numRecordsOut" of tasks).
> > The original issue was that the numRecordsOut of the sink counted both 
> > (which is completely wrong).
> >
> > A new metric was always required; otherwise you inevitably end up breaking 
> > some semantic.
> > Adding a new metric for what the sink writes to the external system is, for 
> > better or worse, more consistent with how these metrics usually work in 
> > Flink.
> >
> > On 10/10/2022 12:45, Qingsheng Ren wrote:
> >
> > Thanks everyone for joining the discussion!
> >
> > > Do you have any idea what has happened in the process here?
> >
> > The discussion in this PR [1] shows some details and could be helpful to 
> > understand the original motivation of the renaming. We do have a test case 
> > for guarding metrics but unfortunately the case was also modifi

[jira] [Created] (FLINK-29605) Create a bounded version of SourceTestSuiteBase#testSourceMetrics

2022-10-12 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-29605:


 Summary: Create a bounded version of 
SourceTestSuiteBase#testSourceMetrics
 Key: FLINK-29605
 URL: https://issues.apache.org/jira/browse/FLINK-29605
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Reporter: Etienne Chauchot


_SourceTestSuiteBase#testSourceMetrics_ targets unbounded sources, but metrics tests 
still make sense for bounded sources. For the bounded case, [killing the 
job|https://github.com/apache/flink/blob/4934bd69052f2a69e8021d337373f4480c802359/flink-test-utils-parent/flink-connector-test-utils/src/main/java/org/apache/flink/connector/testframe/testsuites/SourceTestSuiteBase.java#L469]
 is not needed, and in some cases (fast jobs) the kill is executed when the job 
is already in the FINISHED state, leading to a confusing exception. This is why 
the killJob call should be removed for the bounded case. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-12 Thread Jing Ge
Hi Qingsheng,

Just want to make sure we are on the same page. Are you suggesting
switching the naming between "numXXXSend" and "numXXXOut" or reverting all
the changes we did with FLINK-26126 and FLINK-26492?

For the naming switch, please note that the behaviour has changed since we
introduced SinkV2 [1], so be aware that the same metric name may now report
different numbers (a behaviour change). Sticking with the old name but the new
behaviour (a very bad idea, IMHO) might seem to save effort at first, but it
could end up with users monitoring unexpected metrics, which is even worse for
them: "I didn't change anything, but something has been broken since the last
update."

For reverting, I am not sure how to fix the issue mentioned in FLINK-26126
after reverting all changes. Like Chesnay has already pointed out, with
SinkV2 we have two different output lines - one with the external system
and the other with the downstream operator. In this case, "numXXXSend" is
rather a new metric than a replacement of "numXXXOut". The "numXXXOut"
metric can still be used, depending on what the user wants to monitor.


Best regards,
Jing

[1]
https://github.com/apache/flink/blob/51fc20db30d001a95de95b3b9993eeb06f558f6c/flink-metrics/flink-metrics-core/src/main/java/org/apache/flink/metrics/groups/SinkWriterMetricGroup.java#L48
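To make the two output lines concrete, here is a minimal sketch of a custom SinkWriter
counting records written to the external system via the SinkWriterMetricGroup linked in [1].
The method name getNumRecordsSendCounter() reflects the 1.15 interface and should be treated
as an assumption for other versions:

import org.apache.flink.api.connector.sink2.SinkWriter;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.groups.SinkWriterMetricGroup;

public class CountingSinkWriter<T> implements SinkWriter<T> {

    // Counts records handed to the external system ("send" in the 1.15 naming),
    // as opposed to records forwarded to the downstream committer operator.
    private final Counter numRecordsSend;

    public CountingSinkWriter(SinkWriterMetricGroup metricGroup) {
        this.numRecordsSend = metricGroup.getNumRecordsSendCounter();
    }

    @Override
    public void write(T element, Context context) {
        // ... hand the element to the external system here ...
        numRecordsSend.inc();
    }

    @Override
    public void flush(boolean endOfInput) {}

    @Override
    public void close() {}
}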


On Wed, Oct 12, 2022 at 12:48 PM Qingsheng Ren  wrote:

> As a supplement, considering it could be a big reconstruction
> redefining internal and external traffic and touching metric names in
> almost all operators, this requires a lot of discussions and we might
> do it finally in Flink 2.0. I think compatibility is a bigger blocker
> in front of us, as the output of sink is a metric that users care a
> lot about.
>
> Thanks,
> Qingsheng
>
> On Wed, Oct 12, 2022 at 6:20 PM Qingsheng Ren  wrote:
> >
> > Thanks Chesnay for the reply. +1 for making a unified and clearer
> > metric definition distinguishing internal and external data transfers.
> > As you described, having IO in operators is quite common such as
> > dimension tables in Table/SQL API. This definitely deserves a FLIP and
> > an overall design.
> >
> > However I think it's necessary to change the metric back to
> > numRecordsOut instead of sticking with numRecordsSend in 1.15 and
> > 1.16. The most important argument is for compatibility as I mentioned
> > in my previous email, otherwise all users have to modify their configs
> > of metric systems after upgrading to Flink 1.15+, and all custom
> > connectors have to change their implementations to migrate to the new
> > metric name. I believe other ones participating and approving this
> > proposal share the same concern about compatibility too. Also
> > considering this issue is blocking the release of 1.16, maybe we could
> > fix this asap, and as for defining a new metric for internal data
> > transfers we can have an in-depth discussion later. WDYT?
> >
> > Best,
> > Qingsheng
> >
> > On Tue, Oct 11, 2022 at 6:06 PM Chesnay Schepler 
> wrote:
> > >
> > > Currently I think that would be a mistake.
> > >
> > > Ultimately what we have here is the culmination of us never really
> considering how the numRecordsOut metric should behave for operators that
> emit data to other operators _and_ external systems. This goes beyond sinks.
> > > This even applies to numRecordsIn, for cases where functions
> query/write data from/to the outside, (e.g., Async IO).
> > >
> > > Having 2 separate metrics for that, 1 exclusively for internal data
> transfers, and 1 exclusively for external data transfers, is the only way
> to get a consistent metric definition in the long-run.
> > > We can jump back-and-forth now or just commit to it.
> > >
> > > I don't think we can really judge this based on FLIP-33. It was IIRC
> written before the two phase sinks were added, which heavily blurred the
> lines of what a sink even is. Because it definitely is _not_ the last
> operator in a chain anymore.
> > >
> > > What I would suggest is to stick with what we got (although I despise
> the name numRecordsSend), and alias the numRecordsOut metric for all
> non-TwoPhaseCommittingSink.
> > >
> > > On 11/10/2022 05:54, Qingsheng Ren wrote:
> > >
> > > Thanks for the details Chesnay!
> > >
> > > By “alias” I mean to respect the original definition made in FLIP-33
> for numRecordsOut, which is the number of records written to the external
> system, and keep numRecordsSend as the same value as numRecordsOut for
> compatibility.
> > >
> > > I think keeping numRecordsOut for the output to the external system is
> more intuitive to end users because in most cases the metric of data flow
> output is more essential. I agree with you that a new metric is required,
> but considering compatibility and users’ intuition I prefer to keep the
> initial definition of numRecordsOut in FLIP-33 and name a new metric for
> sink writer’s output to downstream operators. This might be against
> consistency with met

[jira] [Created] (FLINK-29606) Dynamic Execution Environment

2022-10-12 Thread Hamid EL MAAZOUZ (Jira)
Hamid EL MAAZOUZ created FLINK-29606:


 Summary: Dynamic Execution Environment
 Key: FLINK-29606
 URL: https://issues.apache.org/jira/browse/FLINK-29606
 Project: Flink
  Issue Type: New Feature
  Components: API / Core
Reporter: Hamid EL MAAZOUZ


The goal of this feature ticket is to discuss the possibility of supporting 
adding/removing sources/sinks to a running Flink job.

If it's possible, then newbie contributors like myself need to know where to 
start from ^^

This feature would be useful in cases where we have dynamic collections of 
source/sink configurations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29607) Simplify controller flow by introducing FlinkControllerContext

2022-10-12 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-29607:
--

 Summary: Simplify controller flow by introducing 
FlinkControllerContext
 Key: FLINK-29607
 URL: https://issues.apache.org/jira/browse/FLINK-29607
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Gyula Fora
Assignee: Gyula Fora


Currently, contextual information such as observer/reconciler implementations, the 
Flink service, the status recorder, and generated configs is created and passed around 
in many pieces, leading to a lot of code duplication in the system.

We should introduce a context object that captures these bits and can be reused 
across the controller flow to simplify the logic.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29608) Problem with add_sink(FlinkKafkaProducer()) after reduce on a pyflink 1.17dev DataStream

2022-10-12 Thread Jira
王伟 created FLINK-29608:
--

 Summary: Problem with add_sink(FlinkKafkaProducer()) after reduce on a 
pyflink 1.17dev DataStream
 Key: FLINK-29608
 URL: https://issues.apache.org/jira/browse/FLINK-29608
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: shaded-16.0
 Environment: python 3.8

pyflink 1.17dev
Reporter: 王伟


13> (1,missing)
6> (1,offset)
Traceback (most recent call last):
  File "stream.py", line 84, in 
    main()
  File "stream.py", line 79, in main
    env.execute('datastream_api_demo')
  File 
"/home/ustc/anaconda3/envs/pcb_server/lib/python3.8/site-packages/pyflink/datastream/stream_execution_environment.py",
 line 764, in execute
    return 
JobExecutionResult(self._j_stream_execution_environment.execute(j_stream_graph))
  File 
"/home/ustc/anaconda3/envs/pcb_server/lib/python3.8/site-packages/py4j/java_gateway.py",
 line 1321, in __call__
    return_value = get_return_value(
  File 
"/home/ustc/anaconda3/envs/pcb_server/lib/python3.8/site-packages/pyflink/util/exceptions.py",
 line 146, in deco
    return f(*a, **kw)
  File 
"/home/ustc/anaconda3/envs/pcb_server/lib/python3.8/site-packages/py4j/protocol.py",
 line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o0.execute.
: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
        at 
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
        at 
org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
        at 
java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
        at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
        at 
org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267)
        at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
        at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
        at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
        at 
org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1277)
        at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$1(ClassLoadingUtils.java:93)
        at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
        at 
org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
        at 
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
        at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
        at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
        at 
java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
        at 
org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
        at akka.dispatch.OnComplete.internal(Future.scala:300)
        at akka.dispatch.OnComplete.internal(Future.scala:297)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
        at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
        at 
org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
        at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
        at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
        at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
        at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
        at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:621)
        at 
akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:24)
        at 
akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:23)
        at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:532)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
        at scala.concurrent.impl.CallbackRunnab

Re: [DISCUSS] Reverting sink metric name changes made in 1.15

2022-10-12 Thread Ruan Hang
Thanks for raising the discussion, Qingsheng,

+1 on reverting the breaking changes.
+1 for making a unified and clearer metric definition in Flink 2.0

Best,
Hang


From: Jing Ge
Sent: October 12, 2022 19:20
To: Qingsheng Ren
Cc: Chesnay Schepler; dev; user; Martijn Visser; Becket Qin; Jingsong Li; Jark Wu; 
Leonard Xu; Xintong Song
Subject: Re: [DISCUSS] Reverting sink metric name changes made in 1.15

Hi Qingsheng,

Just want to make sure we are on the same page. Are you suggesting switching 
the naming between "numXXXSend" and "numXXXOut" or reverting all the changes we 
did with FLINK-26126 and FLINK-26492?

For the naming switch, please pay attention that the behaviour has been changed 
since we introduced SinkV2[1]. So, please be aware of different 
numbers(behaviour change) even with the same metrics name. Sticking with the 
old name with the new behaviour (very bad idea, IMHO) might seem like saving 
the effort in the first place, but it might end up with monitoring unexpected 
metrics, which is even worse for users, i.e. I didn't change anything, but 
something has been broken since the last update.

For reverting, I am not sure how to fix the issue mentioned in FLINK-26126 
after reverting all changes. Like Chesnay has already pointed out, with SinkV2 
we have two different output lines - one with the external system and the other 
with the downstream operator. In this case, "numXXXSend" is rather a new metric 
than a replacement of "numXXXOut". The "numXXXOut" metric can still be used, 
depending on what the user wants to monitor.


Best regards,
Jing

[1] 
https://github.com/apache/flink/blob/51fc20db30d001a95de95b3b9993eeb06f558f6c/flink-metrics/flink-metrics-core/src/main/java/org/apache/flink/metrics/groups/SinkWriterMetricGroup.java#L48

On Wed, Oct 12, 2022 at 12:48 PM Qingsheng Ren 
mailto:re...@apache.org>> wrote:
As a supplement, considering it could be a big reconstruction
redefining internal and external traffic and touching metric names in
almost all operators, this requires a lot of discussions and we might
do it finally in Flink 2.0. I think compatibility is a bigger blocker
in front of us, as the output of sink is a metric that users care a
lot about.

Thanks,
Qingsheng

On Wed, Oct 12, 2022 at 6:20 PM Qingsheng Ren 
mailto:re...@apache.org>> wrote:
>
> Thanks Chesnay for the reply. +1 for making a unified and clearer
> metric definition distinguishing internal and external data transfers.
> As you described, having IO in operators is quite common such as
> dimension tables in Table/SQL API. This definitely deserves a FLIP and
> an overall design.
>
> However I think it's necessary to change the metric back to
> numRecordsOut instead of sticking with numRecordsSend in 1.15 and
> 1.16. The most important argument is for compatibility as I mentioned
> in my previous email, otherwise all users have to modify their configs
> of metric systems after upgrading to Flink 1.15+, and all custom
> connectors have to change their implementations to migrate to the new
> metric name. I believe other ones participating and approving this
> proposal share the same concern about compatibility too. Also
> considering this issue is blocking the release of 1.16, maybe we could
> fix this asap, and as for defining a new metric for internal data
> transfers we can have an in-depth discussion later. WDYT?
>
> Best,
> Qingsheng
>
> On Tue, Oct 11, 2022 at 6:06 PM Chesnay Schepler 
> mailto:ches...@apache.org>> wrote:
> >
> > Currently I think that would be a mistake.
> >
> > Ultimately what we have here is the culmination of us never really 
> > considering how the numRecordsOut metric should behave for operators that 
> > emit data to other operators _and_ external systems. This goes beyond sinks.
> > This even applies to numRecordsIn, for cases where functions query/write 
> > data from/to the outside, (e.g., Async IO).
> >
> > Having 2 separate metrics for that, 1 exclusively for internal data 
> > transfers, and 1 exclusively for external data transfers, is the only way 
> > to get a consistent metric definition in the long-run.
> > We can jump back-and-forth now or just commit to it.
> >
> > I don't think we can really judge this based on FLIP-33. It was IIRC 
> > written before the two phase sinks were added, which heavily blurred the 
> > lines of what a sink even is. Because it definitely is _not_ the last 
> > operator in a chain anymore.
> >
> > What I would suggest is to stick with what we got (although I despise the 
> > name numRecordsSend), and alias the numRecordsOut metric for all 
> > non-TwoPhaseCommittingSink.
> >
> > On 11/10/2022 05:54, Qingsheng Ren wrote:
> >
> > Thanks for the details Chesnay!
> >
> > By “alias” I mean to respect the original definition made in FLIP-33 for 
> > numRecordsOut, which is the number of records written to the external 
> > system, and keep numRecordsSend as the same value as numRecordsOut for 
> > compatibility.
> >

Re: Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-12 Thread Hang Ruan
+1, I have used the iceberg connector and catalog before. It is easy to use.

Márton Balassi  wrote on Wed, Oct 12, 2022 at 16:38:

> +1 from me, thanks for the clarification.
>
> On Wed, Oct 12, 2022 at 7:57 AM Péter Váry 
> wrote:
>
> > Thanks Abid,
> >
> > Count me in, and drop a note, if I can help in any way.
> >
> > Thanks,
> > Peter
> >
> > On Tue, Oct 11, 2022, 20:13  wrote:
> >
> > > Hi Martijn,
> > >
> > > Yes catalog integration exists and catalogs can be created using Flink
> > > SQL.
> > >
> > >
> >
> https://iceberg.apache.org/docs/latest/flink/#creating-catalogs-and-using-catalogs
> > > has more details.
> > > We may need some discussion within Iceberg community but based on the
> > > current iceberg-flink code structure we are looking to externalize this
> > as
> > > well.
> > >
> > > Thanks
> > > Abid
> > >
> > >
> > > On 2022/10/11 08:24:44 Martijn Visser wrote:
> > > > Hi Abid,
> > > >
> > > > Thanks for the FLIP. I have a question about Iceberg's Catalog: has
> > that
> > > > integration between Flink and Iceberg been created already and are
> you
> > > > looking to externalize that as well?
> > > >
> > > > Thanks,
> > > >
> > > > Martijn
> > > >
> > > > On Tue, Oct 11, 2022 at 12:14 AM  wrote:
> > > >
> > > > > Hi Marton,
> > > > >
> > > > > Yes, we are initiating this as part of the Externalize Flink
> > Connectors
> > > > > effort. Plan is to externalize the existing Flink connector from
> > > Iceberg
> > > > > repo into a separate repo under the Flink umbrella.
> > > > >
> > > > > Sorry about the doc permissions! I was able to create a FLIP-267:
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > > > <
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > > > >
> > > > > Lets use that to discuss.
> > > > >
> > > > > Thanks
> > > > > Abid
> > > > >
> > > > > On 2022/10/10 12:57:32 Márton Balassi wrote:
> > > > > > Hi Abid,
> > > > > >
> > > > > > Just to clarify does your suggestion mean that the Iceberg
> > community
> > > > > would
> > > > > > like to remove the iceberg-flink connector from the Iceberg
> > codebase
> > > and
> > > > > > maintain it under Flink instead? A new separate repo under the
> > Flink
> > > > > > project umbrella given the current existing effort to extract
> > > connectors
> > > > > to
> > > > > > their individual repos (externalize) makes sense to me.
> > > > > >
> > > > > > [1]
> > https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo
> > > > > >
> > > > > > Best,
> > > > > > Marton
> > > > > >
> > > > > >
> > > > > > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li 
> > wrote:
> > > > > >
> > > > > > > Thanks Abid for driving.
> > > > > > >
> > > > > > > +1 for this.
> > > > > > >
> > > > > > > Can you open the permissions for
> > > > > > >
> > > > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > > > > ?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I would like to start a discussion about contributing Iceberg
> > > Flink
> > > > > > > Connector to Flink.
> > > > > > > >
> > > > > > > > I created a doc <
> > > > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > > >
> > > > > > > with all the details following the Flink Connector template as
> I
> > > don’t
> > > > > have
> > > > > > > permissions to create a FLIP yet.
> > > > > > > > High level details are captured below:
> > > > > > > >
> > > > > > > > Motivation:
> > > > > > > >
> > > > > > > > This FLIP aims to contribute the existing Apache Iceberg
> Flink
> > > > > Connector
> > > > > > > to Flink.
> > > > > > > >
> > > > > > > > Apache Iceberg is an open table format for huge analytic
> > > datasets.
> > > > > > > Iceberg adds tables to compute engines including Spark, Trino,
> > > > > PrestoDB,
> > > > > > > Flink, Hive and Impala using a high-performance table format
> that
> > > works
> > > > > > > just like a SQL table.
> > > > > > > > Iceberg avoids unpleasant surprises. Schema evolution works
> and
> > > won’t
> > > > > > > inadvertently un-delete data. Users don’t need to know about
> > > > > partitioning
> > > > > > > to get fast queries. Iceberg was designed to solve correctness
> > > > > problems in
> > > > > > > eventually-consistent cloud object stores.
> > > > > > > >
> > > > > > > > Iceberg supports both Flink’s DataStream API and Table API.
> > > Based on
> > > > > the
> > > > > > > guideline of the Flink community, only the latest 2 minor
> > versions
> > > are
> > > > > > > actively maintained. See the Multi-Engine Support#apache-flink
> > for
> > > > > further
> > > > > > > details.
> > > > > > > >
> > > > > > > >
> > > > > > > > Iceberg connec

[DISCUSS] FLIP-263: Improve resolving schema compatibility

2022-10-12 Thread Hangxiang Yu
Dear Flink developers,

I would like to start a discussion thread on FLIP-263[1] proposing to
improve the usability of resolving schema compatibility.

Currently, the place for compatibility checks is
TypeSerializerSnapshot#resolveSchemaCompatibility,
which belongs to the old serializer. There is no way for users to specify the
compatibility with the old serializer in the new customized serializer.

The FLIP proposes to reverse the direction of resolving schema compatibility
in order to improve its usability.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-263%3A+Improve+resolving+schema+compatibility
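
To make the current limitation concrete, this is roughly how the check is wired
today (a minimal sketch using the existing TypeSerializerSnapshot API; only the
helper class and method names are invented):

    import org.apache.flink.api.common.typeutils.TypeSerializer;
    import org.apache.flink.api.common.typeutils.TypeSerializerSchemaCompatibility;
    import org.apache.flink.api.common.typeutils.TypeSerializerSnapshot;

    final class CompatibilityCheckToday {
        // The OLD serializer's snapshot makes the decision; the NEW (possibly
        // customized) serializer has no way to influence the result.
        static <T> TypeSerializerSchemaCompatibility<T> check(
                TypeSerializerSnapshot<T> oldSnapshot, TypeSerializer<T> newSerializer) {
            return oldSnapshot.resolveSchemaCompatibility(newSerializer);
        }
    }

Reversing this direction would mean the new serializer (or its snapshot) receives
the old snapshot and decides the compatibility itself.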


dev@flink.apache.org

2022-10-12 Thread Chesnay Schepler
Since the discussion 
(https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo) has 
stalled a bit but we need a conclusion to move forward I'm opening a vote.


Proposal summary:

1) Branch model
1.1) The default branch is called "main" and used for the next major 
iteration.

1.2) Remaining branches are called "vmajor.minor". (e.g., v3.2)
1.3) Branches are not specific to a Flink version. (i.e., no v3.2-1.15)

2) Versioning
2.1) Source releases: major.minor.patch
2.2) Jar artifacts: major.minor.patch-flink-major.flink-minor
(This may imply releasing the exact same connector jar multiple times 
under different versions)


3) Flink compatibility
3.1) The Flink versions supported by the project (last 2 major Flink 
versions) must be supported.
3.2) How this is achieved is left to the connector, as long as it 
conforms to the rest of the proposal.


4) Support
4.1) The last 2 major connector releases are supported with only the 
latter receiving additional features, with the following exceptions:
4.1.a) If the older major connector version does not support any 
currently supported Flink version, then it is no longer supported.
4.1.b) If the last 2 major versions do not cover all supported Flink 
versions, then the latest connector version that supports the older 
Flink version additionally gets patch support.
4.2) For a given major connector version only the latest minor version 
is supported.

(This means if 1.1.x is released there will be no more 1.0.x release)


I'd like to clarify that these won't be set in stone for eternity.
We should re-evaluate how well this model works over time and adjust it 
accordingly, consistently across all connectors.
I do believe that as is this strikes a good balance between 
maintainability for us and clarity to users.



Voting schema:

Consensus, committers have binding votes, open for at least 72 hours.


[jira] [Created] (FLINK-29609) Clean up jobmanager deployment on suspend after recording savepoint info

2022-10-12 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-29609:
--

 Summary: Clean up jobmanager deployment on suspend after recording 
savepoint info
 Key: FLINK-29609
 URL: https://issues.apache.org/jira/browse/FLINK-29609
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Gyula Fora
 Fix For: kubernetes-operator-1.3.0


Currently, in case of suspending with a savepoint, the jobmanager pod will linger 
there forever after cancelling the job.

This is currently used to ensure consistency in case the 
operator/cancel-with-savepoint operation fails.

Once we are sure, however, that the savepoint has been recorded and the job is 
shut down, we should clean up all the resources. Optionally, we can make this 
configurable.
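
Purely illustrative pseudologic for the reconcile step (the status/config
accessors and the option are hypothetical; only the fabric8 deployment deletion
call reflects a real API):

    // Hypothetical condition and option names, for illustration only.
    if (status.savepointRecorded() && status.jobShutDown()) {
        if (config.cleanupJobManagerAfterSavepoint()) {
            kubernetesClient
                    .apps()
                    .deployments()
                    .inNamespace(namespace)
                    .withName(jobManagerDeploymentName)
                    .delete();
        }
    }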





[DISCUSS] Reference operator from Flink Kubernetes deployment docs

2022-10-12 Thread Gyula Fóra
Hi Devs!

I would like to start a discussion about referencing the Flink Kubernetes
Operator directly from the Flink Kubernetes deployment documentation.

Currently the Flink deployment/resource provider docs provide some
information for the Standalone and Native Kubernetes integration without
any reference to the operator.

I think we reached a point with the operator where we should provide a bit
more visibility and value to the users by directly proposing to use the
operator when considering Flink on Kubernetes. We should definitely keep
the current docs but make the point that for most users the easiest way to
use Flink on Kubernetes is probably through the operator (where they can
now benefit from both standalone and native integration under the hood).
This should help us avoid cases where a new user completely misses the
existence of the operator when starting out based on the Flink docs.

What do you think?

Gyula


Re: [DISCUSS] Reference operator from Flink Kubernetes deployment docs

2022-10-12 Thread Chesnay Schepler
I don't see a reason for why we shouldn't at least mention the operator 
in the kubernetes docs.


On 12/10/2022 16:25, Gyula Fóra wrote:

Hi Devs!

I would like to start a discussion about referencing the Flink Kubernetes
Operator directly from the Flink Kubernetes deployment documentation.

Currently the Flink deployment/resource provider docs provide some
information for the Standalone and Native Kubernetes integration without
any reference to the operator.

I think we reached a point with the operator where we should provide a bit
more visibility and value to the users by directly proposing to use the
operator when considering Flink on Kubernetes. We should definitely keep
the current docs but make the point that for most users the easiest way to
use Flink on Kubernetes is probably through the operator (where they can
now benefit from both standalone and native integration under the hood).
This should help us avoid cases where a new user completely misses the
existence of the operator when starting out based on the Flink docs.

What do you think?

Gyula





[DISCUSS] Adding client.id.prefix to the KafkaSink

2022-10-12 Thread Yaroslav Tkachenko
Hi everyone,

I'd like to propose adding client.id.prefix to the KafkaSink to mirror the
functionality provided by the KafkaSource.

Defining client.id is very important when running workloads with many
different Kafka clients: it helps with identification and with enforcing
quotas. Due to specific implementation details, you can't simply set
client.id for a KafkaSource or KafkaSink in Flink; you can find more
context in https://issues.apache.org/jira/browse/FLINK-8093, which was
opened in 2017 and is still unresolved.

I don't see any reason to treat KafkaSink differently: if we can define
a client.id.prefix for the KafkaSource we should be able to define it for
the KafkaSink.

I've created an issue https://issues.apache.org/jira/browse/FLINK-28842 and
a PR https://github.com/apache/flink/pull/20475 to resolve this.
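
For reference, this is roughly what the source side already looks like (a
sketch; topic, broker and prefix values are made up, and it assumes the
KafkaSourceBuilder from flink-connector-kafka):

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;

    public class ClientIdPrefixExample {
        public static KafkaSource<String> buildSource() {
            return KafkaSource.<String>builder()
                    .setBootstrapServers("broker:9092")
                    .setTopics("input-topic")
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    // maps to the "client.id.prefix" property of the source
                    .setClientIdPrefix("analytics-team")
                    .build();
        }
    }

The proposal is simply a matching option on the KafkaSink builder (e.g. a
setClientIdPrefix(...) method or an honored "client.id.prefix" property); that
sink-side option does not exist yet, which is what FLINK-28842 is about.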


Re: [DISCUSS] FLIP-263: Improve resolving schema compatibility

2022-10-12 Thread Zakelly Lan
Hi Hangxiang,

Thanks for driving this. It is reasonable to let the new serializer
claim its compatibility with the old serializer. However, I think
there is a little confusion in your proposed change 'Step 1'. You
suggest that we let the new serializer check the compatibility first,
and if it returns 'INCOMPATIBLE' we then let the old serializer do the
check as before. The 'INCOMPATIBLE' from the new serializer means 'do
not know' or 'unresolved' here, but sometimes the new serializer should
express the decisive meaning that it is really not compatible with the
older one, which is what 'INCOMPATIBLE' should mean. There is a semantic
ambiguity, or something missing. I think there are two options to
resolve this:
1. Provide an 'UNRESOLVED' value in the compatibility checking result type,
and let the proposed new interface return it by default; OR
2. Replace any use of the old interface with the new interface. In the
default implementation of the new interface, call the old interface to
leverage the old result (a rough sketch of this follows below). This
approach provides the ability to totally replace the original checking
logic (by implementing the new interface in the new serializer) while
maintaining good backward compatibility.
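
A minimal sketch of option 2 (the interface and method shape below are purely
hypothetical, just to illustrate the delegation; they are not the actual
FLIP-263 API):

    import org.apache.flink.api.common.typeutils.TypeSerializerSchemaCompatibility;
    import org.apache.flink.api.common.typeutils.TypeSerializerSnapshot;

    interface NewDirectionSnapshot<T> extends TypeSerializerSnapshot<T> {

        // New direction: the new serializer's snapshot decides, given the old snapshot.
        // The default delegates to the existing old-serializer-side check, so snapshots
        // that do not override this keep today's behaviour.
        default TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility(
                TypeSerializerSnapshot<T> oldSerializerSnapshot) {
            return oldSerializerSnapshot.resolveSchemaCompatibility(restoreSerializer());
        }
    }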

What do you think?

Best,
Zakelly.


On Wed, Oct 12, 2022 at 8:41 PM Hangxiang Yu  wrote:
>
> Dear Flink developers,
>
> I would like to start a discussion thread on FLIP-263[1] proposing to
> improve the usability of resolving schema compatibility.
>
> Currently, the place for compatibility checks is
> TypeSerializerSnapshot#resolveSchemaCompatibility
> which belongs to the old serializer, There are no ways for users to specify 
> the
> compatibility with the old serializer in the new customized serializer.
>
> The FLIP hopes to reverse the direction of resolving schema compatibility
> to improve the usability of resolving schema compatibility.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-263%3A+Improve+resolving+schema+compatibility


Utilizing Kafka headers in Flink Kafka connector

2022-10-12 Thread Great Info
I have some Flink applications that read streams from Kafka. The
producer-side code has recently introduced some additional information in Kafka
headers while producing records.
Now I need to change my consumer-side logic to process a record only if the
header contains a specific value; if the header value is different from the
one I am looking for, I just need to move on to the next record in the stream.

I got some sample reference code,
but this logic needs to
deserialize the record and then verify the header. Is there any simple way to
ignore the record before deserializing?


Re: Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-12 Thread Danny Cranmer
+1 from me, let's open a VOTE thread to make this official.

On Wed, Oct 12, 2022 at 1:37 PM Hang Ruan  wrote:

> +1, I have used the iceberg connector and catalog before. It is easy to
> use.
>
> Márton Balassi  wrote on Wed, Oct 12, 2022 at 16:38:
>
> > +1 from me, thanks for the clarification.
> >
> > On Wed, Oct 12, 2022 at 7:57 AM Péter Váry 
> > wrote:
> >
> > > Thanks Abid,
> > >
> > > Count me in, and drop a note, if I can help in any way.
> > >
> > > Thanks,
> > > Peter
> > >
> > > On Tue, Oct 11, 2022, 20:13  wrote:
> > >
> > > > Hi Martijn,
> > > >
> > > > Yes catalog integration exists and catalogs can be created using
> Flink
> > > > SQL.
> > > >
> > > >
> > >
> >
> https://iceberg.apache.org/docs/latest/flink/#creating-catalogs-and-using-catalogs
> > > > has more details.
> > > > We may need some discussion within Iceberg community but based on the
> > > > current iceberg-flink code structure we are looking to externalize
> this
> > > as
> > > > well.
> > > >
> > > > Thanks
> > > > Abid
> > > >
> > > >
> > > > On 2022/10/11 08:24:44 Martijn Visser wrote:
> > > > > Hi Abid,
> > > > >
> > > > > Thanks for the FLIP. I have a question about Iceberg's Catalog: has
> > > that
> > > > > integration between Flink and Iceberg been created already and are
> > you
> > > > > looking to externalize that as well?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Martijn
> > > > >
> > > > > On Tue, Oct 11, 2022 at 12:14 AM  wrote:
> > > > >
> > > > > > Hi Marton,
> > > > > >
> > > > > > Yes, we are initiating this as part of the Externalize Flink
> > > Connectors
> > > > > > effort. Plan is to externalize the existing Flink connector from
> > > > Iceberg
> > > > > > repo into a separate repo under the Flink umbrella.
> > > > > >
> > > > > > Sorry about the doc permissions! I was able to create a FLIP-267:
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > > > > <
> > > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > > > > >
> > > > > > Lets use that to discuss.
> > > > > >
> > > > > > Thanks
> > > > > > Abid
> > > > > >
> > > > > > On 2022/10/10 12:57:32 Márton Balassi wrote:
> > > > > > > Hi Abid,
> > > > > > >
> > > > > > > Just to clarify does your suggestion mean that the Iceberg
> > > community
> > > > > > would
> > > > > > > like to remove the iceberg-flink connector from the Iceberg
> > > codebase
> > > > and
> > > > > > > maintain it under Flink instead? A new separate repo under the
> > > Flink
> > > > > > > project umbrella given the current existing effort to extract
> > > > connectors
> > > > > > to
> > > > > > > their individual repos (externalize) makes sense to me.
> > > > > > >
> > > > > > > [1]
> > > https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo
> > > > > > >
> > > > > > > Best,
> > > > > > > Marton
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li 
> > > wrote:
> > > > > > >
> > > > > > > > Thanks Abid for driving.
> > > > > > > >
> > > > > > > > +1 for this.
> > > > > > > >
> > > > > > > > Can you open the permissions for
> > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > > > > > ?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jingsong
> > > > > > > >
> > > > > > > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I would like to start a discussion about contributing
> Iceberg
> > > > Flink
> > > > > > > > Connector to Flink.
> > > > > > > > >
> > > > > > > > > I created a doc <
> > > > > > > >
> > > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > > > >
> > > > > > > > with all the details following the Flink Connector template
> as
> > I
> > > > don’t
> > > > > > have
> > > > > > > > permissions to create a FLIP yet.
> > > > > > > > > High level details are captured below:
> > > > > > > > >
> > > > > > > > > Motivation:
> > > > > > > > >
> > > > > > > > > This FLIP aims to contribute the existing Apache Iceberg
> > Flink
> > > > > > Connector
> > > > > > > > to Flink.
> > > > > > > > >
> > > > > > > > > Apache Iceberg is an open table format for huge analytic
> > > > datasets.
> > > > > > > > Iceberg adds tables to compute engines including Spark,
> Trino,
> > > > > > PrestoDB,
> > > > > > > > Flink, Hive and Impala using a high-performance table format
> > that
> > > > works
> > > > > > > > just like a SQL table.
> > > > > > > > > Iceberg avoids unpleasant surprises. Schema evolution works
> > and
> > > > won’t
> > > > > > > > inadvertently un-delete data. Users don’t need to know about
> > > > > > partitioning
> > > > > > > > to get fast queries. Iceberg was designed to solve
> correctness
> > > > > > problems in
> >

dev@flink.apache.org

2022-10-12 Thread Danny Cranmer
Thanks for the concise summary Chesnay.

+1 from me (binding)

Just one clarification, for "3.1) The Flink versions supported by the
project (last 2 major Flink versions) must be supported.". Do we actually
mean major here, as in Flink 1.x.x and 2.x.x? Right now we would only
support Flink 1.15.x and not 1.14.x? I would be inclined to support the
latest 2 minor Flink versions (major.minor.patch) given that we only have 1
active major Flink version.

Danny

On Wed, Oct 12, 2022 at 2:12 PM Chesnay Schepler  wrote:

> Since the discussion
> (https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo) has
> stalled a bit but we need a conclusion to move forward I'm opening a vote.
>
> Proposal summary:
>
> 1) Branch model
> 1.1) The default branch is called "main" and used for the next major
> iteration.
> 1.2) Remaining branches are called "vmajor.minor". (e.g., v3.2)
> 1.3) Branches are not specific to a Flink version. (i.e., no v3.2-1.15)
>
> 2) Versioning
> 2.1) Source releases: major.minor.patch
> 2.2) Jar artifacts: major.minor.patch-flink-major.flink-minor
> (This may imply releasing the exact same connector jar multiple times
> under different versions)
>
> 3) Flink compatibility
> 3.1) The Flink versions supported by the project (last 2 major Flink
> versions) must be supported.
> 3.2) How this is achieved is left to the connector, as long as it
> conforms to the rest of the proposal.
>
> 4) Support
> 4.1) The last 2 major connector releases are supported with only the
> latter receiving additional features, with the following exceptions:
> 4.1.a) If the older major connector version does not support any
> currently supported Flink version, then it is no longer supported.
> 4.1.b) If the last 2 major versions do not cover all supported Flink
> versions, then the latest connector version that supports the older
> Flink version additionally gets patch support.
> 4.2) For a given major connector version only the latest minor version
> is supported.
> (This means if 1.1.x is released there will be no more 1.0.x release)
>
>
> I'd like to clarify that these won't be set in stone for eternity.
> We should re-evaluate how well this model works over time and adjust it
> accordingly, consistently across all connectors.
> I do believe that as is this strikes a good balance between
> maintainability for us and clarity to users.
>
>
> Voting schema:
>
> Consensus, committers have binding votes, open for at least 72 hours.
>


Re: Utilizing Kafka headers in Flink Kafka connector

2022-10-12 Thread Yaroslav Tkachenko
Hi,

You can implement a custom KafkaRecordDeserializationSchema (example
https://docs.immerok.cloud/docs/cookbook/reading-apache-kafka-headers-with-apache-flink/#the-custom-deserializer)
and just skip emitting a record when its header value is not the one you
need.
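
A minimal sketch of such a deserializer (assuming the
KafkaRecordDeserializationSchema interface from flink-connector-kafka; the
header name and expected value are invented):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
    import org.apache.flink.util.Collector;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.header.Header;

    public class HeaderFilteringDeserializer
            implements KafkaRecordDeserializationSchema<String> {

        @Override
        public void deserialize(ConsumerRecord<byte[], byte[]> record, Collector<String> out)
                throws IOException {
            Header header = record.headers().lastHeader("event-type");
            if (header == null
                    || !"interesting".equals(new String(header.value(), StandardCharsets.UTF_8))) {
                return; // skip the record without deserializing its value
            }
            out.collect(new String(record.value(), StandardCharsets.UTF_8));
        }

        @Override
        public TypeInformation<String> getProducedType() {
            return Types.STRING;
        }
    }

The value bytes are only touched for records that pass the header check, so the
filtering itself does not require deserializing the payload.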

On Wed, Oct 12, 2022 at 11:04 AM Great Info  wrote:

> I have some flink applications that read streams from Kafka, now
> the producer side code has introduced some additional information in Kafka
> headers while producing records.
> Now I need to change my consumer-side logic to process the records if the
> header contains a specific value, if the header value is different than the
> one I am looking I just need to move forward with the next steam.
>
> I got some sample reference code
> but this logic needs to
> deserialize and verify the header. Is there any simple way to ignore the
> record before deserializing?
>


Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-12 Thread Martijn Visser
Hi everyone,

Thanks again for all your feedback. It's very much appreciated.

My overall feeling is that people are not opposed to the FLIP. There is
demand for adding Java 17 support before dropping the Scala APIs. Given
that actually dropping the Scala APIs would only happen with a Flink 2.0,
and Java 17 support would land either in a new minor version or a new major
version (I haven't seen a FLIP or discussion opened on adding Java 17
support, only on deprecating Java 8), Java 17 support would either arrive
earlier (in a new minor version) or at the same time (with Flink 2.0) as
the Scala APIs are dropped.

If there are no more discussion topics, I would move this FLIP to a vote at
the beginning of next week.

Best regards,

Martijn

On Sun, Oct 9, 2022 at 10:36 AM guenterh.lists 
wrote:

> Hi Martijn
>
> I do not maintain a large production application based on Flink, so it
> would not be a problem for me to convert existing implementations to
> whatever API.
>
> I am working in the area of cultural heritage, which is mainly about the
> processing of structured (meta)-data (scientific libraries, archives and
> museums)
> My impression: People without much background/experience with Java
> implementations find it easier to get into the functional mindset as
> supported in Scala. That's why I think it would be very unfortunate if the
> use of Scala in Flink becomes more and more limited or neglected.
>
> I think using the Java API in Scala is a possible way also in my
> environment.
>
> In the last weeks I tried to port the examples from the "Flink Course" of
> Daniel Ciorcilan (https://rockthejvm.com/p/flink - he mainly offers Scala
> courses), which are exclusively based on the native Scala API, to the Java
> API. This has worked without any problems as far as I can see. So far I
> haven't tried any examples based on the Table API or streaming workflows in
> batch mode (which would be important for our environment).
>
> My main trouble: So far I don't know enough about the limitations of using
> the Java API in a Scala implementation and what that means. My current
> understanding: the limitation is mainly in deriving the type information in
> generic APIs with Scala types. For me it would be very significant and
> helpful if there would be more information, descriptions and examples about
> this topic.
>
> So far unfortunately I had too little time to deal with a wrapper like
> flink-scala-api (https://github.com/findify/flink-scala-api ) and the
> current alternative is probably going to be deprecated in the future (
> https://github.com/ariskk/flink4s/issues/17#issuecomment-1125806808 )
>
> Günter
>
>
> On 04.10.22 13:58, Martijn Visser wrote:
>
> Hi Marton,
>
> You're making a good point, I originally wanted to include already the
> User mailing list to get their feedback but forgot to do so. I'll do some
> more outreach via other channels as well.
>
> @Users of Flink, I've made a proposal to deprecate and remove Scala API
> support in a future version of Flink. Your feedback on this topic is very
> much appreciated.
>
> Regarding the large Scala codebase for Flink, a potential alternative
> could be to have a wrapper for all Java APIs that makes them available as
> Scala APIs. However, this still requires Scala maintainers and I don't
> think that we currently have those in our community. The easiest solution
> for them would be to use the Java APIs directly. Yes it would involve work,
> but we won't actually be able to remove the Scala APIs until Flink 2.0 so
> there's still time for that :)
>
> Best regards,
>
> Martijn
>
> On Tue, Oct 4, 2022 at 1:26 AM Márton Balassi 
> wrote:
>
>> Hi Martjin,
>>
>> Thanks for compiling the FLIP. I agree with the sentiment that Scala poses
>> considerable maintenance overhead and key improvements (like 2.13 or
>> 2.12.8
>> supports) are hanging stale. With that said before we make this move we
>> should attempt to understand the userbase affected.
>> A quick Slack and user mailing list search does return quite a bit of
>> results for scala (admittedly a cursory look at them suggest that many of
>> them have to do with missing features in Scala that exist in Java or Scala
>> versions). I would love to see some polls on this topic, we could also use
>> the Flink twitter handle to ask the community about this.
>>
>> I am aware of users having large existing Scala codebases for Flink. This
>> move would pose a very large effort on them, as they would need to rewrite
>> much of their existing code. What are the alternatives in your opinion,
>> Martjin?
>>
>> On Tue, Oct 4, 2022 at 6:22 AM Martijn Visser 
>> wrote:
>>
>> > Hi everyone,
>> >
>> > I would like to open a discussion thread on FLIP-265 Deprecate and
>> remove
>> > Scala API support. Please take a look at
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-265+Deprecate+and+remove+Scala+API+support
>> > and provide your feedback.
>> >
>> > Best regards,
>> >

[VOTE] Remove HCatalog connector

2022-10-12 Thread Martijn Visser
Hi everyone,

Since no comments were made, I'm opening a vote to remove the HCatalog
connector [1]

The voting period will be open at least 72hrs.

Best regards,

Martijn

[1]
https://lists.apache.org/thread/j8jc5zrhnqlv8y3lkmc3wdo9ysgmsr84
-- 
Martijn
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


dev@flink.apache.org

2022-10-12 Thread Martijn Visser
+1 (binding), I am indeed assuming that Chesnay meant the last two minor
versions as supported.

Op wo 12 okt. 2022 om 20:18 schreef Danny Cranmer 

> Thanks for the concise summary Chesnay.
>
> +1 from me (binding)
>
> Just one clarification, for "3.1) The Flink versions supported by the
> project (last 2 major Flink versions) must be supported.". Do we actually
> mean major here, as in Flink 1.x.x and 2.x.x? Right now we would only
> support Flink 1.15.x and not 1.14.x? I would be inclined to support the
> latest 2 minor Flink versions (major.minor.patch) given that we only have 1
> active major Flink version.
>
> Danny
>
> On Wed, Oct 12, 2022 at 2:12 PM Chesnay Schepler 
> wrote:
>
> > Since the discussion
> > (https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo) has
> > stalled a bit but we need a conclusion to move forward I'm opening a
> vote.
> >
> > Proposal summary:
> >
> > 1) Branch model
> > 1.1) The default branch is called "main" and used for the next major
> > iteration.
> > 1.2) Remaining branches are called "vmajor.minor". (e.g., v3.2)
> > 1.3) Branches are not specific to a Flink version. (i.e., no v3.2-1.15)
> >
> > 2) Versioning
> > 2.1) Source releases: major.minor.patch
> > 2.2) Jar artifacts: major.minor.patch-flink-major.flink-minor
> > (This may imply releasing the exact same connector jar multiple times
> > under different versions)
> >
> > 3) Flink compatibility
> > 3.1) The Flink versions supported by the project (last 2 major Flink
> > versions) must be supported.
> > 3.2) How this is achieved is left to the connector, as long as it
> > conforms to the rest of the proposal.
> >
> > 4) Support
> > 4.1) The last 2 major connector releases are supported with only the
> > latter receiving additional features, with the following exceptions:
> > 4.1.a) If the older major connector version does not support any
> > currently supported Flink version, then it is no longer supported.
> > 4.1.b) If the last 2 major versions do not cover all supported Flink
> > versions, then the latest connector version that supports the older
> > Flink version additionally gets patch support.
> > 4.2) For a given major connector version only the latest minor version
> > is supported.
> > (This means if 1.1.x is released there will be no more 1.0.x release)
> >
> >
> > I'd like to clarify that these won't be set in stone for eternity.
> > We should re-evaluate how well this model works over time and adjust it
> > accordingly, consistently across all connectors.
> > I do believe that as is this strikes a good balance between
> > maintainability for us and clarity to users.
> >
> >
> > Voting schema:
> >
> > Consensus, committers have binding votes, open for at least 72 hours.
> >
>
-- 
Martijn
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


dev@flink.apache.org

2022-10-12 Thread Ferenc Csaky
+1 from my side (non-binding)

Best,
F


--- Original Message ---
On Wednesday, October 12th, 2022 at 15:47, Martijn Visser 
 wrote:


> 
> 
> +1 (binding), I am indeed assuming that Chesnay meant the last two minor
> versions as supported.
> 
> Op wo 12 okt. 2022 om 20:18 schreef Danny Cranmer dannycran...@apache.org
> 
> > Thanks for the concise summary Chesnay.
> > 
> > +1 from me (binding)
> > 
> > Just one clarification, for "3.1) The Flink versions supported by the
> > project (last 2 major Flink versions) must be supported.". Do we actually
> > mean major here, as in Flink 1.x.x and 2.x.x? Right now we would only
> > support Flink 1.15.x and not 1.14.x? I would be inclined to support the
> > latest 2 minor Flink versions (major.minor.patch) given that we only have 1
> > active major Flink version.
> > 
> > Danny
> > 
> > On Wed, Oct 12, 2022 at 2:12 PM Chesnay Schepler ches...@apache.org
> > wrote:
> > 
> > > Since the discussion
> > > (https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo) has
> > > stalled a bit but we need a conclusion to move forward I'm opening a
> > > vote.
> > > 
> > > Proposal summary:
> > > 
> > > 1) Branch model
> > > 1.1) The default branch is called "main" and used for the next major
> > > iteration.
> > > 1.2) Remaining branches are called "vmajor.minor". (e.g., v3.2)
> > > 1.3) Branches are not specific to a Flink version. (i.e., no v3.2-1.15)
> > > 
> > > 2) Versioning
> > > 2.1) Source releases: major.minor.patch
> > > 2.2) Jar artifacts: major.minor.patch-flink-major.flink-minor
> > > (This may imply releasing the exact same connector jar multiple times
> > > under different versions)
> > > 
> > > 3) Flink compatibility
> > > 3.1) The Flink versions supported by the project (last 2 major Flink
> > > versions) must be supported.
> > > 3.2) How this is achieved is left to the connector, as long as it
> > > conforms to the rest of the proposal.
> > > 
> > > 4) Support
> > > 4.1) The last 2 major connector releases are supported with only the
> > > latter receiving additional features, with the following exceptions:
> > > 4.1.a) If the older major connector version does not support any
> > > currently supported Flink version, then it is no longer supported.
> > > 4.1.b) If the last 2 major versions do not cover all supported Flink
> > > versions, then the latest connector version that supports the older
> > > Flink version additionally gets patch support.
> > > 4.2) For a given major connector version only the latest minor version
> > > is supported.
> > > (This means if 1.1.x is released there will be no more 1.0.x release)
> > > 
> > > I'd like to clarify that these won't be set in stone for eternity.
> > > We should re-evaluate how well this model works over time and adjust it
> > > accordingly, consistently across all connectors.
> > > I do believe that as is this strikes a good balance between
> > > maintainability for us and clarity to users.
> > > 
> > > Voting schema:
> > > 
> > > Consensus, committers have binding votes, open for at least 72 hours.
> 
> --
> Martijn
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser


Re: [DISCUSS] Drop Gelly

2022-10-12 Thread Martijn Visser
Hi everyone,

I'm reviving this really old discussion thread: I just stumbled across 
Gelly again and realized that this discussion was never finished. 

I'll open up a vote thread for dropping the current DataSet-based Gelly 
library. 

Best regards,

Martijn

On 2022/01/05 03:37:18 Yun Gao wrote:
> Hi,
> 
> Very thanks for initiating the discussion!
> 
> Also +1 to drop the current DataSet based Gelly library so that we could 
> finally drop the 
> legacy DataSet API. 
> 
> As for whether to keep the graph computing ability: from my side, graph query / 
> graph computing and
> chaining them with the preprocessing pipeline are real, existing 
> requirements. 
> Currently we also already have the basis for a graph computing library on 
> DataStream API
> with the new iteration library[1], thus it would be already feasible to have 
> a stream / batch
> unified graph computing library on top of the DataStream API. And it would 
> indeed be most suitable as 
> a separate ecosystem project. 
> 
> Best,
> Yun
> 
> [1] https://cwiki.apache.org/confluence/x/hAEBCw
> 
> 
>  --Original Mail --
> Sender:Martijn Visser 
> Send Date:Wed Jan 5 02:58:53 2022
> Recipients:Zhipeng Zhang 
> CC:David Anderson , Till Rohrmann 
> , dev , User 
> 
> Subject:Re: [DISCUSS] Drop Gelly
> 
> Hi Zhipeng,
> 
> I think that we're seeing more code being externalised, for example with the 
> Flink Remote Shuffle service [1] and the ongoing discussion on the external 
> connector repository [2], it makes sense to go for your second option. Maybe 
> it fits under Flink Extended [3]. 
> 
> The main question becomes who can contribute and maintain this library. 
> Another (intermediate) solution might also be to find someone who can 
> migrate/move the current Gelly codebase to use Flink's DataStream API in 
> batch mode, so it wouldn't be using the DataSet API anymore. This has 
> recently also happened with the State Processor API [4]. 
> 
> Best regards,
> 
> Martijn
> 
> [1] https://github.com/flink-extended/flink-remote-shuffle
> [2] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
> [3] https://github.com/flink-extended/
> [4] https://issues.apache.org/jira/browse/FLINK-24912
> On Tue, 4 Jan 2022 at 14:01, Zhipeng Zhang  wrote:
> 
> Hi Martijin,
> 
> Thanks for the feedback. I am not proposing  to bundle the new graph library 
> with Alink. I am +1 for dropping the DataSet-based Gelly library, but we 
> probably need a new graph library in Flink for the possible migration.
> 
> We haven't decided what to do yet and probably need more discussion. There 
> are some possible solutions:
> 1. We include a new DataStream-based graph library in FlinkML[1], given that 
> graphs and machine learning algorithms are more often used together 
> [2][3][4]. To achieve this, we could reuse the `AlgoOperator` interface in 
> FlinkML.
> 2. We include a new DataStream-based graph library as a separate module/repo. 
> This is consistent with existing libraries like Spark [5].
> 
> What do you think?
> 
> 
> [1] https://github.com/apache/flink-ml
> [2] https://arxiv.org/abs/1403.6652
> [3] https://arxiv.org/abs/1503.03578
> [4] https://github.com/apache/spark
> 
> Best,
> Zhipeng
> Martijn Visser  wrote on Tue, Jan 4, 2022 at 15:27:
> 
> Hi Zhipeng,
> 
> Good that you've reached out, I wasn't aware that Gelly is being used in 
> Alink. Are you proposing to write a new graph library as a successor of Gelly 
> and bundle that with Alink? 
> 
> Best regards,
> 
> Martijn
> On Tue, 4 Jan 2022 at 02:57, Zhipeng Zhang  wrote:
> 
> Hi everyone,
> 
> Thanks for starting the discussion :)
> 
> We (Alink team [1]) are actually using part of the Gelly library to support 
> graph algorithms (connected component, single source shortest path, etc.) for 
> users in Alibaba Inc.
> 
> As DataSet API is going to be dropped, shall we also provide a new graph 
> library based on DataStream runtime (similar as we did for machine learning)?
> 
> [1] https://github.com/Alibaba/alink
> David Anderson  wrote on Tue, Jan 4, 2022 at 00:01:
> 
> Most of the inquiries I've had about Gelly in recent memory have been from 
> folks looking for a streaming solution, and it's only been a handful. 
> 
> +1 for dropping Gelly
> 
> David
> On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann  wrote:
> 
> I haven't seen any changes or requests to/for Gelly in ages. Hence, I would 
> assume that it is not really used and can be removed.
> 
> +1 for dropping Gelly.
> 
> Cheers,
> Till
> On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser  wrote:
> 
> Hi everyone,
> 
> Flink is bundled with Gelly, a Graph API library [1]. This has been marked as 
> approaching end-of-life for quite some time [2].
> 
> Gelly is built on top of Flink's DataSet API, which is deprecated and slowly 
> being phased out [3]. It only works on batch jobs. Based on the activity in 
> the Dev and User mailing lists, I don't see a lot of questions popping up 
> regarding the usage of Gelly. Removing Gelly

[VOTE] Drop Gelly

2022-10-12 Thread Martijn Visser
Hi everyone,

I would like to open a vote for dropping Gelly, which was discussed a long
time ago but never put to a vote [1].

Voting will be open for at least 72 hours.

Best regards,

Martijn
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

[1] https://lists.apache.org/thread/2m6wtgjvxcogbf9d5q7mqt4ofqjf2ojc


Re: [DISCUSS] Adding client.id.prefix to the KafkaSink

2022-10-12 Thread Martijn Visser
Hi Yaroslav,

+1 from my end. Thanks for bringing this up!

Best regards,

Martijn

On Wed, Oct 12, 2022 at 6:34 PM Yaroslav Tkachenko
 wrote:

> Hi everyone,
>
> I'd like to propose adding client.id.prefix to the KafkaSink to mirror the
> functionality provided by the KafkaSource.
>
> Defining client.id is very important when running workloads with many
> different Kafka clients: they help with identification and enforcing
> quotas. Due to the specific implementation details, you can't simply set
> client.id for a KafkaSource or KafkaSink in Flink, you can find more
> context in https://issues.apache.org/jira/browse/FLINK-8093, which was
> opened in 2017 and is still unresolved.
>
> I don't see any reason to treat KafkaSink differently: if we can define
> a client.id.prefix for the KafkaSource we should be able to define it for
> the KafkaSink.
>
> I've created an issue https://issues.apache.org/jira/browse/FLINK-28842
> and
> a PR https://github.com/apache/flink/pull/20475 to resolve this.
>


Re: [DISCUSS] Reference operator from Flink Kubernetes deployment docs

2022-10-12 Thread Martijn Visser
+1 from my end to include the operator in the related Kubernetes sections
of the Flink docs

On Wed, Oct 12, 2022 at 5:31 PM Chesnay Schepler  wrote:

> I don't see a reason for why we shouldn't at least mention the operator
> in the kubernetes docs.
>
> On 12/10/2022 16:25, Gyula Fóra wrote:
> > Hi Devs!
> >
> > I would like to start a discussion about referencing the Flink Kubernetes
> > Operator directly from the Flink Kubernetes deployment documentation.
> >
> > Currently the Flink deployment/resource provider docs provide some
> > information for the Standalone and Native Kubernetes integration without
> > any reference to the operator.
> >
> > I think we reached a point with the operator where we should provide a
> bit
> > more visibility and value to the users by directly proposing to use the
> > operator when considering Flink on Kubernetes. We should definitely keep
> > the current docs but make the point that for most users the easiest way
> to
> > use Flink on Kubernetes is probably through the operator (where they can
> > now benefit from both standalone and native integration under the hood).
> > This should help us avoid cases where a new user completely misses the
> > existence of the operator when starting out based on the Flink docs.
> >
> > What do you think?
> >
> > Gyula
> >
>
>


dev@flink.apache.org

2022-10-12 Thread Steven Wu
With the model of externalized Flink connector repo (which I fully
support), there is one challenge of supporting versions of two upstream
projects (similar to what Peter Vary mentioned earlier).

E.g., today the Flink Iceberg connector lives in Iceberg repo. We have
separate modules 1.13, 1.14, 1.15 to list the supported Flink
versions, which is still manageable. With the new model, we may now need to
multiply that by the number of Iceberg versions that we are going to
support, e.g. 0.13, 0.14, 1.0. That multiplication factor would be
unmanageable.

From the flink-connector-elasticsearch repo, it is unclear how we define
the supported Flink versions. Only one/latest Flink version?


On Fri, Sep 30, 2022 at 8:36 AM Péter Váry 
wrote:

> +1 having an option storing every version of a connector in one repo
>
> Also, it would be good to have the major(.minor) version of the connected
> system in the name of the connector jar, depending on the compatibility. I
> think this compatibility is mostly system dependent.
>
> Thanks, Peter
>
>
> On Fri, Sep 30, 2022, 09:32 Martijn Visser 
> wrote:
>
> > Hi Peter,
> >
> > I think this also depends on the support SLA that the technology that you
> > connect to provides. For example, with Flink and Elasticsearch, we choose
> > to follow Elasticsearch supported versions. So that means that when
> support
> > for Elasticsearch 8 is introduced, support for Elasticsearch 6 should be
> > dropped (since Elastic only support the last major version and the latest
> > minor version prior to that)
> >
> > I don't see value in having different connectors for Iceberg 0.14 and
> 0.15
> > in separate repositories. I think that will confuse the user. I would
> > expect that with modules you should be able to have support for multiple
> > versions in one repository.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Fri, Sep 30, 2022 at 7:44 AM Péter Váry 
> > wrote:
> >
> > > Thanks for the quick response!
> > >
> > > Would this mean, that we have different connectors for Iceberg 0.14,
> and
> > > Iceberg 0.15. Would these different versions kept in different
> > repository?
> > >
> > > My feeling is that this model is fine for the stable/slow moving
> systems
> > > like Hive/HBase. For other systems, which are evolving faster, this is
> > less
> > > than ideal.
> > >
> > > For those, who have more knowledge about the Flink ecosystem: How do
> you
> > > feel? What is the distribution of the connectors between the slow
> moving
> > > and the fast moving systems?
> > >
> > > Thanks, Peter
> > >
> > >
> > > On Thu, Sep 29, 2022, 16:46 Danny Cranmer 
> > wrote:
> > >
> > > > If you look at ElasticSearch [1] as an example there are different
> > > variants
> > > > of the connector depending on the "connected" system:
> > > > - flink-connector-elasticsearch6
> > > > - flink-connector-elasticsearch7
> > > >
> > > > Looks like Hive and HBase follow a similar pattern in the main Flink
> > > repo/
> > > >
> > > > [1] https://github.com/apache/flink-connector-elasticsearch
> > > >
> > > > On Thu, Sep 29, 2022 at 3:17 PM Péter Váry <
> > peter.vary.apa...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Team,
> > > > >
> > > > > Just joining the conversation for the first time, so pardon me if I
> > > > repeat
> > > > > already answered questions.
> > > > >
> > > > > It might be already discussed, but I think the version for the
> > > > "connected"
> > > > > system could be important as well.
> > > > >
> > > > > There might be some API changes between Iceberg 0.14.2, and 1.0.0,
> > > which
> > > > > would require as to rewrite part of the code for the Flink-Iceberg
> > > > > connector.
> > > > > It would be important for the users:
> > > > > - Which Flink version(s) are this connector working with?
> > > > > - Which Iceberg version(s) are this connector working with?
> > > > > - Which code version we have for this connector?
> > > > >
> > > > > Does this make sense? What is the community's experience with the
> > > > connected
> > > > > systems? Are they stable enough for omitting their version number
> > from
> > > > the
> > > > > naming of the connectors? Would this worth the proliferation of the
> > > > > versions?
> > > > >
> > > > > Thanks,
> > > > > Peter
> > > > >
> > > > > Chesnay Schepler  ezt írta (időpont: 2022.
> > szept.
> > > > 29.,
> > > > > Cs, 14:11):
> > > > >
> > > > > > 2) No; the branch names would not have a Flink version in them;
> > > v1.0.0,
> > > > > > v1.0.1 etc.
> > > > > >
> > > > > > On 29/09/2022 14:03, Martijn Visser wrote:
> > > > > > > If I summarize it correctly, that means that:
> > > > > > >
> > > > > > > 1. The versioning scheme would be <connector version>-<flink
> > > > > > > version>, where there will never be a patch release for a minor
> > > > > > > version if a newer minor version already exists.
> > > > > > > E.g., 1.0.0-1.15; 1.0.1-1.15; 1.1.0-1.15; 1.2.0-1.15;
> > > > > > >
> > > > > > > 2. The branch naming scheme would be
> > > > > vmajo

Re: [DISCUSS] Reference operator from Flink Kubernetes deployment docs

2022-10-12 Thread Thomas Weise
+1


On Wed, Oct 12, 2022 at 5:03 PM Martijn Visser 
wrote:

> +1 from my end to include the operator in the related Kubernetes sections
> of the Flink docs
>
> On Wed, Oct 12, 2022 at 5:31 PM Chesnay Schepler 
> wrote:
>
> > I don't see a reason for why we shouldn't at least mention the operator
> > in the kubernetes docs.
> >
> > On 12/10/2022 16:25, Gyula Fóra wrote:
> > > Hi Devs!
> > >
> > > I would like to start a discussion about referencing the Flink
> Kubernetes
> > > Operator directly from the Flink Kubernetes deployment documentation.
> > >
> > > Currently the Flink deployment/resource provider docs provide some
> > > information for the Standalone and Native Kubernetes integration
> without
> > > any reference to the operator.
> > >
> > > I think we reached a point with the operator where we should provide a
> > bit
> > > more visibility and value to the users by directly proposing to use the
> > > operator when considering Flink on Kubernetes. We should definitely
> keep
> > > the current docs but make the point that for most users the easiest way
> > to
> > > use Flink on Kubernetes is probably through the operator (where they
> can
> > > now benefit from both standalone and native integration under the
> hood).
> > > This should help us avoid cases where a new user completely misses the
> > > existence of the operator when starting out based on the Flink docs.
> > >
> > > What do you think?
> > >
> > > Gyula
> > >
> >
> >
>


dev@flink.apache.org

2022-10-12 Thread Thomas Weise
"Branches are not specific to a Flink version. (i.e., no v3.2-1.15)"

Sorry for the late question. I could not find in the discussion thread how
a connector can make use of features of the latest Flink version that were
not present in the previous Flink version, when branches cannot be Flink
version specific?

Thanks,
Thomas

On Wed, Oct 12, 2022 at 4:09 PM Ferenc Csaky 
wrote:

> +1 from my side (non-binding)
>
> Best,
> F
>
>
> --- Original Message ---
> On Wednesday, October 12th, 2022 at 15:47, Martijn Visser <
> martijnvis...@apache.org> wrote:
>
>
> >
> >
> > +1 (binding), I am indeed assuming that Chesnay meant the last two minor
> > versions as supported.
> >
> > Op wo 12 okt. 2022 om 20:18 schreef Danny Cranmer
> dannycran...@apache.org
> >
> > > Thanks for the concise summary Chesnay.
> > >
> > > +1 from me (binding)
> > >
> > > Just one clarification, for "3.1) The Flink versions supported by the
> > > project (last 2 major Flink versions) must be supported.". Do we
> actually
> > > mean major here, as in Flink 1.x.x and 2.x.x? Right now we would only
> > > support Flink 1.15.x and not 1.14.x? I would be inclined to support the
> > > latest 2 minor Flink versions (major.minor.patch) given that we only
> have 1
> > > active major Flink version.
> > >
> > > Danny
> > >
> > > On Wed, Oct 12, 2022 at 2:12 PM Chesnay Schepler ches...@apache.org
> > > wrote:
> > >
> > > > Since the discussion
> > > > (https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo)
> has
> > > > stalled a bit but we need a conclusion to move forward I'm opening a
> > > > vote.
> > > >
> > > > Proposal summary:
> > > >
> > > > 1) Branch model
> > > > 1.1) The default branch is called "main" and used for the next major
> > > > iteration.
> > > > 1.2) Remaining branches are called "vmajor.minor". (e.g., v3.2)
> > > > 1.3) Branches are not specific to a Flink version. (i.e., no
> v3.2-1.15)
> > > >
> > > > 2) Versioning
> > > > 2.1) Source releases: major.minor.patch
> > > > 2.2) Jar artifacts: major.minor.patch-flink-major.flink-minor
> > > > (This may imply releasing the exact same connector jar multiple times
> > > > under different versions)
> > > >
> > > > 3) Flink compatibility
> > > > 3.1) The Flink versions supported by the project (last 2 major Flink
> > > > versions) must be supported.
> > > > 3.2) How this is achieved is left to the connector, as long as it
> > > > conforms to the rest of the proposal.
> > > >
> > > > 4) Support
> > > > 4.1) The last 2 major connector releases are supported with only the
> > > > latter receiving additional features, with the following exceptions:
> > > > 4.1.a) If the older major connector version does not support any
> > > > currently supported Flink version, then it is no longer supported.
> > > > 4.1.b) If the last 2 major versions do not cover all supported Flink
> > > > versions, then the latest connector version that supports the older
> > > > Flink version additionally gets patch support.
> > > > 4.2) For a given major connector version only the latest minor
> version
> > > > is supported.
> > > > (This means if 1.1.x is released there will be no more 1.0.x release)
> > > >
> > > > I'd like to clarify that these won't be set in stone for eternity.
> > > > We should re-evaluate how well this model works over time and adjust
> it
> > > > accordingly, consistently across all connectors.
> > > > I do believe that as is this strikes a good balance between
> > > > maintainability for us and clarity to users.
> > > >
> > > > Voting schema:
> > > >
> > > > Consensus, committers have binding votes, open for at least 72 hours.
> >
> > --
> > Martijn
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
>


Re: Re: Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-12 Thread Martijn Visser
Hi,

Just to double check: while I believe most Iceberg content refers to
Iceberg as a format, it's considered as a connector in Flink terms, right?
So there is no 'format' option to specify.

And one more question: is there a specific goal or reason why you would
want to contribute this to Flink instead of keeping it under the Iceberg
umbrella?

Best regards,

Martijn

Op wo 12 okt. 2022 om 20:11 schreef Danny Cranmer 

> +1 from me, let's open a VOTE thread to make this official.
>
> On Wed, Oct 12, 2022 at 1:37 PM Hang Ruan  wrote:
>
> > +1, I have used the iceberg connector and catalog before. It is easy to
> > use.
> >
> > > Márton Balassi  wrote on Wed, Oct 12, 2022 at 16:38:
> >
> > > +1 from me, thanks for the clarification.
> > >
> > > On Wed, Oct 12, 2022 at 7:57 AM Péter Váry <
> peter.vary.apa...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Abid,
> > > >
> > > > Count me in, and drop a note, if I can help in any way.
> > > >
> > > > Thanks,
> > > > Peter
> > > >
> > > > On Tue, Oct 11, 2022, 20:13  wrote:
> > > >
> > > > > Hi Martijn,
> > > > >
> > > > > Yes catalog integration exists and catalogs can be created using
> > Flink
> > > > > SQL.
> > > > >
> > > > >
> > > >
> > >
> >
> https://iceberg.apache.org/docs/latest/flink/#creating-catalogs-and-using-catalogs
> > > > > has more details.
> > > > > We may need some discussion within Iceberg community but based on
> the
> > > > > current iceberg-flink code structure we are looking to externalize
> > this
> > > > as
> > > > > well.
> > > > >
> > > > > Thanks
> > > > > Abid
> > > > >
> > > > >
> > > > > On 2022/10/11 08:24:44 Martijn Visser wrote:
> > > > > > Hi Abid,
> > > > > >
> > > > > > Thanks for the FLIP. I have a question about Iceberg's Catalog:
> has
> > > > that
> > > > > > integration between Flink and Iceberg been created already and
> are
> > > you
> > > > > > looking to externalize that as well?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Martijn
> > > > > >
> > > > > > On Tue, Oct 11, 2022 at 12:14 AM 
> wrote:
> > > > > >
> > > > > > > Hi Marton,
> > > > > > >
> > > > > > > Yes, we are initiating this as part of the Externalize Flink
> > > > Connectors
> > > > > > > effort. Plan is to externalize the existing Flink connector
> from
> > > > > Iceberg
> > > > > > > repo into a separate repo under the Flink umbrella.
> > > > > > >
> > > > > > > Sorry about the doc permissions! I was able to create a
> FLIP-267:
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > > > > > <
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+267%3A+Iceberg+Connector
> > > > > > > >
> > > > > > > Lets use that to discuss.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Abid
> > > > > > >
> > > > > > > On 2022/10/10 12:57:32 Márton Balassi wrote:
> > > > > > > > Hi Abid,
> > > > > > > >
> > > > > > > > Just to clarify does your suggestion mean that the Iceberg
> > > > community
> > > > > > > would
> > > > > > > > like to remove the iceberg-flink connector from the Iceberg
> > > > codebase
> > > > > and
> > > > > > > > maintain it under Flink instead? A new separate repo under
> the
> > > > Flink
> > > > > > > > project umbrella given the current existing effort to extract
> > > > > connectors
> > > > > > > to
> > > > > > > > their individual repos (externalize) makes sense to me.
> > > > > > > >
> > > > > > > > [1]
> > > > https://lists.apache.org/thread/mpzzlpob9ymkjfybm96vz2y2m5fjyvfo
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Marton
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Oct 10, 2022 at 5:31 AM Jingsong Li  >
> > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks Abid for driving.
> > > > > > > > >
> > > > > > > > > +1 for this.
> > > > > > > > >
> > > > > > > > > Can you open the permissions for
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > > > > > > ?
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Jingsong
> > > > > > > > >
> > > > > > > > > On Mon, Oct 10, 2022 at 9:22 AM Abid Mohammed
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I would like to start a discussion about contributing
> > Iceberg
> > > > > Flink
> > > > > > > > > Connector to Flink.
> > > > > > > > > >
> > > > > > > > > > I created a doc <
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing
> > > > > > > >
> > > > > > > > > with all the details following the Flink Connector template
> > as
> > > I
> > > > > don’t
> > > > > > > have
> > > > > > > > > permissions to create a FLIP yet.
> > > > > > > > > > High level details are captured below:
> > > > > > > > > >
> > > > > > > > > > Motivation:
> > > > > > > > > >
> >

Re: [VOTE] Remove HCatalog connector

2022-10-12 Thread Jingsong Li
+1

Thanks for driving.

Best,
Jingsong

On Thu, Oct 13, 2022 at 3:46 AM Martijn Visser  wrote:
>
> Hi everyone,
>
> Since no comments were made, I'm opening a vote to remove the HCatalog
> connector [1]
>
> The voting period will be open at least 72hrs.
>
> Best regards,
>
> Martijn
>
> [1]
> https://lists.apache.org/thread/j8jc5zrhnqlv8y3lkmc3wdo9ysgmsr84
> --
> Martijn
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser


[RESULT][VOTE] Apache Flink Table Store 0.2.1, release candidate #2

2022-10-12 Thread Jingsong Li
I'm happy to announce that we have unanimously approved this release.

There are 3 approving votes, 3 of which are binding:
* Yu Li (binding)
* Jark Wu (binding)
* Jingsong Lee (binding)

There are no disapproving votes.

Thank you for verifying the release candidate. I will now proceed to
finalize the release and announce it once everything is published.

Best,
Jingsong


[jira] [Created] (FLINK-29610) Infinite timeout is used in SavepointHandlers calls to RestfulGateway

2022-10-12 Thread Jiale Tan (Jira)
Jiale Tan created FLINK-29610:
-

 Summary: Infinite timeout is used in SavepointHandlers calls to 
RestfulGateway
 Key: FLINK-29610
 URL: https://issues.apache.org/jira/browse/FLINK-29610
 Project: Flink
  Issue Type: Bug
  Components: Runtime / REST
Reporter: Jiale Tan


In {{SavepointHandlers}}, both 
{{[StopWithSavepointHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L214]}}
 and 
{{[SavepointTriggerHandler|https://github.com/apache/flink/blob/cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.java#L258]}}
 are calling {{RestfulGateway}} with {{RpcUtils.INF_TIMEOUT}}

 

As pointed out in 
[this|https://github.com/apache/flink/pull/20852#discussion_r992218970] 
discussion, we will need to either figure out why {{RpcUtils.INF_TIMEOUT}} is 
used, or remove it if there is no strong reason to use it.
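
For illustration only (plain JDK, not Flink's actual handler or gateway code), a small sketch of why bounding such an asynchronous call matters: under an infinite timeout, a request that never completes keeps the caller waiting forever, while a bounded future surfaces a failure the REST handler could report. The class name and the 2-second timeout are arbitrary example values.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {

    // Stand-in for a gateway call that never completes (e.g. a lost RPC).
    static CompletableFuture<String> triggerSavepoint() {
        return new CompletableFuture<>(); // intentionally never completed
    }

    public static void main(String[] args) {
        // With an "infinite timeout", join() below would block forever.
        // Bounding the future lets the caller fail the request instead.
        CompletableFuture<String> bounded =
                triggerSavepoint().orTimeout(2, TimeUnit.SECONDS);
        try {
            System.out.println(bounded.join());
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                System.out.println("Savepoint trigger timed out, report error to client");
            } else {
                throw e;
            }
        }
    }
}
```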



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Remove HCatalog connector

2022-10-12 Thread Hang Ruan
+1,

Best,
Hang

Jingsong Li  于2022年10月13日周四 10:09写道:

> +1
>
> Thanks for driving.
>
> Best,
> Jingsong
>
> On Thu, Oct 13, 2022 at 3:46 AM Martijn Visser 
> wrote:
> >
> > Hi everyone,
> >
> > Since no comments were made, I'm opening a vote to remove the HCatalog
> > connector [1]
> >
> > The voting period will be open at least 72hrs.
> >
> > Best regards,
> >
> > Martijn
> >
> > [1]
> > https://lists.apache.org/thread/j8jc5zrhnqlv8y3lkmc3wdo9ysgmsr84
> > --
> > Martijn
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
>


[ANNOUNCE] Apache Flink Table Store 0.2.1 released

2022-10-12 Thread Jingsong Lee
The Apache Flink community is very happy to announce the release of
Apache Flink Table Store 0.2.1.

Apache Flink Table Store is a unified storage to build dynamic tables
for both streaming and batch processing in Flink, supporting
high-speed data ingestion and timely data query.

Please check out the release blog post for an overview of the release:
https://flink.apache.org/news/2022/10/13/release-table-store-0.2.1.html

The release is available for download at:
https://flink.apache.org/downloads.html

Maven artifacts for Flink Table Store can be found at:
https://search.maven.org/search?q=g:org.apache.flink%20table-store

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352257

We would like to thank all contributors of the Apache Flink community
who made this release possible!

Best,
Jingsong Lee


Re: [VOTE] Remove HCatalog connector

2022-10-12 Thread yuxia
+1 (non-binding)
Thanks for driving.

Best regards,
Yuxia

- 原始邮件 -
发件人: "Hang Ruan" 
收件人: "dev" 
发送时间: 星期四, 2022年 10 月 13日 上午 10:16:46
主题: Re: [VOTE] Remove HCatalog connector

+1,

Best,
Hang

Jingsong Li  于2022年10月13日周四 10:09写道:

> +1
>
> Thanks for driving.
>
> Best,
> Jingsong
>
> On Thu, Oct 13, 2022 at 3:46 AM Martijn Visser 
> wrote:
> >
> > Hi everyone,
> >
> > Since no comments were made, I'm opening a vote to remove the HCatalog
> > connector [1]
> >
> > The voting period will be open at least 72hrs.
> >
> > Best regards,
> >
> > Martijn
> >
> > [1]
> > https://lists.apache.org/thread/j8jc5zrhnqlv8y3lkmc3wdo9ysgmsr84
> > --
> > Martijn
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
>


Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-10-12 Thread Yun Gao
Congratulations Danny!
Best,
Yun Gao
--
From:yuxia 
Send Time:2022 Oct. 12 (Wed.) 09:49
To:dev 
Subject:Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
Congratulations Danny!
Best regards,
Yuxia
- 原始邮件 -
发件人: "Xingbo Huang" 
收件人: "dev" 
发送时间: 星期三, 2022年 10 月 12日 上午 9:44:22
主题: Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
Congratulations Danny!
Best,
Xingbo
Sergey Nuyanzin  于2022年10月12日周三 01:26写道:
> Congratulations, Danny
>
> On Tue, Oct 11, 2022, 15:18 Lincoln Lee  wrote:
>
> > Congratulations Danny!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Congxian Qiu  于2022年10月11日周二 19:42写道:
> >
> > > Congratulations Danny!
> > >
> > > Best,
> > > Congxian
> > >
> > >
> > > Leonard Xu  于2022年10月11日周二 18:03写道:
> > >
> > > > Congratulations Danny!
> > > >
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > >
> > >
> >
>


Re: [VOTE] Drop Gelly

2022-10-12 Thread Yun Gao
+1
Best,
Yun Gao
--
From:Martijn Visser 
Send Time:2022 Oct. 13 (Thu.) 04:59
To:dev 
Subject:[VOTE] Drop Gelly
Hi everyone,
I would like to open a vote for dropping Gelly, which was discussed a long
time ago but never put to a vote [1].
Voting will be open for at least 72 hours.
Best regards,
Martijn
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser
[1] https://lists.apache.org/thread/2m6wtgjvxcogbf9d5q7mqt4ofqjf2ojc


Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer

2022-10-12 Thread Hang Ruan
Congratulations Danny!

Best,
Hang

Yun Gao  于2022年10月13日周四 10:56写道:

> Congratulations Danny!
> Best,
> Yun Gao
> --
> From:yuxia 
> Send Time:2022 Oct. 12 (Wed.) 09:49
> To:dev 
> Subject:Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
> Congratulations Danny!
> Best regards,
> Yuxia
> - 原始邮件 -
> 发件人: "Xingbo Huang" 
> 收件人: "dev" 
> 发送时间: 星期三, 2022年 10 月 12日 上午 9:44:22
> 主题: Re: [ANNOUNCE] New Apache Flink PMC Member - Danny Cranmer
> Congratulations Danny!
> Best,
> Xingbo
> Sergey Nuyanzin  于2022年10月12日周三 01:26写道:
> > Congratulations, Danny
> >
> > On Tue, Oct 11, 2022, 15:18 Lincoln Lee  wrote:
> >
> > > Congratulations Danny!
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Congxian Qiu  于2022年10月11日周二 19:42写道:
> > >
> > > > Congratulations Danny!
> > > >
> > > > Best,
> > > > Congxian
> > > >
> > > >
> > > > Leonard Xu  于2022年10月11日周二 18:03写道:
> > > >
> > > > > Congratulations Danny!
> > > > >
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [Discuss]- Donate Iceberg Flink Connector

2022-10-12 Thread Zheng Yu Chen
+1, thanks for driving it

Abid Mohammed  于2022年10月10日周一 09:22写道:

> Hi,
>
> I would like to start a discussion about contributing Iceberg Flink
> Connector to Flink.
>
> I created a doc <
> https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing>
> with all the details following the Flink Connector template as I don’t have
> permissions to create a FLIP yet.
> High level details are captured below:
>
> Motivation:
>
> This FLIP aims to contribute the existing Apache Iceberg Flink Connector
> to Flink.
>
> Apache Iceberg is an open table format for huge analytic datasets. Iceberg
> adds tables to compute engines including Spark, Trino, PrestoDB, Flink,
> Hive and Impala using a high-performance table format that works just like
> a SQL table.
> Iceberg avoids unpleasant surprises. Schema evolution works and won’t
> inadvertently un-delete data. Users don’t need to know about partitioning
> to get fast queries. Iceberg was designed to solve correctness problems in
> eventually-consistent cloud object stores.
>
> Iceberg supports both Flink’s DataStream API and Table API. Based on the
> guideline of the Flink community, only the latest 2 minor versions are
> actively maintained. See the Multi-Engine Support#apache-flink for further
> details.
>
>
> Iceberg connector supports:
>
> • Source: detailed Source design <
> https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#>,
> based on FLIP-27
> • Sink: detailed Sink design and interfaces used <
> https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit#
> >
> • Usable in both DataStream and Table API/SQL
> • DataStream read/append/overwrite
> • SQL create/alter/drop table, select, insert into, insert
> overwrite
> • Streaming or batch read in Java API
> • Support for Flink’s Python API
>
> See Iceberg Flink  for
> detailed usage instructions.
>
> Looking forward to the discussion!
>
> Thanks
> Abid


Re: [DISCUSS] FLIP-263: Improve resolving schema compatibility

2022-10-12 Thread yanfei lei
Hi Hangxiang,


Thanks for raising the discussion, +1 for reversing the direction of
resolving schema compatibility.

As you described, in 'Step 1', TypeSerializer#resolveSchemaCompatibility
will return TYPE.INCOMPATIBLE by default, since
TypeSerializer#resolveSchemaCompatibility is a default method; in 'Step
2&3', you proposed deprecating
TypeSerializerSnapshot#resolveSchemaCompatibility in the long run. Will
TypeSerializer#resolveSchemaCompatibility then become an abstract method that
must be implemented?


After this FLIP, the new serializer claims its compatibility with the old
serializer. Does it support the migration path

built-in serializer -> custom serializer -> built-in serializer ?

As you mentioned, some built-in serializers are final classes, so we can't
change their #resolveSchemaCompatibility() method.
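
For illustration, a minimal, self-contained sketch of the "ask the new serializer first, fall back to the old snapshot" ordering discussed below; the interfaces and names are stand-ins invented for this example and are not Flink's actual TypeSerializer/TypeSerializerSnapshot API nor the FLIP's final signatures.

```java
import java.util.Optional;

public class CompatibilitySketch {

    enum Compatibility { COMPATIBLE_AS_IS, AFTER_MIGRATION, INCOMPATIBLE }

    // Hypothetical new-serializer-side hook; Optional.empty() means "no opinion",
    // which avoids the INCOMPATIBLE-vs-unresolved ambiguity raised in the thread.
    interface NewSerializer {
        default Optional<Compatibility> resolveAgainst(OldSnapshot oldSnapshot) {
            return Optional.empty();
        }
    }

    // Hypothetical old-snapshot-side check, i.e. today's direction.
    interface OldSnapshot {
        Compatibility resolveAgainst(NewSerializer newSerializer);
    }

    // 'Step 1'-style resolution: ask the new serializer first, then fall back.
    static Compatibility resolve(NewSerializer newSer, OldSnapshot oldSnap) {
        return newSer.resolveAgainst(oldSnap)
                .orElseGet(() -> oldSnap.resolveAgainst(newSer));
    }

    public static void main(String[] args) {
        NewSerializer silentNew = new NewSerializer() {};               // no opinion
        OldSnapshot legacyCheck = ignored -> Compatibility.AFTER_MIGRATION;
        System.out.println(resolve(silentNew, legacyCheck));            // AFTER_MIGRATION
    }
}
```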


Best,

Yanfei

Zakelly Lan  于2022年10月13日周四 00:56写道:

> Hi Hangxiang,
>
> Thanks for driving this. It is reasonable to let the new serializer
> claim its compatibility with the old serializer. However, I think
> there is a little confusion as you described in your proposed change
> 'Step 1'. You mean that we let the new serializer check the
> compatibility first, and if it gives the 'INCOMPATIBLE' then we let
> the old serializer do the check as before. The 'INCOMPATIBLE' from the
> new serializer means 'do not know' or 'unresolved' here, but sometimes
> the new serializer should express a decisive meaning that it is really
> not compatible with the older one, which is the 'INCOMPATIBLE' should
> mean. There is a semantic ambiguity or missing. I think there are two
> options to resolve this:
> 1. Provide 'UNRESOLVED' in type of the compatibility checking result,
> and let the proposed new interface return it by default. OR
> 2. Replace any use of the old interface with the new interface. In the
> default implementation of the new interface, call the old interface to
> leverage the old result. This approach provides the ability to totally
> replace original checking logic (by implementing the new interface in
> new serializer) while maintaining good backward compatibility.
>
> What do you think?
>
> Best,
> Zakelly.
>
>
> On Wed, Oct 12, 2022 at 8:41 PM Hangxiang Yu  wrote:
> >
> > Dear Flink developers,
> >
> > I would like to start a discussion thread on FLIP-263[1] proposing to
> > improve the usability of resolving schema compatibility.
> >
> > Currently, the place for compatibility checks is
> > TypeSerializerSnapshot#resolveSchemaCompatibility
> > which belongs to the old serializer, There are no ways for users to
> specify the
> > compatibility with the old serializer in the new customized serializer.
> >
> > The FLIP hopes to reverse the direction of resolving schema compatibility
> > to improve the usability of resolving schema compatibility.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-263%3A+Improve+resolving+schema+compatibility
>


[jira] [Created] (FLINK-29611) Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest

2022-10-12 Thread Sopan Phaltankar (Jira)
Sopan Phaltankar created FLINK-29611:


 Summary: Fix flaky tests in CoBroadcastWithNonKeyedOperatorTest
 Key: FLINK-29611
 URL: https://issues.apache.org/jira/browse/FLINK-29611
 Project: Flink
  Issue Type: Bug
Reporter: Sopan Phaltankar


The test 
_org.apache.flink.streaming.api.operators.co.CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport_
 has the following failure:

Failures:
[ERROR]   CoBroadcastWithNonKeyedOperatorTest.testMultiStateSupport:74 
Wrong Side Output: arrays first differed at element [0]; expected:6> but was:5>

I analyzed the assertion failure and found the root cause: the test method 
calls ctx.getBroadcastState(STATE_DESCRIPTOR).immutableEntries(), which calls 
the entrySet() method of the underlying HashMap. entrySet() returns the 
entries in a non-deterministic order, causing the test to be flaky.

The fix would be to change _HashMap_ to _LinkedHashMap_ where the Map is 
getting initialized.
On further analysis, it was found that the Map is getting initialized on line 
53 of org.apache.flink.runtime.state.HeapBroadcastState class.

After changing from HashMap to LinkedHashMap, the above test is passing.
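
For illustration only (plain JDK, nothing Flink-specific), a tiny sketch of the ordering difference behind the fix: HashMap iterates in an order derived from the key hashes and bucket layout, while LinkedHashMap preserves insertion order, which is what the test's expected side output relies on.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapOrderDemo {
    public static void main(String[] args) {
        // Same insertions into both maps, in the order 3, 1, 2.
        Map<Integer, String> hashMap = new HashMap<>();
        Map<Integer, String> linkedMap = new LinkedHashMap<>();
        for (int key : new int[] {3, 1, 2}) {
            hashMap.put(key, "v" + key);
            linkedMap.put(key, "v" + key);
        }

        // Typically prints [1=v1, 2=v2, 3=v3]; the order is not guaranteed.
        System.out.println("HashMap entrySet:       " + hashMap.entrySet());
        // Prints [3=v3, 1=v1, 2=v2]; insertion order is preserved.
        System.out.println("LinkedHashMap entrySet: " + linkedMap.entrySet());
    }
}
```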



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Drop Gelly

2022-10-12 Thread Gyula Fóra
+1

On Thu, 13 Oct 2022 at 04:56, Yun Gao  wrote:

> +1
> Best,
> Yun Gao
> --
> From:Martijn Visser 
> Send Time:2022 Oct. 13 (Thu.) 04:59
> To:dev 
> Subject:[VOTE] Drop Gelly
> Hi everyone,
> I would like to open a vote for dropping Gelly, which was discussed a long
> time ago but never put to a vote [1].
> Voting will be open for at least 72 hours.
> Best regards,
> Martijn
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
> [1] https://lists.apache.org/thread/2m6wtgjvxcogbf9d5q7mqt4ofqjf2ojc
>


Re: [VOTE] Remove HCatalog connector

2022-10-12 Thread Gyula Fóra
+1

On Thu, 13 Oct 2022 at 04:53, yuxia  wrote:

> +1 (non-binding)
> Thanks for driving.
>
> Best regards,
> Yuxia
>
> - 原始邮件 -
> 发件人: "Hang Ruan" 
> 收件人: "dev" 
> 发送时间: 星期四, 2022年 10 月 13日 上午 10:16:46
> 主题: Re: [VOTE] Remove HCatalog connector
>
> +1,
>
> Best,
> Hang
>
> Jingsong Li  于2022年10月13日周四 10:09写道:
>
> > +1
> >
> > Thanks for driving.
> >
> > Best,
> > Jingsong
> >
> > On Thu, Oct 13, 2022 at 3:46 AM Martijn Visser  >
> > wrote:
> > >
> > > Hi everyone,
> > >
> > > Since no comments were made, I'm opening a vote to remove the HCatalog
> > > connector [1]
> > >
> > > The voting period will be open at least 72hrs.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > [1]
> > > https://lists.apache.org/thread/j8jc5zrhnqlv8y3lkmc3wdo9ysgmsr84
> > > --
> > > Martijn
> > > https://twitter.com/MartijnVisser82
> > > https://github.com/MartijnVisser
> >
>


Re: [VOTE] Drop Gelly

2022-10-12 Thread Yu Li
Hi Martijn,

From the last replies of the previous discussion, I could see there are
still users using Gelly who expressed their wish to somehow keep this
module [1] [2] [3]. Therefore, before giving my vote, I'd like to
confirm what "Drop Gelly" exactly means. Does it mean that 1) we remove the
code and documentation of Gelly completely, or 2) just move it out of the main
repository into a separate one and mark it as EOM (or maybe call for
maintainers), or 3) something else? And if it's the #1 option, is there any
replacement (or any plan)? Thanks.

Best Regards,
Yu

[1] https://lists.apache.org/thread/4yxb7xnb2070h5lypcd3wxnsck9zwz8f
[2] https://lists.apache.org/thread/x21p382vjt6nrjnj51fxxtcrp1dqtzyz
[3] https://lists.apache.org/thread/w2f4yvb75tg5t7g3l7t9z6bvpwmd1t6y


On Thu, 13 Oct 2022 at 10:56, Yun Gao  wrote:

> +1
> Best,
> Yun Gao
> --
> From:Martijn Visser 
> Send Time:2022 Oct. 13 (Thu.) 04:59
> To:dev 
> Subject:[VOTE] Drop Gelly
> Hi everyone,
> I would like to open a vote for dropping Gelly, which was discussed a long
> time ago but never put to a vote [1].
> Voting will be open for at least 72 hours.
> Best regards,
> Martijn
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
> [1] https://lists.apache.org/thread/2m6wtgjvxcogbf9d5q7mqt4ofqjf2ojc
>


SQL Gateway and SQL Client

2022-10-12 Thread Alexey Leonov-Vendrovskiy
Hi all,

I’m Alexey from Confluent. This is my first email in this discussion list.
I’m rather new to Flink, and to local customs of communication. I want to
dive deeper and hopefully get more involved over time.

Currently I have a few questions around SQL Gateway and SQL Client.
Specifically, I wanted to learn what the vision is for the near future
of these two components.

In what Flink’s release the connection from SQL Client to the Gateway is
expected to be added? I was looking at
https://issues.apache.org/jira/browse/FLINK-29486, and recently it got
renamed from “Enable SQL Client to Connect SQL Gateway in Remote Mode” to
“Introduce Client Parser to get statement type”.  I did some search, but
didn’t find a good place where the client's work in this direction is
discussed or tracked.

A couple questions about the SQL Gateway. The FLIP-91

mentions “Authentication module” (2) and “Persistent Gateway” (4) as
possible future work. Were there any recent discussions on these subjects?
Or maybe there are some ideas how to move these directions forward? Another
related topic: are there ideas around making SQL Gateway a multi-tenant
component?

Thank you,

Alexey


[jira] [Created] (FLINK-29612) Extract changelog files out of DataFileMeta#extraFiles

2022-10-12 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-29612:
---

 Summary: Extract changelog files out of DataFileMeta#extraFiles
 Key: FLINK-29612
 URL: https://issues.apache.org/jira/browse/FLINK-29612
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Caizhi Weng
 Fix For: table-store-0.3.0


Currently changelog files are stored as extra files in {{DataFileMeta}}. 
However, for the full compaction changelog we're about to introduce, they cannot 
be added as extra files because their statistics might be different from those of 
the corresponding merge tree files.

We need to extract changelog files out of DataFileMeta#extraFiles.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Adding client.id.prefix to the KafkaSink

2022-10-12 Thread Mason Chen
Hi Yaroslav,

+1 from my end too. I get some questions internally on the warning logs due
to conflicting client-id and it would be nice to resolve them.

Best,
Mason

On Wed, Oct 12, 2022 at 2:03 PM Martijn Visser 
wrote:

> Hi Yaroslav,
>
> +1 from my end. Thanks for bringing this up!
>
> Best regards,
>
> Martijn
>
> On Wed, Oct 12, 2022 at 6:34 PM Yaroslav Tkachenko
>  wrote:
>
> > Hi everyone,
> >
> > I'd like to propose adding client.id.prefix to the KafkaSink to mirror
> the
> > functionality provided by the KafkaSource.
> >
> > Defining client.id is very important when running workloads with many
> > different Kafka clients: they help with identification and enforcing
> > quotas. Due to the specific implementation details, you can't simply set
> > client.id for a KafkaSource or KafkaSink in Flink, you can find more
> > context in https://issues.apache.org/jira/browse/FLINK-8093, which was
> > opened in 2017 and is still unresolved.
> >
> > I don't see any reason to treat KafkaSink differently: if we can define
> > a client.id.prefix for the KafkaSource we should be able to define it for
> > the KafkaSink.
> >
> > I've created an issue https://issues.apache.org/jira/browse/FLINK-28842
> > and
> > a PR https://github.com/apache/flink/pull/20475 to resolve this.
> >
>


Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-10-12 Thread Mason Chen
Hi Ryan,

Thanks for the additional context! Yes, the offset initializer would need
to take a cluster as a parameter and the MultiClusterKafkaSourceSplit can
be exposed in an initializer.

Best,
Mason

On Thu, Oct 6, 2022 at 11:00 AM Ryan van Huuksloot <
ryan.vanhuuksl...@shopify.com> wrote:

> Hi Mason,
>
> Thanks for the clarification! In regards to the addition to the
> OffsetInitializer of this API - this would be an awesome addition and I
> think this entire FLIP would be a great addition to the Flink.
>
> To provide more context as to why we need particular offsets, we use
> Hybrid Source to currently backfill from buckets prior to reading from
> Kafka. We have a service that will tell us what offset has last been loaded
> into said bucket which we will use to initialize the KafkaSource
> OffsetsInitializer. We couldn't use a timestamp here and the offset would
> be different for each Cluster.
>
> In pseudocode, we'd want the ability to do something like this with
> HybridSources - if this is possible.
>
> ```scala
> val offsetsMetadata: Map[TopicPartition, Long] = // Get current offsets
> from OffsetReaderService
> val multiClusterArchiveSource: MultiBucketFileSource[T] = // Data is read
> from different buckets (multiple topics)
> val multiClusterKafkaSource: MultiClusterKafkaSource[T] =
> MultiClusterKafkaSource.builder()
>   .setKafkaMetadataService(new KafkaMetadataServiceImpl())
>   .setStreamIds(List.of("my-stream-1", "my-stream-2"))
>   .setGroupId("myConsumerGroup")
>
> .setDeserializer(KafkaRecordDeserializationSchema.valueOnly(StringDeserializer.class))
>   .setStartingOffsets(offsetsMetadata)
>   .setProperties(properties)
>   .build()
> val source =
> HybridSource.builder(multiClusterArchiveSource).addSource(multiClusterKafkaSource).build()
> ```
>
> Few notes:
> - TopicPartition won't work because the topic may be the same name as this
> is something that is supported IIRC
> - I chose to pass a map into starting offsets just for demonstrative
> purposes, I would be fine with whatever data structure would work best
>
> Ryan van Huuksloot
> Data Developer | Production Engineering | Streaming Capabilities
> [image: Shopify]
> 
>
>
> On Mon, Oct 3, 2022 at 11:29 PM Mason Chen  wrote:
>
>> Hi Ryan,
>>
>> Just copying your message over to the email chain.
>>
>> Hi Mason,
>>> First off, thanks for putting this FLIP together! Sorry for the delay.
>>> Full disclosure Mason and I chatted a little bit at Flink Forward 2022 but
>>> I have tried to capture the questions I had for him then.
>>> I'll start the conversation with a few questions:
>>> 1. The concept of streamIds is not clear to me in the proposal and could
>>> use some more information. If I understand correctly, they will be used in
>>> the MetadataService to link KafkaClusters to ones you want to use? If you
>>> assign stream ids using `setStreamIds`, how can you dynamically increase
>>> the number of clusters you consume if the list of StreamIds is static? I am
>>> basing this off of your example .setStreamIds(List.of("my-stream-1",
>>> "my-stream-2")) so I could be off base with my assumption. If you don't
>>> mind clearing up the intention, that would be great!
>>> 2. How would offsets work if you wanted to use this
>>> MultiClusterKafkaSource with a file based backfill? In the case I am
>>> thinking of, you have a bucket backed archive of Kafka data per cluster.
>>> and you want to pick up from the last offset in the archived system, how
>>> would you set OffsetInitializers "per cluster" potentially as a function or
>>> are you limited to setting an OffsetInitializer for the entire Source?
>>> 3. Just to make sure - because this system will layer on top of Flink-27
>>> and use KafkaSource for some aspects under the hood, the watermark
>>> alignment that was introduced in FLIP-182 / Flink 1.15 would be possible
>>> across multiple clusters if you assign them to the same alignment group?
>>> Thanks!
>>> Ryan
>>
>>
>> 1. The stream ids are static--however, what the physical clusters and
>> topics that they map to can mutate. Let's say my-stream-1 maps to cluster-1
>> and topic-1. The KafkaMetadataService can return a different mapping when
>> metadata is fetched the next time e.g. my-stream-1 mapping to cluster-1 and
>> topic-1, and cluster-2 and topic-2. Let me add more details on how the
>> KafkaMetadataService is used.
>> 2. The current design limits itself to a single configured
>> OffsetInitializer that is used for every underlying KafkaSource.
>> 3. Yes, it is in our plan to integrate this source with watermark
>> alignment in which the user can align watermarks from all clusters within
>> the single. It will leverage the Kafka Source implementation to achieve
>> this.
>>
>> With regards to 2, it's an interesting idea. I think we can extend the
>> design to support a map of offset initializers to clusters, which would
>> solve your file based backfill. If y

Re: SQL Gateway and SQL Client

2022-10-12 Thread yuxia
> In what Flink’s release the connection from SQL Client to the Gateway is
expected to be added? 
Flink 1.17

> “Authentication module” (2) and “Persistent Gateway” (4) as
possible future work. Were there any recent discussions on these subjects?
No recent discussions on these subjects, but I think it'll come in Flink 1.17

> Another related topic: are there ideas around making SQL Gateway a 
> multi-tenant
component?
Yes.

Shengkai is the maintainer of SQL Client and SQL Gateway, maybe he can provide 
more information.



Best regards,
Yuxia

- 原始邮件 -
发件人: "Alexey Leonov-Vendrovskiy" 
收件人: "dev" 
发送时间: 星期四, 2022年 10 月 13日 下午 12:33:08
主题: SQL Gateway and SQL Client

Hi all,

I’m Alexey from Confluent. This is my first email in this discussion list.
I’m rather new to Flink, and to local customs of communication. I want to
dive deeper and hopefully get more involved over time.

Currently I have a few questions around SQL Gateway and SQL Client.
Specifically I wanted to learn what is the vision around the nearest future
of these two components.

In what Flink’s release the connection from SQL Client to the Gateway is
expected to be added? I was looking at
https://issues.apache.org/jira/browse/FLINK-29486, and recently it got
renamed from “Enable SQL Client to Connect SQL Gateway in Remote Mode” to
“Introduce Client Parser to get statement type”.  I did some search, but
didn’t find a good place where the client's work in this direction is
discussed or tracked.

A couple questions about the SQL Gateway. The FLIP-91

mentions “Authentication module” (2) and “Persistent Gateway” (4) as
possible future work. Were there any recent discussions on these subjects?
Or maybe there are some ideas how to move these directions forward? Another
related topic: are there ideas around making SQL Gateway a multi-tenant
component?

Thank you,

Alexey


Re: [VOTE] Remove HCatalog connector

2022-10-12 Thread Leonard Xu
Thanks Martijn for driving this work.

+1

Best,
Leonard


> 2022年10月13日 下午12:29,Gyula Fóra  写道:
> 
> +1
> 
> On Thu, 13 Oct 2022 at 04:53, yuxia  wrote:
> 
>> +1 (non-binding)
>> Thanks for driving.
>> 
>> Best regards,
>> Yuxia
>> 
>> - 原始邮件 -
>> 发件人: "Hang Ruan" 
>> 收件人: "dev" 
>> 发送时间: 星期四, 2022年 10 月 13日 上午 10:16:46
>> 主题: Re: [VOTE] Remove HCatalog connector
>> 
>> +1,
>> 
>> Best,
>> Hang
>> 
>> Jingsong Li  于2022年10月13日周四 10:09写道:
>> 
>>> +1
>>> 
>>> Thanks for driving.
>>> 
>>> Best,
>>> Jingsong
>>> 
>>> On Thu, Oct 13, 2022 at 3:46 AM Martijn Visser >> 
>>> wrote:
 
 Hi everyone,
 
 Since no comments were made, I'm opening a vote to remove the HCatalog
 connector [1]
 
 The voting period will be open at least 72hrs.
 
 Best regards,
 
 Martijn
 
 [1]
 https://lists.apache.org/thread/j8jc5zrhnqlv8y3lkmc3wdo9ysgmsr84
 --
 Martijn
 https://twitter.com/MartijnVisser82
 https://github.com/MartijnVisser
>>> 
>> 



[jira] [Created] (FLINK-29613) An error occurred during the running of the Flink pulsar,it shows "We only support normal message id currently."

2022-10-12 Thread qiaomengnan (Jira)
qiaomengnan created FLINK-29613:
---

 Summary: An error occurred during the running of the Flink 
pulsar,it shows "We only support normal message id currently."
 Key: FLINK-29613
 URL: https://issues.apache.org/jira/browse/FLINK-29613
 Project: Flink
  Issue Type: Bug
Reporter: qiaomengnan


java.lang.RuntimeException: One or more fetchers have encountered exception
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:225)
at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:169)
at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:130)
at 
org.apache.flink.connector.pulsar.source.reader.source.PulsarOrderedSourceReader.pollNext(PulsarOrderedSourceReader.java:109)
at 
org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:385)
at 
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:519)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: SplitFetcher thread 1 received 
unexpected exception while polling the records
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:150)
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:105)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Suppressed: java.lang.RuntimeException: SplitFetcher thread 0 received 
unexpected exception while polling the records
... 7 more
Caused by: java.lang.IllegalArgumentException: We only support normal message 
id currently.
at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138)
at 
org.apache.flink.connector.pulsar.source.enumerator.cursor.MessageIdUtils.unwrapMessageId(MessageIdUtils.java:61)
at 
org.apache.flink.connector.pulsar.source.enumerator.cursor.MessageIdUtils.nextMessageId(MessageIdUtils.java:43)
at 
org.apache.flink.connector.pulsar.source.reader.split.PulsarOrderedPartitionSplitReader.beforeCreatingConsumer(PulsarOrderedPartitionSplitReader.java:94)
at 
org.apache.flink.connector.pulsar.source.reader.split.PulsarPartitionSplitReaderBase.handleSplitsChanges(PulsarPartitionSplitReaderBase.java:160)
at 
org.apache.flink.connector.pulsar.source.reader.split.PulsarOrderedPartitionSplitReader.handleSplitsChanges(PulsarOrderedPartitionSplitReader.java:52)
at 
org.apache.flink.connector.base.source.reader.fetcher.AddSplitsTask.run(AddSplitsTask.java:51)
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:142)
... 6 more
Caused by: java.lang.IllegalArgumentException: We only support normal message 
id currently.
at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:138)
at 
org.apache.flink.connector.pulsar.source.enumerator.cursor.MessageIdUtils.unwrapMessageId(MessageIdUtils.java:61)
at 
org.apache.flink.connector.pulsar.source.enumerator.cursor.MessageIdUtils.nextMessageId(MessageIdUtils.java:43)
at 
org.apache.flink.connector.pulsar.source.reader.split.PulsarOrderedPartitionSplitReader.beforeCreatingConsumer(PulsarOrderedPartitionSplitReader.java:94)
at 
org.apache.flink.connector.pulsar.source.reader.split.PulsarPartitionSplitReaderBase.handleSplitsChanges(PulsarPartitionSplitReaderBase.java:160)
at 
org.apache.flink.connector.pulsar.source.reader.split.PulsarOrderedPartitionSplitReader.handleSplitsChanges(PulsarOrderedPartitionSplitReader.java:52)
at 
org.apache.flink.connector.base.source.reader.fetcher.AddSplitsTask.run(AddSplitsTask.java:51)
at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:142)
... 6 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Adding client.id.prefix to the KafkaSink

2022-10-12 Thread Hang Ruan
Hi Yaroslav,

+1 from me.  It is meaningful to keep the common Kafka client configuration
the same between Kafka source and sink.

Best,
Hang

Mason Chen  于2022年10月13日周四 13:51写道:

> Hi Yaroslav,
>
> +1 from my end too. I get some questions internally on the warning logs due
> to conflicting client-id and it would be nice to resolve them.
>
> Best,
> Mason
>
> On Wed, Oct 12, 2022 at 2:03 PM Martijn Visser 
> wrote:
>
> > Hi Yaroslav,
> >
> > +1 from my end. Thanks for bringing this up!
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, Oct 12, 2022 at 6:34 PM Yaroslav Tkachenko
> >  wrote:
> >
> > > Hi everyone,
> > >
> > > I'd like to propose adding client.id.prefix to the KafkaSink to mirror
> > the
> > > functionality provided by the KafkaSource.
> > >
> > > Defining client.id is very important when running workloads with many
> > > different Kafka clients: they help with identification and enforcing
> > > quotas. Due to the specific implementation details, you can't simply
> set
> > > client.id for a KafkaSource or KafkaSink in Flink, you can find more
> > > context in https://issues.apache.org/jira/browse/FLINK-8093, which was
> > > opened in 2017 and is still unresolved.
> > >
> > > I don't see any reason to treat KafkaSink differently: if we can define
> > > a client.id.prefix for the KafkaSource we should be able to define it
> for
> > > the KafkaSink.
> > >
> > > I've created an issue
> https://issues.apache.org/jira/browse/FLINK-28842
> > > and
> > > a PR https://github.com/apache/flink/pull/20475 to resolve this.
> > >
> >
>


Re: [DISCUSS] FLIP-265 Deprecate and remove Scala API support

2022-10-12 Thread Salva Alcántara
Hi Martijn,

Maybe a bit off-topic, but regarding Java 17 support, will it be
possible to replace POJOs with Java records in existing applications?

In a project I maintain we use Lombok a lot, but with Java records we would
probably stop using it (or significantly reduce its usage).

Will there be a way to promote existing POJOs (either written "manually" or
using Lombok) to Java records without breaking serialization? (assuming
that those POJOs are used as immutable values, e.g., setters are never
used).
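
To make the question concrete, a minimal sketch of the kind of migration I mean (the type and field names are made up; whether Flink would treat the two shapes as compatible for state restore is exactly the open question, and records of course require Java 16+):

```java
public class RecordMigrationSketch {

    // Today: an immutable value class like the ones we generate with Lombok's @Value.
    public static final class EventPojo {
        private final String id;
        private final long timestamp;

        public EventPojo(String id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }

        public String getId() { return id; }
        public long getTimestamp() { return timestamp; }
    }

    // What we would like to write instead on Java 17: the same shape, far less
    // boilerplate. Whether state written with the class form can be restored into
    // this form without a serializer migration is the question above.
    public record EventRecord(String id, long timestamp) {}

    public static void main(String[] args) {
        System.out.println(new EventRecord("a", 1L));
    }
}
```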

Regards,

Salva

On Wed, Oct 12, 2022 at 9:11 PM Martijn Visser 
wrote:

> Hi everyone,
>
> Thanks again for all your feedback. It's very much appreciated.
>
> My overall feeling is that people are not opposed to the FLIP. There is
> demand for adding Java 17 support before dropping the Scala APIs. Given
> that the proposal for actually dropping the Scala APIs would only happen
> with a Flink 2.0 and Java 17 support would either happen in a new minor
> version or a new major version (I haven't seen a FLIP or discussion being
> opened adding Java 17 support, only on deprecating Java 8), Java 17 support
> would either be there earlier (in a new minor version) or at the same time
> (with Flink 2.0) when the Scala APIs would be dropped.
>
> If there are no more discussion topics, I would move this FLIP to a vote
> at the beginning of next week.
>
> Best regards,
>
> Martijn
>
> On Sun, Oct 9, 2022 at 10:36 AM guenterh.lists 
> wrote:
>
>> Hi Martijn
>>
>> I do not maintain a large production application based on Flink, so it
>> would not be a problem for me to convert existing implementations to
>> whatever API.
>>
>> I am working in the area of cultural heritage, which is mainly about the
>> processing of structured (meta)-data (scientific libraries, archives and
>> museums)
>> My impression: People without much background/experience with Java
>> implementations find it easier to get into the functional mindset as
>> supported in Scala. That's why I think it would be very unfortunate if the
>> use of Scala in Flink becomes more and more limited or neglected.
>>
>> I think using the Java API in Scala is a possible way also in my
>> environment.
>>
>> In the last weeks I tried to port the examples from the "Flink Course" of
>> Daniel Ciorcilan (https://rockthejvm.com/p/flink - he mainly offers
>> Scala courses), which are exclusively based on the native Scala API, to the
>> Java API. This has worked without any problems as far as I can see. So far
>> I haven't tried any examples based on the Table API or streaming workflows
>> in batch mode (which would be important for our environment).
>>
>> My main trouble: So far I don't know enough about the limitations of
>> using the Java API in a Scala implementation and what that means. My
>> current understanding: the limitation is mainly in deriving the type
>> information in generic APIs with Scala types. For me it would be very
>> significant and helpful if there would be more information, descriptions
>> and examples about this topic.
>>
>> So far unfortunately I had too little time to deal with a wrapper like
>> flink-scala-api (https://github.com/findify/flink-scala-api ) and the
>> current alternative is probably going to be deprecated in the future (
>> https://github.com/ariskk/flink4s/issues/17#issuecomment-1125806808 )
>>
>> Günter
>>
>>
>> On 04.10.22 13:58, Martijn Visser wrote:
>>
>> Hi Marton,
>>
>> You're making a good point, I originally wanted to include already the
>> User mailing list to get their feedback but forgot to do so. I'll do some
>> more outreach via other channels as well.
>>
>> @Users of Flink, I've made a proposal to deprecate and remove Scala API
>> support in a future version of Flink. Your feedback on this topic is very
>> much appreciated.
>>
>> Regarding the large Scala codebase for Flink, a potential alternative
>> could be to have a wrapper for all Java APIs that makes them available as
>> Scala APIs. However, this still requires Scala maintainers and I don't
>> think that we currently have those in our community. The easiest solution
>> for them would be to use the Java APIs directly. Yes it would involve work,
>> but we won't actually be able to remove the Scala APIs until Flink 2.0 so
>> there's still time for that :)
>>
>> Best regards,
>>
>> Martijn
>>
>> On Tue, Oct 4, 2022 at 1:26 AM Márton Balassi 
>> wrote:
>>
>>> Hi Martjin,
>>>
>>> Thanks for compiling the FLIP. I agree with the sentiment that Scala
>>> poses
>>> considerable maintenance overhead and key improvements (like 2.13 or
>>> 2.12.8
>>> supports) are hanging stale. With that said before we make this move we
>>> should attempt to understand the userbase affected.
>>> A quick Slack and user mailing list search does return quite a bit of
>>> results for scala (admittedly a cursory look at them suggest that many of
>>> them have to do with missing features in Scala that exist in Java or
>>> Scala
>>> versions). I would love to see some polls on this topi

[jira] [Created] (FLINK-29614) Introduce Spark writer for table store

2022-10-12 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-29614:


 Summary: Introduce Spark writer for table store
 Key: FLINK-29614
 URL: https://issues.apache.org/jira/browse/FLINK-29614
 Project: Flink
  Issue Type: New Feature
  Components: Table Store
Reporter: Jingsong Lee


The main difficulty is that the Spark SourceV2 interface currently does not 
support custom distribution, and the Table Store must have consistent 
distribution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-29615) MetricStore does not remove metrics of nonexistent subtasks when adaptive scheduler lowers job parallelism

2022-10-12 Thread Zhanghao Chen (Jira)
Zhanghao Chen created FLINK-29615:
-

 Summary: MetricStore does not remove metrics of nonexistent 
subtasks when adaptive scheduler lowers job parallelism
 Key: FLINK-29615
 URL: https://issues.apache.org/jira/browse/FLINK-29615
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics, Runtime / REST
Affects Versions: 1.15.0, 1.16.0
Reporter: Zhanghao Chen
 Fix For: 1.17.0


*Problem*

MetricStore does not remove metrics of nonexistent subtasks when the adaptive 
scheduler lowers job parallelism, so users will see metrics of nonexistent 
subtasks on the Web UI (e.g. the task backpressure page) or in REST API responses.

 

*Proposed Solution*

Thanks to FLINK-29132 (SubtaskMetricStore causes memory leak) and FLINK-28588 
(Enhance REST API for Speculative Execution), Flink will now update current 
execution attempts when updating metrics. Since the active subtask info is 
included in the current execution attempt info, we are able to retain active 
subtasks using the current execution attempt info.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Remove HCatalog connector

2022-10-12 Thread Qingsheng Ren
+1

Thanks for driving this Martijn!

Best,
Qingsheng

> On Oct 13, 2022, at 03:45, Martijn Visser  wrote:
> 
> Hi everyone,
> 
> Since no comments were made, I'm opening a vote to remove the HCatalog
> connector [1]
> 
> The voting period will be open at least 72hrs.
> 
> Best regards,
> 
> Martijn
> 
> [1]
> https://lists.apache.org/thread/j8jc5zrhnqlv8y3lkmc3wdo9ysgmsr84
> -- 
> Martijn
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser



[jira] [Created] (FLINK-29616) Polish Table Store Pom file to avoid warning.

2022-10-12 Thread Aiden Gong (Jira)
Aiden Gong created FLINK-29616:
--

 Summary: Polish Table Store Pom file to avoid warning.
 Key: FLINK-29616
 URL: https://issues.apache.org/jira/browse/FLINK-29616
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Aiden Gong
 Fix For: table-store-0.3.0
 Attachments: image-2022-10-13-14-49-39-582.png

!image-2022-10-13-14-49-39-582.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)