[jira] [Created] (FLINK-27967) Unable to inject entropy on s3 prefix when doing a save point in Flink

2022-06-09 Thread Vinay Devadiga (Jira)
Vinay Devadiga created FLINK-27967:
--

 Summary: Unable to inject entropy on s3 prefix when doing a save 
point in Flink
 Key: FLINK-27967
 URL: https://issues.apache.org/jira/browse/FLINK-27967
 Project: Flink
  Issue Type: Bug
  Components: Command Line Client
Affects Versions: 1.14.0
Reporter: Vinay Devadiga
 Attachments: Screenshot 2022-06-09 at 12.10.51 PM.png

Hi, while using Flink 1.14.0 I ran the example job 
/examples/streaming/StateMachineExample.jar and the job submitted successfully. 
I then tried to take a savepoint to S3 with entropy enabled, but the entropy 
key was not respected. Here are my configurations. Can anyone please guide me 
through this issue? I am trying to go through the code but could not find 
anything conclusive.
|flink-conf|fs.allowed-fallback-filesystems|s3|Cluster configuration| |
|flink-conf|execution.checkpointing.unaligned|true|Cluster configuration| |
|flink-conf|state.backend.incremental|true|Cluster configuration| |
|flink-conf|execution.checkpointing.timeout|600min|Cluster configuration| |
|flink-conf|execution.checkpointing.externalized-checkpoint-retention|RETAIN_ON_CANCELLATION|Cluster configuration| |
|flink-conf|state.backend|rocksdb|Cluster configuration| |
|flink-conf|s3.entropy.key|_entropy_|Cluster configuration| |
|flink-conf|state.checkpoints.dir|s3://vinaydevuswest2/_entropy_/flink/checkpoint-data/|Cluster configuration| |
|flink-conf|execution.checkpointing.max-concurrent-checkpoints|1|Cluster configuration| |
|flink-conf|execution.checkpointing.min-pause|5000|Cluster configuration| |
|flink-conf|execution.checkpointing.checkpoints-after-tasks-finish.enabled|true|Cluster configuration| |
|flink-conf|state.savepoints.dir|s3://vinaydevuswest2/_entropy_/flink/savepoint-data/|Cluster configuration| |
|flink-conf|state.storage.fs.memory-threshold|0|Cluster configuration| |
|flink-conf|s3.entropy.length|4|Cluster configuration| |
|flink-conf|execution.checkpointing.tolerable-failed-checkpoints|30|Cluster configuration| |
|flink-conf|execution.checkpointing.mode|EXACTLY_ONCE|Cluster configuration|
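Conceptually, entropy injection replaces the configured entropy key in the path with a random string of the configured length before the path is used, so that objects spread across S3 partitions. The plain-Java sketch below only illustrates that substitution against the settings above; the class and method names are made up for illustration and this is not Flink's actual implementation.

```java
import java.util.Random;

// Illustrative sketch only: replaces the entropy marker (s3.entropy.key)
// with a random hex string of the configured length (s3.entropy.length).
class EntropySketch {
    static String inject(String path, String entropyKey, int entropyLength, Random random) {
        if (!path.contains(entropyKey)) {
            return path; // nothing to inject
        }
        String alphabet = "0123456789abcdef";
        StringBuilder entropy = new StringBuilder(entropyLength);
        for (int i = 0; i < entropyLength; i++) {
            entropy.append(alphabet.charAt(random.nextInt(alphabet.length())));
        }
        return path.replace(entropyKey, entropy.toString());
    }
}
```

With the settings above, `s3://vinaydevuswest2/_entropy_/flink/checkpoint-data/` would become something like `s3://vinaydevuswest2/a3f1/flink/checkpoint-data/`; the reported bug is that the `_entropy_` marker apparently survives into the savepoint path unchanged.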



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-09 Thread Martijn Visser
Hi Alex,

Thanks for creating the FLIP and opening up the discussion. +1 overall for
getting this in place.

One question: you've already mentioned that this is focused on the DataStream
API. I think it would be a bit confusing to have a Datagen connector
(on the Table side) that wouldn't leverage this target interface. I think
it would be good if we could already have one generic Datagen connector
which works for the DataStream API (so that would be a new one in the
Flink repo) and have the Datagen in the Table landscape use this
target interface too. What do you think?

Best regards,

Martijn

On Wed, Jun 8, 2022 at 14:21, Alexander Fedulov <alexan...@ververica.com> wrote:

> Hi Xianxun,
>
> Thanks for bringing it up. I do believe it would be useful to have such a
> CDC data generator, but I see the
> efforts to provide one as somewhat orthogonal to the DataSourceGenerator proposed
> in the FLIP. FLIP-238 focuses
> on the DataStream API and I could see integration into the Table/SQL
> ecosystem as the next step that I would
> prefer to keep separate (see KafkaDynamicSource reusing
> KafkaSource
> under the hood [1]).
>
> [1]
>
> https://github.com/apache/flink/blob/3adca15859b36c61116fc56fabc24e9647f0ec5f/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/KafkaDynamicSource.java#L223
>
> Best,
> Alexander Fedulov
>
>
>
>
> On Wed, Jun 8, 2022 at 11:01 AM Xianxun Ye  wrote:
>
> > Hey Alexander,
> >
> > Making the datagen source connector easier to use is really helpful when
> > doing a PoC/demo.
> > I also thought about whether it is possible to produce a changelog stream
> > with the datagen source, so that a new Flink developer can practice Flink
> > SQL with CDC data using the Flink SQL Client CLI.
> > In the flink-examples-table module, a ChangelogSocketExample class[1]
> > describes how to ingest delete or insert data via the 'nc' command. Can we
> > support producing a changelog stream with the new datagen source?
> >
> > [1]
> >
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-table/src/main/java/org/apache/flink/table/examples/java/connectors/ChangelogSocketExample.java#L79
> >
> > Best regards,
> >
> > Xianxun
> >
> > On 06/8/2022 08:10,Alexander Fedulov
> >  wrote:
> >
> > I looked a bit further and it seems it should actually be easier than I
> > initially thought:  SourceReader extends CheckpointListener interface and
> > with its custom implementation it should be possible to achieve similar
> > results. A prototype that I have for the generator uses an
> > IteratorSourceReader
> > under the hood by default but we could consider adding the ability to
> > supply something like a DataGeneratorSourceReaderFactory that would allow
> > provisioning the DataGeneratorSource with customized implementations for
> > cases like this.
> >
> > Best,
> > Alexander Fedulov
> >
> > On Wed, Jun 8, 2022 at 12:58 AM Alexander Fedulov <
> alexan...@ververica.com
> > >
> > wrote:
> >
> > Hi Steven,
> >
> > This is going to be tricky since in the new Source API the checkpointing
> > aspects that you based your logic on are pushed further away from the
> > low-level interfaces responsible for handling data and splits [1]. At the
> > same time, the SourceCoordinatorProvider is hardwired into the internals
> > of the framework, so I don't think it will be possible to provide a
> > customized implementation for testing purposes.
> >
> > The only chance to tie data generation to checkpointing in the new Source
> > API that I see at the moment is via the SplitEnumerator serializer (
> > getEnumeratorCheckpointSerializer() method) [2]. In theory, it should be
> > possible to share a variable visible both to the generator function and
> to
> > the serializer and manipulate it whenever the serialize() method gets
> > called upon a checkpoint request. That said, you still won't get
> > notifications of successful checkpoints that you currently use (this info
> > is only available to the SourceCoordinator).
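The shared-variable idea described above can be pictured in plain Java; all names here are made up for illustration, and this is not Flink's Source API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a counter shared between the enumerator-state serializer and the
// generator function. serialize() is invoked on every checkpoint request,
// so the generator can observe how many checkpoints were requested so far.
class SharedCheckpointSignal {
    private final AtomicLong checkpointsSeen = new AtomicLong();

    // stands in for the serialize() call of getEnumeratorCheckpointSerializer()
    byte[] serializeEnumeratorState() {
        checkpointsSeen.incrementAndGet();
        return new byte[0];
    }

    // the generator function can tag each record with the observed checkpoint epoch
    String nextRecord(long index) {
        return "record-" + index + "@ckpt-" + checkpointsSeen.get();
    }
}
```

As noted above, this only signals checkpoint *requests*; notifications of *successful* checkpoints remain internal to the SourceCoordinator.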
> >
> > In general, regardless of the generator implementation itself, the new
> > Source
> > API does not seem to support the use case of verifying checkpoints
> > contents in lockstep with produced data, at least I do not see an
> immediate
> > solution for this. Can you think of a different way of checking the
> > correctness of the Iceberg Sink implementation that does not rely on this
> > approach?
> >
> > Best,
> > Alexander Fedulov
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/0f19c2472c54aac97e4067f5398731ab90036d1a/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L337
> >
> > [2]
> >
> >
> https://github.com/apache/flink/blob/e4b000818c15b5b781c4e5262ba83bfc9d65121a/flink-core/src/main/java/org/apache/flink/api/connector/source/Source.java#L97
> >
> > On Tue, Jun 7, 2022 at 6:03 PM Steven Wu  wrote:
> >
> > In Iceberg source, we have a data generator source that can control the
> > records per checkpoint 

Re: [VOTE] FLIP-223: Support HiveServer2 Endpoint

2022-06-09 Thread Jingsong Li
+1

Thanks for driving.

Best,
Jingsong

On Thu, Jun 9, 2022 at 2:17 PM Nicholas Jiang  wrote:
>
> +1 (not-binding)
>
> Best,
> Nicholas Jiang
>
> On 2022/06/07 05:31:21 Shengkai Fang wrote:
> > Hi, everyone.
> >
> > Thanks for all feedback for FLIP-223: Support HiveServer2 Endpoint[1] on
> > the discussion thread[2]. I'd like to start a vote for it. The vote will be
> > open for at least 72 hours unless there is an objection or not enough votes.
> >
> > Best,
> > Shengkai
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
> > [2] https://lists.apache.org/thread/9r1j7ho2m8zbqy3tl7vvj9gnocggwr6x
> >


Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-09 Thread Paul Lam
Hi team,

It's great to see our opinions are finally converging!

> `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `


LGTM. Adding it to the FLIP.

To Jark,

> We can simplify the statement to "CREATE SAVEPOINT FOR JOB ”

Good point. The default savepoint dir should be enough for most cases.

To Jing,

> DROP SAVEPOINT ALL

I think it’s valid to have such a statement, but I have two concerns:
- `ALL` is already an SQL keyword, thus it may cause ambiguity.
- Flink CLI and REST API don’t provide the corresponding functionalities, and 
we’d better keep them aligned.
How about making this statement a follow-up task, since it would also touch the 
REST API and Flink CLI?

Best,
Paul Lam

> On Jun 9, 2022, at 11:53, godfrey he wrote:
> 
> Hi all,
> 
> Regarding `PIPELINE`, it comes from flink-core module, see
> `PipelineOptions` class for more details.
> `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with 
> `JOBS`.
> 
> +1 to discuss JOBTREE in another FLIP.
> 
> +1 to `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `
> 
> +1 to `CREATE SAVEPOINT FOR JOB ` and `DROP SAVEPOINT 
> `
> 
> Best,
> Godfrey
> 
> On Thu, Jun 9, 2022, at 01:48, Jing Ge wrote:
>> 
>> Hi Paul, Hi Jark,
>> 
>> Re JOBTREE, agree that it is out of the scope of this FLIP
>> 
>> Re `RELEASE SAVEPOINT ALL`: if the community prefers `DROP`, then `DROP 
>> SAVEPOINT ALL` could serve for housekeeping. WDYT?
>> 
>> Best regards,
>> Jing
>> 
>> 
>> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu  wrote:
>>> 
>>> Hi Jing,
>>> 
>>> Regarding JOBTREE (job lineage), I agree with Paul that this is out of the 
>>> scope
>>> of this FLIP and can be discussed in another FLIP.
>>> 
>>> Job lineage is a big topic that may involve many problems:
>>> 1) how to collect and report job entities, attributes, and lineages?
>>> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
>>> 3) how does Flink SQL CLI/Gateway know the lineage information and show 
>>> jobtree?
>>> 4) ...
>>> 
>>> Best,
>>> Jark
>>> 
>>> On Wed, 8 Jun 2022 at 20:44, Jark Wu  wrote:
 
 Hi Paul,
 
 I'm fine with using JOBS. The only concern is that this may conflict with 
 displaying more detailed
 information for query (e.g. query content, plan) in the future, e.g. SHOW 
 QUERIES EXTENDED in ksqldb[1].
 This is not a big problem as we can introduce SHOW QUERIES in the future 
 if necessary.
 
> STOP JOBS  (with options `table.job.stop-with-savepoint` and 
> `table.job.stop-with-drain`)
 What about STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] ?
It might be tedious and error-prone to set configuration before executing 
 a statement,
 and the configuration will affect all statements after that.
 
> CREATE SAVEPOINT  FOR JOB 
 We can simplify the statement to "CREATE SAVEPOINT FOR JOB ",
 and always use configuration "state.savepoints.dir" as the default 
 savepoint dir.
The concern with using "" is that here it should be the savepoint dir,
while savepoint_path is the returned value.
 
 I'm fine with other changes.
 
 Thanks,
 Jark
 
 [1]: 
 https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
 
 
 
 On Wed, 8 Jun 2022 at 15:07, Paul Lam  wrote:
> 
> Hi Jing,
> 
> Thank you for your inputs!
> 
> TBH, I haven’t considered the ETL scenario that you mentioned. I think 
> they’re managed just like other jobs in terms of job lifecycles (please 
> correct me if I’m wrong).
> 
> WRT to the SQL statements about SQL lineages, I think it might be a 
> little bit out of the scope of the FLIP, since it’s mainly about 
> lifecycles. By the way, do we have these functionalities in Flink CLI or 
> REST API already?
> 
> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the deprecated FLIP docs, the 
> community is more in favor of `DROP SAVEPOINT `. I’m 
> updating the FLIP according to the latest discussions.
> 
> Best,
> Paul Lam
> 
> On Jun 8, 2022, at 07:31, Jing Ge wrote:
> 
> Hi Paul,
> 
> Sorry that I am a little bit too late to join this thread. Thanks for 
> driving this and starting this informative discussion. The FLIP looks 
> really interesting. It will help us a lot to manage Flink SQL jobs.
> 
> Have you considered the ETL scenario with Flink SQL, where multiple SQLs 
> build a DAG for many DAGs?
> 
> 1)
> +1 for SHOW JOBS. I think sooner or later we will start to discuss how to 
> support ETL jobs. Briefly speaking, the SQLs used to build the DAG are 
> responsible for *producing* data as the result (cube, materialized view, 
> etc.) for future consumption by queries. The INSERT INTO SELECT FROM 
> example in FLIP and CTAS are typical SQL in this case. I would prefer to 
> call them Jobs instead of Queries.
> 
> 2)
> Speaking of ETL DAG, we might want to see the lineage. Is it possible to 
> support syntax like:
> 
> SHOW JOBTREE   

Re: [DISCUSS] FLIP-237: Thrift Format Support in Flink

2022-06-09 Thread Martijn Visser
Hi Chen,

Thanks for creating the FLIP and opening the discussion. I have a couple of
questions/remarks:

* What I'm missing overall is the section on 'Public Interfaces'. The FLIP
has a large Proposed Changes section, but it reads more like your journey
when you implemented Thrift in your fork. For example, it mentions corrupt
Thrift payloads causing issues, but I can't determine if you want to
propose to deal with this upfront or not. (I would not deal with this
upfront in the format implementation, because there will be users who want
the job to fail while others just want it to continue and send something to
a DLQ)
* I think 'Proposed Changes' needs a summary because you're suggesting
multiple things and it's easy to miss one of them.
* With regards to mapping ENUM to STRING, what is to be expected if Flink
will support ENUM in the future?
* Is there a tight dependency between Thrift and Hive? When we externalize
the Hive connector, can the Thrift format still work?
* The FLIP mentions usage of Thrift on both DataStream and SQL jobs;
however, the FLIP is very SQL oriented.
* With regards to the Table/View Inference DDL, this only provides the
Hive metastore as an option. I would like to understand how this could work
with Catalogs in general, not with Hive only.
* What type of compatibility guarantees (backward, forward, full) does Thrift offer?

Best regards,

Martijn

On Tue, Jun 7, 2022 at 18:56, Chen Qin wrote:

> Thanks for the pointers, I moved the proposal to the wiki and updated the
> FLIP status page.
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-237%3A+Thrift+Format+Support
>
>
> Looking forward to getting community feedback.
>
> Chen
>
>
>
> On Tue, Jun 7, 2022 at 12:12 AM Jark Wu  wrote:
>
> > Yes. The community recommends keeping content on the wiki page
> > and discuss in the mailing list. Discussions on the google doc are
> > not so visible to the community, and "If it didn’t happen on a mailing
> > list, it didn’t happen."
> >
> > Best,
> > Jark
> >
> > On Tue, 7 Jun 2022 at 14:15, Jing Ge  wrote:
> >
> > > Hi Chen,
> > >
> > > Thanks for driving this! Afaik, the community has the consensus to
> *Start
> > > a [DISCUSS] thread on the Apache mailing list*[1]. I just walked
> through
> > > some existing FLIPs and didn't find any that have been using google doc
> > as
> > > the discussion thread. Would you like to follow the current process and
> > > move the content to the FLIP Wiki page? Another question would be:
> could
> > > anyone in the community confirm that it is also fine to use google doc
> to
> > > discuss? We'd better clarify it before kicking off the discussion.
> > Thanks!
> > >
> > > By the way, you might want to reserve the FLIP number 237 on [1] in the
> > > table *Adopted/Accepted but unreleased FLIPs.*
> > >
> > > Best regards,
> > > Jing
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-WhatshouldbeincludedinaFLIP
> > > ?
> > >
> > > On Tue, Jun 7, 2022 at 6:41 AM Chen Qin  wrote:
> > >
> > > > Hi there,
> > > >
> > > > I want to kick off the first round of FLIP-237
> > > > <
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-237%3A+Thrift+Format+Support
> > > > >:
> > > > Thrift Format Support discussion. Note that for the areas marked as WIP,
> > > > we are looking for more feedback from folks; those areas will either
> > > > stay in the scope of the current FLIP or be removed based on feedback
> > > > and discussion results.
> > > >
> > > > Google Doc
> > > > <
> > > >
> > >
> >
> https://docs.google.com/document/d/1EhHewAW39pm-TX6fuUZLogWHK7vtgJ616WRwH7OOrgg/edit#
> > > > >
> > > >
> > > >
> > > > Thanks,
> > > > Chen Q
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-09 Thread Martijn Visser
Hi all,

I would not include a DROP SAVEPOINT syntax. With the recently introduced
CLAIM/NO CLAIM mode, I would argue that we've just clarified snapshot
ownership and if you have a savepoint established "with NO_CLAIM it creates
its own copy and leaves the existing one up to the user." [1] We shouldn't
then again make it fuzzy by making it possible that Flink can remove
snapshots.

Best regards,

Martijn

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership

On Thu, Jun 9, 2022 at 09:27, Paul Lam wrote:

> Hi team,
>
> It's great to see our opinions are finally converging!
>
> `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `
>
>
> LGTM. Adding it to the FLIP.
>
> To Jark,
>
> We can simplify the statement to "CREATE SAVEPOINT FOR JOB ”
>
>
> Good point. The default savepoint dir should be enough for most cases.
>
> To Jing,
>
> DROP SAVEPOINT ALL
>
>
> I think it’s valid to have such a statement, but I have two concerns:
>
>- `ALL` is already an SQL keyword, thus it may cause ambiguity.
>    - Flink CLI and REST API don’t provide the corresponding
>    functionalities, and we’d better keep them aligned.
>
> How about making this statement a follow-up task, since it would also touch
> the REST API and Flink CLI?
>
> Best,
> Paul Lam
>
> On Jun 9, 2022, at 11:53, godfrey he wrote:
>
> Hi all,
>
> Regarding `PIPELINE`, it comes from flink-core module, see
> `PipelineOptions` class for more details.
> `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with
> `JOBS`.
>
> +1 to discuss JOBTREE in another FLIP.
>
> +1 to `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `
>
> +1 to `CREATE SAVEPOINT FOR JOB ` and `DROP SAVEPOINT
> `
>
> Best,
> Godfrey
>
On Thu, Jun 9, 2022, at 01:48, Jing Ge wrote:
>
>
> Hi Paul, Hi Jark,
>
> Re JOBTREE, agree that it is out of the scope of this FLIP
>
> Re `RELEASE SAVEPOINT ALL`: if the community prefers `DROP`, then `DROP
> SAVEPOINT ALL` could serve for housekeeping. WDYT?
>
> Best regards,
> Jing
>
>
> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu  wrote:
>
>
> Hi Jing,
>
> Regarding JOBTREE (job lineage), I agree with Paul that this is out of the
> scope
> of this FLIP and can be discussed in another FLIP.
>
> Job lineage is a big topic that may involve many problems:
> 1) how to collect and report job entities, attributes, and lineages?
> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
> 3) how does Flink SQL CLI/Gateway know the lineage information and show
> jobtree?
> 4) ...
>
> Best,
> Jark
>
> On Wed, 8 Jun 2022 at 20:44, Jark Wu  wrote:
>
>
> Hi Paul,
>
> I'm fine with using JOBS. The only concern is that this may conflict with
> displaying more detailed
> information for query (e.g. query content, plan) in the future, e.g. SHOW
> QUERIES EXTENDED in ksqldb[1].
> This is not a big problem as we can introduce SHOW QUERIES in the future
> if necessary.
>
> STOP JOBS  (with options `table.job.stop-with-savepoint` and
> `table.job.stop-with-drain`)
>
> What about STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] ?
> It might be tedious and error-prone to set configuration before executing
> a statement,
> and the configuration will affect all statements after that.
>
> CREATE SAVEPOINT  FOR JOB 
>
> We can simplify the statement to "CREATE SAVEPOINT FOR JOB ",
> and always use configuration "state.savepoints.dir" as the default
> savepoint dir.
> The concern with using "" is that here it should be the savepoint dir,
> while savepoint_path is the returned value.
>
> I'm fine with other changes.
>
> Thanks,
> Jark
>
> [1]:
> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>
>
>
> On Wed, 8 Jun 2022 at 15:07, Paul Lam  wrote:
>
>
> Hi Jing,
>
> Thank you for your inputs!
>
> TBH, I haven’t considered the ETL scenario that you mentioned. I think
> they’re managed just like other jobs in terms of job lifecycles (please
> correct me if I’m wrong).
>
> WRT to the SQL statements about SQL lineages, I think it might be a little
> bit out of the scope of the FLIP, since it’s mainly about lifecycles. By
> the way, do we have these functionalities in Flink CLI or REST API already?
>
> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the deprecated FLIP docs, the
> community is more in favor of `DROP SAVEPOINT `. I’m
> updating the FLIP according to the latest discussions.
>
> Best,
> Paul Lam
>
> On Jun 8, 2022, at 07:31, Jing Ge wrote:
>
> Hi Paul,
>
> Sorry that I am a little bit too late to join this thread. Thanks for
> driving this and starting this informative discussion. The FLIP looks
> really interesting. It will help us a lot to manage Flink SQL jobs.
>
> Have you considered the ETL scenario with Flink SQL, where multiple SQLs
> build a DAG for many DAGs?
>
> 1)
> +1 for SHOW JOBS. I think sooner or later we will start to discuss how to
> support ETL jobs. Briefly speaking, the SQLs used to build the DAG are
> responsible for *producing* data as the result (cube, materialized view, etc.)
> for the future consumption by queries. The INSERT INTO SELECT FROM example
> in FLIP and CTAS are

Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-09 Thread Jing Ge
Hi,

I am very happy to see opinions from different perspectives. That will help
us understand the problem better. Thanks all for the informative discussion.

Let's see the big picture and check following facts together:

1. FLIP-27 was intended to solve some technical issues that are very
difficult to solve with SourceFunction[1]. When we say "SourceFunction is
easy", well, it depends. If we take a look at the implementation of the
Kafka connector, we will know how complicated it is to build a serious
connector for production with the old SourceFunction. To every problem
there is a solution and to every solution there is a problem. The fact is
that there is no perfect solution, only a feasible one. If we try to solve
complicated problems, we have to expose some complexity. Compared to
connectors for PoC, demo, or training (no offense), I would rather solve issues
for connectors like the Kafka connector, which are widely used in production, with
higher priority. I think that should be one reason why FLIP-27 has been

2. FLIP-27 and its implementation were introduced roughly at the end of 2019
and went public on 19.04.2021, which means Flink has provided two different
public/graduated source solutions for more than one year. On the day that
the new source API went public, there should be a consensus in the
community that we should start the migration. Old SourceFunction interface,
in the ideal case, should have been deprecated on that day, otherwise we
should not graduate the new source API to avoid confusing (connector)
developers[2].

3. It is true that the new source API is hard to understand and even hard
to implement for simple cases. Thanks for the feedback. That is something
we need to improve. The current design and implementation could be considered
as the low level API. The next step is to create the high level API to
reduce some unnecessary complexity for those simple cases. But, IMHO, this
should not be the prerequisite to postpone the deprecation of the old
SourceFunction APIs.

4. As long as the old SourceFunction is not marked as deprecated,
developers will continue asking which one should be used. Let's take a
concrete example. If a new connector is developed now and the developer
asks for a suggestion of the choice between the old and new source API on
the ML, which one should we suggest? I think it should be the new Source
API. If a fresh new connector has been developed with the old
SourceFunction API before asking for the consensus in the community and the
developer wants to merge it to master, should we allow it? If the
answer of all these questions is pointing to the new Source API, the old
SourceFunction is de facto already deprecated, just has not been marked as
@deprecated, which confuses developers even more.

 Best regards,
Jing

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
[2] https://lists.apache.org/thread/7okp4y46n3o3rx5mn0t3qobrof8zxwqs

On Wed, Jun 8, 2022 at 2:21 AM Alexander Fedulov 
wrote:

> Hey Austin,
>
> Since we are getting deeper into the implementation details of the
> DataGeneratorSource
> and it is not the main topic of this thread, I propose to move our
> discussion to where it belongs: [DISCUSS] FLIP-238 [1]. Could you please
> briefly formulate your requirements to make it easier for the others to
> follow? I am happy to continue this conversation there.
>
> [1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt
>
> Best,
> Alexander Fedulov
>
> On Tue, Jun 7, 2022 at 6:14 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
> > > @Austin, in the FLIP I mentioned above [1], the user is expected to
> > pass a MapFunction > OUT>
> > to the generator. I wonder if you could have your external client and
> > polling logic wrapped in a custom
> > MapFunction implementation class? Would that answer your needs or do you
> > have some
> > more sophisticated scenario in mind?
> >
> > At first glance, the FLIP looks good for this case with regards to the
> > map function, but it leaves out 1) the ability to control polling
> > intervals and 2) the ability to produce an unknown number of records,
> > both per-poll and in overall boundedness. Do you think something like
> > this could be built from the same pieces?
> > I'm also wondering what handles threading, is that on the user or is that
> > part of the DataGeneratorSource?
> >
> > Best,
> > Austin
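The two knobs raised here (polling interval, boundedness) can be pictured with a minimal plain-Java sketch; the class and its parameters are assumptions for illustration, not the proposed DataGeneratorSource API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

// Sketch: a bounded generator driven by a user-supplied mapping function,
// with a crude pause between emissions standing in for a polling interval.
class GeneratorSketch<T> {
    private final LongFunction<T> mapper;   // user logic, e.g. an external client call
    private final long recordCount;         // boundedness knob
    private final long pauseMillis;         // polling-interval knob

    GeneratorSketch(LongFunction<T> mapper, long recordCount, long pauseMillis) {
        this.mapper = mapper;
        this.recordCount = recordCount;
        this.pauseMillis = pauseMillis;
    }

    List<T> run() {
        List<T> out = new ArrayList<>();
        for (long i = 0; i < recordCount; i++) {
            out.add(mapper.apply(i));
            if (pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return out;
    }
}
```

Whether threading sits with the user or with the source, and how an unknown per-poll record count would be expressed, are exactly the open questions in this exchange.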
> >
> > On Tue, Jun 7, 2022 at 9:34 AM Alexander Fedulov <
> alexan...@ververica.com>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for all the input and a lively discussion. It seems that there
> is
> > a
> > > consensus that due to
> > > the inherent complexity of FLIP-27 sources we should provide more
> > > user-facing utilities to bridge
> > > the gap between the existing SourceFunction-based functionality and the
> > new
> > > APIs.
> > >
> > > To start addressing this I picked the issue that David raised and many
> > > 

Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-09 Thread Paul Lam
Hi Martijn,

I think the `DROP SAVEPOINT` statement would not conflict with NO_CLAIM mode, 
since the statement is triggered by users instead of by the Flink runtime.

We’re simply providing a tool for users to clean up savepoints, just like 
`bin/flink savepoint -d :savepointPath` in the Flink CLI [1].

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#disposing-savepoints

Best,
Paul Lam

> On Jun 9, 2022, at 15:41, Martijn Visser wrote:
> 
> Hi all,
> 
> I would not include a DROP SAVEPOINT syntax. With the recently introduced 
> CLAIM/NO CLAIM mode, I would argue that we've just clarified snapshot 
> ownership and if you have a savepoint established "with NO_CLAIM it creates 
> its own copy and leaves the existing one up to the user." [1] We shouldn't 
> then again make it fuzzy by making it possible that Flink can remove 
> snapshots. 
> 
> Best regards,
> 
> Martijn
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
>  
> 
On Thu, Jun 9, 2022 at 09:27, Paul Lam wrote:
> Hi team,
> 
> It's great to see our opinions are finally converging!
> 
>> `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `
> 
> 
> LGTM. Adding it to the FLIP.
> 
> To Jark,
> 
>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB ”
> 
> Good point. The default savepoint dir should be enough for most cases.
> 
> To Jing,
> 
>> DROP SAVEPOINT ALL
> 
> I think it’s valid to have such a statement, but I have two concerns:
> `ALL` is already an SQL keyword, thus it may cause ambiguity.
> Flink CLI and REST API don’t provide the corresponding functionalities, 
> and we’d better keep them aligned.
> How about making this statement a follow-up task, since it would also touch 
> the REST API and Flink CLI?
> 
> Best,
> Paul Lam
> 
>> On Jun 9, 2022, at 11:53, godfrey he wrote:
>> Hi all,
>> 
>> Regarding `PIPELINE`, it comes from flink-core module, see
>> `PipelineOptions` class for more details.
>> `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with 
>> `JOBS`.
>> 
>> +1 to discuss JOBTREE in another FLIP.
>> 
>> +1 to `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `
>> 
>> +1 to `CREATE SAVEPOINT FOR JOB ` and `DROP SAVEPOINT 
>> `
>> 
>> Best,
>> Godfrey
>> 
>> Jing Ge <j...@ververica.com> wrote on Thu, Jun 9, 2022, at 01:48:
>>> 
>>> Hi Paul, Hi Jark,
>>> 
>>> Re JOBTREE, agree that it is out of the scope of this FLIP
>>> 
>>> Re `RELEASE SAVEPOINT ALL`: if the community prefers `DROP`, then `DROP 
>>> SAVEPOINT ALL` could serve for housekeeping. WDYT?
>>> 
>>> Best regards,
>>> Jing
>>> 
>>> 
>>> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu wrote:
 
 Hi Jing,
 
 Regarding JOBTREE (job lineage), I agree with Paul that this is out of the 
 scope
 of this FLIP and can be discussed in another FLIP.
 
 Job lineage is a big topic that may involve many problems:
 1) how to collect and report job entities, attributes, and lineages?
 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
 3) how does Flink SQL CLI/Gateway know the lineage information and show 
 jobtree?
 4) ...
 
 Best,
 Jark
 
On Wed, 8 Jun 2022 at 20:44, Jark Wu wrote:
> 
> Hi Paul,
> 
> I'm fine with using JOBS. The only concern is that this may conflict with 
> displaying more detailed
> information for query (e.g. query content, plan) in the future, e.g. SHOW 
> QUERIES EXTENDED in ksqldb[1].
> This is not a big problem as we can introduce SHOW QUERIES in the future 
> if necessary.
> 
>> STOP JOBS  (with options `table.job.stop-with-savepoint` and 
>> `table.job.stop-with-drain`)
> What about STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] ?
> It might be tedious and error-prone to set configuration before executing 
> a statement,
> and the configuration will affect all statements after that.
> 
>> CREATE SAVEPOINT  FOR JOB 
> We can simplify the statement to "CREATE SAVEPOINT FOR JOB ",
> and always use configuration "state.savepoints.dir" as the default 
> savepoint dir.
> The concern with using "" is that here it should be the savepoint dir,
> while savepoint_path is the returned value.
> 
> I'm fine with other changes.
> 
> Thanks,
> Jark
> 
> [1]: 
> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>  
> 
> 
> 
> 
> On Wed, 8 Jun 2022 at 15:07, Paul Lam wrote:
>> 
>> Hi Jing,
>> 
>> Thank you for your inputs!
>> 
>> TBH, I haven’t considered the ETL scenario that you mentioned. I think 
>> they’re managed just like other jobs in terms of job life

[jira] [Created] (FLINK-27968) end-to-end-tests-sql CI test failed

2022-06-09 Thread LuNng Wang (Jira)
LuNng Wang created FLINK-27968:
--

 Summary: end-to-end-tests-sql CI test failed
 Key: FLINK-27968
 URL: https://issues.apache.org/jira/browse/FLINK-27968
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI, Table SQL / Client, Table SQL / Planner
Reporter: LuNng Wang


 
{code:java}
Jun 09 03:15:01 at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
Jun 09 03:15:01 at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
Jun 09 03:15:01 at 
org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
Jun 09 03:15:01 at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
Jun 09 03:15:01 Caused by: java.io.IOException: error=13, Permission denied
Jun 09 03:15:01 at java.lang.UNIXProcess.forkAndExec(Native Method)
Jun 09 03:15:01 at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
Jun 09 03:15:01 at java.lang.ProcessImpl.start(ProcessImpl.java:134)
Jun 09 03:15:01 at 
java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
Jun 09 03:15:01 ... 65 more
Jun 09 03:15:01 
Jun 09 03:15:02 [INFO] 
Jun 09 03:15:02 [INFO] Results:
Jun 09 03:15:02 [INFO] 
Jun 09 03:15:02 [ERROR] Errors: 
Jun 09 03:15:02 [ERROR] PlannerScalaFreeITCase.testImperativeUdaf
Jun 09 03:15:02 [ERROR]   Run 1: Cannot run program 
"/tmp/junit915579470100095315/junit4815507674620015662/bin/sql-client.sh": 
error=13, Permission denied
Jun 09 03:15:02 [ERROR]   Run 2: Cannot run program 
"/tmp/junit5631176215080579455/junit3588658300175738616/bin/sql-client.sh": 
error=13, Permission denied
Jun 09 03:15:02 [INFO] 
Jun 09 03:15:02 [INFO] 
Jun 09 03:15:02 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
 {code}
 

 

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=36468&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461
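The `error=13` in the stack trace indicates the extracted `sql-client.sh` lost its execute bit. A minimal stdlib-Java sketch (purely illustrative, not Flink test code; file names are made up) that reproduces the failure and the permission fix on a POSIX system:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;

public class ExecPermissionDemo {
    public static void main(String[] args) throws Exception {
        // Temp files are created without the execute bit, mimicking an
        // archive extraction that drops file permissions.
        Path script = Files.createTempFile("sql-client", ".sh");
        Files.writeString(script, "#!/bin/sh\necho ok\n");

        try {
            new ProcessBuilder(script.toString()).start();
        } catch (IOException e) {
            // ProcessBuilder.start -> forkAndExec fails with
            // "error=13, Permission denied", as in the CI log above.
            System.out.println("failed: " + e.getMessage());
        }

        // The fix: restore the execute bit before launching the script.
        Files.setPosixFilePermissions(script, EnumSet.of(
                PosixFilePermission.OWNER_READ,
                PosixFilePermission.OWNER_WRITE,
                PosixFilePermission.OWNER_EXECUTE));
        Process p = new ProcessBuilder(script.toString()).start();
        p.waitFor();
        System.out.println("exit code: " + p.exitValue());
    }
}
```

On the CI side, the equivalent fix would typically be a `chmod +x` on the extracted scripts before the test launches them.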



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-09 Thread Martijn Visser
Hi all,

I think implicitly we've already considered the SourceFunction and
SinkFunction as deprecated. They are even marked as so on the Flink roadmap
[1]. That also shows that connectors that are using these interfaces are
approaching end-of-life. The fact that we're actively migrating
connectors from Source/SinkFunction to FLIP-27/FLIP-143 (plus add-on FLIPs)
shows that we've already determined that target.

With regards to the motivation of FLIP-27, I think reading up on the
original discussion thread is also worthwhile [2] to see more context.
FLIP-27 was also very important as it brought a unified connector which can
support both streaming and batch (with batch being considered a special
case of streaming in Flink's vision).

So +1 to deprecate SourceFunction. I would also argue that we should
already mark the SinkFunction as deprecated to avoid having this discussion
again in a couple of months.

Best regards,

Martijn

[1] https://flink.apache.org/roadmap.html
[2] https://lists.apache.org/thread/334co89dbhc8qpr9nvmz8t1gp4sz2c8y

On Thu, 9 Jun 2022 at 09:48, Jing Ge wrote:

> Hi,
>
> I am very happy to see opinions from different perspectives. That will help
> us understand the problem better. Thanks all for the informative
> discussion.
>
> Let's see the big picture and check following facts together:
>
> 1. FLIP-27 was intended to solve some technical issues that are very
> difficult to solve with SourceFunction[1]. When we say "SourceFunction is
> easy", well, it depends. If we take a look at the implementation of the
> Kafka connector, we will know how complicated it is to build a serious
> connector for production with the old SourceFunction. To every problem
> there is a solution and to every solution there is a problem. The fact is
> that there is no perfect but a feasible solution. If we try to solve
> complicated problems, we have to expose some complexity. Compared to
> connectors for POC, demo, or training (no offense), I would also solve issues
> for connectors like Kafka connector that are widely used in production with
> higher priority. I think that should be one reason why FLIP-27 has been
> designed and why the new source API went public.
>
> 2. FLIP-27 and the implementation was introduced roughly at the end of 2019
> and went public on 19.04.2021, which means Flink has provided two different
> public/graduated source solutions for more than one year. On the day that
> the new source API went public, there should be a consensus in the
> community that we should start the migration. Old SourceFunction interface,
> in the ideal case, should have been deprecated on that day, otherwise we
> should not graduate the new source API to avoid confusing (connector)
> developers[2].
>
> 3. It is true that the new source API is hard to understand and even hard
> to implement for simple cases. Thanks for the feedback. That is something
> we need to improve. The current design&implementation could be considered
> as the low level API. The next step is to create the high level API to
> reduce some unnecessary complexity for those simple cases. But, IMHO, this
> should not be the prerequisite to postpone the deprecation of the old
> SourceFunction APIs.
>
> 4. As long as the old SourceFunction is not marked as deprecated,
> developers will continue asking which one should be used. Let's make a
> concrete example. If a new connector is developed now and the developer
> asks for a suggestion of the choice between the old and new source API on
> the ML, which one should we suggest? I think it should be the new Source
> API. If a fresh new connector has been developed with the old
> SourceFunction API before asking for the consensus in the community and the
> developer wants to merge it to the master. Should we allow it? If the
> answer of all these questions is pointing to the new Source API, the old
> SourceFunction is de facto already deprecated, just has not been marked as
> @deprecated, which confuses developers even more.
>
>  Best regards,
> Jing
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> [2] https://lists.apache.org/thread/7okp4y46n3o3rx5mn0t3qobrof8zxwqs
>
> On Wed, Jun 8, 2022 at 2:21 AM Alexander Fedulov 
> wrote:
>
> > Hey Austin,
> >
> > Since we are getting deeper into the implementation details of the
> > DataGeneratorSource
> > and it is not the main topic of this thread, I propose to move our
> > discussion to where it belongs: [DISCUSS] FLIP-238 [1]. Could you please
> > briefly formulate your requirements to make it easier for the others to
> > follow? I am happy to continue this conversation there.
> >
> > [1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt
> >
> > Best,
> > Alexander Fedulov
> >
> > On Tue, Jun 7, 2022 at 6:14 PM Austin Cawley-Edwards <
> > austin.caw...@gmail.com> wrote:
> >
> > > > @Austin, in the FLIP I mentioned above [1], the user is expected to
> > > pass a MapFunction > > OUT>
>

Re: [DISCUSS] Releasing 1.15.1

2022-06-09 Thread Yun Gao
Hi David,

Many thanks for driving the new version; also +1 since we have already
accumulated some fixes.

Regarding https://issues.apache.org/jira/browse/FLINK-27492, there is still
some controversy about how to deal with the artifacts. I also agree we should
not hold up the release for this issue. We'll try to reach a consensus as soon
as possible so we can catch up with the release.

Best,
Yun



--
From: LuNing Wang 
Send Time: 2022 Jun. 9 (Thu.) 10:10
To: dev 
Subject:Re: [DISCUSS] Releasing 1.15.1

Hi David,

+1
Thank you for driving this.

Best regards,
LuNing Wang

On Wed, 8 Jun 2022 at 20:45, Jing Ge wrote:

> +1
>
> Thanks David for driving it!
>
> Best regards,
> Jing
>
>
> On Wed, Jun 8, 2022 at 2:32 PM Xingbo Huang  wrote:
>
> > Hi David,
> >
> > +1
> > Thank you for driving this.
> >
> > Best,
> > Xingbo
> >
> > On Wed, 8 Jun 2022 at 18:41, Chesnay Schepler wrote:
> >
> > > +1
> > >
> > > Thank you for proposing this. I can take care of the PMC-side of
> things.
> > >
> > > On 08/06/2022 12:37, Jingsong Li wrote:
> > > > +1
> > > >
> > > > Thanks David for volunteering to manage the release.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Wed, Jun 8, 2022 at 6:21 PM Jark Wu  wrote:
> > > >> Hi David, thank you for driving the release.
> > > >>
> > > >> +1 for the 1.15.1 release. The release-1.15 branch
> > > >> already contains many bug fixes and some SQL
> > > >> issues are quite critical.
> > > >>
> > > >> Btw, FLINK-27606 has been merged just now.
> > > >>
> > > >> Best,
> > > >> Jark
> > > >>
> > > >>
> > > >> On Wed, 8 Jun 2022 at 17:40, David Anderson 
> > > wrote:
> > > >>
> > > >>> I would like to start a discussion on releasing 1.15.1. Flink 1.15
> > was
> > > >>> released on the 5th of May [1] and so far 43 issues have been
> > resolved,
> > > >>> including several user-facing issues with blocker and critical
> > > priorities
> > > >>> [2]. (The recent problem with FileSink rolling policies not working
> > > >>> properly in 1.15.0 got me thinking it might be time for a bug-fix
> > > release.)
> > > >>>
> > > >>> There currently remain 16 unresolved tickets with a fixVersion of
> > > 1.15.1
> > > >>> [3], five of which are about CI infrastructure and tests. There is
> > > only one
> > > >>> such ticket marked Critical:
> > > >>>
> > > >>> https://issues.apache.org/jira/browse/FLINK-27492 Flink table
> scala
> > > >>> example
> > > >>> does not including the scala-api jars
> > > >>>
> > > >>> I'm not convinced we should hold up a release for this issue, but
> on
> > > the
> > > >>> other hand, it seems that this issue can be resolved by making a
> > > decision
> > > >>> about how to handle the missing dependencies. @Timo Walther
> > > >>>  @yun_gao can you give an update?
> > > >>>
> > > >>> Two other open issues seem to have made significant progress
> (listed
> > > >>> below). Would it make sense to wait for either of these? Are there
> > any
> > > >>> other open tickets we should consider waiting for?
> > > >>>
> > > >>> https://issues.apache.org/jira/browse/FLINK-27420 Suspended
> > > SlotManager
> > > >>> fail to reregister metrics when started again
> > > >>> https://issues.apache.org/jira/browse/FLINK-27606 CompileException
> > > when
> > > >>> using UDAF with merge() method
> > > >>>
> > > >>> I would volunteer to manage the release. Is there a PMC member who
> > > would
> > > >>> join me to help, as needed?
> > > >>>
> > > >>> Best,
> > > >>> David
> > > >>>
> > > >>> [1]
> https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> > > >>>
> > > >>> [2]
> > > >>>
> > > >>>
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> > > >>>
> > > >>> [3]
> > > >>>
> > > >>>
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> > > >>>
> > >
> > >
> >
>



Re: [VOTE] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-06-09 Thread Jing Zhang
+1 (binding)

Best,
Jing Zhang

On Thu, 9 Jun 2022 at 14:58, Martijn Visser wrote:

> +1 (binding)
>
> On Thu, 9 Jun 2022 at 04:31, Jingsong Li wrote:
>
> > +1 (binding)
> >
> > Best,
> > Jingsong
> >
> > On Tue, Jun 7, 2022 at 5:20 PM Jark Wu  wrote:
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 7 Jun 2022 at 13:44, Jing Ge  wrote:
> > >
> > > > Hi Godfrey,
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > >
> > > > On Tue, Jun 7, 2022 at 4:42 AM godfrey he 
> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > Thanks for all the feedback so far. Based on the discussion[1] we
> > seem
> > > > > to have consensus, so I would like to start a vote on FLIP-231 for
> > > > > which the FLIP has now also been updated[2].
> > > > >
> > > > > The vote will last for at least 72 hours (Jun 10th 12:00 GMT)
> unless
> > > > > there is an objection or insufficient votes.
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/88kxk7lh8bq2s2c2qrf06f3pnf9fkxj2
> > > > > [2]
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00
> > > > >
> > > > > Best,
> > > > > Godfrey
> > > > >
> > > >
> >
>


Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-09 Thread Martijn Visser
Hi Paul,

That's a fair point, but I still think we should not offer that capability
via the CLI either. But that's a different discussion :)

Thanks,

Martijn

On Thu, 9 Jun 2022 at 10:08, Paul Lam wrote:

> Hi Martijn,
>
> I think the `DROP SAVEPOINT` statement would not conflict with NO_CLAIM
> mode, since the statement is triggered by users instead of Flink runtime.
>
> We’re simply providing a tool for user to cleanup the savepoints, just
> like `bin/flink savepoint -d :savepointPath` in Flink CLI [1].
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#disposing-savepoints
>
> Best,
> Paul Lam
>
> On 9 Jun 2022, at 15:41, Martijn Visser wrote:
>
> Hi all,
>
> I would not include a DROP SAVEPOINT syntax. With the recently introduced
> CLAIM/NO CLAIM mode, I would argue that we've just clarified snapshot
> ownership and if you have a savepoint established "with NO_CLAIM it creates
> its own copy and leaves the existing one up to the user." [1] We shouldn't
> then again make it fuzzy by making it possible that Flink can remove
> snapshots.
>
> Best regards,
>
> Martijn
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
>
On Thu, 9 Jun 2022 at 09:27, Paul Lam wrote:
>
>> Hi team,
>>
>> It's great to see our opinions are finally converging!
>>
>> `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `
>>
>>
>> LGTM. Adding it to the FLIP.
>>
>> To Jark,
>>
>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB ”
>>
>>
>> Good point. The default savepoint dir should be enough for most cases.
>>
>> To Jing,
>>
>> DROP SAVEPOINT ALL
>>
>>
>> I think it’s valid to have such a statement, but I have two concerns:
>>
>>- `ALL` is already an SQL keyword, thus it may cause ambiguity.
>>- Flink CLI and the REST API don’t provide the corresponding
>>functionality, and we’d better keep them aligned.
>>
>> How about making this statement a follow-up task, since it would touch the
>> REST API and Flink CLI?
>>
>> Best,
>> Paul Lam
>>
>> On 9 Jun 2022, at 11:53, godfrey he wrote:
>>
>> Hi all,
>>
>> Regarding `PIPELINE`, it comes from flink-core module, see
>> `PipelineOptions` class for more details.
>> `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with
>> `JOBS`.
>>
>> +1 to discuss JOBTREE in other FLIP.
>>
>> +1 to `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `
>>
>> +1 to `CREATE SAVEPOINT FOR JOB ` and `DROP SAVEPOINT
>> `
>>
>> Best,
>> Godfrey
>>
>> On Thu, 9 Jun 2022 at 01:48, Jing Ge wrote:
>>
>>
>> Hi Paul, Hi Jark,
>>
>> Re JOBTREE, agree that it is out of the scope of this FLIP
>>
>> Re `RELEASE SAVEPOINT ALL', if the community prefers 'DROP' then 'DROP
>> SAVEPOINT ALL' housekeeping. WDYT?
>>
>> Best regards,
>> Jing
>>
>>
>> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu  wrote:
>>
>>
>> Hi Jing,
>>
>> Regarding JOBTREE (job lineage), I agree with Paul that this is out of
>> the scope
>> of this FLIP and can be discussed in another FLIP.
>>
>> Job lineage is a big topic that may involve many problems:
>> 1) how to collect and report job entities, attributes, and lineages?
>> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
>> 3) how does Flink SQL CLI/Gateway know the lineage information and show
>> jobtree?
>> 4) ...
>>
>> Best,
>> Jark
>>
>> On Wed, 8 Jun 2022 at 20:44, Jark Wu  wrote:
>>
>>
>> Hi Paul,
>>
>> I'm fine with using JOBS. The only concern is that this may conflict with
>> displaying more detailed
>> information for query (e.g. query content, plan) in the future, e.g. SHOW
>> QUERIES EXTENDED in ksqldb[1].
>> This is not a big problem as we can introduce SHOW QUERIES in the future
>> if necessary.
>>
>> STOP JOBS  (with options `table.job.stop-with-savepoint` and
>> `table.job.stop-with-drain`)
>>
>> What about STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] ?
>> It might be trivial and error-prone to set configuration before executing
>> a statement,
>> and the configuration will affect all statements after that.
>>
>> CREATE SAVEPOINT  FOR JOB 
>>
>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB ",
>> and always use configuration "state.savepoints.dir" as the default
>> savepoint dir.
>> The concern with using "" is that it should be the savepoint dir,
>> and savepoint_path is the returned value.
>>
>> I'm fine with other changes.
>>
>> Thanks,
>> Jark
>>
>> [1]:
>> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>>
>>
>>
>> On Wed, 8 Jun 2022 at 15:07, Paul Lam  wrote:
>>
>>
>> Hi Jing,
>>
>> Thank you for your inputs!
>>
>> TBH, I haven’t considered the ETL scenario that you mentioned. I think
>> they’re managed just like other jobs in terms of job lifecycles (please
>> correct me if I’m wrong).
>>
>> WRT the SQL statements about SQL lineages, I think it might be a
>> little bit out of the scope of the FLIP, since it’s mainly about
>> lifecycles. By the way, do we have these functionalities in Flink CLI or
>> REST API already?
>>
>> WRT `RELEASE SAVEPOINT ALL`, I’m sorry f

[jira] [Created] (FLINK-27969) StreamPhysicalOverAggregate doesn't support consuming update and delete changes

2022-06-09 Thread Spongebob (Jira)
Spongebob created FLINK-27969:
-

 Summary: StreamPhysicalOverAggregate doesn't support consuming 
update and delete changes
 Key: FLINK-27969
 URL: https://issues.apache.org/jira/browse/FLINK-27969
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.14.3
Reporter: Spongebob


Exception trace:
{code:java}
// exception
StreamPhysicalOverAggregate doesn't support consuming update and delete changes 
which is produced by node Join(joinType=[LeftOuterJoin], where=[(COL2 = COL4)], 
select=[...], leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey]) {code}
FlinkSQL that scheduled as streaming table like this:
{code:java}
// dml
SELECT RANK() OVER (PARTITION BY A.COL1 ORDER BY A.COL2) AS ODER_ONUM
FROM A
INNER JOIN B ON A.COL1 = B.COL1
LEFT JOIN C ON C.COL3 = 1 AND CAST(A.COL2 AS STRING) = C.COL4{code}
 





Re: [VOTE] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-06-09 Thread Leonard Xu
+1 (binding)

minor typo: the interface in FLIP title should be SupportsStatisticReport as we 
discussed before

Best,
Leonard

> On 9 Jun 2022, at 4:32 PM, Jing Zhang wrote:
> 
> +1 (binding)
> 
> Best,
> Jing Zhang
> 
> On Thu, 9 Jun 2022 at 14:58, Martijn Visser wrote:
> 
>> +1 (binding)
>> 
>> On Thu, 9 Jun 2022 at 04:31, Jingsong Li wrote:
>> 
>>> +1 (binding)
>>> 
>>> Best,
>>> Jingsong
>>> 
>>> On Tue, Jun 7, 2022 at 5:20 PM Jark Wu  wrote:
 
 +1 (binding)
 
 Best,
 Jark
 
 On Tue, 7 Jun 2022 at 13:44, Jing Ge  wrote:
 
> Hi Godfrey,
> 
> +1 (non-binding)
> 
> Best regards,
> Jing
> 
> 
> On Tue, Jun 7, 2022 at 4:42 AM godfrey he 
>> wrote:
> 
>> Hi everyone,
>> 
>> Thanks for all the feedback so far. Based on the discussion[1] we
>>> seem
>> to have consensus, so I would like to start a vote on FLIP-231 for
>> which the FLIP has now also been updated[2].
>> 
>> The vote will last for at least 72 hours (Jun 10th 12:00 GMT)
>> unless
>> there is an objection or insufficient votes.
>> 
>> [1]
>> https://lists.apache.org/thread/88kxk7lh8bq2s2c2qrf06f3pnf9fkxj2
>> [2]
>> 
> 
>>> 
>> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00
>> 
>> Best,
>> Godfrey
>> 
> 
>>> 
>> 



[jira] [Created] (FLINK-27970) [JUnit5 Migration] Module: flink-hadoop-bulk

2022-06-09 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-27970:
---

 Summary: [JUnit5 Migration] Module: flink-hadoop-bulk
 Key: FLINK-27970
 URL: https://issues.apache.org/jira/browse/FLINK-27970
 Project: Flink
  Issue Type: Sub-task
Reporter: Ryan Skraba








[jira] [Created] (FLINK-27971) [JUnit5 Migration] Module: flink-json

2022-06-09 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-27971:
---

 Summary: [JUnit5 Migration] Module: flink-json
 Key: FLINK-27971
 URL: https://issues.apache.org/jira/browse/FLINK-27971
 Project: Flink
  Issue Type: Sub-task
Reporter: Ryan Skraba








[jira] [Created] (FLINK-27972) Race condition between task/savepoint notification failure

2022-06-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27972:


 Summary: Race condition between task/savepoint notification failure
 Key: FLINK-27972
 URL: https://issues.apache.org/jira/browse/FLINK-27972
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Chesnay Schepler


When a task throws an exception in notifyCheckpointComplete, we send 2 messages 
to the JobManager:
1) we inform the CheckpointCoordinator about the failed savepoint
2) we inform the scheduler about the failed task.

Depending on which of these arrives first, the adaptive scheduler exhibits 
different behaviors. If 1) arrives first it properly informs the user about the 
created savepoint which might contain uncommitted transactions; if 2) arrives 
first it just restarts the job.

I'm not sure how big of an issue the latter case is.

In any case we might want to consider having the StopWithSavepoint state wait 
until the savepoint future has failed before doing anything else.
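A plain-Java sketch (illustrative only, no Flink classes) of that last idea: the stop-with-savepoint logic registers its decision on the savepoint future and only acts once the future has completed, so the outcome no longer depends on which of the two messages is processed first:

```java
import java.util.concurrent.CompletableFuture;

public class StopWithSavepointSketch {
    public static void main(String[] args) {
        CompletableFuture<String> savepointFuture = new CompletableFuture<>();

        // The task-failure notification arrives, but instead of reacting
        // immediately we defer the decision until the savepoint outcome
        // is known.
        CompletableFuture<String> decision = savepointFuture.handle(
                (path, err) -> err != null
                        ? "report savepoint failure, then restart"
                        : "report completed savepoint at " + path);

        // The savepoint notification (here: a failure) arrives later;
        // the processing order of the two messages no longer matters.
        savepointFuture.completeExceptionally(
                new RuntimeException("notifyCheckpointComplete failed"));

        System.out.println(decision.join());
    }
}
```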





[jira] [Created] (FLINK-27973) Introduce TableWrite and TableCommit as an abstraction layer above FileStore for writing RowData

2022-06-09 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-27973:
---

 Summary: Introduce TableWrite and TableCommit as an abstraction 
layer above FileStore for writing RowData
 Key: FLINK-27973
 URL: https://issues.apache.org/jira/browse/FLINK-27973
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Affects Versions: table-store-0.2.0
Reporter: Caizhi Weng


In this step we introduce {{TableWrite}} and {{TableCommit}}. They are an 
abstraction layer above {{FileStoreWrite}}, {{FileStoreCommit}} and 
{{FileStoreExpire}} to provide RowData writing and committing.





Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-09 Thread Lijie Wang
Hi all,

FYI, currently, some commonly used methods in StreamExecutionEnvironment
are still based on the old SourceFunction (and there is no alternative):
`StreamExecutionEnvironment#readFile(...)`
`StreamExecutionEnvironment#readTextFile(...)`

I think these should be migrated to the new source API before deprecating the
SourceFunction.

Best,
Lijie

On Thu, 9 Jun 2022 at 16:05, Martijn Visser wrote:

> Hi all,
>
> I think implicitly we've already considered the SourceFunction and
> SinkFunction as deprecated. They are even marked as so on the Flink roadmap
> [1]. That also shows that connectors that are using these interfaces are
> approaching end-of-life. The fact that we're actively migrating
> connectors from Source/SinkFunction to FLIP-27/FLIP-143 (plus add-on FLIPs)
> shows that we've already determined that target.
>
> With regards to the motivation of FLIP-27, I think reading up on the
> original discussion thread is also worthwhile [2] to see more context.
> FLIP-27 was also very important as it brought a unified connector which can
> support both streaming and batch (with batch being considered a special
> case of streaming in Flink's vision).
>
> So +1 to deprecate SourceFunction. I would also argue that we should
> already mark the SinkFunction as deprecated to avoid having this discussion
> again in a couple of months.
>
> Best regards,
>
> Martijn
>
> [1] https://flink.apache.org/roadmap.html
> [2] https://lists.apache.org/thread/334co89dbhc8qpr9nvmz8t1gp4sz2c8y
>
> On Thu, 9 Jun 2022 at 09:48, Jing Ge wrote:
>
> > Hi,
> >
> > I am very happy to see opinions from different perspectives. That will
> help
> > us understand the problem better. Thanks all for the informative
> > discussion.
> >
> > Let's see the big picture and check following facts together:
> >
> > 1. FLIP-27 was intended to solve some technical issues that are very
> > difficult to solve with SourceFunction[1]. When we say "SourceFunction is
> > easy", well, it depends. If we take a look at the implementation of the
> > Kafka connector, we will know how complicated it is to build a serious
> > connector for production with the old SourceFunction. To every problem
> > there is a solution and to every solution there is a problem. The fact is
> > that there is no perfect but a feasible solution. If we try to solve
> > complicated problems, we have to expose some complexity. Compared to
> > connectors for POC, demo, or training (no offense), I would also solve issues
> > for connectors like Kafka connector that are widely used in production
> with
> > higher priority. I think that should be one reason why FLIP-27 has been
> > designed and why the new source API went public.
> >
> > 2. FLIP-27 and the implementation was introduced roughly at the end of
> 2019
> > and went public on 19.04.2021, which means Flink has provided two
> different
> > public/graduated source solutions for more than one year. On the day that
> > the new source API went public, there should be a consensus in the
> > community that we should start the migration. Old SourceFunction
> interface,
> > in the ideal case, should have been deprecated on that day, otherwise we
> > should not graduate the new source API to avoid confusing (connector)
> > developers[2].
> >
> > 3. It is true that the new source API is hard to understand and even hard
> > to implement for simple cases. Thanks for the feedback. That is something
> > we need to improve. The current design&implementation could be considered
> > as the low level API. The next step is to create the high level API to
> > reduce some unnecessary complexity for those simple cases. But, IMHO,
> this
> > should not be the prerequisite to postpone the deprecation of the old
> > SourceFunction APIs.
> >
> > 4. As long as the old SourceFunction is not marked as deprecated,
> > developers will continue asking which one should be used. Let's make a
> > concrete example. If a new connector is developed now and the developer
> > asks for a suggestion of the choice between the old and new source API on
> > the ML, which one should we suggest? I think it should be the new Source
> > API. If a fresh new connector has been developed with the old
> > SourceFunction API before asking for the consensus in the community and
> the
> > developer wants to merge it to the master. Should we allow it? If the
> > answer of all these questions is pointing to the new Source API, the old
> > SourceFunction is de facto already deprecated, just has not been marked
> as
> > @deprecated, which confuses developers even more.
> >
> >  Best regards,
> > Jing
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > [2] https://lists.apache.org/thread/7okp4y46n3o3rx5mn0t3qobrof8zxwqs
> >
> > On Wed, Jun 8, 2022 at 2:21 AM Alexander Fedulov <
> alexan...@ververica.com>
> > wrote:
> >
> > > Hey Austin,
> > >
> > > Since we are getting deeper into the implementation details of the
> > > DataGeneratorSou

Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-09 Thread Alexander Fedulov
Hi Martijn,

It seems that they serve a bit different purposes though. The
DataGenTableSource is for generating random data described by the Table DDL
and is tied into the RowDataGenerator/DataGenerator concept which is
implemented as an Iterator.  The proposed API in contrast is supposed to
provide users with an easy way to supply their custom data. Another
difference is that a DataGenerator is supposed to be stateful and has to
snapshot its state, whereas the proposed API is purely driven by the input
index IDs and can be stateless yet remain deterministic. Are you sure it is
a good idea to mix them into the same API? We could think of using a
different name to make it less confusing for the users (something like
DataSupplierSource).
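To illustrate the statelessness point, a tiny plain-Java sketch (names are made up; this is not the proposed FLIP-238 API): each record is a pure function of its index, so nothing needs to be snapshotted and re-reading an index range after recovery is deterministic:

```java
import java.util.function.LongFunction;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class IndexDrivenGeneratorSketch {
    public static void main(String[] args) {
        // A pure function of the index: no internal state to checkpoint.
        LongFunction<String> generator = i -> "event-" + i;

        // "Replaying" the same index range always yields the same records,
        // which is what makes the source deterministic without snapshots.
        String firstRun = generate(generator, 0, 3);
        String replay = generate(generator, 0, 3);
        System.out.println(firstRun);
        System.out.println(firstRun.equals(replay));
    }

    static String generate(LongFunction<String> gen, long from, long to) {
        return LongStream.range(from, to)
                .mapToObj(gen::apply)
                .collect(Collectors.joining(","));
    }
}
```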

Best,
Alexander Fedulov

On Thu, Jun 9, 2022 at 9:14 AM Martijn Visser 
wrote:

> Hi Alex,
>
> Thanks for creating the FLIP and opening up the discussion. +1 overall for
> getting this in place.
>
> One question: you've already mentioned that this focussed on the DataStream
> API. I think it would be a bit confusing that we have a Datagen connector
> (on the Table side) that wouldn't leverage this target interface. I think
> it would be good if we could already have one generic Datagen connector
> which works for both DataStream API (so that would be a new one in the
> Flink repo) and that the Datagen in the Table landscape is using this
> target interface too. What do you think?
>
> Best regards,
>
> Martijn
>
> On Wed, 8 Jun 2022 at 14:21, Alexander Fedulov <
> alexan...@ververica.com> wrote:
>
> > Hi Xianxun,
> >
> > Thanks for bringing it up. I do believe it would be useful to have such a
> > CDC data generator but I see the
> > efforts to provide one a bit orthogonal to the DataSourceGenerator
> proposed
> > in the FLIP. FLIP-238 focuses
> > on the DataStream API and I could see integration into the Table/SQL
> > ecosystem as the next step that I would
> > prefer to keep separate (see KafkaDynamicSource reusing
> > KafkaSource
> > under the hood [1]).
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/3adca15859b36c61116fc56fabc24e9647f0ec5f/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/KafkaDynamicSource.java#L223
> >
> > Best,
> > Alexander Fedulov
> >
> >
> >
> >
> > On Wed, Jun 8, 2022 at 11:01 AM Xianxun Ye  wrote:
> >
> > > Hey Alexander,
> > >
> > > Making datagen source connector easier to use is really helpful during
> > > doing some PoC/Demo.
> > > And I thought about is it possible to produce a changelog stream by
> > > datagen source, so a new flink developer can practice flink sql with
> cdc
> > > data using Flink SQL Client CLI.
> > > In the flink-examples-table module, a ChangelogSocketExample class[1]
> > > describes how to ingest delete or insert data by 'nc' command. Can we
> > > support producing a changelog stream by the new datagen source?
> > >
> > > [1]
> > >
> >
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-table/src/main/java/org/apache/flink/table/examples/java/connectors/ChangelogSocketExample.java#L79
> > >
> > > Best regards,
> > >
> > > Xianxun
> > >
> > > On 06/8/2022 08:10,Alexander Fedulov
> > >  wrote:
> > >
> > > I looked a bit further and it seems it should actually be easier than I
> > > initially thought:  SourceReader extends CheckpointListener interface
> and
> > > with its custom implementation it should be possible to achieve similar
> > > results. A prototype that I have for the generator uses an
> > > IteratorSourceReader
> > > under the hood by default but we could consider adding the ability to
> > > supply something like a DataGeneratorSourceReaderFactory that would
> allow
> > > provisioning the DataGeneratorSource with customized implementations
> for
> > > cases like this.
> > >
> > > Best,
> > > Alexander Fedulov
> > >
> > > On Wed, Jun 8, 2022 at 12:58 AM Alexander Fedulov <
> > alexan...@ververica.com
> > > >
> > > wrote:
> > >
> > > Hi Steven,
> > >
> > > This is going to be tricky since in the new Source API the
> checkpointing
> > > aspects that you based your logic on are pushed further away from the
> > > low-level interfaces responsible for handling data and splits [1]. At
> the
> > > same time, the SourceCoordinatorProvider is hardwired into the
> internals
> > > of the framework, so I don't think it will be possible to provide a
> > > customized implementation for testing purposes.
> > >
> > > The only chance to tie data generation to checkpointing in the new
> Source
> > > API that I see at the moment is via the SplitEnumerator serializer (
> > > getEnumeratorCheckpointSerializer() method) [2]. In theory, it should
> be
> > > possible to share a variable visible both to the generator function and
> > to
> > > the serializer and manipulate it whenever the serialize() method gets
> > > called upon a checkpoint request. That said, you still won't get
> > > notifications of successful checkpoints that you currently use
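The shared-variable workaround described in this message can be sketched in plain Java. All class names here are hypothetical; a real implementation would plug into Flink's enumerator checkpoint serializer rather than the stand-in below:

```java
import java.util.concurrent.atomic.AtomicLong;

// Shared state visible to both the serializer and the generator.
class CheckpointSignal {
    final AtomicLong checkpointCount = new AtomicLong();
}

// Stand-in for the enumerator checkpoint serializer: serialize() is invoked
// by the framework on each checkpoint request, so bumping a counter there
// lets other components observe checkpoint activity.
class SignalingEnumeratorSerializer {
    private final CheckpointSignal signal;
    SignalingEnumeratorSerializer(CheckpointSignal signal) { this.signal = signal; }

    byte[] serialize(Object enumeratorState) {
        signal.checkpointCount.incrementAndGet();
        return new byte[0]; // real code would serialize enumeratorState
    }
}

// A generator whose output can depend on how many checkpoints were requested.
class CheckpointAwareGenerator {
    private final CheckpointSignal signal;
    CheckpointAwareGenerator(CheckpointSignal signal) { this.signal = signal; }

    String next(long index) {
        return "record-" + index + "@chk" + signal.checkpointCount.get();
    }
}

public class SharedSignalSketch {
    public static void main(String[] args) {
        CheckpointSignal signal = new CheckpointSignal();
        SignalingEnumeratorSerializer ser = new SignalingEnumeratorSerializer(signal);
        CheckpointAwareGenerator gen = new CheckpointAwareGenerator(signal);

        System.out.println(gen.next(0)); // record-0@chk0
        ser.serialize(null);             // framework takes a checkpoint
        System.out.println(gen.next(1)); // record-1@chk1
    }
}
```

Note the caveat from the message still applies: this only observes checkpoint *requests*, not notifications of *completed* checkpoints.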

Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-09 Thread Martijn Visser
Hey Alex,

Yes, I think we need to make sure that we're not causing confusion (I know
I already was confused). I think the DataSupplierSource is already better,
but perhaps there are others who have an even better idea.

Thanks,

Martijn

On Thu, Jun 9, 2022 at 14:28 Alexander Fedulov <
alexan...@ververica.com>:

> Hi Martijn,
>
> It seems that they serve a bit different purposes though. The
> DataGenTableSource is for generating random data described by the Table
> DDL and is tied into the RowDataGenerator/DataGenerator concept which is
> implemented as an Iterator.  The proposed API in contrast is supposed
> to provide users with an easy way to supply their custom data. Another
> difference is that a DataGenerator is supposed to be stateful and has to
> snapshot its state, whereas the proposed API is purely driven by the input
> index IDs and can be stateless yet remain deterministic. Are you sure it
> is a good idea to mix them into the same API? We could think of using a
> different name to make it less confusing for the users (something like
> DataSupplierSource).
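A minimal sketch of that stateless, index-driven contract (the functional shape loosely mirrors what FLIP-238 proposes, but the names here are illustrative, not the actual API):

```java
import java.util.function.LongFunction;

public class IndexDrivenGenerator {
    // A pure function of the record index: no state to snapshot, yet fully
    // deterministic - replaying the same index range after a failure
    // reproduces exactly the same records.
    static final LongFunction<String> GENERATOR =
        index -> "event-" + index + "-" + Long.toHexString(index * 2654435761L);

    public static void main(String[] args) {
        // Same index in, same record out.
        String first = GENERATOR.apply(42);
        String replayed = GENERATOR.apply(42);
        System.out.println(first.equals(replayed)); // true
    }
}
```

This is what distinguishes the proposal from the stateful DataGenerator iterators: determinism comes from the input index, not from snapshotted generator state.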
>
> Best,
> Alexander Fedulov
>
> On Thu, Jun 9, 2022 at 9:14 AM Martijn Visser 
> wrote:
>
>> Hi Alex,
>>
>> Thanks for creating the FLIP and opening up the discussion. +1 overall for
>> getting this in place.
>>
>> One question: you've already mentioned that this is focused on the
>> DataStream
>> API. I think it would be a bit confusing that we have a Datagen connector
>> (on the Table side) that wouldn't leverage this target interface. I think
>> it would be good if we could already have one generic Datagen connector
>> which works for both DataStream API (so that would be a new one in the
>> Flink repo) and that the Datagen in the Table landscape is using this
>> target interface too. What do you think?
>>
>> Best regards,
>>
>> Martijn
>>
>> On Wed, Jun 8, 2022 at 14:21 Alexander Fedulov <
>> alexan...@ververica.com>:
>>
>> > Hi Xianxun,
>> >
>> > Thanks for bringing it up. I do believe it would be useful to have such
>> a
>> > CDC data generator but I see the
>> > efforts to provide one as a bit orthogonal to the DataGeneratorSource
>> proposed
>> > in the FLIP. FLIP-238 focuses
>> > on the DataStream API and I could see integration into the Table/SQL
>> > ecosystem as the next step that I would
>> > prefer to keep separate (see KafkaDynamicSource reusing
>> > KafkaSource
>> > under the hood [1]).
>> >
>> > [1]
>> >
>> >
>> https://github.com/apache/flink/blob/3adca15859b36c61116fc56fabc24e9647f0ec5f/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/KafkaDynamicSource.java#L223
>> >
>> > Best,
>> > Alexander Fedulov
>> >
>> >
>> >
>> >
>> > On Wed, Jun 8, 2022 at 11:01 AM Xianxun Ye  wrote:
>> >
>> > > Hey Alexander,
>> > >
>> > > Making datagen source connector easier to use is really helpful during
>> > > doing some PoC/Demo.
>> > > And I thought about is it possible to produce a changelog stream by
>> > > datagen source, so a new flink developer can practice flink sql with
>> cdc
>> > > data using Flink SQL Client CLI.
>> > > In the flink-examples-table module, a ChangelogSocketExample class[1]
>> > > describes how to ingest delete or insert data by 'nc' command. Can we
>> > > support producing a changelog stream by the new datagen source?
>> > >
>> > > [1]
>> > >
>> >
>> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-table/src/main/java/org/apache/flink/table/examples/java/connectors/ChangelogSocketExample.java#L79
>> > >
>> > > Best regards,
>> > >
>> > > Xianxun
>> > >
>> > > On 06/8/2022 08:10,Alexander Fedulov
>> > >  wrote:
>> > >
>> > > I looked a bit further and it seems it should actually be easier than
>> I
>> > > initially thought:  SourceReader extends CheckpointListener interface
>> and
>> > > with its custom implementation it should be possible to achieve
>> similar
>> > > results. A prototype that I have for the generator uses an
>> > > IteratorSourceReader
>> > > under the hood by default but we could consider adding the ability to
>> > > supply something like a DataGeneratorSourceReaderFactory that would
>> allow
>> > > provisioning the DataGeneratorSource with customized implementations
>> for
>> > > cases like this.
>> > >
>> > > Best,
>> > > Alexander Fedulov
>> > >
>> > > On Wed, Jun 8, 2022 at 12:58 AM Alexander Fedulov <
>> > alexan...@ververica.com
>> > > >
>> > > wrote:
>> > >
>> > > Hi Steven,
>> > >
>> > > This is going to be tricky since in the new Source API the
>> checkpointing
>> > > aspects that you based your logic on are pushed further away from the
>> > > low-level interfaces responsible for handling data and splits [1]. At
>> the
>> > > same time, the SourceCoordinatorProvider is hardwired into the
>> internals
>> > > of the framework, so I don't think it will be possible to provide a
>> > > customized implementation for testing purposes.
>> > >
>> > > The only chance to tie data generation to checkpointing in 

Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-09 Thread Martijn Visser
Hi Lijie,

I don't see any problem with deprecating those methods at this moment, as
long as we don't remove them until the replacements are available. Besides
that, are we sure there are no replacements already, especially with the
new FileSource?

Best regards,

Martijn

On Thu, Jun 9, 2022 at 14:23 Lijie Wang  wrote:

> Hi all,
>
> FYI, currently, some commonly used methods in StreamExecutionEnvironment
> are still based on the old SourceFunction (and there is no alternative):
> `StreamExecutionEnvironment#readFile(...)`
> `StreamExecutionEnvironment#readTextFile(...)`
>
> I think these should be migrated to the new source API before deprecating the
> SourceFunction.
>
> Best,
> Lijie
>
> Martijn Visser  wrote on Thu, Jun 9, 2022 at 16:05:
>
> > Hi all,
> >
> > I think implicitly we've already considered the SourceFunction and
> > SinkFunction as deprecated. They are even marked as so on the Flink
> roadmap
> > [1]. That also shows that connectors that are using these interfaces are
> > either approaching end-of-life. The fact that we're actively migrating
> > connectors from Source/SinkFunction to FLIP-27/FLIP-143 (plus add-on
> FLIPs)
> > shows that we've already determined that target.
> >
> > With regards to the motivation of FLIP-27, I think reading up on the
> > original discussion thread is also worthwhile [2] to see more context.
> > FLIP-27 was also very important as it brought a unified connector which
> can
> > support both streaming and batch (with batch being considered a special
> > case of streaming in Flink's vision).
> >
> > So +1 to deprecate SourceFunction. I would also argue that we should
> > already mark the SinkFunction as deprecated to avoid having this
> discussion
> > again in a couple of months.
> >
> > Best regards,
> >
> > Martijn
> >
> > [1] https://flink.apache.org/roadmap.html
> > [2] https://lists.apache.org/thread/334co89dbhc8qpr9nvmz8t1gp4sz2c8y
> >
> > On Thu, Jun 9, 2022 at 09:48 Jing Ge  wrote:
> >
> > > Hi,
> > >
> > > I am very happy to see opinions from different perspectives. That will
> > help
> > > us understand the problem better. Thanks all for the informative
> > > discussion.
> > >
> > > Let's see the big picture and check the following facts together:
> > >
> > > 1. FLIP-27 was intended to solve some technical issues that are very
> > > difficult to solve with SourceFunction[1]. When we say "SourceFunction
> is
> > > easy", well, it depends. If we take a look at the implementation of the
> > > Kafka connector, we will know how complicated it is to build a serious
> > > connector for production with the old SourceFunction. To every problem
> > > there is a solution and to every solution there is a problem. The fact
> is
> > > that there is no perfect but a feasible solution. If we try to solve
> > > complicated problems, we have to expose some complexity. Compared to
> > > connectors for POC, demo, or training (no offense), I would also solve
> issues
> > > for connectors like Kafka connector that are widely used in production
> > with
> > > higher priority. I think that should be one reason why FLIP-27 has been
> > > designed and why the new source API went public.
> > >
> > > 2. FLIP-27 and the implementation was introduced roughly at the end of
> > 2019
> > > and went public on 19.04.2021, which means Flink has provided two
> > different
> > > public/graduated source solutions for more than one year. On the day
> that
> > > the new source API went public, there should be a consensus in the
> > > community that we should start the migration. Old SourceFunction
> > interface,
> > > in the ideal case, should have been deprecated on that day, otherwise
> we
> > > should not graduate the new source API to avoid confusing (connector)
> > > developers[2].
> > >
> > > 3. It is true that the new source API is hard to understand and even
> hard
> > > to implement for simple cases. Thanks for the feedback. That is
> something
> > > we need to improve. The current design & implementation could be
> considered
> > > as the low level API. The next step is to create the high level API to
> > > reduce some unnecessary complexity for those simple cases. But, IMHO,
> > this
> > > should not be the prerequisite to postpone the deprecation of the old
> > > SourceFunction APIs.
> > >
> > > 4. As long as the old SourceFunction is not marked as deprecated,
> > > developers will continue asking which one should be used. Let's make a
> > > concrete example. If a new connector is developed now and the developer
> > > asks for a suggestion of the choice between the old and new source API
> on
> > > the ML, which one should we suggest? I think it should be the new
> Source
> > > API. If a fresh new connector has been developed with the old
> > > SourceFunction API before asking for the consensus in the community and
> > the
> > > developer wants to merge it to the master. Should we allow it? If the
> > > answer of all these questions is pointing to the new Source API, the
> old
> > > SourceFunction is de 

Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-09 Thread Lijie Wang
 Hi Martijn,

I don't mean it's a blocker, just information. And I'm also +1 for this.

Put it another way: should we migrate the `#readFile(...)` to new API or
provide a similar method "readxxx" based on the new Source API?

And if we don't migrate it, does it mean that the `#readFile(...)` should
also be marked as deprecated?

Best,
Lijie

Martijn Visser  wrote on Thu, Jun 9, 2022 at 21:03:

> Hi Lijie,
>
> I don't see any problem with deprecating those methods at this moment, as
> long as we don't remove them until the replacements are available. Besides
> that, are we sure there are no replacements already, especially with the
> new FileSource?
>
> Best regards,
>
> Martijn
>
> On Thu, Jun 9, 2022 at 14:23 Lijie Wang  wrote:
>
> > Hi all,
> >
> > FYI, currently, some commonly used methods in StreamExecutionEnvironment
> > are still based on the old SourceFunction (and there is no alternative):
> > `StreamExecutionEnvironment#readFile(...)`
> > `StreamExecutionEnvironment#readTextFile(...)`
> >
> > I think these should be migrated to the new source API before deprecate
> the
> > SourceFunction.
> >
> > Best,
> > Lijie
> >
> > Martijn Visser  wrote on Thu, Jun 9, 2022 at 16:05:
> >
> > > Hi all,
> > >
> > > I think implicitly we've already considered the SourceFunction and
> > > SinkFunction as deprecated. They are even marked as so on the Flink
> > roadmap
> > > [1]. That also shows that connectors that are using these interfaces
> are
> > > either approaching end-of-life. The fact that we're actively migrating
> > > connectors from Source/SinkFunction to FLIP-27/FLIP-143 (plus add-on
> > FLIPs)
> > > shows that we've already determined that target.
> > >
> > > With regards to the motivation of FLIP-27, I think reading up on the
> > > original discussion thread is also worthwhile [2] to see more context.
> > > FLIP-27 was also very important as it brought a unified connector which
> > can
> > > support both streaming and batch (with batch being considered a special
> > > case of streaming in Flink's vision).
> > >
> > > So +1 to deprecate SourceFunction. I would also argue that we should
> > > already mark the SinkFunction as deprecated to avoid having this
> > discussion
> > > again in a couple of months.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > [1] https://flink.apache.org/roadmap.html
> > > [2] https://lists.apache.org/thread/334co89dbhc8qpr9nvmz8t1gp4sz2c8y
> > >
> > > On Thu, Jun 9, 2022 at 09:48 Jing Ge  wrote:
> > >
> > > > Hi,
> > > >
> > > > I am very happy to see opinions from different perspectives. That
> will
> > > help
> > > > us understand the problem better. Thanks all for the informative
> > > > discussion.
> > > >
> > > > Let's see the big picture and check following facts together:
> > > >
> > > > 1. FLIP-27 was intended to solve some technical issues that are very
> > > > difficult to solve with SourceFunction[1]. When we say
> "SourceFunction
> > is
> > > > easy", well, it depends. If we take a look at the implementation of
> the
> > > > Kafka connector, we will know how complicated it is to build a
> serious
> > > > connector for production with the old SourceFunction. To every
> problem
> > > > there is a solution and to every solution there is a problem. The
> fact
> > is
> > > > that there is no perfect but a feasible solution. If we try to solve
> > > > complicated problems, we have to expose some complexity. Comparing to
> > > > connectors for POC, demo, training(no offense), I would also solve
> > issues
> > > > for connectors like Kafka connector that are widely used in
> production
> > > with
> > > > higher priority. I think that should be one reason why FLIP-27 has
> been
> > > > designed and why the new source API went public.
> > > >
> > > > 2. FLIP-27 and the implementation was introduced roughly at the end
> of
> > > 2019
> > > > and went public on 19.04.2021, which means Flink has provided two
> > > different
> > > > public/graduated source solutions for more than one year. On the day
> > that
> > > > the new source API went public, there should be a consensus in the
> > > > community that we should start the migration. Old SourceFunction
> > > interface,
> > > > in the ideal case, should have been deprecated on that day, otherwise
> > we
> > > > should not graduate the new source API to avoid confusing (connector)
> > > > developers[2].
> > > >
> > > > 3. It is true that the new source API is hard to understand and even
> > hard
> > > > to implement for simple cases. Thanks for the feedback. That is
> > something
> > > > we need to improve. The current design&implementation could be
> > considered
> > > > as the low level API. The next step is to create the high level API
> to
> > > > reduce some unnecessary complexity for those simple cases. But, IMHO,
> > > this
> > > > should not be the prerequisite to postpone the deprecation of the old
> > > > SourceFunction APIs.
> > > >
> > > > 4. As long as the old SourceFunction is not marked as deprecated,
> > > > developers will co

[jira] [Created] (FLINK-27974) Potentially wrong classloader being used to create dynamic table sources

2022-06-09 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27974:


 Summary: Potentially wrong classloader being used to create 
dynamic table sources
 Key: FLINK-27974
 URL: https://issues.apache.org/jira/browse/FLINK-27974
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.15.0
Reporter: Chesnay Schepler


A user reported an issue on Slack where a job fails in the CLI because of 
{{ClassNotFoundException: 
org.apache.flink.table.planner.delegation.ParserFactory}} in 
{{FileSystemTableFactory#formatFactoryExists}} when trying to load the 
{{Factory}} service.

While looking through the call stack I noticed that the classloader passed via 
the context is a thread's context classloader, set in 
{{CatalogSourceTable#createDynamicTableSource}}.

This seems a bit fishy; since this runs in the context of the CLI this CL is 
likely the user CL, but the planner classes are loaded in a separate 
classloader (not in the parent). As a result the planner classes cannot be 
looked up via the service loader mechanism.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27975) Remove unnecessary RBAC rules from operator

2022-06-09 Thread Jira
Márton Balassi created FLINK-27975:
--

 Summary: Remove unnecessary RBAC rules from operator
 Key: FLINK-27975
 URL: https://issues.apache.org/jira/browse/FLINK-27975
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Márton Balassi
 Fix For: kubernetes-operator-1.1.0


[~jeesmon] reported the following RBAC rules as obsolete:

{code}
 - apiGroups:
  - flink-operator
resources:
  - "*"
verbs:
  - "*"
{code}

https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/templates/rbac.yaml#L24-L29

Also, the wildcard ("*") on nodes was rightfully flagged in his security review. The 
rule seems too permissive in my opinion too. As far as I remember it was needed for 
our services potentially using NodePort (we use ClusterIP by default). This should 
be properly verified and tidied up. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27976) [WebUi] Allow order by jobname

2022-06-09 Thread Jira
João Boto created FLINK-27976:
-

 Summary: [WebUi] Allow order by jobname
 Key: FLINK-27976
 URL: https://issues.apache.org/jira/browse/FLINK-27976
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Reporter: João Boto


Allow ordering jobs (running and canceled) by job name



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-09 Thread Ingo Bürk

Hi,

these APIs don't expose the underlying source directly, so I don't think 
we need to worry about deprecating them as well. There's also nothing 
inherently wrong with using a deprecated API internally, though even 
just for the experience of using our own new APIs I would personally say 
that they should be migrated to the new Source API. It's hard to reason 
that users must migrate to a new API if we don't do it internally as well.



Best
Ingo

On 09.06.22 15:41, Lijie Wang wrote:

  Hi Martijn,

I don't mean it's a blocker. Just a information. And I'm also +1 for this.

Put it another way: should we migrate the `#readFile(...)` to new API or
provide a similar method "readxxx" based on the new Source API?

And if we don't migrate it, does it mean that the `#readFile(...)` should
also be marked as deprecated?

Best,
Lijie

Martijn Visser  wrote on Thu, Jun 9, 2022 at 21:03:


Hi Lijie,

I don't see any problem with deprecating those methods at this moment, as
long as we don't remove them until the replacements are available. Besides
that, are we sure there are no replacements already, especially with the
new FileSource?

Best regards,

Martijn

On Thu, Jun 9, 2022 at 14:23 Lijie Wang  wrote:


Hi all,

FYI, currently, some commonly used methods in StreamExecutionEnvironment
are still based on the old SourceFunction (and there is no alternative):
`StreamExecutionEnvironment#readFile(...)`
`StreamExecutionEnvironment#readTextFile(...)`

I think these should be migrated to the new source API before deprecate

the

SourceFunction.

Best,
Lijie

Martijn Visser  wrote on Thu, Jun 9, 2022 at 16:05:


Hi all,

I think implicitly we've already considered the SourceFunction and
SinkFunction as deprecated. They are even marked as so on the Flink

roadmap

[1]. That also shows that connectors that are using these interfaces

are

either approaching end-of-life. The fact that we're actively migrating
connectors from Source/SinkFunction to FLIP-27/FLIP-143 (plus add-on

FLIPs)

shows that we've already determined that target.

With regards to the motivation of FLIP-27, I think reading up on the
original discussion thread is also worthwhile [2] to see more context.
FLIP-27 was also very important as it brought a unified connector which

can

support both streaming and batch (with batch being considered a special
case of streaming in Flink's vision).

So +1 to deprecate SourceFunction. I would also argue that we should
already mark the SinkFunction as deprecated to avoid having this

discussion

again in a couple of months.

Best regards,

Martijn

[1] https://flink.apache.org/roadmap.html
[2] https://lists.apache.org/thread/334co89dbhc8qpr9nvmz8t1gp4sz2c8y

On Thu, Jun 9, 2022 at 09:48 Jing Ge  wrote:


Hi,

I am very happy to see opinions from different perspectives. That

will

help

us understand the problem better. Thanks all for the informative
discussion.

Let's see the big picture and check following facts together:

1. FLIP-27 was intended to solve some technical issues that are very
difficult to solve with SourceFunction[1]. When we say

"SourceFunction

is

easy", well, it depends. If we take a look at the implementation of

the

Kafka connector, we will know how complicated it is to build a

serious

connector for production with the old SourceFunction. To every

problem

there is a solution and to every solution there is a problem. The

fact

is

that there is no perfect but a feasible solution. If we try to solve
complicated problems, we have to expose some complexity. Comparing to
connectors for POC, demo, training(no offense), I would also solve

issues

for connectors like Kafka connector that are widely used in

production

with

higher priority. I think that should be one reason why FLIP-27 has

been

designed and why the new source API went public.

2. FLIP-27 and the implementation was introduced roughly at the end

of

2019

and went public on 19.04.2021, which means Flink has provided two

different

public/graduated source solutions for more than one year. On the day

that

the new source API went public, there should be a consensus in the
community that we should start the migration. Old SourceFunction

interface,

in the ideal case, should have been deprecated on that day, otherwise

we

should not graduate the new source API to avoid confusing (connector)
developers[2].

3. It is true that the new source API is hard to understand and even

hard

to implement for simple cases. Thanks for the feedback. That is

something

we need to improve. The current design&implementation could be

considered

as the low level API. The next step is to create the high level API

to

reduce some unnecessary complexity for those simple cases. But, IMHO,

this

should not be the prerequisite to postpone the deprecation of the old
SourceFunction APIs.

4. As long as the old SourceFunction is not marked as deprecated,
developers will continue asking which one should be used. Let's make

a

concrete example. If a new connector is develope

Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-09 Thread Alexander Fedulov
Hi all,

It seems that there is some understandable cautiousness with regard to
deprecating methods and subclasses that do not have alternatives just yet.

We should probably first agree if it is in general OK for Flink to use
@Deprecated
annotation for parts of the code that do not have alternatives. In that
case,
we could add a comment along the lines of:
"This implementation is based on a deprecated SourceFunction API that
will gradually be phased out from Flink. No direct substitute exists at the
moment.
If you want to have a more future-proof solution, consider helping the
project by
contributing an implementation based on the new Source API."

This should clearly communicate the message that usage of these
methods/classes
is discouraged and at the same time promote contributions for addressing
the gap.
What do you think?

Best,
Alexander Fedulov
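The proposed deprecation-without-replacement could look like the following in code; the class and method names are invented for the sketch, and the Javadoc text paraphrases the wording suggested above:

```java
class LegacySources {
    /**
     * @deprecated This implementation is based on the deprecated
     * SourceFunction API that will gradually be phased out of Flink. No
     * direct substitute exists at the moment. If you want a more
     * future-proof solution, consider helping the project by contributing
     * an implementation based on the new Source API (FLIP-27).
     */
    @Deprecated
    static String readTextFile(String path) {
        return "stream-of:" + path; // placeholder body for the sketch
    }
}

public class DeprecationSketch {
    public static void main(String[] args) throws Exception {
        // The annotation is visible to compilers, IDEs, and tooling;
        // here we just confirm it via reflection.
        boolean deprecated = LegacySources.class
            .getDeclaredMethod("readTextFile", String.class)
            .isAnnotationPresent(Deprecated.class);
        System.out.println(deprecated); // true
    }
}
```

Pairing the annotation with an explanatory Javadoc note is what communicates "discouraged, but no alternative yet" rather than "use the replacement".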


On Thu, Jun 9, 2022 at 6:27 PM Ingo Bürk  wrote:

> Hi,
>
> these APIs don't expose the underlying source directly, so I don't think
> we need to worry about deprecating them as well. There's also nothing
> inherently wrong with using a deprecated API internally, though even
> just for the experience of using our own new APIs I would personally say
> that they should be migrated to the new Source API. It's hard to reason
> that users must migrate to a new API if we don't do it internally as well.
>
>
> Best
> Ingo
>
> On 09.06.22 15:41, Lijie Wang wrote:
> >   Hi Martijn,
> >
> > I don't mean it's a blocker. Just a information. And I'm also +1 for
> this.
> >
> > Put it another way: should we migrate the `#readFile(...)` to new API or
> > provide a similar method "readxxx" based on the new Source API?
> >
> > And if we don't migrate it, does it mean that the `#readFile(...)` should
> > also be marked as deprecated?
> >
> > Best,
> > Lijie
> >
> > Martijn Visser  wrote on Thu, Jun 9, 2022 at 21:03:
> >
> >> Hi Lijie,
> >>
> >> I don't see any problem with deprecating those methods at this moment,
> as
> >> long as we don't remove them until the replacements are available.
> Besides
> >> that, are we sure there are no replacements already, especially with the
> >> new FileSource?
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> On Thu, Jun 9, 2022 at 14:23 Lijie Wang  wrote:
> >>
> >>> Hi all,
> >>>
> >>> FYI, currently, some commonly used methods in
> StreamExecutionEnvironment
> >>> are still based on the old SourceFunction (and there is no
> alternative):
> >>> `StreamExecutionEnvironment#readFile(...)`
> >>> `StreamExecutionEnvironment#readTextFile(...)`
> >>>
> >>> I think these should be migrated to the new source API before deprecate
> >> the
> >>> SourceFunction.
> >>>
> >>> Best,
> >>> Lijie
> >>>
> >>> Martijn Visser  wrote on Thu, Jun 9, 2022 at 16:05:
> >>>
>  Hi all,
> 
>  I think implicitly we've already considered the SourceFunction and
>  SinkFunction as deprecated. They are even marked as so on the Flink
> >>> roadmap
>  [1]. That also shows that connectors that are using these interfaces
> >> are
>  either approaching end-of-life. The fact that we're actively migrating
>  connectors from Source/SinkFunction to FLIP-27/FLIP-143 (plus add-on
> >>> FLIPs)
>  shows that we've already determined that target.
> 
>  With regards to the motivation of FLIP-27, I think reading up on the
>  original discussion thread is also worthwhile [2] to see more context.
>  FLIP-27 was also very important as it brought a unified connector
> which
> >>> can
>  support both streaming and batch (with batch being considered a
> special
>  case of streaming in Flink's vision).
> 
>  So +1 to deprecate SourceFunction. I would also argue that we should
>  already mark the SinkFunction as deprecated to avoid having this
> >>> discussion
>  again in a couple of months.
> 
>  Best regards,
> 
>  Martijn
> 
>  [1] https://flink.apache.org/roadmap.html
>  [2] https://lists.apache.org/thread/334co89dbhc8qpr9nvmz8t1gp4sz2c8y
> 
>  On Thu, Jun 9, 2022 at 09:48 Jing Ge  wrote:
> 
> > Hi,
> >
> > I am very happy to see opinions from different perspectives. That
> >> will
>  help
> > us understand the problem better. Thanks all for the informative
> > discussion.
> >
> > Let's see the big picture and check following facts together:
> >
> > 1. FLIP-27 was intended to solve some technical issues that are very
> > difficult to solve with SourceFunction[1]. When we say
> >> "SourceFunction
> >>> is
> > easy", well, it depends. If we take a look at the implementation of
> >> the
> > Kafka connector, we will know how complicated it is to build a
> >> serious
> > connector for production with the old SourceFunction. To every
> >> problem
> > there is a solution and to every solution there is a problem. The
> >> fact
> >>> is
> > that there is no perfect but a feasible solution. If we try to solve
> > complicated problems, we have to 

[jira] [Created] (FLINK-27977) SavePoints housekeeping API in Flink Cli, Rest API, SQL client

2022-06-09 Thread Jing Ge (Jira)
Jing Ge created FLINK-27977:
---

 Summary: SavePoints housekeeping API in Flink Cli, Rest API, SQL 
client
 Key: FLINK-27977
 URL: https://issues.apache.org/jira/browse/FLINK-27977
 Project: Flink
  Issue Type: Improvement
Reporter: Jing Ge


We ran into an issue where a lot of savepoints had been created by customers 
(via their apps). It takes extra (hacky) effort to clean them up. 

We should support Savepoints housekeeping to delete all savepoints:
 # REST API - /savepoints-disposal
 # Flink CLI - {{$ ./bin/flink savepoint --disposeAll}}
 # SQL client - DROP SAVEPOINTS (alternative option could be DROP SAVEPOINT 
ALL, but ALL is a SQL keyword) 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-09 Thread Jing Ge
Hi Paul,

Fired a ticket: https://issues.apache.org/jira/browse/FLINK-27977 for
savepoints housekeeping.

Best regards,
Jing

On Thu, Jun 9, 2022 at 10:37 AM Martijn Visser 
wrote:

> Hi Paul,
>
> That's a fair point, but I still think we should not offer that capability
> via the CLI either. But that's a different discussion :)
>
> Thanks,
>
> Martijn
>
> On Thu, Jun 9, 2022 at 10:08 Paul Lam  wrote:
>
>> Hi Martijn,
>>
>> I think the `DROP SAVEPOINT` statement would not conflict with NO_CLAIM
>> mode, since the statement is triggered by users instead of Flink runtime.
>>
>> We're simply providing a tool for users to clean up the savepoints, just
>> like `bin/flink savepoint -d :savepointPath` in Flink CLI [1].
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#disposing-savepoints
>>
>> Best,
>> Paul Lam
>>
>> On Jun 9, 2022 at 15:41, Martijn Visser  wrote:
>>
>> Hi all,
>>
>> I would not include a DROP SAVEPOINT syntax. With the recently introduced
>> CLAIM/NO CLAIM mode, I would argue that we've just clarified snapshot
>> ownership and if you have a savepoint established "with NO_CLAIM it creates
>> its own copy and leaves the existing one up to the user." [1] We shouldn't
>> then again make it fuzzy by making it possible that Flink can remove
>> snapshots.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
>>
>> On Thu, Jun 9, 2022 at 09:27 Paul Lam  wrote:
>>
>>> Hi team,
>>>
>>> It's great to see our opinions are finally converging!
>>>
>>> `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `
>>>
>>>
>>> LGTM. Adding it to the FLIP.
>>>
>>> To Jark,
>>>
>>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB "
>>>
>>>
>>> Good point. The default savepoint dir should be enough for most cases.
>>>
>>> To Jing,
>>>
>>> DROP SAVEPOINT ALL
>>>
>>>
>>> I think it’s valid to have such a statement, but I have two concerns:
>>>
>>>- `ALL` is already an SQL keyword, thus it may cause ambiguity.
>>>- Flink CLI and REST API doesn’t provided the corresponding
>>>functionalities, and we’d better keep them aligned.
>>>
>>> How about making this statement as follow-up tasks which should touch
>>> REST API and Flink CLI?
>>>
>>> Best,
>>> Paul Lam
>>>
>>> On Jun 9, 2022 at 11:53, godfrey he  wrote:
>>>
>>> Hi all,
>>>
>>> Regarding `PIPELINE`, it comes from flink-core module, see
>>> `PipelineOptions` class for more details.
>>> `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with
>>> `JOBS`.
>>>
>>> +1 to discuss JOBTREE in other FLIP.
>>>
>>> +1 to `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
>>>
>>> +1 to `CREATE SAVEPOINT FOR JOB <job_id>` and `DROP SAVEPOINT
>>> <savepoint_path>`
>>>
>>> Best,
>>> Godfrey
>>>
>>> On Thu, 9 Jun 2022 at 01:48, Jing Ge  wrote:
>>>
>>>
>>> Hi Paul, Hi Jark,
>>>
>>> Re JOBTREE, agree that it is out of the scope of this FLIP
>>>
>>> Re `RELEASE SAVEPOINT ALL`: if the community prefers 'DROP', then 'DROP
>>> SAVEPOINT ALL' could do the housekeeping. WDYT?
>>>
>>> Best regards,
>>> Jing
>>>
>>>
>>> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu  wrote:
>>>
>>>
>>> Hi Jing,
>>>
>>> Regarding JOBTREE (job lineage), I agree with Paul that this is out of
>>> the scope
>>> of this FLIP and can be discussed in another FLIP.
>>>
>>> Job lineage is a big topic that may involve many problems:
>>> 1) how to collect and report job entities, attributes, and lineages?
>>> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
>>> 3) how does Flink SQL CLI/Gateway know the lineage information and show
>>> jobtree?
>>> 4) ...
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 8 Jun 2022 at 20:44, Jark Wu  wrote:
>>>
>>>
>>> Hi Paul,
>>>
>>> I'm fine with using JOBS. The only concern is that this may conflict
>>> with displaying more detailed
>>> information for query (e.g. query content, plan) in the future, e.g.
>>> SHOW QUERIES EXTENDED in ksqldb[1].
>>> This is not a big problem as we can introduce SHOW QUERIES in the future
>>> if necessary.
>>>
>>> STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>> `table.job.stop-with-drain`)
>>>
>>> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]?
>>> It is tedious and error-prone to set configuration before executing a
>>> statement, and the configuration affects all statements after it.
>>>
>>> CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>
>>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>",
>>> and always use the configuration "state.savepoints.dir" as the default
>>> savepoint dir.
>>> The concern with using "<savepoint_path>" is that what goes in the statement
>>> should be the savepoint dir, while the savepoint_path is the returned value.
>>>
>>> I'm fine with other changes.
>>>
>>> Thanks,
>>> Jark
>>>
>>> [1]:
>>> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
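The `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]` statement discussed above maps naturally onto Flink's existing REST endpoints: `POST /jobs/<job_id>/stop` for stop-with-savepoint (body fields `drain` and, optionally, `targetDirectory`) and `PATCH /jobs/<job_id>` for a plain cancel. A rough sketch of how a SQL gateway might translate the statement; the grammar, the helper name, and the decision to omit `targetDirectory` (relying on `state.savepoints.dir`, as suggested in this thread) are assumptions, not the agreed design:

```python
import json
import re


def translate_stop_job(stmt: str):
    """Hypothetical translation of the proposed STOP JOB statement
    into a Flink REST call.

    - STOP JOB '<id>' WITH SAVEPOINT [WITH DRAIN]
        -> POST /jobs/<id>/stop with a JSON body {"drain": ...};
           no targetDirectory is sent, so the cluster falls back to
           its configured state.savepoints.dir.
    - STOP JOB '<id>'
        -> PATCH /jobs/<id> (plain cancel, no savepoint).
    """
    m = re.fullmatch(
        r"STOP JOB '(?P<job>[0-9a-f]{32})'"
        r"(?P<sp> WITH SAVEPOINT)?(?P<drain> WITH DRAIN)?",
        stmt.strip(),
    )
    if m is None:
        raise ValueError("not a STOP JOB statement: " + stmt)
    job = m.group("job")
    if m.group("sp"):
        body = {"drain": m.group("drain") is not None}
        return "POST", "/jobs/%s/stop" % job, json.dumps(body)
    # Without WITH SAVEPOINT, fall back to a plain cancel.
    return "PATCH", "/jobs/%s" % job, None
```

This also illustrates Jark's point: the statement carries its own options, so nothing has to be smuggled in through session configuration that would leak into later statements.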
>>>
>>>
>>>
>>> On Wed, 8 Jun 2022 at 15:07, Paul Lam  wrote:
>>>
>>>
>>> Hi Jing,
>>>
>>> Thank you for your inputs!
>>>
>>> TBH, I haven’t considered the ETL scenario that you mentioned. I think
>>> they’re managed 

[jira] [Created] (FLINK-27978) Update spotless and add add-exports to support jdk17

2022-06-09 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-27978:
---

 Summary: Update spotless and add add-exports to support jdk17
 Key: FLINK-27978
 URL: https://issues.apache.org/jira/browse/FLINK-27978
 Project: Flink
  Issue Type: Sub-task
Reporter: Sergey Nuyanzin






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [VOTE] FLIP-223: Support HiveServer2 Endpoint

2022-06-09 Thread Jing Ge
+1 (not-binding)

Best regards,
Jing

On Thu, Jun 9, 2022 at 9:23 AM Jingsong Li  wrote:

> +1
>
> Thanks for driving.
>
> Best,
> Jingsong
>
> On Thu, Jun 9, 2022 at 2:17 PM Nicholas Jiang 
> wrote:
> >
> > +1 (not-binding)
> >
> > Best,
> > Nicholas Jiang
> >
> > On 2022/06/07 05:31:21 Shengkai Fang wrote:
> > > Hi, everyone.
> > >
> > > Thanks for all feedback for FLIP-223: Support HiveServer2 Endpoint[1]
> on
> > > the discussion thread[2]. I'd like to start a vote for it. The vote
> will be
> > > open for at least 72 hours unless there is an objection or not enough
> votes.
> > >
> > > Best,
> > > Shengkai
> > >
> > >
> > > [1]
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
> > > [2] https://lists.apache.org/thread/9r1j7ho2m8zbqy3tl7vvj9gnocggwr6x
> > >
>


Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-09 Thread Konstantin Knauf
Hi everyone,

thank you Jing for redirecting the discussion back to the topic at hand. I
agree with all of your points.

+1 to deprecate SourceFunction

Is there really no replacement for the StreamExecutionEnvironment#readXXX
methods? There is already a FLIP-27 based FileSource, right? What's missing to
recommend using that as opposed to the readXXX methods?

Cheers,

Konstantin

On Thu, 9 Jun 2022 at 20:11, Alexander Fedulov <
alexan...@ververica.com> wrote:

> Hi all,
>
> It seems that there is some understandable cautiousness with regard to
> deprecating methods and subclasses that do not have alternatives just yet.
>
> We should probably first agree if it is in general OK for Flink to use
> @Deprecated
> annotation for parts of the code that do not have alternatives. In that
> case,
> we could add a comment along the lines of:
> "This implementation is based on a deprecated SourceFunction API that
> will gradually be phased out from Flink. No direct substitute exists at the
> moment.
> If you want to have a more future-proof solution, consider helping the
> project by
> contributing an implementation based on the new Source API."
>
> This should clearly communicate the message that usage of these
> methods/classes
> is discouraged and at the same time promote contributions for addressing
> the gap.
> What do you think?
>
> Best,
> Alexander Fedulov
>
>
> On Thu, Jun 9, 2022 at 6:27 PM Ingo Bürk  wrote:
>
> > Hi,
> >
> > these APIs don't expose the underlying source directly, so I don't think
> > we need to worry about deprecating them as well. There's also nothing
> > inherently wrong with using a deprecated API internally, though even
> > just for the experience of using our own new APIs I would personally say
> > that they should be migrated to the new Source API. It's hard to reason
> > that users must migrate to a new API if we don't do it internally as
> well.
> >
> >
> > Best
> > Ingo
> >
> > On 09.06.22 15:41, Lijie Wang wrote:
> > >   Hi Martijn,
> > >
> > > I don't mean it's a blocker. Just information. And I'm also +1 for
> > this.
> > >
> > > Put it another way: should we migrate `#readFile(...)` to the new API or
> > > provide a similar "readXXX" method based on the new Source API?
> > >
> > > And if we don't migrate it, does it mean that the `#readFile(...)`
> should
> > > also be marked as deprecated?
> > >
> > > Best,
> > > Lijie
> > >
> > > On Thu, 9 Jun 2022 at 21:03, Martijn Visser  wrote:
> > >
> > >> Hi Lijie,
> > >>
> > >> I don't see any problem with deprecating those methods at this moment,
> > as
> > >> long as we don't remove them until the replacements are available.
> > Besides
> > >> that, are we sure there are no replacements already, especially with
> the
> > >> new FileSource?
> > >>
> > >> Best regards,
> > >>
> > >> Martijn
> > >>
> > >> On Thu, 9 Jun 2022 at 14:23, Lijie Wang <
> > wangdachui9...@gmail.com
> > >> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> FYI, currently, some commonly used methods in
> > StreamExecutionEnvironment
> > >>> are still based on the old SourceFunction (and there is no
> > alternative):
> > >>> `StreamExecutionEnvironment#readFile(...)`
> > >>> `StreamExecutionEnvironment#readTextFile(...)`
> > >>>
> > >>> I think these should be migrated to the new source API before
> > >>> deprecating the SourceFunction.
> > >>>
> > >>> Best,
> > >>> Lijie
> > >>>
> > >>> On Thu, 9 Jun 2022 at 16:05, Martijn Visser  wrote:
> > >>>
> >  Hi all,
> > 
> >  I think implicitly we've already considered the SourceFunction and
> >  SinkFunction as deprecated. They are even marked as such on the Flink
> > >>> roadmap
> >  [1]. That also shows that connectors that are using these interfaces
> > >> are
> >  approaching end-of-life. The fact that we're actively
> migrating
> >  connectors from Source/SinkFunction to FLIP-27/FLIP-143 (plus add-on
> > >>> FLIPs)
> >  shows that we've already determined that target.
> > 
> >  With regards to the motivation of FLIP-27, I think reading up on the
> >  original discussion thread is also worthwhile [2] to see more
> context.
> >  FLIP-27 was also very important as it brought a unified connector
> > which
> > >>> can
> >  support both streaming and batch (with batch being considered a
> > special
> >  case of streaming in Flink's vision).
> > 
> >  So +1 to deprecate SourceFunction. I would also argue that we should
> >  already mark the SinkFunction as deprecated to avoid having this
> > >>> discussion
> >  again in a couple of months.
> > 
> >  Best regards,
> > 
> >  Martijn
> > 
> >  [1] https://flink.apache.org/roadmap.html
> >  [2]
> https://lists.apache.org/thread/334co89dbhc8qpr9nvmz8t1gp4sz2c8y
> > 
> >  On Thu, 9 Jun 2022 at 09:48, Jing Ge  wrote:
> > 
> > > Hi,
> > >
> > > I am very happy to see opinions from different perspectives. That
> > >> will
> >  help
> > > us understand the problem better.

Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-09 Thread Lijie Wang
Hi all,

Sorry for my mistake. The `StreamExecutionEnvironment#readFile`/`readTextFile`
methods can easily be replaced by
`FileSource#forRecordStreamFormat/forBulkFileFormat`. I have no other concerns.

+1 to deprecate SourceFunction and the methods (in
StreamExecutionEnvironment) that are based on SourceFunction.

Best,
Lijie

On Fri, 10 Jun 2022 at 05:11, Konstantin Knauf  wrote:

> Hi everyone,
>
> thank you Jing for redirecting the discussion back to the topic at hand. I
> agree with all of your points.
>
> +1 to deprecate SourceFunction
>
> Is there really no replacement for the StreamExecutionEnvironment#readXXX
> methods? There is already a FLIP-27 based FileSource, right? What's missing to
> recommend using that as opposed to the readXXX methods?
>
> Cheers,
>
> Konstantin
>
> On Thu, 9 Jun 2022 at 20:11, Alexander Fedulov <
> alexan...@ververica.com> wrote:
>
> > Hi all,
> >
> > It seems that there is some understandable cautiousness with regard to
> > deprecating methods and subclasses that do not have alternatives just
> yet.
> >
> > We should probably first agree if it is in general OK for Flink to use
> > @Deprecated
> > annotation for parts of the code that do not have alternatives. In that
> > case,
> > we could add a comment along the lines of:
> > "This implementation is based on a deprecated SourceFunction API that
> > will gradually be phased out from Flink. No direct substitute exists at
> the
> > moment.
> > If you want to have a more future-proof solution, consider helping the
> > project by
> > contributing an implementation based on the new Source API."
> >
> > This should clearly communicate the message that usage of these
> > methods/classes
> > is discouraged and at the same time promote contributions for addressing
> > the gap.
> > What do you think?
> >
> > Best,
> > Alexander Fedulov
> >
> >
> > On Thu, Jun 9, 2022 at 6:27 PM Ingo Bürk  wrote:
> >
> > > Hi,
> > >
> > > these APIs don't expose the underlying source directly, so I don't
> think
> > > we need to worry about deprecating them as well. There's also nothing
> > > inherently wrong with using a deprecated API internally, though even
> > > just for the experience of using our own new APIs I would personally
> say
> > > that they should be migrated to the new Source API. It's hard to reason
> > > that users must migrate to a new API if we don't do it internally as
> > well.
> > >
> > >
> > > Best
> > > Ingo
> > >
> > > On 09.06.22 15:41, Lijie Wang wrote:
> > > >   Hi Martijn,
> > > >
> > > > I don't mean it's a blocker. Just information. And I'm also +1 for
> > > this.
> > > >
> > > > Put it another way: should we migrate `#readFile(...)` to the new API
> > > > or provide a similar "readXXX" method based on the new Source API?
> > > >
> > > > And if we don't migrate it, does it mean that the `#readFile(...)`
> > should
> > > > also be marked as deprecated?
> > > >
> > > > Best,
> > > > Lijie
> > > >
> > > > On Thu, 9 Jun 2022 at 21:03, Martijn Visser  wrote:
> > > >
> > > >> Hi Lijie,
> > > >>
> > > >> I don't see any problem with deprecating those methods at this
> moment,
> > > as
> > > >> long as we don't remove them until the replacements are available.
> > > Besides
> > > >> that, are we sure there are no replacements already, especially with
> > the
> > > >> new FileSource?
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Martijn
> > > >>
> > > >> On Thu, 9 Jun 2022 at 14:23, Lijie Wang <
> > wangdachui9...@gmail.com
> > > >> wrote:
> > > >>
> > > >>> Hi all,
> > > >>>
> > > >>> FYI, currently, some commonly used methods in
> > > StreamExecutionEnvironment
> > > >>> are still based on the old SourceFunction (and there is no
> > > alternative):
> > > >>> `StreamExecutionEnvironment#readFile(...)`
> > > >>> `StreamExecutionEnvironment#readTextFile(...)`
> > > >>>
> > > >>> I think these should be migrated to the new source API before
> > > >>> deprecating the SourceFunction.
> > > >>>
> > > >>> Best,
> > > >>> Lijie
> > > >>>
> > > >>> On Thu, 9 Jun 2022 at 16:05, Martijn Visser  wrote:
> > > >>>
> > >  Hi all,
> > > 
> > >  I think implicitly we've already considered the SourceFunction and
> > >  SinkFunction as deprecated. They are even marked as such on the
> Flink
> > > >>> roadmap
> > >  [1]. That also shows that connectors that are using these
> interfaces
> > > >> are
> > >  either approaching end-of-life. The fact that we're actively
> > migrating
> > >  connectors from Source/SinkFunction to FLIP-27/FLIP-143 (plus
> add-on
> > > >>> FLIPs)
> > >  shows that we've already determined that target.
> > > 
> > >  With regards to the motivation of FLIP-27, I think reading up on
> the
> > >  original discussion thread is also worthwhile [2] to see more
> > context.
> > >  FLIP-27 was also very important as it brought a unified connector
> > > which
> > > >>> can
> > >  support both streaming and batch (with batch being considered a
> > > special
> > >  case of streaming in Flink's vision).
> > > >

[jira] [Created] (FLINK-27979) Support to upgrade session cluster in a more fine grained way

2022-06-09 Thread Aitozi (Jira)
Aitozi created FLINK-27979:
--

 Summary: Support to upgrade session cluster in a more fine grained 
way
 Key: FLINK-27979
 URL: https://issues.apache.org/jira/browse/FLINK-27979
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Aitozi


Currently, we upgrade the session cluster by deleting it directly, which does 
not respect the upgrade mode of the session jobs. I think we could improve this 
by performing the upgrade in two phases.





[jira] [Created] (FLINK-27980) Add python examples to document of "DataStream API Integration"

2022-06-09 Thread pengmd (Jira)
pengmd created FLINK-27980:
--

 Summary: Add python examples to document of "DataStream API 
Integration" 
 Key: FLINK-27980
 URL: https://issues.apache.org/jira/browse/FLINK-27980
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Documentation
Reporter: pengmd
 Fix For: 1.16.0


Add python examples to document of "DataStream API Integration" 

document urls: 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/data_stream_api/





[jira] [Created] (FLINK-27981) Add python examples to document of "Time Attributes"

2022-06-09 Thread pengmd (Jira)
pengmd created FLINK-27981:
--

 Summary: Add python examples to document of "Time Attributes"
 Key: FLINK-27981
 URL: https://issues.apache.org/jira/browse/FLINK-27981
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Documentation
Reporter: pengmd
 Fix For: 1.16.0


Add python examples to document of "Time Attributes"

document url: 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/time_attributes/





[RESULT][VOTE]FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

2022-06-09 Thread Lincoln Lee
Hi everyone,

FLIP-234[1] has been accepted.

There are 4 approving votes which are all binding[2]

- Binding: Jark Wu
- Binding: Jingsong Li
- Binding: godfrey he
- Binding: Martijn Visser

None against.

Thanks again to everyone!

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
[2] https://lists.apache.org/thread/bb0kqjs8co3hhmtklmwptws4fc4rz810

Best,
Lincoln Lee


[RESULT][VOTE] FLIP-231: Introduce SupportsStatisticReport to support reporting statistics from source connectors

2022-06-09 Thread godfrey he
Hi, everyone.

FLIP-231: Introduce SupportsStatisticReport to support reporting
statistics from source connectors[1] has been accepted.

There are 5 binding votes and 1 non-binding vote[2].
- Jing Ge(non-binding)
- Jark Wu(binding)
- Jingsong Li(binding)
- Martijn Visser(binding)
- Jing Zhang(binding)
- Leonard Xu(binding)

None against.

Thanks again to everyone who has shown interest in this FLIP.


[1]
https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00&;
[2] https://lists.apache.org/thread/j1mqblpbp60hgwg2fnhp44cktfp76zd2


Best,
Godfrey


[jira] [Created] (FLINK-27982) FLIP-231: Introduce SupportsStatisticReport to support reporting statistics from source connectors

2022-06-09 Thread godfrey he (Jira)
godfrey he created FLINK-27982:
--

 Summary: FLIP-231: Introduce SupportsStatisticReport to support 
reporting statistics from source connectors
 Key: FLINK-27982
 URL: https://issues.apache.org/jira/browse/FLINK-27982
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.16.0


https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00&;





[jira] [Created] (FLINK-27983) Introduce SupportsStatisticsReport interface

2022-06-09 Thread godfrey he (Jira)
godfrey he created FLINK-27983:
--

 Summary: Introduce SupportsStatisticsReport interface
 Key: FLINK-27983
 URL: https://issues.apache.org/jira/browse/FLINK-27983
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.16.0








[jira] [Created] (FLINK-27984) Introduce FileBasedStatisticsReportableDecodingFormat interface

2022-06-09 Thread godfrey he (Jira)
godfrey he created FLINK-27984:
--

 Summary: Introduce FileBasedStatisticsReportableDecodingFormat 
interface
 Key: FLINK-27984
 URL: https://issues.apache.org/jira/browse/FLINK-27984
 Project: Flink
  Issue Type: Sub-task
Reporter: godfrey he








[jira] [Created] (FLINK-27985) Introduce FlinkRecomputeStatisticsProgram to compute statistics after filter push and partition pruning

2022-06-09 Thread godfrey he (Jira)
godfrey he created FLINK-27985:
--

 Summary: Introduce FlinkRecomputeStatisticsProgram to compute 
statistics after filter push and partition pruning
 Key: FLINK-27985
 URL: https://issues.apache.org/jira/browse/FLINK-27985
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.16.0








[jira] [Created] (FLINK-27986) Refactor the name of finish method for JdbcOutputFormatBuilder

2022-06-09 Thread Lei Xie (Jira)
Lei Xie created FLINK-27986:
---

 Summary: Refactor the name of finish method for 
JdbcOutputFormatBuilder
 Key: FLINK-27986
 URL: https://issues.apache.org/jira/browse/FLINK-27986
 Project: Flink
  Issue Type: Improvement
Reporter: Lei Xie








[jira] [Created] (FLINK-27987) Let FileSystemTableSource extend from SupportsStatisticReport

2022-06-09 Thread godfrey he (Jira)
godfrey he created FLINK-27987:
--

 Summary: Let FileSystemTableSource extend from 
SupportsStatisticReport
 Key: FLINK-27987
 URL: https://issues.apache.org/jira/browse/FLINK-27987
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.16.0








[jira] [Created] (FLINK-27988) Let HiveTableSource extend from SupportsStatisticReport

2022-06-09 Thread godfrey he (Jira)
godfrey he created FLINK-27988:
--

 Summary: Let HiveTableSource extend from SupportsStatisticReport
 Key: FLINK-27988
 URL: https://issues.apache.org/jira/browse/FLINK-27988
 Project: Flink
  Issue Type: Sub-task
Reporter: godfrey he








[jira] [Created] (FLINK-27989) CSV format supports reporting statistics

2022-06-09 Thread godfrey he (Jira)
godfrey he created FLINK-27989:
--

 Summary: CSV format supports reporting statistics
 Key: FLINK-27989
 URL: https://issues.apache.org/jira/browse/FLINK-27989
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.16.0








[jira] [Created] (FLINK-27991) ORC format supports reporting statistics

2022-06-09 Thread godfrey he (Jira)
godfrey he created FLINK-27991:
--

 Summary: ORC format supports reporting statistics
 Key: FLINK-27991
 URL: https://issues.apache.org/jira/browse/FLINK-27991
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.16.0








[jira] [Created] (FLINK-27990) Parquet format supports reporting statistics

2022-06-09 Thread godfrey he (Jira)
godfrey he created FLINK-27990:
--

 Summary: Parquet format supports reporting statistics
 Key: FLINK-27990
 URL: https://issues.apache.org/jira/browse/FLINK-27990
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Affects Versions: 1.16.0
Reporter: godfrey he








[jira] [Created] (FLINK-27992) cep StreamExecMatch need check the parallelism and maxParallelism of the two transformation in it

2022-06-09 Thread jackylau (Jira)
jackylau created FLINK-27992:


 Summary: cep StreamExecMatch need check the parallelism and 
maxParallelism of the two transformation in it
 Key: FLINK-27992
 URL: https://issues.apache.org/jira/browse/FLINK-27992
 Project: Flink
  Issue Type: Improvement
  Components: Library / CEP
Affects Versions: 1.16.0
Reporter: jackylau
 Fix For: 1.16.0


The StreamExecMatch node contains two transformations (StreamRecordTimestampInserter -> 
Match), and the upstream edge of StreamExecMatch is a hash edge. When users set 
different parallelism and maxParallelism, this causes a problem: the window 
operator computes key groups using the downstream node's max parallelism, while 
the CEP operator uses its own max parallelism, and the two may not be equal.

For example:

window --(hash edge)--> StreamRecordTimestampInserter --(forward edge)--> CEP
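The key-group mismatch described in this issue can be sketched with Flink's key-group assignment logic (see `KeyGroupRangeAssignment` in flink-runtime). The simplified hash below stands in for the murmur hash that Flink actually applies to `key.hashCode()`, and the concrete numbers are hypothetical:

```python
def assign_key_group(key_hash: int, max_parallelism: int) -> int:
    # Flink: keyGroupId = murmurHash(key.hashCode()) % maxParallelism
    # (the murmur step is omitted here for brevity)
    return key_hash % max_parallelism


def operator_index(key_group: int, max_parallelism: int, parallelism: int) -> int:
    # Flink: operatorIndex = keyGroupId * parallelism / maxParallelism
    return key_group * parallelism // max_parallelism


# If the upstream window operator partitions records using the downstream
# node's maxParallelism (say 128), but the CEP operator reads its keyed
# state using its own maxParallelism (say 256), the same key is assigned
# to different key groups on the two sides:
key_hash = 200
print(assign_key_group(key_hash, 128))  # 72
print(assign_key_group(key_hash, 256))  # 200
```

Because the two sides disagree on the key group, records hashed by one operator end up outside the key-group range the other operator owns, which is why the two transformations inside StreamExecMatch need a consistency check on parallelism and maxParallelism.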





[jira] [Created] (FLINK-27993) When only using 'Flink SQL' to develop programs, `watermark` should be definable when creating views

2022-06-09 Thread HunterHunter (Jira)
HunterHunter created FLINK-27993:


 Summary: When only using 'Flink SQL' to develop programs, `watermark` 
should be definable when creating views
 Key: FLINK-27993
 URL: https://issues.apache.org/jira/browse/FLINK-27993
 Project: Flink
  Issue Type: Improvement
Reporter: HunterHunter


When I use only 'Flink SQL' to develop programs, I cannot define watermarks on 
a view. I have a data source table A, which has only one field `message`. I 
need to parse it into other fields (like a `time` field) and then build a view, 
but I have no place to define a watermark.

In other scenarios, I need to perform various transformations on a table to get 
the `eventtime` field, or merge and convert multiple tables into a final table.

 

e.g.:

create table A (

message string

) with ( kafka )

create view B as select udf(message) as `eventtime` from A;

As far as I know, I can only define a `watermark` when creating a table, not 
when creating a view.

 

My current workaround is to mix in the Table API.

 





Re: [VOTE] FLIP-223: Support HiveServer2 Endpoint

2022-06-09 Thread LuNing Wang
+ 1

Best regards,
LuNing Wang

On Fri, 10 Jun 2022 at 04:08, Jing Ge  wrote:

> +1 (not-binding)
>
> Best regards,
> Jing
>
> On Thu, Jun 9, 2022 at 9:23 AM Jingsong Li  wrote:
>
> > +1
> >
> > Thanks for driving.
> >
> > Best,
> > Jingsong
> >
> > On Thu, Jun 9, 2022 at 2:17 PM Nicholas Jiang 
> > wrote:
> > >
> > > +1 (not-binding)
> > >
> > > Best,
> > > Nicholas Jiang
> > >
> > > On 2022/06/07 05:31:21 Shengkai Fang wrote:
> > > > Hi, everyone.
> > > >
> > > > Thanks for all feedback for FLIP-223: Support HiveServer2 Endpoint[1]
> > on
> > > > the discussion thread[2]. I'd like to start a vote for it. The vote
> > will be
> > > > open for at least 72 hours unless there is an objection or not enough
> > votes.
> > > >
> > > > Best,
> > > > Shengkai
> > > >
> > > >
> > > > [1]
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
> > > > [2] https://lists.apache.org/thread/9r1j7ho2m8zbqy3tl7vvj9gnocggwr6x
> > > >
> >
>


[jira] [Created] (FLINK-27994) ERROR org.apache.flink.runtime.source.coordinator.SourceCoordinator

2022-06-09 Thread wangkang (Jira)
wangkang created FLINK-27994:


 Summary:  ERROR 
org.apache.flink.runtime.source.coordinator.SourceCoordinator
 Key: FLINK-27994
 URL: https://issues.apache.org/jira/browse/FLINK-27994
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.14.4, 1.14.3, 1.14.2
 Environment: 
<properties>
    <flink.version>1.14.4</flink.version>
    <scala.binary.version>2.12</scala.binary.version>
    <hadoop.version>3.2.1</hadoop.version>
    <slf4j.version>1.7.36</slf4j.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.83</version>
    <scope>provided</scope>
</dependency>

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>8.0.21</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <artifactId>protobuf-java</artifactId>
            <groupId>com.google.protobuf</groupId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-connector-mysql-cdc</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <artifactId>guava</artifactId>
            <groupId>com.google.guava</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jackson-core</artifactId>
            <groupId>com.fasterxml.jackson.core</groupId>
        </exclusion>
        <exclusion>
            <artifactId>jackson-databind</artifactId>
            <groupId>com.fasterxml.jackson.core</groupId>
        </exclusion>
        <exclusion>
            <artifactId>slf4j-api</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
    </exclusions>
</dependency>

[jira] [Created] (FLINK-27995) Upgrade Janino version

2022-06-09 Thread Shengkai Fang (Jira)
Shengkai Fang created FLINK-27995:
-

 Summary: Upgrade Janino version
 Key: FLINK-27995
 URL: https://issues.apache.org/jira/browse/FLINK-27995
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Runtime
Affects Versions: 1.16.0
Reporter: Shengkai Fang


Currently, the Janino version doesn't support JDK 11 well.

 
[https://lists.apache.org/thread/q052xdn1mnhjm9k4ojjjz22dk4r1xwfz]


