Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-08 Thread Paul Lam
Hi Jing,

Thank you for your inputs!

TBH, I hadn’t considered the ETL scenario that you mentioned. I think those 
jobs are managed just like other jobs in terms of job lifecycle (please correct 
me if I’m wrong).

WRT the SQL statements about SQL lineage, I think it might be a little bit 
out of the scope of this FLIP, since it’s mainly about lifecycles. By the way, 
do we already have these functionalities in the Flink CLI or REST API? 

WRT `RELEASE SAVEPOINT ALL`, sorry for the outdated FLIP docs; the 
community is more in favor of `DROP SAVEPOINT `. I’m updating 
the FLIP according to the latest discussions.

Best,
Paul Lam

> On Jun 8, 2022, at 07:31, Jing Ge wrote:
> 
> Hi Paul,
> 
> Sorry that I am a little bit too late to join this thread. Thanks for driving 
> this and starting this informative discussion. The FLIP looks really 
> interesting. It will help us a lot to manage Flink SQL jobs. 
> 
> Have you considered the ETL scenario with Flink SQL, where multiple SQLs 
> build a DAG for many DAGs?
> 
> 1)
> +1 for SHOW JOBS. I think sooner or later we will start to discuss how to 
> support ETL jobs. Briefly speaking, SQLs that used to build the DAG are 
> responsible to *produce* data as the result(cube, materialized view, etc.) 
> for the future consumption by queries. The INSERT INTO SELECT FROM example in 
> FLIP and CTAS are typical SQL in this case. I would prefer to call them Jobs 
> instead of Queries.
> 
> 2)
> Speaking of ETL DAG, we might want to see the lineage. Is it possible to 
> support syntax like:
> 
> SHOW JOBTREE   // shows the downstream DAG from the given job_id
> SHOW JOBTREE  FULL // shows the whole DAG that contains the given 
> job_id
> SHOW JOBTREES // shows all DAGs
> SHOW ANCIENTS  // shows all parents of the given job_id
> 
> 3)
> Could we also support Savepoint housekeeping syntax? We ran into this issue 
> that a lot of savepoints have been created by customers (via their apps). It 
> will take extra (hacking) effort to clean it.
> 
> RELEASE SAVEPOINT ALL   
> 
> Best regards,
> Jing
> 
> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser wrote:
> Hi Paul,
> 
> I'm still doubting the keyword for the SQL applications. SHOW QUERIES could
> imply that this will actually show the query, but we're returning IDs of
> the running application. At first I was also not very much in favour of
> SHOW JOBS since I prefer calling it 'Flink applications' and not 'Flink
> jobs', but the glossary [1] made me reconsider. I would +1 SHOW/STOP JOBS
> 
> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
> 
> Best regards,
> 
> Martijn
> 
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary 
> 
> 
> On Sat, Jun 4, 2022 at 10:38, Paul Lam wrote:
> 
> > Hi Godfrey,
> >
> > Sorry for the late reply, I was on vacation.
> >
> > It looks like we have a variety of preferences on the syntax, how about we
> > choose the most acceptable one?
> >
> > WRT keyword for SQL jobs, we use JOBS, thus the statements related to jobs
> > would be:
> >
> > - SHOW JOBS
> > - STOP JOBS  (with options `table.job.stop-with-savepoint` and
> > `table.job.stop-with-drain`)
> >
> > WRT savepoint for SQL jobs, we use the `CREATE/DROP` pattern with `FOR
> > JOB`:
> >
> > - CREATE SAVEPOINT  FOR JOB 
> > - SHOW SAVEPOINTS FOR JOB  (show savepoints the current job
> > manager remembers)
> > - DROP SAVEPOINT 
> >
> > cc @Jark @ShengKai @Martijn @Timo .
> >
> > Best,
> > Paul Lam
> >
> >
> > On Mon, May 23, 2022 at 21:34, godfrey he wrote:
> >
> >> Hi Paul,
> >>
> >> Thanks for the update.
> >>
> >> >'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
> >> (DataStream or SQL) or
> >> clients (SQL client or CLI).
> >>
> >> Is DataStream job a QUERY? I think not.
> >> For a QUERY, the most important concept is the statement. But the
> >> result does not contain this info.
> >> If we need to contain all jobs in the cluster, I think the name should
> >> be JOB or PIPELINE.
> >> I lean towards SHOW PIPELINES and STOP PIPELINE [IF RUNNING] id.
> >>
> >> > SHOW SAVEPOINTS
> >> To list the savepoint for a specific job, we need to specify a
> >> specific pipeline,
> >> the syntax should be SHOW SAVEPOINTS FOR PIPELINE id
> >>
> >> Best,
> >> Godfrey
> >>
> >> On Fri, May 20, 2022 at 11:25, Paul Lam wrote:
> >> >
> >> > Hi Jark,
> >> >
> >> > WRT “DROP QUERY”, I agree that it’s not very intuitive, and that’s
> >> > part of the reason why I proposed “STOP/CANCEL QUERY” at the
> >> > beginning. The downside of it is that it’s not ANSI-SQL compatible.
> >> >
> >> > Another question is, what should be the syntax for ungracefully
> >> > canceling a query? As ShengKai pointed out in an offline discussion,
> >> > “STOP QUERY” and “CANCEL QUERY” might confuse SQL users.
> >> > Flink CLI has both stop and cancel, mostly due to historical 

[jira] [Created] (FLINK-27952) Table UDF fails when using Double.POSITIVE_INFINITY as parameters

2022-06-08 Thread Zhipeng Zhang (Jira)
Zhipeng Zhang created FLINK-27952:
-

 Summary: Table UDF fails when using Double.POSITIVE_INFINITY as 
parameters
 Key: FLINK-27952
 URL: https://issues.apache.org/jira/browse/FLINK-27952
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: Zhipeng Zhang


The following code fails and throws NumberFormatException when casting 
Double.POSITIVE_INFINITY to BigDecimal.

 

@Test
public void testTableUdf() {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<Row> data = env.fromElements(Row.of(1.), Row.of(2.));
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
    Table table = tEnv.fromDataStream(data).as("f0");
    Double[][] d = new Double[][] {new Double[] {1.0, Double.POSITIVE_INFINITY}};
    Expression[] expressions = new Expression[2];
    expressions[0] = org.apache.flink.table.api.Expressions.call(MyUDF.class, $("f0"), d);
    expressions[1] = org.apache.flink.table.api.Expressions.call(MyUDF.class, $("f0"), d);
    table.addColumns(expressions)
        .as("f0", "output", "output2")
        .execute().print();
}

public static class MyUDF extends ScalarFunction {
    public Integer eval(Integer num, Double[][] add) {
        return (int) (num + add[0][0]);
    }
}
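
For context, the reported failure is consistent with how java.math.BigDecimal behaves in plain Java, independent of Flink: BigDecimal cannot represent infinite or NaN values, so both the double constructor and valueOf reject Double.POSITIVE_INFINITY. A minimal standalone sketch of that behavior:

```java
import java.math.BigDecimal;

public class InfinityToBigDecimal {
    public static void main(String[] args) {
        // The BigDecimal(double) constructor rejects infinite/NaN inputs.
        try {
            new BigDecimal(Double.POSITIVE_INFINITY);
            throw new AssertionError("expected NumberFormatException");
        } catch (NumberFormatException expected) {
            System.out.println("new BigDecimal(+Inf) -> " + expected.getClass().getSimpleName());
        }
        // BigDecimal.valueOf goes through Double.toString -> "Infinity",
        // which also fails to parse as a decimal.
        try {
            BigDecimal.valueOf(Double.POSITIVE_INFINITY);
            throw new AssertionError("expected NumberFormatException");
        } catch (NumberFormatException expected) {
            System.out.println("BigDecimal.valueOf(+Inf) -> " + expected.getClass().getSimpleName());
        }
    }
}
```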



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern

2022-06-08 Thread Dian Fu
Hi Martijn,

There are many features available in the DataStream API of CEP not
supported in SQL, e.g. followedBy, notFollowedBy, followedByAny, etc. The
main reason is that the MATCH_RECOGNIZE clause which comes from SQL
standard doesn't define grammars for these semantics.

There are two ways to handle this:
1) Only support the features defined in MATCH_RECOGNIZE
2) Extend MATCH_RECOGNIZE to align with the other features in the
DataStream API of CEP

Regarding 2), it means that we would need to extend MATCH_RECOGNIZE in
Calcite first and then support it in Flink. Personally I think it's worth
doing. However, we need to think it through thoroughly, e.g. which
functionalities should be supported, what the grammar will be, etc. This
definitely deserves a separate FLIP and maybe should be discussed in the
Calcite community instead of Flink.

Regarding the scope of the DataStream API of CEP vs. CEP SQL, I don't think
that SQL should align with all the functionalities provided in the DataStream
API. The DataStream API is more flexible and so is usually more
powerful.
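
As a rough illustration of the semantic gap mentioned above (this is deliberately not Flink's CEP API, just a toy matcher over a list of event names): "next" demands strict contiguity, while "followedBy" allows unrelated events in between, which is the kind of relaxed-contiguity semantics MATCH_RECOGNIZE has no grammar for.

```java
import java.util.Arrays;
import java.util.List;

// Toy sketch, not Flink's CEP API:
// next        = strict contiguity: B must be the event immediately after A.
// followedBy  = relaxed contiguity: other events may occur between A and B.
public class ContiguitySketch {
    static boolean next(List<String> events, String a, String b) {
        for (int i = 0; i + 1 < events.size(); i++) {
            if (events.get(i).equals(a) && events.get(i + 1).equals(b)) return true;
        }
        return false;
    }

    static boolean followedBy(List<String> events, String a, String b) {
        int ai = events.indexOf(a);
        return ai >= 0 && events.subList(ai + 1, events.size()).contains(b);
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList("A", "X", "B");
        System.out.println(next(stream, "A", "B"));       // false: X sits between A and B
        System.out.println(followedBy(stream, "A", "B")); // true: relaxed contiguity skips X
    }
}
```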

Regards,
Dian


On Tue, Jun 7, 2022 at 8:37 PM Martijn Visser 
wrote:

> Hi Nicholas,
>
> It is disappointing that we can't support this in SQL. I am under the
> impression that currently all CEP capabilities are supported in both
> DataStream/Table API as well as SQL. If that's indeed the case, then I
> would rather have this also fixed for SQL to avoid introducing feature
> sparsity.
>
> Best regards,
>
> Martijn
>
> On Mon, Jun 6, 2022 at 09:49, Dian Fu wrote:
>
> > Hi Nicholas,
> >
> > Thanks a lot for the update.
> >
> > Regarding the pattern API, should we also introduce APIs such as
> > Pattern.times(int from, int to, Time windowTime) to indicate the time
> > interval between events matched in the loop?
> >
> > Regarding the naming of the classes, does it make sense to rename
> > `WithinType` to `InternalType` or `WindowType`? For the enum values
> inside
> > it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not
> > intuitive for me. The candidates that come to my mind:
> > - `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS`
> > - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS`
> >
> > Regards,
> > Dian
> >
> > On Tue, May 31, 2022 at 2:56 PM Nicholas Jiang wrote:
> >
> > > Hi Martijn,
> > >
> > > Sorry for the late reply. This feature is only supported in DataStream and
> > > isn't supported in MATCH_RECOGNIZE because the SQL syntax of
> > > MATCH_RECOGNIZE does not contain the semantics of this feature, which
> > > requires modification of the SQL syntax. Support on top of
> > > MATCH_RECOGNIZE is suitable for a new FLIP to discuss.
> > >
> > > Regards,
> > > Nicholas Jiang
> > >
> > > On 2022/05/25 11:36:33 Martijn Visser wrote:
> > > > Hi Nicholas,
> > > >
> > > > Thanks for creating the FLIP, I can imagine that there will be many
> > > > use cases that can be created using this new feature.
> > > >
> > > > The FLIP doesn't mention anything with regards to SQL, could this
> > feature
> > > > also be supported when using MATCH_RECOGNIZE?
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > > https://twitter.com/MartijnVisser82
> > > > https://github.com/MartijnVisser
> > > >
> > > >
> > > > On Sat, 7 May 2022 at 11:17, Dian Fu  wrote:
> > > >
> > > > > Hi Nicholas,
> > > > >
> > > > > Thanks a lot for bringing up this discussion. If I recall it
> > correctly,
> > > > > this feature has been requested many times by the users and is
> among
> > > one of
> > > > > the most requested features in CEP. So big +1 to this feature
> > overall.
> > > > >
> > > > > Regarding the API, the name `partialWithin` sounds a little weird.
> Is
> > > it
> > > > > possible to find a name which is more intuitive? Other possible
> > > solutions:
> > > > > - Reuse the existing `Pattern.within` method and change its
> semantic
> > > to the
> > > > > maximum time interval between patterns. Currently `Pattern.within`
> is
> > > used
> > > > > to define the maximum time interval between the first event and the
> > > last
> > > > > event. However, the Pattern object represents only one node in a
> > > pattern
> > > > > sequence and so it doesn't make much sense to define the maximum
> time
> > > > > interval between the first event and the last event on the Pattern
> > > object,
> > > > > e.g. we could move it to  `PatternStreamBuilder`. However, if we
> > choose
> > > > > this option, we'd better consider how to keep backward
> compatibility.
> > > > > - Introduce a series of methods when appending a new pattern to the
> > > > > existing one, e.g. `Pattern.followedBy(Pattern group, Time
> > > > > timeInterval)`. As timeInterval is a property between patterns and
> so
> > > it
> > > > > makes sense to define this property when appending a new pattern.
> > > However,
> > > > > the drawback is that we need to introduce a series of methods
> instead
> > > of
> > > > > only one method.
> > > > >
> > > > > We need also to make the semantic clear i

Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern

2022-06-08 Thread Nicholas Jiang
Hi Dian,

Thanks for your feedback about the Public Interface update for supporting the 
within-between-events feature. I have left comments on the points above:

- Regarding the pattern API, should we also introduce APIs such as 
Pattern.times(int from, int to, Time windowTime) to indicate the time interval 
between events matched in the loop?

IMO, we don't need to introduce the mentioned APIs to indicate the time 
interval between events. For example, instead of Pattern.times(int from, int 
to, Time windowTime), the user can use Pattern.times(int from, int 
to).within(BEFORE_AND_AFTER, windowTime) to indicate the time interval between 
the previous and the next event.

- Regarding the naming of the classes, does it make sense to rename 
`WithinType` to `InternalType` or `WindowType`? For the enum values inside it, 
the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not intuitive 
for me. The candidates that come to my mind: - `RELATIVE_TO_FIRST` and 
`RELATIVE_TO_PREVIOUS` - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS`

IMO, the `WithinType` naming directly indicates the kind of time 
interval. In addition, the enum values of `WithinType` could be updated to 
`PREVIOUS_AND_NEXT` and `FIRST_AND_LAST`, which directly indicate the time 
interval between the PREVIOUS and NEXT events and between the FIRST and LAST 
events, respectively. With `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` it is 
not clear which event is relative to the FIRST or PREVIOUS event.
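
A minimal sketch of the two semantics as plain timestamp checks (illustrative only, not Flink's API; the names are borrowed from this thread): FIRST_AND_LAST bounds the whole match, while PREVIOUS_AND_NEXT bounds every gap between consecutive events.

```java
public class WithinSemantics {
    // FIRST_AND_LAST: the whole match must fit in the window.
    static boolean firstAndLast(long[] ts, long windowMillis) {
        return ts.length == 0 || ts[ts.length - 1] - ts[0] <= windowMillis;
    }

    // PREVIOUS_AND_NEXT: every gap between consecutive events must fit.
    static boolean previousAndNext(long[] ts, long windowMillis) {
        for (int i = 1; i < ts.length; i++) {
            if (ts[i] - ts[i - 1] > windowMillis) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long[] ts = {0, 40, 80}; // event timestamps in ms; gaps of 40 ms, span of 80 ms
        System.out.println(previousAndNext(ts, 50)); // true: each gap <= 50
        System.out.println(firstAndLast(ts, 50));    // false: whole span 80 > 50
    }
}
```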

Best,
Nicholas Jiang

On 2022/06/06 07:48:22 Dian Fu wrote:
> Hi Nicholas,
> 
> Thanks a lot for the update.
> 
> Regarding the pattern API, should we also introduce APIs such as
> Pattern.times(int from, int to, Time windowTime) to indicate the time
> interval between events matched in the loop?
> 
> Regarding the naming of the classes, does it make sense to rename
> `WithinType` to `InternalType` or `WindowType`? For the enum values inside
> it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not
> intuitive for me. The candidates that come to my mind:
> - `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS`
> - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS`
> 
> Regards,
> Dian
> 
> On Tue, May 31, 2022 at 2:56 PM Nicholas Jiang 
> wrote:
> 
> > Hi Martijn,
> >
> > Sorry for the late reply. This feature is only supported in DataStream and
> > isn't supported in MATCH_RECOGNIZE because the SQL syntax of
> > MATCH_RECOGNIZE does not contain the semantics of this feature, which
> > requires modification of the SQL syntax. Support on top of MATCH_RECOGNIZE
> > is suitable for a new FLIP to discuss.
> >
> > Regards,
> > Nicholas Jiang
> >
> > On 2022/05/25 11:36:33 Martijn Visser wrote:
> > > Hi Nicholas,
> > >
> > > Thanks for creating the FLIP, I can imagine that there will be many use
> > > cases that can be created using this new feature.
> > >
> > > The FLIP doesn't mention anything with regards to SQL, could this feature
> > > also be supported when using MATCH_RECOGNIZE?
> > >
> > > Best regards,
> > >
> > > Martijn
> > > https://twitter.com/MartijnVisser82
> > > https://github.com/MartijnVisser
> > >
> > >
> > > On Sat, 7 May 2022 at 11:17, Dian Fu  wrote:
> > >
> > > > Hi Nicholas,
> > > >
> > > > Thanks a lot for bringing up this discussion. If I recall it correctly,
> > > > this feature has been requested many times by the users and is among
> > one of
> > > > the most requested features in CEP. So big +1 to this feature overall.
> > > >
> > > > Regarding the API, the name `partialWithin` sounds a little weird. Is
> > it
> > > > possible to find a name which is more intuitive? Other possible
> > solutions:
> > > > - Reuse the existing `Pattern.within` method and change its semantic
> > to the
> > > > maximum time interval between patterns. Currently `Pattern.within` is
> > used
> > > > to define the maximum time interval between the first event and the
> > last
> > > > event. However, the Pattern object represents only one node in a
> > pattern
> > > > sequence and so it doesn't make much sense to define the maximum time
> > > > interval between the first event and the last event on the Pattern
> > object,
> > > > e.g. we could move it to  `PatternStreamBuilder`. However, if we choose
> > > > this option, we'd better consider how to keep backward compatibility.
> > > > - Introduce a series of methods when appending a new pattern to the
> > > > existing one, e.g. `Pattern.followedBy(Pattern group, Time
> > > > timeInterval)`. As timeInterval is a property between patterns and so
> > it
> > > > makes sense to define this property when appending a new pattern.
> > However,
> > > > the drawback is that we need to introduce a series of methods instead
> > of
> > > > only one method.
> > > >
> > > > We need also to make the semantic clear in a few corner cases, e.g.
> > > > - What's the semantic of `A.followedBy(B).times(3).partialWithin(1
> > min)`?
> > > > Doesn't it mean that all three B events should occur in 1 minute or
> > only
> > > > the first B event sh

MATCH_RECOGNIZE And Semantics

2022-06-08 Thread Atri Sharma
Hello,

At my day job, we have run into a requirement to support MATCH_RECOGNIZE.

I wanted to check if this is already a part of the roadmap, and if
anybody is working on it.

If not, I am willing to work on this, with the community's guidance.

-- 
Regards,

Atri
Apache Concerted


Re: MATCH_RECOGNIZE And Semantics

2022-06-08 Thread Martijn Visser
Hi Atri,

Everything around MATCH_RECOGNIZE is documented [1]. Support in Batch mode
for MATCH_RECOGNIZE is planned for 1.16.

Best regards,

Martijn

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/sql/queries/match_recognize/

On Wed, Jun 8, 2022 at 10:24, Atri Sharma wrote:

> Hello,
>
> At my day job, we have run into a requirement to support MATCH_RECOGNIZE.
>
> I wanted to check if this is already a part of the roadmap, and if
> anybody is working on it.
>
> If not, I am willing to work on this, with the community's guidance.
>
> --
> Regards,
>
> Atri
> Apache Concerted
>


Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern

2022-06-08 Thread Martijn Visser
Hi Nicholas,

Thanks for clarifying the current feature sparsity between DataStream/Table
and SQL on this topic. I think it's an interesting topic for a future
discussion but let's definitely keep it out of scope for this FLIP. It
would be nice to have a follow-up discussion on this in the future :)

Best regards,

Martijn

On Wed, Jun 8, 2022 at 10:12, Nicholas Jiang wrote:

> Hi Dian,
>
> Thanks for your feedback about the Public Interface update for supporting
> the within-between-events feature. I have left comments on the points
> above:
>
> - Regarding the pattern API, should we also introduce APIs such as
> Pattern.times(int from, int to, Time windowTime) to indicate the time
> interval between events matched in the loop?
>
> IMO, we don't need to introduce the mentioned APIs to indicate the time
> interval between events. For example, instead of Pattern.times(int from, int
> to, Time windowTime), the user can use Pattern.times(int from, int
> to).within(BEFORE_AND_AFTER, windowTime) to indicate the time interval
> between the previous and the next event.
>
> - Regarding the naming of the classes, does it make sense to rename
> `WithinType` to `InternalType` or `WindowType`? For the enum values inside
> it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not
> intuitive for me. The candidates that come to my mind: -
> `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` - `WHOLE_MATCH` and
> `RELATIVE_TO_PREVIOUS`
>
> IMO, the `WithinType` naming directly indicates the kind of time
> interval. In addition, the enum values of `WithinType` could be updated to
> `PREVIOUS_AND_NEXT` and `FIRST_AND_LAST`, which directly indicate the time
> interval between the PREVIOUS and NEXT events and between the FIRST and LAST
> events, respectively. With `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` it
> is not clear which event is relative to the FIRST or PREVIOUS event.
>
> Best,
> Nicholas Jiang
>
> On 2022/06/06 07:48:22 Dian Fu wrote:
> > Hi Nicholas,
> >
> > Thanks a lot for the update.
> >
> > Regarding the pattern API, should we also introduce APIs such as
> > Pattern.times(int from, int to, Time windowTime) to indicate the time
> > interval between events matched in the loop?
> >
> > Regarding the naming of the classes, does it make sense to rename
> > `WithinType` to `InternalType` or `WindowType`? For the enum values
> inside
> > it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not
> > intuitive for me. The candidates that come to my mind:
> > - `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS`
> > - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS`
> >
> > Regards,
> > Dian
> >
> > On Tue, May 31, 2022 at 2:56 PM Nicholas Jiang wrote:
> >
> > > Hi Martijn,
> > >
> > > Sorry for the late reply. This feature is only supported in DataStream and
> > > isn't supported in MATCH_RECOGNIZE because the SQL syntax of
> > > MATCH_RECOGNIZE does not contain the semantics of this feature, which
> > > requires modification of the SQL syntax. Support on top of
> > > MATCH_RECOGNIZE is suitable for a new FLIP to discuss.
> > >
> > > Regards,
> > > Nicholas Jiang
> > >
> > > On 2022/05/25 11:36:33 Martijn Visser wrote:
> > > > Hi Nicholas,
> > > >
> > > > Thanks for creating the FLIP, I can imagine that there will be many
> > > > use cases that can be created using this new feature.
> > > >
> > > > The FLIP doesn't mention anything with regards to SQL, could this
> feature
> > > > also be supported when using MATCH_RECOGNIZE?
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > > https://twitter.com/MartijnVisser82
> > > > https://github.com/MartijnVisser
> > > >
> > > >
> > > > On Sat, 7 May 2022 at 11:17, Dian Fu  wrote:
> > > >
> > > > > Hi Nicholas,
> > > > >
> > > > > Thanks a lot for bringing up this discussion. If I recall it
> correctly,
> > > > > this feature has been requested many times by the users and is
> among
> > > one of
> > > > > the most requested features in CEP. So big +1 to this feature
> overall.
> > > > >
> > > > > Regarding the API, the name `partialWithin` sounds a little weird.
> Is
> > > it
> > > > > possible to find a name which is more intuitive? Other possible
> > > solutions:
> > > > > - Reuse the existing `Pattern.within` method and change its
> semantic
> > > to the
> > > > > maximum time interval between patterns. Currently `Pattern.within`
> is
> > > used
> > > > > to define the maximum time interval between the first event and the
> > > last
> > > > > event. However, the Pattern object represents only one node in a
> > > pattern
> > > > > sequence and so it doesn't make much sense to define the maximum
> time
> > > > > interval between the first event and the last event on the Pattern
> > > object,
> > > > > e.g. we could move it to  `PatternStreamBuilder`. However, if we
> choose
> > > > > this option, we'd better consider how to keep backward
> compatibility.
> > > > > - Introduce a series of methods when appending a new pattern to the
> > > > > existing one, e.g. `Patter

Re: MATCH_RECOGNIZE And Semantics

2022-06-08 Thread Atri Sharma
Ah, thanks. I should have been clearer -- I meant MATCH_RECOGNIZE for
batch mode.

Thanks for clarifying!

On Wed, Jun 8, 2022 at 1:58 PM Martijn Visser  wrote:
>
> Hi Atri,
>
> Everything around MATCH_RECOGNIZE is documented [1]. Support in Batch mode
> for MATCH_RECOGNIZE is planned for 1.16.
>
> Best regards,
>
> Martijn
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/sql/queries/match_recognize/
>
> On Wed, Jun 8, 2022 at 10:24, Atri Sharma wrote:
>
> > Hello,
> >
> > At my day job, we have run into a requirement to support MATCH_RECOGNIZE.
> >
> > I wanted to check if this is already a part of the roadmap, and if
> > anybody is working on it.
> >
> > If not, I am willing to work on this, with the community's guidance.
> >
> > --
> > Regards,
> >
> > Atri
> > Apache Concerted
> >

-- 
Regards,

Atri
Apache Concerted


[jira] [Created] (FLINK-27953) using the original order to add the primary key in PushProjectIntoTableSourceScanRule

2022-06-08 Thread zoucao (Jira)
zoucao created FLINK-27953:
--

 Summary: using the original order to add the primary key in 
PushProjectIntoTableSourceScanRule
 Key: FLINK-27953
 URL: https://issues.apache.org/jira/browse/FLINK-27953
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.14.4
Reporter: zoucao


In PushProjectIntoTableSourceScanRule, if the source produces a changelog 
stream, the primary key will be added to the end of projected fields, see the 
following SQL:
{code:java}
StreamTableTestUtil util = streamTestUtil(TableConfig.getDefault());
TableEnvironment tEnv = util.getTableEnv();
String srcTableDdl =
"CREATE TABLE fs (\n"
+ "  a bigint,\n"
+ "  b int,\n"
+ "  c varchar,\n"
+ "  d int,\n"
+ "  e int,\n "
+ "  primary key (a,b) not enforced \n"
+ ") with (\n"
+ " 'connector' = 'values',\n"
+ " 'disable-lookup'='true',\n"
+ " 'changelog-mode' = 'I,UB,UA,D')";
tEnv.executeSql(srcTableDdl);
tEnv.getConfig().set("table.exec.source.cdc-events-duplicate", "true");
{code}
{code:java}
 System.out.println(tEnv.explainSql("select a, c from fs where c > 0 and b = 
0"));

projected list:
[[0],[1],[2]]

== Optimized Execution Plan ==
Calc(select=[a, c], where=[(CAST(c AS BIGINT) > 0)])
+- ChangelogNormalize(key=[a, b])
   +- Exchange(distribution=[hash[a, b]])
  +- Calc(select=[a, b, c], where=[(b = 0)])
 +- DropUpdateBefore
+- TableSourceScan(table=[[default_catalog, default_database, fs, 
filter=[], project=[a, b, c], metadata=[]]], fields=[a, b, c])
{code}
{code:java}
 System.out.println(tEnv.explainSql("select a, c from fs where c > 0")); 

projected list:
[[0],[2],[1]]

 == Optimized Execution Plan ==
Calc(select=[a, c], where=[(CAST(c AS BIGINT) > 0)])
+- ChangelogNormalize(key=[a, b])
   +- Exchange(distribution=[hash[a, b]])
  +- DropUpdateBefore
 +- TableSourceScan(table=[[default_catalog, default_database, fs, 
filter=[], project=[a, c, b], metadata=[]]], fields=[a, c, b])
{code}
Field b is not involved in
{code:sql}
select a, c from fs where c > 0{code}
but it is part of the primary key, so it is added to the end of the projected 
list if 'table.exec.source.cdc-events-duplicate' is enabled. The condition on 
field b changes the output type, which means the duplicate node gets a 
different input type; the state serializer is also changed, leading to state 
incompatibility.

I think we can use the original order from the source table when adding the 
primary key to the projected list.
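
A hypothetical sketch of the proposed ordering (not the actual planner rule): merge the projected fields with the primary-key fields and emit them in source-schema order, so both plans above would project [a, b, c].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Illustrative helper, not planner code: append missing primary-key fields
// while keeping all field indices in the original source-schema order.
public class ProjectWithPrimaryKey {
    static List<Integer> project(List<Integer> projected, List<Integer> primaryKey) {
        TreeSet<Integer> ordered = new TreeSet<>(projected); // sorted = source order
        ordered.addAll(primaryKey);
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        // "select a, c" projects fields {0, 2}; primary key (a, b) is {0, 1}.
        System.out.println(project(Arrays.asList(0, 2), Arrays.asList(0, 1)));
        // -> [0, 1, 2], i.e. [a, b, c], matching the "where b = 0" plan
    }
}
```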



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-08 Thread Xianxun Ye
Hey Alexander, 


Making the datagen source connector easier to use is really helpful when doing 
PoCs/demos.
I also wondered whether it is possible to produce a changelog stream with the 
datagen source, so that a new Flink developer can practice Flink SQL with CDC 
data using the Flink SQL Client CLI.
In the flink-examples-table module, the ChangelogSocketExample class [1] describes 
how to ingest delete or insert data via the 'nc' command. Can we support producing 
a changelog stream with the new datagen source?


[1] 
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-table/src/main/java/org/apache/flink/table/examples/java/connectors/ChangelogSocketExample.java#L79


Best regards,


Xianxun


On 06/08/2022 08:10, Alexander Fedulov wrote:
I looked a bit further and it seems it should actually be easier than I
initially thought: SourceReader extends the CheckpointListener interface, and
with a custom implementation it should be possible to achieve similar
results. A prototype that I have for the generator uses an IteratorSourceReader
under the hood by default, but we could consider adding the ability to
supply something like a DataGeneratorSourceReaderFactory that would allow
provisioning the DataGeneratorSource with customized implementations for
cases like this.

Best,
Alexander Fedulov

On Wed, Jun 8, 2022 at 12:58 AM Alexander Fedulov 
wrote:

Hi Steven,

This is going to be tricky since in the new Source API the checkpointing
aspects that you based your logic on are pushed further away from the
low-level interfaces responsible for handling data and splits [1]. At the
same time, the SourceCoordinatorProvider is hardwired into the internals
of the framework, so I don't think it will be possible to provide a
customized implementation for testing purposes.

The only chance to tie data generation to checkpointing in the new Source
API that I see at the moment is via the SplitEnumerator serializer (
getEnumeratorCheckpointSerializer() method) [2]. In theory, it should be
possible to share a variable visible both to the generator function and to
the serializer and manipulate it whenever the serialize() method gets
called upon a checkpoint request. That said, you still won't get
notifications of successful checkpoints that you currently use (this info
is only available to the SourceCoordinator).
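
The shared-variable idea could be sketched roughly like this (plain Java, no Flink types; all method names here are stand-ins, not the real Source API): a counter bumped inside serialize() tells the generator that a checkpoint was requested, letting it gate emission per checkpoint cycle.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of sharing state between the enumerator-state
// serializer and the generator function. serialize() is invoked on each
// checkpoint, so the generator can cap records per checkpoint cycle.
public class CheckpointGatedGenerator {
    static final AtomicLong checkpointId = new AtomicLong();
    static long emittedInCycle = 0;
    static long lastSeenCheckpoint = 0;

    // Stand-in for the enumerator checkpoint serializer's serialize().
    static byte[] serializeEnumeratorState() {
        checkpointId.incrementAndGet(); // side channel: a checkpoint happened
        return new byte[0];
    }

    // Stand-in for the generator: emits at most perCycle records per cycle.
    static boolean tryEmit(long perCycle) {
        long current = checkpointId.get();
        if (current != lastSeenCheckpoint) { // a new cycle has started
            lastSeenCheckpoint = current;
            emittedInCycle = 0;
        }
        if (emittedInCycle < perCycle) {
            emittedInCycle++;
            return true;
        }
        return false; // hold back until the next checkpoint
    }

    public static void main(String[] args) {
        System.out.println(tryEmit(2)); // true
        System.out.println(tryEmit(2)); // true
        System.out.println(tryEmit(2)); // false: cycle quota used up
        serializeEnumeratorState();      // checkpoint -> new cycle
        System.out.println(tryEmit(2)); // true again
    }
}
```

Note that, as said above, this still gives no notification of checkpoint *completion*, only of checkpoint requests reaching the serializer.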

In general, regardless of the generator implementation itself, the new Source
API does not seem to support the use case of verifying checkpoints
contents in lockstep with produced data, at least I do not see an immediate
solution for this. Can you think of a different way of checking the
correctness of the Iceberg Sink implementation that does not rely on this
approach?

Best,
Alexander Fedulov

[1]
https://github.com/apache/flink/blob/0f19c2472c54aac97e4067f5398731ab90036d1a/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L337

[2]
https://github.com/apache/flink/blob/e4b000818c15b5b781c4e5262ba83bfc9d65121a/flink-core/src/main/java/org/apache/flink/api/connector/source/Source.java#L97

On Tue, Jun 7, 2022 at 6:03 PM Steven Wu  wrote:

In Iceberg source, we have a data generator source that can control the
records per checkpoint cycle. Can we support sth like this in the
DataGeneratorSource?


https://github.com/apache/iceberg/blob/master/flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java
public BoundedTestSource(List<List<T>> elementsPerCheckpoint, boolean
checkpointEnabled)

Thanks,
Steven

On Tue, Jun 7, 2022 at 8:48 AM Alexander Fedulov wrote:

[1] https://cwiki.apache.org/confluence/x/9Av1D
[2] https://lists.apache.org/thread/d6cwqw9b3105wcpdkwq7rr4s7x4ywqr9

Best,
Alexander Fedulov





Re: MATCH_RECOGNIZE And Semantics

2022-06-08 Thread Nicholas Jiang
Hi Atri, Martijn,

MATCH_RECOGNIZE will support batch mode in Flink 1.16. The pull request [1] 
which supports MATCH_RECOGNIZE in batch mode will be merged to the master 
branch today, so you can already use it on the master branch.

Best,
Nicholas Jiang

[1]https://github.com/apache/flink/pull/18408

On 2022/06/08 08:30:17 Atri Sharma wrote:
> Ah, thanks. I should have been clearer -- I meant MATCH_RECOGNIZE for
> batch mode.
> 
> Thanks for clarifying!
> 
> On Wed, Jun 8, 2022 at 1:58 PM Martijn Visser  
> wrote:
> >
> > Hi Atri,
> >
> > Everything around MATCH_RECOGNIZE is documented [1]. Support in Batch mode
> > for MATCH_RECOGNIZE is planned for 1.16.
> >
> > Best regards,
> >
> > Martijn
> >
> > [1]
> > https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/sql/queries/match_recognize/
> >
> > On Wed, Jun 8, 2022 at 10:24, Atri Sharma wrote:
> >
> > > Hello,
> > >
> > > At my day job, we have run into a requirement to support MATCH_RECOGNIZE.
> > >
> > > I wanted to check if this is already a part of the roadmap, and if
> > > anybody is working on it.
> > >
> > > If not, I am willing to work on this, with the community's guidance.
> > >
> > > --
> > > Regards,
> > >
> > > Atri
> > > Apache Concerted
> > >
> 
> -- 
> Regards,
> 
> Atri
> Apache Concerted
> 


Re: [VOTE] FLIP-224: Blocklist Mechanism

2022-06-08 Thread Chesnay Schepler

I don't really see a strong need for the blocklist in FLIP-168.
It states that we need the block mechanism so that speculative 
executions aren't deployed to the slow node.


I'm wondering what exactly prevents the scheduler from ensuring that 
right now, given that we already have other mechanisms that do 
location-aware decisions (e.g., local recovery or locality).
It could refuse to use slots from slow nodes (and downscale the job 
accordingly), upscale the job to limit the impact of slow nodes, and we 
could also think about extending the requirement declaration to have a 
notion of "undesirable nodes" so you're getting some more slots from 
good nodes (which I /think/ is the main thing you're targeting with the 
blocklist).
In any case I think there are options to explore here. Maybe you already 
did, but there's nothing in the FLIP about rejected alternatives.


It makes me think that the whole "block the node but keep current stuff 
running" is more of a band-aid for a problem we're about to introduce 
ourselves. In particular because cluster-wide blocks seem rather strange 
in general; this should be scoped to the job because the performance is 
measured relative to other vertices in that same job. Another job might 
have drawn the short straw and only got slow slots; as far as that job 
is concerned the performance is perfectly fine.


On 07/06/2022 10:27, Zhu Zhu wrote:

Hi Chesnay,

For your information, one major goal of the blocklist mechanism is to
support FLIP-168 (speculative execution of batch jobs). When
speculative execution happens, it needs to keep the existing tasks
running and launch speculative tasks on other nodes. We have heard
requests for speculative execution from many users, who find the feature
a blocker to running their production batch jobs on Flink.
Multi-tenant environments are common for batch jobs, and temporary
hotspots become a common problem. This cannot be well resolved by
fine-grained resources (machine load is not controlled by Flink), nor by
killing all tasks on a temporary hotspot (the job may roll back to
hours ago). Therefore, even just considering this goal, I think it
adds enough value to users.

Regarding whether we should reject a proposal because it adds
complexity to the core components: my point is that it depends on
whether the feature adds enough value to users. It's also welcome
if someone has another good idea that adds less complexity.

If you are still concerned about the value of this feature, I'm fine
to open a survey on the user mailing list to see what users think about
it.
What do you think?

Thanks,
Zhu

Chesnay Schepler  于2022年6月7日周二 15:13写道:

I've had some time to think about it and concluded to stick to my -1.

While BLOCK_WITH_QUARANTINE is easy to implement (un-register TMs and
ignore all RPCs (the latter mostly happens automatically)) it doesn't
add a whole lot of value, as it's pretty much equivalent to shutting
down the TM.

Meanwhile, BLOCK needs an entirely different implementation that
interacts with the slot management on both JM/RM, and tbh I'm not so
sure about its purpose. If the node/process is overloaded because of
the running job, well then resource profiles & fine-grained resource
management is supposed to address that. If the overloading is externally
induced then BLOCK only makes sense if the node is overloaded to a
degree where the existing workload is fine (otherwise
BLOCK_WITH_QUARANTINE would be a better choice I guess), which seems
rather unlikely.

I'm against this change because I don't believe it will be useful for
the general user base, and because it can't be implemented without
pushing some complexity into core components.

On 28/05/2022 06:48, Zhu Zhu wrote:

Hi Chesnay,
Would you share your thoughts in the discussion thread if there are
still concerns?

Thanks,
Zhu

Chesnay Schepler  于2022年5月27日周五 14:54写道:


-1 to put a lid on things for now, because I'm not quite done yet with
the discussion.

On 27/05/2022 05:25, Yangze Guo wrote:

+1 (binding)

Best,
Yangze Guo

On Thu, May 26, 2022 at 3:54 PM Yun Gao  wrote:

Thanks Lijie and Zhu for driving the FLIP!

The blocked list functionality helps reduce the complexity in maintenance
and the current design looks good to me, thus +1 from my side (binding).


Best,
Yun




--
From:Xintong Song
Send Time:2022 May 26 (Thu.) 12:51
To:dev
Subject:Re: [VOTE] FLIP-224: Blocklist Mechanism

Thanks for driving this effort, Lijie.

I think a nice addition would be to make this feature accessible directly
from webui. However, there's no reason to block this FLIP on it.

So +1 (binding) from my side.

Best,

Xintong



On Fri, May 20, 2022 at 12:57 PM Lijie Wang
wrote:


Hi everyone,

Thanks for the feedback for FLIP-224: Blocklist Mechanism [1] on the
discussion thread [2]

I'd like to start a vote for it. The vote will last for at least 72 hours
unless there is an objection or insufficient votes.

[1]

ht

[DISCUSS] Releasing 1.15.1

2022-06-08 Thread David Anderson
I would like to start a discussion on releasing 1.15.1. Flink 1.15 was
released on the 5th of May [1] and so far 43 issues have been resolved,
including several user-facing issues with blocker and critical priorities
[2]. (The recent problem with FileSink rolling policies not working
properly in 1.15.0 got me thinking it might be time for a bug-fix release.)

There currently remain 16 unresolved tickets with a fixVersion of 1.15.1
[3], five of which are about CI infrastructure and tests. There is only one
such ticket marked Critical:

https://issues.apache.org/jira/browse/FLINK-27492 Flink table scala example
does not including the scala-api jars

I'm not convinced we should hold up a release for this issue, but on the
other hand, it seems that this issue can be resolved by making a decision
about how to handle the missing dependencies. @Timo Walther
 @yun_gao can you give an update?

Two other open issues seem to have made significant progress (listed
below). Would it make sense to wait for either of these? Are there any
other open tickets we should consider waiting for?

https://issues.apache.org/jira/browse/FLINK-27420 Suspended SlotManager
fail to reregister metrics when started again
https://issues.apache.org/jira/browse/FLINK-27606 CompileException when
using UDAF with merge() method

I would volunteer to manage the release. Is there a PMC member who would
join me to help, as needed?

Best,
David

[1] https://flink.apache.org/news/2022/05/05/1.15-announcement.html

[2]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC

[3]
https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC


Re: [DISCUSS] Releasing 1.15.1

2022-06-08 Thread Jark Wu
Hi David, thank you for driving the release.

+1 for the 1.15.1 release. The release-1.15 branch
already contains many bug fixes and some SQL
issues are quite critical.

Btw, FLINK-27606 has been merged just now.

Best,
Jark


On Wed, 8 Jun 2022 at 17:40, David Anderson  wrote:

> I would like to start a discussion on releasing 1.15.1. Flink 1.15 was
> released on the 5th of May [1] and so far 43 issues have been resolved,
> including several user-facing issues with blocker and critical priorities
> [2]. (The recent problem with FileSink rolling policies not working
> properly in 1.15.0 got me thinking it might be time for bug-fix release.)
>
> There currently remain 16 unresolved tickets with a fixVersion of 1.15.1
> [3], five of which are about CI infrastructure and tests. There is only one
> such ticket marked Critical:
>
> https://issues.apache.org/jira/browse/FLINK-27492 Flink table scala
> example
> does not including the scala-api jars
>
> I'm not convinced we should hold up a release for this issue, but on the
> other hand, it seems that this issue can be resolved by making a decision
> about how to handle the missing dependencies. @Timo Walther
>  @yun_gao can you give an update?
>
> Two other open issues seem to have made significant progress (listed
> below). Would it make sense to wait for either of these? Are there any
> other open tickets we should consider waiting for?
>
> https://issues.apache.org/jira/browse/FLINK-27420 Suspended SlotManager
> fail to reregister metrics when started again
> https://issues.apache.org/jira/browse/FLINK-27606 CompileException when
> using UDAF with merge() method
>
> I would volunteer to manage the release. Is there a PMC member who would
> join me to help, as needed?
>
> Best,
> David
>
> [1] https://flink.apache.org/news/2022/05/05/1.15-announcement.html
>
> [2]
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
>
> [3]
>
> https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
>


Re: [DISCUSS] Releasing 1.15.1

2022-06-08 Thread Jingsong Li
+1

Thanks David for volunteering to manage the release.

Best,
Jingsong

On Wed, Jun 8, 2022 at 6:21 PM Jark Wu  wrote:
>
> Hi David, thank you for driving the release.
>
> +1 for the 1.15.1 release. The release-1.15 branch
> already contains many bug fixes and some SQL
> issues are quite critical.
>
> Btw, FLINK-27606 has been merged just now.
>
> Best,
> Jark
>
>
> On Wed, 8 Jun 2022 at 17:40, David Anderson  wrote:
>
> > I would like to start a discussion on releasing 1.15.1. Flink 1.15 was
> > released on the 5th of May [1] and so far 43 issues have been resolved,
> > including several user-facing issues with blocker and critical priorities
> > [2]. (The recent problem with FileSink rolling policies not working
> > properly in 1.15.0 got me thinking it might be time for bug-fix release.)
> >
> > There currently remain 16 unresolved tickets with a fixVersion of 1.15.1
> > [3], five of which are about CI infrastructure and tests. There is only one
> > such ticket marked Critical:
> >
> > https://issues.apache.org/jira/browse/FLINK-27492 Flink table scala
> > example
> > does not including the scala-api jars
> >
> > I'm not convinced we should hold up a release for this issue, but on the
> > other hand, it seems that this issue can be resolved by making a decision
> > about how to handle the missing dependencies. @Timo Walther
> >  @yun_gao can you give an update?
> >
> > Two other open issues seem to have made significant progress (listed
> > below). Would it make sense to wait for either of these? Are there any
> > other open tickets we should consider waiting for?
> >
> > https://issues.apache.org/jira/browse/FLINK-27420 Suspended SlotManager
> > fail to reregister metrics when started again
> > https://issues.apache.org/jira/browse/FLINK-27606 CompileException when
> > using UDAF with merge() method
> >
> > I would volunteer to manage the release. Is there a PMC member who would
> > join me to help, as needed?
> >
> > Best,
> > David
> >
> > [1] https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> >
> > [2]
> >
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> >
> > [3]
> >
> > https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> >


Re: [DISCUSS] Releasing 1.15.1

2022-06-08 Thread Chesnay Schepler

+1

Thank you for proposing this. I can take care of the PMC-side of things.

On 08/06/2022 12:37, Jingsong Li wrote:

+1

Thanks David for volunteering to manage the release.

Best,
Jingsong

On Wed, Jun 8, 2022 at 6:21 PM Jark Wu  wrote:

Hi David, thank you for driving the release.

+1 for the 1.15.1 release. The release-1.15 branch
already contains many bug fixes and some SQL
issues are quite critical.

Btw, FLINK-27606 has been merged just now.

Best,
Jark


On Wed, 8 Jun 2022 at 17:40, David Anderson  wrote:


I would like to start a discussion on releasing 1.15.1. Flink 1.15 was
released on the 5th of May [1] and so far 43 issues have been resolved,
including several user-facing issues with blocker and critical priorities
[2]. (The recent problem with FileSink rolling policies not working
properly in 1.15.0 got me thinking it might be time for bug-fix release.)

There currently remain 16 unresolved tickets with a fixVersion of 1.15.1
[3], five of which are about CI infrastructure and tests. There is only one
such ticket marked Critical:

https://issues.apache.org/jira/browse/FLINK-27492 Flink table scala
example
does not including the scala-api jars

I'm not convinced we should hold up a release for this issue, but on the
other hand, it seems that this issue can be resolved by making a decision
about how to handle the missing dependencies. @Timo Walther
 @yun_gao can you give an update?

Two other open issues seem to have made significant progress (listed
below). Would it make sense to wait for either of these? Are there any
other open tickets we should consider waiting for?

https://issues.apache.org/jira/browse/FLINK-27420 Suspended SlotManager
fail to reregister metrics when started again
https://issues.apache.org/jira/browse/FLINK-27606 CompileException when
using UDAF with merge() method

I would volunteer to manage the release. Is there a PMC member who would
join me to help, as needed?

Best,
David

[1] https://flink.apache.org/news/2022/05/05/1.15-announcement.html

[2]

https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC

[3]

https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC





Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.0.0 released

2022-06-08 Thread Lijie Wang
Congrats! Thanks Yang for driving the release, and thanks to all
contributors!

Best,
Lijie

John Gerassimou  于2022年6月6日周一 22:38写道:

> Thank you for all your efforts!
>
> Thanks
> John
>
> On Sun, Jun 5, 2022 at 10:33 PM Aitozi  wrote:
>
>> Thanks Yang and Nice to see it happen.
>>
>> Best,
>> Aitozi.
>>
>> Yang Wang  于2022年6月5日周日 16:14写道:
>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink Kubernetes Operator 1.0.0.
>>>
>>> The Flink Kubernetes Operator allows users to manage their Apache Flink
>>> applications and their lifecycle through native k8s tooling like kubectl.
>>> This is the first production ready release and brings numerous
>>> improvements and new features to almost every aspect of the operator.
>>>
>>> Please check out the release blog post for an overview of the release:
>>>
>>> https://flink.apache.org/news/2022/06/05/release-kubernetes-operator-1.0.0.html
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Maven artifacts for Flink Kubernetes Operator can be found at:
>>>
>>> https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator
>>>
>>> Official Docker image for Flink Kubernetes Operator applications can be
>>> found at:
>>> https://hub.docker.com/r/apache/flink-kubernetes-operator
>>>
>>> The full release notes are available in Jira:
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351500
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>> Regards,
>>> Gyula & Yang
>>>
>>


[jira] [Created] (FLINK-27954) JobVertexFlameGraphHandler does not work on standby Dispatcher

2022-06-08 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27954:


 Summary: JobVertexFlameGraphHandler does not work on standby 
Dispatcher
 Key: FLINK-27954
 URL: https://issues.apache.org/jira/browse/FLINK-27954
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination, Runtime / REST
Affects Versions: 1.15.0
Reporter: Chesnay Schepler
 Fix For: 1.15.2, 1.16.0


The {{JobVertexFlameGraphHandler}} relies internally on the 
{{JobVertexThreadInfoTracker}} which calls 
{{ResourceManagerGateway#requestTaskExecutorThreadInfoGateway}} to get a 
gateway for requesting the thread info from the task executors. Since this 
gateway is not serializable it would categorically fail if called from a 
standby dispatcher.
Instead, this should follow the logic of the {{MetricFetcherImpl}}, which 
requests addresses and manually connects to the task executors.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-08 Thread Yuan Mei
I was thinking of the same thing about adding a dedicated channel for the
daily performance micro bench.

I was even thinking of asking volunteers to watch for the daily performance
micro bench.

Roman and I are drafting instructions on how to monitor performance
regression right now.

But I think this is a different topic and I will start a new thread to
discuss this once the details are finalized.

So, +1 for a dedicated channel for the daily performance micro bench notice.

Best,

Yuan

On Wed, Jun 8, 2022 at 11:38 AM Xingbo Huang  wrote:

> Thanks everyone for the great effort driving this.
>
> +1 for building a dedicated channel for CI builds.
>
> Best,
> Xingbo
>
> Martijn Visser  于2022年6月8日周三 01:32写道:
>
> > Hi Matthias,
> >
> > I've only thought about it, but I absolutely would +1 this. Making this
> > information available in dedicated channels will only improve things.
> >
> > Best regards,
> >
> > Martijn
> >
> > Op di 7 jun. 2022 om 18:12 schreef Matthias Pohl :
> >
> > > Thanks for driving the Slack channel efforts and setting everything up.
> > :-)
> > >
> > > I'm wondering whether we should extend the Slack space to also have a
> > > channel dedicated for CI builds to enable the release managers to
> monitor
> > > CI builds on master and the release branches through the Apache Flink
> > Slack
> > > workspace. It might help us in being more transparent with the release
> > > process. The same goes for the Flink performance tests [1].
> > >
> > > Are there plans around that already? I couldn't find anything related
> in
> > > [2]. Or is it worth opening another thread around that topic.
> > >
> > > Best,
> > > Matthias
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115511847
> > > [2] https://cwiki.apache.org/confluence/display/FLINK/Slack+Management
> > > On Tue, Jun 7, 2022 at 10:39 AM Jing Ge  wrote:
> > >
> > > > got it, thanks Martijn!
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > >
> > > > On Tue, Jun 7, 2022 at 10:38 AM Martijn Visser <
> > martijnvis...@apache.org
> > > >
> > > > wrote:
> > > >
> > > > > Hi Jing,
> > > > >
> > > > > Yes, since this Flink Community workspace is treated as one
> company.
> > So
> > > > > everyone you want to invite, should considered a 'coworker'.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn
> > > > >
> > > > > Op za 4 jun. 2022 om 20:02 schreef Jing Ge :
> > > > >
> > > > >> Hi Xingtong,
> > > > >>
> > > > >> While inviting new members, there are two options: "From another
> > > > company"
> > > > >> vs "Your coworker". In this case, we should always choose "Your
> > > > coworker"
> > > > >> to add new members to the Apache Flink workspace, right?
> > > > >>
> > > > >> Best regards,
> > > > >> Jing
> > > > >>
> > > > >> On Fri, Jun 3, 2022 at 1:10 PM Yuan Mei 
> > > wrote:
> > > > >>
> > > > >>> Thanks, Xintong and Jark the great effort driving this, and
> > everyone
> > > > for
> > > > >>> making this possible.
> > > > >>>
> > > > >>> I've also Twittered this announcement on our Apache Flink Twitter
> > > > >>> account.
> > > > >>>
> > > > >>> Best
> > > > >>>
> > > > >>> Yuan
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On Fri, Jun 3, 2022 at 12:54 AM Jing Ge 
> > wrote:
> > > > >>>
> > > >  Thanks everyone for your effort!
> > > > 
> > > >  Best regards,
> > > >  Jing
> > > > 
> > > >  On Thu, Jun 2, 2022 at 4:17 PM Martijn Visser <
> > > > martijnvis...@apache.org>
> > > >  wrote:
> > > > 
> > > > > Thanks everyone for joining! It's good to see so many have
> joined
> > > in
> > > > > such a short time already. I've just refreshed the link which
> you
> > > can
> > > > > always find on the project website [1]
> > > > >
> > > > > Best regards, Martijn
> > > > >
> > > > > [1] https://flink.apache.org/community.html#slack
> > > > >
> > > > > Op do 2 jun. 2022 om 11:42 schreef Jingsong Li <
> > > > jingsongl...@gmail.com
> > > > > >:
> > > > >
> > > > >> Thanks Xingtong, Jark, Martijn and Robert for making this
> > > possible!
> > > > >>
> > > > >> Best,
> > > > >> Jingsong
> > > > >>
> > > > >>
> > > > >> On Thu, Jun 2, 2022 at 5:32 PM Jark Wu 
> > wrote:
> > > > >>
> > > > >>> Thank Xingtong for making this possible!
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Jark Wu
> > > > >>>
> > > > >>> On Thu, 2 Jun 2022 at 15:31, Xintong Song <
> > tonysong...@gmail.com
> > > >
> > > > >>> wrote:
> > > > >>>
> > > > >>> > Hi everyone,
> > > > >>> >
> > > > >>> > I'm very happy to announce that the Apache Flink community
> > has
> > > > >>> created a
> > > > >>> > dedicated Slack workspace [1]. Welcome to join us on Slack.
> > > > >>> >
> > > > >>> > ## Join the Slack workspace
> > > > >>> >
> > > > >>> > You can join the Slack workspace by either of the following
> > two
> > > > >>> ways:
> > > > >>> > 1. 

Re: [DISCUSS] Releasing Flink 1.14.5

2022-06-08 Thread Xingbo Huang
Hi everyone,

I will start with the preparation for the release.

Best,
Xingbo

Jing Ge  于2022年5月31日周二 01:20写道:

> Hi Xingbo,
>
> +1
> Thanks for driving this.
>
> Best regards,
> Jing
>
> On Mon, May 30, 2022 at 4:51 PM Martijn Visser 
> wrote:
>
> > Hi Xingbo,
> >
> > +1 to release Flink 1.14.5.
> >
> > Best regards,
> >
> > Martijn
> >
> > Op do 26 mei 2022 om 10:19 schreef Dian Fu :
> >
> > > Hi Xingbo,
> > >
> > > Thanks for driving this release. +1 for 1.14.5 as there are already
> > nearly
> > > 100 commits [1] since 1.14.4.
> > >
> > > Regards,
> > > Dian
> > >
> > > [1]
> > https://github.com/apache/flink/compare/release-1.14.4...release-1.14
> > >
> > > On Tue, May 24, 2022 at 2:23 PM Xingbo Huang 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to start discussing releasing Flink 1.14.5.
> > > >
> > > > It has already been more than two months since we released 1.14.4.
> > There
> > > > are currently 62 tickets[1] already resolved for 1.14.5, some of them
> > > quite
> > > > severe.
> > > >
> > > > Currently, there are no issues marked as critical or blocker for
> > 1.14.5.
> > > > Please let me know if there are any issues you'd like to be included
> in
> > > > this release but still not merged.
> > > >
> > > > I would like to volunteer as a release manager for 1.14.5, and start
> > the
> > > > release process once all the issues are merged.
> > > >
> > > > Best,
> > > > Xingbo
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20fixVersion%20%3D%201.14.5&startIndex=50
> > > >
> > >
> >
>


[jira] [Created] (FLINK-27955) Fix the problem of windows installation failure in PyFlink

2022-06-08 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-27955:


 Summary: Fix the problem of windows installation failure in PyFlink
 Key: FLINK-27955
 URL: https://issues.apache.org/jira/browse/FLINK-27955
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.15.0
Reporter: Huang Xingbo
Assignee: Huang Xingbo
 Fix For: 1.15.1


Because pemja doesn't support Windows, installation fails on Windows in 
release-1.15. We need to fix it ASAP.





[jira] [Created] (FLINK-27956) Kubernetes Operator Deployment strategy type should be Recreate

2022-06-08 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-27956:
--

 Summary: Kubernetes Operator Deployment strategy type should be 
Recreate
 Key: FLINK-27956
 URL: https://issues.apache.org/jira/browse/FLINK-27956
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Gyula Fora


We should change the Deployment strategy.type from the default (RollingUpdate) 
to Recreate to avoid potential problems when a new operator pod is deployed 
during an upgrade.

[https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy]

 

Only one operator pod is supposed to run at any given time to avoid any 
errors/inconsistencies, and without HA/leader election, this setting is 
necessary.
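As a sketch, the fix amounts to pinning the strategy type in the operator's Deployment manifest; the resource name, labels, and container spec below are illustrative, not the actual manifest from the repo:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-kubernetes-operator
spec:
  replicas: 1
  strategy:
    # Default is RollingUpdate, which can briefly run the old and new
    # operator pods side by side; Recreate terminates the old pod first.
    type: Recreate
  selector:
    matchLabels:
      app: flink-kubernetes-operator
  template:
    metadata:
      labels:
        app: flink-kubernetes-operator
    spec:
      containers:
        - name: operator
          image: apache/flink-kubernetes-operator:1.0.0
```

With `Recreate`, Kubernetes guarantees at most one operator pod during upgrades, which matches the single-instance assumption stated above.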





Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.0.0 released

2022-06-08 Thread Nicholas Jiang
Congrats! Thanks Yang for driving the 1.0.0 release, and thanks to all 
contributors.

Best,
Nicholas Jiang

On 2022/06/08 10:54:37 Lijie Wang wrote:
> Congrats! Thanks Yang for driving the release, and thanks to all
> contributors!
> 
> Best,
> Lijie
> 
> John Gerassimou  于2022年6月6日周一 22:38写道:
> 
> > Thank you for all your efforts!
> >
> > Thanks
> > John
> >
> > On Sun, Jun 5, 2022 at 10:33 PM Aitozi  wrote:
> >
> >> Thanks Yang and Nice to see it happen.
> >>
> >> Best,
> >> Aitozi.
> >>
> >> Yang Wang  于2022年6月5日周日 16:14写道:
> >>
> >>> The Apache Flink community is very happy to announce the release of
> >>> Apache Flink Kubernetes Operator 1.0.0.
> >>>
> >>> The Flink Kubernetes Operator allows users to manage their Apache Flink
> >>> applications and their lifecycle through native k8s tooling like kubectl.
> >>> This is the first production ready release and brings numerous
> >>> improvements and new features to almost every aspect of the operator.
> >>>
> >>> Please check out the release blog post for an overview of the release:
> >>>
> >>> https://flink.apache.org/news/2022/06/05/release-kubernetes-operator-1.0.0.html
> >>>
> >>> The release is available for download at:
> >>> https://flink.apache.org/downloads.html
> >>>
> >>> Maven artifacts for Flink Kubernetes Operator can be found at:
> >>>
> >>> https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator
> >>>
> >>> Official Docker image for Flink Kubernetes Operator applications can be
> >>> found at:
> >>> https://hub.docker.com/r/apache/flink-kubernetes-operator
> >>>
> >>> The full release notes are available in Jira:
> >>>
> >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351500
> >>>
> >>> We would like to thank all contributors of the Apache Flink community
> >>> who made this release possible!
> >>>
> >>> Regards,
> >>> Gyula & Yang
> >>>
> >>
> 


[jira] [Created] (FLINK-27957) Extract AppendOnlyFileStore out of KeyValueFileStore

2022-06-08 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-27957:
---

 Summary: Extract AppendOnlyFileStore out of KeyValueFileStore
 Key: FLINK-27957
 URL: https://issues.apache.org/jira/browse/FLINK-27957
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Affects Versions: table-store-0.2.0
Reporter: Caizhi Weng


Currently the {{FileStore}} implementations for append-only records and for 
key-values are mixed in one {{FileStoreImpl}} class. This makes the code base 
messy and also introduces bugs (for example, {{AppendOnlyFileStore}} should 
rely on a special reader implementation but does not, causing failures when 
using the Avro format).

We need to extract {{AppendOnlyFileStore}} out of {{KeyValueFileStore}}.





Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-08 Thread Alexander Fedulov
Hi Xianxun,

Thanks for bringing it up. I do believe it would be useful to have such a
CDC data generator, but I see the
efforts to provide one as a bit orthogonal to the DataGeneratorSource proposed
in the FLIP. FLIP-238 focuses
on the DataStream API and I could see integration into the Table/SQL
ecosystem as the next step that I would
prefer to keep separate (see KafkaDynamicSource reusing KafkaSource
under the hood [1]).

[1]
https://github.com/apache/flink/blob/3adca15859b36c61116fc56fabc24e9647f0ec5f/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/table/KafkaDynamicSource.java#L223

Best,
Alexander Fedulov




On Wed, Jun 8, 2022 at 11:01 AM Xianxun Ye  wrote:

> Hey Alexander,
>
> Making datagen source connector easier to use is really helpful during
> doing some PoC/Demo.
> And I thought about is it possible to produce a changelog stream by
> datagen source, so a new flink developer can practice flink sql with cdc
> data using Flink SQL Client CLI.
> In the flink-examples-table module, a ChangelogSocketExample class[1]
> describes how to ingest delete or insert data by 'nc' command. Can we
> support producing a changelog stream by the new datagen source?
>
> [1]
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-table/src/main/java/org/apache/flink/table/examples/java/connectors/ChangelogSocketExample.java#L79
>
> Best regards,
>
> Xianxun
>
> On 06/8/2022 08:10, Alexander Fedulov wrote:
>
I looked a bit further and it seems it should actually be easier than I
initially thought: SourceReader extends the CheckpointListener interface, and
with a custom implementation it should be possible to achieve similar
> results. A prototype that I have for the generator uses an
> IteratorSourceReader
> under the hood by default but we could consider adding the ability to
> supply something like a DataGeneratorSourceReaderFactory that would allow
> provisioning the DataGeneratorSource with customized implementations for
> cases like this.
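Since SourceReader implementations receive checkpoint-complete callbacks, the per-checkpoint batching discussed in this thread can be sketched with a small, Flink-free Java stand-in. All names below are hypothetical illustrations of the idea, not Flink APIs:

```java
import java.util.List;

// Hypothetical, Flink-free stand-in for a reader that emits one predefined
// batch of elements per checkpoint cycle and only advances to the next batch
// when it is notified that a checkpoint has completed.
class PerCheckpointReader {
    private final List<List<Integer>> elementsPerCheckpoint;
    private int cycle = 0;

    PerCheckpointReader(List<List<Integer>> elementsPerCheckpoint) {
        this.elementsPerCheckpoint = elementsPerCheckpoint;
    }

    // Returns the batch for the current cycle; an empty list means the
    // source is exhausted (i.e., it behaves like a bounded source).
    List<Integer> pollBatch() {
        if (cycle >= elementsPerCheckpoint.size()) {
            return List.of();
        }
        return elementsPerCheckpoint.get(cycle);
    }

    // Analogous to CheckpointListener#notifyCheckpointComplete(long):
    // move on to the next batch once the checkpoint has succeeded.
    void notifyCheckpointComplete(long checkpointId) {
        cycle++;
    }
}
```

In a real reader the same bookkeeping would live in the SourceReader's notifyCheckpointComplete callback, which is where the proposed reader factory could hook in custom behavior.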
>
> Best,
> Alexander Fedulov
>
On Wed, Jun 8, 2022 at 12:58 AM Alexander Fedulov wrote:
>
> Hi Steven,
>
> This is going to be tricky since in the new Source API the checkpointing
> aspects that you based your logic on are pushed further away from the
> low-level interfaces responsible for handling data and splits [1]. At the
> same time, the SourceCoordinatorProvider is hardwired into the internals
> of the framework, so I don't think it will be possible to provide a
> customized implementation for testing purposes.
>
> The only chance to tie data generation to checkpointing in the new Source
> API that I see at the moment is via the SplitEnumerator serializer (
> getEnumeratorCheckpointSerializer() method) [2]. In theory, it should be
> possible to share a variable visible both to the generator function and to
> the serializer and manipulate it whenever the serialize() method gets
> called upon a checkpoint request. That said, you still won't get
> notifications of successful checkpoints that you currently use (this info
> is only available to the SourceCoordinator).
>
> In general, regardless of the generator implementation itself, the new
> Source
> API does not seem to support the use case of verifying checkpoints
> contents in lockstep with produced data, at least I do not see an immediate
> solution for this. Can you think of a different way of checking the
> correctness of the Iceberg Sink implementation that does not rely on this
> approach?
>
> Best,
> Alexander Fedulov
>
> [1]
>
> https://github.com/apache/flink/blob/0f19c2472c54aac97e4067f5398731ab90036d1a/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L337
>
> [2]
>
> https://github.com/apache/flink/blob/e4b000818c15b5b781c4e5262ba83bfc9d65121a/flink-core/src/main/java/org/apache/flink/api/connector/source/Source.java#L97
>
> On Tue, Jun 7, 2022 at 6:03 PM Steven Wu  wrote:
>
> In Iceberg source, we have a data generator source that can control the
records per checkpoint cycle. Can we support something like this in the
> DataGeneratorSource?
>
>
>
> https://github.com/apache/iceberg/blob/master/flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java
public BoundedTestSource(List<List<T>> elementsPerCheckpoint, boolean
checkpointEnabled)
>
> Thanks,
> Steven
>
> On Tue, Jun 7, 2022 at 8:48 AM Alexander Fedulov 
>
> wrote:
>
> Hi everyone,
>
> I would like to open a discussion on FLIP-238: Introduce FLIP-27-based
>
> Data
>
> Generator Source [1]. During the discussion about deprecating the
> SourceFunction API [2] it became evident that an easy-to-use
> FLIP-27-compatible data generator source is needed so that the current
> SourceFunction-based data generator implementations could be phased out
>
> for
>
> both Flink demo/PoC applications and for the internal Flink tests. This
> FLIP proposes to introduce a generic DataGeneratorSource capable of
> producing events

Re: [DISCUSS] Releasing 1.15.1

2022-06-08 Thread Xingbo Huang
Hi David,

+1
Thank you for driving this.

Best,
Xingbo

Chesnay Schepler  wrote on Wed, Jun 8, 2022 at 18:41:

> +1
>
> Thank you for proposing this. I can take care of the PMC-side of things.
>
> On 08/06/2022 12:37, Jingsong Li wrote:
> > +1
> >
> > Thanks David for volunteering to manage the release.
> >
> > Best,
> > Jingsong
> >
> > On Wed, Jun 8, 2022 at 6:21 PM Jark Wu  wrote:
> >> Hi David, thank you for driving the release.
> >>
> >> +1 for the 1.15.1 release. The release-1.15 branch
> >> already contains many bug fixes and some SQL
> >> issues are quite critical.
> >>
> >> Btw, FLINK-27606 has been merged just now.
> >>
> >> Best,
> >> Jark
> >>
> >>
> >> On Wed, 8 Jun 2022 at 17:40, David Anderson 
> wrote:
> >>
> >>> I would like to start a discussion on releasing 1.15.1. Flink 1.15 was
> >>> released on the 5th of May [1] and so far 43 issues have been resolved,
> >>> including several user-facing issues with blocker and critical
> priorities
> >>> [2]. (The recent problem with FileSink rolling policies not working
> >>> properly in 1.15.0 got me thinking it might be time for a bug-fix
> release.)
> >>>
> >>> There currently remain 16 unresolved tickets with a fixVersion of
> 1.15.1
> >>> [3], five of which are about CI infrastructure and tests. There is
> only one
> >>> such ticket marked Critical:
> >>>
> >>> https://issues.apache.org/jira/browse/FLINK-27492 Flink table scala
> >>> example
> >>> does not including the scala-api jars
> >>>
> >>> I'm not convinced we should hold up a release for this issue, but on
> the
> >>> other hand, it seems that this issue can be resolved by making a
> decision
> >>> about how to handle the missing dependencies. @Timo Walther
> >>>  @yun_gao can you give an update?
> >>>
> >>> Two other open issues seem to have made significant progress (listed
> >>> below). Would it make sense to wait for either of these? Are there any
> >>> other open tickets we should consider waiting for?
> >>>
> >>> https://issues.apache.org/jira/browse/FLINK-27420 Suspended
> SlotManager
> >>> fail to reregister metrics when started again
> >>> https://issues.apache.org/jira/browse/FLINK-27606 CompileException
> when
> >>> using UDAF with merge() method
> >>>
> >>> I would volunteer to manage the release. Is there a PMC member who
> would
> >>> join me to help, as needed?
> >>>
> >>> Best,
> >>> David
> >>>
> >>> [1] https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> >>>
> >>> [2]
> >>>
> >>>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> >>>
> >>> [3]
> >>>
> >>>
> https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> >>>
>
>


Re: [DISCUSS] Releasing 1.15.1

2022-06-08 Thread Jing Ge
+1

Thanks David for driving it!

Best regards,
Jing


On Wed, Jun 8, 2022 at 2:32 PM Xingbo Huang  wrote:

> Hi David,
>
> +1
> Thank you for driving this.
>
> Best,
> Xingbo
>
> Chesnay Schepler  wrote on Wed, Jun 8, 2022 at 18:41:
>
> > +1
> >
> > Thank you for proposing this. I can take care of the PMC-side of things.
> >
> > On 08/06/2022 12:37, Jingsong Li wrote:
> > > +1
> > >
> > > Thanks David for volunteering to manage the release.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Wed, Jun 8, 2022 at 6:21 PM Jark Wu  wrote:
> > >> Hi David, thank you for driving the release.
> > >>
> > >> +1 for the 1.15.1 release. The release-1.15 branch
> > >> already contains many bug fixes and some SQL
> > >> issues are quite critical.
> > >>
> > >> Btw, FLINK-27606 has been merged just now.
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >>
> > >> On Wed, 8 Jun 2022 at 17:40, David Anderson 
> > wrote:
> > >>
> > >>> I would like to start a discussion on releasing 1.15.1. Flink 1.15
> was
> > >>> released on the 5th of May [1] and so far 43 issues have been
> resolved,
> > >>> including several user-facing issues with blocker and critical
> > priorities
> > >>> [2]. (The recent problem with FileSink rolling policies not working
> > >>> properly in 1.15.0 got me thinking it might be time for a bug-fix
> > release.)
> > >>>
> > >>> There currently remain 16 unresolved tickets with a fixVersion of
> > 1.15.1
> > >>> [3], five of which are about CI infrastructure and tests. There is
> > only one
> > >>> such ticket marked Critical:
> > >>>
> > >>> https://issues.apache.org/jira/browse/FLINK-27492 Flink table scala
> > >>> example
> > >>> does not including the scala-api jars
> > >>>
> > >>> I'm not convinced we should hold up a release for this issue, but on
> > the
> > >>> other hand, it seems that this issue can be resolved by making a
> > decision
> > >>> about how to handle the missing dependencies. @Timo Walther
> > >>>  @yun_gao can you give an update?
> > >>>
> > >>> Two other open issues seem to have made significant progress (listed
> > >>> below). Would it make sense to wait for either of these? Are there
> any
> > >>> other open tickets we should consider waiting for?
> > >>>
> > >>> https://issues.apache.org/jira/browse/FLINK-27420 Suspended
> > SlotManager
> > >>> fail to reregister metrics when started again
> > >>> https://issues.apache.org/jira/browse/FLINK-27606 CompileException
> > when
> > >>> using UDAF with merge() method
> > >>>
> > >>> I would volunteer to manage the release. Is there a PMC member who
> > would
> > >>> join me to help, as needed?
> > >>>
> > >>> Best,
> > >>> David
> > >>>
> > >>> [1] https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> > >>>
> > >>> [2]
> > >>>
> > >>>
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> > >>>
> > >>> [3]
> > >>>
> > >>>
> >
> https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> > >>>
> >
> >
>


Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-08 Thread Jark Wu
Hi Paul,

I'm fine with using JOBS. The only concern is that this may conflict with
displaying more detailed
information for query (e.g. query content, plan) in the future, e.g. SHOW
QUERIES EXTENDED in ksqldb[1].
This is not a big problem as we can introduce SHOW QUERIES in the future if
necessary.

> STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
`table.job.stop-with-drain`)
What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]?
It might be trivial and error-prone to set configuration before executing a
statement,
and the configuration will affect all statements after that.

> CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>",
and always use configuration "state.savepoints.dir" as the default
savepoint dir.
The concern with using "<savepoint_path>" is that here it should be the
savepoint dir, and savepoint_path is the returned value.

I'm fine with other changes.

Thanks,
Jark

[1]:
https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
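To make the STOP JOB mapping concrete: a SQL client could translate the statement into a call against Flink's REST API, which exposes a stop-with-savepoint endpoint and a cancel mode. The sketch below builds the request tuple only; treat the exact endpoint shapes and the helper name as assumptions for illustration:

```python
def stop_job_request(job_id, with_savepoint=False, with_drain=False,
                     savepoint_dir=None):
    """Build (method, path, json_body) for stopping a job via REST."""
    if with_savepoint or with_drain:
        body = {"drain": with_drain}
        if savepoint_dir is not None:
            # otherwise the cluster's state.savepoints.dir default applies
            body["targetDirectory"] = savepoint_dir
        return ("POST", f"/jobs/{job_id}/stop", body)
    # Plain cancellation without a savepoint.
    return ("PATCH", f"/jobs/{job_id}?mode=cancel", None)

method, path, body = stop_job_request("a1b2c3", with_savepoint=True,
                                      with_drain=True)
print(method, path, body)  # POST /jobs/a1b2c3/stop {'drain': True}
```

This also illustrates Jark's point: the savepoint/drain flags are per-statement request parameters, not session-wide configuration.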




On Wed, 8 Jun 2022 at 15:07, Paul Lam  wrote:

> Hi Jing,
>
> Thank you for your inputs!
>
> TBH, I haven’t considered the ETL scenario that you mentioned. I think
> they’re managed just like other jobs in terms of job lifecycles (please
> correct me if I’m wrong).
>
> WRT to the SQL statements about SQL lineages, I think it might be a little
> bit out of the scope of the FLIP, since it’s mainly about lifecycles. By
> the way, do we have these functionalities in Flink CLI or REST API already?
>
> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the deprecated FLIP docs, the
> community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I’m
> updating the FLIP according to the latest discussions.
>
> Best,
> Paul Lam
>
> On Jun 8, 2022, at 07:31, Jing Ge  wrote:
>
> Hi Paul,
>
> Sorry that I am a little bit too late to join this thread. Thanks for
> driving this and starting this informative discussion. The FLIP looks
> really interesting. It will help us a lot to manage Flink SQL jobs.
>
> Have you considered the ETL scenario with Flink SQL, where multiple SQLs
> build a DAG for many DAGs?
>
> 1)
> +1 for SHOW JOBS. I think sooner or later we will start to discuss how to
> support ETL jobs. Briefly speaking, the SQLs that are used to build the DAG are
> responsible for *producing* data as the result (cube, materialized view, etc.)
> for the future consumption by queries. The INSERT INTO SELECT FROM example
> in FLIP and CTAS are typical SQL in this case. I would prefer to call them
> Jobs instead of Queries.
>
> 2)
> Speaking of ETL DAG, we might want to see the lineage. Is it possible to
> support syntax like:
>
> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the given
> job_id
> SHOW JOBTREES // shows all DAGs
> SHOW ANCESTORS <job_id> // shows all parents of the given job_id
>
> 3)
> Could we also support Savepoint housekeeping syntax? We ran into this
> issue that a lot of savepoints have been created by customers (via their
> apps). It will take extra (hacking) effort to clean it.
>
> RELEASE SAVEPOINT ALL
>
> Best regards,
> Jing
>
> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser 
> wrote:
>
>> Hi Paul,
>>
>> I'm still doubting the keyword for the SQL applications. SHOW QUERIES
>> could
>> imply that this will actually show the query, but we're returning IDs of
>> the running application. At first I was also not very much in favour of
>> SHOW JOBS since I prefer calling it 'Flink applications' and not 'Flink
>> jobs', but the glossary [1] made me reconsider. I would +1 SHOW/STOP JOBS
>>
>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1]
>>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>
>> On Sat, Jun 4, 2022 at 10:38, Paul Lam  wrote:
>>
>> > Hi Godfrey,
>> >
>> > Sorry for the late reply, I was on vacation.
>> >
>> > It looks like we have a variety of preferences on the syntax, how about
>> we
>> > choose the most acceptable one?
>> >
>> > WRT keyword for SQL jobs, we use JOBS, thus the statements related to
>> jobs
>> > would be:
>> >
>> > - SHOW JOBS
>> > - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>> > `table.job.stop-with-drain`)
>> >
>> > WRT savepoint for SQL jobs, we use the `CREATE/DROP` pattern with `FOR
>> > JOB`:
>> >
>> > - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>> > - SHOW SAVEPOINTS FOR JOB <job_id> (show savepoints the current job
>> > manager remembers)
>> > - DROP SAVEPOINT <savepoint_path>
>> >
>> > cc @Jark @ShengKai @Martijn @Timo .
>> >
>> > Best,
>> > Paul Lam
>> >
>> >
>> > godfrey he  wrote on Mon, May 23, 2022 at 21:34:
>> >
>> >> Hi Paul,
>> >>
>> >> Thanks for the update.
>> >>
>> >> >'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>> >> (DataStream or SQL) or
>> >> clients (SQL client or CLI).
>> >>
>> >> Is a DataStream job a QUERY? I think not.
>> >> For a QUERY, the most important concept is the statement. But the
>> >> result does not contain this info.
>> >> If we need to contain all jobs in the clu

[jira] [Created] (FLINK-27958) Compare batch maxKey to reduce comparisons in SortMergeReader

2022-06-08 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-27958:


 Summary: Compare batch maxKey to reduce comparisons in 
SortMergeReader
 Key: FLINK-27958
 URL: https://issues.apache.org/jira/browse/FLINK-27958
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.2.0


In SortMergeReader, each sub reader is a batched reader.

When adding a new batch to the priority queue, we can look at the maximum key 
of the batch, and if its maximum key is smaller than the minimum key of other 
batches, then we can just output the whole batch.
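The idea can be sketched as a small k-way merge model (pure Python, not the Table Store implementation): batches whose max key does not exceed every other pending batch's min key are emitted wholesale, skipping per-record comparisons:

```python
import heapq

def merge_batches(batches):
    """batches: list of sorted lists of keys. Returns the merged output."""
    # Order pending batches by their minimum (first) key.
    heap = [(b[0], i) for i, b in enumerate(batches) if b]
    heapq.heapify(heap)
    out = []
    while heap:
        _, i = heapq.heappop(heap)
        batch = batches[i]
        # Fast path: this batch's max key is <= the min key of every other
        # pending batch, so the whole batch can be output at once.
        if not heap or batch[-1] <= heap[0][0]:
            out.extend(batch)
            batches[i] = []
            continue
        # Slow path: emit records one by one up to the next batch's min key.
        next_min = heap[0][0]
        j = 0
        while j < len(batch) and batch[j] <= next_min:
            out.append(batch[j])
            j += 1
        batches[i] = batch[j:]
        heapq.heappush(heap, (batches[i][0], i))
    return out

print(merge_batches([[1, 2, 3], [4, 5], [2, 6]]))  # [1, 2, 2, 3, 4, 5, 6]
```

Here [4, 5] is emitted via the fast path with a single maxKey comparison, which is the saving this ticket is after.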



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-08 Thread Jark Wu
Hi Jing,

Regarding JOBTREE (job lineage), I agree with Paul that this is out of the
scope
 of this FLIP and can be discussed in another FLIP.

Job lineage is a big topic that may involve many problems:
1) how to collect and report job entities, attributes, and lineages?
2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
3) how does Flink SQL CLI/Gateway know the lineage information and show
jobtree?
4) ...

Best,
Jark

On Wed, 8 Jun 2022 at 20:44, Jark Wu  wrote:

> Hi Paul,
>
> I'm fine with using JOBS. The only concern is that this may conflict with
> displaying more detailed
> information for query (e.g. query content, plan) in the future, e.g. SHOW
> QUERIES EXTENDED in ksqldb[1].
> This is not a big problem as we can introduce SHOW QUERIES in the future
> if necessary.
>
> > STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
> `table.job.stop-with-drain`)
> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]?
> It might be trivial and error-prone to set configuration before executing
> a statement,
> and the configuration will affect all statements after that.
>
> > CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>",
> and always use configuration "state.savepoints.dir" as the default
> savepoint dir.
> The concern with using "<savepoint_path>" is that here it should be the
> savepoint dir, and savepoint_path is the returned value.
>
> I'm fine with other changes.
>
> Thanks,
> Jark
>
> [1]:
> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>
>
>
>
> On Wed, 8 Jun 2022 at 15:07, Paul Lam  wrote:
>
>> Hi Jing,
>>
>> Thank you for your inputs!
>>
>> TBH, I haven’t considered the ETL scenario that you mentioned. I think
>> they’re managed just like other jobs in terms of job lifecycles (please
>> correct me if I’m wrong).
>>
>> WRT to the SQL statements about SQL lineages, I think it might be a
>> little bit out of the scope of the FLIP, since it’s mainly about
>> lifecycles. By the way, do we have these functionalities in Flink CLI or
>> REST API already?
>>
>> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the deprecated FLIP docs, the
>> community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I’m
>> updating the FLIP according to the latest discussions.
>>
>> Best,
>> Paul Lam
>>
>> On Jun 8, 2022, at 07:31, Jing Ge  wrote:
>>
>> Hi Paul,
>>
>> Sorry that I am a little bit too late to join this thread. Thanks for
>> driving this and starting this informative discussion. The FLIP looks
>> really interesting. It will help us a lot to manage Flink SQL jobs.
>>
>> Have you considered the ETL scenario with Flink SQL, where multiple SQLs
>> build a DAG for many DAGs?
>>
>> 1)
>> +1 for SHOW JOBS. I think sooner or later we will start to discuss how to
>> support ETL jobs. Briefly speaking, the SQLs that are used to build the DAG are
>> responsible for *producing* data as the result (cube, materialized view, etc.)
>> for the future consumption by queries. The INSERT INTO SELECT FROM example
>> in FLIP and CTAS are typical SQL in this case. I would prefer to call them
>> Jobs instead of Queries.
>>
>> 2)
>> Speaking of ETL DAG, we might want to see the lineage. Is it possible to
>> support syntax like:
>>
>> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the given
>> job_id
>> SHOW JOBTREES // shows all DAGs
>> SHOW ANCESTORS <job_id> // shows all parents of the given job_id
>>
>> 3)
>> Could we also support Savepoint housekeeping syntax? We ran into this
>> issue that a lot of savepoints have been created by customers (via their
>> apps). It will take extra (hacking) effort to clean it.
>>
>> RELEASE SAVEPOINT ALL
>>
>> Best regards,
>> Jing
>>
>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser 
>> wrote:
>>
>>> Hi Paul,
>>>
>>> I'm still doubting the keyword for the SQL applications. SHOW QUERIES
>>> could
>>> imply that this will actually show the query, but we're returning IDs of
>>> the running application. At first I was also not very much in favour of
>>> SHOW JOBS since I prefer calling it 'Flink applications' and not 'Flink
>>> jobs', but the glossary [1] made me reconsider. I would +1 SHOW/STOP JOBS
>>>
>>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> [1]
>>>
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>>
>>> On Sat, Jun 4, 2022 at 10:38, Paul Lam  wrote:
>>>
>>> > Hi Godfrey,
>>> >
>>> > Sorry for the late reply, I was on vacation.
>>> >
>>> > It looks like we have a variety of preferences on the syntax, how
>>> about we
>>> > choose the most acceptable one?
>>> >
>>> > WRT keyword for SQL jobs, we use JOBS, thus the statements related to
>>> jobs
>>> > would be:
>>> >
>>> > - SHOW JOBS
>>> > - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>> > `table.job.stop-with-drain`)
>>> >
>>> > WRT savepoint for SQL jobs, we use the `CREATE/DROP` pattern with `FOR
>>> > JOB`:
>>> >
>>> > - CREATE SAVEPOINT  F

[jira] [Created] (FLINK-27959) Use ResolvedSchema in flink-avro instead of TableSchema

2022-06-08 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-27959:
---

 Summary: Use ResolvedSchema in flink-avro instead of TableSchema
 Key: FLINK-27959
 URL: https://issues.apache.org/jira/browse/FLINK-27959
 Project: Flink
  Issue Type: Technical Debt
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
SQL / API
Reporter: Sergey Nuyanzin


{{TableSchema}} is deprecated 
It is recommended to use {{ResolvedSchema}} and {{Schema}} in {{TableSchema}} 
javadoc



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.0.0 released

2022-06-08 Thread Yun Tang
Good news!
Thanks Yang for driving the 1.0.0 release, and thanks to everyone who 
contributed to making this release happen.

Best
Yun Tang

From: Nicholas Jiang 
Sent: Wednesday, June 8, 2022 19:40
To: dev@flink.apache.org 
Subject: Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.0.0 released

Congrats! Thanks Yang for driving the 1.0.0 release, and thanks for all the 
contributions.

Best,
Nicholas Jiang

On 2022/06/08 10:54:37 Lijie Wang wrote:
> Congrats! Thanks Yang for driving the release, and thanks to all
> contributors!
>
> Best,
> Lijie
>
> John Gerassimou  wrote on Mon, Jun 6, 2022 at 22:38:
>
> > Thank you for all your efforts!
> >
> > Thanks
> > John
> >
> > On Sun, Jun 5, 2022 at 10:33 PM Aitozi  wrote:
> >
> >> Thanks Yang and Nice to see it happen.
> >>
> >> Best,
> >> Aitozi.
> >>
> >> Yang Wang  wrote on Sun, Jun 5, 2022 at 16:14:
> >>
> >>> The Apache Flink community is very happy to announce the release of
> >>> Apache Flink Kubernetes Operator 1.0.0.
> >>>
> >>> The Flink Kubernetes Operator allows users to manage their Apache Flink
> >>> applications and their lifecycle through native k8s tooling like kubectl.
> >>> This is the first production ready release and brings numerous
> >>> improvements and new features to almost every aspect of the operator.
> >>>
> >>> Please check out the release blog post for an overview of the release:
> >>>
> >>> https://flink.apache.org/news/2022/06/05/release-kubernetes-operator-1.0.0.html
> >>>
> >>> The release is available for download at:
> >>> https://flink.apache.org/downloads.html
> >>>
> >>> Maven artifacts for Flink Kubernetes Operator can be found at:
> >>>
> >>> https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator
> >>>
> >>> Official Docker image for Flink Kubernetes Operator applications can be
> >>> found at:
> >>> https://hub.docker.com/r/apache/flink-kubernetes-operator
> >>>
> >>> The full release notes are available in Jira:
> >>>
> >>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351500
> >>>
> >>> We would like to thank all contributors of the Apache Flink community
> >>> who made this release possible!
> >>>
> >>> Regards,
> >>> Gyula & Yang
> >>>
> >>
>


[jira] [Created] (FLINK-27960) Make the apt-get updating optional

2022-06-08 Thread Aitozi (Jira)
Aitozi created FLINK-27960:
--

 Summary: Make the apt-get updating optional 
 Key: FLINK-27960
 URL: https://issues.apache.org/jira/browse/FLINK-27960
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Aitozi
 Attachments: image-2022-06-08-22-06-44-586.png

I noticed that the apt-get update step costs a lot of time and is not necessary 
during development, so I think it would be convenient to add an option to the 
image build that skips such unnecessary stages.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27961) The EventUtils generate event name should take the resource's uid into account

2022-06-08 Thread Aitozi (Jira)
Aitozi created FLINK-27961:
--

 Summary: The EventUtils generate event name should take the 
resource's uid into account
 Key: FLINK-27961
 URL: https://issues.apache.org/jira/browse/FLINK-27961
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Aitozi


Currently the event name does not include the uid of the target resource. If a 
resource is recreated, it will be associated with the former object's events. 
This is not expected and will be confusing, e.g. seeing empty or stale events 
when describing the resource.
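A minimal sketch of the fix direction (names and the hashing scheme are illustrative assumptions, not the operator's actual EventUtils API): derive the event name from the resource's uid as well, so events of a deleted-and-recreated resource no longer collide:

```python
import hashlib

def event_name(reason, message, resource_name, resource_uid):
    """Deterministic event name that changes when the resource is
    recreated (new uid), so old events are not re-associated."""
    digest = hashlib.md5(
        f"{reason}/{message}/{resource_name}/{resource_uid}".encode()
    ).hexdigest()
    return f"{resource_name}.{digest}"

old = event_name("Submit", "Job submitted", "my-deployment", "uid-111")
new = event_name("Submit", "Job submitted", "my-deployment", "uid-222")
print(old != new)  # True: the recreated resource gets distinct event names
```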



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-224: Blacklist Mechanism

2022-06-08 Thread Zhu Zhu
Thanks Chesnay for the feedback in the vote thread.
(https://lists.apache.org/thread/opc7jg3rpxnwotkb0fcn4wnm02m4397o)

I'd like to continue the discussion in this thread so that the
discussions can be better tracked.

Regarding your questions, here are my thoughts:
1. The current locality mechanism does not work well to avoid
deploying tasks to slow nodes because it cannot proactively reject or
release slots from the slow nodes. And it cannot help at the resource
manager side to avoid allocating free slots or launching new
TaskManagers on slow nodes.
2. Dynamically downscaling or upscaling a batch job is usually
unacceptable because it means re-running the whole stage.
3. Extending the requirement declaration to have a notion of
"undesirable nodes" is an option. And it is actually how we started.
Considering implementation details, we found we need that
- a tracker to collect all the undesirable nodes (i.e. detected slow nodes)
- to filter out slots from undesirable nodes when allocating slots
from the SlotPool
- ask the ResourceManager for slots that are not on the undesirable
nodes. The ResourceManager may further need to ask for new
TaskManagers that are not on the undesirable nodes.
Then, with all this functionality, we found that we almost had a
blocklist mechanism. As blocklist mechanism is a common concept and is
possible to benefit users, we took this chance to propose the
blocklist mechanism.
4. A cluster-wide shared blocklist is not a must for speculative
execution at the moment. It is mainly part of a standalone blocklist
feature to host user specified block items. To avoid sharing job
specific blocked items between jobs, one way is to add a nullable
JobID tag for the blocked item.
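The pieces discussed above can be sketched as a toy model (pure Python, names illustrative and not the FLIP's actual interfaces): a tracker with a per-item endTimestamp, timeout-based removal, slot filtering for allocation, and the nullable JobID tag that scopes an item to one job:

```python
class BlocklistTracker:
    """Toy blocklist: items expire at end_timestamp and may optionally
    be scoped to a single job via a nullable job id."""
    def __init__(self):
        self._items = {}  # node_id -> (end_timestamp, job_id or None)

    def add_or_update(self, node_id, end_timestamp, job_id=None):
        self._items[node_id] = (end_timestamp, job_id)

    def remove_timeout_items(self, now):
        self._items = {n: v for n, v in self._items.items() if v[0] > now}

    def is_blocked(self, node_id, job_id, now):
        item = self._items.get(node_id)
        if item is None or item[0] <= now:
            return False
        scoped_job = item[1]
        return scoped_job is None or scoped_job == job_id

def filter_free_slots(slots, tracker, job_id, now):
    """Hide free slots on blocked nodes from allocation (MARK_BLOCKED
    mode: already-allocated slots are untouched)."""
    return [s for s in slots if not tracker.is_blocked(s["node"], job_id, now)]

tracker = BlocklistTracker()
tracker.add_or_update("node-a", end_timestamp=100.0)                  # cluster-wide
tracker.add_or_update("node-b", end_timestamp=100.0, job_id="job-1")  # job-scoped
slots = [{"slot": 1, "node": "node-a"}, {"slot": 2, "node": "node-b"},
         {"slot": 3, "node": "node-c"}]
print(filter_free_slots(slots, tracker, "job-2", now=50.0))
# node-a is blocked for everyone, node-b only for job-1 -> slots 2 and 3 remain
```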

Thanks,
Zhu

Zhu Zhu  wrote on Tue, Jun 7, 2022 at 10:58:
>
> Hi Chesnay,
>
> Would you please take a look at the FLIP and discussion to see if all
> your concerns have been addressed?
>
> Thanks,
> Zhu
>
> Zhu Zhu  wrote on Sat, May 28, 2022 at 13:26:
> >
> > Regarding the concern of the SlotManager, my two cents:
> > 1. it is necessary for the SlotManager to host blocked slots, in 2 cases:
> >   a. In standalone mode, a taskmanager may be temporarily added to
> > the blocklist. We do not want the TM to get disconnected and shut down.
> > So we need to keep its connection to RM and keep hosting its slots.
> >   b. When we want to avoid allocating slots to a slow nodes but do not
> > want to kill current running tasks on the nodes (MARK_BLOCKED mode).
> >
> > There is possibly a way to keep the connection of a blocked task manager
> > while hiding its slots from the SlotManager, but I feel it may be even more
> > complicated.
> >
> > 2. It will not complicate the SlotManager too much. The SlotManager will
> > be offered a BlockedTaskManagerChecker when created, and just need
> > to use it to filter out blocked slots on slot request. Therefore I think the
> > complication is acceptable.
> >
> > Thanks,
> > Zhu
> >
> > Lijie Wang  wrote on Wed, May 25, 2022 at 15:26:
> > >
> > > Hi everyone,
> > >
> > > I've updated the FLIP according to Chesnay's feedback, changes as follows:
> > > 1. Change the GET result to a map.
> > > 2. Only left *endTimestamp* in ADD operation, and change the rest (from
> > > POST) to PUT
> > > 3. Introduce a new slot pool implementation(BlocklistSlotPool) to
> > > encapsulate blocklist related functions.
> > > 4. Remove *mainThread* from BlocklistTracker, instead provide a
> > > *removeTimeoutItems* method to be called by outside components.
> > >
> > > Best,
> > > Lijie
> > >
> > > Lijie Wang  wrote on Mon, May 23, 2022 at 22:51:
> > >
> > > > Hi Chesnay,
> > > >
> > > > Thanks for feedback.
> > > >
> > > > 1. Regarding the TM/Node id. Do you mean special characters may appear 
> > > > in
> > > > the rest URL?  Actually, I don't think so. The task manager id in REST 
> > > > API
> > > > should be the *ResourceID* of taskmanager in Flink, there should be no
> > > > special characters, and some existing REST APIs are already using it, 
> > > > e.g.
> > > > GET: http://{jm_rest_address:port}/taskmanagers/<taskmanagerid>. The 
> > > > node
> > > > id should be an IP of a machine or node name in Yarn/Kubernetes, I 
> > > > think it
> > > > should also have no special characters.
> > > > 2. Regarding the GET query responses. I agree with you, it makes sense 
> > > > to
> > > > change the GET result to a map.
> > > >
> > > > 3. Regarding the endTimestamp.  I also agree with you, endTimestamp can
> > > > cover everything, and the endTimestamp is a unix timestamp, there 
> > > > should be
> > > > no timezone issues. But I think PUT and DELETE are enough, no PATCH.  
> > > > The
> > > > add rest api is add or update, PUT can cover this semantics.
> > > >
> > > > 4. Regarding the slot pool/manager. I don't think the current slotpool
> > > > and slotmanager are able to support the MARK_BLOCKED (slots that are
> > > > already allocated will not be affected) action. The reasons are as
> > > > follows:
> > > >
> > > > a) for slot pool, with the MARK_BLOCKED action, when a slot state 
> > >

Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-08 Thread Jing Ge
Hi Paul, Hi Jark,

Re JOBTREE, agree that it is out of the scope of this FLIP.

Re `RELEASE SAVEPOINT ALL`: if the community prefers `DROP`, then `DROP
SAVEPOINT ALL` for housekeeping. WDYT?

Best regards,
Jing


On Wed, Jun 8, 2022 at 2:54 PM Jark Wu  wrote:

> Hi Jing,
>
> Regarding JOBTREE (job lineage), I agree with Paul that this is out of the
> scope
>  of this FLIP and can be discussed in another FLIP.
>
> Job lineage is a big topic that may involve many problems:
> 1) how to collect and report job entities, attributes, and lineages?
> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
> 3) how does Flink SQL CLI/Gateway know the lineage information and show
> jobtree?
> 4) ...
>
> Best,
> Jark
>
> On Wed, 8 Jun 2022 at 20:44, Jark Wu  wrote:
>
>> Hi Paul,
>>
>> I'm fine with using JOBS. The only concern is that this may conflict with
>> displaying more detailed
>> information for query (e.g. query content, plan) in the future, e.g. SHOW
>> QUERIES EXTENDED in ksqldb[1].
>> This is not a big problem as we can introduce SHOW QUERIES in the future
>> if necessary.
>>
>> > STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>> `table.job.stop-with-drain`)
>> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]?
>> It might be trivial and error-prone to set configuration before executing
>> a statement,
>> and the configuration will affect all statements after that.
>>
>> > CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>",
>> and always use configuration "state.savepoints.dir" as the default
>> savepoint dir.
>> The concern with using "<savepoint_path>" is that here it should be the
>> savepoint dir,
>> and savepoint_path is the returned value.
>>
>> I'm fine with other changes.
>>
>> Thanks,
>> Jark
>>
>> [1]:
>> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>>
>>
>>
>>
>> On Wed, 8 Jun 2022 at 15:07, Paul Lam  wrote:
>>
>>> Hi Jing,
>>>
>>> Thank you for your inputs!
>>>
>>> TBH, I haven’t considered the ETL scenario that you mentioned. I think
>>> they’re managed just like other jobs in terms of job lifecycles (please
>>> correct me if I’m wrong).
>>>
>>> WRT to the SQL statements about SQL lineages, I think it might be a
>>> little bit out of the scope of the FLIP, since it’s mainly about
>>> lifecycles. By the way, do we have these functionalities in Flink CLI or
>>> REST API already?
>>>
>>> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the deprecated FLIP docs, the
>>> community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I’m
>>> updating the FLIP according to the latest discussions.
>>>
>>> Best,
>>> Paul Lam
>>>
>>> On Jun 8, 2022, at 07:31, Jing Ge  wrote:
>>>
>>> Hi Paul,
>>>
>>> Sorry that I am a little bit too late to join this thread. Thanks for
>>> driving this and starting this informative discussion. The FLIP looks
>>> really interesting. It will help us a lot to manage Flink SQL jobs.
>>>
>>> Have you considered the ETL scenario with Flink SQL, where multiple SQLs
>>> build a DAG for many DAGs?
>>>
>>> 1)
>>> +1 for SHOW JOBS. I think sooner or later we will start to discuss how
>>> to support ETL jobs. Briefly speaking, the SQLs that are used to build the DAG
>>> are responsible for *producing* data as the result (cube, materialized view, etc.)
>>> for the future consumption by queries. The INSERT INTO SELECT FROM example
>>> in FLIP and CTAS are typical SQL in this case. I would prefer to call them
>>> Jobs instead of Queries.
>>>
>>> 2)
>>> Speaking of ETL DAG, we might want to see the lineage. Is it possible to
>>> support syntax like:
>>>
>>> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
>>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the
>>> given job_id
>>> SHOW JOBTREES // shows all DAGs
>>> SHOW ANCESTORS <job_id> // shows all parents of the given job_id
>>>
>>> 3)
>>> Could we also support Savepoint housekeeping syntax? We ran into this
>>> issue that a lot of savepoints have been created by customers (via their
>>> apps). It will take extra (hacking) effort to clean it.
>>>
>>> RELEASE SAVEPOINT ALL
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser 
>>> wrote:
>>>
 Hi Paul,

 I'm still doubting the keyword for the SQL applications. SHOW QUERIES
 could
 imply that this will actually show the query, but we're returning IDs of
 the running application. At first I was also not very much in favour of
 SHOW JOBS since I prefer calling it 'Flink applications' and not 'Flink
 jobs', but the glossary [1] made me reconsider. I would +1 SHOW/STOP
 JOBS

 Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.

 Best regards,

 Martijn

 [1]

 https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary

 Op za 4 jun. 2022 om 10:38 schreef Paul Lam :

 > Hi Godfrey,
 >
 > Sorry for the late reply, I was on vacation.
 >
 > It looks like we have a variety of preferences on the syntax, 

[jira] [Created] (FLINK-27962) KafkaSourceReader fails to commit consumer offsets for checkpoints

2022-06-08 Thread Dmytro (Jira)
Dmytro created FLINK-27962:
--

 Summary: KafkaSourceReader fails to commit consumer offsets for 
checkpoints
 Key: FLINK-27962
 URL: https://issues.apache.org/jira/browse/FLINK-27962
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.14.4, 1.15.0
Reporter: Dmytro


The KafkaSourceReader works well for many hours, then fails and re-connects 
successfully, then continues to work for some time. After the first three failures 
it hangs on "Offset commit failed" and never connects again. Restarting the 
Flink job does help, and it works until the next "3 times fail".

*Failed to commit consumer offsets for checkpoint:*
{code:java}
Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
coordinator is not available.
2022-06-06 14:19:52,297 WARN  
org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed to 
commit consumer offsets for checkpoint 464521
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit 
failed with a retriable exception. You should retry committing the latest 
consumed offsets.

Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
coordinator is not available.
2022-06-06 14:20:02,297 WARN  
org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed to 
commit consumer offsets for checkpoint 464522
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit 
failed with a retriable exception. You should retry committing the latest 
consumed offsets.

Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
coordinator is not available.
2022-06-06 14:20:02,297 WARN  
org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed to 
commit consumer offsets for checkpoint 464523
org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit 
failed with a retriable exception. You should retry committing the latest 
consumed offsets

... fails permanently until the job is restarted
 {code}
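Until the root cause is fixed, one common mitigation for retriable commit failures is a bounded retry with backoff before giving up (and restarting the consumer, as the reporter does, only as a last resort). The sketch below models that pattern with a stub exception class; it is purely illustrative and does not use the real Kafka client API.

```java
import java.util.function.IntPredicate;

public class OffsetCommitRetry {

    // Stand-in for org.apache.kafka.clients.consumer.RetriableCommitFailedException.
    static class RetriableCommitFailedException extends RuntimeException {}

    /**
     * Tries a commit up to maxAttempts times with a fixed backoff between tries.
     * Returns the attempt number that succeeded, or -1 if every attempt failed,
     * at which point the caller has to escalate (e.g. restart the consumer).
     */
    static int commitWithRetry(IntPredicate commit, int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (commit.test(attempt)) {
                    return attempt;
                }
                throw new RetriableCommitFailedException();
            } catch (RetriableCommitFailedException e) {
                if (attempt == maxAttempts) {
                    return -1; // exhausted all attempts
                }
                try {
                    Thread.sleep(backoffMs); // fixed backoff between attempts
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated commit that only succeeds on the third attempt.
        System.out.println(commitWithRetry(attempt -> attempt == 3, 5, 1L));
    }
}
```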

*Consumer Config:*
{code:java}
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = none
bootstrap.servers = [test.host.net:9093]
check.crcs = true
client.dns.lookup = use_all_dns_ips
client.id = test-client-id
client.rack =
connections.max.idle.ms = 18
default.api.timeout.ms = 6
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = test-group-id
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
internal.throw.on.fetch.stable.offset.unsupported = false
isolation.level = read_uncommitted
key.deserializer = class 
org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 30
max.poll.records = 500
metadata.max.age.ms = 18
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 3
partition.assignment.strategy = [class 
org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 6
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = [hidden]
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 6
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = class 
com.test.kafka.security.AzureAuthenticateCallbackHandler
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = OAUTHBEARER
security.protocol = SASL_SSL
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 3
socket.connection.setup.timeout.max.ms = 3
socket.connection.setup.timeout.ms = 1
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2]
ssl.endpoint.identification.algorithm = https
ssl.engine.factory.class = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.certificate.chain = null
ssl.keystore.key = null
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLSv1.2
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.certificates = null
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class 
org.apache.kafka.common.serialization.ByteArrayDeserializer {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27963) FlinkRuntimeException in KafkaSink causes a Flink job to hang

2022-06-08 Thread Dmytro (Jira)
Dmytro created FLINK-27963:
--

 Summary: FlinkRuntimeException in KafkaSink causes a Flink job to 
hang
 Key: FLINK-27963
 URL: https://issues.apache.org/jira/browse/FLINK-27963
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.14.4, 1.15.0
Reporter: Dmytro


If a FlinkRuntimeException occurs in the 
[KafkaSink|https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#kafka-sink],
then the Flink job tries to re-send the failed data again and gets into an 
endless loop of "exception -> send again".

*Code sample which throws the FlinkRuntimeException:*

{code:java}
int numberOfRows = 1;
int rowsPerSecond = 1;

DataStream stream = environment.addSource(
new DataGeneratorSource<>(
RandomGenerator.stringGenerator(105), // 
max.message.bytes=1048588
rowsPerSecond,
(long) numberOfRows),
TypeInformation.of(String.class))
.setParallelism(1)
.name("string-generator");


KafkaSinkBuilder builder = KafkaSink.builder()
.setBootstrapServers("localhost:9092")
.setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
.setRecordSerializer(

KafkaRecordSerializationSchema.builder().setTopic("test.output")
.setValueSerializationSchema(new 
SimpleStringSchema())
.build());


KafkaSink sink = builder.build();

stream.sinkTo(sink).setParallelism(1).name("output-producer"); {code}
*Exception Stack Trace:*
{code:java}
2022-06-02/14:01:45.066/PDT [flink-akka.actor.default-dispatcher-4] INFO 
output-producer: Writer -> output-producer: Committer (1/1) 
(a66beca5a05c1c27691f7b94ca6ac025) switched from RUNNING to FAILED on 
271b1b90-7d6b-4a34-8116-3de6faa8a9bf @ 127.0.0.1 (dataPort=-1). 
org.apache.flink.util.FlinkRuntimeException: Failed to send data to Kafka null 
with FlinkKafkaInternalProducer{transactionalId='null', inTransaction=false, 
closed=false} at 
org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.throwException(KafkaWriter.java:440)
 ~[flink-connector-kafka-1.15.0.jar:1.15.0] at 
org.apache.flink.connector.kafka.sink.KafkaWriter$WriterCallback.lambda$onCompletion$0(KafkaWriter.java:421)
 ~[flink-connector-kafka-1.15.0.jar:1.15.0] at 
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
 ~[flink-streaming-java-1.15.0.jar:1.15.0] at 
org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) 
~[flink-streaming-java-1.15.0.jar:1.15.0] at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsNonBlocking(MailboxProcessor.java:353)
 ~[flink-streaming-java-1.15.0.jar:1.15.0] at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
 ~[flink-streaming-java-1.15.0.jar:1.15.0] at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:201)
 ~[flink-streaming-java-1.15.0.jar:1.15.0] at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
 ~[flink-streaming-java-1.15.0.jar:1.15.0] at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753) 
~[flink-streaming-java-1.15.0.jar:1.15.0] at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
 ~[flink-runtime-1.15.0.jar:1.15.0] at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) 
~[flink-runtime-1.15.0.jar:1.15.0] at 
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741) 
~[flink-runtime-1.15.0.jar:1.15.0] at 
org.apache.flink.runtime.taskmanager.Task.run(Task.java:563) 
~[flink-runtime-1.15.0.jar:1.15.0] at java.lang.Thread.run(Thread.java:748) 
~[?:1.8.0_292] Caused by: 
org.apache.kafka.common.errors.RecordTooLargeException: The message is 1050088 
bytes when serialized which is larger than 1048576, which is the value of the 
max.request.size configuration. {code}
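The loop happens because the oversized record can never succeed on retry. One workaround (an illustration only, not a KafkaSink feature) is to filter out records whose serialized size exceeds the producer's max.request.size before they reach the sink. The 1048576-byte limit below is taken from the stack trace; the headroom margin is an assumed value.

```java
import java.nio.charset.StandardCharsets;

public class RecordSizeGuard {

    // max.request.size value from the exception message above.
    static final int MAX_REQUEST_SIZE = 1_048_576;
    // Rough allowance for key, headers, and record envelope (assumed value).
    static final int ENVELOPE_HEADROOM = 1_024;

    /** Returns true if the serialized value should fit under the producer limit. */
    static boolean fits(String value) {
        int serialized = value.getBytes(StandardCharsets.UTF_8).length;
        return serialized + ENVELOPE_HEADROOM <= MAX_REQUEST_SIZE;
    }

    public static void main(String[] args) {
        // Build a value matching the 1050088-byte message from the report.
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 1_050_088; i++) {
            big.append('x');
        }
        System.out.println(fits("small record")); // true
        System.out.println(fits(big.toString())); // false
    }
}
```

A filter like `stream.filter(RecordSizeGuard::fits).sinkTo(sink)` would drop (or route to a dead-letter topic) records that the producer is guaranteed to reject.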



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-08 Thread Xintong Song
Having CI & benchmark reports sent to dedicated Slack channels totally
makes sense. +1 to this.


Best,

Xintong



On Wed, Jun 8, 2022 at 6:57 PM Yuan Mei  wrote:

> I was thinking of the same thing about adding a dedicated channel for the
> daily performance micro bench.
>
> I was even thinking of asking volunteers to watch for the daily performance
> micro bench.
>
> Roman and I are drafting instructions on how to monitor performance
> regression right now.
>
> But I think this is a different topic and I will start a new thread to
> discuss this once the details are finalized.
>
> So, +1 for a dedicated channel for the daily performance micro bench
> notice.
>
> Best,
>
> Yuan
>
> On Wed, Jun 8, 2022 at 11:38 AM Xingbo Huang  wrote:
>
> > Thanks everyone for the great effort driving this.
> >
> > +1 for building a dedicated channel for CI builds.
> >
> > Best,
> > Xingbo
> >
> > Martijn Visser  于2022年6月8日周三 01:32写道:
> >
> > > Hi Matthias,
> > >
> > > I've only thought about it, but I absolutely would +1 this. Making this
> > > information available in dedicated channels will only improve things.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > Op di 7 jun. 2022 om 18:12 schreef Matthias Pohl :
> > >
> > > > Thanks for driving the Slack channel efforts and setting everything
> up.
> > > :-)
> > > >
> > > > I'm wondering whether we should extend the Slack space to also have a
> > > > channel dedicated for CI builds to enable the release managers to
> > monitor
> > > > CI builds on master and the release branches through the Apache Flink
> > > Slack
> > > > workspace. It might help us in being more transparent with the
> release
> > > > process. The same goes for the Flink performance tests [1].
> > > >
> > > > Are there plans around that already? I couldn't find anything related
> > in
> > > > [2]. Or is it worth opening another thread around that topic.
> > > >
> > > > Best,
> > > > Matthias
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115511847
> > > > [2]
> https://cwiki.apache.org/confluence/display/FLINK/Slack+Management
> > > > On Tue, Jun 7, 2022 at 10:39 AM Jing Ge  wrote:
> > > >
> > > > > got it, thanks Martijn!
> > > > >
> > > > > Best regards,
> > > > > Jing
> > > > >
> > > > >
> > > > > On Tue, Jun 7, 2022 at 10:38 AM Martijn Visser <
> > > martijnvis...@apache.org
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Jing,
> > > > > >
> > > > > > Yes, since this Flink Community workspace is treated as one
> > company.
> > > So
> > > > > > everyone you want to invite, should considered a 'coworker'.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Martijn
> > > > > >
> > > > > > Op za 4 jun. 2022 om 20:02 schreef Jing Ge :
> > > > > >
> > > > > >> Hi Xingtong,
> > > > > >>
> > > > > >> While inviting new members, there are two options: "From another
> > > > > company"
> > > > > >> vs "Your coworker". In this case, we should always choose "Your
> > > > > coworker"
> > > > > >> to add new members to the Apache Flink workspace, right?
> > > > > >>
> > > > > >> Best regards,
> > > > > >> Jing
> > > > > >>
> > > > > >> On Fri, Jun 3, 2022 at 1:10 PM Yuan Mei  >
> > > > wrote:
> > > > > >>
> > > > > >>> Thanks, Xintong and Jark the great effort driving this, and
> > > everyone
> > > > > for
> > > > > >>> making this possible.
> > > > > >>>
> > > > > >>> I've also Twittered this announcement on our Apache Flink
> Twitter
> > > > > >>> account.
> > > > > >>>
> > > > > >>> Best
> > > > > >>>
> > > > > >>> Yuan
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Fri, Jun 3, 2022 at 12:54 AM Jing Ge 
> > > wrote:
> > > > > >>>
> > > > >  Thanks everyone for your effort!
> > > > > 
> > > > >  Best regards,
> > > > >  Jing
> > > > > 
> > > > >  On Thu, Jun 2, 2022 at 4:17 PM Martijn Visser <
> > > > > martijnvis...@apache.org>
> > > > >  wrote:
> > > > > 
> > > > > > Thanks everyone for joining! It's good to see so many have
> > joined
> > > > in
> > > > > > such a short time already. I've just refreshed the link which
> > you
> > > > can
> > > > > > always find on the project website [1]
> > > > > >
> > > > > > Best regards, Martijn
> > > > > >
> > > > > > [1] https://flink.apache.org/community.html#slack
> > > > > >
> > > > > > Op do 2 jun. 2022 om 11:42 schreef Jingsong Li <
> > > > > jingsongl...@gmail.com
> > > > > > >:
> > > > > >
> > > > > >> Thanks Xingtong, Jark, Martijn and Robert for making this
> > > > possible!
> > > > > >>
> > > > > >> Best,
> > > > > >> Jingsong
> > > > > >>
> > > > > >>
> > > > > >> On Thu, Jun 2, 2022 at 5:32 PM Jark Wu 
> > > wrote:
> > > > > >>
> > > > > >>> Thank Xingtong for making this possible!
> > > > > >>>
> > > > > >>> Cheers,
> > > > > >>> Jark Wu
> > > > > >>>
> > > > > >>> On Thu, 2 Jun 2022 at 15:31, Xintong Song <
> > > tonysong...@gmail.com
>

Re: [DISCUSS] FLIP-228: Support Within between events in CEP Pattern

2022-06-08 Thread Nicholas Jiang
Hi Dian,

About the indication of the time interval between events matched in the loop: I 
have updated the FLIP and introduced a series of `times` interfaces to specify 
that a pattern can occur the specified number of times, where the interval 
corresponds to the maximum time gap between the previous and next event for each 
occurrence.

The within(withinType, windowTime) call configures the same matching-window time 
for every occurrence, whereas times(int times, windowTimes) can configure a 
different time interval (the maximum time gap between the previous and next 
event) for each occurrence, which fully covers the time interval between events 
matched in the loop or times case.
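To make the two semantics concrete, the stand-alone sketch below models them as a check over event timestamps: FIRST_AND_LAST bounds the span of the whole match, while PREVIOUS_AND_NEXT bounds each consecutive gap. The enum names follow this discussion thread; the checker itself is an illustrative stand-in, not flink-cep code.

```java
public class WithinSemantics {

    enum WithinType { FIRST_AND_LAST, PREVIOUS_AND_NEXT }

    /** Returns true if the event timestamps satisfy the window under the given semantics. */
    static boolean satisfies(long[] timestamps, long windowMs, WithinType type) {
        if (timestamps.length < 2) {
            return true;
        }
        if (type == WithinType.FIRST_AND_LAST) {
            // Whole match must fit in the window: last event minus first event.
            return timestamps[timestamps.length - 1] - timestamps[0] <= windowMs;
        }
        // PREVIOUS_AND_NEXT: every consecutive gap must fit in the window.
        for (int i = 1; i < timestamps.length; i++) {
            if (timestamps[i] - timestamps[i - 1] > windowMs) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] ts = {0, 40, 90}; // gaps of 40 and 50, total span of 90
        System.out.println(satisfies(ts, 60, WithinType.PREVIOUS_AND_NEXT)); // true
        System.out.println(satisfies(ts, 60, WithinType.FIRST_AND_LAST));    // false
    }
}
```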

Best,
Nicholas Jiang

On 2022/06/08 08:11:58 Nicholas Jiang wrote:
> Hi Dian,
> 
> Thanks for your feedback about the Public Interface update for supporting the 
> within between events feature. I have left the comments for above points:
> 
> - Regarding the pattern API, should we also introduce APIs such as 
> Pattern.times(int from, int to, Time windowTime) to indicate the time 
> interval between events matched in the loop?
> 
> IMO, we could not introduce the mentioned APIs for indication of the time 
> interval between events. For example Pattern.times(int from, int to, Time 
> windowTime), the user can use Pattern.times(int from, int 
> to).within(BEFORE_AND_AFTER, windowTime) to indicate the time interval 
> between the before and after event.
> 
> - Regarding the naming of the classes, does it make sense to rename 
> `WithinType` to `InternalType` or `WindowType`? For the enum values inside 
> it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not 
> intuitive for me. The candidates that come to my mind: - `RELATIVE_TO_FIRST` 
> and `RELATIVE_TO_PREVIOUS` - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS`
> 
> IMO, the `WithinType` naming could directly describe the situation of the time 
> interval. In addition, the enum values of `WithinType` could be updated to 
> `PREVIOUS_AND_NEXT` and `FIRST_AND_LAST`, which directly indicate the time 
> interval between the PREVIOUS and NEXT event and between the FIRST and LAST 
> event. With `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS` it is not clear 
> which event is relative to the FIRST or PREVIOUS event.
> 
> Best,
> Nicholas Jiang
> 
> On 2022/06/06 07:48:22 Dian Fu wrote:
> > Hi Nicholas,
> > 
> > Thanks a lot for the update.
> > 
> > Regarding the pattern API, should we also introduce APIs such as
> > Pattern.times(int from, int to, Time windowTime) to indicate the time
> > interval between events matched in the loop?
> > 
> > Regarding the naming of the classes, does it make sense to rename
> > `WithinType` to `InternalType` or `WindowType`? For the enum values inside
> > it, the current values(`BEFORE_AND_AFTER` and `FIRST_AND_LAST`) are not
> > intuitive for me. The candidates that come to my mind:
> > - `RELATIVE_TO_FIRST` and `RELATIVE_TO_PREVIOUS`
> > - `WHOLE_MATCH` and `RELATIVE_TO_PREVIOUS`
> > 
> > Regards,
> > Dian
> > 
> > On Tue, May 31, 2022 at 2:56 PM Nicholas Jiang 
> > wrote:
> > 
> > > Hi Martijn,
> > >
> > > Sorry for the late reply. This feature is only supported in DataStream and
> > > is not supported in MATCH_RECOGNIZE because the SQL syntax of
> > > MATCH_RECOGNIZE does not contain the semantics of this feature, which
> > > requires modification of the SQL syntax. The support above MATCH_RECOGNIZE
> > > is suitable for new FLIP to discuss.
> > >
> > > Regards,
> > > Nicholas Jiang
> > >
> > > On 2022/05/25 11:36:33 Martijn Visser wrote:
> > > > Hi Nicholas,
> > > >
> > > > Thanks for creating the FLIP, I can imagine that there will be many use
> > > > cases who can be created using this new feature.
> > > >
> > > > The FLIP doesn't mention anything with regards to SQL, could this 
> > > > feature
> > > > also be supported when using MATCH_RECOGNIZE?
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > > https://twitter.com/MartijnVisser82
> > > > https://github.com/MartijnVisser
> > > >
> > > >
> > > > On Sat, 7 May 2022 at 11:17, Dian Fu  wrote:
> > > >
> > > > > Hi Nicholas,
> > > > >
> > > > > Thanks a lot for bringing up this discussion. If I recall it 
> > > > > correctly,
> > > > > this feature has been requested many times by the users and is among
> > > one of
> > > > > the most requested features in CEP. So big +1 to this feature overall.
> > > > >
> > > > > Regarding the API, the name `partialWithin` sounds a little weird. Is
> > > it
> > > > > possible to find a name which is more intuitive? Other possible
> > > solutions:
> > > > > - Reuse the existing `Pattern.within` method and change its semantic
> > > to the
> > > > > maximum time interval between patterns. Currently `Pattern.within` is
> > > used
> > > > > to define the maximum time interval between the first event and the
> > > last
> > > > > event. However, the Pattern object represents only one node in a
> > > pattern
> > > > > sequence and so it doesn't make much sense t

Re: [DISCUSS] Releasing 1.15.1

2022-06-08 Thread LuNing Wang
Hi David,

+1
Thank you for driving this.

Best regards,
LuNing Wang

Jing Ge  于2022年6月8日周三 20:45写道:

> +1
>
> Thanks David for driving it!
>
> Best regards,
> Jing
>
>
> On Wed, Jun 8, 2022 at 2:32 PM Xingbo Huang  wrote:
>
> > Hi David,
> >
> > +1
> > Thank you for driving this.
> >
> > Best,
> > Xingbo
> >
> > Chesnay Schepler  于2022年6月8日周三 18:41写道:
> >
> > > +1
> > >
> > > Thank you for proposing this. I can take care of the PMC-side of
> things.
> > >
> > > On 08/06/2022 12:37, Jingsong Li wrote:
> > > > +1
> > > >
> > > > Thanks David for volunteering to manage the release.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Wed, Jun 8, 2022 at 6:21 PM Jark Wu  wrote:
> > > >> Hi David, thank you for driving the release.
> > > >>
> > > >> +1 for the 1.15.1 release. The release-1.15 branch
> > > >> already contains many bug fixes and some SQL
> > > >> issues are quite critical.
> > > >>
> > > >> Btw, FLINK-27606 has been merged just now.
> > > >>
> > > >> Best,
> > > >> Jark
> > > >>
> > > >>
> > > >> On Wed, 8 Jun 2022 at 17:40, David Anderson 
> > > wrote:
> > > >>
> > > >>> I would like to start a discussion on releasing 1.15.1. Flink 1.15
> > was
> > > >>> released on the 5th of May [1] and so far 43 issues have been
> > resolved,
> > > >>> including several user-facing issues with blocker and critical
> > > priorities
> > > >>> [2]. (The recent problem with FileSink rolling policies not working
> > > >>> properly in 1.15.0 got me thinking it might be time for bug-fix
> > > release.)
> > > >>>
> > > >>> There currently remain 16 unresolved tickets with a fixVersion of
> > > 1.15.1
> > > >>> [3], five of which are about CI infrastructure and tests. There is
> > > only one
> > > >>> such ticket marked Critical:
> > > >>>
> > > >>> https://issues.apache.org/jira/browse/FLINK-27492 Flink table
> scala
> > > >>> example
> > > >>> does not including the scala-api jars
> > > >>>
> > > >>> I'm not convinced we should hold up a release for this issue, but
> on
> > > the
> > > >>> other hand, it seems that this issue can be resolved by making a
> > > decision
> > > >>> about how to handle the missing dependencies. @Timo Walther
> > > >>>  @yun_gao can you give an update?
> > > >>>
> > > >>> Two other open issues seem to have made significant progress
> (listed
> > > >>> below). Would it make sense to wait for either of these? Are there
> > any
> > > >>> other open tickets we should consider waiting for?
> > > >>>
> > > >>> https://issues.apache.org/jira/browse/FLINK-27420 Suspended
> > > SlotManager
> > > >>> fail to reregister metrics when started again
> > > >>> https://issues.apache.org/jira/browse/FLINK-27606 CompileException
> > > when
> > > >>> using UDAF with merge() method
> > > >>>
> > > >>> I would volunteer to manage the release. Is there a PMC member who
> > > would
> > > >>> join me to help, as needed?
> > > >>>
> > > >>> Best,
> > > >>> David
> > > >>>
> > > >>> [1]
> https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> > > >>>
> > > >>> [2]
> > > >>>
> > > >>>
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> > > >>>
> > > >>> [3]
> > > >>>
> > > >>>
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> > > >>>
> > >
> > >
> >
>


Re: [VOTE] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

2022-06-08 Thread Jingsong Li
+1 (binding)

Best,
Jingsong

On Tue, Jun 7, 2022 at 5:21 PM Jark Wu  wrote:
>
> +1 (binding)
>
> Best,
> Jark
>
> On Tue, 7 Jun 2022 at 12:17, Lincoln Lee  wrote:
>
> > Dear Flink developers,
> >
> > Thanks for all your feedback for FLIP-234: Support Retryable Lookup Join To
> > Solve Delayed Updates Issue In External Systems[1] on the discussion
> > thread[2].
> >
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours unless there is an objection or not enough votes.
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > [2] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> >
> > Best,
> > Lincoln Lee
> >


Re: [VOTE] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-06-08 Thread Jingsong Li
+1 (binding)

Best,
Jingsong

On Tue, Jun 7, 2022 at 5:20 PM Jark Wu  wrote:
>
> +1 (binding)
>
> Best,
> Jark
>
> On Tue, 7 Jun 2022 at 13:44, Jing Ge  wrote:
>
> > Hi Godfrey,
> >
> > +1 (non-binding)
> >
> > Best regards,
> > Jing
> >
> >
> > On Tue, Jun 7, 2022 at 4:42 AM godfrey he  wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for all the feedback so far. Based on the discussion[1] we seem
> > > to have consensus, so I would like to start a vote on FLIP-231 for
> > > which the FLIP has now also been updated[2].
> > >
> > > The vote will last for at least 72 hours (Jun 10th 12:00 GMT) unless
> > > there is an objection or insufficient votes.
> > >
> > > [1] https://lists.apache.org/thread/88kxk7lh8bq2s2c2qrf06f3pnf9fkxj2
> > > [2]
> > >
> > https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00&;
> > >
> > > Best,
> > > Godfrey
> > >
> >


Re: [VOTE] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

2022-06-08 Thread godfrey he
+1

Best,
Godfrey

Jingsong Li  于2022年6月9日周四 10:26写道:
>
> +1 (binding)
>
> Best,
> Jingsong
>
> On Tue, Jun 7, 2022 at 5:21 PM Jark Wu  wrote:
> >
> > +1 (binding)
> >
> > Best,
> > Jark
> >
> > On Tue, 7 Jun 2022 at 12:17, Lincoln Lee  wrote:
> >
> > > Dear Flink developers,
> > >
> > > Thanks for all your feedback for FLIP-234: Support Retryable Lookup Join 
> > > To
> > > Solve Delayed Updates Issue In External Systems[1] on the discussion
> > > thread[2].
> > >
> > > I'd like to start a vote for it. The vote will be open for at least 72
> > > hours unless there is an objection or not enough votes.
> > >
> > > [1]
> > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > [2] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> > >
> > > Best,
> > > Lincoln Lee
> > >


[jira] [Created] (FLINK-27964) Support Cassandra connector in Python DataStream API

2022-06-08 Thread pengmd (Jira)
pengmd created FLINK-27964:
--

 Summary: Support Cassandra connector in Python DataStream API
 Key: FLINK-27964
 URL: https://issues.apache.org/jira/browse/FLINK-27964
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Connectors / Cassandra
Reporter: pengmd
 Fix For: 1.16.0








[jira] [Created] (FLINK-27965) Query schema is not correct for insert overwrite clause when source and target table are same

2022-06-08 Thread Jane Chan (Jira)
Jane Chan created FLINK-27965:
-

 Summary: Query schema is not correct for insert overwrite clause 
when source and target table are same
 Key: FLINK-27965
 URL: https://issues.apache.org/jira/browse/FLINK-27965
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.15.0
Reporter: Jane Chan


h3. How to reproduce

Add the following test under TableEnvironmentITCase
{code:java}
@Test
def testExplainInsertOverwrite(): Unit = {
  val sinkPath = tempFolder.newFolder().toString
  tEnv.executeSql(
s"""
   |create table MySink (
   |  first string,
   |  part string
   |) partitioned by (part)
   |with (
   |  'connector' = 'filesystem',
   |  'path' = '$sinkPath',
   |  'format' = 'testcsv'
   |)
 """.stripMargin)

tEnv.executeSql(
  "explain plan for " +
"insert overwrite MySink partition (part = '123') " +
"select first from MySink where part = '123'")
}
{code}
h3. Stacktrace
{code:java}
org.apache.flink.table.api.ValidationException: Column types of query result 
and sink for 'default_catalog.default_database.MySink' do not match.
Cause: Different number of columns.
Query schema: [first: STRING, EXPR$1: STRING NOT NULL, EXPR$2: STRING NOT NULL]
Sink schema:  [first: STRING, part: STRING]
    at 
org.apache.flink.table.planner.connectors.DynamicSinkUtils.createSchemaMismatchException(DynamicSinkUtils.java:453)
    at 
org.apache.flink.table.planner.connectors.DynamicSinkUtils.validateSchemaAndApplyImplicitCast(DynamicSinkUtils.java:256)
    at 
org.apache.flink.table.planner.connectors.DynamicSinkUtils.convertSinkToRel(DynamicSinkUtils.java:208)
    at 
org.apache.flink.table.planner.connectors.DynamicSinkUtils.convertSinkToRel(DynamicSinkUtils.java:170)
    at 
org.apache.flink.table.planner.delegation.PlannerBase.$anonfun$translateToRel$1(PlannerBase.scala:258)
    at scala.Option.map(Option.scala:146)
    at 
org.apache.flink.table.planner.delegation.PlannerBase.translateToRel(PlannerBase.scala:228)
    at 
org.apache.flink.table.planner.delegation.PlannerBase.$anonfun$getExplainGraphs$2(PlannerBase.scala:508)
    at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
    at scala.collection.Iterator.foreach(Iterator.scala:937)
    at scala.collection.Iterator.foreach$(Iterator.scala:937)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
    at scala.collection.IterableLike.foreach(IterableLike.scala:70)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike.map(TraversableLike.scala:233)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at 
org.apache.flink.table.planner.delegation.PlannerBase.getExplainGraphs(PlannerBase.scala:487)
    at 
org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:93)
    at 
org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:50)
    at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.explainInternal(TableEnvironmentImpl.java:664)
    at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1298)
    at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:701)
    at 
org.apache.flink.table.api.TableEnvironmentITCase.testExplainInsertOverwrite(TableEnvironmentITCase.scala:179)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
    at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) 
{code}





Re: [VOTE] FLIP-223: Support HiveServer2 Endpoint

2022-06-08 Thread godfrey he
+1

Best,
Godfrey

Jark Wu  于2022年6月7日周二 17:21写道:
>
> +1 (binding)
>
> Best,
> Jark
>
> On Tue, 7 Jun 2022 at 13:32, Shengkai Fang  wrote:
>
> > Hi, everyone.
> >
> > Thanks for all feedback for FLIP-223: Support HiveServer2 Endpoint[1] on
> > the discussion thread[2]. I'd like to start a vote for it. The vote will be
> > open for at least 72 hours unless there is an objection or not enough
> > votes.
> >
> > Best,
> > Shengkai
> >
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
> > [2] https://lists.apache.org/thread/9r1j7ho2m8zbqy3tl7vvj9gnocggwr6x
> >


[jira] [Created] (FLINK-27966) All the classes move to the connector specific files

2022-06-08 Thread LuNng Wang (Jira)
LuNng Wang created FLINK-27966:
--

 Summary: All the classes move to the connector specific files
 Key: FLINK-27966
 URL: https://issues.apache.org/jira/browse/FLINK-27966
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.15.0
Reporter: LuNng Wang
 Fix For: 1.16.0


 If all the classes are placed in `connectors/__init__.py`, conflicts may happen 
when two classes belonging to two different connectors have the same name.

https://github.com/apache/flink/pull/19732#discussion_r883361009
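The clash described in the issue can be sketched in plain Python (the `kafka`/`jdbc` names below are hypothetical stand-ins, not the actual PyFlink connector layout): when every connector's classes share one flat namespace, a later class silently shadows an earlier one with the same name, whereas connector-specific modules keep both addressable.

```python
# Simulate two connector modules that each define a class named "Source".
kafka = {"Source": type("Source", (), {"kind": "kafka"})}
jdbc = {"Source": type("Source", (), {"kind": "jdbc"})}

# Flat namespace (everything in one connectors/__init__.py):
# the second definition silently wins.
flat = {}
flat.update(kafka)
flat.update(jdbc)
assert flat["Source"].kind == "jdbc"  # the Kafka Source is shadowed

# Connector-specific namespaces keep both classes reachable.
namespaced = {"kafka": kafka, "jdbc": jdbc}
assert namespaced["kafka"]["Source"].kind == "kafka"
assert namespaced["jdbc"]["Source"].kind == "jdbc"
```

Moving each connector's classes into its own file avoids this shadowing without renaming the classes themselves.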





Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-06-08 Thread godfrey he
Hi all,

Regarding `PIPELINE`, it comes from flink-core module, see
`PipelineOptions` class for more details.
`JOBS` is a more generic concept than `PIPELINES`. I'm also fine with `JOBS`.

+1 to discuss JOBTREE in other FLIP.

+1 to `STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] `

+1 to `CREATE SAVEPOINT FOR JOB ` and `DROP SAVEPOINT `

Best,
Godfrey
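[Editor's note: taken together, the statements being converged on in this thread would look roughly as follows — proposed syntax only, not yet implemented in Flink at the time of this discussion, and still subject to change in the FLIP. `<job_id>` and `<savepoint_path>` are placeholders.]

```sql
-- List jobs managed by the cluster (IDs, names, status).
SHOW JOBS;

-- Stop a job, optionally taking a savepoint and/or draining in-flight data.
STOP JOB '<job_id>' WITH SAVEPOINT WITH DRAIN;

-- Trigger a savepoint for a running job; the savepoint directory defaults
-- to the 'state.savepoints.dir' configuration.
CREATE SAVEPOINT FOR JOB '<job_id>';

-- Remove a savepoint by its path.
DROP SAVEPOINT '<savepoint_path>';
```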

Jing Ge  于2022年6月9日周四 01:48写道:
>
> Hi Paul, Hi Jark,
>
> Re JOBTREE, agree that it is out of the scope of this FLIP
>
> Re `RELEASE SAVEPOINT ALL`, if the community prefers `DROP`, then `DROP 
> SAVEPOINT ALL` could do the housekeeping. WDYT?
>
> Best regards,
> Jing
>
>
> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu  wrote:
>>
>> Hi Jing,
>>
>> Regarding JOBTREE (job lineage), I agree with Paul that this is out of the 
>> scope
>>  of this FLIP and can be discussed in another FLIP.
>>
>> Job lineage is a big topic that may involve many problems:
>> 1) how to collect and report job entities, attributes, and lineages?
>> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
>> 3) how does Flink SQL CLI/Gateway know the lineage information and show 
>> jobtree?
>> 4) ...
>>
>> Best,
>> Jark
>>
>> On Wed, 8 Jun 2022 at 20:44, Jark Wu  wrote:
>>>
>>> Hi Paul,
>>>
>>> I'm fine with using JOBS. The only concern is that this may conflict with 
>>> displaying more detailed information for queries (e.g. query content, plan) 
>>> in the future, e.g. SHOW QUERIES EXTENDED in ksqldb[1].
>>> This is not a big problem as we can introduce SHOW QUERIES in the future if 
>>> necessary.
>>>
>>> > STOP JOBS  (with options `table.job.stop-with-savepoint` and 
>>> > `table.job.stop-with-drain`)
>>> What about STOP JOB  [WITH SAVEPOINT] [WITH DRAIN] ?
>>> It might be trivial and error-prone to set configuration before executing a 
>>> statement,
>>> and the configuration will affect all statements after that.
>>>
>>> > CREATE SAVEPOINT  FOR JOB 
>>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB ",
>>> and always use configuration "state.savepoints.dir" as the default 
>>> savepoint dir.
>>> The concern with using "" is here should be savepoint dir,
>>> and savepoint_path is the returned value.
>>>
>>> I'm fine with other changes.
>>>
>>> Thanks,
>>> Jark
>>>
>>> [1]: 
>>> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>>>
>>>
>>>
>>> On Wed, 8 Jun 2022 at 15:07, Paul Lam  wrote:

 Hi Jing,

 Thank you for your inputs!

 TBH, I haven’t considered the ETL scenario that you mentioned. I think 
 they’re managed just like other jobs in terms of job lifecycles (please 
 correct me if I’m wrong).

 WRT the SQL statements about SQL lineages, I think it might be a little 
 bit out of the scope of the FLIP, since it’s mainly about lifecycles. By 
 the way, do we have these functionalities in Flink CLI or REST API already?

 WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the outdated FLIP docs; the 
 community is more in favor of `DROP SAVEPOINT `. I’m 
 updating the FLIP according to the latest discussions.

 Best,
 Paul Lam

 2022年6月8日 07:31,Jing Ge  写道:

 Hi Paul,

 Sorry that I am a little bit too late to join this thread. Thanks for 
 driving this and starting this informative discussion. The FLIP looks 
 really interesting. It will help us a lot to manage Flink SQL jobs.

 Have you considered the ETL scenario with Flink SQL, where multiple SQLs 
 build a DAG, or even many DAGs?

 1)
 +1 for SHOW JOBS. I think sooner or later we will start to discuss how to 
 support ETL jobs. Briefly speaking, SQLs that used to build the DAG are 
 responsible to *produce* data as the result(cube, materialized view, etc.) 
 for the future consumption by queries. The INSERT INTO SELECT FROM example 
 in FLIP and CTAS are typical SQL in this case. I would prefer to call them 
 Jobs instead of Queries.

 2)
 Speaking of ETL DAG, we might want to see the lineage. Is it possible to 
 support syntax like:

 SHOW JOBTREE   // shows the downstream DAG from the given job_id
 SHOW JOBTREE  FULL // shows the whole DAG that contains the given 
 job_id
 SHOW JOBTREES // shows all DAGs
 SHOW ANCESTORS  // shows all parents of the given job_id

 3)
 Could we also support Savepoint housekeeping syntax? We ran into the 
 issue that a lot of savepoints have been created by customers (via their 
 apps). It will take extra (hacking) effort to clean them up.

 RELEASE SAVEPOINT ALL

 Best regards,
 Jing

 On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser  
 wrote:
>
> Hi Paul,
>
> I'm still doubting the keyword for the SQL applications. SHOW QUERIES 
> could
> imply that this will actually show the query, but we're returning IDs of
> the running application. At first I was also not very much in favour of
> SHOW JOBS since I pref

Re: [VOTE] FLIP-223: Support HiveServer2 Endpoint

2022-06-08 Thread Nicholas Jiang
+1 (not-binding)

Best,
Nicholas Jiang

On 2022/06/07 05:31:21 Shengkai Fang wrote:
> Hi, everyone.
> 
> Thanks for all feedback for FLIP-223: Support HiveServer2 Endpoint[1] on
> the discussion thread[2]. I'd like to start a vote for it. The vote will be
> open for at least 72 hours unless there is an objection or not enough votes.
> 
> Best,
> Shengkai
> 
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
> [2] https://lists.apache.org/thread/9r1j7ho2m8zbqy3tl7vvj9gnocggwr6x
> 


Re: [VOTE] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

2022-06-08 Thread Martijn Visser
+1 (binding)

Op do 9 jun. 2022 om 04:53 schreef godfrey he :

> +1
>
> Best,
> Godfrey
>
> Jingsong Li  于2022年6月9日周四 10:26写道:
> >
> > +1 (binding)
> >
> > Best,
> > Jingsong
> >
> > On Tue, Jun 7, 2022 at 5:21 PM Jark Wu  wrote:
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 7 Jun 2022 at 12:17, Lincoln Lee 
> wrote:
> > >
> > > > Dear Flink developers,
> > > >
> > > > Thanks for all your feedback for FLIP-234: Support Retryable Lookup
> Join To
> > > > Solve Delayed Updates Issue In External Systems[1] on the discussion
> > > > thread[2].
> > > >
> > > > I'd like to start a vote for it. The vote will be open for at least
> 72
> > > > hours unless there is an objection or not enough votes.
> > > >
> > > > [1]
> > > >
> > > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > [2] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
>


Re: [VOTE] FLIP-231: Introduce SupportStatisticReport to support reporting statistics from source connectors

2022-06-08 Thread Martijn Visser
+1 (binding)

Op do 9 jun. 2022 om 04:31 schreef Jingsong Li :

> +1 (binding)
>
> Best,
> Jingsong
>
> On Tue, Jun 7, 2022 at 5:20 PM Jark Wu  wrote:
> >
> > +1 (binding)
> >
> > Best,
> > Jark
> >
> > On Tue, 7 Jun 2022 at 13:44, Jing Ge  wrote:
> >
> > > Hi Godfrey,
> > >
> > > +1 (non-binding)
> > >
> > > Best regards,
> > > Jing
> > >
> > >
> > > On Tue, Jun 7, 2022 at 4:42 AM godfrey he  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Thanks for all the feedback so far. Based on the discussion[1] we
> seem
> > > > to have consensus, so I would like to start a vote on FLIP-231 for
> > > > which the FLIP has now also been updated[2].
> > > >
> > > > The vote will last for at least 72 hours (Jun 10th 12:00 GMT) unless
> > > > there is an objection or insufficient votes.
> > > >
> > > > [1] https://lists.apache.org/thread/88kxk7lh8bq2s2c2qrf06f3pnf9fkxj2
> > > > [2]
> > > >
> > >
> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=211883860&draftShareId=eda17eaa-43f9-4dc1-9a7d-3a9b5a4bae00&;
> > > >
> > > > Best,
> > > > Godfrey
> > > >
> > >
>