Re: [DISCUSS] FLIP 333 - Redesign Apache Flink website

2023-07-16 Thread Yun Tang
+1 for the new look of the website.

And for the content of the FLIP, please do not paste the "Detailed designs" under 
the scope of "Rejected Alternatives"; you can just post the pictures in the 
"Proposed Changes" part.


Best
Yun Tang

From: Mohan, Deepthi 
Sent: Sunday, July 16, 2023 14:10
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] FLIP 333 - Redesign Apache Flink website

@Chesnay

Thank you for your feedback.

An important takeaway from the previous discussion [1] and your feedback was to 
keep the design and text/diagram changes separate, as the text and diagram 
changes each likely require deeper discussion. Therefore, as a first step I am 
proposing only UX changes with minimal text changes for the pages mentioned in 
the FLIP.

The feedback we received from customers covers both aesthetic and functional 
aspects of the website. Note that most feedback is focused only on the main 
Flink website [2].

1) New customers who are considering Flink have said about the website “there 
is a lot going on”, “looks too complicated”, “I am not sure *why* I should use 
this" and similar feedback. The proposed redesign in this FLIP helps partially 
address this category of feedback, but we may need to make the use cases and 
value proposition “pop” more than we have currently proposed in the redesign. 
I’d like to get the community’s thoughts on this.

2) On the look and feel of the website, I’ve already shared feedback previously 
that I am repeating here: “like a wiki page thrown together by developers.” 
Customers also point out other related Apache project websites: [3] and [4] as 
having “modern” user design. The proposed redesign in this FLIP will help 
address this feedback. Modernizing the look and feel of the website will appeal 
to customers who are used to what they encounter on other contemporary websites.

3) New and existing Flink developers have said “I am not sure what the diagram 
is supposed to depict” - referencing the main diagram on [2] and have said that 
the website lacks useful graphics and colors. Apart from removing the diagram 
on the main page [2], the current FLIP does not propose major changes to 
diagrams in the rest of the website; we can discuss those separately as they 
become available. I’d like to keep the FLIP focused only on the website redesign.

Ultimately, to Chesnay’s point in the earlier discussion in [1], I do not want 
to boil the ocean with all the changes at once. In this FLIP, my proposal is to 
first work on the UX design as that gives us a good starting point. We can use 
it as a framework to make iterative changes and enhancements to diagrams and 
the actual website content incrementally.

I’ve added a few more screenshots of additional pages to the FLIP that will 
give you a clearer picture of the proposed changes for the main page and the 
What is Flink [Architecture, Applications, and Operations] pages.

And finally, I am not proposing any tooling changes.

[1] https://lists.apache.org/thread/c3pt00cf77lrtgt242p26lgp9l2z5yc8
[2] https://flink.apache.org/
[3] https://spark.apache.org/
[4] https://kafka.apache.org/

On 7/13/23, 6:25 AM, "Chesnay Schepler" <ches...@apache.org> wrote:


On 13/07/2023 08:07, Mohan, Deepthi wrote:
> However, even these developers when explicitly asked in our conversations 
> often comment that the website could do with a redesign


Can you go into more detail as to their specific concerns? Are there
functional problems with the page, or is this just a matter of "I don't
like the way it looks"?


What did they have trouble with? Which information was
missing/unnecessary/too hard to find?


The FLIP states that "/we want to modernize the website so that new and
existing users can easily find information to understand what Flink is,
the primary use cases where Flink is useful, and clearly understand its
value proposition/."


From the mock-ups I don't /really/ see how these stated goals are
achieved. It mostly looks like a fresh coat of paint, with a compressed
nav bar (which does reduce how much information and links we throw at
people at once (which isn't necessarily bad)).


Can you go into more detail w.r.t. the proposed
text/presentation/diagram changes?


I assume you are not proposing any tooling changes?







Re: [VOTE] Release 2.0 must-have work items

2023-07-16 Thread Yun Tang
I agree that we could downgrade "Eager state declaration" to a nice-to-have 
feature.

For the deprecation of "queryable state", can we just rephrase it to deprecate 
the "current implementation of queryable state"? The feature to query the internal 
state is actually very useful for debugging and could provide more possibilities 
to extend Flink SQL to be more like a database.

Just as Yuan replied in the previous email [1], the current implementation of 
queryable state has many design problems. However, I don't want to make 
users feel that this feature cannot be done well; maybe we can redesign 
this feature. As far as I know, RisingWave already supports queryable state 
with a better user experience [2].
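
A minimal sketch of what querying internal state looks like with the current
client, for readers less familiar with the feature; the host, port, job id,
key and state name below are placeholders for illustration, not values from
this thread:

import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.queryablestate.client.QueryableStateClient;

import java.util.concurrent.CompletableFuture;

public class QueryableStateSketch {
    public static void main(String[] args) throws Exception {
        // connect to the queryable state proxy on a TaskManager (9069 is the default port)
        QueryableStateClient client = new QueryableStateClient("taskmanager-host", 9069);

        ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("word-count", Long.class);

        // "query-name" must match the name under which the job exposed its state
        CompletableFuture<ValueState<Long>> future =
                client.getKvState(
                        JobID.fromHexString("00000000000000000000000000000000"),
                        "query-name",
                        "flink",                        // key to look up
                        BasicTypeInfo.STRING_TYPE_INFO, // type information of the key
                        descriptor);

        System.out.println(future.get().value());
        client.shutdownAndWait();
    }
}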


[1] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m
[2] https://syntaxbug.com/06a3e7c554/

Best
Yun Tang

From: Xintong Song 
Sent: Friday, July 14, 2023 13:51
To: dev@flink.apache.org 
Subject: Re: [VOTE] Release 2.0 must-have work items

Thanks for the support, Yu.

We will have the guideline before removing DataSet. We are currently
prioritizing work that needs to be done before the 1.18 feature freeze, and
will soon get back to working on the guidelines. We expect to get the
guideline ready before or soon after the 1.18 release, which will
definitely be before removing DataSet in 2.0.

Best,

Xintong



On Fri, Jul 14, 2023 at 1:06 PM Yu Li  wrote:

> It's great to see the discussion about what we need to improve on
> (completely) switching from DataSet API to DataStream API from the user
> perspective. I feel that these improvements would happen faster (only) when
> we seriously prepare to remove the DataSet APIs with a target release, just
> like what we are doing now. And the same applies to the SinkV1 related
> discussions (smile).
>
> I support Xintong's opinion on keeping "Remove the DataSet APIs" a
> must-have item, meantime I support Yuxia's opinion that we should
> explicitly let our users know how to migrate their existing DataSet API
> based applications afterwards, meaning that the guideline Xintong mentioned
> is a must-have (rather than best efforts) before removing the DataSet APIs.
>
> Best Regards,
> Yu
>
>
> On Wed, 12 Jul 2023 at 14:00, yuxia  wrote:
>
> > Thanks Xintong for clarification. A guideline to help users migrating
> from
> > DataSet to DataStream will definitely be helpful.
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Xintong Song" 
> > To: "dev" 
> > Sent: Wednesday, July 12, 2023, 11:40:12 AM
> > Subject: Re: [VOTE] Release 2.0 must-have work items
> >
> > @Yuxia,
> >
> > We are aware of the issue that you mentioned. Actually, I don't think the
> > DataStream API can cover everything in the DataSet API in exactly the
> same
> > way, because the fundamental model, concepts and primitives of the two
> sets
> > of APIs are completely different. Many of the DataSet APIs, especially
> > those accessing the full data set at once, do not fit in the DataStream
> > concepts at all. I think what's important is that users can achieve the
> > same function, even if they may need to code in a different way.
> >
> > We have gone through all the existing DataSet APIs, and categorized them
> > into 3 kinds:
> > - APIs that are well supported by DataStream API as is. E.g., map, reduce
> > on grouped dataset, etc.
> > - APIs that can be achieved by DataStream API as is, but with a price
> > (programming complexity, or computation efficiency). E.g., reduce on full
> > dataset, sort partition, etc. Admittedly, there is room for improvement
> on
> > these. We may keep improving these for the DataStream API, or we can
> > concentrate on supporting them better in the new ProcessFunction API.
> > Either way, I don't think we should block the retiring of DataSet API on
> > them.
> > - There are also a few APIs that cannot be supported by the DataStream
> API
> > as is, unless users write their custom operators from the ground up. Only
> > left/rightOuterJoin and combineGroup fall into this category. I think
> > combineGroup is probably not a problem, because this is more like a
> > variant of reduceGroup that allows the framework to execute more
> > efficiently. As for the outer joins, depending on how badly this is
> needed,
> > it can be supported by emitting the non-joined entries upon triggering a
> > window join (see the sketch below).
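
A minimal sketch of the workaround described above — a left outer join built
from a windowed coGroup that also emits the non-joined entries; the String
record format, key extraction and window size are assumptions for illustration:

import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class LeftOuterJoinSketch {
    public static DataStream<String> leftOuterJoin(
            DataStream<String> left, DataStream<String> right) {
        // key both sides on the part before the first comma, e.g. "user1,..."
        KeySelector<String, String> key = row -> row.split(",")[0];
        return left.coGroup(right)
                .where(key)
                .equalTo(key)
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .apply(new CoGroupFunction<String, String, String>() {
                    @Override
                    public void coGroup(Iterable<String> lefts, Iterable<String> rights,
                            Collector<String> out) {
                        for (String l : lefts) {
                            boolean matched = false;
                            for (String r : rights) {
                                matched = true;
                                out.collect(l + " | " + r);
                            }
                            if (!matched) {
                                out.collect(l + " | null"); // the non-joined entry
                            }
                        }
                    }
                });
    }
}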
> >
> > We are also planning to draft a guideline to help users migrating from
> > DataSet to DataStream, which should demonstrate how users can achieve
> > things like sort-partition with DataStream API.
> >
> > Last but not least, I'd like to point out that the decision to deprecate
> > and eventually remove the DataSet API was approved in FLIP-131, and all
> the
> > prerequisites mentioned in the FLIP have been completed.
> >
> > Best,
> >
> > Xintong
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
> >
> >
> >
> > On Wed, Jul 12, 2023 at 10:20 AM Jingsong Li 
> > wrote:
> >
> > >

Re: [VOTE] Release 2.0 must-have work items

2023-07-16 Thread Xintong Song
I'd propose to downgrade "Refactor the API modules" to TBD. The original
proposal was based on the condition that we are allowed to introduce
in-place API breaking changes in release 2.0. As the migration period is
introduced, and we are no longer planning to do in-place changes /
removal for DataStream (and same for APIs in `flink-core`), we need to
re-evaluate whether it's feasible to do things like moving classes to
different modules / packages, or turning concrete classes in the API surface
into interfaces.

Best,

Xintong



On Mon, Jul 17, 2023 at 1:10 AM Yun Tang  wrote:

> I agree that we could downgrade "Eager state declaration" to a
> nice-to-have feature.
>
> For the deprecation of "queryable state", can we just rephrase it to deprecate
> the "current implementation of queryable state"? The feature to query the
> internal state is actually very useful for debugging and could provide more
> possibilities to extend Flink SQL to be more like a database.
>
> Just as Yuan replied in the previous email [1], the current implementation of
> queryable state has many design problems. However, I don't want to make
> users feel that this feature cannot be done well; maybe we can redesign
> this feature. As far as I know, RisingWave already supports queryable state
> with a better user experience [2].
>
>
> [1] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m
> [2] https://syntaxbug.com/06a3e7c554/
>
> Best
> Yun Tang
> 
> From: Xintong Song 
> Sent: Friday, July 14, 2023 13:51
> To: dev@flink.apache.org 
> Subject: Re: [VOTE] Release 2.0 must-have work items
>
> Thanks for the support, Yu.
>
> We will have the guideline before removing DataSet. We are currently
> prioritizing work that needs to be done before the 1.18 feature freeze, and
> will soon get back to working on the guidelines. We expect to get the
> guideline ready before or soon after the 1.18 release, which will
> definitely be before removing DataSet in 2.0.
>
> Best,
>
> Xintong
>
>
>
> On Fri, Jul 14, 2023 at 1:06 PM Yu Li  wrote:
>
> > It's great to see the discussion about what we need to improve on
> > (completely) switching from DataSet API to DataStream API from the user
> > perspective. I feel that these improvements would happen faster (only)
> when
> > we seriously prepare to remove the DataSet APIs with a target release,
> just
> > like what we are doing now. And the same applies to the SinkV1 related
> > discussions (smile).
> >
> > I support Xintong's opinion on keeping "Remove the DataSet APIs" a
> > must-have item, meantime I support Yuxia's opinion that we should
> > explicitly let our users know how to migrate their existing DataSet API
> > based applications afterwards, meaning that the guideline Xintong
> mentioned
> > is a must-have (rather than best efforts) before removing the DataSet
> APIs.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Wed, 12 Jul 2023 at 14:00, yuxia  wrote:
> >
> > > Thanks Xintong for clarification. A guideline to help users migrating
> > from
> > > DataSet to DataStream will definitely be helpful.
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > - Original Message -
> > > From: "Xintong Song" 
> > > To: "dev" 
> > > Sent: Wednesday, July 12, 2023, 11:40:12 AM
> > > Subject: Re: [VOTE] Release 2.0 must-have work items
> > >
> > > @Yuxia,
> > >
> > > We are aware of the issue that you mentioned. Actually, I don't think
> the
> > > DataStream API can cover everything in the DataSet API in exactly the
> > same
> > > way, because the fundamental model, concepts and primitives of the two
> > sets
> > > of APIs are completely different. Many of the DataSet APIs, especially
> > > those accessing the full data set at once, do not fit in the DataStream
> > > concepts at all. I think what's important is that users can achieve the
> > > same function, even if they may need to code in a different way.
> > >
> > > We have gone through all the existing DataSet APIs, and categorized
> them
> > > into 3 kinds:
> > > - APIs that are well supported by DataStream API as is. E.g., map,
> reduce
> > > on grouped dataset, etc.
> > > - APIs that can be achieved by DataStream API as is, but with a price
> > > (programming complexity, or computation efficiency). E.g., reduce on
> full
> > > dataset, sort partition, etc. Admittedly, there is room for improvement
> > on
> > > these. We may keep improving these for the DataStream API, or we can
> > > concentrate on supporting them better in the new ProcessFunction API.
> > > Either way, I don't think we should block the retiring of DataSet API
> on
> > > them.
> > > - There are also a few APIs that cannot be supported by the DataStream
> > API
> > > as is, unless users write their custom operators from the ground up.
> Only
> > > left/rightOuterJoin and combineGroup fall into this category. I think
> > > combineGroup is probably not a problem, because this is more like a
> > > variant of reduceGroup that allows the framework to execute more
> > > efficiently. As for

Re: [VOTE] Release 2.0 must-have work items

2023-07-16 Thread Xintong Song
@Yun,
I see your point that the ability queryable state is trying to provide is
meaningful, but the current implementation of the feature is problematic. So
what's your opinion on deprecating the current queryable state? Do you
think we need to wait until there is a new implementation of queryable
state before removing the current one? Or maybe the current implementation is
not really functional anyway, and we can treat its removal as
independent from introducing a new one?

> However, I don't want to make users feel that this feature cannot be done
> well; maybe we can redesign this feature.
>
TBH, the impression that I got from the roadmap [1] is that queryable
state is retiring and will be replaced by the State Processor API. If this
is not the impression we want users to have, you probably also need to
raise it in the roadmap discussion [2].

Best,

Xintong


[1] https://flink.apache.org/roadmap

[2] https://lists.apache.org/thread/szdr4ngrfcmo7zko4917393zbqhgw0v5
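
For context, a minimal sketch of reading keyed state offline with the State
Processor API; the savepoint path, operator uid, and state name are
assumptions for illustration:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.state.api.SavepointReader;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class ReadCountsFromSavepoint {
    static class CountReader extends KeyedStateReaderFunction<String, String> {
        ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<String> out) throws Exception {
            out.collect(key + "=" + count.value());
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        SavepointReader.read(env, "file:///tmp/savepoint-xyz", new HashMapStateBackend())
                .readKeyedState("counter-uid", new CountReader())
                .print();
        env.execute();
    }
}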



On Mon, Jul 17, 2023 at 9:53 AM Xintong Song  wrote:

> I'd propose to downgrade "Refactor the API modules" to TBD. The original
> proposal was based on the condition that we are allowed to introduce
> in-place API breaking changes in release 2.0. As the migration period is
> introduced, and we are no longer planning to do in-place changes /
> removal for DataStream (and same for APIs in `flink-core`), we need to
> re-evaluate whether it's feasible to do things like moving classes to
> different modules / packages, or turning concrete classes in the API surface
> into interfaces.
>
> Best,
>
> Xintong
>
>
>
> On Mon, Jul 17, 2023 at 1:10 AM Yun Tang  wrote:
>
>> I agree that we could downgrade "Eager state declaration" to a
>> nice-to-have feature.
>>
>> For the deprecation of "queryable state", can we just rephrase it to
>> deprecate the "current implementation of queryable state"? The feature to query
>> the internal state is actually very useful for debugging and could provide
>> more possibilities to extend Flink SQL to be more like a database.
>>
>> Just as Yuan replied in the previous email [1], the current implementation of
>> queryable state has many design problems. However, I don't want to make
>> users feel that this feature cannot be done well; maybe we can redesign
>> this feature. As far as I know, RisingWave already supports queryable state
>> with a better user experience [2].
>>
>>
>> [1] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m
>> [2] https://syntaxbug.com/06a3e7c554/
>>
>> Best
>> Yun Tang
>> 
>> From: Xintong Song 
>> Sent: Friday, July 14, 2023 13:51
>> To: dev@flink.apache.org 
>> Subject: Re: [VOTE] Release 2.0 must-have work items
>>
>> Thanks for the support, Yu.
>>
>> We will have the guideline before removing DataSet. We are currently
>> prioritizing work that needs to be done before the 1.18 feature freeze,
>> and
>> will soon get back to working on the guidelines. We expect to get the
>> guideline ready before or soon after the 1.18 release, which will
>> definitely be before removing DataSet in 2.0.
>>
>> Best,
>>
>> Xintong
>>
>>
>>
>> On Fri, Jul 14, 2023 at 1:06 PM Yu Li  wrote:
>>
>> > It's great to see the discussion about what we need to improve on
>> > (completely) switching from DataSet API to DataStream API from the user
>> > perspective. I feel that these improvements would happen faster (only)
>> when
>> > we seriously prepare to remove the DataSet APIs with a target release,
>> just
>> > like what we are doing now. And the same applies to the SinkV1 related
>> > discussions (smile).
>> >
>> > I support Xintong's opinion on keeping "Remove the DataSet APIs" a
>> > must-have item, meantime I support Yuxia's opinion that we should
>> > explicitly let our users know how to migrate their existing DataSet API
>> > based applications afterwards, meaning that the guideline Xintong
>> mentioned
>> > is a must-have (rather than best efforts) before removing the DataSet
>> APIs.
>> >
>> > Best Regards,
>> > Yu
>> >
>> >
>> > On Wed, 12 Jul 2023 at 14:00, yuxia 
>> wrote:
>> >
>> > > Thanks Xintong for clarification. A guideline to help users migrating
>> > from
>> > > DataSet to DataStream will definitely be helpful.
>> > >
>> > > Best regards,
>> > > Yuxia
>> > >
>> > > - Original Message -
>> > > From: "Xintong Song" 
>> > > To: "dev" 
>> > > Sent: Wednesday, July 12, 2023, 11:40:12 AM
>> > > Subject: Re: [VOTE] Release 2.0 must-have work items
>> > >
>> > > @Yuxia,
>> > >
>> > > We are aware of the issue that you mentioned. Actually, I don't think
>> the
>> > > DataStream API can cover everything in the DataSet API in exactly the
>> > same
>> > > way, because the fundamental model, concepts and primitives of the two
>> > sets
>> > > of APIs are completely different. Many of the DataSet APIs, especially
>> > > those accessing the full data set at once, do not fit in the
>> DataStream
>> > > concepts at all. I think what's important is that users can achieve
>> the
>> > > same function, ev

[jira] [Created] (FLINK-32595) Kinesis connector doc shows wrong deserialization schema version

2023-07-16 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-32595:
-

 Summary: Kinesis connector doc shows wrong deserialization schema 
version
 Key: FLINK-32595
 URL: https://issues.apache.org/jira/browse/FLINK-32595
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: aws-connector-4.2.0
Reporter: Yuxin Tan
Assignee: Yuxin Tan


[https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kinesis/]

GlueSchemaRegistryJsonDeserializationSchema and 
GlueSchemaRegistryAvroDeserializationSchema show the wrong version (the Flink 
version), but they have been moved to the aws-connector repository, 
so we should fix the version number.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Update Flink Roadmap

2023-07-16 Thread Jark Wu
Hi Jiabao,

Thank you for your suggestions. I have added them to the "Going Beyond a
SQL Stream/Batch Processing Engine" and "Large-Scale State Jobs" sections.

Best,
Jark

On Thu, 13 Jul 2023 at 16:06, Jiabao Sun 
wrote:

> Thanks Jark and Martijn for driving this.
>
> There are two suggestions about the Table API:
>
> - Add the JSON type to adapt to NoSQL database types.
> - Remove the changelog normalize operator for upsert streams.
>
>
> Best,
> Jiabao
>
>
> > On July 13, 2023 at 3:49 PM, Jark Wu  wrote:
> >
> > Hi all,
> >
> > Sorry for taking so long to get back here.
> >
> > Martijn and I have drafted the first version of the updated roadmap,
> > including the updated feature radar reflecting the current state of
> > different components.
> >
> https://docs.google.com/document/d/12BDiVKEsY-f7HI3suO_IxwzCmR04QcVqLarXgyJAb7c/edit
> >
> > Feel free to leave comments in the thread or the document.
> > We may miss mentioning something important, so your help in enriching
> > the content is greatly appreciated.
> >
> > Best,
> > Jark & Martijn
> >
> >
> > On Fri, 2 Jun 2023 at 00:50, Jing Ge  wrote:
> >
> >> Hi Jark,
> >>
> >> Fair enough. Let's do it like you suggested. Thanks!
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Thu, Jun 1, 2023 at 6:00 PM Jark Wu  wrote:
> >>
> >>> Hi Jing,
> >>>
> >>> This thread is for discussing the roadmap for versions 1.18, 2.0, and
> >> even
> >>> more.
> >>> One of the outcomes of this discussion will be an updated version of
> the
> >>> current roadmap.
> >>> Let's work together on refining the roadmap in this thread.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Thu, 1 Jun 2023 at 23:25, Jing Ge 
> wrote:
> >>>
>  Hi Jark,
> 
>  Thanks for driving it! For point 2, since we are developing 1.18 now,
>  does it make sense to update the roadmap this time while we are
> >> releasing
>  1.18? This discussion thread will be focusing on the Flink 2.0
> roadmap,
> >>> as
>  you mentioned previously. WDYT?
> 
>  Best regards,
>  Jing
> 
>  On Thu, Jun 1, 2023 at 3:31 PM Jark Wu  wrote:
> 
> > Hi all,
> >
> > Martijn and I would like to initiate a discussion on the Flink
> >> roadmap,
> > which should cover the project's long-term roadmap and the regular
> >>> update
> > mechanism.
> >
> > Xintong has already started a discussion about Flink 2.0 planning.
> >> One
> >>> of
> > the points raised in that discussion is that we should have a
> >>> high-level
> > discussion of the roadmap to present where the project is heading
> >>> (which
> > doesn't necessarily need to block the Flink 2.0 planning). Moreover,
> >>> the
> > roadmap on the Flink website [1] hasn't been updated for half a year,
> >>> and
> > the last update was for the feature radar for the 1.15 release. It
> >> has
>  been
> > 2 years since the community discussed Flink's overall roadmap.
> >
> > I would like to raise two topics for discussion:
> >
> > 1. The new roadmap. This should be an updated version of the current
> > roadmap[1].
> > 2. A mechanism to regularly discuss and update the roadmap.
> >
> > To make the first topic discussion more efficient, Martijn and I
>  volunteer
> > to summarize the ongoing big things of different components and
> >>> present a
> > roadmap draft to the community in the next few weeks. This should be
> >> a
>  good
> > starting point for a more detailed discussion.
> >
> > Regarding the regular update mechanism, there was a proposal in a
> >>> thread
> > [2] three years ago to make the release manager responsible for
> >>> updating
> > the roadmap. However, it appears that this was not documented as a
>  release
> > management task [3], and the roadmap update wasn't performed for
> >>> releases
> > 1.16 and 1.17.
> >
> > In my opinion, making release managers responsible for keeping the
>  roadmap
> > up to date is a good idea. Specifically, release managers of release
> >> X
>  can
> > kick off the roadmap update at the beginning of release X, which can
> >>> be a
> > joint task with collecting a feature list [4]. Additionally, release
> > managers of release X-1 can help verify and remove the accomplished
> >>> items
> > from the roadmap and update the feature radar.
> >
> > What do you think? Do you have other ideas?
> >
> > Best,
> > Jark & Martijn
> >
> > [1]: https://flink.apache.org/roadmap.html
> > [2]:
> >> https://lists.apache.org/thread/o0l3cg6yphxwrww0k7215jgtw3yfoybv
> > [3]:
> >
> 
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+Management
> > [4]: https://cwiki.apache.org/confluence/display/FLINK/1.18+Release
> >
> 
> >>>
> >>
>
>


[jira] [Created] (FLINK-32596) The partition key will be wrong when using Flink dialect to create a Hive table

2023-07-16 Thread luoyuxia (Jira)
luoyuxia created FLINK-32596:


 Summary: The partition key will be wrong when using Flink dialect to 
create a Hive table
 Key: FLINK-32596
 URL: https://issues.apache.org/jira/browse/FLINK-32596
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.17.0, 1.16.0, 1.15.0
Reporter: luoyuxia


Can be reproduced by the following SQL:

 
{code:java}
tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
tableEnv.executeSql(
"create table t1(`date` string, `geo_altitude` FLOAT) partitioned by 
(`date`)"
+ " with ('connector' = 'hive', 
'sink.partition-commit.delay'='1 s',  
'sink.partition-commit.policy.kind'='metastore,success-file')");
CatalogTable catalogTable =
(CatalogTable) 
hiveCatalog.getTable(ObjectPath.fromString("default.t1"));

// the following assertion will fail
assertThat(catalogTable.getPartitionKeys().toString()).isEqualTo("[date]");{code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Issue with flink 1.16 and hive dialect

2023-07-16 Thread yuxia
Hi, Ram. 
Thanks for reaching out. 
1: 
About the Hive dialect issue: maybe you're using JDK 11? 
There's a known issue in FLINK-27450 [1]. The main reason is that Hive doesn't 
fully support JDK 11. More specific to your case, it has been tracked in 
HIVE-21584 [2]. 
Flink has upgraded the Hive 2.x version to 2.3.9 to include this patch, but 
unfortunately, IIRC, this patch is still not available in Hive 3.x. 

2: 
About the table creation issue, thanks for reporting it. I tried it and it 
turns out to be a bug. I have created FLINK-32596 [3] to track it. 
It only happens with the Flink dialect & partitioned tables & the Hive Catalog. 
In most cases, we recommend users use the Hive dialect to create Hive tables, 
so we missed the test covering creating a partitioned table with the Flink 
dialect in the Hive Catalog. That's why this bug has been hidden for a while. 
For your case, as a workaround, I think you can try to create the table in 
Hive itself with the following SQL: 
CREATE TABLE testsource( 
`geo_altitude` FLOAT 
) 
PARTITIONED by ( `date` STRING) tblproperties ( 
'sink.partition-commit.delay'='1 s', 
'sink.partition-commit.policy.kind'='metastore,success-file'); 


[1] https://issues.apache.org/jira/browse/FLINK-27450 
[2] https://issues.apache.org/jira/browse/HIVE-21584 
[3] https://issues.apache.org/jira/browse/FLINK-32596 


Best regards, 
Yuxia 


From: "ramkrishna vasudevan"  
To: "User" , "dev"  
Sent: Friday, July 14, 2023, 8:46:20 PM 
Subject: Issue with flink 1.16 and hive dialect 

Hi All, 
I am not sure if this was already discussed in this forum. 
In our setup with Flink 1.16.0 we have ensured that the setup has all the 
necessary things for the Hive catalog to work. 

The Flink dialect works fine functionally (with some issues; I will come to 
that later). 

But when I follow the steps here in 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/hive-compatibility/hive-dialect/queries/overview/#examples 
I am getting an exception once I set the dialect to Hive: 
at org.apache.flink.table.client.SqlClient.main(SqlClient.java:161) 
[flink-sql-client-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT] 
Caused by: java.lang.ClassCastException: class 
jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
java.net.URLClassLoader are in module java.base of loader 'bootstrap') 
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:413) 
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
 
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:389) 
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
 
at 
org.apache.flink.table.planner.delegation.hive.HiveSessionState.<init>(HiveSessionState.java:80)
 
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
 
at 
org.apache.flink.table.planner.delegation.hive.HiveSessionState.startSessionState(HiveSessionState.java:128)
 
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
 
at 
org.apache.flink.table.planner.delegation.hive.HiveParser.parse(HiveParser.java:210)
 
~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
 
at 
org.apache.flink.table.client.gateway.local.LocalExecutor.parseStatement(LocalExecutor.java:172)
 ~[flink-sql-client-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT] 

I have ensured the dialect-related steps were followed, including 
https://issues.apache.org/jira/browse/FLINK-25128 

In the flink catalog - if we create a table 
> CREATE TABLE testsource( 
> 
> `date` STRING, 
> `geo_altitude` FLOAT 
> ) 
> PARTITIONED by ( `date`) 
> 
> WITH ( 
> 
> 'connector' = 'hive', 
> 'sink.partition-commit.delay'='1 s', 
> 'sink.partition-commit.policy.kind'='metastore,success-file' 
> ); 

The partition always gets created on the last set of columns and not on the 
columns that we specify. Is this a known bug? 

Regards 
Ram 



Re: [DISCUSS][2.0] FLIP-336: Remove "now" timestamp field from REST responses

2023-07-16 Thread Xintong Song
+1

Best,

Xintong



On Thu, Jul 13, 2023 at 9:05 PM Chesnay Schepler  wrote:

> Hello,
>
> Several REST responses contain a timestamp field of the current time.
>
>   There is no known use-case for said timestamp; it makes caching of
> responses technically sketchy (since the response differs on each call),
> and it complicates testing since the timestamp field can't be easily
> predicted.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263424789
>
>
> Regards,
>
> Chesnay
>
>
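
A small illustration of the caching problem described above; /jobs/:jobid is
assumed here to be one of the affected responses, and java.net.http is used
for brevity:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NowFieldSketch {
    public static void main(String[] args) throws Exception {
        String jobId = "00000000000000000000000000000000"; // placeholder job id
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(
                URI.create("http://localhost:8081/jobs/" + jobId)).GET().build();

        String first = http.send(req, HttpResponse.BodyHandlers.ofString()).body();
        String second = http.send(req, HttpResponse.BodyHandlers.ofString()).body();

        // false solely because each body embeds a timestamp taken at response time,
        // which is what makes naive response caching unsound
        System.out.println(first.equals(second));
    }
}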


Re: [DISCUSS][2.0] FLIP-338: Remove terminationMode query parameter from job cancellation REST endpoint

2023-07-16 Thread Xintong Song
+1

Best,

Xintong



On Thu, Jul 13, 2023 at 9:41 PM Chesnay Schepler  wrote:

> Hello,
>
> The job cancellation REST endpoint has a terminationMode query
> parameter, which in the past could be set to either CANCEL or STOP, but
> nowadays the job stop endpoint has subsumed the STOP functionality.
>
> Since then, the cancel endpoint has rejected requests that specify STOP.
>
> I propose to finally remove this parameter, as it currently serves no
> function.
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-338%3A+Remove+terminationMode+query+parameter+from+job+cancellation+REST+endpoint
>
>
>
> Regards,
>
> Chesnay
>
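
For reference, a sketch of what a cancellation call looks like while the
parameter is still in place; the query parameter is assumed to be exposed as
"mode", and the host and job id are placeholders:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CancelJobSketch {
    public static void main(String[] args) throws Exception {
        String jobId = "00000000000000000000000000000000"; // placeholder job id

        // today: PATCH /jobs/:jobid?mode=cancel — "mode" carries the terminationMode;
        // after the removal, a plain PATCH /jobs/:jobid would be the only form
        HttpRequest cancel = HttpRequest.newBuilder(
                        URI.create("http://localhost:8081/jobs/" + jobId + "?mode=cancel"))
                .method("PATCH", HttpRequest.BodyPublishers.noBody())
                .build();
        int status = HttpClient.newHttpClient()
                .send(cancel, HttpResponse.BodyHandlers.ofString())
                .statusCode();
        System.out.println(status);
    }
}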


Re: [DISCUSS][2.0] FLIP-337: Remove JarRequestBody#programArgs

2023-07-16 Thread Xintong Song
+1

Best,

Xintong



On Thu, Jul 13, 2023 at 9:34 PM Chesnay Schepler  wrote:

> Hello,
>
> The request body for the jar run/plan REST endpoints accepts program
> arguments as a string (programArgs) or a list of strings
> (programArgsList). The latter was introduced because we kept running into
> issues with splitting the string into individual arguments.
>
> Ideally we'd force users to use the list argument, and we could simplify the
> codebase if there were only one way to pass arguments.
>
> As such I propose to remove the programArgs field from the request body.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263424796
>
>
> Regards,
>
> Chesnay
>
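
A sketch of the two request-body variants for a jar run call; the jar id,
paths and argument values are made up for illustration:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JarRunSketch {
    public static void main(String[] args) throws Exception {
        String jarId = "example.jar"; // placeholder id as returned by the jar upload endpoint

        // the form the FLIP removes: one string the server must split into arguments
        String stringForm = "{\"programArgs\": \"--input /data/in --output /data/out\"}";
        // the form the FLIP keeps: arguments already split by the client
        String listForm =
                "{\"programArgsList\": [\"--input\", \"/data/in\", \"--output\", \"/data/out\"]}";

        HttpRequest run = HttpRequest.newBuilder(
                        URI.create("http://localhost:8081/jars/" + jarId + "/run"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(listForm))
                .build();
        System.out.println(HttpClient.newHttpClient()
                .send(run, HttpResponse.BodyHandlers.ofString()).body());
    }
}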


Re: [DISCUSS] FLIP-335: Removing Flink's Time classes as part of Flink 2.0

2023-07-16 Thread Xintong Song
Thanks for creating this FLIP, Matthias. +1 from my side.

BTW, it's impressive how this FLIP clearly states the proposed API changes
and impacts on other classes / APIs, which makes it much easier for others
to evaluate the proposal. Nice work~!

Best,

Xintong



On Thu, Jul 13, 2023 at 9:11 PM Jing Ge  wrote:

> yeah, saw that thread. Thanks for starting the discussion!
>
> Best regards,
> Jing
>
> On Thu, Jul 13, 2023 at 2:15 PM Matthias Pohl
>  wrote:
>
> > Thanks Jing. About your question regarding FLIPs: The FLIP is necessary
> > when these classes touch user-facing API (which should be the case if
> we're
> > preparing it for 2.0). I created a dedicated discussion thread [1] to
> > discuss this topic because it's more general and independent of FLIP-335.
> >
> > Best,
> > Matthias
> >
> > [1] https://lists.apache.org/thread/1007v4f4ms7ftp1qtkjsq25s5lwmk9wo
> >
> > On Thu, Jul 13, 2023 at 12:21 PM Jing Ge 
> > wrote:
> >
> > > Hi Matthias,
> > >
> > > Thanks for raising this up! I saw your conversation with Chesnay in the
> > PR
> > > and after reading the historical discussions, there should be no doubt
> to
> > > deprecate Time classes. +1
> > >
> > > I just have a related question: Do we need to create a FLIP each time
> > when
> > > we want to deprecate any classes? (BTW, FLIP-335 contains useful
> > > information. It makes sense to write it)
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Thu, Jul 13, 2023 at 12:09 PM Matthias Pohl
> > >  wrote:
> > >
> > > > The 2.0 feature list includes the removal of Flink's Time classes in
> > > favor
> > > > of the JDK's java.time.Duration class. There was already a discussion
> > > about
> > > > it in [1] and FLINK-14068 [2] was created as a consequence of this
> > > > discussion.
> > > >
> > > > I started working on marking the APIs as deprecated in FLINK-32570
> [3]
> > > > where Chesnay raised a fair point that there isn't a FLIP, yet, to
> > > > formalize this public API change. Therefore, I went ahead and created
> > > > FLIP-335 [4] to have this change properly documented.
> > > >
> > > > I'm not 100% sure whether there are better ways of checking whether
> > we're
> > > > covering everything Public API-related. There are even classes which
> I
> > > > think might be user-facing but are not labeled accordingly (e.g.
> > > > flink-cep). But I don't have the proper knowledge in these parts of
> the
> > > > code. Therefore, I would propose marking these methods as deprecated,
> > > > anyway, to be on the safe side.
> > > >
> > > > I'm open to any suggestions on improving the Test Plan of this
> change.
> > > >
> > > > I'm looking forward to feedback on this FLIP.
> > > >
> > > > Best,
> > > > Matthias
> > > >
> > > > [1] https://lists.apache.org/thread/76yywnwf3lk8qn4dby0vz7yoqx7f7pkj
> > > > [2] https://issues.apache.org/jira/browse/FLINK-14068
> > > > [3] https://issues.apache.org/jira/browse/FLINK-32570
> > > > [4]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-335%3A+Removing+Flink%27s+Time+classes
> > > >
> > >
> >
>


Re: [DISCUSS] Release 2.0 Work Items

2023-07-16 Thread Xintong Song
Hi Matthias,

How's it going with the summary of existing 2.0.0 jira tickets?

I have gone through everything listed under FLINK-3957[1], and will
continue with other Jira tickets whose fix-version is 2.0.0.

Here are my 2 cents on the FLINK-3957 subtasks. Hope this helps with your
summary.

I'd suggest going ahead with the following tickets.

   - Need action in 1.18
   - FLINK-4675: Double-check whether the argument is indeed not used.
  Introduce a new no-argument API, and mark the original one as
  `@Deprecated`. FLIP needed.
  - FLINK-6912: Double-check whether the argument is indeed not used.
  Introduce a new no-argument API, and mark the original one as
  `@Deprecated`. FLIP needed.
  - FLINK-5336: Double-check whether `IOReadableWritable` is indeed not
  needed for `Path`. Mark methods from `IOReadableWritable` as
`@Deprecated`
  in `Path`. FLIP needed.
   - Need no action in 1.18
  - FLINK-4602/14068: Already listed in the release 2.0 wiki [2]
  - FLINK-3986/3991/3992/4367/5130/7691: Subsumed by "Deprecated
  methods/fields/classes in DataStream" in the release 2.0 wiki [2]
  - FLINK-6375: Change the hashCode behavior of `LongValue` (and other
  numeric types).

I'd suggest not doing the following tickets.

   - FLINK-4147/4330/9529/14658: These changes are non-trivial for both
    developers and users. Also, we are taking them into consideration when designing
   the new ProcessFunction API. I'd be in favor of letting users migrate to
   the ProcessFunction API directly once it's ready, rather than forcing users
   to adapt to the breaking changes twice.
   - FLINK-3610: Only affects Scala API, which will soon be removed.

I don't have strong opinions on whether to work on the following tickets or
not. Some of them are not very clear to me based on the description and
conversation on the ticket; others may require further investigation and
evaluation to decide. Unless someone volunteers to look into them, I'd be
slightly in favor of not doing them, as I'm not aware of them causing any
serious problems.

   - FLINK-3959 Remove implicit Sinks
   - FLINK-4757 Unify "GlobalJobParameters" and "Configuration"
   - FLINK-4758 Remove IOReadableWritable from classes where not needed
   - FLINK-4971 Unify Stream Sinks and OutputFormats
   - FLINK-5126 Remove Checked Exceptions from State Interfaces
   - FLINK-5337 Introduce backwards compatible state to task assignment
   - FLINK-5346 Remove all ad-hoc config loading via GlobalConfiguration
   - FLINK-5875 Use TypeComparator.hash() instead of Object.hashCode() for
   keying in DataStream API
   - FLINK-9798 Drop canEqual() from TypeInformation, TypeSerializer, etc.
   - FLINK-13926 `ProcessingTimeSessionWindows` and
   `EventTimeSessionWindows` should be generic


WDYT?

Best,

Xintong


[1] https://issues.apache.org/jira/browse/FLINK-3957

[2] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release



On Thu, Jul 13, 2023 at 10:31 AM li zhiqiang 
wrote:

> @Xingtong
> I already have an idea of the modifications to some APIs, but because there
> are many changes involved,
> I am afraid that my consideration is not comprehensive.
> I'm willing to do the work, but I haven't found a committer yet.
>
> Best,
> Zhiqiang
>
> From: Xintong Song 
> Date: Thursday, July 13, 2023, 10:03
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] Release 2.0 Work Items
> Thanks for the inputs, Zhiqiang and Jiabao.
>
> @Zhiqiang,
> The proposal sounds interesting. Do you already have an idea what API
> changes are needed in order to make the connectors pluggable? I think
> whether this should go into Flink 2.0 would significantly depend on what
> API changes are needed. Moreover, would you like to work on this effort or
> simply raise a need? And if you'd like to work on this, do you already find
> a committer who can help on this?
>
> @Jiabao,
> Thanks for the suggestions. I agree that it would be nice to improve the
> experiences in deploying Flink instances and submitting tasks. It would be
> helpful if you can point out the specific behaviors that make integrating
> Flink in your production difficult. Also, I'd like to understand how this
> topic is related to the Release 2.0 topic. Or asked differently, is this
> something that requires breaking changes that can only happen in major
> version bumps, or is it just improvement that can go into any minor
> version?
>
>
> Best,
>
> Xintong
>
>
>
> On Thu, Jul 13, 2023 at 12:49 AM Jiabao Sun  .invalid>
> wrote:
>
> > Thanks Xintong for driving the effort.
> >
> >
> > I’d add a +1 to improving the out-of-the-box user experience, as suggested
> > by @Jark and @Chesnay.
> > For beginners, understanding complex configurations is hard work.
> >
> > In addition, deploying a Flink runtime environment is also
> > a complex matter.
> > At present, there are still big differences in task submission across
> > different computing resources. If users need time for their own data
> > development plat

[jira] [Created] (FLINK-32597) Drop YARN-specific GET REST endpoints

2023-07-16 Thread Zhenqiu Huang (Jira)
Zhenqiu Huang created FLINK-32597:
-

 Summary: Drop YARN-specific GET REST endpoints 
 Key: FLINK-32597
 URL: https://issues.apache.org/jira/browse/FLINK-32597
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Affects Versions: 1.18.0
Reporter: Zhenqiu Huang


As listed in the 2.0 release wiki, we need to drop the YARN-specific mutating GET 
REST endpoints (yarn-cancel, yarn-stop).
We shouldn't continue having such hacks in our APIs to work around YARN 
deficiencies.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Issue with flink 1.16 and hive dialect

2023-07-16 Thread ramkrishna vasudevan
Thanks a lot @yuxia for your time. We will
try to backport https://issues.apache.org/jira/browse/FLINK-27450 and see
if that works. We will also keep watching for your fix for the default-dialect
table creation issue.

Regards
Ram

On Mon, Jul 17, 2023 at 8:08 AM yuxia  wrote:

> Hi, Ram.
> Thanks for reaching out.
> 1:
> About the Hive dialect issue: maybe you're using JDK 11?
> There's a known issue in FLINK-27450 [1]. The main reason is that Hive
> doesn't fully support JDK 11. More specific to your case, it has been
> tracked in HIVE-21584 [2].
> Flink has upgraded the Hive 2.x version to 2.3.9 to include this patch, but
> unfortunately, IIRC, this patch is still not available in Hive 3.x.
>
> 2:
> About the table creation issue, thanks for reporting it. I tried it and it
> turns out to be a bug. I have created FLINK-32596 [3] to track it.
> It only happens with the Flink dialect & partitioned tables & the Hive Catalog.
> In most cases, we recommend users use the Hive dialect to create Hive
> tables, so we missed the test covering creating a partitioned table with the
> Flink dialect in the Hive Catalog. That's why this bug has been hidden for a while.
> For your case, as a workaround, I think you can try to create the table
> in Hive itself with the following SQL:
>  CREATE TABLE testsource(
>  `geo_altitude` FLOAT
> )
> PARTITIONED by ( `date` STRING) tblproperties (
>  'sink.partition-commit.delay'='1 s',
>  'sink.partition-commit.policy.kind'='metastore,success-file');
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-27450
> [2] https://issues.apache.org/jira/browse/HIVE-21584
> [3] https://issues.apache.org/jira/browse/FLINK-32596
>
>
> Best regards,
> Yuxia
>
> --
> *From: *"ramkrishna vasudevan" 
> *To: *"User" , "dev" 
> *Sent: *Friday, July 14, 2023, 8:46:20 PM
> *Subject: *Issue with flink 1.16 and hive dialect
>
> Hi All,
> I am not sure if this was already discussed in this forum.
> In our set up with 1.16.0 flink we have ensured that the setup has all the
> necessary things for Hive catalog to work.
>
> The Flink dialect works fine functionally (with some issues; I will come to
> that later).
>
> But when i follow the steps here in
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/hive-compatibility/hive-dialect/queries/overview/#examples
> I am getting an exception once I set the dialect to Hive:
> at
> org.apache.flink.table.client.SqlClient.main(SqlClient.java:161)
> [flink-sql-client-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
> Caused by: java.lang.ClassCastException: class
> jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class
> java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader
> and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
> at
> org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:413)
> ~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:389)
> ~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
> at
> org.apache.flink.table.planner.delegation.hive.HiveSessionState.<init>(HiveSessionState.java:80)
> ~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
> at
> org.apache.flink.table.planner.delegation.hive.HiveSessionState.startSessionState(HiveSessionState.java:128)
> ~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
> at
> org.apache.flink.table.planner.delegation.hive.HiveParser.parse(HiveParser.java:210)
> ~[flink-sql-connector-hive-3.1.2_2.12-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
> at
> org.apache.flink.table.client.gateway.local.LocalExecutor.parseStatement(LocalExecutor.java:172)
> ~[flink-sql-client-1.16.0-0.0-SNAPSHOT.jar:1.16.0-0.0-SNAPSHOT]
>
> I have ensured the dialect-related steps were followed, including
> https://issues.apache.org/jira/browse/FLINK-25128
>
> In the flink catalog - if we create a table
> > CREATE TABLE testsource(
> >
> >  `date` STRING,
> >  `geo_altitude` FLOAT
> > )
> > PARTITIONED by ( `date`)
> >
> > WITH (
> >
> > 'connector' = 'hive',
> > 'sink.partition-commit.delay'='1 s',
> > 'sink.partition-commit.policy.kind'='metastore,success-file'
> > );
>
> The partition always gets created on the last set of columns and not on the
> columns that we specify. Is this a known bug?
>
> Regards
> Ram
>
>


Re: [DISCUSS] FLIP-330: Support specifying record timestamp requirement

2023-07-16 Thread Matt Wang
Hi Yunfeng,


Thank you for testing item 1 again; I look forward to the TPC-DS performance 
results later.

We use latency markers to monitor the end-to-end latency of Flink jobs. If the 
latencyTrackingInterval is set too small (like 5 ms), it has a large impact 
on performance, but if the latencyTrackingInterval is configured to be 
relatively large, such as 10 s, the impact can be ignored.
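
The knob in question, as a sketch; the interval is in milliseconds, and the
equivalent config option should be metrics.latency.interval:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LatencyTrackingSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // emit latency markers every 10 s instead of every few ms, which keeps the
        // end-to-end latency metric while making the overhead negligible
        env.getConfig().setLatencyTrackingInterval(10_000L);
    }
}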



--

Best,
Matt Wang


 Replied Message 
| From | Yunfeng Zhou |
| Date | 07/14/2023 20:30 |
| To |  |
| Subject | Re: [DISCUSS] FLIP-330: Support specifying record timestamp 
requirement |
Hi Matt,

1. I tried to add the tag serialization process back to my POC
code and ran the benchmark again; this time the performance
improvement is roughly reduced by half. It seems that both the
serialization and the judgement process contribute significantly to
the overhead reduction in this specific scenario, but in a production
environment where a distributed cluster is deployed, I believe the
reduction in serialization would be the bigger source of
performance improvement.

2. According to the latency-tracking section in the Flink documentation [1], it
seems that users would only enable latency markers for debugging
purposes, instead of using them in production code. Could you please
elaborate a bit more on the scenarios that would be limited when
latency markers are disabled?

3. I plan to benchmark the performance of this POC against TPC-DS and
hope that it could cover the common use cases that you are concerned
about. I believe there would still be performance improvement when the
size of each StreamRecord increases, though the improvement will not
be as obvious as that currently in FLIP.

Best regards,
Yunfeng

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#end-to-end-latency-tracking

On Thu, Jul 13, 2023 at 5:51 PM Matt Wang  wrote:

Hi Yunfeng,

Thanks for the proposal. The POC showed a performance improvement of 20%, which 
is very exciting. But I have some questions:
1. Is the performance improvement here mainly due to the reduction in 
serialization, or to the cost of the per-record tag checks?
2. Watermarks are not needed in some scenarios, but the latency marker is a 
useful feature. If the latency marker cannot be used, it will greatly limit the 
usage scenarios. Can the solution design retain the capability of the latency 
marker?
3. The POC test data is of long type. I would like to see how much benefit 
there would be for strings with a length of 100 B or 1 KB.


--

Best,
Matt Wang


 Replied Message 
| From | Yunfeng Zhou |
| Date | 07/13/2023 14:52 |
| To |  |
| Subject | Re: [DISCUSS] FLIP-330: Support specifying record timestamp 
requirement |
Hi Jing,

Thanks for reviewing this FLIP.

1. I did change the names of some APIs in the FLIP compared with the
original version against which I implemented the POC. As the core
optimization logic remains the same and the POC's performance can
still reflect the current FLIP's expected improvement, I have not
updated the POC code after that. I'll add a note to the benchmark
section of the FLIP saying that the names in the POC code might be
outdated and that the FLIP is still the source of truth for our proposed
design.

2. This FLIP could bring a fixed reduction in the workload of the
per-record serialization path in Flink, so the lower the absolute time
spent in the non-optimized components, the more visible the performance
improvement of this FLIP becomes. That's why I chose to
enable object reuse and to transmit Boolean values in serialization.
If it would be more widely regarded as acceptable for a benchmark to
adopt more commonly-applied behavior (for object reuse, I believe
disabled is more common), I would be glad to update the benchmark
results with object reuse disabled.
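
For reference, the benchmark setting in question, as a sketch; enabling it
lets chained operators skip defensive copies of records:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ObjectReuseSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // the POC benchmark ran with this enabled; disabled is the more common default
        env.getConfig().enableObjectReuse();
    }
}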

Best regards,
Yunfeng


On Thu, Jul 13, 2023 at 6:37 AM Jing Ge  wrote:

Hi Yunfeng,

Thanks for the proposal. It makes sense to offer the optimization. I got
some NIT questions.

1. I guess you changed your thoughts while coding the POC: I found
pipeline.enable-operator-timestamp in the code, but
pipeline.force-timestamp-support is defined in the FLIP.
2. About the benchmark example: why did you enable object reuse? Since it
is a serde optimization, would the benchmark be more representative with it
disabled?

Best regards,
Jing

On Mon, Jul 10, 2023 at 11:54 AM Yunfeng Zhou 
wrote:

Hi all,

Dong (cc'ed) and I are opening this thread to discuss our proposal to
support optimizing StreamRecord's serialization performance.

Currently, a StreamRecord would be converted into a 1-byte tag (+
8-byte timestamp) + N-byte serialized value during the serialization
process. In scenarios where timestamps and watermarks are not needed
and latency tracking is disabled, this process would include
unnecessary information in the serialized byte array. This FLIP aims
to avoid such overhead and increase a Flink job's performance during
serialization.
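
A sketch of the on-wire layout described above, with simplified names — an
illustration of the scheme rather than the actual StreamElementSerializer code:

import java.io.DataOutputStream;
import java.io.IOException;

public class RecordLayoutSketch<T> {
    private static final int TAG_REC_WITH_TIMESTAMP = 0;
    private static final int TAG_REC_WITHOUT_TIMESTAMP = 1;

    interface ValueSerializer<T> {
        void serialize(T value, DataOutputStream out) throws IOException;
    }

    private final ValueSerializer<T> valueSerializer;

    RecordLayoutSketch(ValueSerializer<T> valueSerializer) {
        this.valueSerializer = valueSerializer;
    }

    void serialize(T value, long timestamp, boolean hasTimestamp, DataOutputStream out)
            throws IOException {
        if (hasTimestamp) {
            out.write(TAG_REC_WITH_TIMESTAMP);    // 1-byte tag
            out.writeLong(timestamp);             // 8-byte timestamp
        } else {
            out.write(TAG_REC_WITHOUT_TIMESTAMP); // 1-byte tag, no timestamp
        }
        valueSerializer.serialize(value, out);    // N-byte serialized value
        // FLIP-330 proposes dropping the tag (and timestamp) entirely when the
        // pipeline declares that record timestamps are not needed
    }
}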

Ple