[jira] [Created] (FLINK-33035) Add Transformer and Estimator for Als

2023-09-05 Thread weibo zhao (Jira)
weibo zhao created FLINK-33035:
--

 Summary: Add Transformer and Estimator for Als
 Key: FLINK-33035
 URL: https://issues.apache.org/jira/browse/FLINK-33035
 Project: Flink
  Issue Type: New Feature
Reporter: weibo zhao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33036) Add Transformer and Estimator for Als

2023-09-05 Thread weibo zhao (Jira)
weibo zhao created FLINK-33036:
--

 Summary: Add Transformer and Estimator for Als
 Key: FLINK-33036
 URL: https://issues.apache.org/jira/browse/FLINK-33036
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Reporter: weibo zhao


Add Transformer and Estimator for Als



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-358: flink-avro enhancement and cleanup

2023-09-05 Thread Becket Qin
Hi Jing,

Thanks for the comments.

1. "For the batch cases, currently the BulkFormat for DataStream is
> missing" - true, and there is another option to leverage
> StreamFormatAdapter[1]
>
StreamFormatAdapter is internal and it requires a StreamFormat
implementation for Avro files which does not exist either.

2. "The following two interfaces should probably be marked as Public for
> now and Deprecated once we deprecate the InputFormat / OutputFormat" -
> would you like to share some background info of the deprecation of the
> InputFormat / OutputFormat? It is for me a little bit weird to mark APIs as
> public that are now known to be deprecated.

InputFormat and OutputFormat are legacy APIs for SourceFunction and
SinkFunction. So when the SourceFunction and SinkFunction are deprecated,
the InputFormat and OutputFormat should also be deprecated accordingly. As
of now, technically speaking, we have not deprecated these two APIs. So,
making them public for now is just to fix the stability annotation because
they are already used publicly by the users.

3. "Remove the PublicEvolving annotation for the following deprecated
> classes. It does not make sense for an API to be PublicEvolving and
> Deprecated at the same time" - this is very common in the Flink code base
> to have PublicEvolving and Deprecated at the same time. APIs that do not
> survive the PublicEvolving phase will be marked as deprecated in addition.
> Removing PublicEvolving in this case will break Flink API graduation rule.

Both PublicEvolving and Deprecated are statuses in the API lifecycle; they
are by definition mutually exclusive. When an API is marked as deprecated,
either the functionality is completely going away, or another API is
replacing the deprecated one. In either case, it does not make sense to
evolve that API any more. Even though Flink has some APIs marked with both
PublicEvolving and Deprecated at the same time, that does not make sense
and needs to be fixed. If a PublicEvolving API is deprecated, it should
only be marked as Deprecated, just like a Public API. I am not sure how
this would violate the API graduation rule, can you explain?

By the way, there is another orthogonal abuse of the Deprecated annotation
in the Flink code base. For private methods, we should not mark them as
deprecated and leave the existing code base using it, while introducing a
new method. This is a bad practice adding to technical debts. Instead, a
proper refactor should be done immediately in the same patch to just remove
that private method and migrate all the usage to the new method.
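A minimal illustration of the two approaches described above (class and method names are invented for the example, not taken from the Flink code base):

```java
public class PrivateDeprecationExample {

    // Anti-pattern: the private method is marked @Deprecated but kept,
    // and the existing call site is left pointing at it.
    static class Before {
        @Deprecated
        private int parseOld(String s) { return Integer.parseInt(s); }

        int handle(String s) { return parseOld(s); } // caller never migrated
    }

    // Proper refactor: the old private method is deleted and all callers
    // are migrated to the new method in the same patch.
    static class After {
        private int parse(String s) { return Integer.parseInt(s.trim()); }

        int handle(String s) { return parse(s); }
    }

    public static void main(String[] args) {
        System.out.println(new Before().handle("42"));  // prints 42
        System.out.println(new After().handle(" 42 ")); // prints 42
    }
}
```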

Thanks,

Jiangjie (Becket) Qin



On Fri, Sep 1, 2023 at 12:00 AM Jing Ge  wrote:

> Hi Becket,
>
> It is a very useful proposal, thanks for driving it. +1. I'd like to ask
> some questions to make sure I understand your thoughts correctly:
>
> 1. "For the batch cases, currently the BulkFormat for DataStream is
> missing" - true, and there is another option to leverage
> StreamFormatAdapter[1]
> 2. "The following two interfaces should probably be marked as Public for
> now and Deprecated once we deprecate the InputFormat / OutputFormat" -
> would you like to share some background info of the deprecation of the
> InputFormat / OutputFormat? It is for me a little bit weird to mark APIs as
> public that are now known to be deprecated.
> 3. "Remove the PublicEvolving annotation for the following deprecated
> classes. It does not make sense for an API to be PublicEvolving and
> Deprecated at the same time" - this is very common in the Flink code base
> to have PublicEvolving and Deprecated at the same time. APIs that do not
> survive the PublicEvolving phase will be marked as deprecated in addition.
> Removing PublicEvolving in this case will break Flink API graduation rule.
>
> Best regards,
> Jing
>
>
>
> [1]
>
> https://github.com/apache/flink/blob/1d1247d4ae6d4313f7d952c4b2d66351314c9432/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/impl/StreamFormatAdapter.java#L61
>
> On Thu, Aug 31, 2023 at 4:16 PM Becket Qin  wrote:
>
> > Hi Ryan, thanks for the reply.
> >
> > Verifying the component with the schemas you have would be super helpful.
> >
> > I think enum is actually a type that is generally useful. Although it is
> > not a part of ANSI SQL, MySQL and some other databases have this type.
> > BTW, ENUM_STRING proposed in this FLIP is actually not a type by itself.
> > The ENUM_STRING is just syntax sugar which actually creates a "new
> > AtomicDataType(new VarCharType(Integer.MAX_VALUE), ENUM_CLASS)".  So, we
> > are not really introducing a new type here. However, in order to make the
> > VARCHAR to ENUM conversion work, the ENUM class has to be considered as a
> > ConversionClass of the VARCHAR type, and a StringToEnum converter is
> > required.
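As a rough plain-Java sketch of the conversion idea above (StringToEnum is named in the discussion but its real signature is not shown, so everything below is an illustrative assumption):

```java
public class StringToEnumSketch {

    // A stand-in for a user-provided ENUM_CLASS.
    enum Color { RED, GREEN, BLUE }

    // Converts a VARCHAR-backed value into the requested enum conversion class.
    static <E extends Enum<E>> E stringToEnum(String value, Class<E> enumClass) {
        return Enum.valueOf(enumClass, value.trim().toUpperCase());
    }

    // The reverse direction: an enum is stored as its VARCHAR name.
    static String enumToString(Enum<?> value) {
        return value.name();
    }

    public static void main(String[] args) {
        Color c = stringToEnum("red", Color.class);
        System.out.println(c + " <-> " + enumToString(c)); // prints "RED <-> RED"
    }
}
```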
> >
> > And yes, AvroSchemaUtils should be annotated as @PublicEvolving.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Thu, Aug 31, 2023 at 5:22 PM Ryan Skr

[jira] [Created] (FLINK-33037) Bump Guava to 32.1.2-jre

2023-09-05 Thread Jing Ge (Jira)
Jing Ge created FLINK-33037:
---

 Summary: Bump Guava to 32.1.2-jre
 Key: FLINK-33037
 URL: https://issues.apache.org/jira/browse/FLINK-33037
 Project: Flink
  Issue Type: Improvement
Reporter: Jing Ge
Assignee: Jing Ge






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

2023-09-05 Thread Rui Fan
After discussing this FLIP-334[1] offline with Gyula and Max,
I updated the FLIP based on the latest conclusion.

Big thanks to Gyula and Max for their professional advice!

> Does the interface function of handlerRecommendedParallelism
> in AutoScalerEventHandler conflict with
> handlerScalingFailure/handlerScalingReport (one of them
> handles the event of scale failure, and the other handles
> the event of scale success).
Hi Matt,

You can take a look at the FLIP, I think the issue has been fixed.
Currently, we introduced the ScalingRealizer and
AutoScalerEventHandler interfaces.

- The ScalingRealizer interface handles scaling actions.
- The AutoScalerEventHandler interface handles loggable events.
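A rough sketch of what the two interfaces above might look like (the generic parameter and method signatures are assumptions for illustration, not the FLIP's actual API):

```java
import java.util.HashMap;
import java.util.Map;

public class AutoscalerInterfacesSketch {

    /** Applies a scaling decision to some resource backend (Kubernetes, YARN, ...). */
    interface ScalingRealizer<JOB> {
        void realize(JOB job, Map<String, Integer> parallelismOverrides);
    }

    /** Reports loggable events (scaling reports, failures) without acting on them. */
    interface AutoScalerEventHandler<JOB> {
        void handleEvent(JOB job, String reason, String message);
    }

    public static void main(String[] args) {
        Map<String, Integer> overrides = new HashMap<>();
        overrides.put("vertex-1", 4);

        // Minimal stand-in implementations to show the separation of concerns.
        ScalingRealizer<String> realizer =
                (job, o) -> System.out.println("scale " + job + " -> " + o);
        AutoScalerEventHandler<String> events =
                (job, reason, msg) -> System.out.println(job + " [" + reason + "] " + msg);

        realizer.realize("my-job", overrides);
        events.handleEvent("my-job", "ScalingReport", "vertex-1: 2 -> 4");
    }
}
```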

Looking forward to your feedback, thanks!

[1] https://cwiki.apache.org/confluence/x/x4qzDw

Best,
Rui

On Thu, Aug 24, 2023 at 10:55 AM Matt Wang  wrote:

> Sorry for the late reply, I still have a small question here:
> Does the interface function of handlerRecommendedParallelism
> in AutoScalerEventHandler conflict with
> handlerScalingFailure/handlerScalingReport (one of them
> handles the event of scale failure, and the other handles
> the event of scale success).
>
>
>
> --
>
> Best,
> Matt Wang
>
>
>  Replied Message 
> | From | Rui Fan<1996fan...@gmail.com> |
> | Date | 08/21/2023 17:41 |
> | To |  |
> | Cc | Maximilian Michels ,
> Gyula Fóra ,
> Matt Wang |
> | Subject | Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes |
> Hi Max, Gyula and Matt,
>
> Do you have any other comments?
>
> The flink-kubernetes-operator 1.6 has been released recently,
> it's a good time to kick off this FLIP.
>
> Please let me know if you have any questions or concerns,
> looking forward to your feedback, thanks!
>
> Best,
> Rui
>
> On Wed, Aug 9, 2023 at 11:55 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> Hi Matt Wang,
>
> Thanks for your discussion here.
>
> it is recommended to unify the descriptions of AutoScalerHandler
> and AutoScalerEventHandler in the FLIP
>
> Good catch, I have updated all AutoScalerHandler to
> AutoScalerEventHandler.
>
> Can it support the use of zookeeper (zookeeper is a relatively
> common use of flink HA)?
>
> In my opinion, it's a good suggestion. However, I prefer we
> implement other state stores in separate FLINK JIRAs, and
> this FLIP focuses on the decoupling and implementing the
> necessary state store. Does that make sense?
>
> Regarding each scaling information, can it be persisted in
> the shared file system through the filesystem? I think it will
> be a more valuable requirement to support viewing
> Autoscaling info on the UI in the future, which can provide
> some foundations in advance;
>
> This is a good suggestion as well. It's useful for users to check
> the scaling information. I propose to add a CompositeEventHandler,
> it can include multiple EventHandlers.
>
> However, as with the last question, I prefer we implement other
> event handlers in separate FLINK JIRAs. What do you think?
>
> A solution mentioned in FLIP is to initialize the
> AutoScalerEventHandler object every time an event is
> processed.
>
> No, the FLIP mentions `The AutoScalerEventHandler object is shared for
> all flink jobs`,
> so the AutoScalerEventHandler is only initialized once.
>
> And we call the AutoScalerEventHandler#handlerXXX
> every time an event is processed.
>
> Best,
> Rui
>
> On Tue, Aug 8, 2023 at 9:40 PM Matt Wang  wrote:
>
> Hi Rui
>
> Thanks for driving the FLIP.
>
> I agree with the point of this FLIP. This FLIP first provides a
> general function of Autoscaler in Flink repo, and there is no
> need to move kubernetes-autoscaler from kubernetes-operator
> to Flink repo in this FLIP(it is recommended to unify the
> descriptions of AutoScalerHandler and AutoScalerEventHandler
> in the FLIP). Here I still have a few questions:
>
> 1. AutoScalerStateStore mainly records the state information
> during Scaling. In addition to supporting the use of configmap,
> can it support the use of zookeeper (zookeeper is a relatively
> common use of flink HA)?
> 2. Regarding each scaling information, can it be persisted in
> the shared file system through the filesystem? I think it will
> be a more valuable requirement to support viewing
> Autoscaling info on the UI in the future, which can provide
> some foundations in advance;
> 3. A solution mentioned in FLIP is to initialize the
> AutoScalerEventHandler object every time an event is
> processed. What is the main purpose of this solution?
>
>
>
> --
>
> Best,
> Matt Wang
>
>
>  Replied Message 
> | From | Rui Fan<1996fan...@gmail.com> |
> | Date | 08/7/2023 11:34 |
> | To |  |
> | Cc | m...@apache.org ,
> Gyula Fóra |
> | Subject | Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes
> |
> Hi Ron:
>
> Thanks for the feedback! The goal is indeed to turn the autoscaler into
> a general tool that can support other resource management.
>
>
> Hi Max, Gyula:
>
> My proposed `AutoScalerStateStore` is similar to Map, it can really be
> impr

Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-09-05 Thread David Morávek
Hi Tawfik,

It's exciting to see any ongoing research that tries to push Flink forward!

To get the discussion started, can you please share your paper with the
community? Assessing the proposal without further context is tough.

Best,
D.

On Mon, Sep 4, 2023 at 4:42 PM Tawfek Yasser Tawfek 
wrote:

> Dear Apache Flink Development Team,
>
> I hope this email finds you well. I am writing to propose an exciting new
> feature for Apache Flink that has the potential to significantly enhance
> its capabilities in handling unbounded streams of events, particularly in
> the context of event-time windowing.
>
> As you may be aware, Apache Flink has been at the forefront of Big Data
> Stream processing engines, leveraging windowing techniques to manage
> unbounded event streams effectively. The accuracy of the results obtained
> from these streams relies heavily on the ability to gather all relevant
> input within a window. At the core of this process are watermarks, which
> serve as unique timestamps marking the progression of events in time.
>
> However, our analysis has revealed a critical issue with the current
> watermark generation method in Apache Flink. This method, which operates at
> the input stream level, exhibits a bias towards faster sub-streams,
> resulting in the unfortunate consequence of dropped events from slower
> sub-streams. Our investigations showed that Apache Flink's conventional
> watermark generation approach led to an alarming data loss of approximately
> 33% when 50% of the keys around the median experienced delays. This loss
> further escalated to over 37% when 50% of random keys were delayed.
>
> In response to this issue, we have authored a research paper outlining a
> novel strategy named "keyed watermarks" to address data loss and
> substantially enhance data processing accuracy, achieving at least 99%
> accuracy in most scenarios.
>
> Moreover, we have conducted comprehensive comparative studies to evaluate
> the effectiveness of our strategy against the conventional watermark
> generation method, specifically in terms of event-time tracking accuracy.
>
> We believe that implementing keyed watermarks in Apache Flink can greatly
> enhance its performance and reliability, making it an even more valuable
> tool for organizations dealing with complex, high-throughput data
> processing tasks.
>
> We kindly request your consideration of this proposal. We would be eager
> to discuss further details, provide the full research paper, or collaborate
> closely to facilitate the integration of this feature into Apache Flink.
>
> Thank you for your time and attention to this proposal. We look forward to
> the opportunity to contribute to the continued success and evolution of
> Apache Flink.
>
> Best Regards,
>
> Tawfik Yasser
> Senior Teaching Assistant @ Nile University, Egypt
> Email: tyas...@nu.edu.eg
> LinkedIn: https://www.linkedin.com/in/tawfikyasser/
>
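The keyed-watermark idea proposed above can be sketched as a toy per-key tracker (the mechanics below are assumptions for illustration, not the paper's actual algorithm):

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedWatermarkSketch {

    // Each key tracks its own watermark, so a slow key's events are not
    // dropped just because faster keys advanced a single global watermark.
    private final Map<String, Long> watermarkPerKey = new HashMap<>();

    /** Advance the watermark of one key; watermarks never move backwards. */
    void onEvent(String key, long timestamp, long maxOutOfOrderness) {
        watermarkPerKey.merge(key, timestamp - maxOutOfOrderness, Math::max);
    }

    /** An event is late only relative to its own key's watermark. */
    boolean isLate(String key, long timestamp) {
        return timestamp < watermarkPerKey.getOrDefault(key, Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        KeyedWatermarkSketch wm = new KeyedWatermarkSketch();
        wm.onEvent("fast", 1_000, 100); // fast key races ahead (watermark 900)
        wm.onEvent("slow", 200, 100);   // slow key lags behind (watermark 100)

        // Under a single global watermark at 900 this event would be dropped;
        // per key, it is still on time.
        System.out.println(wm.isLate("slow", 150)); // prints false
    }
}
```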


Re: Re: [DISCUSS] FLIP-357: Deprecate Iteration API of DataStream

2023-09-05 Thread David Morávek
+1 since there is an alternative, more complete implementation available

Best,
D.

On Sat, Sep 2, 2023 at 12:07 AM David Anderson  wrote:

> +1
>
> Keeping the legacy implementation in place is confusing and encourages
> adoption of something that really shouldn't be used.
>
> Thanks for driving this,
> David
>
> On Fri, Sep 1, 2023 at 8:45 AM Jing Ge  wrote:
> >
> > Hi Wencong,
> >
> > Thanks for your clarification! +1
> >
> > Best regards,
> > Jing
> >
> > On Fri, Sep 1, 2023 at 12:36 PM Wencong Liu 
> wrote:
> >
> > > Hi Jing,
> > >
> > >
> > > Thanks for your reply!
> > >
> > >
> > > > Or the "independent module extraction" mentioned in the FLIP does
> > > > mean an independent module in Flink?
> > >
> > >
> > > Yes. If there are submodules in the Flink repository that need the
> > > iteration (currently there are none), we could consider extracting
> > > it into a new submodule of Flink.
> > >
> > >
> > > > users will have to add one more dependency of Flink ML. If iteration
> > > > is the only feature they need, it will look a little bit weird.
> > >
> > >
> > > If users only need to execute iteration jobs, they can simply remove
> > > the Flink dependency and add the necessary dependencies related to
> > > Flink ML. However, they can still utilize the DataStream API as it is
> > > also a dependency of Flink ML.
> > >
> > >
> > > Keeping an iteration submodule in the Flink repository and making
> > > Flink ML depend on it is another solution. But the current
> > > implementation of Iteration in DataStream should definitely be
> > > removed due to its incompleteness.
> > >
> > >
> > > The placement of the Iteration API in the repository is a topic that
> > > has multiple potential solutions. WDYT?
> > >
> > >
> > > Best,
> > > Wencong
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > At 2023-09-01 17:59:34, "Jing Ge"  wrote:
> > > >Hi Wencong,
> > > >
> > > >Thanks for the proposal!
> > > >
> > > >"The Iteration API in DataStream is planned to be deprecated in Flink
> > > >1.19 and then finally removed in Flink 2.0. For the users that rely on
> > > >the Iteration API in DataStream, they will have to migrate to Flink ML."
> > > >- Does it make sense to migrate the iteration module into Flink
> > > >directly? Or the "independent module extraction" mentioned in the FLIP
> > > >does mean an independent module in Flink? Since the iteration will be
> > > >removed in Flink, users will have to add one more dependency of Flink
> > > >ML. If iteration is the only feature they need, it will look a little
> > > >bit weird.
> > > >
> > > >
> > > >Best regards,
> > > >Jing
> > > >
> > > >On Fri, Sep 1, 2023 at 11:05 AM weijie guo  >
> > > >wrote:
> > > >
> > > >> Thanks, +1 for this.
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Weijie
> > > >>
> > > >>
> > > >> Yangze Guo wrote on Fri, Sep 1, 2023 at 14:29:
> > > >>
> > > >> > +1
> > > >> >
> > > >> > Thanks for driving this.
> > > >> >
> > > >> > Best,
> > > >> > Yangze Guo
> > > >> >
> > > >> > On Fri, Sep 1, 2023 at 2:00 PM Xintong Song <
> tonysong...@gmail.com>
> > > >> wrote:
> > > >> > >
> > > >> > > +1
> > > >> > >
> > > >> > > Best,
> > > >> > >
> > > >> > > Xintong
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Fri, Sep 1, 2023 at 1:11 PM Dong Lin 
> > > wrote:
> > > >> > >
> > > >> > > > Thanks Wencong for initiating the discussion.
> > > >> > > >
> > > >> > > > +1 for the proposal.
> > > >> > > >
> > > >> > > > On Fri, Sep 1, 2023 at 12:00 PM Wencong Liu <
> liuwencle...@163.com
> > > >
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > Hi devs,
> > > >> > > > >
> > > >> > > > > I would like to start a discussion on FLIP-357: Deprecate
> > > Iteration
> > > >> > API
> > > >> > > > of
> > > >> > > > > DataStream [1].
> > > >> > > > >
> > > >> > > > > Currently, the Iteration API of DataStream is incomplete.
> For
> > > >> > instance,
> > > >> > > > it
> > > >> > > > > lacks support
> > > >> > > > > for iteration in sync mode and exactly once semantics.
> > > >> Additionally,
> > > >> > it
> > > >> > > > > does not offer the
> > > >> > > > > ability to set iteration termination conditions. As a
> result,
> > > it's
> > > >> > hard
> > > >> > > > > for developers to
> > > >> > > > > build an iteration pipeline by DataStream in the practical
> > > >> > applications
> > > >> > > > > such as machine learning.
> > > >> > > > >
> > > >> > > > > FLIP-176: Unified Iteration to Support Algorithms [2] has
> > > >> introduced
> > > >> > a
> > > >> > > > > unified iteration library
> > > >> > > > > in the Flink ML repository. This library addresses all the
> > > issues
> > > >> > present
> > > >> > > > > in the Iteration API of
> > > >> > > > > DataStream and could provide solution for all the iteration
> > > >> > use-cases.
> > > >> > > > > However, maintaining two
> > > >> > > > > separate implementations of iteration in both the Flink
> > > repository
> > > >> > and
> > > >> > > > the
> > > >> > > > > Flink ML 

[jira] [Created] (FLINK-33038) remove getMinRetentionTime in StreamExecDeduplicate

2023-09-05 Thread xiaogang zhou (Jira)
xiaogang zhou created FLINK-33038:
-

 Summary: remove getMinRetentionTime in StreamExecDeduplicate
 Key: FLINK-33038
 URL: https://issues.apache.org/jira/browse/FLINK-33038
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.18.0
Reporter: xiaogang zhou
 Fix For: 1.19.0


I suggest removing the getMinRetentionTime method in StreamExecDeduplicate, as
the TTL is controlled by the state metadata.

Please let me take this issue if possible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-connector-hbase v3.0.0, release candidate 2

2023-09-05 Thread Ferenc Csaky
Hi,

Thanks Martijn for initiating the release!

+1 (non-binding)

- checked signatures and checksums
- checked source has no binaries
- checked LICENSE and NOTICE files
- approved web PR

Cheers,
Ferenc




--- Original Message ---
On Monday, September 4th, 2023 at 12:54, Samrat Deb  
wrote:


> 
> 
> Hi,
> 
> +1 (non-binding)
> 
> Verified NOTICE files
> Verified CheckSum and signatures
> Glanced through PR[1] , Looks good to me
> 
> Bests,
> Samrat
> 
> [1]https://github.com/apache/flink-web/pull/591
> 
> 
> > On 04-Sep-2023, at 2:22 PM, Ahmed Hamdy hamdy10...@gmail.com wrote:
> > 
> > Hi Martijn,
> > +1 (non-binding)
> > 
> > - verified Checksums and signatures
> > - no binaries in source
> > - Checked NOTICE files contains migrated artifacts
> > - tag is correct
> > - Approved Web PR
> > 
> > Best Regards
> > Ahmed Hamdy
> > 
> > On Fri, 1 Sept 2023 at 15:35, Martijn Visser martijnvis...@apache.org
> > wrote:
> > 
> > > Hi everyone,
> > > 
> > > Please review and vote on the release candidate #2 for the version 3.0.0,
> > > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > > 
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2],
> > > which are signed with the key with fingerprint
> > > A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag v3.0.0-rc2 [5],
> > > * website pull request listing the new release [6].
> > > 
> > > This replaces the old, cancelled vote of RC1 [7]. This version is the
> > > externalized version which is compatible with Flink 1.16 and 1.17.
> > > 
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > > 
> > > Thanks,
> > > Release Manager
> > > 
> > > [1]
> > > 
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352578
> > > [2]
> > > 
> > > https://dist.apache.org/repos/dist/dev/flink/flink-connector-hbase-3.0.0-rc2
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1650
> > > [5]
> > > https://github.com/apache/flink-connector-hbase/releases/tag/v3.0.0-rc2
> > > [6] https://github.com/apache/flink-web/pull/591
> > > [7] https://lists.apache.org/thread/wbl6sc86q9s5mmz5slx4z09svh91cpr0


[RESULT][VOTE] FLIP-348: Make expanding behavior of virtual metadata columns configurable

2023-09-05 Thread Timo Walther

Hi everyone,

The voting time for [VOTE] FLIP-348: Make expanding behavior of virtual 
metadata columns configurable[1] has passed. I'm closing the vote now.


There were 6 +1 votes, all were binding:

- Martijn Visser (binding)
- Benchao Li (binding)
- Godfrey He (binding)
- Sergey Nuyanzin (binding)
- Jing Ge (binding)
- Jark Wu (binding)

There were no -1 votes.

Thus, FLIP-348 has been accepted.

Thanks everyone for joining the discussion and giving feedback!

[1] https://lists.apache.org/thread/lc2h6xpk25bmvm4c5pgtfggm920j2ckr

Cheers,
Timo



On 31.08.23 14:29, Jark Wu wrote:

+1 (binding)

Best,
Jark


On Aug 31, 2023, at 18:54, Jing Ge wrote:

+1(binding)

On Thu, Aug 31, 2023 at 11:22 AM Sergey Nuyanzin 
wrote:


+1 (binding)

On Thu, Aug 31, 2023 at 9:28 AM Benchao Li  wrote:


+1 (binding)

Martijn Visser wrote on Thu, Aug 31, 2023 at 15:24:


+1 (binding)

On Thu, Aug 31, 2023 at 9:09 AM Timo Walther 

wrote:



Hi everyone,

I'd like to start a vote on FLIP-348: Make expanding behavior of

virtual

metadata columns configurable [1] which has been discussed in this
thread [2].

The vote will be open for at least 72 hours unless there is an

objection

or not enough votes.

[1] https://cwiki.apache.org/confluence/x/_o6zDw
[2] https://lists.apache.org/thread/zon967w7synby8z6m1s7dj71dhkh9ccy

Cheers,
Timo





--

Best,
Benchao Li




--
Best regards,
Sergey







[REQUEST] Edit Permissions for FLIP

2023-09-05 Thread Chen Zhanghao
Hi folks,

I'm writing to request the edit permission for FLIP. My Confluence Wiki ID is: 
zhanghao.chen. I've recently reported two JIRA issues and was reminded of the 
need to create a FLIP for each of them as they would change the public API:

  1.  [FLINK-25371] Include data port as part of the host info for the subtask
detail panel on the Web UI. During code review with Fan Rui and Weihua, we
concluded it is better to align the inconsistent usage of the host field across
the various REST APIs (some contain only the host name, some contain host +
port), and would therefore need to add a new field that consistently holds both
hostname and port info while keeping the old host field untouched.

  2.  [FLINK-32872] Add an option to control the default partitioner when the
parallelism of the upstream and downstream operators does not match, which
intends to add a new configuration.

Thanks for your attention, much appreciated in advance.

Best,
Zhanghao Chen


[DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-05 Thread Gyula Fóra
Hi All!

@Maximilian Michels  has raised the question of Flink
version support in the operator before the last release. I would like to
open this discussion publicly so we can finalize this before the next
release.

Background:
Currently the Flink Operator supports all Flink versions since Flink 1.13.
While this is great for the users, it introduces a lot of backward
compatibility related code in the operator logic and also adds considerable
time to the CI. We should strike a reasonable balance here that allows us
to move forward and eliminate some of this tech debt.

In the current model it is also impossible to support all features for all
Flink versions which leads to some confusion over time.

Proposal:
Since it's a key feature of the kubernetes operator to support several
versions at the same time, I propose to support the last 4 stable Flink
minor versions. Currently this would mean to support Flink 1.14-1.17 (and
drop 1.13 support). When Flink 1.18 is released we would drop 1.14 support
and so on. Given the Flink release cadence this means about 2 year support
for each Flink version.

What do you think?

Cheers,
Gyula


Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-05 Thread Galen Warren
Sounds good to me, thanks.

On Tue, Sep 5, 2023, 8:12 AM Gyula Fóra  wrote:

> Hi All!
>
> @Maximilian Michels  has raised the question of Flink
> version support in the operator before the last release. I would like to
> open this discussion publicly so we can finalize this before the next
> release.
>
> Background:
> Currently the Flink Operator supports all Flink versions since Flink 1.13.
> While this is great for the users, it introduces a lot of backward
> compatibility related code in the operator logic and also adds considerable
> time to the CI. We should strike a reasonable balance here that allows us
> to move forward and eliminate some of this tech debt.
>
> In the current model it is also impossible to support all features for all
> Flink versions which leads to some confusion over time.
>
> Proposal:
> Since it's a key feature of the kubernetes operator to support several
> versions at the same time, I propose to support the last 4 stable Flink
> minor versions. Currently this would mean to support Flink 1.14-1.17 (and
> drop 1.13 support). When Flink 1.18 is released we would drop 1.14 support
> and so on. Given the Flink release cadence this means about 2 year support
> for each Flink version.
>
> What do you think?
>
> Cheers,
> Gyula
>


[DISCUSS] FLIP-361: Improve GC Metrics

2023-09-05 Thread Gyula Fóra
Hi Devs,

I would like to start a discussion on FLIP-361: Improve GC Metrics [1].

The current Flink GC metrics [2] are not very useful for monitoring
purposes as they require post processing logic that is also dependent on
the current runtime environment.

Problems:
 - Total time is not very relevant for long running applications, only the
rate of change (msPerSec)
 - In most cases it's best to simply aggregate the time/count across the
different GarbageCollectors, however the specific collectors are dependent
on the current Java runtime

We propose to improve the current situation by:
 - Exposing rate metrics per GarbageCollector
 - Exposing aggregated Total time/count/rate metrics

These new metrics are all derived from the existing ones with minimal
overhead.

Looking forward to your feedback.

Cheers,
Gyula

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#garbagecollection
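A sketch of the proposed derivation, using only the standard JMX beans that the existing metrics are based on (the sampling scheme and names below are assumptions, not the FLIP's API):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcRateSketch {

    /** Total [collectionCount, collectionTimeMs] summed over all collectors. */
    static long[] aggregateTotals() {
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // The beans report -1 when a value is unavailable; skip those.
            if (gc.getCollectionCount() > 0) count += gc.getCollectionCount();
            if (gc.getCollectionTime() > 0)  timeMs += gc.getCollectionTime();
        }
        return new long[] {count, timeMs};
    }

    /** GC time rate between two samples: ms spent in GC per second of wall time. */
    static double msPerSec(long wallMs0, long gcMs0, long wallMs1, long gcMs1) {
        long elapsed = Math.max(1, wallMs1 - wallMs0);
        return (gcMs1 - gcMs0) * 1000.0 / elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        long w0 = System.currentTimeMillis();
        long[] s0 = aggregateTotals();
        System.gc(); // encourage at least one collection between samples
        Thread.sleep(100);
        long[] s1 = aggregateTotals();
        System.out.println("total count=" + s1[0]
                + " msPerSec=" + msPerSec(w0, s0[1], System.currentTimeMillis(), s1[1]));
    }
}
```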


Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-05 Thread Chen Zhanghao
+1 for the proposal. A side question: how will we handle a major Flink version
bump, given that Flink 2.0 is around the corner?

Best,
Zhanghao Chen

From: Gyula Fóra 
Date: September 5, 2023 20:12
To: dev 
Cc: Maximilian Michels ; Thomas Weise ;
Márton Balassi ; morh...@apache.org 
Subject: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

Hi All!

@Maximilian Michels  has raised the question of Flink
version support in the operator before the last release. I would like to
open this discussion publicly so we can finalize this before the next
release.

Background:
Currently the Flink Operator supports all Flink versions since Flink 1.13.
While this is great for the users, it introduces a lot of backward
compatibility related code in the operator logic and also adds considerable
time to the CI. We should strike a reasonable balance here that allows us
to move forward and eliminate some of this tech debt.

In the current model it is also impossible to support all features for all
Flink versions which leads to some confusion over time.

Proposal:
Since it's a key feature of the kubernetes operator to support several
versions at the same time, I propose to support the last 4 stable Flink
minor versions. Currently this would mean to support Flink 1.14-1.17 (and
drop 1.13 support). When Flink 1.18 is released we would drop 1.14 support
and so on. Given the Flink release cadence this means about 2 year support
for each Flink version.

What do you think?

Cheers,
Gyula


[jira] [Created] (FLINK-33039) Avro Specific Record Logical timestamp is not serialized in Parquet

2023-09-05 Thread Ahmed Elhassany (Jira)
Ahmed Elhassany created FLINK-33039:
---

 Summary: Avro Specific Record Logical timestamp is not serialized 
in Parquet
 Key: FLINK-33039
 URL: https://issues.apache.org/jira/browse/FLINK-33039
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.17.1
Reporter: Ahmed Elhassany


I'm trying to save a SpecificRecord as Parquet on S3; the record contains a
field with a logical timestamp. It's defined as
{code:java}
{
  "name": "ts",
  "type": {
"type": "long",
"logicalType": "timestamp-millis"
  }
}
  {code}
And I'm using the following method to save it
{code:java}
final FileSink<MyObj> sinkFlowAggregationAvro =
FileSink.forBulkFormat(path, 
AvroParquetWriters.forSpecificRecord(MyObj.class))
.withOutputFileConfig(OutputFileConfig
.builder()
.withPartSuffix(".parquet")
.build())
.build(); {code}
 

However, I'm getting the following casting errors:

 
{noformat}
flink-taskmanager-b467cbff9-n28zp taskmanager 
2023-09-05T16:10:02.124425478+02:00 Caused by: java.lang.ClassCastException: 
class java.time.Instant cannot be cast to class java.lang.Number 
(java.time.Instant and java.lang.Number are in module java.base of loader 
'bootstrap')
flink-taskmanager-b467cbff9-n28zp taskmanager 
2023-09-05T16:10:02.124425478+02:00     at 
org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:340)
 
~[blob_p-22acff48719adf70603f57842bd158d7f5538a47-e40c3e350efab078d53261fe2bc38640:?]
flink-taskmanager-b467cbff9-wt8p9 taskmanager 
2023-09-05T16:10:01.868385407+02:00     at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
 ~[flink-dist-1.17.1.jar:1.17.1]
flink-taskmanager-b467cbff9-n28zp taskmanager 
2023-09-05T16:10:02.124425478+02:00     at 
org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:288) 
~[blob_p-22acff48719adf70603f57842bd158d7f5538a47-e40c3e350efab078d53261fe2bc38640:?]
flink-taskmanager-b467cbff9-wt8p9 taskmanager 
2023-09-05T16:10:01.868385407+02:00     at 
org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:39)
 ~[flink-dist-1.17.1.jar:1.17.1]
flink-taskmanager-b467cbff9-wt8p9 taskmanager 
2023-09-05T16:10:01.868385407+02:00     at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:75)
 ~[flink-dist-1.17.1.jar:1.17.1]
flink-taskmanager-b467cbff9-wt8p9 taskmanager 
2023-09-05T16:10:01.868385407+02:00     ... 21 more
flink-taskmanager-b467cbff9-m5gdt taskmanager 
2023-09-05T16:10:01.979428558+02:00     at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
 ~[flink-dist-1.17.1.jar:1.17.1]
flink-taskmanager-b467cbff9-wt8p9 taskmanager 
2023-09-05T16:10:01.868385407+02:00 Caused by: 
org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: 
Could not forward element to next operator
flink-taskmanager-b467cbff9-2xn5w taskmanager 
2023-09-05T16:10:01.871644827+02:00     at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:144)
 ~[flink-connector-base-1.17.1.jar:1.17.1]
flink-taskmanager-b467cbff9-m5gdt taskmanager 
2023-09-05T16:10:01.979428558+02:00     at 
org.apache.flink.streaming.api.operators.StreamFilter.processElement(StreamFilter.java:39)
 ~[flink-dist-1.17.1.jar:1.17.1]
flink-taskmanager-b467cbff9-pqjqr taskmanager 
2023-09-05T16:10:02.276107852+02:00     ... 21 more
flink-taskmanager-b467cbff9-m5gdt taskmanager 
2023-09-05T16:10:01.979428558+02:00     at 
org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput^Cflink-taskmanager-b467cbff9-m5gdt
 taskmanager 2023-09-05T16:10:01.979428558+02:00     ... 21 more{noformat}
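The cast error shows parquet-avro receiving a {{java.time.Instant}} where it expects a numeric epoch value. One possible workaround is to convert the {{Instant}} field to plain epoch millis before the record reaches the sink (or to regenerate the specific record class without the logical-type conversion). The sketch below only illustrates the conversion; the helper name {{toEpochMillis}} and the {{MapFunction}} wiring in the comments are assumptions, not the reporter's code:

```java
import java.time.Instant;

// Illustrative sketch: turn the java.time.Instant produced by Avro's
// timestamp-millis logical-type conversion into the raw epoch-millis long
// that parquet-avro's AvroWriteSupport expects for a numeric field.
public class TimestampMillis {

    static long toEpochMillis(Instant ts) {
        return ts.toEpochMilli();
    }

    // In the job this conversion would run before sinkTo(...), e.g. inside a
    // MapFunction that rebuilds the record with a plain long "ts" field, or
    // the schema's logicalType can be dropped so the generated class keeps
    // the raw long in the first place.

    public static void main(String[] args) {
        Instant i = Instant.ofEpochMilli(1_693_900_000_000L);
        System.out.println(toEpochMillis(i)); // 1693900000000
    }
}
```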



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-05 Thread ConradJam
+1 (non-binding)

Yuepeng Pan wrote on Fri, Sep 1, 2023 at 15:43:

> +1 (non-binding)
>
> Best,
> Yuepeng
>
>
>
> At 2023-09-01 14:32:19, "Jark Wu"  wrote:
> >+1 (binding)
> >
> >Best,
> >Jark
> >
> >> On Aug 30, 2023, at 02:40, Venkatakrishnan Sowrirajan wrote:
> >>
> >> Hi everyone,
> >>
> >> Thank you all for your feedback on FLIP-356. I'd like to start a vote.
> >>
> >> Discussion thread:
> >> https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7
> >> FLIP:
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown
> >>
> >> Regards
> >> Venkata krishnan
>


Re: [VOTE] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-05 Thread Martijn Visser
+1 (binding)

On Tue, Sep 5, 2023 at 4:16 PM ConradJam  wrote:

> +1 (non-binding)
>
> Yuepeng Pan wrote on Fri, Sep 1, 2023 at 15:43:
>
> > +1 (non-binding)
> >
> > Best,
> > Yuepeng
> >
> >
> >
> > At 2023-09-01 14:32:19, "Jark Wu"  wrote:
> > >+1 (binding)
> > >
> > >Best,
> > >Jark
> > >
> > >> On Aug 30, 2023, at 02:40, Venkatakrishnan Sowrirajan wrote:
> > >>
> > >> Hi everyone,
> > >>
> > >> Thank you all for your feedback on FLIP-356. I'd like to start a vote.
> > >>
> > >> Discussion thread:
> > >> https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7
> > >> FLIP:
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown
> > >>
> > >> Regards
> > >> Venkata krishnan
> >
>


Re: [VOTE] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-05 Thread Jiabao Sun
+1 (non-binding)

Best,
Jiabao


> On Sep 5, 2023, at 10:33 PM, Martijn Visser wrote:
> 
> +1 (binding)
> 
> On Tue, Sep 5, 2023 at 4:16 PM ConradJam  wrote:
> 
>> +1 (non-binding)
>> 
>> Yuepeng Pan wrote on Fri, Sep 1, 2023 at 15:43:
>> 
>>> +1 (non-binding)
>>> 
>>> Best,
>>> Yuepeng
>>> 
>>> 
>>> 
>>> At 2023-09-01 14:32:19, "Jark Wu"  wrote:
 +1 (binding)
 
 Best,
 Jark
 
> On Aug 30, 2023, at 02:40, Venkatakrishnan Sowrirajan wrote:
> 
> Hi everyone,
> 
> Thank you all for your feedback on FLIP-356. I'd like to start a vote.
> 
> Discussion thread:
> https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7
> FLIP:
> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown
> 
> Regards
> Venkata krishnan
>>> 
>> 



Re: [VOTE] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-05 Thread Sergey Nuyanzin
+1 (binding)

On Tue, Sep 5, 2023 at 4:55 PM Jiabao Sun 
wrote:

> +1 (non-binding)
>
> Best,
> Jiabao
>
>
> > On Sep 5, 2023, at 10:33 PM, Martijn Visser wrote:
> >
> > +1 (binding)
> >
> > On Tue, Sep 5, 2023 at 4:16 PM ConradJam  wrote:
> >
> >> +1 (non-binding)
> >>
> >> Yuepeng Pan wrote on Fri, Sep 1, 2023 at 15:43:
> >>
> >>> +1 (non-binding)
> >>>
> >>> Best,
> >>> Yuepeng
> >>>
> >>>
> >>>
> >>> At 2023-09-01 14:32:19, "Jark Wu"  wrote:
>  +1 (binding)
> 
>  Best,
>  Jark
> 
> > On Aug 30, 2023, at 02:40, Venkatakrishnan Sowrirajan wrote:
> >
> > Hi everyone,
> >
> > Thank you all for your feedback on FLIP-356. I'd like to start a
> vote.
> >
> > Discussion thread:
> > https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7
> > FLIP:
> >
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown
> >
> > Regards
> > Venkata krishnan
> >>>
> >>
>
>

-- 
Best regards,
Sergey


Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-05 Thread Thomas Weise
+1, thanks for the proposal

On Tue, Sep 5, 2023 at 8:13 AM Gyula Fóra  wrote:

> Hi All!
>
> @Maximilian Michels  has raised the question of Flink
> version support in the operator before the last release. I would like to
> open this discussion publicly so we can finalize this before the next
> release.
>
> Background:
> Currently the Flink Operator supports all Flink versions since Flink 1.13.
> While this is great for the users, it introduces a lot of backward
> compatibility related code in the operator logic and also adds considerable
> time to the CI. We should strike a reasonable balance here that allows us
> to move forward and eliminate some of this tech debt.
>
> In the current model it is also impossible to support all features for all
> Flink versions which leads to some confusion over time.
>
> Proposal:
> Since it's a key feature of the kubernetes operator to support several
> versions at the same time, I propose to support the last 4 stable Flink
> minor versions. Currently this would mean to support Flink 1.14-1.17 (and
> drop 1.13 support). When Flink 1.18 is released we would drop 1.14 support
> and so on. Given the Flink release cadence this means about 2 year support
> for each Flink version.
>
> What do you think?
>
> Cheers,
> Gyula
>


Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-05 Thread Maximilian Michels
Hi Gyula,

+1 The proposed changes make sense and are in line with what is
available for other metrics, e.g. number of records processed.

-Max

On Tue, Sep 5, 2023 at 2:43 PM Gyula Fóra  wrote:
>
> Hi Devs,
>
> I would like to start a discussion on FLIP-361: Improve GC Metrics [1].
>
> The current Flink GC metrics [2] are not very useful for monitoring
> purposes as they require post processing logic that is also dependent on
> the current runtime environment.
>
> Problems:
>  - Total time is not very relevant for long running applications, only the
> rate of change (msPerSec)
>  - In most cases it's best to simply aggregate the time/count across the
> different GarbageCollectors; however, the specific collectors are dependent
> on the current Java runtime
>
> We propose to improve the current situation by:
>  - Exposing rate metrics per GarbageCollector
>  - Exposing aggregated Total time/count/rate metrics
>
> These new metrics are all derived from the existing ones with minimal
> overhead.
>
> Looking forward to your feedback.
>
> Cheers,
> Gyula
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
> [2]
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#garbagecollection
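The proposed derived metrics can be sketched directly on top of the JDK's GarbageCollectorMXBean API, which is also where the existing Flink GC metrics come from. A minimal sketch; the class and method names below are illustrative and not the FLIP's final metric names:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTotals {

    // Aggregate GC count and time across all collectors, independent of
    // which GarbageCollectors the current Java runtime happens to expose.
    public static long[] totals() {
        long count = 0, timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc.getCollectionCount() >= 0) count += gc.getCollectionCount(); // -1 means unsupported
            if (gc.getCollectionTime() >= 0) timeMs += gc.getCollectionTime();
        }
        return new long[] {count, timeMs};
    }

    // A rate (msPerSec) is derived from two successive readings of the total,
    // which is the post-processing the FLIP wants to move into the metric itself.
    public static double msPerSec(long prevTimeMs, long curTimeMs, long intervalMs) {
        return intervalMs <= 0 ? 0.0 : (curTimeMs - prevTimeMs) * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        long[] t = totals();
        System.out.println(t[0] >= 0 && t[1] >= 0);        // true
        System.out.println(msPerSec(100L, 200L, 1000L));   // 100.0
    }
}
```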


Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

2023-09-05 Thread Maximilian Michels
Thanks Rui for the update!

Alongside with the refactoring to decouple autoscaler logic from the
deployment logic, are we planning to add an alternative implementation
against the new interfaces? I think the best way to get the interfaces
right, is to have an alternative implementation in addition to
Kubernetes. YARN or a standalone mode implementation were already
mentioned. Ultimately, this is the reason we are doing the
refactoring. Without a new implementation, it becomes harder to
justify the refactoring work.

Cheers,
Max

On Tue, Sep 5, 2023 at 9:48 AM Rui Fan  wrote:
>
> After discussing this FLIP-334[1] offline with Gyula and Max,
> I updated the FLIP based on the latest conclusion.
>
> Big thanks to Gyula and Max for their professional advice!
>
> > Does the interface function of handlerRecommendedParallelism
> > in AutoScalerEventHandler conflict with
> > handlerScalingFailure/handlerScalingReport (one of the
> > handles the event of scale failure, and the other handles
> > the event of scale success).
> Hi Matt,
>
> You can take a look at the FLIP, I think the issue has been fixed.
> Currently, we introduced the ScalingRealizer and
> AutoScalerEventHandler interface.
>
> The ScalingRealizer handles scaling action.
>
> The AutoScalerEventHandler  interface handles loggable events.
>
>
> Looking forward to your feedback, thanks!
>
> [1] https://cwiki.apache.org/confluence/x/x4qzDw
>
> Best,
> Rui
>
> On Thu, Aug 24, 2023 at 10:55 AM Matt Wang  wrote:
>>
>> Sorry for the late reply, I still have a small question here:
>> Does the interface function of handlerRecommendedParallelism
>> in AutoScalerEventHandler conflict with
>> handlerScalingFailure/handlerScalingReport (one of the
>> handles the event of scale failure, and the other handles
>> the event of scale success).
>>
>>
>>
>> --
>>
>> Best,
>> Matt Wang
>>
>>
>>  Replied Message 
>> | From | Rui Fan<1996fan...@gmail.com> |
>> | Date | 08/21/2023 17:41 |
>> | To |  |
>> | Cc | Maximilian Michels ,
>> Gyula Fóra ,
>> Matt Wang |
>> | Subject | Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes |
>> Hi Max, Gyula and Matt,
>>
>> Do you have any other comments?
>>
>> The flink-kubernetes-operator 1.6 has been released recently,
>> it's a good time to kick off this FLIP.
>>
>> Please let me know if you have any questions or concerns,
>> looking forward to your feedback, thanks!
>>
>> Best,
>> Rui
>>
>> On Wed, Aug 9, 2023 at 11:55 AM Rui Fan <1996fan...@gmail.com> wrote:
>>
>> Hi Matt Wang,
>>
>> Thanks for your discussion here.
>>
>> it is recommended to unify the descriptions of AutoScalerHandler
>> and AutoScalerEventHandler in the FLIP
>>
>> Good catch, I have updated all AutoScalerHandler to
>> AutoScalerEventHandler.
>>
>> Can it support the use of zookeeper (zookeeper is a relatively
>> common use of flink HA)?
>>
>> In my opinion, it's a good suggestion. However, I prefer we
>> implement other state stores in the other FLINK JIRA, and
>> this FLIP focus on the decoupling and implementing the
>> necessary state store. Does that make sense?
>>
>> Regarding each scaling information, can it be persisted in
>> the shared file system through the filesystem? I think it will
>> be a more valuable requirement to support viewing
>> Autoscaling info on the UI in the future, which can provide
>> some foundations in advance;
>>
>> This is a good suggestion as well. It's useful for users to check
>> the scaling information. I propose to add a CompositeEventHandler,
>> it can include multiple EventHandlers.
>>
>> However, as the last question, I prefer we implement other
>> event handler in the other FLINK JIRA. What do you think?
>>
>> A solution mentioned in FLIP is to initialize the
>> AutoScalerEventHandler object every time an event is
>> processed.
>>
>> No, the FLIP mentioned `The AutoScalerEventHandler  object is shared for
>> all flink jobs`,
>> So the AutoScalerEventHandler is only initialized once.
>>
>> And we call the AutoScalerEventHandler#handlerXXX
>> every time an event is processed.
>>
>> Best,
>> Rui
>>
>> On Tue, Aug 8, 2023 at 9:40 PM Matt Wang  wrote:
>>
>> Hi Rui
>>
>> Thanks for driving the FLIP.
>>
>> I agree with the point fo this FLIP. This FLIP first provides a
>> general function of Autoscaler in Flink repo, and there is no
>> need to move kubernetes-autoscaler from kubernetes-operator
>> to Flink repo in this FLIP(it is recommended to unify the
>> descriptions of AutoScalerHandler and AutoScalerEventHandler
>> in the FLIP). Here I still have a few questions:
>>
>> 1. AutoScalerStateStore mainly records the state information
>> during Scaling. In addition to supporting the use of configmap,
>> can it support the use of zookeeper (zookeeper is a relatively
>> common use of flink HA)?
>> 2. Regarding each scaling information, can it be persisted in
>> the shared file system through the filesystem? I think it will
>> be a more valuable requirement to support viewing
>> Autoscaling info on the UI in the future
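A rough sketch of how the two decoupled interfaces from this thread might look, with a Kubernetes-free implementation. Only the names ScalingRealizer and AutoScalerEventHandler come from the discussion; the generic parameters and method signatures are assumptions, not the FLIP's final API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the decoupled autoscaler SPI discussed in FLIP-334.
interface ScalingRealizer<KEY, Context> {
    // Apply the recommended parallelism overrides to the underlying resource.
    void realize(Context context, Map<String, String> parallelismOverrides);
}

interface AutoScalerEventHandler<KEY, Context> {
    // Record a loggable event (e.g. scaling success/failure) for a job.
    void handleEvent(Context context, String type, String reason, String message);
}

// A trivial standalone implementation, showing that neither interface
// needs to depend on Kubernetes classes.
public class AutoscalerSpiSketch {
    static class LoggingHandler implements AutoScalerEventHandler<String, String> {
        final List<String> events = new ArrayList<>();

        public void handleEvent(String ctx, String type, String reason, String message) {
            events.add(ctx + ":" + type + ":" + reason);
        }
    }

    public static void main(String[] args) {
        LoggingHandler handler = new LoggingHandler();
        handler.handleEvent("job-1", "Normal", "ScalingReport", "vertex a: 2 -> 4");
        System.out.println(handler.events.get(0)); // job-1:Normal:ScalingReport
    }
}
```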

Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-05 Thread Őrhidi Mátyás
+1

On Tue, Sep 5, 2023 at 8:03 AM Thomas Weise  wrote:

> +1, thanks for the proposal
>
> On Tue, Sep 5, 2023 at 8:13 AM Gyula Fóra  wrote:
>
> > Hi All!
> >
> > @Maximilian Michels  has raised the question of Flink
> > version support in the operator before the last release. I would like to
> > open this discussion publicly so we can finalize this before the next
> > release.
> >
> > Background:
> > Currently the Flink Operator supports all Flink versions since Flink
> 1.13.
> > While this is great for the users, it introduces a lot of backward
> > compatibility related code in the operator logic and also adds
> considerable
> > time to the CI. We should strike a reasonable balance here that allows us
> > to move forward and eliminate some of this tech debt.
> >
> > In the current model it is also impossible to support all features for
> all
> > Flink versions which leads to some confusion over time.
> >
> > Proposal:
> > Since it's a key feature of the kubernetes operator to support several
> > versions at the same time, I propose to support the last 4 stable Flink
> > minor versions. Currently this would mean to support Flink 1.14-1.17 (and
> > drop 1.13 support). When Flink 1.18 is released we would drop 1.14
> support
> > and so on. Given the Flink release cadence this means about 2 year
> support
> > for each Flink version.
> >
> > What do you think?
> >
> > Cheers,
> > Gyula
> >
>


[jira] [Created] (FLINK-33040) flink-connector-hive builds might be blocked (but not fail) because Maven tries to access conjars.org repository (which times out)

2023-09-05 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33040:
-

 Summary: flink-connector-hive builds might be blocked (but not 
fail) because Maven tries to access conjars.org repository (which times out)
 Key: FLINK-33040
 URL: https://issues.apache.org/jira/browse/FLINK-33040
 Project: Flink
  Issue Type: Bug
  Components: Build System / CI
Affects Versions: 1.17.1, 1.16.2, 1.18.0, 1.19.0
Reporter: Matthias Pohl


We experienced timeouts when building {{flink-connectors/flink-connector-hive}} 
because Maven tries to access {{http://conjars.org}} to retrieve meta 
information for  {{net.minidev:json-smart}} which fails because the repository 
is gone.

[~gunnar.morling] already described this in his blog post 
https://www.morling.dev/blog/maven-what-are-you-waiting-for/. 

We investigated where this {{conjar}} repository is coming from. It turns out 
that the 
[org.apache.hive:hive-exec:2.3.9|https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.9/hive-exec-2.3.9.pom]
 dependency derives from its parent 
[org.apache.hive:hive:2.3.9|https://repo1.maven.org/maven2/org/apache/hive/hive/2.3.9/hive-2.3.9.pom]
 which pulls in the conjar.org repository:
{code:xml}
<repository>
  <id>conjars</id>
  <name>Conjars</name>
  <url>http://conjars.org/repo</url>
  <layout>default</layout>
  <releases>
    <enabled>true</enabled>
    <updatePolicy>always</updatePolicy>
    <checksumPolicy>warn</checksumPolicy>
  </releases>
</repository>
{code}

The subsequent hive dependency 
[org.apache.hive:hive:3.0.0|https://repo1.maven.org/maven2/org/apache/hive/hive/3.0.0/hive-3.0.0.pom]
 doesn't have this reference anymore.
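Until the hive dependency is upgraded, one possible mitigation — a sketch assuming control over the build machine's ~/.m2/settings.xml — is to redirect the dead repository to Maven Central via a mirror, so Maven never opens a connection to conjars.org:

```xml
<!-- ~/.m2/settings.xml: redirect the defunct conjars repository to Central -->
<mirrors>
  <mirror>
    <id>conjars-redirect</id>
    <mirrorOf>conjars</mirrorOf>
    <name>Redirect the defunct conjars.org repo</name>
    <url>https://repo1.maven.org/maven2</url>
  </mirror>
</mirrors>
```

The lookup for {{net.minidev:json-smart}} then fails fast against Central instead of waiting for the conjars.org connection to time out.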



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISSCUSS] Kubernetes Operator Flink Version Support Policy

2023-09-05 Thread Maximilian Michels
+1 Sounds good! Four releases give a decent amount of time to migrate
to the next Flink version.

On Tue, Sep 5, 2023 at 5:33 PM Őrhidi Mátyás  wrote:
>
> +1
>
> On Tue, Sep 5, 2023 at 8:03 AM Thomas Weise  wrote:
>
> > +1, thanks for the proposal
> >
> > On Tue, Sep 5, 2023 at 8:13 AM Gyula Fóra  wrote:
> >
> > > Hi All!
> > >
> > > @Maximilian Michels  has raised the question of Flink
> > > version support in the operator before the last release. I would like to
> > > open this discussion publicly so we can finalize this before the next
> > > release.
> > >
> > > Background:
> > > Currently the Flink Operator supports all Flink versions since Flink
> > 1.13.
> > > While this is great for the users, it introduces a lot of backward
> > > compatibility related code in the operator logic and also adds
> > considerable
> > > time to the CI. We should strike a reasonable balance here that allows us
> > > to move forward and eliminate some of this tech debt.
> > >
> > > In the current model it is also impossible to support all features for
> > all
> > > Flink versions which leads to some confusion over time.
> > >
> > > Proposal:
> > > Since it's a key feature of the kubernetes operator to support several
> > > versions at the same time, I propose to support the last 4 stable Flink
> > > minor versions. Currently this would mean to support Flink 1.14-1.17 (and
> > > drop 1.13 support). When Flink 1.18 is released we would drop 1.14
> > support
> > > and so on. Given the Flink release cadence this means about 2 year
> > support
> > > for each Flink version.
> > >
> > > What do you think?
> > >
> > > Cheers,
> > > Gyula
> > >
> >


Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and kubernetes

2023-09-05 Thread Samrat Deb
Hi Max,

> are we planning to add an alternative implementation
against the new interfaces?

Yes, we are simultaneously working on the YARN implementation using the
interface. During the initial interface design, we encountered some
anomalies while implementing it in YARN.

Once the interfaces are finalized, we will proceed to raise a pull request
(PR) for YARN as well.

Our initial approach was to create a decoupled interface as part of
FLIP-334 and then implement it for YARN in the subsequent phase.
However, if you recommend combining both phases, we can certainly consider
that option.

We look forward to hearing your thoughts on whether to have YARN
implementation as part of FLIP-334 or as a separate one?

Bests
Samrat



On Tue, Sep 5, 2023 at 8:41 PM Maximilian Michels  wrote:

> Thanks Rui for the update!
>
> Alongside with the refactoring to decouple autoscaler logic from the
> deployment logic, are we planning to add an alternative implementation
> against the new interfaces? I think the best way to get the interfaces
> right, is to have an alternative implementation in addition to
> Kubernetes. YARN or a standalone mode implementation were already
> mentioned. Ultimately, this is the reason we are doing the
> refactoring. Without a new implementation, it becomes harder to
> justify the refactoring work.
>
> Cheers,
> Max
>
> On Tue, Sep 5, 2023 at 9:48 AM Rui Fan  wrote:
> >
> > After discussing this FLIP-334[1] offline with Gyula and Max,
> > I updated the FLIP based on the latest conclusion.
> >
> > Big thanks to Gyula and Max for their professional advice!
> >
> > > Does the interface function of handlerRecommendedParallelism
> > > in AutoScalerEventHandler conflict with
> > > handlerScalingFailure/handlerScalingReport (one of the
> > > handles the event of scale failure, and the other handles
> > > the event of scale success).
> > Hi Matt,
> >
> > You can take a look at the FLIP, I think the issue has been fixed.
> > Currently, we introduced the ScalingRealizer and
> > AutoScalerEventHandler interface.
> >
> > The ScalingRealizer handles scaling action.
> >
> > The AutoScalerEventHandler  interface handles loggable events.
> >
> >
> > Looking forward to your feedback, thanks!
> >
> > [1] https://cwiki.apache.org/confluence/x/x4qzDw
> >
> > Best,
> > Rui
> >
> > On Thu, Aug 24, 2023 at 10:55 AM Matt Wang  wrote:
> >>
> >> Sorry for the late reply, I still have a small question here:
> >> Does the interface function of handlerRecommendedParallelism
> >> in AutoScalerEventHandler conflict with
> >> handlerScalingFailure/handlerScalingReport (one of the
> >> handles the event of scale failure, and the other handles
> >> the event of scale success).
> >>
> >>
> >>
> >> --
> >>
> >> Best,
> >> Matt Wang
> >>
> >>
> >>  Replied Message 
> >> | From | Rui Fan<1996fan...@gmail.com> |
> >> | Date | 08/21/2023 17:41 |
> >> | To |  |
> >> | Cc | Maximilian Michels ,
> >> Gyula Fóra ,
> >> Matt Wang |
> >> | Subject | Re: [DISCUSS] FLIP-334 : Decoupling autoscaler and
> kubernetes |
> >> Hi Max, Gyula and Matt,
> >>
> >> Do you have any other comments?
> >>
> >> The flink-kubernetes-operator 1.6 has been released recently,
> >> it's a good time to kick off this FLIP.
> >>
> >> Please let me know if you have any questions or concerns,
> >> looking forward to your feedback, thanks!
> >>
> >> Best,
> >> Rui
> >>
> >> On Wed, Aug 9, 2023 at 11:55 AM Rui Fan <1996fan...@gmail.com> wrote:
> >>
> >> Hi Matt Wang,
> >>
> >> Thanks for your discussion here.
> >>
> >> it is recommended to unify the descriptions of AutoScalerHandler
> >> and AutoScalerEventHandler in the FLIP
> >>
> >> Good catch, I have updated all AutoScalerHandler to
> >> AutoScalerEventHandler.
> >>
> >> Can it support the use of zookeeper (zookeeper is a relatively
> >> common use of flink HA)?
> >>
> >> In my opinion, it's a good suggestion. However, I prefer we
> >> implement other state stores in the other FLINK JIRA, and
> >> this FLIP focus on the decoupling and implementing the
> >> necessary state store. Does that make sense?
> >>
> >> Regarding each scaling information, can it be persisted in
> >> the shared file system through the filesystem? I think it will
> >> be a more valuable requirement to support viewing
> >> Autoscaling info on the UI in the future, which can provide
> >> some foundations in advance;
> >>
> >> This is a good suggestion as well. It's useful for users to check
> >> the scaling information. I propose to add a CompositeEventHandler,
> >> it can include multiple EventHandlers.
> >>
> >> However, as the last question, I prefer we implement other
> >> event handler in the other FLINK JIRA. What do you think?
> >>
> >> A solution mentioned in FLIP is to initialize the
> >> AutoScalerEventHandler object every time an event is
> >> processed.
> >>
> >> No, the FLIP mentioned `The AutoScalerEventHandler  object is shared for
> >> all flink jobs`,
> >> So the AutoScalerEventHandler is only initialized once.
> >>
> >> A

Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-05 Thread Becket Qin
Hi Venkata,


> Also I made minor changes to the *NestedFieldReferenceExpression, *instead
> of *fieldIndexArray* we can just do away with *fieldNames *array that
> includes fieldName at every level for the nested field.


I don't think keeping only the field names array would work. At the end of
the day, the contract between Flink SQL and the connectors is based on the
indexes, not the names. Technically speaking, the connectors only emit a
bunch of RowData which is based on positions. The field names are added by
the SQL framework via the DDL for those RowData. In this sense, the
connectors may not be aware of the field names in Flink DDL at all. The
common language between Flink SQL and source is just positions. This is
also why ProjectionPushDown would work by only relying on the indexes, not
the field names. So I think the field index array is a must have here in
the NestedFieldReferenceExpression.

Thanks,

Jiangjie (Becket) Qin

On Fri, Sep 1, 2023 at 8:12 AM Venkatakrishnan Sowrirajan 
wrote:

> Gentle ping on the vote for FLIP-356: Support Nested fields filter pushdown
> .
>
> Regards
> Venkata krishnan
>
>
> On Tue, Aug 29, 2023 at 9:18 PM Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu>
> wrote:
>
> > Sure, will reference this discussion to resume where we started as part
> of
> > the flip to refactor SupportsProjectionPushDown.
> >
> > On Tue, Aug 29, 2023, 7:22 PM Jark Wu  wrote:
> >
> >> I'm fine with this. `ReferenceExpression` and
> `SupportsProjectionPushDown`
> >> can be another FLIP. However, could you summarize the design of this
> part
> >> in the future part of the FLIP? This can be easier to get started with
> in
> >> the future.
> >>
> >>
> >> Best,
> >> Jark
> >>
> >>
> >> On Wed, 30 Aug 2023 at 02:45, Venkatakrishnan Sowrirajan <
> >> vsowr...@asu.edu>
> >> wrote:
> >>
> >> > Thanks Jark. Sounds good.
> >> >
> >> > One more thing, earlier in my summary I mentioned,
> >> >
> >> > Introduce a new *ReferenceExpression* (or *BaseReferenceExpression*)
> >> > > abstract class which will be extended by both
> >> *FieldReferenceExpression*
> >> > >  and *NestedFieldReferenceExpression* (to be introduced as part of
> >> this
> >> > > FLIP)
> >> >
> >> > This can be punted for now and can be handled as part of refactoring
> >> > SupportsProjectionPushDown.
> >> >
> >> > Also I made minor changes to the *NestedFieldReferenceExpression,
> >> *instead
> >> > of *fieldIndexArray* we can just do away with *fieldNames *array that
> >> > includes fieldName at every level for the nested field.
> >> >
> >> > Updated the FLIP-357
> >> > <
> >> >
> >>
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!YAk6kV4CYvUSPfpoUDQRs6VlbmJXVX8KOKqFxKbNDkUWKzShvwpkLRGkAV1tgV3EqClNrjGS-Ij86Q$
> >> > >
> >> > wiki as well.
> >> >
> >> > Regards
> >> > Venkata krishnan
> >> >
> >> >
> >> > On Tue, Aug 29, 2023 at 5:21 AM Jark Wu  wrote:
> >> >
> >> > > Hi Venkata,
> >> > >
> >> > > Your summary looks good to me. +1 to start a vote.
> >> > >
> >> > > I think we don't need "inputIndex" in
> NestedFieldReferenceExpression.
> >> > > Actually, I think it is also not needed in FieldReferenceExpression,
> >> > > and we should try to remove it (another topic). The RexInputRef in
> >> > Calcite
> >> > > also doesn't require an inputIndex because the field index should
> >> > represent
> >> > > index of the field in the underlying row type. Field references
> >> shouldn't
> >> > > be
> >> > >  aware of the number of inputs.
> >> > >
> >> > > Best,
> >> > > Jark
> >> > >
> >> > >
> >> > > On Tue, 29 Aug 2023 at 02:24, Venkatakrishnan Sowrirajan <
> >> > vsowr...@asu.edu
> >> > > >
> >> > > wrote:
> >> > >
> >> > > > Hi Jinsong,
> >> > > >
> >> > > > Thanks for your comments.
> >> > > >
> >> > > > What is inputIndex in NestedFieldReferenceExpression?
> >> > > >
> >> > > >
> >> > > > I haven't looked at it before. Do you mean, given that it is now
> >> only
> >> > > used
> >> > > > to push filters it won't be subsequently used in further
> >> > > > planning/optimization and therefore it is not required at this
> time?
> >> > > >
> >> > > > So if NestedFieldReferenceExpression doesn't need inputIndex, is
> >> there
> >> > > > > a need to introduce a base class `ReferenceExpression`?
> >> > > >
> >> > > > For SupportsFilterPushDown itself, *ReferenceExpression* base
> class
> >> is
> >> > > not
> >> > > > needed. But there were discussions around cleaning up and
> >> standardizing
> >> > > the
> >> > > > API for Supports*PushDown. SupportsProjectionPushDown currently
> >> pushes
> >> > > the
> >> > > > projects as a 2-d array, instead it would be better to use the
> >> standard
> >> > > API
> >> > > > which seems to be the *ResolvedExpression*. For
> >> > > SupportsProjectionPushDown
> >> > > > either FieldReferenceExpression (top level fields) or
> >> > > > Nest
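The position-based contract described in this thread can be illustrated with a small sketch: a nested field reference carrying both the per-level field names and the per-level field indices. This is a hypothetical illustration only; the class shape and member names are assumed and not the final FLIP-356 API:

```java
import java.util.Arrays;

// Hypothetical sketch of the NestedFieldReferenceExpression shape debated
// above: field names for readability, field indices because the contract
// between Flink SQL and connectors is position-based.
public class NestedFieldRef {
    private final String[] fieldNames;  // e.g. ["user", "address", "zip"] for user.address.zip
    private final int[] fieldIndices;   // index of each level within its enclosing row type

    public NestedFieldRef(String[] fieldNames, int[] fieldIndices) {
        if (fieldNames.length != fieldIndices.length) {
            throw new IllegalArgumentException("one index per nesting level");
        }
        this.fieldNames = fieldNames;
        this.fieldIndices = fieldIndices;
    }

    public String summary() {
        return String.join(".", fieldNames) + " @ " + Arrays.toString(fieldIndices);
    }

    public static void main(String[] args) {
        NestedFieldRef ref =
                new NestedFieldRef(new String[] {"user", "address", "zip"}, new int[] {2, 0, 1});
        System.out.println(ref.summary()); // user.address.zip @ [2, 0, 1]
    }
}
```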

Re: [DISCUSS] Drop python 3.7 support in 1.19

2023-09-05 Thread Xingbo Huang
Hi Gabor,

Thanks for bringing this up. In my opinion, it is a bit aggressive to
directly drop Python 3.7 in 1.19. Python 3.7 is still used a lot[1], and as
far as I know, many Pyflink users are still using python 3.7 as their
default interpreter. I prefer to deprecate Python 3.7 in 1.19 just like we
deprecated Python 3.6 in 1.16[2] and dropped Python 3.6 in 1.17[3].

For the support of Python 3.11, I am very supportive of the implementation
in 1.19 (many users have this appeal, and I originally wanted to support it
in 1.18).

Regarding the miniconda upgrade, I tend to upgrade miniconda to the latest
version that can support python 3.7 to 3.11 at the same time.

[1] https://w3techs.com/technologies/history_details/pl-python/3
[2] https://issues.apache.org/jira/browse/FLINK-28195
[3] https://issues.apache.org/jira/browse/FLINK-27929

Best,
Xingbo

Jing Ge wrote on Tue, Sep 5, 2023 at 04:10:

> +1
>
> @Dian should we add support of python 3.11
>
> Best regards,
> Jing
>
> On Mon, Sep 4, 2023 at 3:39 PM Gabor Somogyi 
> wrote:
>
> > Thanks for all the responses!
> >
> > Based on the suggestions I've created the following jiras and started to
> > work on them:
> > * https://issues.apache.org/jira/browse/FLINK-33029
> > * https://issues.apache.org/jira/browse/FLINK-33030
> >
> > The reason why I've split them is to separate the concerns and reduce the
> > amount of code in a PR to help reviewers.
> >
> > BR,
> > G
> >
> >
> > On Mon, Sep 4, 2023 at 12:57 PM Sergey Nuyanzin 
> > wrote:
> >
> > > +1,
> > > Thanks for looking into this.
> > >
> > > On Mon, Sep 4, 2023 at 8:38 AM Gyula Fóra 
> wrote:
> > >
> > > > +1
> > > > Thanks for looking into this.
> > > >
> > > > Gyula
> > > >
> > > > On Mon, Sep 4, 2023 at 8:26 AM Matthias Pohl  > > > .invalid>
> > > > wrote:
> > > >
> > > > > Thanks Gabor for looking into it. It sounds reasonable to me as
> well.
> > > > >
> > > > > +1
> > > > >
> > > > > On Sun, Sep 3, 2023 at 5:44 PM Márton Balassi <
> > > balassi.mar...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Gabor,
> > > > > >
> > > > > > Thanks for bringing this up. Similarly to when we dropped Python
> > 3.6
> > > > due
> > > > > to
> > > > > > its end of life (and added 3.10) in Flink 1.17 [1,2], it makes
> > sense
> > > to
> > > > > > proceed to remove 3.7 and add 3.11 instead.
> > > > > >
> > > > > > +1.
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-27929
> > > > > > [2] https://github.com/apache/flink/pull/21699
> > > > > >
> > > > > > Best,
> > > > > > Marton
> > > > > >
> > > > > > On Fri, Sep 1, 2023 at 10:39 AM Gabor Somogyi <
> > > > gabor.g.somo...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I've analyzed through part of the pyflink code and found some
> > > > > improvement
> > > > > > > possibilities.
> > > > > > > I would like to hear voices on the idea.
> > > > > > >
> > > > > > > Intention:
> > > > > > > * upgrade several python related versions to eliminate
> > end-of-life
> > > > > issues
> > > > > > > and keep up with bugfixes
> > > > > > > * start to add python arm64 support
> > > > > > >
> > > > > > > Actual situation:
> > > > > > > * Flink supports the following python versions: 3.7, 3.8, 3.9,
> > 3.10
> > > > > > > * We use miniconda 4.7.10 (python package management system and
> > > > > > environment
> > > > > > > management system) which supports the following python
> versions:
> > > 3.7,
> > > > > > 3.8,
> > > > > > > 3.9, 3.10
> > > > > > > * Our python framework is not supporting anything but x86_64
> > > > > > >
> > > > > > > Issues:
> > > > > > > * Python 3.7.17 is the latest security patch of the 3.7 line.
> > This
> > > > > > version
> > > > > > > is end-of-life and is no longer supported:
> > > > > > > https://www.python.org/downloads/release/python-3717/
> > > > > > > * Miniconda 4.7.10 is released on 2019-07-29 which is 4 years
> old
> > > > > already
> > > > > > > and not supporting too many architectures (x86_64 and ppc64le)
> > > > > > > * The latest miniconda which has real multi-arch feature set
> > > supports
> > > > > the
> > > > > > > following python versions: 3.8, 3.9, 3.10, 3.11 and no 3.7
> > support
> > > > > > >
> > > > > > > Suggestion to solve the issues:
> > > > > > > * In 1.19 drop python 3.7 support and upgrade miniconda to the
> > > latest
> > > > > > > version which opens the door to other platform + python 3.11
> > > support
> > > > > > >
> > > > > > > Please note python 3.11 support is not initiated/discussed
> here.
> > > > > > >
> > > > > > > BR,
> > > > > > > G
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Sergey
> > >
> >
>


[jira] [Created] (FLINK-33041) Add an introduction about how to migrate DataSet API to DataStream

2023-09-05 Thread Wencong Liu (Jira)
Wencong Liu created FLINK-33041:
---

 Summary: Add an introduction about how to migrate DataSet API to 
DataStream
 Key: FLINK-33041
 URL: https://issues.apache.org/jira/browse/FLINK-33041
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.18.0
Reporter: Wencong Liu
 Fix For: 1.18.0


The DataSet API has been formally deprecated and will no longer receive active 
maintenance and support. It will be removed in the Flink 2.0 version. Flink 
users are recommended to migrate from the DataSet API to the DataStream API, 
Table API and SQL for their data processing requirements.

Most of the DataSet operators can be implemented using the DataStream API. 
However, we believe it would be beneficial to have an introductory article on 
the Flink website that guides users in migrating their DataSet jobs to 
DataStream.





Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-05 Thread Rui Fan
Hi Gyula,

+1 for this proposal. The current GC metric is really unfriendly.

I have a concern with your proposed rate metric: the rate is perSecond
instead of per minute. I'm unsure whether it's suitable for GC metric.

There are two reasons why I suspect perSecond may not be well
compatible with GC metric:

1. GCs are usually infrequent and may only occur for a small number
of time periods within a minute.

Metrics are collected periodically, for example, reported every minute.
If the result reported by the GC metric is 1s/perSecond, it does not
mean that the GC of the TM is serious, because there may be no GC
in the remaining 59s.

On the contrary, the GC metric reports 0s/perSecond, which does not
mean that the GC of the TM is not serious, and the GC may be very
serious in the remaining 59s.

2. Stop-the-world pauses may cause metric reporting to fail or be delayed

The TM will stop the world during GC, especially full GC. It means
the metric cannot be collected or reported during full GC.

So the collected GC metric may never be 1s/perSecond. This metric
may always be good because the metric will only be reported when
the GC is not severe.


If these concerns make sense, how about updating the GC rate
at minute level?

We can define the metric type as a Gauge for TimeMsPerMinute, updating
this Gauge every second, where the value is:
GC total time at the current time - GC total time one minute ago.
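
A stand-alone sketch of this idea (plain Java, no Flink dependency; the class and method names are made up for illustration): keep one cumulative GC-time sample per second in a ring buffer and report the difference between the newest sample and the one from a minute ago.

```java
/** Illustrative ring-buffer gauge: update() is called once per second with
 *  the cumulative GC time; getValue() returns the GC time spent during the
 *  last (up to) 60 seconds. Not actual Flink code. */
class GcTimePerMinuteGauge {
    private final long[] samples = new long[61]; // 61 samples span exactly 60 seconds
    private int next = 0;   // slot for the next sample
    private int count = 0;  // number of samples recorded so far

    /** Record the cumulative GC time in ms (e.g. summed over all collectors). */
    void update(long cumulativeGcTimeMs) {
        samples[next] = cumulativeGcTimeMs;
        next = (next + 1) % samples.length;
        if (count < samples.length) {
            count++;
        }
    }

    /** GC time accumulated over the covered window (newest minus oldest sample). */
    long getValue() {
        if (count == 0) {
            return 0L;
        }
        int newest = (next - 1 + samples.length) % samples.length;
        int oldest = (next - count + samples.length) % samples.length;
        return samples[newest] - samples[oldest];
    }
}
```

Because the window always spans the last minute, a burst of GC in any part of that minute is reflected, regardless of when the reporter happens to poll the gauge.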

Best,
Rui

On Tue, Sep 5, 2023 at 11:05 PM Maximilian Michels  wrote:

> Hi Gyula,
>
> +1 The proposed changes make sense and are in line with what is
> available for other metrics, e.g. number of records processed.
>
> -Max
>
> On Tue, Sep 5, 2023 at 2:43 PM Gyula Fóra  wrote:
> >
> > Hi Devs,
> >
> > I would like to start a discussion on FLIP-361: Improve GC Metrics [1].
> >
> > The current Flink GC metrics [2] are not very useful for monitoring
> > purposes as they require post processing logic that is also dependent on
> > the current runtime environment.
> >
> > Problems:
> >  - Total time is not very relevant for long running applications, only
> the
> > rate of change (msPerSec)
> >  - In most cases it's best to simply aggregate the time/count across the
> > different GarbageCollectors, however the specific collectors are
> dependent
> > on the current Java runtime
> >
> > We propose to improve the current situation by:
> >  - Exposing rate metrics per GarbageCollector
> >  - Exposing aggregated Total time/count/rate metrics
> >
> > These new metrics are all derived from the existing ones with minimal
> > overhead.
> >
> > Looking forward to your feedback.
> >
> > Cheers,
> > Gyula
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
> > [2]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#garbagecollection
>


Re: [DISCUSS] Add config to enable job stop with savepoint on exceeding tolerable checkpoint Failures

2023-09-05 Thread Yanfei Lei
Hi Dongwoo,

If the checkpoint has failed
`execution.checkpointing.tolerable-failed-checkpoints` times, then
stopWithSavepoint is likely to fail as well.
Whether stopWithSavepoint succeeds or fails, will the job just stop? I am
also curious about how this option works with the restart strategy.

Best,
Yanfei


Dongwoo Kim  wrote on Mon, Sep 4, 2023 at 22:17:
>
> Hi all,
> I have a proposal that aims to enhance the flink application's resilience in 
> cases of unexpected failures in checkpoint storages like S3 or HDFS,
>
> [Background]
> When using self-managed S3-compatible object storage, we faced asynchronous 
> checkpoint failures lasting for an extended period of more than 30 minutes,
> leading to multiple job restarts and causing lag in our streaming 
> application.
>
> [Current Behavior]
> Currently, when the number of checkpoint failures exceeds a predefined 
> tolerable limit, flink will either restart or fail the job based on how it's 
> configured.
> In my opinion this does not handle scenarios where the checkpoint storage 
> itself may be unreliable or experiencing downtime.
>
> [Proposed Feature]
> I propose a config that allows for a graceful job stop with a savepoint when 
> the tolerable checkpoint failure limit is reached.
> Instead of restarting/failing the job when the tolerable checkpoint failure 
> limit is exceeded, just trigger stopWithSavepoint when this new config is set to true.
>
> This could offer the following benefits.
> - Indication of Checkpoint Storage State: Exceeding tolerable checkpoint 
> failures could indicate unstable checkpoint storage.
> - Automated Fallback Strategy: When combined with a monitoring cron job, this 
> feature could act as an automated fallback strategy for handling unstable 
> checkpoint storage.
>   The job would stop safely, take a savepoint, and then you could 
> automatically restart with different checkpoint storage configured like 
> switching from S3 to HDFS.
>
> For example, let's say the checkpoint path is configured to S3 and the 
> savepoint path is configured to HDFS.
> When the new config is set to true, the job stops with a savepoint as soon 
> as the tolerable checkpoint failure limit is exceeded.
> We can then restart the job from that savepoint with the checkpoint 
> storage configured as HDFS.
>
>
>
> Looking forward to hearing the community's thoughts on this proposal.
> And also want to ask how the community is handling long lasting unstable 
> checkpoint storage issues.
>
> Thanks in advance.
>
> Best dongwoo,
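
The decision logic described in the proposal could be sketched like this (plain Java; the flag name stopWithSavepointOnExceed is hypothetical, and Flink's real CheckpointFailureManager is considerably more involved):

```java
/** Toy model of the proposed behaviour: count consecutive checkpoint
 *  failures and, once the tolerable limit is exceeded, either fall back to
 *  the existing restart/fail path or trigger stop-with-savepoint. */
class CheckpointFailureTracker {
    enum Action { NONE, FAIL_OR_RESTART, STOP_WITH_SAVEPOINT }

    private final int tolerableFailures;             // cf. execution.checkpointing.tolerable-failed-checkpoints
    private final boolean stopWithSavepointOnExceed; // the proposed (hypothetical) new flag
    private int consecutiveFailures = 0;

    CheckpointFailureTracker(int tolerableFailures, boolean stopWithSavepointOnExceed) {
        this.tolerableFailures = tolerableFailures;
        this.stopWithSavepointOnExceed = stopWithSavepointOnExceed;
    }

    Action onCheckpointSuccess() {
        consecutiveFailures = 0; // a successful checkpoint resets the counter
        return Action.NONE;
    }

    Action onCheckpointFailure() {
        consecutiveFailures++;
        if (consecutiveFailures <= tolerableFailures) {
            return Action.NONE;
        }
        return stopWithSavepointOnExceed ? Action.STOP_WITH_SAVEPOINT : Action.FAIL_OR_RESTART;
    }
}
```

Note this sketch sidesteps Yanfei's concern: if the savepoint target shares the unhealthy storage, the STOP_WITH_SAVEPOINT action itself may fail, so the savepoint path would need to point at separate storage.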


Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-09-05 Thread Bonnie Arogyam Varghese
It looks like it would be nice to have a config to disable hints. Any other
thoughts/concerns before we close this discussion?
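
For illustration, the proposed toggle (table.query-options.enabled, as named in the quoted proposal; the class and method here are hypothetical) would amount to a validation check like:

```java
/** Hypothetical validation step: if query hints are disabled via the
 *  proposed table.query-options.enabled option, any QUERY hint in a
 *  statement should raise an error instead of being applied. */
class QueryHintGate {
    private final boolean queryHintsEnabled;

    QueryHintGate(boolean queryHintsEnabled) {
        this.queryHintsEnabled = queryHintsEnabled;
    }

    /** Throws if a QUERY hint is used while the option is disabled. */
    void checkHintAllowed(String hintName) {
        if (!queryHintsEnabled) {
            throw new UnsupportedOperationException(
                    "QUERY hint '" + hintName + "' is disabled via table.query-options.enabled");
        }
    }
}
```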

On Fri, Aug 18, 2023 at 7:43 AM Timo Walther  wrote:

>  > lots of the streaming SQL syntax are extensions of SQL standard
>
> That is true. But hints are kind of a special case because they are not
> even "part of Flink SQL" that's why they are written in a comment syntax.
>
> Anyway, I feel hints could be sometimes confusing for users because most
> of them have no effect for streaming and long-term we could also set
> some hints via the CompiledPlan. And if you have multiple teams,
> non-skilled users should not play around with hints and leave the
> decision to the system that might become smarter over time.
>
> Regards,
> Timo
>
>
> On 17.08.23 18:47, liu ron wrote:
> > Hi, Bonnie
> >
> >> Options hints could be a security concern since users can override
> > settings.
> >
> > I think this still doesn't answer my question
> >
> > Best,
> > Ron
> >
> > Jark Wu  于2023年8月17日周四 19:51写道:
> >
> >> Sorry, I still don't understand why we need to disable the query hint.
> >> It doesn't have the security problems as options hint. Bonnie said it
> >> could affect performance, but that depends on users using it explicitly.
> >> If there is any performance problem, users can remove the hint.
> >>
> >> If we want to disable query hint just because it's an extension to SQL
> >> standard.
> >> I'm afraid we have to introduce a bunch of configuration, because lots
> of
> >> the streaming SQL syntax are extensions of SQL standard.
> >>
> >> Best,
> >> Jark
> >>
> >> On Thu, 17 Aug 2023 at 15:43, Timo Walther  wrote:
> >>
> >>> +1 for this proposal.
> >>>
> >>> Not every data team would like to enable hints. Also because they are
> an
> >>> extension to the SQL standard. It might also be the case that custom
> >>> rules would be overwritten otherwise. Setting hints could also be the
> >>> exclusive task of a DevOp team.
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>> On 17.08.23 09:30, Konstantin Knauf wrote:
>  Hi Bonnie,
> 
>  this makes sense to me, in particular, given that we already have this
>  toggle for a different type of hints.
> 
>  Best,
> 
>  Konstantin
> 
>  Am Mi., 16. Aug. 2023 um 19:38 Uhr schrieb Bonnie Arogyam Varghese
>  :
> 
> > Hi Liu,
> >Options hints could be a security concern since users can override
> > settings. However, query hints specifically could affect performance.
> > Since we have a config to disable Options hint, I'm suggesting we
> also
> >>> have
> > a config to disable Query hints.
> >
> > On Wed, Aug 16, 2023 at 9:41 AM liu ron  wrote:
> >
> >> Hi,
> >>
> >> Thanks for driving this proposal.
> >>
> >> Can you explain why you would need to disable query hints because of
> >> security issues? I don't really understand why query hints affects
> >> security.
> >>
> >> Best,
> >> Ron
> >>
> >> Bonnie Arogyam Varghese 
> >> 于2023年8月16日周三
> >> 23:59写道:
> >>
> >>> Platform providers may want to disable hints completely for
> security
> >>> reasons.
> >>>
> >>> Currently, there is a configuration to disable OPTIONS hint -
> >>>
> >>>
> >>
> >
> >>>
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-dynamic-table-options-enabled
> >>>
> >>> However, there is no configuration available to disable QUERY hints
> >> -
> >>>
> >>>
> >>
> >
> >>>
> >>
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/queries/hints/#query-hints
> >>>
> >>> The proposal is to add a new configuration:
> >>>
> >>> Name: table.query-options.enabled
> >>> Description: Enable or disable the QUERY hint, if disabled, an
> >>> exception would be thrown if any QUERY hints are specified
> >>> Note: The default value will be set to true.
> >>>
> >>
> >
> 
> 
> >>>
> >>>
> >>
> >
>
>


Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-09-05 Thread yuxia
Hi, Tawfik Yasser.
Thanks for the proposal. 
It sounds exciting. I can't wait to read the research paper for more details.

Best regards,
Yuxia

----- Original Message -----
From: "David Morávek" 
To: "dev" 
Sent: Tuesday, September 5, 2023, 4:36:51 PM
Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink

Hi Tawfik,

It's exciting to see any ongoing research that tries to push Flink forward!

To get the discussion started, can you please share your paper with the
community? Assessing the proposal without further context is tough.

Best,
D.

On Mon, Sep 4, 2023 at 4:42 PM Tawfek Yasser Tawfek 
wrote:

> Dear Apache Flink Development Team,
>
> I hope this email finds you well. I am writing to propose an exciting new
> feature for Apache Flink that has the potential to significantly enhance
> its capabilities in handling unbounded streams of events, particularly in
> the context of event-time windowing.
>
> As you may be aware, Apache Flink has been at the forefront of Big Data
> Stream processing engines, leveraging windowing techniques to manage
> unbounded event streams effectively. The accuracy of the results obtained
> from these streams relies heavily on the ability to gather all relevant
> input within a window. At the core of this process are watermarks, which
> serve as unique timestamps marking the progression of events in time.
>
> However, our analysis has revealed a critical issue with the current
> watermark generation method in Apache Flink. This method, which operates at
> the input stream level, exhibits a bias towards faster sub-streams,
> resulting in the unfortunate consequence of dropped events from slower
> sub-streams. Our investigations showed that Apache Flink's conventional
> watermark generation approach led to an alarming data loss of approximately
> 33% when 50% of the keys around the median experienced delays. This loss
> further escalated to over 37% when 50% of random keys were delayed.
>
> In response to this issue, we have authored a research paper outlining a
> novel strategy named "keyed watermarks" to address data loss and
> substantially enhance data processing accuracy, achieving at least 99%
> accuracy in most scenarios.
>
> Moreover, we have conducted comprehensive comparative studies to evaluate
> the effectiveness of our strategy against the conventional watermark
> generation method, specifically in terms of event-time tracking accuracy.
>
> We believe that implementing keyed watermarks in Apache Flink can greatly
> enhance its performance and reliability, making it an even more valuable
> tool for organizations dealing with complex, high-throughput data
> processing tasks.
>
> We kindly request your consideration of this proposal. We would be eager
> to discuss further details, provide the full research paper, or collaborate
> closely to facilitate the integration of this feature into Apache Flink.
>
> Thank you for your time and attention to this proposal. We look forward to
> the opportunity to contribute to the continued success and evolution of
> Apache Flink.
>
> Best Regards,
>
> Tawfik Yasser
> Senior Teaching Assistant @ Nile University, Egypt
> Email: tyas...@nu.edu.eg
> LinkedIn: https://www.linkedin.com/in/tawfikyasser/
>
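
A toy simulation of the effect the proposal describes (made-up keys and timestamps, no Flink involved): with a single stream-level watermark, events of a slow key arrive behind the watermark advanced by the fast key and are dropped, whereas a per-key watermark drops none of them.

```java
import java.util.HashMap;
import java.util.Map;

/** Compares dropped-event counts under a single stream-level watermark
 *  versus one watermark per key. Watermark generation is simplified to
 *  "max timestamp seen so far" with no out-of-orderness allowance. */
class KeyedWatermarkDemo {

    /** Single watermark for the whole stream: fast keys advance it for everyone. */
    static int droppedGlobal(long[] timestamps) {
        long watermark = Long.MIN_VALUE;
        int dropped = 0;
        for (long ts : timestamps) {
            if (ts <= watermark) {
                dropped++; // event is behind the watermark and would be discarded
            }
            watermark = Math.max(watermark, ts);
        }
        return dropped;
    }

    /** One watermark per key: each key only competes with its own progress. */
    static int droppedKeyed(String[] keys, long[] timestamps) {
        Map<String, Long> perKeyWatermark = new HashMap<>();
        int dropped = 0;
        for (int i = 0; i < timestamps.length; i++) {
            long wm = perKeyWatermark.getOrDefault(keys[i], Long.MIN_VALUE);
            if (timestamps[i] <= wm) {
                dropped++;
            }
            perKeyWatermark.put(keys[i], Math.max(wm, timestamps[i]));
        }
        return dropped;
    }
}
```

In this tiny interleaving, every event of the slow key is dropped under the global watermark, which is the bias the proposal aims to eliminate.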


Re: [DISCUSS] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-05 Thread Venkatakrishnan Sowrirajan
Based on an offline discussion with Becket Qin, I added *fieldIndices* back,
which is the field index of the nested field at every level, to the
*NestedFieldReferenceExpression* in FLIP-356. 2 reasons to do it:

1. Agree with using *fieldIndices* as the only contract to refer to the
column from the underlying datasource.
2. To keep it consistent with *FieldReferenceExpression*

Having said that, I see that with *projection pushdown*, the index of the
fields is used, whereas with *filter pushdown* (based on scanning a few
table sources) the *FieldReferenceExpression*'s name is used, e.g. in
Flink's *FileSystemTableSource*, *IcebergSource*, and *JDBCDataSource*. This way, I
feel the contract is not quite clear and explicit. Wanted to understand
others' thoughts as well.
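
To make the discussion concrete, here is a minimal stand-in (not the actual Flink class; the shape follows the FLIP but the details are assumptions) showing a nested reference that carries both per-level names and per-level indices, so a connector could resolve the column by position or by name:

```java
/** Illustrative model of a nested field reference carrying both the field
 *  name and the field index at every nesting level. */
class NestedFieldRef {
    private final String[] fieldNames;  // e.g. ["user", "address", "zip"]
    private final int[] fieldIndices;   // index of each level within its parent row type

    NestedFieldRef(String[] fieldNames, int[] fieldIndices) {
        if (fieldNames.length != fieldIndices.length) {
            throw new IllegalArgumentException("expected one index per name level");
        }
        this.fieldNames = fieldNames;
        this.fieldIndices = fieldIndices;
    }

    String[] getFieldNames() { return fieldNames; }

    int[] getFieldIndices() { return fieldIndices; }

    /** Dotted form, e.g. "user.address.zip". */
    String asSummaryString() { return String.join(".", fieldNames); }
}
```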

Regards
Venkata krishnan


On Tue, Sep 5, 2023 at 5:34 PM Becket Qin  wrote:

> Hi Venkata,
>
>
> > Also I made minor changes to the *NestedFieldReferenceExpression,
> *instead
> > of *fieldIndexArray* we can just do away with *fieldNames *array that
> > includes fieldName at every level for the nested field.
>
>
> I don't think keeping only the field names array would work. At the end of
> the day, the contract between Flink SQL and the connectors is based on the
> indexes, not the names. Technically speaking, the connectors only emit a
> bunch of RowData which is based on positions. The field names are added by
> the SQL framework via the DDL for those RowData. In this sense, the
> connectors may not be aware of the field names in Flink DDL at all. The
> common language between Flink SQL and source is just positions. This is
> also why ProjectionPushDown would work by only relying on the indexes, not
> the field names. So I think the field index array is a must have here in
> the NestedFieldReferenceExpression.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Sep 1, 2023 at 8:12 AM Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu>
> wrote:
>
> > Gentle ping on the vote for FLIP-356: Support Nested fields filter
> pushdown
> > <
> https://urldefense.com/v3/__https://www.mail-archive.com/dev@flink.apache.org/msg69289.html__;!!IKRxdwAv5BmarQ!bOW26WlafOQQcb32eWtUiXBAl0cTCK1C6iYhDI2f_z__eczudAWmTRvjDiZg6gzlXmPXrDV4KJS5cFxagFE$
> >.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Tue, Aug 29, 2023 at 9:18 PM Venkatakrishnan Sowrirajan <
> > vsowr...@asu.edu>
> > wrote:
> >
> > > Sure, will reference this discussion to resume where we started as part
> > of
> > > the flip to refactor SupportsProjectionPushDown.
> > >
> > > On Tue, Aug 29, 2023, 7:22 PM Jark Wu  wrote:
> > >
> > >> I'm fine with this. `ReferenceExpression` and
> > `SupportsProjectionPushDown`
> > >> can be another FLIP. However, could you summarize the design of this
> > part
> > >> in the future part of the FLIP? This can be easier to get started with
> > in
> > >> the future.
> > >>
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >>
> > >> On Wed, 30 Aug 2023 at 02:45, Venkatakrishnan Sowrirajan <
> > >> vsowr...@asu.edu>
> > >> wrote:
> > >>
> > >> > Thanks Jark. Sounds good.
> > >> >
> > >> > One more thing, earlier in my summary I mentioned,
> > >> >
> > >> > Introduce a new *ReferenceExpression* (or *BaseReferenceExpression*)
> > >> > > abstract class which will be extended by both
> > >> *FieldReferenceExpression*
> > >> > >  and *NestedFieldReferenceExpression* (to be introduced as part of
> > >> this
> > >> > > FLIP)
> > >> >
> > >> > This can be punted for now and can be handled as part of refactoring
> > >> > SupportsProjectionPushDown.
> > >> >
> > >> > Also I made minor changes to the *NestedFieldReferenceExpression,
> > >> *instead
> > >> > of *fieldIndexArray* we can just do away with *fieldNames *array
> that
> > >> > includes fieldName at every level for the nested field.
> > >> >
> > >> > Updated the FLIP-357
> > >> > <
> > >> >
> > >>
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!YAk6kV4CYvUSPfpoUDQRs6VlbmJXVX8KOKqFxKbNDkUWKzShvwpkLRGkAV1tgV3EqClNrjGS-Ij86Q$
> > >> > >
> > >> > wiki as well.
> > >> >
> > >> > Regards
> > >> > Venkata krishnan
> > >> >
> > >> >
> > >> > On Tue, Aug 29, 2023 at 5:21 AM Jark Wu  wrote:
> > >> >
> > >> > > Hi Venkata,
> > >> > >
> > >> > > Your summary looks good to me. +1 to start a vote.
> > >> > >
> > >> > > I think we don't need "inputIndex" in
> > NestedFieldReferenceExpression.
> > >> > > Actually, I think it is also not needed in
> FieldReferenceExpression,
> > >> > > and we should try to remove it (another topic). The RexInputRef in
> > >> > Calcite
> > >> > > also doesn't require an inputIndex because the field index should
> > >> > represent
> > >> > > index of the field in the underlying row type. Field references
> > >> shouldn't
> > >> > > be
> > >> > >  aware of the number of inputs.
> > >> > >
> > >> > > Best,
> >

Re: [VOTE] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-05 Thread Venkatakrishnan Sowrirajan
Based on the recent discussions in the thread [DISCUSS] FLIP-356: Support
Nested Fields Filter Pushdown, I made
some changes to the FLIP-356 wiki page.
Unless anyone else has any concerns, we can continue with this vote to
reach consensus.

Regards
Venkata krishnan


On Tue, Sep 5, 2023 at 8:04 AM Sergey Nuyanzin  wrote:

> +1 (binding)
>
> On Tue, Sep 5, 2023 at 4:55 PM Jiabao Sun  .invalid>
> wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Jiabao
> >
> >
> > > 2023年9月5日 下午10:33,Martijn Visser  写道:
> > >
> > > +1 (binding)
> > >
> > > On Tue, Sep 5, 2023 at 4:16 PM ConradJam  wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> Yuepeng Pan  于2023年9月1日周五 15:43写道:
> > >>
> > >>> +1 (non-binding)
> > >>>
> > >>> Best,
> > >>> Yuepeng
> > >>>
> > >>>
> > >>>
> > >>> At 2023-09-01 14:32:19, "Jark Wu"  wrote:
> >  +1 (binding)
> > 
> >  Best,
> >  Jark
> > 
> > > 2023年8月30日 02:40,Venkatakrishnan Sowrirajan  写道:
> > >
> > > Hi everyone,
> > >
> > > Thank you all for your feedback on FLIP-356. I'd like to start a
> > vote.
> > >
> > > Discussion thread:
> > >
> https://urldefense.com/v3/__https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7__;!!IKRxdwAv5BmarQ!eNR1R48e8jbqDCSdXqWj6bjfmP1uMn-IUIgVX3uXlgzYp_9rcf-nZOaAZ7KzFo2JwMAJPGYv8wfRxuRMAA$
> > > FLIP:
> > >
> > >>>
> > >>
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!eNR1R48e8jbqDCSdXqWj6bjfmP1uMn-IUIgVX3uXlgzYp_9rcf-nZOaAZ7KzFo2JwMAJPGYv8wdkI0waFw$
> > >
> > > Regards
> > > Venkata krishnan
> > >>>
> > >>
> >
> >
>
> --
> Best regards,
> Sergey
>


Re: [VOTE] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-05 Thread Becket Qin
Thanks for pushing the FLIP through.

+1 on the updated FLIP wiki.

Cheers,

Jiangjie (Becket) Qin

On Wed, Sep 6, 2023 at 1:12 PM Venkatakrishnan Sowrirajan 
wrote:

> Based on the recent discussions in the thread [DISCUSS] FLIP-356: Support
> Nested Fields Filter Pushdown
> , I made
> some changes to the FLIP-356
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown
> >.
> Unless anyone else has any concerns, we can continue with this vote to
> reach consensus.
>
> Regards
> Venkata krishnan
>
>
> On Tue, Sep 5, 2023 at 8:04 AM Sergey Nuyanzin 
> wrote:
>
> > +1 (binding)
> >
> > On Tue, Sep 5, 2023 at 4:55 PM Jiabao Sun  > .invalid>
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Jiabao
> > >
> > >
> > > > 2023年9月5日 下午10:33,Martijn Visser  写道:
> > > >
> > > > +1 (binding)
> > > >
> > > > On Tue, Sep 5, 2023 at 4:16 PM ConradJam 
> wrote:
> > > >
> > > >> +1 (non-binding)
> > > >>
> > > >> Yuepeng Pan  于2023年9月1日周五 15:43写道:
> > > >>
> > > >>> +1 (non-binding)
> > > >>>
> > > >>> Best,
> > > >>> Yuepeng
> > > >>>
> > > >>>
> > > >>>
> > > >>> At 2023-09-01 14:32:19, "Jark Wu"  wrote:
> > >  +1 (binding)
> > > 
> > >  Best,
> > >  Jark
> > > 
> > > > 2023年8月30日 02:40,Venkatakrishnan Sowrirajan 
> 写道:
> > > >
> > > > Hi everyone,
> > > >
> > > > Thank you all for your feedback on FLIP-356. I'd like to start a
> > > vote.
> > > >
> > > > Discussion thread:
> > > >
> >
> https://urldefense.com/v3/__https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7__;!!IKRxdwAv5BmarQ!eNR1R48e8jbqDCSdXqWj6bjfmP1uMn-IUIgVX3uXlgzYp_9rcf-nZOaAZ7KzFo2JwMAJPGYv8wfRxuRMAA$
> > > > FLIP:
> > > >
> > > >>>
> > > >>
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!eNR1R48e8jbqDCSdXqWj6bjfmP1uMn-IUIgVX3uXlgzYp_9rcf-nZOaAZ7KzFo2JwMAJPGYv8wdkI0waFw$
> > > >
> > > > Regards
> > > > Venkata krishnan
> > > >>>
> > > >>
> > >
> > >
> >
> > --
> > Best regards,
> > Sergey
> >
>


Re: [VOTE] FLIP-356: Support Nested Fields Filter Pushdown

2023-09-05 Thread Jingsong Li
+1

On Wed, Sep 6, 2023 at 1:18 PM Becket Qin  wrote:
>
> Thanks for pushing the FLIP through.
>
> +1 on the updated FLIP wiki.
>
> Cheers,
>
> Jiangjie (Becket) Qin
>
> On Wed, Sep 6, 2023 at 1:12 PM Venkatakrishnan Sowrirajan 
> wrote:
>
> > Based on the recent discussions in the thread [DISCUSS] FLIP-356: Support
> > Nested Fields Filter Pushdown
> > , I made
> > some changes to the FLIP-356
> > <
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-356%3A+Support+Nested+Fields+Filter+Pushdown
> > >.
> > Unless anyone else has any concerns, we can continue with this vote to
> > reach consensus.
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Tue, Sep 5, 2023 at 8:04 AM Sergey Nuyanzin 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > On Tue, Sep 5, 2023 at 4:55 PM Jiabao Sun  > > .invalid>
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Jiabao
> > > >
> > > >
> > > > > 2023年9月5日 下午10:33,Martijn Visser  写道:
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Tue, Sep 5, 2023 at 4:16 PM ConradJam 
> > wrote:
> > > > >
> > > > >> +1 (non-binding)
> > > > >>
> > > > >> Yuepeng Pan  于2023年9月1日周五 15:43写道:
> > > > >>
> > > > >>> +1 (non-binding)
> > > > >>>
> > > > >>> Best,
> > > > >>> Yuepeng
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> At 2023-09-01 14:32:19, "Jark Wu"  wrote:
> > > >  +1 (binding)
> > > > 
> > > >  Best,
> > > >  Jark
> > > > 
> > > > > 2023年8月30日 02:40,Venkatakrishnan Sowrirajan 
> > 写道:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > Thank you all for your feedback on FLIP-356. I'd like to start a
> > > > vote.
> > > > >
> > > > > Discussion thread:
> > > > >
> > >
> > https://urldefense.com/v3/__https://lists.apache.org/thread/686bhgwrrb4xmbfzlk60szwxos4z64t7__;!!IKRxdwAv5BmarQ!eNR1R48e8jbqDCSdXqWj6bjfmP1uMn-IUIgVX3uXlgzYp_9rcf-nZOaAZ7KzFo2JwMAJPGYv8wfRxuRMAA$
> > > > > FLIP:
> > > > >
> > > > >>>
> > > > >>
> > > >
> > >
> > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-356*3A*Support*Nested*Fields*Filter*Pushdown__;JSsrKysr!!IKRxdwAv5BmarQ!eNR1R48e8jbqDCSdXqWj6bjfmP1uMn-IUIgVX3uXlgzYp_9rcf-nZOaAZ7KzFo2JwMAJPGYv8wdkI0waFw$
> > > > >
> > > > > Regards
> > > > > Venkata krishnan
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> > > --
> > > Best regards,
> > > Sergey
> > >
> >


Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-05 Thread Gyula Fóra
Thanks for the feedback Rui,

The rates would be computed using the MeterView class (like for any other
rate metric); just because we report the value per second doesn't mean
that we measure at second granularity.
By default the MeterView measures over 1 minute and then we calculate the
per-second rates, but we can increase the time span if necessary.

So I don't think we run into this problem in practice and we can keep the
metric aligned with other time rate metrics like busyTimeMsPerSec etc.
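
A simplified, stand-alone model of that windowed-rate computation (illustrative only; the real org.apache.flink.metrics.MeterView differs in detail): sample the cumulative value periodically, keep a time span of history, and report (newest − oldest) / span as the per-second rate.

```java
/** Toy MeterView-style rate: samples a cumulative counter every
 *  updateIntervalSec and reports the average per-second rate over the
 *  covered window. Not the actual Flink class. */
class MeterViewSketch {
    private final long[] samples;
    private final int updateIntervalSec;
    private int idx = 0;
    private boolean full = false;

    MeterViewSketch(int timeSpanSec, int updateIntervalSec) {
        this.updateIntervalSec = updateIntervalSec;
        this.samples = new long[timeSpanSec / updateIntervalSec + 1];
    }

    /** Called once per update interval with the cumulative count (e.g. total GC ms). */
    void update(long cumulative) {
        samples[idx] = cumulative;
        idx = (idx + 1) % samples.length;
        if (idx == 0) {
            full = true;
        }
    }

    /** Average per-second rate over the window covered by the samples. */
    double getRate() {
        int newest = (idx - 1 + samples.length) % samples.length;
        int oldest = full ? idx : 0;
        int spanSec = ((newest - oldest + samples.length) % samples.length) * updateIntervalSec;
        return spanSec == 0 ? 0.0 : (samples[newest] - samples[oldest]) / (double) spanSec;
    }
}
```

Because the window spans a full minute, a GC burst anywhere in that minute shows up in the rate even though the value is reported per second.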

Cheers,
Gyula

On Wed, Sep 6, 2023 at 4:55 AM Rui Fan <1996fan...@gmail.com> wrote:

> Hi Gyula,
>
> +1 for this proposal. The current GC metric is really unfriendly.
>
> I have a concern with your proposed rate metric: the rate is perSecond
> instead of per minute. I'm unsure whether it's suitable for GC metric.
>
> There are two reasons why I suspect perSecond may not be well
> compatible with GC metric:
>
> 1. GCs are usually infrequent and may only occur for a small number
> of time periods within a minute.
>
> Metrics are collected periodically, for example, reported every minute.
> If the result reported by the GC metric is 1s/perSecond, it does not
> mean that the GC of the TM is serious, because there may be no GC
> in the remaining 59s.
>
> On the contrary, the GC metric reports 0s/perSecond, which does not
> mean that the GC of the TM is not serious, and the GC may be very
> serious in the remaining 59s.
>
> 2. Stop-the-world pauses may cause metric reporting to fail or be delayed
>
> The TM will stop the world during GC, especially full GC. It means
> the metric cannot be collected or reported during full GC.
>
> So the collected GC metric may never be 1s/perSecond. This metric
> may always be good because the metric will only be reported when
> the GC is not severe.
>
>
> If these concerns make sense, how about updating the GC rate
> at minute level?
>
> We can define the metric type as a Gauge for TimeMsPerMinute, updating
> this Gauge every second, where the value is:
> GC total time at the current time - GC total time one minute ago.
>
> Best,
> Rui
>
> On Tue, Sep 5, 2023 at 11:05 PM Maximilian Michels  wrote:
>
> > Hi Gyula,
> >
> > +1 The proposed changes make sense and are in line with what is
> > available for other metrics, e.g. number of records processed.
> >
> > -Max
> >
> > On Tue, Sep 5, 2023 at 2:43 PM Gyula Fóra  wrote:
> > >
> > > Hi Devs,
> > >
> > > I would like to start a discussion on FLIP-361: Improve GC Metrics [1].
> > >
> > > The current Flink GC metrics [2] are not very useful for monitoring
> > > purposes as they require post processing logic that is also dependent
> on
> > > the current runtime environment.
> > >
> > > Problems:
> > >  - Total time is not very relevant for long running applications, only
> > the
> > > rate of change (msPerSec)
> > >  - In most cases it's best to simply aggregate the time/count across
> the
> > > different GarbageCollectors, however the specific collectors are
> > dependent
> > > on the current Java runtime
> > >
> > > We propose to improve the current situation by:
> > >  - Exposing rate metrics per GarbageCollector
> > >  - Exposing aggregated Total time/count/rate metrics
> > >
> > > These new metrics are all derived from the existing ones with minimal
> > > overhead.
> > >
> > > Looking forward to your feedback.
> > >
> > > Cheers,
> > > Gyula
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
> > > [2]
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#garbagecollection
> >
>


Re: [DISCUSS] Drop python 3.7 support in 1.19

2023-09-05 Thread Gyula Fóra
Hi Xingbo!

I think we have to analyze what we gain by dropping 3.7 and upgrading to a
miniconda version with multiarch support.

If this is what we need to get Apple silicon support, then I think it's
worth doing it already in 1.19. Keep in mind that 1.18 is not even released
yet, so if we delay this to 1.20, that is basically 1 year from now.
Making this change can increase adoption instantly if we enable new
platforms.

Cheers,
Gyula

On Wed, Sep 6, 2023 at 4:46 AM Xingbo Huang  wrote:

> Hi Gabor,
>
> Thanks for bringing this up. In my opinion, it is a bit aggressive to
> directly drop Python 3.7 in 1.19. Python 3.7 is still used a lot[1], and as
> far as I know, many Pyflink users are still using python 3.7 as their
> default interpreter. I prefer to deprecate Python 3.7 in 1.19 just like we
> deprecated Python 3.6 in 1.16[2] and dropped Python 3.6 in 1.17[3].
>
> For the support of Python 3.11, I am very supportive of the implementation
> in 1.19 (many users have this appeal, and I originally wanted to support it
> in 1.18).
>
> Regarding the miniconda upgrade, I tend to upgrade miniconda to the latest
> version that can support python 3.7 to 3.11 at the same time.
>
> [1] https://w3techs.com/technologies/history_details/pl-python/3
> [2] https://issues.apache.org/jira/browse/FLINK-28195
> [3] https://issues.apache.org/jira/browse/FLINK-27929
>
> Best,
> Xingbo
>
Jing Ge  wrote on Tue, Sep 5, 2023 at 04:10:
>
> > +1
> >
> > @Dian should we add support of python 3.11
> >
> > Best regards,
> > Jing
> >
> > On Mon, Sep 4, 2023 at 3:39 PM Gabor Somogyi 
> > wrote:
> >
> > > Thanks for all the responses!
> > >
> > > Based on the suggestions I've created the following jiras and started
> to
> > > work on them:
> > > * https://issues.apache.org/jira/browse/FLINK-33029
> > > * https://issues.apache.org/jira/browse/FLINK-33030
> > >
> > > The reason why I've split them is to separate the concerns and reduce
> the
> > > amount of code in a PR to help reviewers.
> > >
> > > BR,
> > > G
> > >
> > >
> > > On Mon, Sep 4, 2023 at 12:57 PM Sergey Nuyanzin 
> > > wrote:
> > >
> > > > +1,
> > > > Thanks for looking into this.
> > > >
> > > > On Mon, Sep 4, 2023 at 8:38 AM Gyula Fóra 
> > wrote:
> > > >
> > > > > +1
> > > > > Thanks for looking into this.
> > > > >
> > > > > Gyula
> > > > >
> > > > > On Mon, Sep 4, 2023 at 8:26 AM Matthias Pohl <
> matthias.p...@aiven.io
> > > > > .invalid>
> > > > > wrote:
> > > > >
> > > > > > Thanks Gabor for looking into it. It sounds reasonable to me as
> > well.
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > On Sun, Sep 3, 2023 at 5:44 PM Márton Balassi <
> > > > balassi.mar...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Gabor,
> > > > > > >
> > > > > > > Thanks for bringing this up. Similarly to when we dropped
> Python
> > > 3.6
> > > > > due
> > > > > > to
> > > > > > > its end of life (and added 3.10) in Flink 1.17 [1,2], it makes
> > > sense
> > > > to
> > > > > > > proceed to remove 3.7 and add 3.11 instead.
> > > > > > >
> > > > > > > +1.
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-27929
> > > > > > > [2] https://github.com/apache/flink/pull/21699
> > > > > > >
> > > > > > > Best,
> > > > > > > Marton
> > > > > > >
> > > > > > > On Fri, Sep 1, 2023 at 10:39 AM Gabor Somogyi <
> > > > > gabor.g.somo...@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I've analyzed through part of the pyflink code and found some
> > > > > > improvement
> > > > > > > > possibilities.
> > > > > > > > I would like to hear voices on the idea.
> > > > > > > >
> > > > > > > > Intention:
> > > > > > > > * upgrade several python related versions to eliminate
> > > end-of-life
> > > > > > issues
> > > > > > > > and keep up with bugfixes
> > > > > > > > * start to add python arm64 support
> > > > > > > >
> > > > > > > > Actual situation:
> > > > > > > > * Flink supports the following python versions: 3.7, 3.8,
> 3.9,
> > > 3.10
> > > > > > > > * We use miniconda 4.7.10 (python package management system
> and
> > > > > > > environment
> > > > > > > > management system) which supports the following python
> > versions:
> > > > 3.7,
> > > > > > > 3.8,
> > > > > > > > 3.9, 3.10
> > > > > > > > * Our python framework is not supporting anything but x86_64
> > > > > > > >
> > > > > > > > Issues:
> > > > > > > > * Python 3.7.17 is the latest security patch of the 3.7 line.
> > > This
> > > > > > > version
> > > > > > > > is end-of-life and is no longer supported:
> > > > > > > > https://www.python.org/downloads/release/python-3717/
> > > > > > > > * Miniconda 4.7.10 is released on 2019-07-29 which is 4 years
> > old
> > > > > > already
> > > > > > > > and not supporting too many architectures (x86_64 and
> ppc64le)
> > > > > > > > * The latest miniconda which has real multi-arch feature set
> > > > supports
> > > > > > the
> > > > > > > > following python versions: 3.8, 3.9, 3.10, 3.11 and no 3.7
> > > 

Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-05 Thread Rui Fan
Thanks for the clarification!

By default the meterview measures for 1 minute sounds good to me!

+1 for this proposal.

Best,
Rui

On Wed, Sep 6, 2023 at 1:27 PM Gyula Fóra  wrote:

> Thanks for the feedback Rui,
>
> The rates would be computed using the MeterView class (like for any other
> rate metric), just because we report the value per second it doesn't mean
> that we measure in a second granularity.
> By default the meterview measures for 1 minute and then we calculate the
> per second rates, but we can increase the timespan if necessary.
>
> So I don't think we run into this problem in practice and we can keep the
> metric aligned with other time rate metrics like busyTimeMsPerSec etc.
>
> Cheers,
> Gyula
>
> On Wed, Sep 6, 2023 at 4:55 AM Rui Fan <1996fan...@gmail.com> wrote:
>
> > Hi Gyula,
> >
> > +1 for this proposal. The current GC metric is really unfriendly.
> >
> > I have a concern with your proposed rate metric: the rate is perSecond
> > instead of per minute. I'm unsure whether it's suitable for GC metric.
> >
> > There are two reasons why I suspect perSecond may not be well
> > compatible with GC metric:
> >
> > 1. GCs are usually infrequent and may only occur for a small number
> > of time periods within a minute.
> >
> > Metrics are collected periodically, for example, reported every minute.
> > If the result reported by the GC metric is 1s/perSecond, it does not
> > mean that the GC of the TM is serious, because there may be no GC
> > in the remaining 59s.
> >
> > On the contrary, the GC metric reports 0s/perSecond, which does not
> > mean that the GC of the TM is not serious, and the GC may be very
> > serious in the remaining 59s.
> >
> > 2. Stop-the-world may cause the metric to fail(delay) to report
> >
> > The TM will stop the world during GC, especially full GC. It means
> > the metric cannot be collected or reported during full GC.
> >
> > So the collected GC metric may never be 1s/perSecond. This metric
> > may always be good because the metric will only be reported when
> > the GC is not severe.
> >
> >
> > If these concerns make sense, how about updating the GC rate
> > at minute level?
> >
> > We can define the type as a Gauge for TimeMsPerMinute, and update
> > this Gauge every second; its value is:
> > GC Total.Time at the current time - GC Total.Time one minute ago.
> >
> > Best,
> > Rui
> >
> > On Tue, Sep 5, 2023 at 11:05 PM Maximilian Michels 
> wrote:
> >
> > > Hi Gyula,
> > >
> > > +1 The proposed changes make sense and are in line with what is
> > > available for other metrics, e.g. number of records processed.
> > >
> > > -Max
> > >
> > > On Tue, Sep 5, 2023 at 2:43 PM Gyula Fóra 
> wrote:
> > > >
> > > > Hi Devs,
> > > >
> > > > I would like to start a discussion on FLIP-361: Improve GC Metrics
> [1].
> > > >
> > > > The current Flink GC metrics [2] are not very useful for monitoring
> > > > purposes as they require post processing logic that is also dependent
> > on
> > > > the current runtime environment.
> > > >
> > > > Problems:
> > > >  - Total time is not very relevant for long running applications,
> only
> > > the
> > > > rate of change (msPerSec)
> > > >  - In most cases it's best to simply aggregate the time/count across
> > the
> > > different GarbageCollectors, however the specific collectors are
> > > dependent
> > > > on the current Java runtime
> > > >
> > > > We propose to improve the current situation by:
> > > >  - Exposing rate metrics per GarbageCollector
> > > >  - Exposing aggregated Total time/count/rate metrics
> > > >
> > > > These new metrics are all derived from the existing ones with minimal
> > > > overhead.
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > Cheers,
> > > > Gyula
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
> > > > [2]
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#garbagecollection
> > >
> >
>
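[Editor's note: the MeterView behaviour Gyula describes above (sampling a cumulative counter over a time span and reporting a per-second average) can be sketched roughly as follows. This is a plain-Java illustration of the idea only, not Flink's actual MeterView implementation; the class and method names are made up for the sketch.]

```java
// Rough sketch of a MeterView-style rate metric: keep a ring buffer of
// cumulative samples taken at a fixed interval and report
// (newest - oldest) / timeSpan as a per-second rate.
// Illustrative only; not Flink's actual MeterView code.
public class RateSketch {
    private final long[] samples;          // ring buffer of cumulative samples
    private final double intervalSeconds;  // sampling interval
    private int idx;

    public RateSketch(int numSamples, double intervalSeconds) {
        this.samples = new long[numSamples];
        this.intervalSeconds = intervalSeconds;
    }

    /** Called once per sampling interval with the current cumulative value. */
    public void update(long cumulative) {
        idx = (idx + 1) % samples.length;
        samples[idx] = cumulative;
    }

    /** Average per-second rate over the whole buffered time span. */
    public double getRate() {
        long oldest = samples[(idx + 1) % samples.length];
        double span = (samples.length - 1) * intervalSeconds;
        return (samples[idx] - oldest) / span;
    }

    public static void main(String[] args) {
        // 13 samples taken 5s apart cover a 60s span, like a 1-minute window.
        RateSketch gcTimeRate = new RateSketch(13, 5.0);
        long cumulativeGcMs = 0;
        for (int i = 0; i < 13; i++) {
            cumulativeGcMs += 50;          // 50 ms of GC per 5s interval
            gcTimeRate.update(cumulativeGcMs);
        }
        System.out.println(gcTimeRate.getRate());  // 10.0 ms of GC per second
    }
}
```

Because the rate is averaged over the full window, a short GC burst within one second is smoothed rather than lost, which is the point Gyula makes in reply to Rui's concern about second-granularity sampling.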


Re: [DISCUSS] Drop python 3.7 support in 1.19

2023-09-05 Thread Gabor Somogyi
Hi Xingbo,

*Constraint:*
I personally haven't found any miniconda version that provides arm64 support
together with Python 3.7.

At the moment I think new platform support means dropping 3.7.

I fully agree with Gyula: if we start now maybe we can release it in
half a year, whereas *3.7 active support already ended on 27 Jun 2020*.
At the moment any Python development/test execution on macOS M1 is simply not
working as-is, just like any kind of Python test execution on any ARM CPU.

Gains:
* We can hopefully release a working version in half a year instead of
shifting support out by more than a year
* macOS M1 local development would finally work, which is essential for user
engagement
* It would be possible to execute Python tests on ARM64 machines
* We can shake up the Python development story, because it's not the most
loved area

BR,
G


On Wed, Sep 6, 2023 at 8:06 AM Gyula Fóra  wrote:

> Hi Xingbo!
>
> I think we have to analyze what we gain by dropping 3.7 and upgrading to a
> miniconda version with multiarch support.
>
> If this is what we need to get Apple silicon support then I think it's
> worth doing it already in 1.19. Keep in mind that 1.18 is not even released
> yet so if we delay this to 1.20 that is basically 1 year from now.
> Making this change can increase the adoption instantly if we enable new
> platforms.
>
> Cheers,
> Gyula
>
> On Wed, Sep 6, 2023 at 4:46 AM Xingbo Huang  wrote:
>
> > Hi Gabor,
> >
> > Thanks for bringing this up. In my opinion, it is a bit aggressive to
> > directly drop Python 3.7 in 1.19. Python 3.7 is still used a lot[1], and
> as
> > far as I know, many Pyflink users are still using python 3.7 as their
> > default interpreter. I prefer to deprecate Python 3.7 in 1.19 just like
> we
> > deprecated Python 3.6 in 1.16[2] and dropped Python 3.6 in 1.17[3].
> >
> > For the support of Python 3.11, I am very supportive of the
> implementation
> > in 1.19 (many users have this appeal, and I originally wanted to support
> it
> > in 1.18).
> >
> > Regarding the miniconda upgrade, I tend to upgrade miniconda to the
> latest
> > version that can support python 3.7 to 3.11 at the same time.
> >
> > [1] https://w3techs.com/technologies/history_details/pl-python/3
> > [2] https://issues.apache.org/jira/browse/FLINK-28195
> > [3] https://issues.apache.org/jira/browse/FLINK-27929
> >
> > Best,
> > Xingbo
> >
> > Jing Ge  于2023年9月5日周二 04:10写道:
> >
> > > +1
> > >
> > > @Dian should we add support of python 3.11
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Mon, Sep 4, 2023 at 3:39 PM Gabor Somogyi <
> gabor.g.somo...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for all the responses!
> > > >
> > > > Based on the suggestions I've created the following jiras and started
> > to
> > > > work on them:
> > > > * https://issues.apache.org/jira/browse/FLINK-33029
> > > > * https://issues.apache.org/jira/browse/FLINK-33030
> > > >
> > > > The reason why I've split them is to separate the concerns and reduce
> > the
> > > > amount of code in a PR to help reviewers.
> > > >
> > > > BR,
> > > > G
> > > >
> > > >
> > > > On Mon, Sep 4, 2023 at 12:57 PM Sergey Nuyanzin  >
> > > > wrote:
> > > >
> > > > > +1,
> > > > > Thanks for looking into this.
> > > > >
> > > > > On Mon, Sep 4, 2023 at 8:38 AM Gyula Fóra 
> > > wrote:
> > > > >
> > > > > > +1
> > > > > > Thanks for looking into this.
> > > > > >
> > > > > > Gyula
> > > > > >
> > > > > > On Mon, Sep 4, 2023 at 8:26 AM Matthias Pohl <
> > matthias.p...@aiven.io
> > > > > > .invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks Gabor for looking into it. It sounds reasonable to me as
> > > well.
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > On Sun, Sep 3, 2023 at 5:44 PM Márton Balassi <
> > > > > balassi.mar...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Gabor,
> > > > > > > >
> > > > > > > > Thanks for bringing this up. Similarly to when we dropped
> > Python
> > > > 3.6
> > > > > > due
> > > > > > > to
> > > > > > > > its end of life (and added 3.10) in Flink 1.17 [1,2], it
> makes
> > > > sense
> > > > > to
> > > > > > > > proceed to remove 3.7 and add 3.11 instead.
> > > > > > > >
> > > > > > > > +1.
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-27929
> > > > > > > > [2] https://github.com/apache/flink/pull/21699
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Marton
> > > > > > > >
> > > > > > > > On Fri, Sep 1, 2023 at 10:39 AM Gabor Somogyi <
> > > > > > gabor.g.somo...@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > I've analyzed through part of the pyflink code and found
> some
> > > > > > > improvement
> > > > > > > > > possibilities.
> > > > > > > > > I would like to hear voices on the idea.
> > > > > > > > >
> > > > > > > > > Intention:
> > > > > > > > > * upgrade several python related versions to eliminate
> > > > end-of-life
> > > > > > > issues
> > > > > > > > > and 

Re: Proposal for Implementing Keyed Watermarks in Apache Flink

2023-09-05 Thread Yun Tang
Hi Tawfik,

Thanks for offering such a proposal, looking forward to your research paper!

You could also ask for edit permission for Flink improvement proposals [1] to
create a new FLIP if you want to contribute this to the community
yourself.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Best
Yun Tang

From: yuxia 
Sent: Wednesday, September 6, 2023 12:31
To: dev 
Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink

Hi, Tawfik Yasser.
Thanks for the proposal.
It sounds exciting. I can't wait the research paper for more details.

Best regards,
Yuxia

- Original Message -
From: "David Morávek" 
To: "dev" 
Sent: Tuesday, September 5, 2023 4:36:51 PM
Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink

Hi Tawfik,

It's exciting to see any ongoing research that tries to push Flink forward!

To get the discussion started, can you please share your paper with the
community? Assessing the proposal without further context is tough.

Best,
D.

On Mon, Sep 4, 2023 at 4:42 PM Tawfek Yasser Tawfek 
wrote:

> Dear Apache Flink Development Team,
>
> I hope this email finds you well. I am writing to propose an exciting new
> feature for Apache Flink that has the potential to significantly enhance
> its capabilities in handling unbounded streams of events, particularly in
> the context of event-time windowing.
>
> As you may be aware, Apache Flink has been at the forefront of Big Data
> Stream processing engines, leveraging windowing techniques to manage
> unbounded event streams effectively. The accuracy of the results obtained
> from these streams relies heavily on the ability to gather all relevant
> input within a window. At the core of this process are watermarks, which
> serve as unique timestamps marking the progression of events in time.
>
> However, our analysis has revealed a critical issue with the current
> watermark generation method in Apache Flink. This method, which operates at
> the input stream level, exhibits a bias towards faster sub-streams,
> resulting in the unfortunate consequence of dropped events from slower
> sub-streams. Our investigations showed that Apache Flink's conventional
> watermark generation approach led to an alarming data loss of approximately
> 33% when 50% of the keys around the median experienced delays. This loss
> further escalated to over 37% when 50% of random keys were delayed.
>
> In response to this issue, we have authored a research paper outlining a
> novel strategy named "keyed watermarks" to address data loss and
> substantially enhance data processing accuracy, achieving at least 99%
> accuracy in most scenarios.
>
> Moreover, we have conducted comprehensive comparative studies to evaluate
> the effectiveness of our strategy against the conventional watermark
> generation method, specifically in terms of event-time tracking accuracy.
>
> We believe that implementing keyed watermarks in Apache Flink can greatly
> enhance its performance and reliability, making it an even more valuable
> tool for organizations dealing with complex, high-throughput data
> processing tasks.
>
> We kindly request your consideration of this proposal. We would be eager
> to discuss further details, provide the full research paper, or collaborate
> closely to facilitate the integration of this feature into Apache Flink.
>
> Thank you for your time and attention to this proposal. We look forward to
> the opportunity to contribute to the continued success and evolution of
> Apache Flink.
>
> Best Regards,
>
> Tawfik Yasser
> Senior Teaching Assistant @ Nile University, Egypt
> Email: tyas...@nu.edu.eg
> LinkedIn: https://www.linkedin.com/in/tawfikyasser/
>


Re: [DISCUSS] FLIP-361: Improve GC Metrics

2023-09-05 Thread Xintong Song
Thanks for bringing this up, Gyula.

The proposed changes make sense to me. +1 for them.

In addition to the proposed changes, I wonder if we should also add
something like timePerGc? This would help understand whether there are long
pauses, due to GC STW, that may lead to RPC unresponsiveness and heartbeat
timeouts. Ideally, we'd like to understand the max pause time per STW in a
recent time window. However, I don't see an easy way to separate the pause
time of each STW. Deriving the overall time per GC from the existing
metrics (time-increment / count-increment) seems to be a good alternative.
WDYT?

Best,

Xintong
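
[Editor's note: Xintong's suggested derivation (time-increment divided by count-increment between two samples of the existing cumulative metrics) could look roughly like the sketch below. This is a hypothetical plain-Java illustration, not an existing Flink metric; the class and method names are assumptions.]

```java
// Sketch of deriving the average pause time per GC in a recent window
// from the cumulative Time/Count metrics:
//   avgTimePerGc = time-increment / count-increment.
// Hypothetical names; not part of Flink's metric system.
public class TimePerGcSketch {
    private long lastTimeMs;
    private long lastCount;

    /** Returns avg ms per GC since the previous sample, or 0 if no GC happened. */
    public long update(long totalTimeMs, long totalCount) {
        long timeDelta = totalTimeMs - lastTimeMs;
        long countDelta = totalCount - lastCount;
        lastTimeMs = totalTimeMs;
        lastCount = totalCount;
        return countDelta == 0 ? 0 : timeDelta / countDelta;
    }

    public static void main(String[] args) {
        TimePerGcSketch sketch = new TimePerGcSketch();
        // First window: 3 GCs took 150 ms in total -> 50 ms per GC on average.
        System.out.println(sketch.update(150, 3));  // 50
        // Next window: no further GCs -> 0.
        System.out.println(sketch.update(150, 3));  // 0
    }
}
```

As Xintong notes, this only yields the average pause per GC over the window, not the maximum pause of a single STW, but it is cheap to compute from the metrics already proposed.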



On Wed, Sep 6, 2023 at 2:16 PM Rui Fan <1996fan...@gmail.com> wrote:

> Thanks for the clarification!
>
> By default the meterview measures for 1 minute sounds good to me!
>
> +1 for this proposal.
>
> Best,
> Rui
>
> On Wed, Sep 6, 2023 at 1:27 PM Gyula Fóra  wrote:
>
> > Thanks for the feedback Rui,
> >
> > The rates would be computed using the MeterView class (like for any other
> > rate metric), just because we report the value per second it doesn't mean
> > that we measure in a second granularity.
> > By default the meterview measures for 1 minute and then we calculate the
> > per second rates, but we can increase the timespan if necessary.
> >
> > So I don't think we run into this problem in practice and we can keep the
> > metric aligned with other time rate metrics like busyTimeMsPerSec etc.
> >
> > Cheers,
> > Gyula
> >
> > On Wed, Sep 6, 2023 at 4:55 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Hi Gyula,
> > >
> > > +1 for this proposal. The current GC metric is really unfriendly.
> > >
> > > I have a concern with your proposed rate metric: the rate is perSecond
> > > instead of per minute. I'm unsure whether it's suitable for GC metric.
> > >
> > > There are two reasons why I suspect perSecond may not be well
> > > compatible with GC metric:
> > >
> > > 1. GCs are usually infrequent and may only occur for a small number
> > > of time periods within a minute.
> > >
> > > Metrics are collected periodically, for example, reported every minute.
> > > If the result reported by the GC metric is 1s/perSecond, it does not
> > > mean that the GC of the TM is serious, because there may be no GC
> > > in the remaining 59s.
> > >
> > > On the contrary, the GC metric reports 0s/perSecond, which does not
> > > mean that the GC of the TM is not serious, and the GC may be very
> > > serious in the remaining 59s.
> > >
> > > 2. Stop-the-world may cause the metric to fail(delay) to report
> > >
> > > The TM will stop the world during GC, especially full GC. It means
> > > the metric cannot be collected or reported during full GC.
> > >
> > > So the collected GC metric may never be 1s/perSecond. This metric
> > > may always be good because the metric will only be reported when
> > > the GC is not severe.
> > >
> > >
> > > If these concerns make sense, how about updating the GC rate
> > > at minute level?
> > >
> > > We can define the type as a Gauge for TimeMsPerMinute, and update
> > > this Gauge every second; its value is:
> > > GC Total.Time at the current time - GC Total.Time one minute ago.
> > >
> > > Best,
> > > Rui
> > >
> > > On Tue, Sep 5, 2023 at 11:05 PM Maximilian Michels 
> > wrote:
> > >
> > > > Hi Gyula,
> > > >
> > > > +1 The proposed changes make sense and are in line with what is
> > > > available for other metrics, e.g. number of records processed.
> > > >
> > > > -Max
> > > >
> > > > On Tue, Sep 5, 2023 at 2:43 PM Gyula Fóra 
> > wrote:
> > > > >
> > > > > Hi Devs,
> > > > >
> > > > > I would like to start a discussion on FLIP-361: Improve GC Metrics
> > [1].
> > > > >
> > > > > The current Flink GC metrics [2] are not very useful for monitoring
> > > > > purposes as they require post processing logic that is also
> dependent
> > > on
> > > > > the current runtime environment.
> > > > >
> > > > > Problems:
> > > > >  - Total time is not very relevant for long running applications,
> > only
> > > > the
> > > > > rate of change (msPerSec)
> > > > >  - In most cases it's best to simply aggregate the time/count
> across
> > > the
> > > > > different GarbageCollectors, however the specific collectors are
> > > > dependent
> > > > > on the current Java runtime
> > > > >
> > > > > We propose to improve the current situation by:
> > > > >  - Exposing rate metrics per GarbageCollector
> > > > >  - Exposing aggregated Total time/count/rate metrics
> > > > >
> > > > > These new metrics are all derived from the existing ones with
> minimal
> > > > > overhead.
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > Cheers,
> > > > > Gyula
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-361%3A+Improve+GC+Metrics
> > > > > [2]
> > > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#garbagecollection
> > > >
> > >
> >
>