[jira] [Created] (FLINK-32497) An exception will be thrown when the condition of the IF FUNCTION is FALSE and the false_value parameter is a function.

2023-06-30 Thread Han Zhuo (Jira)
Han Zhuo created FLINK-32497:


 Summary: An exception will be thrown when the condition of the IF 
FUNCTION is FALSE and the false_value parameter is a function.
 Key: FLINK-32497
 URL: https://issues.apache.org/jira/browse/FLINK-32497
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API, Table SQL / Client
 Environment: - Flink Version -
Version: 1.17.1, Commit ID: 2750d5c

- Java Version -
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
 
Reporter: Han Zhuo
 Attachments: flink-1.17.1_logs_jarieshan_20230630.tgz, 
image-2023-06-30-15-02-13-197.png, image-2023-06-30-15-02-26-099.png, 
image-2023-06-30-15-02-57-082.png, image-2023-06-30-15-07-08-588.png, 
image-2023-06-30-15-09-44-623.png, image-2023-06-30-15-10-08-619.png

Executing these functions individually succeeds:
{code:java}
SELECT SPLIT_INDEX('TEST:ABC', ':', 0); {code}
!image-2023-06-30-15-02-57-082.png|width=189,height=36!

 

These functions also succeed when placed in the true_value parameter of the
+IF function+:
{code:java}
SELECT IF(2>1, SPLIT_INDEX('TEST:ABC', ':', 1), 'FALSE'); {code}
!image-2023-06-30-15-02-13-197.png|width=185,height=36!

 

An exception is thrown only when these functions are placed in the false_value
parameter of the +IF function+:
{code:java}
SELECT IF(2>1, 'TRUE', SPLIT_INDEX('TEST:ABC', ':', 0)); {code}
!image-2023-06-30-15-09-44-623.png|width=385,height=42!
 
 
The equivalent +CASE function+ also succeeds:
{code:java}
SELECT CASE WHEN 2=1 THEN 'TRUE' ELSE SPLIT_INDEX('TEST:ABC', ':', 0) END; 
{code}
!image-2023-06-30-15-10-08-619.png|width=188,height=41!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32498) array_max return type should always be nullable

2023-06-30 Thread Jacky Lau (Jira)
Jacky Lau created FLINK-32498:
-

 Summary: array_max return type should always be nullable
 Key: FLINK-32498
 URL: https://issues.apache.org/jira/browse/FLINK-32498
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.18.0
Reporter: Jacky Lau
 Fix For: 1.18.0
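
A minimal SQL sketch of the motivation behind the summary (my own illustration,
not taken from the report; the exact null-handling semantics are an assumption):
{code:sql}
-- Even if the input array column is declared NOT NULL, there are inputs for
-- which no maximum exists (e.g. an array whose elements are all NULL), so the
-- only possible result is NULL. Hence the return type of ARRAY_MAX should
-- always be nullable.
SELECT ARRAY_MAX(ARRAY[CAST(NULL AS INT)]);
{code}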






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32499) Removing dependency of flink-connector-aws on a specific flink-shaded version

2023-06-30 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-32499:
---

 Summary: Removing dependency of flink-connector-aws on a specific 
flink-shaded version
 Key: FLINK-32499
 URL: https://issues.apache.org/jira/browse/FLINK-32499
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / AWS
Reporter: Hong Liang Teoh


We want to improve the build compatibility of the `flink-connector-aws` repo 
when upgrading Flink versions.

If `flink-shaded` changes in Flink, we can see broken builds due to upgraded 
shaded dependency versions (e.g. Guava).

To prevent this, we want to explicitly pin the versions of the dependencies 
being used.

 

 

See https://issues.apache.org/jira/browse/FLINK-32462



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] FLIP-321: introduce an API deprecation process

2023-06-30 Thread Becket Qin
Hi folks,

I'd like to start the VOTE for FLIP-321[1] which proposes to introduce an
API deprecation process to Flink. The discussion thread for the FLIP can be
found here[2].

The vote will be open until at least July 4, following the consensus voting
process.

Thanks,

Jiangjie (Becket) Qin

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-321%3A+Introduce+an+API+deprecation+process
[2] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9


Re: [DISCUSS] FLIP-278: Hybrid Source Connector

2023-06-30 Thread Ran Tao
Hi, Ilya. Thanks for your reply.

If your first S3 source and your second Kafka source have the same schema, the
current hybrid table source can work.

Regarding your question:

>> But because Flink’s optimizer removes unused fields from internal
records in the batch mode, the problem of inconsistent schema arises at
runtime.

1. Actually, a hybrid source job is a streaming job in most cases (I'm not
sure whether you want to run only in batch mode).
2. The hybrid source SQL API implementation currently also supports filtering
out unused fields, which means users need no extra work (we support
ProjectPushDown).
E.g., if you define 3 fields but only select 2 fields, the length of the
hybrid-generated row is 2 (in both batch and streaming).

The handling of ProjectPushDown is consistent for the first batch source
and the subsequent streaming source.
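
A minimal SQL sketch of that scenario (the connector and option names below are
illustrative assumptions, not the final FLIP-278 syntax):
{code:sql}
-- Hypothetical hybrid source: three fields are declared, backed by a bounded
-- (historical) source followed by an unbounded (realtime) one.
CREATE TABLE hybrid_source (
  a STRING,
  b STRING,
  c STRING
) WITH (
  'connector' = 'hybrid',            -- assumed option names, for illustration
  'sources' = 'historical,realtime'
);

-- Only two fields are selected, so with ProjectPushDown the generated row has
-- length 2 in both the batch phase and the streaming phase.
SELECT a, b FROM hybrid_source;
{code}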

Best Regards,
Ran Tao
https://github.com/chucheng92


Ilya Soin  wrote on Fri, May 12, 2023 at 02:39:

> Hi, Ran Tao.
> Thanks for the reply!
>
> I agree that a way to manage inconsistent field names / numbers will need
> to be provided and that for POC it’s enough to support the case where the
> batch and streaming schemas are consistent.
>
> However, in the example I provided, the schemas in batch and streaming
> are consistent: JSONs stored in S3 have exactly the same structure as JSONs
> in Kafka. As an end user, I’d expect this to work without providing
> schema.fields.mappings or schema.virtual-fields. But because Flink’s
> optimizer removes unused fields from internal records in the batch mode,
> the problem of inconsistent schema arises at runtime. Do you have an idea
> how to tackle this in a way that wouldn’t require users to add redundant
> configs in hybrid source definition?
>
> __
> Best regards,
> Ilya Soin
>
> > On 11 May 2023, at 04:44, Ran Tao  wrote:
> >
> > Hi, Ilya.
> > Thanks for your opinions!
> >
> > You are right; in fact, in addition to different field numbers,
> > the names may also be different.
> > Currently, we can also support inconsistent schemas, as discussed in
> > the previous design;
> > for example, we can provide a `schema.fields.mappings` parameter.
> >
> > Suppose we have different schemas like below:
> > actual batch fields: a, f1, c, f3
> > actual streaming fields: f0, b, f2 (one field fewer)
> >
> >
> > 1. About inconsistent field names
> >
> > If the user DDL is: f0, f1, f2
> > `schema.fields.mappings`='[{"f0":"a","f2":"c"},{"f1":"b"}]'
> >
> > then in the hybrid table source we generate the child batch schema a, f1, c
> > and the streaming schema f0, b, f2, and pass them to the final child table
> > source. (Note: we do not use the batch field f3; skipping it is fine.)
> >
> > 2. About inconsistent field numbers
> >
> > If the user DDL is: f0, f1, f2, f3
> > `schema.fields.mappings`='[{"f0":"a","f2":"c"},{"f1":"b"}]'
> >
> > then in the hybrid table source we generate the child batch schema a, f1,
> > c, f3,
> > and the streaming schema has 2 options to set:
> >
> > 1. Set f0, b, f2, f3 and pass them to the final child table source. (If
> > the child source format is k-v mode, f3 will be null.)
> >
> > 2. Add an option, e.g. `schema.virtual-fields`='[[],["f3"]]', meaning that
> > the streaming source's field f3 does not exist;
> > then the hybrid table source actively sets null for the streaming field f3
> > and only passes f0, b, f2 to the child source to fetch the real data.
> >
> > In a word, we can use `schema.fields.mappings` to deal with inconsistent
> > field names,
> > and pass extra fields to the child source (which come back as null) to
> > deal with inconsistent
> > field numbers (or add a `schema.virtual-fields` option).
> >
> > But in order to maintain consistency with the current DataStream API,
> > we currently support the case where the batch and streaming schemas are
> > consistent.
> > I will update the POC PR, then you can re-run your case. WDYT?
> >
> >
> > Best Regards,
> > Ran Tao
> >
> >
> >
> > Ilya Soin  wrote on Thu, May 11, 2023 at 03:12:
> >
> >> Hi devs,
> >>
> >> I think for this approach to work, the internal record schema generated
> >> by Flink must be exactly the same for batch and stream records, because
> >> at runtime Flink will use the same serializer to send them downstream.
> >> However, it’s not always the case, because in batch mode Flink’s
> >> optimizer may realize that some fields are never actually used, so the
> >> records will not contain those fields. Such optimizations may not be done
> >> in the streaming mode, so records coming from the realtime source will
> >> have more fields. In that case, after switching to the realtime source,
> >> the job will fail, because the record serializer expects records with the
> >> batch schema, but instead receives records with more fields and doesn’t
> >> know how to serialize them.
> >>
> >> Consider the following DDL:
> >> CREATE TABLE hybrid_table
> >> (
> >>trade ROW(
> >>`openTime` BIGINT,
> >>`closeTime` BIGINT),
> >>server  STRING,
> >>tradeTime as to_timestamp(from_unixtime(trade.openTime)),
> >>WATERMARK FOR trade

[jira] [Created] (FLINK-32500) Update dependency versions for AWS connectors package

2023-06-30 Thread Aleksandr Pilipenko (Jira)
Aleksandr Pilipenko created FLINK-32500:
---

 Summary: Update dependency versions for AWS connectors package
 Key: FLINK-32500
 URL: https://issues.apache.org/jira/browse/FLINK-32500
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / AWS
Affects Versions: aws-connector-4.1.0
Reporter: Aleksandr Pilipenko


Update dependencies:
 * snappy-java from 1.1.8.3 to 1.1.10.1
 * guava from 29.0-jre to 32.0.0-jre



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Flink-Metrics Prometheus - Native Histograms / Native Counters

2023-06-30 Thread Martijn Visser
Hi Ryan,

I think option 2 and option 3 won't work, because there can be only one
version of the client. I don't think we should make a clean break on
metrics in a minor version, but only in major. All in all, I think option 1
would be the best. We could deprecate the existing one and remove it
with Flink 2.0 imho.

Best regards,

Martijn

On Thu, Jun 29, 2023 at 5:56 PM Ryan van Huuksloot
 wrote:

> Hi Martijn,
>
> Our team shared the same concern. We've considered a few options:
>
>
> *1. Add a new package such as `flink-metrics-prometheus-native` and
> eventually deprecate the original.*
> *Pros:*
> - Supports backward compatibility.
> *Cons:*
> - Two packages to maintain in the interim.
> - Not consistent with other metrics packages.
>
> *2. Maintain the same logic in flink-metrics-prometheus and write new
> natively typed metrics to a different metric name in Prometheus, in
> addition to the original metric.*
>
> *Pros:*
> - Supports backward compatibility.
> *Cons:*
> - Nearly doubles the metrics being captured by default.
> - The naming convention will permanently differ when the original names are
> deprecated.
> - The original names will likely be deprecated at some point.
>
> *3. Maintain the same logic in flink-metrics-prometheus. However, if you
> use a flink-conf option, natively typed metrics would be written to the
> same names instead of the original metric types.*
>
> *Pros:*
> - Supports backwards compatibility
> - No double metrics
> *Cons:*
> - Increases the maintenance burden.
> - Would require future migrations
>
> *4. Make a clean break and swap the types in flink-metrics-prometheus,
> releasing it in 1.18 or 1.19 with a note.*
>
> *Pros:*
> - Avoids duplicate metrics and packages.
> - No future maintenance burden.
> *Cons:*
> - Introduces a breaking change.
> - Metrics may silently fail in dashboards if the graphs do not support the
> new data type (I would need to conduct more testing to determine how often
> this occurs).
>
> I lean towards option 4, and we would communicate the change internally as
> part of a minor version upgrade. I'm open to other ideas and would welcome
> further discussion on what the OSS community prefers.
>
> Thanks,
>
> Ryan van Huuksloot
> Sr. Production Engineer | Streaming Platform
> [image: Shopify]
> 
>
>
> On Thu, Jun 29, 2023 at 4:23 AM Martijn Visser 
> wrote:
>
> > Hi Ryan,
> >
> > I think there definitely is an interest in the
> > flink-metrics-prometheus, but I do see some challenges as well. Given
> > that the Prometheus simpleclient doesn't yet have a major version,
> > there are breaking changes happening in that. If we would update this,
> > it can/probably breaks the metrics for users, which is an undesirable
> > situation. Any thoughts on how we could avoid that situation?
> >
> > Best regards,
> >
> > Martijn
> >
> > On Tue, Jun 20, 2023 at 3:53 PM Ryan van Huuksloot
> >  wrote:
> > >
> > > Following up, any interest in flink-metrics-prometheus? It is quite a
> > stale
> > > package. I would be interested in contributing - time permitting.
> > >
> > > Ryan van Huuksloot
> > > Sr. Production Engineer | Streaming Platform
> > > [image: Shopify]
> > > <
> https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email
> > >
> > >
> > >
> > > On Thu, Jun 15, 2023 at 2:16 PM Ryan van Huuksloot <
> > > ryan.vanhuuksl...@shopify.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > Internally we use the flink-metrics-prometheus jar and we noticed
> that
> > the
> > > > code is a little out of date. Primarily, there are new metric types
> in
> > > > Prometheus that would allow for the exporter to write Counters and
> > > > Histograms as Native metrics in prometheus (vs writing as Gauges).
> > > >
> > > > I noticed that there was a closed PR for the simpleclient:
> > > > https://github.com/apache/flink/pull/21047 - which has what is
> > required
> > > > for the native metrics but may cause other maintenance tickets.
> > > >
> > > > Is there any appetite from the community to update this exporter?
> > > >
> > > > Thanks,
> > > >
> > > > Ryan van Huuksloot
> > > > Sr. Production Engineer | Streaming Platform
> > > > [image: Shopify]
> > > > <
> > https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email>
> > > >
> >
>


[jira] [Created] (FLINK-32501) Wrong execution plan of a proctime window aggregation generated due to incorrect cost evaluation

2023-06-30 Thread lincoln lee (Jira)
lincoln lee created FLINK-32501:
---

 Summary: Wrong execution plan of a proctime window aggregation 
generated due to incorrect cost evaluation
 Key: FLINK-32501
 URL: https://issues.apache.org/jira/browse/FLINK-32501
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.17.1, 1.16.2
Reporter: lincoln lee
Assignee: lincoln lee
 Fix For: 1.18.0, 1.17.2


Currently, a window aggregation that refers to a windowing TVF with a filter 
condition may get a wrong plan that hangs forever at runtime (the window 
aggregate operator never emits output).

For a case such as:
{code}
insert into sink
select
  window_start,
  window_end,
  b,
  COALESCE(sum(case when a = 11 then 1 end), 0) c
from
  TABLE(
    TUMBLE(TABLE source, DESCRIPTOR(proctime), INTERVAL '10' SECONDS)
  )
where
  a in (1, 5, 7, 9, 11)
GROUP BY
  window_start, window_end, b
{code}

it generates a wrong plan that doesn't combine the proctime 
WindowTableFunction into the WindowAggregate (so when translated to an 
execution plan, the WindowAggregate wrongly recognizes the window as an 
event-time window; the WindowAggregateOperator then neither receives 
watermarks nor sets up timers to fire any windows at runtime):
{code}
Sink(table=[default_catalog.default_database.sink], fields=[ws, we, b, c])
+- Calc(select=[CAST(window_start AS TIMESTAMP(6)) AS ws, CAST(window_end AS TIMESTAMP(6)) AS we, b, CAST(COALESCE($f1, 0) AS BIGINT) AS c])
   +- WindowAggregate(groupBy=[b], window=[TUMBLE(win_start=[window_start], win_end=[window_end], size=[10 s])], select=[b, SUM($f3) AS $f1, start('w$) AS window_start, end('w$) AS window_end])
      +- Exchange(distribution=[hash[b]])
         +- Calc(select=[window_start, window_end, b, CASE((a = 11), 1, null:INTEGER) AS $f3], where=[SEARCH(a, Sarg[1, 5, 7, 9, 11])])
            +- WindowTableFunction(window=[TUMBLE(time_col=[proctime], size=[10 s])])
               +- Calc(select=[a, b, PROCTIME() AS proctime])
                  +- TableSourceScan(table=[[default_catalog, default_database, source, project=[a, b], metadata=[]]], fields=[a, b])
{code}

Expected plan:
{code}
Sink(table=[default_catalog.default_database.sink], fields=[ws, we, b, c])
+- Calc(select=[CAST(window_start AS TIMESTAMP(6)) AS ws, CAST(window_end AS TIMESTAMP(6)) AS we, b, CAST(COALESCE($f1, 0) AS BIGINT) AS c])
   +- WindowAggregate(groupBy=[b], window=[TUMBLE(time_col=[proctime], size=[10 s])], select=[b, SUM($f3) AS $f1, start('w$) AS window_start, end('w$) AS window_end])
      +- Exchange(distribution=[hash[b]])
         +- Calc(select=[b, CASE((a = 11), 1, null:INTEGER) AS $f3, PROCTIME() AS proctime], where=[SEARCH(a, Sarg[1, 5, 7, 9, 11])])
            +- TableSourceScan(table=[[default_catalog, default_database, source, project=[a, b], metadata=[]]], fields=[a, b])
{code}






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Flink-Metrics Prometheus - Native Histograms / Native Counters

2023-06-30 Thread Ryan van Huuksloot
Hi Martijn,

Option 2 and 3 would use a single client. It would just register the
metrics differently.

Does that make sense? Does that change your perspective?

Thanks,

Ryan van Huuksloot
Sr. Production Engineer | Streaming Platform
[image: Shopify]



On Fri, Jun 30, 2023 at 7:49 AM Martijn Visser 
wrote:

> [earlier quoted messages snipped; see the messages above in this thread]

Re: Flink-Metrics Prometheus - Native Histograms / Native Counters

2023-06-30 Thread Martijn Visser
Hi Ryan,

Yeah, but you would need the latest version of the client, which would
break the implementation for the current, outdated one, wouldn't it?

Best regards,

Martijn

On Fri, Jun 30, 2023 at 3:35 PM Ryan van Huuksloot
 wrote:

> [earlier quoted messages snipped; see the messages above in this thread]

Re: Flink-Metrics Prometheus - Native Histograms / Native Counters

2023-06-30 Thread Ryan van Huuksloot
I'd have to check but the original plan is to upgrade the client but keep
the flink-metrics-prometheus implementation the same. This should keep the
metrics collection consistent even with the client upgrade - this would
need to be verified.

But if that is the worry then we could create a new package to keep things
distinct.

Thanks,

Ryan van Huuksloot
Sr. Production Engineer | Streaming Platform
[image: Shopify]



On Fri, Jun 30, 2023 at 10:02 AM Martijn Visser 
wrote:

> [earlier quoted messages snipped; see the messages above in this thread]

Re: Flink-Metrics Prometheus - Native Histograms / Native Counters

2023-06-30 Thread Martijn Visser
Hi Ryan,

If I look at the changelog for the simpleclient 0.10 [1], they've switched
their data model. So if you upgrade to the later version, the data model
for existing Flink Prometheus users would be broken IIRC. That's why I
think option 1 is cleaner: it gives the user the option to choose
which package they want to use. Either the new one, with a new data model,
or the current one, with the existing data model.

Best regards,

Martijn

[1] https://github.com/prometheus/client_java/releases/tag/parent-0.10.0

On Fri, Jun 30, 2023 at 4:23 PM Ryan van Huuksloot
 wrote:

> [earlier quoted messages snipped; see the messages above in this thread]

Re: [DISCUSS] FLIP-309: Enable operators to trigger checkpoints dynamically

2023-06-30 Thread Piotr Nowojski
Hey,

Sorry for a late reply, I was OoO for a week. I have three things to point
out.

1. ===

The updated proposal is indeed better, but to be honest I still don't like
it, for mostly the same reasons that I have mentioned earlier:
- it is only a partial solution that doesn't address all use cases, so we
would need to throw it away sooner or later
- I don't see, and it hasn't been discussed, how to make it work out of the
box for all sources
- it somehow complicates the API for people implementing Sources
- it should work out of the box for most of the sources, or at least have
that potential in the future

On top of that:
- the FLIP I think is missing how to hook up SplitEnumeratorContext and
CheckpointCoordinator to pass "isProcessingBacklog"
- the FLIP suggests using the long checkpointing interval as long as any
subtask is processing the backlog. Are you sure that's the right call? What
if other sources are producing fresh records, and those fresh records are
reaching sinks? It could happen either with a disjoint JobGraph, an
embarrassingly parallel JobGraph (no keyBy/unions/joins), or even with keyBy.
Fresh records can slip through a non-backpressured input channel past a
generally backpressured keyBy exchange. How should we handle that? This
problem I think will affect every solution, including my previously proposed
generic one, but we should discuss how to handle that as well.

2. ===

Regarding the current proposal, there might be a way to make it actually
somehow generic (but not pluggable), though it might require slightly
different interfaces. We could keep the idea that the
SourceCoordinator/SplitEnumerator is responsible for switching between
slow/fast processing modes. It could be implemented to achieve something like
the FLIP-309 proposal, but apart from that, the default behaviour would be a
built-in mechanism working like this:
1. Every SourceReaderBase checks its metrics and its state to decide whether
it considers itself "processingBacklog" or "veryBackpressured". The base
implementation could do it via a mechanism similar to what I proposed
previously, by looking at busy/backPressuredTimeMsPerSecond, pendingRecords
and the processing rate.
2. SourceReaderBase could send an event with
"processingBacklog"/"veryBackpressured" state.
3. The SourceCoordinator would collect those events and decide what it
should do: whether or not to switch the whole source to the
"processingBacklog"/"veryBackpressured" state.

That could eventually provide a generic solution that works for every source
that reports the required metrics. Each source implementation could decide
whether to use that default behaviour, override the default, or combine the
default with something custom (like the HybridSource).

And as a first step, we could implement that mechanism only on the
SourceCoordinator side, without events, without the default generic
solution and use
it in the HybridSource/MySQL CDC.

This approach has some advantages compared to my previous proposal:
  + no need to tinker with metrics or push metrics from TMs to the JM
  + somehow, communicating this information via events seems a bit cleaner
to me and avoids problems with the freshness of the metrics
And some issues:
  - I don't know if it can be made pluggable in the future. Could a user
implement a custom `CheckpointTrigger` that would automatically work with
all/most of the pre-existing sources?
  - I don't know if it can be expanded if needed in the future, to make
decisions based on operators in the middle of a JobGraph.

3. ===

Independent of that, during some brainstorming between me, Chesnay and
Stefan Richter, an idea popped up that I think could be a counter-proposal:
an intermediate solution that probably effectively works the same way as the
current FLIP-309.

Inside a HybridSource, from its SplitEnumerator#snapshotState method,
couldn't you throw an exception like
`new CheckpointException(TOO_MANY_CHECKPOINT_REQUESTS)` or `new
CheckpointException(TRIGGER_CHECKPOINT_FAILURE)`?
Actually, we could also introduce a dedicated `CheckpointFailureReason` for
that purpose and handle it in some special way in some places (like maybe
hiding such rejected checkpoints from the REST API/WebUI). We could elaborate
on this a bit more, but after brief thought I could see it actually working
well enough without any public-facing changes. But I might be wrong here.

If this feature actually grabs traction, we could expand it to something
more sophisticated available via a public API in the future.

===

Sorry for disturbing this FLIP discussion and voting.

Best,
Piotrek

On Thu, Jun 29, 2023 at 05:08, feng xiangyu wrote:

> Hi Dong,
>
> Thanks for your quick reply. I think this has truly solved our problem and
> will enable us to upgrade our existing jobs more seamlessly.
>
> Best,
> Xiangyu
>
> Dong Lin wrote on Thu, Jun 29, 2023 at 10:50:
>
> > Hi Feng,
> >
> > Thanks for the feedback. Yes, you can configur

Re: [VOTE] FLIP-309: Support using larger checkpointing interval when source is processing backlog

2023-06-30 Thread Piotr Nowojski
Hey,

Sorry to disturb this voting, but after discussing this thoroughly with
Chesnay and Stefan Richter I have to vote:
 -1 (binding)
mainly to suspend the current voting thread. Please take a look at my mail
on the dev mailing list.

Best,
Piotrek

On Thu, Jun 29, 2023 at 14:59, feng xiangyu wrote:

> +1 (non-binding)
>
> Best,
> Xiangyu
>
> yuxia wrote on Thu, Jun 29, 2023 at 20:44:
>
> > +1 (binding)
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "Yuepeng Pan"
> > To: "dev"
> > Sent: Thursday, June 29, 2023, 8:21:14 PM
> > Subject: Re: [VOTE] FLIP-309: Support using larger checkpointing interval when
> > source is processing backlog
> >
> > +1  non-binding.
> >
> >
> > Best.
> > Yuepeng Pan
> >
> >
> >  Replied Message 
> > | From | Jingsong Li |
> > | Date | 06/29/2023 13:25 |
> > | To | dev |
> > | Cc | flink.zhouyunfeng |
> > | Subject | Re: [VOTE] FLIP-309: Support using larger checkpointing
> > interval when source is processing backlog |
> > +1 binding
> >
> > On Thu, Jun 29, 2023 at 11:03 AM Dong Lin  wrote:
> > >
> > > Hi all,
> > >
> > > We would like to start the vote for FLIP-309: Support using larger
> > > checkpointing interval when source is processing backlog [1]. This FLIP
> > was
> > > discussed in this thread [2].
> > >
> > > Flink 1.18 release will feature freeze on July 11. We hope to make this
> > > feature available in Flink 1.18.
> > >
> > > The vote will be open until at least July 4th (at least 72 hours),
> > following
> > > the consensus voting process.
> > >
> > > Cheers,
> > > Yunfeng and Dong
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-309%3A+Support+using+larger+checkpointing+interval+when+source+is+processing+backlog
> > > [2] https://lists.apache.org/thread/l1l7f30h7zldjp6ow97y70dcthx7tl37
> >
>


[jira] [Created] (FLINK-32502) Remove AbstractLeaderElectionService

2023-06-30 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-32502:
-

 Summary: Remove AbstractLeaderElectionService
 Key: FLINK-32502
 URL: https://issues.apache.org/jira/browse/FLINK-32502
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.18.0
Reporter: Matthias Pohl


{{AbstractLeaderElectionService}} doesn't bring much value anymore and can be 
removed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32503) FLIP-285 technical debt

2023-06-30 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-32503:
-

 Summary: FLIP-285 technical debt
 Key: FLINK-32503
 URL: https://issues.apache.org/jira/browse/FLINK-32503
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Coordination
Affects Versions: 1.18.0
Reporter: Matthias Pohl


This Jira issue collects any technical debt related to the FLIP-285 efforts 
that were introduced with FLINK-26522.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32504) DefaultLeaderElectionService#running field can be removed

2023-06-30 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-32504:
-

 Summary: DefaultLeaderElectionService#running field can be removed
 Key: FLINK-32504
 URL: https://issues.apache.org/jira/browse/FLINK-32504
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.18.0
Reporter: Matthias Pohl


The {{running}} field of {{DefaultLeaderElectionService}} can be removed. The 
same information can be derived from {{leaderElectionDriver != null && 
!leadershipOperationExecutor.isShutdown()}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-309: Support using larger checkpointing interval when source is processing backlog

2023-06-30 Thread Chesnay Schepler

-1 (binding)

I feel like this FLIP needs a bit more time in the oven.

It seems to be very light on actual details; you can summarize the 
entire changes section as "The enumerator calls this method and then 
another checkpoint interval is used."
I would love to know how this is wired into the triggering of 
checkpoints, what the behavior is with multiple sources, whether a source is 
allowed to set this at any point or just once, what the semantics of a 
"backlog" are for sources other than Hybrid/MySQL CDC (because catching 
up after a failover is a common enough pattern), whether/how this 
information could also be interesting for the scheduler (because we may 
want to avoid rescalings during the backlog processing), and whether the 
backlog processing should be exposed as a metric for users (or, for that 
matter, how we inform users at all that we're using a different 
checkpoint interval at this time).


Following my discussion with Piotr and Stefan I'm also not sure how 
future-proof the proposed API really is. Already I feel like the name 
"setIsProcessingBacklog()" is rather specific for the state of the 
source (making it technically wrong to call it in other situations like 
being backpressured (again, depending on what "backlog processing" even 
means)), while not being clear on what this actually results in. The 
javadocs don't even mention the checkpointing interval at all, but 
instead reference downstream optimizations that, afaict, aren't 
mentioned in the FLIP.


I'd be very hesitant to mark it as public from the get-go. Ideally 
it would maybe even be added as a separate interface (somehow).


On 30/06/2023 16:37, Piotr Nowojski wrote:

[earlier quoted messages snipped; see Piotr Nowojski's vote message and the votes above]





[jira] [Created] (FLINK-32505) Compilation error in ProducerMergedPartitionFileWriter

2023-06-30 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-32505:
-

 Summary: Compilation error in ProducerMergedPartitionFileWriter
 Key: FLINK-32505
 URL: https://issues.apache.org/jira/browse/FLINK-32505
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Reporter: Matthias Pohl


Caused by FLINK-31644 and the flink-shaded upgrade



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32506) Add the watermark aggregation benchmark for source coordinator

2023-06-30 Thread Rui Fan (Jira)
Rui Fan created FLINK-32506:
---

 Summary: Add the watermark aggregation benchmark for source 
coordinator
 Key: FLINK-32506
 URL: https://issues.apache.org/jira/browse/FLINK-32506
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common
Affects Versions: 1.18.0
Reporter: Rui Fan
Assignee: Rui Fan
 Fix For: 1.18.0


FLINK-32420 is improving the watermark aggregation performance.

We want to add a benchmark for it first, so that we can then see the official 
performance change.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32507) Document KafkaSink SinkWriterMetricGroup metrics

2023-06-30 Thread Mason Chen (Jira)
Mason Chen created FLINK-32507:
--

 Summary: Document KafkaSink SinkWriterMetricGroup metrics
 Key: FLINK-32507
 URL: https://issues.apache.org/jira/browse/FLINK-32507
 Project: Flink
  Issue Type: Technical Debt
  Components: Connectors / Kafka
Reporter: Mason Chen


The SinkWriterMetricGroup metrics that KafkaSink implements are not documented.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32508) Flink-Metrics Prometheus - Native Histograms / Native Counters

2023-06-30 Thread Ryan van Huuksloot (Jira)
Ryan van Huuksloot created FLINK-32508:
--

 Summary: Flink-Metrics Prometheus - Native Histograms / Native 
Counters
 Key: FLINK-32508
 URL: https://issues.apache.org/jira/browse/FLINK-32508
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / Metrics
Reporter: Ryan van Huuksloot
 Fix For: 1.18.0, 1.19.0


There are new metric types in Prometheus that would allow the exporter to 
write Counters and Histograms as native metrics in Prometheus (vs. writing 
them as Gauges). This requires an update to the Prometheus client, which has 
changed its spec.

To accommodate the new metric types while retaining the old option for 
Prometheus metrics, the recommendation is to *add a new package such as 
`flink-metrics-prometheus-native` and eventually deprecate the original.*

Discussed more on the mailing list: 
https://lists.apache.org/thread/kbo3973whb8nj5xvkpvhxrmgtmnbkhlv



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Flink-Metrics Prometheus - Native Histograms / Native Counters

2023-06-30 Thread Ryan van Huuksloot
Sounds good - I haven't looked too much into the comparison of the old and
new client.

Opened a ticket: https://issues.apache.org/jira/browse/FLINK-32508

I may have some time to support this ticket in 2 weeks but anyone can feel
free to start working on it.

Thanks,
Ryan van Huuksloot
Sr. Production Engineer | Streaming Platform
[image: Shopify]



On Fri, Jun 30, 2023 at 10:34 AM Martijn Visser 
wrote:

> [earlier quoted messages snipped; see the messages above in this thread]