[jira] [Created] (FLINK-28484) UniqueKey inference should respect the semantics of null as well as consider the possibility of precision loss from implicit conversions

2022-07-11 Thread lincoln lee (Jira)
lincoln lee created FLINK-28484:
---

 Summary: UniqueKey inference should respect the semantics of null 
as well as consider the possibility of precision loss from implicit conversions
 Key: FLINK-28484
 URL: https://issues.apache.org/jira/browse/FLINK-28484
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.15.1
Reporter: lincoln lee


FLINK-15313 introduced a new branch in the unique-key inference for projections, but 
it neither respects null semantics nor accounts for possible precision loss from 
implicit conversions. This may lead to a wrong determination of whether a column is 
unique and, consequently, to a wrong execution plan.
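
For illustration only (the table, column, and connector names below are invented, and this is not the planner code touched by this ticket): a projection that applies a lossy implicit cast cannot be assumed to preserve uniqueness, because two distinct source keys may collide after the cast.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Hypothetical illustration only: table/column names are invented, and this is
// not the planner code changed by FLINK-15313 or this ticket.
public class LossyCastUniquenessExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // 'id' is unique (BIGINT). Casting it to INT can overflow / lose precision,
        // so two distinct BIGINT ids may collide on the same INT value.
        tEnv.executeSql(
                "CREATE TABLE orders (id BIGINT, amount DOUBLE, PRIMARY KEY (id) NOT ENFORCED) "
                        + "WITH ('connector' = 'datagen')");

        // If unique-key inference treated 'small_id' as unique here, downstream
        // operators could end up with a wrong (e.g. upsert-based) execution plan.
        tEnv.executeSql(
                "CREATE TEMPORARY VIEW projected AS "
                        + "SELECT CAST(id AS INT) AS small_id, amount FROM orders");
    }
}
```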



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28485) Structure examples better and add README/doc

2022-07-11 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-28485:
--

 Summary: Structure examples better and add README/doc
 Key: FLINK-28485
 URL: https://issues.apache.org/jira/browse/FLINK-28485
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Gyula Fora
 Fix For: kubernetes-operator-1.1.0


With the growing number of (more complex) examples, the examples directory is 
simply becoming very messy.

We should try to structure the examples better, separate simple YAML-based 
operator examples from example projects, and add a general README/doc page for 
this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Creating benchmark channel in Apache Flink slack

2022-07-11 Thread Yuan Mei
+1 (binding) & thanks for the efforts!

Best
Yuan



On Mon, Jul 11, 2022 at 2:08 PM Yun Gao 
wrote:

> +1 (binding)
>
> Thanks Anton for driving this!
>
>
> Best,
> Yun Gao
>
>
> --
> From:Anton Kalashnikov 
> Send Time:2022 Jul. 8 (Fri.) 22:59
> To:undefined 
> Subject:[VOTE] Creating benchmark channel in Apache Flink slack
>
> Hi everyone,
>
>
> I would like to start a vote for creating the new channel in Apache
> Flink slack for sending benchmark results to it. This should help the
> community to notice performance degradations in time.
>
> The discussion of this idea can be found here[1]. The ticket for this is
> here[2].
>
>
> [1] https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
>
> [2] https://issues.apache.org/jira/browse/FLINK-28468
>
> --
>
> Best regards,
> Anton Kalashnikov


[jira] [Created] (FLINK-28486) [Flink][doc] Flink FileSystem SQL Connector Doc is not right

2022-07-11 Thread DingGeGe (Jira)
DingGeGe created FLINK-28486:


 Summary: [Flink][doc] Flink FileSystem SQL Connector Doc is not 
right
 Key: FLINK-28486
 URL: https://issues.apache.org/jira/browse/FLINK-28486
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.15.1
Reporter: DingGeGe
 Attachments: image-2022-07-11-16-48-12-692.png, 
image-2022-07-11-16-51-18-166.png

[https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/table/filesystem/]

English documents and Chinese documents are inconsistent.

For English doc:

!image-2022-07-11-16-48-12-692.png!

For Chinese doc:

!image-2022-07-11-16-51-18-166.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Creating benchmark channel in Apache Flink slack

2022-07-11 Thread godfrey he
+1, Thanks for driving this!

Best,
Godfrey

Yuan Mei  wrote on Mon, Jul 11, 2022 at 16:13:
>
> +1 (binding) & thanks for the efforts!
>
> Best
> Yuan
>
>
>
> On Mon, Jul 11, 2022 at 2:08 PM Yun Gao 
> wrote:
>
> > +1 (binding)
> >
> > Thanks Anton for driving this!
> >
> >
> > Best,
> > Yun Gao
> >
> >
> > --
> > From:Anton Kalashnikov 
> > Send Time:2022 Jul. 8 (Fri.) 22:59
> > To:undefined 
> > Subject:[VOTE] Creating benchmark channel in Apache Flink slack
> >
> > Hi everyone,
> >
> >
> > I would like to start a vote for creating the new channel in Apache
> > Flink slack for sending benchmark results to it. This should help the
> > community to notice performance degradations in time.
> >
> > The discussion of this idea can be found here[1]. The ticket for this is
> > here[2].
> >
> >
> > [1] https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> >
> > [2] https://issues.apache.org/jira/browse/FLINK-28468
> >
> > --
> >
> > Best regards,
> > Anton Kalashnikov


[jira] [Created] (FLINK-28487) Introduce configurable RateLimitingStrategy for Async Sink

2022-07-11 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-28487:
---

 Summary: Introduce configurable RateLimitingStrategy for Async Sink
 Key: FLINK-28487
 URL: https://issues.apache.org/jira/browse/FLINK-28487
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Common
Reporter: Hong Liang Teoh
 Fix For: 1.16.0


Introduce a configurable RateLimitingStrategy to the AsyncSinkWriter.

This change will allow sink implementers using AsyncSinkWriter to configure 
their own RateLimitingStrategy instead of using the default 
AIMDRateLimitingStrategy.

See [FLIP-242: Introduce configurable RateLimitingStrategy for Async 
Sink|https://cwiki.apache.org/confluence/display/FLINK/FLIP-242%3A+Introduce+configurable+RateLimitingStrategy+for+Async+Sink].
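
For readers unfamiliar with the name of the default strategy, the sketch below illustrates the additive-increase/multiplicative-decrease (AIMD) idea it is named after. This is a self-contained conceptual sketch with invented names and numbers, not the interface proposed in FLIP-242 and not Flink's actual implementation.

```java
// Conceptual AIMD sketch: the rate limit grows additively while requests
// succeed and shrinks multiplicatively when the destination throttles.
// All names and numbers are invented for illustration.
public class AimdRateLimiterSketch {
    private int maxInFlightMessages;
    private final int increment;        // additive increase on success
    private final double backoffFactor; // multiplicative decrease on throttling
    private final int lowerBound;
    private final int upperBound;

    public AimdRateLimiterSketch(
            int initial, int increment, double backoffFactor, int lowerBound, int upperBound) {
        this.maxInFlightMessages = initial;
        this.increment = increment;
        this.backoffFactor = backoffFactor;
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
    }

    /** Called when a batch of requests completed without throttling errors. */
    public void onSuccess() {
        maxInFlightMessages = Math.min(upperBound, maxInFlightMessages + increment);
    }

    /** Called when the destination signalled throttling / rejected requests. */
    public void onThrottled() {
        maxInFlightMessages =
                Math.max(lowerBound, (int) (maxInFlightMessages * backoffFactor));
    }

    /** A writer would consult this before sending another batch. */
    public boolean mayRequest(int currentInFlight) {
        return currentInFlight < maxInFlightMessages;
    }
}
```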

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28488) KafkaMetricWrapper does incorrect cast

2022-07-11 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-28488:


 Summary: KafkaMetricWrapper does incorrect cast
 Key: FLINK-28488
 URL: https://issues.apache.org/jira/browse/FLINK-28488
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.15.1, 1.16.0
Reporter: Chesnay Schepler
 Fix For: 1.16.0, 1.15.2


Same as FLINK-27487, which unfortunately missed 1 of 2 wrapper classes 
({{KafkaMetricWrapper}}).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28489) FLIP-240: Introduce "ANALYZE TABLE" Syntax

2022-07-11 Thread godfrey he (Jira)
godfrey he created FLINK-28489:
--

 Summary: FLIP-240: Introduce "ANALYZE TABLE" Syntax
 Key: FLINK-28489
 URL: https://issues.apache.org/jira/browse/FLINK-28489
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.16.0


see [FLIP 
doc|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386481] 
for more details
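
For context, a sketch of the statement shape proposed in the FLIP (the table name is invented, and the exact grammar is defined by the FLIP, not by this example):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Sketch of the statement shape proposed in FLIP-240; the table name is
// invented and the final grammar is subject to the FLIP, not this example.
public class AnalyzeTableSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());

        // Collect table-level statistics.
        tEnv.executeSql("ANALYZE TABLE orders COMPUTE STATISTICS");

        // Additionally collect column-level statistics.
        tEnv.executeSql("ANALYZE TABLE orders COMPUTE STATISTICS FOR ALL COLUMNS");
    }
}
```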



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28490) Introduce "ANALYZE TABLE" Syntax in sql parser

2022-07-11 Thread godfrey he (Jira)
godfrey he created FLINK-28490:
--

 Summary: Introduce "ANALYZE TABLE" Syntax in sql parser
 Key: FLINK-28490
 URL: https://issues.apache.org/jira/browse/FLINK-28490
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: godfrey he
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28491) Introduce APPROX_COUNT_DISTINCT aggregate function for batch sql

2022-07-11 Thread godfrey he (Jira)
godfrey he created FLINK-28491:
--

 Summary: Introduce APPROX_COUNT_DISTINCT aggregate function for 
batch sql
 Key: FLINK-28491
 URL: https://issues.apache.org/jira/browse/FLINK-28491
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28492) Support "ANALYZE TABLE" execution

2022-07-11 Thread godfrey he (Jira)
godfrey he created FLINK-28492:
--

 Summary: Support "ANALYZE TABLE" execution
 Key: FLINK-28492
 URL: https://issues.apache.org/jira/browse/FLINK-28492
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28493) Add document to describe "ANALYZE TABLE" syntax

2022-07-11 Thread godfrey he (Jira)
godfrey he created FLINK-28493:
--

 Summary: Add document to describe "ANALYZE TABLE" syntax
 Key: FLINK-28493
 URL: https://issues.apache.org/jira/browse/FLINK-28493
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: godfrey he
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28494) Add new job status states to CRD

2022-07-11 Thread Daren Wong (Jira)
Daren Wong created FLINK-28494:
--

 Summary: Add new job status states to CRD
 Key: FLINK-28494
 URL: https://issues.apache.org/jira/browse/FLINK-28494
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Daren Wong
 Fix For: kubernetes-operator-1.1.0


The following job information is currently not available in the 
[jobStatus|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/crd/status/JobStatus.java]
 under the Operator 
[CRD|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml]:
 # endTime – the job’s end time, defaulting to “-1” for streaming jobs.
 # duration – the duration of the running job from start until now.
 # jobPlan – contains job graph info such as parallelism.



We propose to add the 3 fields above to the CRD. Although this might be an 
internal requirement for KDA to integrate with the Flink Kubernetes Operator, we 
foresee that these 3 fields could be beneficial to other Flink Kubernetes 
Operator users/applications as well. They are readily available via the Flink 
REST endpoints (i.e. “/jobs/:jobid”), which means they are already exposed to 
the Flink dashboard and other applications. Therefore, we see potential value 
to others in exposing these 3 job status fields in the CRD.
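
A sketch of how the proposed fields might look on the status object (the field names come from the list above; the class shape, types, and defaults here are assumptions for illustration, not the actual operator code):

```java
// Sketch only: the three fields proposed in FLINK-28494 on a plain POJO. The
// real class is org.apache.flink.kubernetes.operator.crd.status.JobStatus;
// types, defaults and serialization details here are assumptions.
public class JobStatusSketch {
    /** Job end time in epoch millis; "-1" for a still-running / streaming job. */
    private String endTime = "-1";

    /** Duration of the running job from start until now. */
    private String duration;

    /** JSON job plan as returned by the REST API (contains parallelism etc.). */
    private String jobPlan;

    // getters/setters omitted
}
```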



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

2022-07-11 Thread WONG, DAREN
Hi everyone, I am Daren from the AWS Kinesis Data Analytics (KDA) team. I had a 
quick chat with Gyula, as I propose to include a few additional fields in the 
jobStatus CRD for the Flink Kubernetes Operator, such as:

- endTime
- duration
- jobPlan

Further details of each field can be found 
here.
 Although the addition of these 3 fields stems from an internal requirement, I think 
they would be beneficial to others who use these fields in their applications 
as well. The list of fields above is not exhaustive, so do let me know if 
there are other fields that you would like to include together in this 
iteration cycle.

JIRA: https://issues.apache.org/jira/browse/FLINK-28494


Re: [VOTE] Creating benchmark channel in Apache Flink slack

2022-07-11 Thread Márton Balassi
+1 (binding). Thanks!

I can help you with the Slack admin steps if needed.

On Mon, Jul 11, 2022 at 10:55 AM godfrey he  wrote:

> +1, Thanks for driving this!
>
> Best,
> Godfrey
>
> Yuan Mei  于2022年7月11日周一 16:13写道:
> >
> > +1 (binding) & thanks for the efforts!
> >
> > Best
> > Yuan
> >
> >
> >
> > On Mon, Jul 11, 2022 at 2:08 PM Yun Gao 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > Thanks Anton for driving this!
> > >
> > >
> > > Best,
> > > Yun Gao
> > >
> > >
> > > --
> > > From:Anton Kalashnikov 
> > > Send Time:2022 Jul. 8 (Fri.) 22:59
> > > To:undefined 
> > > Subject:[VOTE] Creating benchmark channel in Apache Flink slack
> > >
> > > Hi everyone,
> > >
> > >
> > > I would like to start a vote for creating the new channel in Apache
> > > Flink slack for sending benchmark results to it. This should help the
> > > community to notice performance degradations in time.
> > >
> > > The discussion of this idea can be found here[1]. The ticket for this
> is
> > > here[2].
> > >
> > >
> > > [1] https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > >
> > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > >
> > > --
> > >
> > > Best regards,
> > > Anton Kalashnikov
>


Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

2022-07-11 Thread Őrhidi Mátyás
Hi Daren,

At the moment the Operator fetches the job state via
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-overview
which already contains the 'end-time' and 'duration' fields. I feel that calling
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid
after the previous call, for every job in every reconcile loop, would be too
expensive.

Best,
Matyas

On Mon, Jul 11, 2022 at 3:17 PM WONG, DAREN 
wrote:

> Hi everyone, I am Daren from AWS Kinesis Data Analytics (KDA) team. I had
> a quick chat with Gyula as I propose to include a few additional fields in
> the jobStatus CRD for Flink Kubernetes Operator such as:
>
> - endTime
> - duration
> - jobPlan
>
> Further details of each states can be found here<
> https://github.com/darenwkt/flink/blob/release-1.15.0/flink-runtime/src/main/java/org/apache/flink/runtime/rest/messages/job/JobDetailsInfo.java>.
> Although addition of these 3 states stem from an internal requirement, I
> think they would be beneficial to others who uses these states in their
> application as well. The list of states above are not exhaustive, so do let
> me know if there are other states that you would like to include together
> in this iteration cycle.
>
> JIRA: https://issues.apache.org/jira/browse/FLINK-28494
>


Re: [DISCUSS] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-11 Thread Gen Luo
Hi, everyone.

Thanks for your feedback.
If there are no more concerns or comments, I will start the vote tomorrow.

Gen Luo  wrote on Mon, Jul 11, 2022 at 11:12:

> Hi Lijie and Zhu,
>
> Thanks for the suggestion. I agree that the name "Blocked Free Slots" is
> more clear to users.
> I'll take the suggestion and update the FLIP.
>
> On Fri, Jul 8, 2022 at 9:12 PM Zhu Zhu  wrote:
>
>> I agree that it can be more useful to show the number of slots that are
>> free but blocked. Currently users infer the slots in use by subtracting
>> available slots from the total slots. With blocked slots introduced, this
>> can be achieved by subtracting available slots and blocked free slots
>> from the total slots.
>>
>> Therefore, +1 to show "Blocked Free Slots" on the resource card.
>>
>> Thanks,
>> Zhu
>>
>> Lijie Wang  于2022年7月8日周五 17:39写道:
>> >
>> > Hi Gen & Zhu,
>> >
>> > -> 1. Can we also show "Blocked Slots" in the resource card, so that
>> users
>> > can easily figure out how many slots are available/blocked/in-use?
>> >
>> > I think we should describe the "available" and "blocked" more clearly.
>> In
>> > my opinion, I think users should be interested in the number of slots in
>> > the following 3 state:
>> > 1. free and unblocked, I think it's OK to call this state "available".
>> > 2. free and blocked, I think it's not appropriate to call "blocked"
>> > directly, because "blocked" should include both the "free and blocked"
>> and
>> > "in-use and blocked".
>> > 3. in-use
>> >
>> > And the sum of the above 3 kinds of slots should be the total number of
>> > slots in this cluster.
>> >
>> > WDYT?
>> >
>> > Best,
>> > Lijie
>> >
>> > Gen Luo  于2022年7月8日周五 16:14写道:
>> >
>> > > Hi Zhu,
>> > > Thanks for the feedback!
>> > >
>> > > 1.Good idea. Users should be more familiar with the slots as the
>> resource
>> > > units.
>> > >
>> > > 2.You remind me that the "speculative attempts" are execution attempts
>> > > started by the SpeculativeScheduler when slow tasks are detected,
>> while the
>> > > current execution attempts other than the "most current" one are not
>> really
>> > > the speculative attempts. I agree we should modify the field name.
>> > >
>> > > 3.ArchivedSpeculativeExecutionVertex seems to be introduced with the
>> > > speculative execution to handle the speculative attempts as a part of
>> the
>> > > execution history. Since this FLIP is handling the attempts with a
>> more
>> > > proper way, I agree that we can remove the
>> > > ArchivedSpeculativeExecutionVertex.
>> > >
>> > > Thanks again and I'll update the FLIP later according to these
>> suggestions.
>> > >
>> > > On Thu, Jul 7, 2022 at 4:35 PM Zhu Zhu  wrote:
>> > >
>> > > > Thanks for writing this FLIP and initiating the discussion, Gen,
>> Yun and
>> > > > Junhan!
>> > > > It will be very useful to have these improvements on the web UI for
>> > > > speculative execution users, allowing them to know what is
>> happening.
>> > > > I just have a few comments regarding the design details:
>> > > >
>> > > > 1. Can we also show "Blocked Slots" in the resource card, so that
>> users
>> > > > can easily figure out how many slots are available/blocked/in-use?
>> > > > 2. I think "speculative-attempts" is not accurate, because the
>> > > > root/fastest current attempt can be a speculative execution attempt, and in
>> > > > this case "speculative-attempts" will contain the initial execution
>> > > > attempt. How about naming it "other-concurrent-attempts"?
>> > > > 3. I think ArchivedSpeculativeExecutionVertex is not necessarily
>> > > > needed. We can rework the ArchivedExecutionVertex to contains a set
>> of
>> > > > current execution attempts. The set will have one only element in
>> > > > non-speculative cases though. In this way, we can have a unified
>> > > > processing for ArchivedExecutionVertex in
>> speculative/non-speculative
>> > > > cases.
>> > > >
>> > > > Thanks,
>> > > > Zhu
>> > > >
>> > > > Gen Luo  于2022年7月5日周二 15:10写道:
>> > > >
>> > > > >
>> > > > > Hi everyone,
>> > > > >
>> > > > > The speculative execution for batch jobs has been proposed and
>> accepted
>> > > > in
>> > > > > FLIP-168[1], as well as the related blocklist mechanism in
>> FLIP-224[2].
>> > > > As
>> > > > > a follow-up step, the Flink Web UI needs to be enhanced to
>> display the
>> > > > > related information if the speculative execution mechanism is
>> enabled.
>> > > > >
>> > > > > Junhan Yang, Yun Gao and I would like to start the discussion
>> about the
>> > > > Web
>> > > > > UI enhancement and the corresponding REST API changes in
>> FLIP-249[3],
>> > > > > including:
>> > > > > - show the speculative executions in the subtask list and the
>> > > > backpressure
>> > > > > page, where the fastest is shown directly while others are folded;
>> > > > > - show the number of the blocked task managers in the Task
>> Managers and
>> > > > > Slots card, when the number is not 0;
>> > > > > - show the BLOCKED label in the task manager list and the task
>> manager
>> > > > > detail page f

Re: [ACTION REQUESTED] Flink 1.16 Release Sync update

2022-07-11 Thread Martijn Visser
Hi Xingbo,

I'm a bit torn on this: we can definitely extend the feature freeze by
two weeks, but I'm worried that extending the deadline now means that
everyone will just start pushing their PRs even later. From my perspective,
we should only extend the feature freeze if we have CI problems or if
there's a feature we consider crucial that won't make it without the
extension. If the feature you're planning for 1.16 is not done before the
feature freeze, that just means that it moves to the next Flink release. As
a compromise, how about we first extend it by one week?

We can definitely follow that practice, but then we should make all
committers aware that they can't merge PRs with new features / with code
that should not end up in Flink 1.16 until the release branch is cut.

Best regards,

Martijn

On Thu, Jul 7, 2022 at 14:45, Xingbo Huang  wrote:

> Hi Martijn,
>
> I have discussed offline with several contributors from China and found
> that most of them thought it difficult to finish the on-going features by
> 25th July. As there is consensus[1][2] that the feature freeze will happen
> by the end of July with potentially two weeks delay, do you think it makes
> sense to postpone the feature freeze date a bit(maybe two weeks or so) as
> there are still 44 planned tasks are on-going and only 8 of them are
> finished[3]?
>
> Regarding the release branch cut time, in previous releases, after feature
> freeze, it usually waits about two weeks or so before cutting the release
> branch. This could avoid pushing to two branches at the same time which may
> happen more frequently than usual. Should we also follow that practice in
> this release?
>
> Best,
> Xingbo
>
> [1] https://lists.apache.org/thread/ghfb5xdjy7tv0zqlrxvh3hcsc740w4ml
> [2] https://www.mail-archive.com/dev@flink.apache.org/msg56702.html
> [3] https://cwiki.apache.org/confluence/display/FLINK/1.16+Release
>
> Martijn Visser  于2022年7月5日周二 17:32写道:
>
> > Hi everyone,
> >
> > We've just finished our 1.16 release sync update and would like to update
> > you on the current status and request for your help:
> >
> > 1. The Flink 1.16 release branch is expected to be cut in 3 weeks, which
> > means that we will cut the release branch and start with the release
> > testing. New features that didn't make that time will be postponed until
> > the Flink release after this one.
> >
> > 2. Please help make it easier for the release managers by making sure
> that
> > the Flink 1.16 release page is updated [1]. This means that:
> >
> > * There should be a Jira link on the release page
> > * This is assigned to someone
> > * The release page has the latest state reflected
> > * When a Jira ticket is completed, proper release notes have been added
> to
> > the Jira ticket.
> >
> > 3. There are numerous test instabilities which we need help with to get
> > fixed. We currently have 1 release blocker and numerous critical test
> > stability tickets open. You can see an overview of these also on the
> > release page [2]. Please help the release managers with:
> >
> > * Volunteering to pick up a test instability ticket. You can ping me
> > (martijnvisser) to get it assigned to you.
> > * Please also mark the ticket as 'In Progress' when you're working on it
> > and 'Stop Progress' if you're not working on it anymore, so that we know
> > who is working on which ticket.
> >
> > Current open test instability tickets that needs a volunteer to review
> > and/or create fix are https://issues.apache.org/jira/browse/FLINK-26979
> > (reviewer), https://issues.apache.org/jira/browse/FLINK-28398,
> > https://issues.apache.org/jira/browse/FLINK-28392 and
> > https://issues.apache.org/jira/browse/FLINK-26402.
> >
> > Best regards,
> >
> > Martijn
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/1.16+Release
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/1.16+Release#id-1.16Release-2022-07-05
> >
>


[jira] [Created] (FLINK-28495) Fix typos or mistakes of Flink CEP Document in the official website

2022-07-11 Thread Biao Geng (Jira)
Biao Geng created FLINK-28495:
-

 Summary: Fix typos or mistakes of Flink CEP Document in the 
official website
 Key: FLINK-28495
 URL: https://issues.apache.org/jira/browse/FLINK-28495
 Project: Flink
  Issue Type: Improvement
Reporter: Biao Geng


"Will generate the following matches for an input sequence: C D A1 A2 A3 D A4 
B. with combinations enabled: {C A1 B}, {C A1 A2 B}, {C A1 A3 B}, {C A1 A4 B}, 
{C A1 A2 A3 B}, {C A1 A2 A4 B}, {C A1 A3 A4 B}, {C A1 A2 A3 A4 B}" -> "Will 
generate the following matches for an input sequence: C D A1 A2 A3 D A4 B. with 
combinations enabled: {C A1 B}, {C A1 A2 B}, {C A1 A3 B}, {C A1 A4 B}, {C A1 A2 
A3 B}, {C A1 A2 A4 B}, {C A1 A3 A4 B}, {C A1 A2 A3 A4 B}, {C A2 B}, {C A2 A3 
B}, {C A2 A4 B}, {C A2 A3 A4 B}, {C A3 B}, {C A3 A4 B}, {C A4 B}"
"For SKIP_TO_FIRST/LAST there are two options how to handle cases when there 
are no elements mapped to the specified variable." -> "For SKIP_TO_FIRST/LAST 
there are two options how to handle cases when there are no events mapped to 
the *PatternName*."
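
For reference, a minimal sketch of the kind of pattern the quoted passage is about (C, then one-or-more A with combinations enabled, then B). The Event type and conditions are invented, and the sketch only shows the pattern shape, not which match set is produced:

```java
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;

// Minimal sketch of a "combinations enabled" pattern; the Event type and the
// conditions are invented for illustration.
public class CombinationsPatternSketch {

    public static class Event {
        public String label;
    }

    public static Pattern<Event, ?> pattern() {
        return Pattern.<Event>begin("c")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event e) {
                        return "C".equals(e.label);
                    }
                })
                .followedBy("a")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event e) {
                        return e.label != null && e.label.startsWith("A");
                    }
                })
                .oneOrMore()
                .allowCombinations()
                .followedBy("b")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event e) {
                        return "B".equals(e.label);
                    }
                });
    }
}
```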



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [RESULT][VOTE] FLIP-242: Introduce configurable RateLimitingStrategy for Async Sink

2022-07-11 Thread Piotr Nowojski
Thanks for driving this effort!

Best,
Piotrek

On Sat, Jul 2, 2022 at 09:03, Teoh, Hong  wrote:

> Hi Flink community!
>
> FLIP-242 [1] has been accepted.
>
> There are 3 binding votes [2]
>
>   *   Piotrek Nowojski (binding)
>   *   Martijn Visser (binding)
>   *   Danny Cranmer (binding)
>
> None against.
>
> Thanks everyone for the support!
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-242%3A+Introduce+configurable+RateLimitingStrategy+for+Async+Sink
> [2] https://lists.apache.org/thread/9z3nfgv57g4cmwcl90r4h5tct9h2qgvv
>
>
> Regards,
> Hong
>


[jira] [Created] (FLINK-28496) Expose label selector support in k8s operator

2022-07-11 Thread Jira
Márton Balassi created FLINK-28496:
--

 Summary: Expose label selector support in k8s operator
 Key: FLINK-28496
 URL: https://issues.apache.org/jira/browse/FLINK-28496
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Márton Balassi


The JOSDK has a 
[label selector|https://github.com/java-operator-sdk/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ControllerConfigurationOverrider.java#L34]
 which lets users filter which custom resources are watched. We should expose this 
via our config.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Creating benchmark channel in Apache Flink slack

2022-07-11 Thread Alexander Fedulov
+1. Thanks, Anton.

Best,
Alexander Fedulov



On Mon, Jul 11, 2022 at 3:17 PM Márton Balassi 
wrote:

> +1 (binding). Thanks!
>
> I can help you with the Slack admin steps if needed.
>
> On Mon, Jul 11, 2022 at 10:55 AM godfrey he  wrote:
>
> > +1, Thanks for driving this!
> >
> > Best,
> > Godfrey
> >
> > Yuan Mei  于2022年7月11日周一 16:13写道:
> > >
> > > +1 (binding) & thanks for the efforts!
> > >
> > > Best
> > > Yuan
> > >
> > >
> > >
> > > On Mon, Jul 11, 2022 at 2:08 PM Yun Gao 
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Thanks Anton for driving this!
> > > >
> > > >
> > > > Best,
> > > > Yun Gao
> > > >
> > > >
> > > > --
> > > > From:Anton Kalashnikov 
> > > > Send Time:2022 Jul. 8 (Fri.) 22:59
> > > > To:undefined 
> > > > Subject:[VOTE] Creating benchmark channel in Apache Flink slack
> > > >
> > > > Hi everyone,
> > > >
> > > >
> > > > I would like to start a vote for creating the new channel in Apache
> > > > Flink slack for sending benchmark results to it. This should help the
> > > > community to notice performance degradations in time.
> > > >
> > > > The discussion of this idea can be found here[1]. The ticket for this
> > is
> > > > here[2].
> > > >
> > > >
> > > > [1] https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > > >
> > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Anton Kalashnikov
> >
>


Not able to run a simple table API file streaming sink

2022-07-11 Thread Jaya Ananthram
Hello There,

I am trying to write a simple table API S3 streaming sink using flink
1.15.1 and I am facing the following exception,

Caused by: org.apache.flink.util.SerializedThrowable:
S3RecoverableFsDataOutputStream cannot sync state to S3. Use persist() to
create a persistent recoverable intermediate point.
at
org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.sync(RefCountedBufferingFileStream.java:111)
~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
at
org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.sync(S3RecoverableFsDataOutputStream.java:129)
~[flink-s3-fs-hadoop-1.15.1.jar:1.15.1]
at
org.apache.flink.formats.csv.CsvBulkWriter.finish(CsvBulkWriter.java:110)
~[flink-csv-1.15.1.jar:1.15.1]
at
org.apache.flink.connector.file.table.FileSystemTableSink$ProjectionBulkFactory$1.finish(FileSystemTableSink.java:642)
~[flink-connector-files-1.15.1.jar:1.15.1]
at
org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:64)
~[flink-file-sink-common-1.15.1.jar:1.15.1]
at
org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:263)
~[flink-streaming-java-1.15.1.jar:1.15.1]
at
org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:305)
~[flink-streaming-java-1.15.1.jar:1.15.1]
at
org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:277)
~[flink-streaming-java-1.15.1.jar:1.15.1]
at
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:270)
~[flink-streaming-java-1.15.1.jar:1.15.1]
at
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:261)
~[flink-streaming-java-1.15.1.jar:1.15.1]
at
org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.snapshotState(StreamingFileSinkHelper.java:87)
~[flink-streaming-java-1.15.1.jar:1.15.1]
at
org.apache.flink.connector.file.table.stream.AbstractStreamingWriter.snapshotState(AbstractStreamingWriter.java:129)
~[flink-connector-files-1.15.1.jar:1.15.1]

In my setup, I am reading from Kafka and writing to S3 (s3a) using the Table
API, with the checkpoint storage configured via s3p (Presto). I also tried a
simple datagen example instead of Kafka and got the same issue. I think I am
following the exact steps mentioned in the docs, and the above exception is not
very helpful. It fails exactly when a checkpoint is triggered, but I have no
clue beyond that. Could someone please help me understand what I am missing
here? I could not find any open issue with such logs.

Hoping for your reply.
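
For reference, a minimal sketch of the kind of pipeline described above (datagen source, CSV format, filesystem connector writing to S3, checkpointing enabled). The bucket name, schema, and rates are invented:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

// Minimal sketch of the setup described above (datagen -> CSV files on S3 via
// the filesystem connector, with checkpointing enabled). Bucket name, schema
// and intervals are invented.
public class S3CsvSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // in-progress part files roll on checkpoints
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        tEnv.executeSql(
                "CREATE TABLE source_table (id BIGINT, payload STRING) "
                        + "WITH ('connector' = 'datagen', 'rows-per-second' = '10')");

        tEnv.executeSql(
                "CREATE TABLE s3_sink (id BIGINT, payload STRING) WITH ("
                        + "'connector' = 'filesystem',"
                        + "'path' = 's3a://my-bucket/output/',"
                        + "'format' = 'csv')");

        tEnv.executeSql("INSERT INTO s3_sink SELECT id, payload FROM source_table");
    }
}
```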



Re: [DISCUSS] FLIP-248: Introduce dynamic partition pruning

2022-07-11 Thread Jingsong Li
Thanks Godfrey for driving.

I like this FLIP.

We can extend this capability to more than just partitions.
Here are some inputs from Lake Storage.

The format of the splits generated by Lake Storage is roughly as follows:
Split {
    Path filePath;
    Statistics[] fieldStats;
}

Stats contain the min and max of each column.

If the storage is sorted by a column, split filtering on that column will be
very effective, so not only the partition fields but also this column is worth
pushing a runtime filter down to.
This information is only known to the source, so I suggest that the source
return which fields are worth pushing down.
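
A tiny sketch of the split skipping this enables (the types mirror the Split/Statistics sketch above; nothing here is an existing Flink API):

```java
// Tiny sketch of min/max-based split skipping; types mirror the Split and
// Statistics sketch above, nothing here is an existing Flink API.
final class SplitPruningSketch {
    static final class Statistics {
        long min;
        long max;
    }

    static final class Split {
        String filePath;
        Statistics[] fieldStats;
    }

    /** True if no row of the split can match 'column = value', so the split can be skipped. */
    static boolean canSkip(Split split, int columnIndex, long value) {
        Statistics stats = split.fieldStats[columnIndex];
        return value < stats.min || value > stats.max;
    }
}
```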

My overall point is:
This FLIP can be extended to support Source Runtime Filter push-down
for all fields, not just dynamic partition pruning.

What do you think?

Best,
Jingsong

On Fri, Jul 8, 2022 at 10:12 PM godfrey he  wrote:
>
> Hi all,
>
> I would like to open a discussion on FLIP-248: Introduce dynamic
> partition pruning.
>
>  Currently, Flink supports static partition pruning: the conditions in
> the WHERE clause are analyzed
> to determine in advance which partitions can be safely skipped in the
> optimization phase.
> Another common scenario: the partitions information is not available
> in the optimization phase but in the execution phase.
> That's the problem this FLIP is trying to solve: dynamic partition
> pruning, which could reduce the partition table source IO.
>
> The query pattern looks like:
> select * from store_returns, date_dim where sr_returned_date_sk =
> d_date_sk and d_year = 2000
>
> We will introduce a mechanism for detecting dynamic partition pruning
> patterns in optimization phase
> and performing partition pruning at runtime by sending the dimension
> table results to the SplitEnumerator
> of fact table via existing coordinator mechanism.
>
> You can find more details in FLIP-248 document[1].
> Looking forward to any feedback.
>
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-248%3A+Introduce+dynamic+partition+pruning
> [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-248
>
>
> Best,
> Godfrey


Re: [VOTE] Creating benchmark channel in Apache Flink slack

2022-07-11 Thread Yun Tang
+1 (binding; I think this vote acts as voting on a FLIP).



Best,
Yun Tang

From: Alexander Fedulov 
Sent: Tuesday, July 12, 2022 5:27
To: dev 
Subject: Re: [VOTE] Creating benchmark channel in Apache Flink slack

+1. Thanks, Anton.

Best,
Alexander Fedulov



On Mon, Jul 11, 2022 at 3:17 PM Márton Balassi 
wrote:

> +1 (binding). Thanks!
>
> I can help you with the Slack admin steps if needed.
>
> On Mon, Jul 11, 2022 at 10:55 AM godfrey he  wrote:
>
> > +1, Thanks for driving this!
> >
> > Best,
> > Godfrey
> >
> > Yuan Mei  于2022年7月11日周一 16:13写道:
> > >
> > > +1 (binding) & thanks for the efforts!
> > >
> > > Best
> > > Yuan
> > >
> > >
> > >
> > > On Mon, Jul 11, 2022 at 2:08 PM Yun Gao 
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Thanks Anton for driving this!
> > > >
> > > >
> > > > Best,
> > > > Yun Gao
> > > >
> > > >
> > > > --
> > > > From:Anton Kalashnikov 
> > > > Send Time:2022 Jul. 8 (Fri.) 22:59
> > > > To:undefined 
> > > > Subject:[VOTE] Creating benchmark channel in Apache Flink slack
> > > >
> > > > Hi everyone,
> > > >
> > > >
> > > > I would like to start a vote for creating the new channel in Apache
> > > > Flink slack for sending benchmark results to it. This should help the
> > > > community to notice performance degradations in time.
> > > >
> > > > The discussion of this idea can be found here[1]. The ticket for this
> > is
> > > > here[2].
> > > >
> > > >
> > > > [1] https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > > >
> > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Anton Kalashnikov
> >
>


[jira] [Created] (FLINK-28497) resource leak when job failed with unknown status In Application Mode

2022-07-11 Thread lihe ma (Jira)
lihe ma created FLINK-28497:
---

 Summary: resource leak when job failed with unknown status In 
Application Mode
 Key: FLINK-28497
 URL: https://issues.apache.org/jira/browse/FLINK-28497
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.13.1
Reporter: lihe ma


I found a job that restarted thousands of times, and the JobManager tried to create 
a new TaskManager pod every time. The JobManager restarted because the job was 
submitted with a duplicate job id[1] (we preset the jobId rather than generating it), 
but unfortunately I did not save the logs.

This job requires one TaskManager pod under normal circumstances, but thousands of 
pods were leaked in the end.
!image-2022-07-12-11-02-43-009.png|width=666,height=366!



In application mode, cluster resources are released when the job finishes in 
SUCCEEDED, FAILED or CANCELED status[2][3]. When an exception happens, the job 
may be terminated in UNKNOWN status[4].

In this case, the job exited with UNKNOWN status without releasing the TaskManager 
pods. So is it reasonable not to release the TaskManagers when the job exits in 
UNKNOWN status?

 

 

one line in original logs:
2022-07-01 09:45:40,712 [main] INFO 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Terminating cluster 
entrypoint process KubernetesApplicationClusterEntrypoint with exit code 1445.

 

[1] 
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L452]

[2] 
[https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/deployment/application/ApplicationDispatcherBootstrap.java#L90-L91]


[3] 
[https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/deployment/application/ApplicationDispatcherBootstrap.java#L175]

[4] 
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ApplicationStatus.java#L39]
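
A simplified sketch of the status-based cleanup decision described above (this is not the actual dispatcher/entrypoint code; it only illustrates why a job that terminates with UNKNOWN status can leave pods behind when cleanup is keyed on the other terminal statuses):

```java
import org.apache.flink.runtime.clusterframework.ApplicationStatus;

// Simplified illustration, not the actual Flink cleanup code: cleanup keyed on
// SUCCEEDED/FAILED/CANCELED leaves resources behind for UNKNOWN terminations.
public class CleanupDecisionSketch {
    static boolean shouldReleaseClusterResources(ApplicationStatus status) {
        switch (status) {
            case SUCCEEDED:
            case FAILED:
            case CANCELED:
                return true;  // TaskManager pods etc. are released
            case UNKNOWN:
            default:
                return false; // nothing is released -> potential resource leak
        }
    }
}
```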

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28498) resource leak when job failed with unknown status In Application Mode

2022-07-11 Thread lihe ma (Jira)
lihe ma created FLINK-28498:
---

 Summary: resource leak when job failed with unknown status In 
Application Mode
 Key: FLINK-28498
 URL: https://issues.apache.org/jira/browse/FLINK-28498
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.13.1
Reporter: lihe ma
 Attachments: cluster-pod-error.png

I found a job that restarted thousands of times, and the JobManager tried to create 
a new TaskManager pod every time. The JobManager restarted because the job was 
submitted with a duplicate job id[1] (we preset the jobId rather than generating it), 
but unfortunately I did not save the logs.

This job requires one TaskManager pod under normal circumstances, but thousands of 
pods were leaked in the end.
!image-2022-07-12-11-02-43-009.png|width=666,height=366!



In application mode, cluster resources are released when the job finishes in 
SUCCEEDED, FAILED or CANCELED status[2][3]. When an exception happens, the job 
may be terminated in UNKNOWN status[4].

In this case, the job exited with UNKNOWN status without releasing the TaskManager 
pods. So is it reasonable not to release the TaskManagers when the job exits in 
UNKNOWN status?

 

 

one line in original logs:
2022-07-01 09:45:40,712 [main] INFO 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Terminating cluster 
entrypoint process KubernetesApplicationClusterEntrypoint with exit code 1445.

 

[1] 
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L452]

[2] 
[https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/deployment/application/ApplicationDispatcherBootstrap.java#L90-L91]


[3] 
[https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/deployment/application/ApplicationDispatcherBootstrap.java#L175]

[4] 
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ApplicationStatus.java#L39]

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28499) resource leak when job failed with unknown status In Application Mode

2022-07-11 Thread lihe ma (Jira)
lihe ma created FLINK-28499:
---

 Summary: resource leak when job failed with unknown status In 
Application Mode
 Key: FLINK-28499
 URL: https://issues.apache.org/jira/browse/FLINK-28499
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.13.1
Reporter: lihe ma
 Attachments: cluster-pod-error.png

I found a job that restarted thousands of times, and the JobManager tried to create 
a new TaskManager pod every time. The JobManager restarted because the job was 
submitted with a duplicate job id[1] (we preset the jobId rather than generating it), 
but unfortunately I did not save the logs.

This job requires one TaskManager pod under normal circumstances, but thousands of 
pods were leaked in the end.
!image-2022-07-12-11-02-43-009.png|width=666,height=366!



In application mode, cluster resources are released when the job finishes in 
SUCCEEDED, FAILED or CANCELED status[2][3]. When an exception happens, the job 
may be terminated in UNKNOWN status[4].

In this case, the job exited with UNKNOWN status without releasing the TaskManager 
pods. So is it reasonable not to release the TaskManagers when the job exits in 
UNKNOWN status?

 

 

one line in original logs:
2022-07-01 09:45:40,712 [main] INFO 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Terminating cluster 
entrypoint process KubernetesApplicationClusterEntrypoint with exit code 1445.

 

[1] 
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L452]

[2] 
[https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/deployment/application/ApplicationDispatcherBootstrap.java#L90-L91]


[3] 
[https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/deployment/application/ApplicationDispatcherBootstrap.java#L175]

[4] 
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ApplicationStatus.java#L39]

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28500) Add Transformer for Tokenizer

2022-07-11 Thread Zhipeng Zhang (Jira)
Zhipeng Zhang created FLINK-28500:
-

 Summary: Add Transformer for Tokenizer
 Key: FLINK-28500
 URL: https://issues.apache.org/jira/browse/FLINK-28500
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Reporter: Zhipeng Zhang
 Fix For: ml-2.2.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28501) Add Transformer and Estimator for VectorIndexer

2022-07-11 Thread Zhipeng Zhang (Jira)
Zhipeng Zhang created FLINK-28501:
-

 Summary: Add Transformer and Estimator for VectorIndexer
 Key: FLINK-28501
 URL: https://issues.apache.org/jira/browse/FLINK-28501
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Reporter: Zhipeng Zhang
 Fix For: ml-2.2.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-28502) Add Transformer for RegexTokenizer

2022-07-11 Thread Zhipeng Zhang (Jira)
Zhipeng Zhang created FLINK-28502:
-

 Summary: Add Transformer for RegexTokenizer
 Key: FLINK-28502
 URL: https://issues.apache.org/jira/browse/FLINK-28502
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Reporter: Zhipeng Zhang
 Fix For: ml-2.2.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

2022-07-11 Thread Yang Wang
I share Matyas's concern about listing the jobs first and then following up
with get-job-detail requests.
It is a bit expensive, and I do not see the benefit of storing the jobPlan in
the CR status.

Best,
Yang


Őrhidi Mátyás  wrote on Mon, Jul 11, 2022 at 21:43:

> Hi Daren,
>
> At the moment the Operator fetches the job state via
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-overview
> which contains the 'end-time' and 'duration' fields already. I feel calling
> the
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid
> after the previous call for every job in every reconcile loop would be too
> expensive.
>
> Best,
> Matyas
>
> On Mon, Jul 11, 2022 at 3:17 PM WONG, DAREN  >
> wrote:
>
> > Hi everyone, I am Daren from AWS Kinesis Data Analytics (KDA) team. I had
> > a quick chat with Gyula as I propose to include a few additional fields
> in
> > the jobStatus CRD for Flink Kubernetes Operator such as:
> >
> > - endTime
> > - duration
> > - jobPlan
> >
> > Further details of each states can be found here<
> >
> https://github.com/darenwkt/flink/blob/release-1.15.0/flink-runtime/src/main/java/org/apache/flink/runtime/rest/messages/job/JobDetailsInfo.java
> >.
> > Although addition of these 3 states stem from an internal requirement, I
> > think they would be beneficial to others who uses these states in their
> > application as well. The list of states above are not exhaustive, so do
> let
> > me know if there are other states that you would like to include together
> > in this iteration cycle.
> >
> > JIRA: https://issues.apache.org/jira/browse/FLINK-28494
> >
>


Re: [DISCUSS] FLIP-250: Support Customized Kubernetes Schedulers Proposal

2022-07-11 Thread Kelu Tao
+1 for this proposal.

Volcano has shown its strength in some specific cases, such as ML, and its 
scheduling ability will be an enhancement over the Kubernetes default scheduler.

Looking forward to seeing the integration between Volcano and Flink.

Thanks.

On 2022/07/08 00:25:45 bo zhaobo wrote:
> Thanks Yang Wang.
> 
> <Just to remind that the flink-kubernetes-volcano-*.jar will be an optional
> jar located in the $FLINK_HOME/opt. Users who want to try the volcano
> scheduler need to copy it to the plugins directory.>
>
> A: Yeah, the separate jar won't be loaded by default. But when users copy
> the jar from $FLINK_HOME/opt/flink-kubernetes-*.jar to the plugins
> directory, the customized scheduler will be loaded and the functionality
> will be enabled. That would be helpful when different users want different
> functions, as more and more schedulers may be introduced in the future.
> 
> Thanks,
> 
> BR
> 
> Bo Zhao
> 
> 
> 
> Yang Wang  于2022年7月7日周四 12:08写道:
> 
> > Thanks zhaobo for starting the discussion and preparing the FLIP.
> >
> > The customized Kubernetes Schedulers support will be very helpful for the
> > users who still hesitates to migrate the Flink workloads from YARN to
> > Kubernetes.
> > Now leveraging the ability of customized K8s scheduler, many advanced
> > scheduling features(e.g. priority scheduling, dynamic resource sharing,
> > etc.) could be
> > introduced to make the streaming/batch jobs run more smoothly in a shared
> > K8s cluster.
> >
> > Just to remind that the flink-kubernetes-volcano-*.jar will be an optional
> > jar located in the $FLINK_HOME/opt. Users who want to try the volcano
> > scheduler need
> > to copy it to the plugins directory.
> >
> >
> > Best,
> > Yang
> >
> > bo zhaobo  于2022年7月7日周四 09:16写道:
> >
> > > Hi, all.
> > >
> > > I would like to raise a discussion in Flink dev ML about Support
> > Customized
> > > Kubernetes Schedulers.
> > > > Currently, Kubernetes becomes more and more popular for Flink Cluster
> > > deployment, and its ability is better, especially, it supports
> > customized
> > > scheduling.
> > > Essentially, in high-performance workloads, we need to apply new
> > scheduling
> > > policies for meeting the new requirements. And now Flink native
> > Kubernetes
> > > solution is using Kubernetes default scheduler to work with all
> > scenarios,
> > > the default scheduling policy might be difficult to apply in some extreme
> > > cases, so
> > > we need to improve the Flink Kubernetes for coupling those Kubernetes
> > > customized schedulers with Flink native Kubernetes, provides a way for
> > > Flink
> > > administrators or users to use/apply their Flink Clusters on Kubernetes
> > > more flexibility.
> > >
> > > > The proposal will introduce the customized K8S schedulers plugin mechanism
> > > and a reference implementation 'Volcano' in Flink. More details see [1].
> > >
> > > Looking forward to your feedback.
> > >
> > > [1]
> > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-250%3A+Support+Customized+Kubernetes+Schedulers+Proposal
> > >
> > > Thanks,
> > > BR
> > >
> >
> 


Re: [VOTE] Creating benchmark channel in Apache Flink slack

2022-07-11 Thread Jing Zhang
+ 1 (binding),
Thanks for driving this.

Best,
Jing Zhang

Yun Tang  wrote on Tue, Jul 12, 2022 at 11:19:

> + 1 (binding, I think this voting acts as voting a FLIP).
>
>
>
> Best,
> Yun Tang
> 
> From: Alexander Fedulov 
> Sent: Tuesday, July 12, 2022 5:27
> To: dev 
> Subject: Re: [VOTE] Creating benchmark channel in Apache Flink slack
>
> +1. Thanks, Anton.
>
> Best,
> Alexander Fedulov
>
>
>
> On Mon, Jul 11, 2022 at 3:17 PM Márton Balassi 
> wrote:
>
> > +1 (binding). Thanks!
> >
> > I can help you with the Slack admin steps if needed.
> >
> > On Mon, Jul 11, 2022 at 10:55 AM godfrey he  wrote:
> >
> > > +1, Thanks for driving this!
> > >
> > > Best,
> > > Godfrey
> > >
> > > Yuan Mei  于2022年7月11日周一 16:13写道:
> > > >
> > > > +1 (binding) & thanks for the efforts!
> > > >
> > > > Best
> > > > Yuan
> > > >
> > > >
> > > >
> > > > On Mon, Jul 11, 2022 at 2:08 PM Yun Gao  >
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Thanks Anton for driving this!
> > > > >
> > > > >
> > > > > Best,
> > > > > Yun Gao
> > > > >
> > > > >
> > > > > --
> > > > > From:Anton Kalashnikov 
> > > > > Send Time:2022 Jul. 8 (Fri.) 22:59
> > > > > To:undefined 
> > > > > Subject:[VOTE] Creating benchmark channel in Apache Flink slack
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > >
> > > > > I would like to start a vote for creating the new channel in Apache
> > > > > Flink slack for sending benchmark results to it. This should help the
> > > > > community to notice performance degradations in time.
> > > > >
> > > > > The discussion of this idea can be found here[1]. The ticket for
> this
> > > is
> > > > > here[2].
> > > > >
> > > > >
> > > > > [1]
> https://www.mail-archive.com/dev@flink.apache.org/msg58666.html
> > > > >
> > > > > [2] https://issues.apache.org/jira/browse/FLINK-28468
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Anton Kalashnikov
> > >
> >
>


Re: [DISCUSS] FLIP-248: Introduce dynamic partition pruning

2022-07-11 Thread Jark Wu
I agree with Jingsong. DPP is a particular case of Dynamic Filter Pushdown in
which the join key contains partition fields. Extending this FLIP to general
filter pushdown can benefit more optimizations, and they can share the same
interface.

For example, Trino Hive Connector leverages dynamic filtering to support:
- dynamic partition pruning for partitioned tables
- and dynamic bucket pruning for bucket tables
- and dynamic filter pushed into the ORC and Parquet readers to perform
stripe
  or row-group pruning and save on disk I/O.

Therefore, +1 to extend this FLIP to Dynamic Filter Pushdown (or Dynamic
Filtering),
just like Trino [1].  The interfaces should also be adapted for that.

Besides, maybe the FLIP should also demonstrate the EXPLAIN result, which
is also an API.

Best,
Jark

[1]: https://trino.io/docs/current/admin/dynamic-filtering.html










On Tue, 12 Jul 2022 at 09:59, Jingsong Li  wrote:

> Thanks Godfrey for driving.
>
> I like this FLIP.
>
> We can restrict this capability to more than just partitions.
> Here are some inputs from Lake Storage.
>
> The format of the splits generated by Lake Storage is roughly as follows:
> Split {
>Path filePath;
>Statistics[] fieldStats;
> }
>
> Stats contain the min and max of each column.
>
> If the storage is sorted by a column, this means that the split
> filtering on that column will be very good, so not only the partition
> field, but also this column is worthy of being pushed down the
> RuntimeFilter.
> This information can only be known by source, so I suggest that source
> return which fields are worthy of being pushed down.
>
> My overall point is:
> This FLIP can be extended to support Source Runtime Filter push-down
> for all fields, not just dynamic partition pruning.
>
> What do you think?
>
> Best,
> Jingsong
>
> On Fri, Jul 8, 2022 at 10:12 PM godfrey he  wrote:
> >
> > Hi all,
> >
> > I would like to open a discussion on FLIP-248: Introduce dynamic
> > partition pruning.
> >
> >  Currently, Flink supports static partition pruning: the conditions in
> > the WHERE clause are analyzed
> > to determine in advance which partitions can be safely skipped in the
> > optimization phase.
> > Another common scenario: the partitions information is not available
> > in the optimization phase but in the execution phase.
> > That's the problem this FLIP is trying to solve: dynamic partition
> > pruning, which could reduce the partition table source IO.
> >
> > The query pattern looks like:
> > select * from store_returns, date_dim where sr_returned_date_sk =
> > d_date_sk and d_year = 2000
> >
> > We will introduce a mechanism for detecting dynamic partition pruning
> > patterns in optimization phase
> > and performing partition pruning at runtime by sending the dimension
> > table results to the SplitEnumerator
> > of fact table via existing coordinator mechanism.
> >
> > You can find more details in FLIP-248 document[1].
> > Looking forward to any feedback.
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-248%3A+Introduce+dynamic+partition+pruning
> > [2] POC: https://github.com/godfreyhe/flink/tree/FLIP-248
> >
> >
> > Best,
> > Godfrey
>


A common question on how to release resource in Flink job

2022-07-11 Thread Bruce Zu
Hi Fink team,
Excuse me, I have a common question about how to release the resource held
by some object in Flink Job.

Thank you in advance!. I am a new user of Flink. I search a lot but did not
find related documents. But I am sure there is a standard way to resolve it.

In our case, we need access Elasticsearch server. And we use a class
EsClient extends from `org.elasticsearch.client.RestHighLevelClient` to
does query work,
which requires calling its `close` method once it is not used anymore to
release resources.

So I need to find the right place to do that to make sure the resource
always can be released, even if some exceptions happen somewhere in the
invoked
method of the main class or other class.

What I can come up with now is using a `ThreadLocal` to keep the EsClient
object at the start of the main class’s main method and
release the resource at the end of the main method.

The pseudocode (Java)  like this:
```java
import java.io.IOException;
import java.util.Properties;

public class EsClientHolder {
  private static final ThreadLocal<EsClient> local = new InheritableThreadLocal<>();

  public static void createAndSetEsClient(EsClient esClient) {
    local.set(esClient);
  }

  public static void createAndSetEsClientBy(EsClientConfig esClientConfig) {
    EsClient instance = new EsClient(); // construct from esClientConfig (details omitted)
    local.set(instance);
  }

  public static EsClient get() {
    EsClient c = local.get();
    if (c == null) {
      throw new IllegalStateException(
          "Make sure to create and set the EsClient instance before using it");
    }
    return c;
  }

  public static void close() throws IOException {
    EsClient c = local.get();
    if (c != null) {
      c.close();
    }
  }
}

// usage in Flink application code
class MainClass {
  public static void main(String[] args) throws IOException {
    try {
      Properties prop = null;
      EsClientConfig config = getEsClientConfig(prop); // defined elsewhere (omitted)
      EsClientHolder.createAndSetEsClientBy(config);
      // …
      SomeClass.method1();
      OtherClass.method2();
      // ...
    } finally {
      EsClientHolder.close();
    }
  }
}

class SomeClass {
  public static void method1() {
    // 1. use the EsClient in any invoked method of any other class:
    EsClient esClient = EsClientHolder.get();
    // …
  }
}

class OtherClass {
  public static void method2() {
    // 2. use the EsClient in any invoked method of any forked child thread
    //    (visible there because the holder uses an InheritableThreadLocal)
    new Thread(
            () -> {
              EsClient client = EsClientHolder.get();
              // …
            })
        .start();
    // …
  }
}
```

I understand that a TaskManager is a JVM process and that a Task is executed by
a Java thread.

But I do not know how Flink creates the job graph, how it eventually allocates
tasks to threads, and what the relationship between these threads is.

For example, if Flink runs method1 of SomeClass and method2 of OtherClass in
another thread, not the same as the thread running MainClass, then the threads
running method1 and method2 have no way to get the EsClient.
Here I assume the main method of MainClass will be executed in one thread. If
it is not, and set() and close() are split to run in different threads, then
there is no way to release the resource.

Thank you!


Re: [DISCUSS] FLIP-250: Support Customized Kubernetes Schedulers Proposal

2022-07-11 Thread Shqiprim Bunjaku
+1 (non binding)

On Tue, Jul 12, 2022 at 6:50 AM Kelu Tao  wrote:

> +1 for this proposal.
>
> Volcano has show its strength in some specified cases, such as ML. And its
> schedule ability will be enhancement for kubernetes default scheduler.
>
> Look forward to see the integration between Volcano and Flink.
>
> Thanks.
>
> On 2022/07/08 00:25:45 bo zhaobo wrote:
> > Thanks Yang Wang.
> >
> > <Just to remind that the flink-kubernetes-volcano-*.jar will be an optional
> > jar located in the $FLINK_HOME/opt. Users who want to try the volcano
> > scheduler need to copy it to the plugins directory.>
> >
> > A: Yeah, the separate jar won't be loaded by default. But when users copy
> > the jar from $FLINK_HOME/opt/flink-kubernetes-*.jar to the plugins
> > directory, the customized scheduler will be loaded and the functionality
> > will be enabled. That would be helpful when different users want different
> > functions, as more and more schedulers may be introduced in the future.
> >
> > Thanks,
> >
> > BR
> >
> > Bo Zhao
> >
> >
> >
> > Yang Wang  于2022年7月7日周四 12:08写道:
> >
> > > Thanks zhaobo for starting the discussion and preparing the FLIP.
> > >
> > > The customized Kubernetes Schedulers support will be very helpful for
> the
> > > users who still hesitates to migrate the Flink workloads from YARN to
> > > Kubernetes.
> > > Now leveraging the ability of customized K8s scheduler, many advanced
> > > scheduling features(e.g. priority scheduling, dynamic resource sharing,
> > > etc.) could be
> > > introduced to make the streaming/batch jobs run more smoothly in a
> shared
> > > K8s cluster.
> > >
> > > Just to remind that the flink-kubernetes-volcano-*.jar will be an
> optional
> > > jar located in the $FLINK_HOME/opt. Users who want to try the volcano
> > > scheduler need
> > > to copy it to the plugins directory.
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > bo zhaobo  于2022年7月7日周四 09:16写道:
> > >
> > > > Hi, all.
> > > >
> > > > I would like to raise a discussion in Flink dev ML about Support
> > > Customized
> > > > Kubernetes Schedulers.
> > > > Currently, Kubernetes becomes more and more popular for Flink
> Cluster
> > > > deployment, and its ability is better, especially, it supports
> > > customized
> > > > scheduling.
> > > > Essentially, in high-performance workloads, we need to apply new
> > > scheduling
> > > > policies for meeting the new requirements. And now Flink native
> > > Kubernetes
> > > > solution is using Kubernetes default scheduler to work with all
> > > scenarios,
> > > > the default scheduling policy might be difficult to apply in some
> extreme
> > > > cases, so
> > > > we need to improve the Flink Kubernetes for coupling those Kubernetes
> > > > customized schedulers with Flink native Kubernetes, provides a way
> for
> > > > Flink
> > > > administrators or users to use/apply their Flink Clusters on
> Kubernetes
> > > > more flexibility.
> > > >
> > > > The proposal will introduce the customized K8S schedulers plugin
> mechanism
> > > > and a reference implementation 'Volcano' in Flink. More details see
> [1].
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-250%3A+Support+Customized+Kubernetes+Schedulers+Proposal
> > > >
> > > > Thanks,
> > > > BR
> > > >
> > >
> >
>


[VOTE] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-11 Thread Gen Luo
Hi everyone,


Thanks for all the feedback so far. Based on the discussion [1], we seem to
have consensus. So, I would like to start a vote on FLIP-249 [2].


The vote will last for at least 72 hours unless there is an objection or
insufficient votes.


[1] https://lists.apache.org/thread/832tk3zvysg45vrqrv5tgbdqx974pm3m
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-249%3A+Flink+Web+UI+Enhancement+for+Speculative+Execution


[jira] [Created] (FLINK-28503) Fix invalid link in Python FAQ Document

2022-07-11 Thread Biao Geng (Jira)
Biao Geng created FLINK-28503:
-

 Summary: Fix invalid link in Python FAQ Document
 Key: FLINK-28503
 URL: https://issues.apache.org/jira/browse/FLINK-28503
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Project Website
Reporter: Biao Geng
 Attachments: image-2022-07-12-14-51-01-434.png

[https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/faq/#preparing-python-virtual-environment]

The link to the script for setting up the PyFlink virtual environment is invalid now. 
A candidate replacement is 
[https://nightlies.apache.org/flink/flink-docs-release-1.12/downloads/setup-pyflink-virtual-env.sh],
 or we can add this short script to the doc website directly.

!image-2022-07-12-14-51-01-434.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)