Re: [DISCUSSION] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-06 Thread Muhammet Orazov

Morning João,

Recently we had a case where the JDBC driver's authentication was 
different from username & password authentication. For it to work, certain 
hacks were required; an interface there would have been helpful.
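
For illustration, here is a minimal sketch of such a pluggable provider. It assumes the method set of the current `JdbcConnectionProvider` interface in flink-connector-jdbc (the import path may differ by version); the token-fetching mechanism is hypothetical and stands in for whatever non-password credential flow the driver needs.

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

import javax.annotation.Nullable;

import org.apache.flink.connector.jdbc.internal.connection.JdbcConnectionProvider;

/** Sketch: authenticate with a short-lived token instead of username & password. */
public class TokenAuthConnectionProvider implements JdbcConnectionProvider {

    private final String jdbcUrl;
    private transient Connection connection;

    public TokenAuthConnectionProvider(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    @Nullable
    @Override
    public Connection getConnection() {
        return connection;
    }

    @Override
    public boolean isConnectionValid() throws SQLException {
        return connection != null && connection.isValid(10);
    }

    @Override
    public Connection getOrEstablishConnection() throws SQLException {
        if (connection == null) {
            Properties props = new Properties();
            // "accessToken" is a hypothetical driver property; real drivers differ.
            props.setProperty("accessToken", fetchToken());
            connection = DriverManager.getConnection(jdbcUrl, props);
        }
        return connection;
    }

    @Override
    public void closeConnection() {
        try {
            if (connection != null) {
                connection.close();
            }
        } catch (SQLException ignored) {
            // best-effort close
        } finally {
            connection = null;
        }
    }

    @Override
    public Connection reestablishConnection() throws SQLException {
        closeConnection();
        return getOrEstablishConnection();
    }

    private String fetchToken() {
        // Hypothetical: obtain a token from a vault or cloud IAM service.
        return System.getenv("DB_ACCESS_TOKEN");
    }
}
```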


But I agree that the interface module separation is maybe not required at the 
moment.


Thanks for your efforts!

Best,
Muhammet


On 2024-05-03 12:25, João Boto wrote:

Hi Muhammet,

While I generally agree, given our current usage, I'm struggling to 
discern any clear advantage. We already have abstract implementations 
that cover all necessary interfaces and offer essential functionality, 
complemented by a robust set of reusable tests to streamline 
implementation.


With this established infrastructure in place, coupled with the added 
import overhead of introducing another module, I find it difficult to 
identify any distinct benefits at this point.


Best

On 2024/04/26 02:18:52 Muhammet Orazov wrote:

Hey João,

Thanks for the FLIP proposal!

Since the proposal is to introduce modules, would it make sense
to have another module for APIs (flink-jdbc-connector-api)?

For this I would suggest moving all public interfaces there (e.g.,
JdbcRowConverter, JdbcConnectionProvider), and even converting some
classes into interfaces with default implementations, for example
JdbcSink and JdbcConnectionOptions.

This way users would have clear interfaces to build their own
JDBC based Flink connectors.

I am not suggesting introducing new interfaces here, only
separating the API from the core implementation.

What do you think?

Best,
Muhammet


On 2024-04-25 08:54, Joao Boto wrote:
> Hi all,
>
> I'd like to start a discussion on FLIP-449: Reorganization of
> flink-connector-jdbc [1].
> As Flink continues to evolve, we've noticed an increasing level of
> complexity within the JDBC connector.
> The proposed solution is to address this complexity by separating the
> core
> functionality from individual database components, thereby streamlining
> the
> structure into distinct modules.
>
> Looking forward to your feedback and suggestions, thanks.
> Best regards,
> Joao Boto
>
> [1]
> 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc



Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-05-06 Thread Stefan Richter
+1 (binding)

Thanks for updating to a new version!

Best,
Stefan


> On 6. May 2024, at 08:36, Hangxiang Yu  wrote:
> 
> +1(binding)
> 
> On Mon, May 6, 2024 at 12:25 PM Yuan Mei wrote:
> 
>> +1(binding)
>> 
>> Best
>> Yuan
>> 
>> On Mon, May 6, 2024 at 11:28 AM Rui Fan <1996fan...@gmail.com> wrote:
>> 
>>> +1 (binding)
>>> 
>>> Best,
>>> Rui
>>> 
>>> On Mon, May 6, 2024 at 11:01 AM Yanfei Lei  wrote:
>>> 
 +1 (binding)
 
 Best,
 Yanfei
 
Zakelly Lan wrote on Mon, May 6, 2024 at 11:00:
> 
> +1 (binding)
> 
> Thanks for driving this!
> 
> 
> Best,
> Zakelly
> 
> On Mon, May 6, 2024 at 10:54 AM yue ma  wrote:
> 
>> Hi everyone,
>> 
>> Thanks for all the feedback, I'd like to start a vote on the
>>> FLIP-447:
>> Upgrade FRocksDB from 6.20.3 to 8.10.0 [1]. The discussion thread
>> is
 here
>> [2].
>> 
>> The vote will be open for at least 72 hours unless there is an
 objection or
>> insufficient votes.
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
>> [2]
>> https://lists.apache.org/thread/lrxjfpjjwlq4sjzm1oolx58n1n8r48hw
>> 
>> --
>> Best,
>> Yue
>> 
 
>>> 
>> 
> 
> 
> -- 
> Best,
> Hangxiang.



Re: Question around Flink's AdaptiveBatchScheduler

2024-05-06 Thread Junrui Lee
Hi,
Thanks for the reminder. I will review it soon during my free time.

Venkatakrishnan Sowrirajan wrote on Sat, May 4, 2024 at 10:10:

> Junrui and Xia
>
> Gentle ping for reviews.
>
> On Mon, Apr 29, 2024, 8:28 PM Venkatakrishnan Sowrirajan wrote:
>
> > Hi Xia and Junrui,
> >
> > Filed https://github.com/apache/flink/pull/24736 to address the above
> > described issue. Please take a look whenever you can.
> >
> > Thanks
> > Venkat
> >
> >
> > On Thu, Apr 18, 2024 at 12:16 PM Venkatakrishnan Sowrirajan <
> > vsowr...@asu.edu> wrote:
> >
> >> Filed https://issues.apache.org/jira/browse/FLINK-35165 to address the
> >> above described issue. Will share the PR here once it is ready for
> review.
> >>
> >> Regards
> >> Venkata krishnan
> >>
> >>
> >> On Wed, Apr 17, 2024 at 5:32 AM Junrui Lee  wrote:
> >>
> >>> Thanks Venkata and Xia for providing further clarification. I think
> your
> >>> example illustrates the significance of this proposal very well. Please
> >>> feel free to go ahead and address the concerns.
> >>>
> >>> Best,
> >>> Junrui
> >>>
> >>> Venkatakrishnan Sowrirajan wrote on Tue, Apr 16, 2024 at 07:01:
> >>>
> >>> > Thanks for adding your thoughts to this discussion.
> >>> >
> >>> > If we all agree that the source vertex parallelism shouldn't be bound
> >>> by
> >>> > the downstream max parallelism
> >>> > (jobmanager.adaptive-batch-scheduler.max-parallelism)
> >>> > based on the rationale and the issues described above, I can take a
> >>> stab at
> >>> > addressing the issue.
> >>> >
> >>> > Let me file a ticket to track this issue. Otherwise, I'm looking
> >>> forward to
> >>> > hearing more thoughts from others as well, especially Lijie and
> Junrui
> >>> who
> >>> > have more context on the AdaptiveBatchScheduler.
> >>> >
> >>> > Regards
> >>> > Venkata krishnan
> >>> >
> >>> >
> >>> > On Mon, Apr 15, 2024 at 12:54 AM Xia Sun 
> wrote:
> >>> >
> >>> > > Hi Venkat,
> >>> > > I agree that the parallelism of the source vertex should not be
> >>> > > upper bounded by the job's global max parallelism. The case you
> >>> > > mentioned, "high filter selectivity with huge amounts of data to
> >>> > > read," excellently supports this viewpoint. (In fact, in the current
> >>> > > implementation, if the source parallelism is pre-specified at job
> >>> > > create stage, rather than relying on the dynamic parallelism
> >>> > > inference of the AdaptiveBatchScheduler, the source vertex's
> >>> > > parallelism can indeed exceed the job's global max parallelism.)
> >>> > >
> >>> > > As Lijie and Junrui pointed out, the key issue is "semantic
> >>> consistency."
> >>> > > Currently, if a vertex has not set maxParallelism, the
> >>> > > AdaptiveBatchScheduler will use
> >>> > > `execution.batch.adaptive.auto-parallelism.max-parallelism` as the
> >>> > vertex's
> >>> > > maxParallelism. Since the current implementation does not
> distinguish
> >>> > > between source vertices and downstream vertices, source vertices
> are
> >>> also
> >>> > > subject to this limitation.
> >>> > >
> >>> > > Therefore, I believe that if the issue of "semantic consistency"
> can
> >>> be
> >>> > > well explained in the code and configuration documentation, the
> >>> > > AdaptiveBatchScheduler should support that the parallelism of
> source
> >>> > > vertices can exceed the job's global max parallelism.
> >>> > >
> >>> > > Best,
> >>> > > Xia
> >>> > >
> >>> > > Venkatakrishnan Sowrirajan wrote on Sun, Apr 14, 2024 at 10:31:
> >>> > >
> >>> > > > Let me state why I think
> >>> > > > "*jobmanager.adaptive-batch-scheduler.default-source-parallelism*"
> >>> > > > should not be bound by the
> >>> > > > "*jobmanager.adaptive-batch-scheduler.max-parallelism*".
> >>> > > >
> >>> > > >- Source vertex is unique and does not have any upstream vertices
> >>> > > >- Downstream vertices read shuffled data partitioned by key, which
> >>> > > >is not the case for the Source vertex
> >>> > > >- Limiting source parallelism by downstream vertices' max
> >>> > > >parallelism is incorrect
> >>> > > >
> >>> > > > If we say that for "semantic consistency" the source vertex
> >>> > > > parallelism has to be bound by the overall job's max parallelism,
> >>> > > > it can lead to the following issues:
> >>> > > >
> >>> > > >- High filter selectivity with huge amounts of data to read -
> >>> > > >setting "*jobmanager.adaptive-batch-scheduler.max-parallelism*" high
> >>> > > >so that source parallelism can be set higher can lead to small
> >>> > > >blocks and sub-optimal performance.
> >>> > > >- Setting "*jobmanager.adaptive-batch-scheduler.max-parallelism*"
> >>> > > >high requires careful tuning of network buffer configurations,
> >>> > > >which is unnecessary in cases where it is not required, just so
> >>> > > >that the source parallelism can be set high.
> >>> > > >
> >>> > > > Regards
> >>> > > > Venkata krishnan
> 
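
For context, here is a minimal sketch of how the two settings interact. It assumes the option keys named in this thread (key names vary across Flink versions) and a local test environment:

```
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AdaptiveBatchSchedulerConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Global upper bound used by the AdaptiveBatchScheduler when deciding
        // vertex parallelism (also requires network buffers sized accordingly).
        conf.setString("jobmanager.adaptive-batch-scheduler.max-parallelism", "128");
        // Desired default parallelism for source vertices.
        conf.setString("jobmanager.adaptive-batch-scheduler.default-source-parallelism", "1024");

        // Under the behavior discussed in this thread, the source parallelism
        // (1024) is capped at the global max parallelism (128), which the
        // thread argues is incorrect for sources.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(conf);
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        // ... build and execute a batch job ...
    }
}
```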

Re: [DISCUSS] FLIP-451: Refactor Async sink API

2024-05-06 Thread Muhammet Orazov

Hey Ahmed,

Thanks for the FLIP! +1 (non-binding)


Additionally the current interface for passing fatal exceptions and
retrying records relies on java consumers which makes it harder to 
understand.


Could you please add more detail here on why it is harder? Would the 
`completeExceptionally` method be related to it? Maybe you can also add a 
usage example for it.


we should proceed by adding support in all supporting connector repos.


Should we add a list of possible connectors that this FLIP would improve?

Best,
Muhammet


On 2024-04-29 14:08, Ahmed Hamdy wrote:

Hi all,
I would like to start a discussion on FLIP-451 [1].
The proposal comes after encountering a couple of issues while working with
implementers of the Async Sink.
The FLIP mainly proposes a new API similar to AsyncFunction and
ResultFuture as well as introducing timeout handling for AsyncSink 
requests.

The FLIP targets 1.20 with backward compatible changes and we should
proceed by adding support in all supporting connector repos.

1-
https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Refactor+Async+Sink+API
Best Regards
Ahmed Hamdy


Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-05-06 Thread Muhammet Orazov

Thanks for your efforts! +1 (non-binding)

Best,
Muhammet

On 2024-05-06 02:53, yue ma wrote:

Hi everyone,

Thanks for all the feedback, I'd like to start a vote on the FLIP-447:
Upgrade FRocksDB from 6.20.3 to 8.10.0 [1]. The discussion thread is 
here

[2].

The vote will be open for at least 72 hours unless there is an 
objection or

insufficient votes.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
[2] https://lists.apache.org/thread/lrxjfpjjwlq4sjzm1oolx58n1n8r48hw


[jira] [Created] (FLINK-35292) Set dummy savepoint path during last-state upgrade

2024-05-06 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-35292:
--

 Summary: Set dummy savepoint path during last-state upgrade
 Key: FLINK-35292
 URL: https://issues.apache.org/jira/browse/FLINK-35292
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Gyula Fora
Assignee: Gyula Fora


Currently the operator always sets the savepoint path even if last-state (HA 
metadata) must be used. 

This can be misleading to users, as the set savepoint path normally should never 
take effect, and it can actually lead to incorrect state being restored if the HA 
metadata is deleted by the user at the wrong moment. 

To avoid this, we can set an explicit dummy savepoint path, which will prevent 
accidentally restoring from it. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35293) FLIP-445: Support dynamic parallelism inference for HiveSource

2024-05-06 Thread xingbe (Jira)
xingbe created FLINK-35293:
--

 Summary: FLIP-445: Support dynamic parallelism inference for 
HiveSource
 Key: FLINK-35293
 URL: https://issues.apache.org/jira/browse/FLINK-35293
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Affects Versions: 1.20.0
Reporter: xingbe


[FLIP-379|https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs]
 introduces dynamic source parallelism inference, which, compared to static 
inference, utilizes runtime information to more accurately determine the source 
parallelism. The FileSource already possesses the capability for dynamic 
parallelism inference. As a follow-up task to FLIP-379, this FLIP plans to 
implement the dynamic parallelism inference interface for HiveSource, and also 
switch the default from static parallelism inference to dynamic parallelism 
inference.
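
As a rough illustration, a source implementing dynamic parallelism inference might look like the sketch below. The interface and context methods follow the FLIP-379 description; treat the exact signatures as approximate, and the input-size estimation as hypothetical.

```
import org.apache.flink.api.connector.source.DynamicParallelismInference;

/** Sketch: infer source parallelism at runtime from the estimated input size. */
public class HiveLikeDynamicSource implements DynamicParallelismInference {

    @Override
    public int inferParallelism(Context context) {
        // Hypothetical: estimate total input bytes from table/partition metadata.
        long totalBytes = estimateInputSize();
        int parallelism = (int) Math.max(1, totalBytes / context.getDataVolumePerTask());
        // Respect the upper bound handed down by the scheduler.
        return Math.min(parallelism, context.getParallelismInferenceUpperBound());
    }

    private long estimateInputSize() {
        return 10L * 1024 * 1024 * 1024; // placeholder: 10 GiB
    }
}
```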



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-05-06 Thread Yun Tang
+1 (binding)

Best
Yun Tang

From: Muhammet Orazov 
Sent: Monday, May 6, 2024 15:50
To: dev@flink.apache.org 
Cc: yue ma 
Subject: Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

Thanks for your efforts! +1 (non-binding)

Best,
Muhammet

On 2024-05-06 02:53, yue ma wrote:
> Hi everyone,
>
> Thanks for all the feedback, I'd like to start a vote on the FLIP-447:
> Upgrade FRocksDB from 6.20.3 to 8.10.0 [1]. The discussion thread is
> here
> [2].
>
> The vote will be open for at least 72 hours unless there is an
> objection or
> insufficient votes.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> [2] https://lists.apache.org/thread/lrxjfpjjwlq4sjzm1oolx58n1n8r48hw


Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-05-06 Thread Gabor Somogyi
+1 (binding)

This will be a good addition.

G


On Mon, May 6, 2024 at 4:54 AM yue ma  wrote:

> Hi everyone,
>
> Thanks for all the feedback, I'd like to start a vote on the FLIP-447:
> Upgrade FRocksDB from 6.20.3 to 8.10.0 [1]. The discussion thread is here
> [2].
>
> The vote will be open for at least 72 hours unless there is an objection or
> insufficient votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> [2] https://lists.apache.org/thread/lrxjfpjjwlq4sjzm1oolx58n1n8r48hw
>
> --
> Best,
> Yue
>


[jira] [Created] (FLINK-35294) Use source config to check if the filter should be applied in timestamp starting mode

2024-05-06 Thread Xiao Huang (Jira)
Xiao Huang created FLINK-35294:
--

 Summary: Use source config to check if the filter should be 
applied in timestamp starting mode
 Key: FLINK-35294
 URL: https://issues.apache.org/jira/browse/FLINK-35294
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Xiao Huang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-453: Promote Unified Sink API V2 to Public and Deprecate SinkFunction

2024-05-06 Thread Leonard Xu
+1 from my side, thanks Martijn for the effort.

Best,
Leonard

> On May 4, 2024, at 7:41 PM, Ahmed Hamdy wrote:
> 
> Hi Martijn
> Thanks for the proposal +1 from me.
> Should this change take place in 1.20, what are the planned release steps
> for connectors that only offer a deprecated interface in this case (i.e.
> RabbitMQ, Cassandra, pusbub, Hbase)? Are we going to refrain from releases
> that support 1.20+ till the blockers are implemented?
> Best Regards
> Ahmed Hamdy
> 
> 
> On Fri, 3 May 2024 at 14:32, Péter Váry  wrote:
> 
>>> With regards to FLINK-35149, the fix version indicates a change at Flink
>> CDC; is that indeed correct, or does it require a change in the SinkV2
>> interface?
>> 
>> The fix doesn't need a change in SinkV2, so we are good there.
>> The issue is that the new SinkV2 SupportsCommitter/SupportsPreWriteTopology
>> doesn't work with the CDC yet.
>> 
> Martijn Visser wrote (on Fri, May 3, 2024, at 14:06):
>> 
>>> Hi Ferenc,
>>> 
>>> You're right, 1.20 it is :)
>>> 
>>> I've assigned the HBase one to you!
>>> 
>>> Thanks,
>>> 
>>> Martijn
>>> 
>>> On Fri, May 3, 2024 at 1:55 PM Ferenc Csaky 
>>> wrote:
>>> 
 Hi Martijn,
 
 +1 for the proposal.
 
> targeted for Flink 1.19
 
 I guess you meant Flink 1.20 here.
 
Also, I volunteer to take on updating the HBase sink; feel free to assign
that task to me.
 
 Best,
 Ferenc
 
 
 
 
 On Friday, May 3rd, 2024 at 10:20, Martijn Visser <
 martijnvis...@apache.org> wrote:
 
> 
> 
> Hi Peter,
> 
> I'll add it for completeness, thanks!
> With regards to FLINK-35149, the fix version indicates a change at
>>> Flink
> CDC; is that indeed correct, or does it require a change in the
>> SinkV2
> interface?
> 
> Best regards,
> 
> Martijn
> 
> 
> On Fri, May 3, 2024 at 7:47 AM Péter Váry
>> peter.vary.apa...@gmail.com
> 
> wrote:
> 
>> Hi Martijn,
>> 
>> We might want to add FLIP-371 [1] to the list. (Or we aim only for
 higher
>> level FLIPs?)
>> 
>> We are in the process of using the new API in Iceberg connector
>> [2] -
 so
>> far, so good.
>> 
>> I know of one minor known issue about the sink [3], which should be
 ready
>> for the release.
>> 
>> All-in-all, I think we are in good shape, and we could move forward
 with
>> the promotion.
>> 
>> Thanks,
>> Peter
>> 
>> [1] -
>> 
>> 
 
>>> 
>> https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=263430387
>> [2] - https://github.com/apache/iceberg/pull/10179
>> [3] - https://issues.apache.org/jira/browse/FLINK-35149
>> 
>> On Thu, May 2, 2024, 09:47 Muhammet Orazov
 mor+fl...@morazow.com.invalid
>> wrote:
>> 
>>> Got it, thanks!
>>> 
>>> On 2024-05-02 06:53, Martijn Visser wrote:
>>> 
 Hi Muhammet,
 
 Thanks for joining the discussion! The changes in this FLIP
>> would
 be
 targeted for Flink 1.19, since it's only a matter of changing
>> the
 annotation.
 
 Best regards,
 
 Martijn
 
 On Thu, May 2, 2024 at 7:26 AM Muhammet Orazov
 mor+fl...@morazow.com
 wrote:
 
> Hello Martijn,
> 
> Thanks for the FLIP and detailed history of changes, +1.
> 
> Would FLIP changes target for 2.0? I think it would be good
> to have clear APIs on 2.0 release.
> 
> Best,
> Muhammet
> 
> On 2024-05-01 15:30, Martijn Visser wrote:
> 
>> Hi everyone,
>> 
>> I would like to start a discussion on FLIP-453: Promote
 Unified Sink
>> API V2
>> to Public and Deprecate SinkFunction
>> https://cwiki.apache.org/confluence/x/rIobEg
>> 
>> This FLIP proposes to promote the Unified Sink API V2 from
>> PublicEvolving
>> to Public and to mark the SinkFunction as Deprecated.
>> 
>> I'm looking forward to your thoughts.
>> 
>> Best regards,
>> 
>> Martijn
 
>>> 
>> 



[jira] [Created] (FLINK-35295) Improve jdbc connection pool initialization failure message

2024-05-06 Thread Xiao Huang (Jira)
Xiao Huang created FLINK-35295:
--

 Summary: Improve jdbc connection pool initialization failure 
message
 Key: FLINK-35295
 URL: https://issues.apache.org/jira/browse/FLINK-35295
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Xiao Huang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-35296) Flink mysql-cdc connector stops reading data

2024-05-06 Thread Gang Yang (Jira)
Gang Yang created FLINK-35296:
-

 Summary: Flink mysql-cdc connector stops reading data
 Key: FLINK-35296
 URL: https://issues.apache.org/jira/browse/FLINK-35296
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Affects Versions: 3.1.0
Reporter: Gang Yang
 Fix For: 1.18.1
 Attachments: image-2024-05-06-17-42-19-059.png

*Background:*
The job consumes sub-database and sub-table data through regular expressions, 
with scan.startup.mode=initial.

*Phenomenon:*
1. The issue occurs during the snapshot data synchronization phase;
2. After the task runs normally for a period of time, no more data is read, 
even though there is still a lot of data in the upstream MySQL table;
3. When the task is restarted from state, it reads normally for a 
period of time and then stops reading.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-05-06 Thread Roc Marshal
+1 (non-binding)

Best,
Regards.

Yuepeng Pan

 Replied Message 
| From | Rui Fan<1996fan...@gmail.com> |
| Date | 05/06/2024 11:28 |
| To | dev |
| Subject | Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0 |
+1 (binding)

Best,
Rui

On Mon, May 6, 2024 at 11:01 AM Yanfei Lei  wrote:

> +1 (binding)
>
> Best,
> Yanfei
>
> Zakelly Lan wrote on Mon, May 6, 2024 at 11:00:
> >
> > +1 (binding)
> >
> > Thanks for driving this!
> >
> >
> > Best,
> > Zakelly
> >
> > On Mon, May 6, 2024 at 10:54 AM yue ma  wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for all the feedback, I'd like to start a vote on the FLIP-447:
> > > Upgrade FRocksDB from 6.20.3 to 8.10.0 [1]. The discussion thread is
> here
> > > [2].
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection or
> > > insufficient votes.
> > >
> > > [1]
> > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> > > [2] https://lists.apache.org/thread/lrxjfpjjwlq4sjzm1oolx58n1n8r48hw
> > >
> > > --
> > > Best,
> > > Yue
> > >
>


Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-05-06 Thread gongzhongqiang
+1 (non-binding)

Best,
Zhongqiang Gong

yue ma wrote on Mon, May 6, 2024 at 10:54:

> Hi everyone,
>
> Thanks for all the feedback, I'd like to start a vote on the FLIP-447:
> Upgrade FRocksDB from 6.20.3 to 8.10.0 [1]. The discussion thread is here
> [2].
>
> The vote will be open for at least 72 hours unless there is an objection or
> insufficient votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> [2] https://lists.apache.org/thread/lrxjfpjjwlq4sjzm1oolx58n1n8r48hw
>
> --
> Best,
> Yue
>


Re: [DISCUSS] FLIP-443: Interruptible watermark processing

2024-05-06 Thread Piotr Nowojski
Hi Zakelly,

Can you elaborate a bit more on what you have in mind? How does marking mails
as interruptible help? If an incoming async state access response comes, it
could just request to interrupt any currently ongoing computations, regardless
of whether the currently executed mail is interruptible.

Best,
Piotrek

On Mon, 6 May 2024 at 06:33, Zakelly Lan wrote:

> Hi Piotr,
>
> Thanks for the improvement, overall +1 for this. I'd leave a minor comment:
>
> 1. I'd suggest also providing `isInterruptable()` in `Mail`, and the
> continuation mail will return true. The FLIP-425 will leverage this queue
> to execute some state requests, and when the cp arrives, the operator may
> call `yield()` to drain. It may happen that the continuation mail is called
> again in `yield()`. By checking `isInterruptable()`, we can skip this mail
> and re-enqueue.
>
>
> Best,
> Zakelly
>
> On Wed, May 1, 2024 at 4:35 PM Yanfei Lei  wrote:
>
> > Thanks for your answers, Piotrek. I got it now.  +1 for this improvement.
> >
> > Best,
> > Yanfei
> >
> > Stefan Richter wrote on Tue, Apr 30, 2024 at 21:30:
> > >
> > >
> > > Thanks for the improvement proposal, I’m +1 for the change!
> > >
> > > Best,
> > > Stefan
> > >
> > >
> > >
> > > > On 30. Apr 2024, at 15:23, Roman Khachatryan 
> wrote:
> > > >
> > > > Thanks for the proposal, I definitely see the need for this
> > improvement, +1.
> > > >
> > > > Regards,
> > > > Roman
> > > >
> > > >
> > > > On Tue, Apr 30, 2024 at 3:11 PM Piotr Nowojski wrote:
> > > >
> > > >> Hi Yanfei,
> > > >>
> > > >> Thanks for the feedback!
> > > >>
> > > >>> 1. Currently when AbstractStreamOperator or
> AbstractStreamOperatorV2
> > > >>> processes a watermark, the watermark will be sent to downstream, if
> > > >>> the `InternalTimerServiceImpl#advanceWatermark` is interrupted,
> when
> > > >>> is the watermark sent downstream?
> > > >>
> > > >> The watermark would be outputted by an operator only once all
> relevant
> > > >> timers are fired.
> > > >> In other words, if firing of timers is interrupted a continuation
> > mail to
> > > >> continue firing those
> > > >> interrupted timers is created. Watermark will be emitted downstream
> > at the
> > > >> end of that
> > > >> continuation mail.
> > > >>
> > > >>> 2. IIUC, processing-timer's firing is also encapsulated into mail
> and
> > > >>> executed in mailbox. Is processing-timer allowed to be interrupted?
> > > >>
> > > >> Yes, both firing processing and even time timers share the same code
> > and
> > > >> both will
> > > >> support interruptions in the same way. Actually I've renamed the
> FLIP
> > from
> > > >>
> > > >>> Interruptible watermarks processing
> > > >>
> > > >> to:
> > > >>
> > > >>> Interruptible timers firing
> > > >>
> > > >> to make this more clear.
> > > >>
> > > >> Best,
> > > >> Piotrek
> > > >>
> > > >> On Tue, 30 Apr 2024 at 06:08, Yanfei Lei wrote:
> > > >>
> > > >>> Hi Piotrek,
> > > >>>
> > > >>> Thanks for this proposal. It looks like it will shorten the
> > checkpoint
> > > >>> duration, especially in the case of back pressure. +1 for it!  I'd
> > > >>> like to ask some questions to understand your thoughts more
> > precisely.
> > > >>>
> > > >>> 1. Currently when AbstractStreamOperator or
> AbstractStreamOperatorV2
> > > >>> processes a watermark, the watermark will be sent to downstream, if
> > > >>> the `InternalTimerServiceImpl#advanceWatermark` is interrupted,
> when
> > > >>> is the watermark sent downstream?
> > > >>> 2. IIUC, processing-timer's firing is also encapsulated into mail
> and
> > > >>> executed in mailbox. Is processing-timer allowed to be interrupted?
> > > >>>
> > > >>> Best regards,
> > > >>> Yanfei
> > > >>>
> > > >>> Piotr Nowojski wrote on Mon, Apr 29, 2024 at 21:57:
> > > >>>
> > > 
> > >  Hi all,
> > > 
> > >  I would like to start a discussion on FLIP-443: Interruptible
> > watermark
> > >  processing.
> > > 
> > > 
> >
> https://cwiki.apache.org/confluence/x/qgn9EQ
> > > 
> > >  This proposal tries to make Flink's subtask thread more responsive
> > when
> > >  processing watermarks/firing timers, and make those operations
> > >  interruptible/break them apart into smaller steps. At the same
> time,
> > > >> the
> > >  proposed solution could be potentially adopted in other places in
> > the
> > > >>> code
> > >  base as well, to solve similar problems with other flatMap-like
> > > >> operators
> > >  (non windowed joins, aggregations, CepOperator, ...).
> > > 
> > >  I'm looking forward to your thoughts.
> > > 
> > >  Best,
> > >  Piotrek
> > >
> >
>


Re: [DISCUSS] FLIP-443: Interruptible watermark processing

2024-05-06 Thread Zakelly Lan
Hi Piotr,

I'm describing a scenario where things happen in the following order:
1. advance watermark and process timers.
2. the cp arrives and interrupts the timer processing, after this the
continuation mail is in the mailbox queue.
3. `snapshotState` is called, where the async state access responses will
be drained by calling `tryYield()` [1]. —— What if the continuation mail is
triggered by `tryYield()`?

I'm suggesting skipping the continuation mail during draining of async
state access.
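
A minimal sketch of that skipping idea, with a hypothetical `Mail` shape (the spelling `isInterruptable` follows this thread; the real mailbox classes in Flink differ):

```
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical mail shape; continuation mails report themselves as interruptible. */
interface Mail {
    void run() throws Exception;

    default boolean isInterruptable() {
        return false;
    }
}

/** Sketch: drain async state access responses while deferring continuation mails. */
class DrainSketch {
    private final Deque<Mail> mailbox = new ArrayDeque<>();

    void drainAsyncStateAccess() throws Exception {
        Deque<Mail> deferred = new ArrayDeque<>();
        Mail mail;
        while ((mail = mailbox.poll()) != null) {
            if (mail.isInterruptable()) {
                // Do not fire the continuation mail from inside yield()/tryYield();
                // re-enqueue it so it runs only after the drain completes.
                deferred.add(mail);
            } else {
                mail.run();
            }
        }
        mailbox.addAll(deferred);
    }
}
```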


[1]
https://github.com/apache/flink/blob/1904b215e36e4fd48e48ece7ffdf2f1470653130/flink-runtime/src/main/java/org/apache/flink/runtime/asyncprocessing/AsyncExecutionController.java#L305

Best,
Zakelly


On Mon, May 6, 2024 at 6:00 PM Piotr Nowojski  wrote:

> Hi Zakelly,
>
> Can you elaborate a bit more on what you have in mind? How does marking
> mails as interruptible help? If an incoming async state access response
> comes, it could just request to interrupt any currently ongoing
> computations, regardless of whether the currently executed mail is
> interruptible.
>
> Best,
> Piotrek
>
> On Mon, 6 May 2024 at 06:33, Zakelly Lan wrote:
>
> > Hi Piotr,
> >
> > Thanks for the improvement, overall +1 for this. I'd leave a minor
> comment:
> >
> > 1. I'd suggest also providing `isInterruptable()` in `Mail`, and the
> > continuation mail will return true. The FLIP-425 will leverage this queue
> > to execute some state requests, and when the cp arrives, the operator may
> > call `yield()` to drain. It may happen that the continuation mail is
> called
> > again in `yield()`. By checking `isInterruptable()`, we can skip this
> mail
> > and re-enqueue.
> >
> >
> > Best,
> > Zakelly
> >
> > On Wed, May 1, 2024 at 4:35 PM Yanfei Lei  wrote:
> >
> > > Thanks for your answers, Piotrek. I got it now.  +1 for this
> improvement.
> > >
> > > Best,
> > > Yanfei
> > >
> > > Stefan Richter wrote on Tue, Apr 30, 2024 at 21:30:
> > > >
> > > >
> > > > Thanks for the improvement proposal, I’m +1 for the change!
> > > >
> > > > Best,
> > > > Stefan
> > > >
> > > >
> > > >
> > > > > On 30. Apr 2024, at 15:23, Roman Khachatryan 
> > wrote:
> > > > >
> > > > > Thanks for the proposal, I definitely see the need for this
> > > improvement, +1.
> > > > >
> > > > > Regards,
> > > > > Roman
> > > > >
> > > > >
> > > > > On Tue, Apr 30, 2024 at 3:11 PM Piotr Nowojski <pnowoj...@apache.org> wrote:
> > > > >
> > > > >> Hi Yanfei,
> > > > >>
> > > > >> Thanks for the feedback!
> > > > >>
> > > > >>> 1. Currently when AbstractStreamOperator or
> > AbstractStreamOperatorV2
> > > > >>> processes a watermark, the watermark will be sent to downstream,
> if
> > > > >>> the `InternalTimerServiceImpl#advanceWatermark` is interrupted,
> > when
> > > > >>> is the watermark sent downstream?
> > > > >>
> > > > >> The watermark would be outputted by an operator only once all
> > relevant
> > > > >> timers are fired.
> > > > >> In other words, if firing of timers is interrupted a continuation
> > > mail to
> > > > >> continue firing those
> > > > >> interrupted timers is created. Watermark will be emitted
> downstream
> > > at the
> > > > >> end of that
> > > > >> continuation mail.
> > > > >>
> > > > >>> 2. IIUC, processing-timer's firing is also encapsulated into mail
> > and
> > > > >>> executed in mailbox. Is processing-timer allowed to be
> interrupted?
> > > > >>
> > > > >> Yes, both firing processing and even time timers share the same
> code
> > > and
> > > > >> both will
> > > > >> support interruptions in the same way. Actually I've renamed the
> > FLIP
> > > from
> > > > >>
> > > > >>> Interruptible watermarks processing
> > > > >>
> > > > >> to:
> > > > >>
> > > > >>> Interruptible timers firing
> > > > >>
> > > > >> to make this more clear.
> > > > >>
> > > > >> Best,
> > > > >> Piotrek
> > > > >>
> > > > >> On Tue, 30 Apr 2024 at 06:08, Yanfei Lei wrote:
> > > > >>
> > > > >>> Hi Piotrek,
> > > > >>>
> > > > >>> Thanks for this proposal. It looks like it will shorten the
> > > checkpoint
> > > > >>> duration, especially in the case of back pressure. +1 for it!
> I'd
> > > > >>> like to ask some questions to understand your thoughts more
> > > precisely.
> > > > >>>
> > > > >>> 1. Currently when AbstractStreamOperator or
> > AbstractStreamOperatorV2
> > > > >>> processes a watermark, the watermark will be sent to downstream,
> if
> > > > >>> the `InternalTimerServiceImpl#advanceWatermark` is interrupted,
> > when
> > > > >>> is the watermark sent downstream?
> > > > >>> 2. IIUC, processing-timer's firing is also encapsulated into mail
> > and
> > > > >>> executed in mailbox. Is processing-timer allowed to be
> interrupted?
> > > > >>>
> > > > >>> Best regards,
> > > > >>> Yanfei
> > > > >>>
> > > > >>> Piotr Nowojski wrote on Mon, Apr 29, 2024 at 21:57:
> > > > >>>
> > > > 
> > > >  Hi all,
> > > > 
> > > >  I would like to start a discussion on FLIP-443: Interruptible
> > > watermark
> > > >  processing.

[jira] [Created] (FLINK-35297) Add validation for option connect.timeout

2024-05-06 Thread Xiao Huang (Jira)
Xiao Huang created FLINK-35297:
--

 Summary: Add validation for option connect.timeout
 Key: FLINK-35297
 URL: https://issues.apache.org/jira/browse/FLINK-35297
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Xiao Huang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-05-06 Thread Roman Khachatryan
+1 (binding)

Regards,
Roman


On Mon, May 6, 2024 at 11:56 AM gongzhongqiang 
wrote:

> +1 (non-binding)
>
> Best,
> Zhongqiang Gong
>
> yue ma wrote on Mon, May 6, 2024 at 10:54:
>
> > Hi everyone,
> >
> > Thanks for all the feedback, I'd like to start a vote on the FLIP-447:
> > Upgrade FRocksDB from 6.20.3 to 8.10.0 [1]. The discussion thread is here
> > [2].
> >
> > The vote will be open for at least 72 hours unless there is an objection
> or
> > insufficient votes.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> > [2] https://lists.apache.org/thread/lrxjfpjjwlq4sjzm1oolx58n1n8r48hw
> >
> > --
> > Best,
> > Yue
> >
>


Re: [DISCUSS] FLIP-444: Native file copy support

2024-05-06 Thread Piotr Nowojski
Hi All!

Thanks for your comments.

Muhammet and Hong, about the config options.

> Could you please also add the configuration property for this? An example
showing how users would set this parameter would be helpful.

> 1/ Configure the implementation of PathsCopyingFileSystem used
> 2/ Configure the location of the s5cmd binary (version control etc.)

Oops, sorry, I have added the config options that I had in mind to the FLIP. I
don't know why I had omitted them. Basically I suggest that in order to
use native file copying:
1. `FileSystem` must support it via implementing `PathsCopyingFileSystem`
interface
2. That `FileSystem` would have to be configured to actually use it. For
example S3 file system would return `true` that it can copy paths
only if `s3.s5cmd.path` has been specified.

> Would this affect any filesystem connectors that use FileSystem[1][2]
dependencies?

Definitely not out of the box. Any place in Flink that is currently
uploading/downloading files from a FileSystem could use this feature, but it
would have to be implemented. For example, this FLIP will implement native
file copying when downloading state during recovery,
but the old code path will still be used for uploading state files during a
checkpoint.

> How adding a s5cmd will affect memory footprint? Since this is a native
binary, memory consumption will not be controlled by JVM or Flink.

As you mentioned, the memory usage of `s5cmd` will not be controlled, so the
memory footprint will grow. The s5cmd integration with Flink
has been tested quite extensively in our production environment already,
and we haven't observed any issues so far, despite the fact that we
are using quite small pods. But of course, if your setup is working on the
edge of OOM, this could tip you over that edge.

Zakelly:

> 1. What is the semantic of `canCopyPath`? Should it be associated with a
> specific destination path? e.g. It can be copied to local, but not to the
> remote FS.

For the S3 (both for SDKv2 and s5cmd implementations), the copying
direction (upload/download) doesn't matter. I don't know about other
file systems, I haven't investigated anything besides S3. Nevertheless I
wouldn't worry too much about it, since we can start with the simple
`canCopyPath` that handles both directions. If this will become important
in the future, adding directional `canDownloadPath` or `canUploadPath`
would be a backward compatible change, so we can safely extend it in the
future if needed.

> 2. Is the existing interface `DuplicatingFileSystem` feasible/enough for
this case?

Good question. The intention and use case behind `DuplicatingFileSystem` is
different. It marks if `FileSystem` can quickly copy/duplicate files
in the remote `FileSystem`. For example an equivalent of a hard link or
bumping a reference count in the remote system. That's a bit different
to copy paths between remote and local file systems.

However, it could arguably be unified under one interface where we would
re-use or re-name `canFastDuplicate(Path, Path)` to
`canFastCopy(Path, Path)` with the following use cases:
- `canFastCopy(remoteA, remoteB)` returns true - current equivalent of
`DuplicatingFileSystem` - quickly duplicate/hard link remote path
- `canFastCopy(local, remote)` returns true - FS can natively upload local
file to a remote location
- `canFastCopy(remote, local)` returns true - FS can natively download
local file from a remote location

Maybe indeed that's a better solution vs having two separate interfaces for
copying and duplicating?
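
As a sketch of that unified direction (names follow this email, not a final FLIP-444 API):

```
import java.util.List;

import org.apache.flink.core.fs.Path;

/** Sketch of a unified fast-copy capability; not the final FLIP-444 interface. */
public interface PathsCopyingFileSystem {

    /** One copy request: remote->remote (duplicate), local->remote, or remote->local. */
    final class CopyRequest {
        public final Path source;
        public final Path destination;

        public CopyRequest(Path source, Path destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    /**
     * True if the file system can natively copy between the given paths, e.g. via
     * an s5cmd binary configured through a setting like s3.s5cmd.path, or via a
     * hard-link-like duplication for remote-to-remote copies.
     */
    boolean canFastCopy(Path source, Path destination);

    /** Copy all requested paths, e.g. by spawning a single s5cmd process. */
    void copyFiles(List<CopyRequest> requests) throws Exception;
}
```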

> 3. Will the interface extracting introduce a break change?

No. The signature of the existing abstract `FileSystem` class would remain
the same. Only most/all of the abstract methods would be
pulled out to the interface and abstract `FileSystem` would implement that
new interface.

Best,
Piotrek

On Mon, 6 May 2024 at 04:55, Zakelly Lan wrote:

> Hi Piotr,
>
> Thanks for the proposal. It's meaningful to speed up the state download. I
> get into some questions:
>
> 1. What is the semantic of `canCopyPath`? Should it be associated with a
> specific destination path? e.g. It can be copied to local, but not to the
> remote FS.
> 2. Is the existing interface `DuplicatingFileSystem` feasible/enough for
> this case?
> 3. Will the interface extracting introduce a break change?
>
>
> Best,
> Zakelly
>
>
> On Thu, May 2, 2024 at 6:50 PM Aleksandr Pilipenko 
> wrote:
>
> > Hi Piotr,
> >
> > Thanks for the proposal.
> > How adding a s5cmd will affect memory footprint? Since this is a native
> > binary, memory consumption will not be controlled by JVM or Flink.
> >
> > Thanks,
> > Aleksandr
> >
> > On Thu, 2 May 2024 at 11:12, Hong Liang  wrote:
> >
> > > Hi Piotr,
> > >
> > > Thanks for the FLIP! Nice to see work to improve the filesystem
> > > performance. +1 to future work to improve the upload speed as well.
> This
> > > would be useful for jobs with large state and high Async checkpointing
> > > times.
> > >
> > > Some thoughts on the configuration, i

Re: [DISCUSS] FLIP-444: Native file copy support

2024-05-06 Thread Zakelly Lan
Hi Piotrek,

Thanks for your answers!

Good question. The intention and use case behind `DuplicatingFileSystem` is
> different. It marks if `FileSystem` can quickly copy/duplicate files
> in the remote `FileSystem`. For example an equivalent of a hard link or
> bumping a reference count in the remote system. That's a bit different
> to copy paths between remote and local file systems.
>
> However, it could arguably be unified under one interface where we would
> re-use or re-name `canFastDuplicate(Path, Path)` to
> `canFastCopy(Path, Path)` with the following use cases:
> - `canFastCopy(remoteA, remoteB)` returns true - current equivalent of
> `DuplicatingFileSystem` - quickly duplicate/hard link remote path
> - `canFastCopy(local, remote)` returns true - FS can natively upload local
> file to a remote location
> - `canFastCopy(remote, local)` returns true - FS can natively download
> local file from a remote location
>
> Maybe indeed that's a better solution vs having two separate interfaces for
> copying and duplicating?
>

I'd prefer one unified interface; `canFastCopy(Path, Path)` looks good to
me. This also resolves my question 1 about the destination.


Best,
Zakelly

On Mon, May 6, 2024 at 6:36 PM Piotr Nowojski  wrote:

> Hi All!
>
> Thanks for your comments.
>
> Muhammet and Hong, about the config options.
>
> > Could you please also add the configuration property for this? An example
> showing how users would set this parameter would be helpful.
>
> > 1/ Configure the implementation of PathsCopyingFileSystem used
> > 2/ Configure the location of the s5cmd binary (version control etc.)
>
> Ops, sorry I added the config options that I had in mind to the FLIP. I
> don't know why I have omitted this. Basically I suggest that in order to
> use native file copying:
> 1. `FileSystem` must support it via implementing `PathsCopyingFileSystem`
> interface
> 2. That `FileSystem` would have to be configured to actually use it. For
> example S3 file system would return `true` that it can copy paths
> only if `s3.s5cmd.path` has been specified.
>
> > Would this affect any filesystem connectors that use FileSystem[1][2]
> dependencies?
>
> Definitely not out of the box. Any place in Flink that is currently
> uploading/downloading files from a FileSystem could use this feature, but
> it
> would have to be implemented. The same way this FLIP will implement native
> files copying when downloading state during recovery,
> but the old code path will be still used for uploading state files during a
> checkpoint.
>
> > How adding a s5cmd will affect memory footprint? Since this is a native
> binary, memory consumption will not be controlled by JVM or Flink.
>
> As you mentioned the memory usage of `s5cmd` will not be controlled, so the
> memory footprint will grow. S5cmd integration with Flink
> has been tested quite extensively on our production environment already,
> and we haven't observed any issues so far despite the fact we
> are using quite small pods. But of course if your setup is working on the
> edge of OOM, this could tip you over that edge.
>
> Zakelly:
>
> > 1. What is the semantic of `canCopyPath`? Should it be associated with a
> > specific destination path? e.g. It can be copied to local, but not to the
> > remote FS.
>
> For the S3 (both for SDKv2 and s5cmd implementations), the copying
> direction (upload/download) doesn't matter. I don't know about other
> file systems, I haven't investigated anything besides S3. Nevertheless I
> wouldn't worry too much about it, since we can start with the simple
> `canCopyPath` that handles both directions. If this will become important
> in the future, adding directional `canDownloadPath` or `canUploadPath`
> would be a backward compatible change, so we can safely extend it in the
> future if needed.
>
> > 2. Is the existing interface `DuplicatingFileSystem` feasible/enough for
> this case?
>
> Good question. The intention and use case behind `DuplicatingFileSystem` is
> different. It marks if `FileSystem` can quickly copy/duplicate files
> in the remote `FileSystem`. For example an equivalent of a hard link or
> bumping a reference count in the remote system. That's a bit different
> to copy paths between remote and local file systems.
>
> However, it could arguably be unified under one interface where we would
> re-use or re-name `canFastDuplicate(Path, Path)` to
> `canFastCopy(Path, Path)` with the following use cases:
> - `canFastCopy(remoteA, remoteB)` returns true - current equivalent of
> `DuplicatingFileSystem` - quickly duplicate/hard link remote path
> - `canFastCopy(local, remote)` returns true - FS can natively upload local
> file to a remote location
> - `canFastCopy(remote, local)` returns true - FS can natively download
> local file from a remote location
>
> Maybe indeed that's a better solution vs having two separate interfaces for
> copying and duplicating?
>
> > 3. Will the interface extracting introduce a break change?
>
> No. The signature of the existing abstract `FileSystem` class would remain the same.

[jira] [Created] (FLINK-35298) Improve metric reporter logic

2024-05-06 Thread Xiao Huang (Jira)
Xiao Huang created FLINK-35298:
--

 Summary: Improve metric reporter logic
 Key: FLINK-35298
 URL: https://issues.apache.org/jira/browse/FLINK-35298
 Project: Flink
  Issue Type: Improvement
Reporter: Xiao Huang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-05-06 Thread Piotr Nowojski
+1 (binding)

Piotrek

On Mon, 6 May 2024 at 12:35, Roman Khachatryan wrote:

> +1 (binding)
>
> Regards,
> Roman
>
>
> On Mon, May 6, 2024 at 11:56 AM gongzhongqiang 
> wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Zhongqiang Gong
> >
> > yue ma wrote on Mon, May 6, 2024 at 10:54:
> >
> > > Hi everyone,
> > >
> > > Thanks for all the feedback, I'd like to start a vote on the FLIP-447:
> > > Upgrade FRocksDB from 6.20.3 to 8.10.0 [1]. The discussion thread is
> here
> > > [2].
> > >
> > > The vote will be open for at least 72 hours unless there is an
> objection
> > or
> > > insufficient votes.
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> > > [2] https://lists.apache.org/thread/lrxjfpjjwlq4sjzm1oolx58n1n8r48hw
> > >
> > > --
> > > Best,
> > > Yue
> > >
> >
>


Re:[DISCUSS] FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-05-06 Thread Xuyang
Hi, Ron.

Thanks for driving this. After reading the entire FLIP, I have the following 
questions:

1. In the sequence diagram, it appears that there is a missing step for 
obtaining the refresh handler from the catalog during the suspend operation.

2. The term "cascade refresh" does not seem to be mentioned in FLIP-435. The 
workflow it creates is marked as a "one-time workflow". This is different 
from a "periodic workflow," and it appears to be a one-off execution. Is this 
actually referring to the Refresh command in FLIP-435?

3. The workflow-scheduler.type has no default value; should it be set to CRON 
by default?

4. It appears that in the section on `public interfaces`, within 
`WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to 
`CreateWorkflowOperation`, right?

--

Best!
Xuyang





At 2024-04-22 14:41:39, "Ron Liu"  wrote:
>Hi, Dev
>
>I would like to start a discussion about FLIP-448: Introduce Pluggable
>Workflow Scheduler Interface for Materialized Table.
>
>In FLIP-435[1], we proposed Materialized Table, which has two types of data
>refresh modes: Full Refresh & Continuous Refresh Mode. In Full Refresh
>mode, the Materialized Table relies on a workflow scheduler to perform
>periodic refresh operation to achieve the desired data freshness.
>
>There are numerous open-source workflow schedulers available, with popular
>ones including Airflow and DolphinScheduler. To enable Materialized Table
>to work with different workflow schedulers, we propose a pluggable workflow
>scheduler interface for Materialized Table in this FLIP.
>
>For more details, see FLIP-448 [2]. Looking forward to your feedback.
>
>[1] https://lists.apache.org/thread/c1gnn3bvbfs8v1trlf975t327s4rsffs
>[2]
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-448%3A+Introduce+Pluggable+Workflow+Scheduler+Interface+for+Materialized+Table
>
>Best,
>Ron
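
For readers skimming the thread, here is a hypothetical sketch of what such a pluggable SPI could look like, based only on the terms used here (periodic workflow, refresh handler, suspend/resume); see FLIP-448 for the actual interfaces:

```
/** Hypothetical pluggable scheduler SPI; the real FLIP-448 interfaces may differ. */
interface WorkflowScheduler<T extends RefreshHandler> {

    /** Create a periodic refresh workflow (e.g. from a cron expression) and return its handle. */
    T createRefreshWorkflow(CreateRefreshWorkflow createRequest) throws Exception;

    /** Suspend the workflow identified by the refresh handler stored in the catalog. */
    void suspendRefreshWorkflow(T refreshHandler) throws Exception;

    /** Resume a previously suspended workflow. */
    void resumeRefreshWorkflow(T refreshHandler) throws Exception;
}

/** Opaque handle persisted in the catalog so the workflow can be located later. */
interface RefreshHandler {
    String asSummaryString();
}

/** Marker for workflow creation requests, e.g. a periodic (cron-based) refresh workflow. */
interface CreateRefreshWorkflow {}
```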


Re: [DISCUSS] FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-05-06 Thread Ron Liu
Hi, Xuyang

Thanks for joining this discussion

> 1. In the sequence diagram, it appears that there is a missing step for
obtaining the refresh handler from the catalog during the suspend operation.

Good catch

> 2. The term "cascade refresh" does not seem to be mentioned in FLIP-435.
The workflow it creates is marked as a "one-time workflow". This is
different

from a "periodic workflow," and it appears to be a one-off execution. Is
this actually referring to the Refresh command in FLIP-435?

Cascade refresh is future work; we didn't propose the corresponding
syntax in FLIP-435. However, intuitively, it would be an extension of the
Refresh command in FLIP-435.

> 3. The workflow-scheduler.type has no default value; should it be set to
CRON by default?

Firstly, CRON is not a workflow scheduler. Secondly, I believe that
configuring the Scheduler should be an action that users are aware of, and
default values should not be set.

> 4. It appears that in the section on `public interfaces`, within
`WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to

`CreateWorkflowOperation`, right?

Sorry, I don't get your point. Could you describe it in more detail?

Best,
Ron

Xuyang wrote on Mon, May 6, 2024 at 20:26:

> Hi, Ron.
>
> Thanks for driving this. After reading the entire flip, I have the
> following questions:
>
>
>
>
> 1. In the sequence diagram, it appears that there is a missing step for
> obtaining the refresh handler from the catalog during the suspend operation.
>
>
>
>
> 2. The term "cascade refresh" does not seem to be mentioned in FLIP-435.
> The workflow it creates is marked as a "one-time workflow". This is
> different
>
> from a "periodic workflow," and it appears to be a one-off execution. Is
> this actually referring to the Refresh command in FLIP-435?
>
>
>
>
> 3. The workflow-scheduler.type has no default value; should it be set to
> CRON by default?
>
>
>
>
> 4. It appears that in the section on `public interfaces`, within
> `WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to
>
> `CreateWorkflowOperation`, right?
>
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
>
> At 2024-04-22 14:41:39, "Ron Liu"  wrote:
> >Hi, Dev
> >
> >I would like to start a discussion about FLIP-448: Introduce Pluggable
> >Workflow Scheduler Interface for Materialized Table.
> >
> >In FLIP-435[1], we proposed Materialized Table, which has two types of
> data
> >refresh modes: Full Refresh & Continuous Refresh Mode. In Full Refresh
> >mode, the Materialized Table relies on a workflow scheduler to perform
> >periodic refresh operation to achieve the desired data freshness.
> >
> >There are numerous open-source workflow schedulers available, with popular
> >ones including Airflow and DolphinScheduler. To enable Materialized Table
> >to work with different workflow schedulers, we propose a pluggable
> workflow
> >scheduler interface for Materialized Table in this FLIP.
> >
> >For more details, see FLIP-448 [2]. Looking forward to your feedback.
> >
> >[1] https://lists.apache.org/thread/c1gnn3bvbfs8v1trlf975t327s4rsffs
> >[2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-448%3A+Introduce+Pluggable+Workflow+Scheduler+Interface+for+Materialized+Table
> >
> >Best,
> >Ron
>


Re: [DISCUSS] FLIP-451: Refactor Async sink API

2024-05-06 Thread Ahmed Hamdy
Hi Muhammet,
Thanks for the feedback.

> Could you please add more here why it is harder? Would the
> `completeExceptionally`
> method be related to it? Maybe you can add usage example for it also.
>

this is mainly due to the current implementation of fatal exception
failures, which depends on the base `getFatalExceptionConsumer` method that is
decoupled from the actually called method `submitRequestEntries`. Since this
is now not the primary concern of the FLIP, I have removed it from the
motivation so that the scope is defined around introducing the timeout
configuration.
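
To make the contrast concrete, here is a simplified before/after sketch. The old signature approximates the current `AsyncSinkWriter`; the `ResultHandler` shape is an illustration of the FLIP's direction, not its final API:

```
import java.util.List;
import java.util.function.Consumer;

/** Simplified current shape: retries flow through a Consumer, fatal errors elsewhere. */
abstract class OldStyleAsyncWriter<RequestEntryT> {
    protected abstract void submitRequestEntries(
            List<RequestEntryT> requestEntries, Consumer<List<RequestEntryT>> requestToRetry);

    // Fatal failures are reported through a separate consumer, decoupled from the call above.
    protected Consumer<Exception> getFatalExceptionCons() {
        return e -> { /* fail the job */ };
    }
}

/**
 * Illustrative handler, analogous to ResultFuture in AsyncFunction: one object
 * carries success, retry, and fatal failure for the submitted batch.
 */
interface ResultHandler<RequestEntryT> {
    void complete();
    void completeExceptionally(Exception e);
    void retryForEntries(List<RequestEntryT> requestEntriesToRetry);
}

abstract class NewStyleAsyncWriter<RequestEntryT> {
    protected abstract void submitRequestEntries(
            List<RequestEntryT> requestEntries, ResultHandler<RequestEntryT> resultHandler);
}
```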

> Should we add a list of possible connectors that this FLIP would improve?

Good call, I have added it under the migration plan.

Best Regards
Ahmed Hamdy


On Mon, 6 May 2024 at 08:49, Muhammet Orazov  wrote:

> Hey Ahmed,
>
> Thanks for the FLIP! +1 (non-binding)
>
> > Additionally the current interface for passing fatal exceptions and
> > retrying records relies on java consumers which makes it harder to
> > understand.
>
> Could you please add more here why it is harder? Would the
> `completeExceptionally`
> method be related to it? Maybe you can add usage example for it also.
>
> > we should proceed by adding support in all supporting connector repos.
>
> Should we add list of possible connectors that this FLIP would improve?
>
> Best,
> Muhammet
>
>
> On 2024-04-29 14:08, Ahmed Hamdy wrote:
> > Hi all,
> > I would like to start a discussion on FLIP-451[1]
> > The proposal comes on encountering a couple of issues while working
> > with
> > implementers for Async Sink.
> > The FLIP mainly proposes a new API similar to AsyncFunction and
> > ResultFuture as well as introducing timeout handling for AsyncSink
> > requests.
> > The FLIP targets 1.20 with backward compatible changes and we should
> > proceed by adding support in all supporting connector repos.
> >
> > 1-
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-451%3A+Refactor+Async+Sink+API
> > Best Regards
> > Ahmed Hamdy
>


[jira] [Created] (FLINK-35299) FlinkKinesisConsumer does not respect StreamInitialPosition for new Kinesis Stream when restoring from snapshot

2024-05-06 Thread Hong Liang Teoh (Jira)
Hong Liang Teoh created FLINK-35299:
---

 Summary: FlinkKinesisConsumer does not respect 
StreamInitialPosition for new Kinesis Stream when restoring from snapshot
 Key: FLINK-35299
 URL: https://issues.apache.org/jira/browse/FLINK-35299
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: aws-connector-4.2.0
Reporter: Hong Liang Teoh
 Fix For: aws-connector-4.4.0


h3. What

The FlinkKinesisConsumer allows users to read from [multiple Kinesis 
Streams|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L224].
 

Users can also specify a STREAM_INITIAL_POSITION, which configures if the 
consumer starts reading the stream from TRIM_HORIZON / LATEST / AT_TIMESTAMP.

When restoring the Kinesis Consumer from an existing snapshot, users can 
configure the consumer to read from additional Kinesis Streams. The expected 
behavior would be for the FlinkKinesisConsumer to start reading from the 
additional Kinesis Streams respecting the STREAM_INITIAL_POSITION 
configuration. However, we find that it currently reads from TRIM_HORIZON.

This is surprising behavior and should be corrected.
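
A minimal sketch of the setup that triggers this, using the public Kinesis consumer API (stream names and region are placeholders):

```
import java.util.Arrays;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class InitialPositionRepro {
    public static FlinkKinesisConsumer<String> buildConsumer() {
        Properties consumerConfig = new Properties();
        consumerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1");
        // Expected to apply to any stream with no offsets in the restored snapshot:
        consumerConfig.put(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST");

        // After restoring from a snapshot taken before "newly-added-stream" existed,
        // the new stream is read from TRIM_HORIZON instead of LATEST (the bug above).
        return new FlinkKinesisConsumer<>(
                Arrays.asList("existing-stream", "newly-added-stream"),
                new SimpleStringSchema(),
                consumerConfig);
    }
}
```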
h3. Why

Principle of Least Astonishment
h3. How

We recommend that we reconstruct the previously seen streams by iterating 
through the [sequenceNumsStateForCheckpoint in 
FlinkKinesisConsumer#initializeState()|https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java#L454].

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSSION] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-06 Thread João Boto
Hi Muhammet,

Have you had a chance to review the recently merged pull request [1]? 
We've introduced a new feature allowing users to include ad hoc configurations 
in the 'JdbcConnectionOptions' class:
```
JdbcConnectionOptions options =
        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                .withUrl(FakeDBUtils.TEST_DB_URL)
                .withProperty("keyA", "valueA")
                .build();
```

This provides flexibility by enabling users to specify additional configuration 
parameters dynamically. 

[1] https://github.com/apache/flink-connector-jdbc/pull/115/files

Best

On 2024/05/06 07:34:06 Muhammet Orazov wrote:
> Morning João,
> 
> Recently we had a case where the JDBC driver's authentication was 
> different from username & password authentication. For it to work, certain 
> hacks were required; an interface there would have been helpful.
> 
> But I agree that the interface module separation is maybe not required at the 
> moment.
> 
> Thanks for your efforts!
> 
> Best,
> Muhammet
> 
> 
> On 2024-05-03 12:25, João Boto wrote:
> > Hi Muhammet,
> > 
> > While I generally agree, given our current usage, I'm struggling to 
> > discern any clear advantage. We already have abstract implementations 
> > that cover all necessary interfaces and offer essential functionality, 
> > complemented by a robust set of reusable tests to streamline 
> > implementation.
> > 
> > With this established infrastructure in place, coupled with the added 
> > import overhead of introducing another module, I find it difficult to 
> > identify any distinct benefits at this point.
> > 
> > Best
> > 
> > On 2024/04/26 02:18:52 Muhammet Orazov wrote:
> >> Hey João,
> >> 
> >> Thanks for FLIP proposal!
> >> 
> >> Since proposal is to introduce modules, would it make sense
> >> to have another module for APIs (flink-jdbc-connector-api)?
> >> 
> >> For this I would suggest to move all public interfaces (e.g,
> >> JdbcRowConverter, JdbcConnectionProvider). And even convert
> >> some classes into interface with their default implementations,
> >> for example, JdbcSink, JdbcConnectionOptions.
> >> 
> >> This way users would have clear interfaces to build their own
> >> JDBC based Flink connectors.
> >> 
> >> Here I am not suggesting to introduce new interfaces, only
> >> suggest also to separate the API from the core implementation.
> >> 
> >> What do you think?
> >> 
> >> Best,
> >> Muhammet
> >> 
> >> 
> >> On 2024-04-25 08:54, Joao Boto wrote:
> >> > Hi all,
> >> >
> >> > I'd like to start a discussion on FLIP-449: Reorganization of
> >> > flink-connector-jdbc [1].
> >> > As Flink continues to evolve, we've noticed an increasing level of
> >> > complexity within the JDBC connector.
> >> > The proposed solution is to address this complexity by separating the
> >> > core
> >> > functionality from individual database components, thereby streamlining
> >> > the
> >> > structure into distinct modules.
> >> >
> >> > Looking forward to your feedback and suggestions, thanks.
> >> > Best regards,
> >> > Joao Boto
> >> >
> >> > [1]
> >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
> >> 
> 


Re: [DISCUSSION] FLIP-449: Reorganization of flink-connector-jdbc

2024-05-06 Thread João Boto
Hi Jeyhun,

> > I would also ask to include a sample usage and changes for end-users in the
> > FLIP.

We want to ensure a seamless transition for end-users, minimizing any 
disruptions in their current usage of the connector. To achieve this, we will 
uphold consistency by maintaining the same interfaces (though deprecated) 
within the existing package.

> > Also, in order to ensure the backwards compatibility, do you think at some
> > point we might need to decouple interface and implementations and put only
> > interfaces in flink-connector-jdbc module?

We propose that flink-connector-jdbc be solely packaged as a shaded jar. This 
would let us deprecate the existing interfaces while introducing their 
replacements in the new package, but that is all.
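
To illustrate the deprecation path, here is a minimal sketch of what such a 
bridge could look like (package names are assumptions for illustration, not 
the FLIP's final layout):

```
// Hypothetical bridge kept in the old package purely for backward
// compatibility; new code would depend on the relocated interface.
package org.apache.flink.connector.jdbc;

/** Deprecated alias of the interface moved to the core module. */
@Deprecated
public interface JdbcRowConverter
        extends org.apache.flink.connector.jdbc.core.JdbcRowConverter {}
```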

Best

On 2024/05/03 14:08:28 Jeyhun Karimov wrote:
> Hi Boto,
> 
> Thanks for driving this FLIP. +1 for it.
> 
> I would also ask to include a sample usage and changes for end-users in the
> FLIP.
> 
> flink-connector-jdbc: The current module, which will be transformed to
> > shade all other modules and maintain backward compatibility.
> 
> 
> Also, in order to ensure the backwards compatibility, do you think at some
> point we might need to decouple interface and implementations and put only
> interfaces in flink-connector-jdbc module?
> 
> Regards,
> Jeyhun
> 
> On Fri, May 3, 2024 at 2:56 PM João Boto  wrote:
> 
> > Hi,
> >
> > > > You can now update the derby implementation and the core independently
> > and decide at your own will when to include the new derby in the core?
> > Not really, we are talking about creating modules in the same repository,
> > not about externalizing the database modules. That is, whenever there is a
> > release, both the core and the DBs will be released at the same time.
> >
> > > > For clarity of motivation, could you please add some concrete examples
> > (just a couple) to the FLIP to clarify when this really comes in handy?
> > Added.
> >
> > Best
> >
> > On 2024/04/26 07:59:30 lorenzo.affe...@ververica.com.INVALID wrote:
> > > Hello Joao,
> > > thank your for your proposal, modularity is always welcome :)
> > >
> > > > To maintain clarity and minimize conflicts, we're currently leaning
> > towards maintaining the existing structure, where
> > flink-connector-jdbc-${version}.jar remains shaded for simplicity,
> > encompassing the core functionality and all database-related features
> > within the same JAR.
> > >
> > > I do agree with this approach as the usecase of reading/writing to
> > different DBs could be quite common.
> > >
> > > However, I am missing what would be the concrete advantage in this
> > change for connector maintainability.
> > > I make an example:
> > > You can now update the derby implementation and the core independently
> > and decide at your own will when to include the new derby in the core?
> > >
> > > For clarity of motivation, could you please add some concrete examples
> > (just a couple) to the FLIP to clarify when this really comes in handy?
> > >
> > > Thank you!
> > > On Apr 26, 2024 at 04:19 +0200, Muhammet Orazov
> > , wrote:
> > > > Hey João,
> > > >
> > > > Thanks for FLIP proposal!
> > > >
> > > > Since proposal is to introduce modules, would it make sense
> > > > to have another module for APIs (flink-jdbc-connector-api)?
> > > >
> > > > For this I would suggest to move all public interfaces (e.g,
> > > > JdbcRowConverter, JdbcConnectionProvider). And even convert
> > > > some classes into interface with their default implementations,
> > > > for example, JdbcSink, JdbcConnectionOptions.
> > > >
> > > > This way users would have clear interfaces to build their own
> > > > JDBC based Flink connectors.
> > > >
> > > > Here I am not suggesting to introduce new interfaces, only
> > > > suggest also to separate the API from the core implementation.
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Muhammet
> > > >
> > > >
> > > > On 2024-04-25 08:54, Joao Boto wrote:
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a discussion on FLIP-449: Reorganization of
> > > > > flink-connector-jdbc [1].
> > > > > As Flink continues to evolve, we've noticed an increasing level of
> > > > > complexity within the JDBC connector.
> > > > > The proposed solution is to address this complexity by separating the
> > > > > core
> > > > > functionality from individual database components, thereby
> > streamlining
> > > > > the
> > > > > structure into distinct modules.
> > > > >
> > > > > Looking forward to your feedback and suggestions, thanks.
> > > > > Best regards,
> > > > > Joao Boto
> > > > >
> > > > > [1]
> > > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-449%3A+Reorganization+of+flink-connector-jdbc
> > >
> >
> 


Re: [Discuss] FLIP-452: Allow Skipping Invocation of Function Calls While Constant-folding

2024-05-06 Thread Timo Walther

Hi Alan,

thanks for the update. From my side, the FLIP seems good for voting.

Since it only touches a small API surface, I guess the proposal is not 
very controversial.


Feel free to start a vote thread by tomorrow, if there are no objections.

Thanks,
Timo


On 02.05.24 09:45, Muhammet Orazov wrote:

Hey Alan,

Thanks for the proposal, +1!

The `isDeterministic()`[1] function is mentioned in the documentation,
I would suggest to add maybe a section for `supportsConstantFolding()`,
with short description and examples use cases (similar to the
motivation of the FLIP) where this could be useful in UDFs.

Best,
Muhammet

[1]: 
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/#evaluation-methods



On 2024-04-29 22:57, Alan Sheinberg wrote:

I'd like to start a discussion of FLIP-452: Allow Skipping Invocation of
Function Calls While Constant-folding [1]

This feature proposes adding a new
method FunctionDefinition.allowConstantFolding() as part of the Flink
Table/SQL API.  This would be used to determine whether an expression
containing this function should have constant-folding logic run on it,
invoking the function at planning time.

The current behavior of always doing constant-folding on function calls is
problematic for UDFs which invoke RPCs or have other side effects in
external systems.  In these cases, you either don’t want these actions to
occur during planning time, or it may be important to happen on a per
result row basis.

Note that this is a bit different than
FunctionDefinition.isDeterministic(), and can exist along-side it.

Looking forward to your feedback and suggestions.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-452%3A+Allow+Skipping+Invocation+of+Function+Calls+While+Constant-folding


Thanks,
Alan






Re: [Discuss] FLIP-452: Allow Skipping Invocation of Function Calls While Constant-folding

2024-05-06 Thread Alan Sheinberg
Hi Muhammet, Timo

> The `isDeterministic()`[1] function is mentioned in the documentation,
> I would suggest to add maybe a section for `supportsConstantFolding()`,
> with short description and examples use cases (similar to the
> motivation of the FLIP) where this could be useful in UDFs.
>
Thanks for the suggestion.  I'll do that.

> Feel free to start a vote thread by tomorrow, if there are no objections.

Sounds good. I'll do that tomorrow if I hear no objections by then.
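
In the meantime, here is roughly the kind of example I have in mind for the
docs: a made-up UDF that performs a per-row remote call, assuming the FLIP's
`supportsConstantFolding()` lands as a default method that UDFs can override:

```
import org.apache.flink.table.functions.ScalarFunction;

/** A made-up UDF that performs a per-row remote call. */
public class RemoteScoreFunction extends ScalarFunction {

    public String eval(String key) {
        // Imagine an RPC here: it should run once per result row and
        // must not be invoked by the planner during constant folding.
        return lookupRemotely(key);
    }

    // The hook proposed in this FLIP: returning false keeps expressions
    // that call this function out of constant folding.
    @Override
    public boolean supportsConstantFolding() {
        return false;
    }

    private String lookupRemotely(String key) {
        return "score:" + key; // stand-in for a real remote call
    }
}
```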

Thanks,
Alan

On Mon, May 6, 2024 at 8:12 AM Timo Walther  wrote:

> Hi Alan,
>
> thanks for the update. From my side, the FLIP seems good for voting.
>
> Since it only touches a small API surface, I guess the proposal is not
> very controversial.
>
> Feel free to start a vote thread by tomorrow. If there are no objections?
>
> Thanks,
> Timo
>
>
> On 02.05.24 09:45, Muhammet Orazov wrote:
> > Hey Alan,
> >
> > Thanks for the proposal, +1!
> >
> > The `isDeterministic()`[1] function is mentioned in the documentation,
> > I would suggest to add maybe a section for `supportsConstantFolding()`,
> > with short description and examples use cases (similar to the
> > motivation of the FLIP) where this could be useful in UDFs.
> >
> > Best,
> > Muhammet
> >
> > [1]:
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/#evaluation-methods
> >
> >
> > On 2024-04-29 22:57, Alan Sheinberg wrote:
> >> I'd like to start a discussion of FLIP-452: Allow Skipping Invocation of
> >> Function Calls While Constant-folding [1]
> >>
> >> This feature proposes adding a new
> >> method FunctionDefinition.allowConstantFolding() as part of the Flink
> >> Table/SQL API.  This would be used to determine whether an expression
> >> containing this function should have constant-folding logic run on it,
> >> invoking the function at planning time.
> >>
> >> The current behavior of always doing constant-folding on function
> >> calls is
> >> problematic for UDFs which invoke RPCs or have other side effects in
> >> external systems.  In these cases, you either don’t want these actions
> to
> >> occur during planning time, or it may be important to happen on a per
> >> result row basis.
> >>
> >> Note that this is a bit different than
> >> FunctionDefinition.isDeterministic(), and can exist along-side it.
> >>
> >> Looking forward to your feedback and suggestions.
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-452%3A+Allow+Skipping+Invocation+of+Function+Calls+While+Constant-folding
> >>
> >>
> >> Thanks,
> >> Alan
> >
>
>


Re: [Discuss] FLIP-452: Allow Skipping Invocation of Function Calls While Constant-folding

2024-05-06 Thread Lincoln Lee
Thanks Alan for starting this FLIP!

+1 for the new `supportsConstantFolding`. We do need to retain the
ability to skip constant folding in certain cases.

Best,
Lincoln Lee


Alan Sheinberg  于2024年5月7日周二 08:32写道:

> Hi Muhammet, Timo
>
> The `isDeterministic()`[1] function is mentioned in the documentation,
> > I would suggest to add maybe a section for `supportsConstantFolding()`,
> > with short description and examples use cases (similar to the
> > motivation of the FLIP) where this could be useful in UDFs.
> >
> Thanks for the suggestion.  I'll do that.
>
>  Feel free to start a vote thread by tomorrow. If there are no objections?
>
> Sounds good. I'll do that tomorrow if I hear no objections by then.
>
> Thanks,
> Alan
>
> On Mon, May 6, 2024 at 8:12 AM Timo Walther  wrote:
>
> > Hi Alan,
> >
> > thanks for the update. From my side, the FLIP seems good for voting.
> >
> > Since it only touches a small API surface, I guess the proposal is not
> > very controversial.
> >
> > Feel free to start a vote thread by tomorrow. If there are no objections?
> >
> > Thanks,
> > Timo
> >
> >
> > On 02.05.24 09:45, Muhammet Orazov wrote:
> > > Hey Alan,
> > >
> > > Thanks for the proposal, +1!
> > >
> > > The `isDeterministic()`[1] function is mentioned in the documentation,
> > > I would suggest to add maybe a section for `supportsConstantFolding()`,
> > > with short description and examples use cases (similar to the
> > > motivation of the FLIP) where this could be useful in UDFs.
> > >
> > > Best,
> > > Muhammet
> > >
> > > [1]:
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/table/functions/udfs/#evaluation-methods
> > >
> > >
> > > On 2024-04-29 22:57, Alan Sheinberg wrote:
> > >> I'd like to start a discussion of FLIP-452: Allow Skipping Invocation
> of
> > >> Function Calls While Constant-folding [1]
> > >>
> > >> This feature proposes adding a new
> > >> method FunctionDefinition.allowConstantFolding() as part of the Flink
> > >> Table/SQL API.  This would be used to determine whether an expression
> > >> containing this function should have constant-folding logic run on it,
> > >> invoking the function at planning time.
> > >>
> > >> The current behavior of always doing constant-folding on function
> > >> calls is
> > >> problematic for UDFs which invoke RPCs or have other side effects in
> > >> external systems.  In these cases, you either don’t want these actions
> > to
> > >> occur during planning time, or it may be important to happen on a per
> > >> result row basis.
> > >>
> > >> Note that this is a bit different than
> > >> FunctionDefinition.isDeterministic(), and can exist along-side it.
> > >>
> > >> Looking forward to your feedback and suggestions.
> > >>
> > >> [1]
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-452%3A+Allow+Skipping+Invocation+of+Function+Calls+While+Constant-folding
> > >>
> > >>
> > >> Thanks,
> > >> Alan
> > >
> >
> >
>


[jira] [Created] (FLINK-35300) Improve MySqlStreamingChangeEventSource to skip null events in event deserializer

2024-05-06 Thread Xiao Huang (Jira)
Xiao Huang created FLINK-35300:
--

 Summary: Improve MySqlStreamingChangeEventSource to skip null 
events in event deserializer
 Key: FLINK-35300
 URL: https://issues.apache.org/jira/browse/FLINK-35300
 Project: Flink
  Issue Type: Improvement
  Components: Flink CDC
Reporter: Xiao Huang








[jira] [Created] (FLINK-35301) Fix deadlock when loading driver classes

2024-05-06 Thread Xiao Huang (Jira)
Xiao Huang created FLINK-35301:
--

 Summary: Fix deadlock when loading driver classes
 Key: FLINK-35301
 URL: https://issues.apache.org/jira/browse/FLINK-35301
 Project: Flink
  Issue Type: Bug
  Components: Flink CDC
Reporter: Xiao Huang








[jira] [Created] (FLINK-35302) Flink REST server throws exception on unknown fields in RequestBody

2024-05-06 Thread Juntao Hu (Jira)
Juntao Hu created FLINK-35302:
-

 Summary: Flink REST server throws exception on unknown fields in 
RequestBody
 Key: FLINK-35302
 URL: https://issues.apache.org/jira/browse/FLINK-35302
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Affects Versions: 1.19.0
Reporter: Juntao Hu
 Fix For: 1.19.1


As 
[FLIP-401|https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance]
 and FLINK-33258 mention, when an old-version REST client receives a response 
from a new-version REST server, a strict JSON mapper makes the client throw 
exceptions on newly added fields. This is inconvenient in situations where a 
centralized client deals with REST servers of different versions (e.g. the 
Kubernetes operator).
This incompatibility can also happen on the server side, when a new-version 
REST client sends requests with additional fields to an old-version REST 
server. Making the server tolerant of unknown fields would spare clients from 
backward-compatibility workarounds.
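
For context, a minimal Jackson sketch of the requested tolerance (the POJO and
payload below are made up; the mapper configuration is the relevant part):

```
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class LenientMapperExample {
    /** The only field this (old) server version knows about. */
    public static class Ping {
        public String id;
    }

    public static void main(String[] args) throws Exception {
        // Ignore fields the target type does not declare instead of
        // throwing UnrecognizedPropertyException.
        ObjectMapper lenient = new ObjectMapper()
                .disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);

        // "newField" comes from a newer client; a strict mapper would fail here.
        Ping p = lenient.readValue("{\"id\":\"1\",\"newField\":true}", Ping.class);
        System.out.println(p.id); // prints: 1
    }
}
```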





Re: [VOTE] FLIP-447: Upgrade FRocksDB from 6.20.3 to 8.10.0

2024-05-06 Thread ConradJam
+1 (non-binding)

Piotr Nowojski  于2024年5月6日周一 20:17写道:

> +1 (binding)
>
> Piotrek
>
> pon., 6 maj 2024 o 12:35 Roman Khachatryan  napisał(a):
>
> > +1 (binding)
> >
> > Regards,
> > Roman
> >
> >
> > On Mon, May 6, 2024 at 11:56 AM gongzhongqiang <
> gongzhongqi...@apache.org>
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Zhongqiang Gong
> > >
> > > yue ma  于2024年5月6日周一 10:54写道:
> > >
> > > > Hi everyone,
> > > >
> > > > Thanks for all the feedback, I'd like to start a vote on the
> FLIP-447:
> > > > Upgrade FRocksDB from 6.20.3 to 8.10.0 [1]. The discussion thread is
> > here
> > > > [2].
> > > >
> > > > The vote will be open for at least 72 hours unless there is an
> > objection
> > > or
> > > > insufficient votes.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-447%3A+Upgrade+FRocksDB+from+6.20.3++to+8.10.0
> > > > [2] https://lists.apache.org/thread/lrxjfpjjwlq4sjzm1oolx58n1n8r48hw
> > > >
> > > > --
> > > > Best,
> > > > Yue
> > > >
> > >
> >
>


-- 
Best

ConradJam


Re: [DISCUSS] FLIP-448: Introduce Pluggable Workflow Scheduler Interface for Materialized Table

2024-05-06 Thread Ron Liu
> 4. It appears that in the section on `public interfaces`, within
> `WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to
> `CreateWorkflowOperation`, right?

After discussing with Xuyang offline: we need to support both periodic and
one-time workflows, and they require different information. For example, a
periodic workflow needs a cron expression, while a one-time workflow needs the
refresh partition, downstream cascading materialized tables, etc. Therefore,
CreateWorkflowOperation will have two different implementation classes, which
will be cleaner for both the implementer and the caller.
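
For illustration, roughly the shape I mean (class and field names below are
illustrative, not the FLIP's final API):

```
import java.util.List;
import java.util.Map;

interface WorkflowOperation {}

/** A periodic workflow carries its schedule. */
class CreatePeriodicWorkflowOperation implements WorkflowOperation {
    final String cronExpression;

    CreatePeriodicWorkflowOperation(String cronExpression) {
        this.cronExpression = cronExpression;
    }
}

/** A one-time workflow carries the refresh scope instead. */
class CreateOneTimeWorkflowOperation implements WorkflowOperation {
    final Map<String, String> refreshPartition;
    final List<String> cascadingMaterializedTables;

    CreateOneTimeWorkflowOperation(
            Map<String, String> refreshPartition,
            List<String> cascadingMaterializedTables) {
        this.refreshPartition = refreshPartition;
        this.cascadingMaterializedTables = cascadingMaterializedTables;
    }
}
```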

Best,
Ron

Ron Liu  于2024年5月6日周一 20:48写道:

> Hi, Xuyang
>
> Thanks for joining this discussion
>
> > 1. In the sequence diagram, it appears that there is a missing step for
> obtaining the refresh handler from the catalog during the suspend operation.
>
> Good catch
>
> > 2. The term "cascade refresh" does not seem to be mentioned in FLIP-435.
> The workflow it creates is marked as a "one-time workflow". This is
> different
>
> from a "periodic workflow," and it appears to be a one-off execution. Is
> this actually referring to the Refresh command in FLIP-435?
>
> The cascade refresh is a future work, we don't propose the corresponding
> syntax in FLIP-435. However, intuitively, it would be an extension of the
> Refresh command in FLIP-435.
>
> > 3. The workflow-scheduler.type has no default value; should it be set to
> CRON by default?
>
> Firstly, CRON is not a workflow scheduler. Secondly, I believe that
> configuring the Scheduler should be an action that users are aware of, and
> default values should not be set.
>
> > 4. It appears that in the section on `public interfaces`, within
> `WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to
>
> `CreateWorkflowOperation`, right?
>
> Sorry, I don't get your point. Can you give more description?
>
> Best,
> Ron
>
> Xuyang  于2024年5月6日周一 20:26写道:
>
>> Hi, Ron.
>>
>> Thanks for driving this. After reading the entire flip, I have the
>> following questions:
>>
>>
>>
>>
>> 1. In the sequence diagram, it appears that there is a missing step for
>> obtaining the refresh handler from the catalog during the suspend operation.
>>
>>
>>
>>
>> 2. The term "cascade refresh" does not seem to be mentioned in FLIP-435.
>> The workflow it creates is marked as a "one-time workflow". This is
>> different
>>
>> from a "periodic workflow," and it appears to be a one-off execution. Is
>> this actually referring to the Refresh command in FLIP-435?
>>
>>
>>
>>
>> 3. The workflow-scheduler.type has no default value; should it be set to
>> CRON by default?
>>
>>
>>
>>
>> 4. It appears that in the section on `public interfaces`, within
>> `WorkflowOperation`, `CreatePeriodicWorkflowOperation` should be changed to
>>
>> `CreateWorkflowOperation`, right?
>>
>>
>>
>>
>> --
>>
>> Best!
>> Xuyang
>>
>>
>>
>>
>>
>> At 2024-04-22 14:41:39, "Ron Liu"  wrote:
>> >Hi, Dev
>> >
>> >I would like to start a discussion about FLIP-448: Introduce Pluggable
>> >Workflow Scheduler Interface for Materialized Table.
>> >
>> >In FLIP-435[1], we proposed Materialized Table, which has two types of
>> data
>> >refresh modes: Full Refresh & Continuous Refresh Mode. In Full Refresh
>> >mode, the Materialized Table relies on a workflow scheduler to perform
>> >periodic refresh operation to achieve the desired data freshness.
>> >
>> >There are numerous open-source workflow schedulers available, with
>> popular
>> >ones including Airflow and DolphinScheduler. To enable Materialized Table
>> >to work with different workflow schedulers, we propose a pluggable
>> workflow
>> >scheduler interface for Materialized Table in this FLIP.
>> >
>> >For more details, see FLIP-448 [2]. Looking forward to your feedback.
>> >
>> >[1] https://lists.apache.org/thread/c1gnn3bvbfs8v1trlf975t327s4rsffs
>> >[2]
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-448%3A+Introduce+Pluggable+Workflow+Scheduler+Interface+for+Materialized+Table
>> >
>> >Best,
>> >Ron
>>
>


[DISCUSSION] FLIP-456: CompiledPlan support for Batch Execution Mode

2024-05-06 Thread Alexey Leonov-Vendrovskiy
Hi everyone,

PTAL at the proposed FLIP-456: CompiledPlan support for Batch Execution
Mode [1]. It is pretty self-explanatory.

Any thoughts are welcome!

Thanks,
Alexey

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-456%3A+CompiledPlan+support+for+Batch+Execution+Mode