[jira] [Created] (FLINK-30410) Rename 'full' to 'latest-full' and 'compacted' to 'compacted-full' for scan.mode in Table Store

2022-12-14 Thread Caizhi Weng (Jira)
Caizhi Weng created FLINK-30410:
---

 Summary: Rename 'full' to 'latest-full' and 'compacted' to 
'compacted-full' for scan.mode in Table Store
 Key: FLINK-30410
 URL: https://issues.apache.org/jira/browse/FLINK-30410
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Caizhi Weng
Assignee: Caizhi Weng
 Fix For: table-store-0.3.0


As discussed in the [dev mailing 
list|https://lists.apache.org/thread/t76f0ofl6k7mlvqotp57ot10x8o1x90p], we're 
going to rename some values for {{scan.mode}} in Table Store.

Specifically, {{full}} will be renamed to {{latest-full}} and {{compacted}} 
will be renamed to {{compacted-full}}, so users can understand that a full 
snapshot will always be produced, whether the job is a batch job or a streaming job.
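
As an illustration (not part of the ticket itself), this is how the renamed values 
would be set on a Table Store table from the Table API; the table names and schemas 
below are made up, only the {{scan.mode}} option and its new values come from this 
ticket:
{code:java}
// Hedged sketch: table names/schemas are illustrative.
tableEnv.executeSql(
        "CREATE TABLE orders_latest (order_id BIGINT, amount DOUBLE) WITH ("
                + " 'scan.mode' = 'latest-full'"     // previously 'full'
                + ")");
tableEnv.executeSql(
        "CREATE TABLE orders_compacted (order_id BIGINT, amount DOUBLE) WITH ("
                + " 'scan.mode' = 'compacted-full'"  // previously 'compacted'
                + ")");
{code}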



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSSION] Scan mode in Table Store for Flink Stream and Batch job

2022-12-14 Thread Caizhi Weng
Hi all.

It seems that we've reached an agreement. We'll rename the "full" scan.mode
to "latest-full" and the "compacted" scan.mode to "compacted-full".

I've created a ticket about this [1] and will work on it soon.

[1] https://issues.apache.org/jira/browse/FLINK-30410

Shammon FY  于2022年12月12日周一 16:00写道:

> Hi jinsong, caizhi
>
> Thank you for your reply.
>
> I found that what really confused me was the starting position from which a
> streaming job reads data. As @caizhi mentioned for Kafka, streaming jobs
> only read incremental data and that definition is very clear. In the Table
> Store, streaming jobs also have a mode that reads the full data first, so we
> need to give it special consideration.
>
> So I really agree with @jinsong's "Solution 3", it looks good to me. We
> define that streaming and batch jobs have different read modes by default,
> and read data according to the starting position defined by scan.mode.
> Personally, I think it is easy for users to understand. For reading modes
> beyond the default, for example full reading for streaming jobs, adding
> suffixes such as "full" in scan.mode looks nice. In this way, the
> definition will be clearer, and the read mode will be unified through
> scan.mode. Thanks
>
>
> Best,
> Shammon
>
>
> On Mon, Dec 12, 2022 at 2:49 PM Caizhi Weng  wrote:
>
>> Thanks Shammon for bringing up the discussion.
>>
>> My opinion is to combine everything into scan.mode so that we don't
>> have orthogonal options.
>>
>> You first mention that there are two disadvantages for this solution.
>>
>> *1. The behaviors of some StartupModes in Stream and Batch jobs are
>> inconsistent*
>> This is acceptable from my point of view, because streaming jobs and
>> batch jobs are different after all. Streaming jobs should monitor new
>> changes while batch jobs should only see the records present when they are started.
>> This behavior is expected by the users.
>>
>> *2. StartupMode does not define all data reading modes.*
>> Not all reading modes are useful. For example, you mention a reading mode
>> where the user first wants to read the snapshot at a timestamp, then read all
>> incremental changes. Could you explain in what scenario a user would need
>> this?
>>
>> About the solution you proposed, I think the biggest problem is the
>> default value. That is, if we should read full snapshots by default.
>>
>> When a user runs a streaming job with "from-timestamp", he expects to just
>> read incremental changes after the timestamp, just like when using the
>> "from-timestamp" starting mode of Kafka.
>>
>> However, when a user runs a streaming job without a starting timestamp, he
>> expects the result to be correct. In order to provide a correct result, we
>> have to produce full snapshots first.
>>
>> So you can see that, for different streaming jobs, the default behavior
>> about whether to read full snapshots is different. It is hard to pick a
>> default value without disturbing the users.
>>
>> Jingsong Li  于2022年12月9日周五 17:33写道:
>>
>>> Thanks Shammon!
>>>
>>> Your summary was very good and very detailed.
>>>
>>> I thought about it again.
>>>
>>> ## Solution 1
>>> Actually, according to what you said, in theory the modes break down into
>>> these dimensions:
>>> - Runtime-mode: streaming or batch.
>>> - Range: full or incremental.
>>> - Position: Latest, timestamp, snapshot-id, compacted.
>>>
>>> Advantages: The disassembly is very detailed, and every action is very
>>> clear.
>>> Disadvantages: There are many combinations from orthogonality. Combined with
>>> the runtime-mode (stream or batch), there are 16 combinations, many of which
>>> are meaningless. As you said, the default behavior is also a problem.
>>>
>>> ## Solution 2
>>> Currently [1]:
>>> - The environment determines the runtime-mode: streaming or batch.
>>> - The `scan.mode` determines the position.
>>> - No specific option determines the `range`; it is mostly determined by the
>>> runtime-mode. However, it is not completely determined by the runtime
>>> mode: `full` and `compacted`, for example, also read the full range in
>>> streaming mode.
>>>
>>> Advantages: Simple. The default values of options are what we want for
>>> streaming and batch.
>>> Disadvantages:
>>> 1. The semantics of from timestamp are different in the case of
>>> streaming and batch.
>>> 2. `full` and `compacted` are special.
>>>
>>> ## Solution 3
>>>
>>> I understand that the core problem of solution 2 is mainly problem 2:
>>> `full` and `compacted` are special.
>>> How about:
>>> - the runtime mode determines whether to read incremental only or full
>>> data.
>>> - `scan.mode` contains: Latest, timestamp, snapshot-id.
>>> The default is full in batch mode and incremental in stream mode.
>>>
>>> However, we have two other choices for `scan.mode`: `latest-full` and
>>> `compacted-full`. Regardless of the runtime-mode, these two choices
>>> force a full-range read.
>>>
>>> I think solution 3 is a compromise solution. It can also ensure the
>>> availability of default

[jira] [Created] (FLINK-30411) Flink deployment stuck in UPGRADING state when deploy flinkdeployment without resource

2022-12-14 Thread tanjialiang (Jira)
tanjialiang created FLINK-30411:
---

 Summary: Flink deployment stuck in UPGRADING state when deploy 
flinkdeployment without resource
 Key: FLINK-30411
 URL: https://issues.apache.org/jira/browse/FLINK-30411
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: 1.16.0
Reporter: tanjialiang
 Attachments: image-2022-12-14-17-22-12-656.png

In Flink Kubernetes Operator 1.2.0, when I deploy a FlinkDeployment without a 
resource section, the deployment gets stuck in the UPGRADING state.
{code:java}
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: socket-window-word-count
spec:
  image: flink:1.16.0-scala_2.12-java8
  flinkVersion: v1_16
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "1"
  serviceAccount: flink
  job:
    jarURI: local:///opt/flink/examples/streaming/WordCount.jar
    parallelism: 2
    upgradeMode: stateless{code}
 

When I run {{kubectl describe flinkdeployments}}, I see this error message:

!image-2022-12-14-17-22-12-656.png!

 

Maybe we can validate this when the FlinkDeployment is applied? When it is invalid, throw 
an error rather than letting the apply succeed.
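
For illustration only, the check could look roughly like the following; the class and 
method names below are hypothetical, not the operator's actual validator API:
{code:java}
// Hypothetical sketch, not actual flink-kubernetes-operator code: reject a
// FlinkDeployment whose jobManager/taskManager resource section is missing.
private Optional<String> validateResources(FlinkDeploymentSpec spec) {
    if (spec.getJobManager() == null || spec.getJobManager().getResource() == null) {
        return Optional.of("spec.jobManager.resource must be defined");
    }
    if (spec.getTaskManager() == null || spec.getTaskManager().getResource() == null) {
        return Optional.of("spec.taskManager.resource must be defined");
    }
    return Optional.empty();
}
{code}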

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30412) create many checkpoint empty dir when job not enable checkpoint

2022-12-14 Thread xiaodao (Jira)
xiaodao created FLINK-30412:
---

 Summary: create many checkpoint empty dir when job not enable 
checkpoint
 Key: FLINK-30412
 URL: https://issues.apache.org/jira/browse/FLINK-30412
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Affects Versions: 1.15.2
Reporter: xiaodao


When we submit jobs to a Flink session cluster, after a long time we find that it
creates too many empty checkpoint directories, exceeding the HDFS max node limit.

I found that StreamingJobGraphGenerator sets the snapshot settings regardless of
whether checkpointing is enabled for the job:

jobGraph.setSnapshotSettings(settings)
{code:java}
private void configureCheckpointing() {
    CheckpointConfig cfg = streamGraph.getCheckpointConfig();
    long interval = cfg.getCheckpointInterval();
    if (interval < MINIMAL_CHECKPOINT_TIME) {
        // interval of max value means disable periodic checkpoint
        interval = Long.MAX_VALUE;
    }
{code}
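
A hedged sketch of the kind of guard this ticket seems to ask for (not actual Flink 
code): skip the snapshot settings entirely when periodic checkpointing is disabled.
{code:java}
// Hedged sketch only: attach checkpoint/snapshot settings to the JobGraph only
// when checkpointing is actually enabled, so no empty checkpoint directories
// are created for jobs that never checkpoint.
if (streamGraph.getCheckpointConfig().isCheckpointingEnabled()) {
    jobGraph.setSnapshotSettings(settings);
}
{code}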



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30413) Drop Share and Key_Shared subscription support in Pulsar connector

2022-12-14 Thread Yufan Sheng (Jira)
Yufan Sheng created FLINK-30413:
---

 Summary: Drop Share and Key_Shared subscription support in Pulsar 
connector
 Key: FLINK-30413
 URL: https://issues.apache.org/jira/browse/FLINK-30413
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Pulsar
Affects Versions: 1.17.0
Reporter: Yufan Sheng
 Fix For: 1.17.0


A lot of Pulsar connector test instability issues are related to the {{Shared}} and 
{{Key_Shared}} subscriptions, because these two subscription types are designed to 
consume records in an unordered way and allow multiple consumers on the same 
topic partition. But this feature leads to some drawbacks in the connector.

1. Performance

Flink is a true stream processor with strong correctness guarantees. Supporting 
multiple consumers requires stronger correctness, which depends on Pulsar 
transactions. But the internal implementation of Pulsar transactions on the source 
records the messages one by one and stores all the pending ack status on the client 
side, which is slow and memory inefficient.

This means that we can only use {{Shared}} and {{Key_Shared}} in Flink with low 
throughput, which goes against our intention in supporting these two subscriptions, 
because adding multiple consumers to the same partition is meant to increase the 
consuming speed.

2. Unstable

Pulsar transactions acknowledge the messages one by one in an internal Pulsar 
topic



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30414) Add unit test time out when run ci.

2022-12-14 Thread Aiden Gong (Jira)
Aiden Gong created FLINK-30414:
--

 Summary: Add unit test time out when run ci.
 Key: FLINK-30414
 URL: https://issues.apache.org/jira/browse/FLINK-30414
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Aiden Gong
 Attachments: image-2022-12-14-17-59-56-800.png

!image-2022-12-14-17-59-56-800.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] Release flink-connector-jdbc v3.0.0, release candidate #1

2022-12-14 Thread Martijn Visser
Hi all,

This RC is cancelled due to the missing LICENSE and NOTICE file.

I'll work on a fix (including removing the filter pushdown) and create a
new RC later.

Best regards,

Martijn

On Mon, Dec 12, 2022 at 9:47 AM Chesnay Schepler  wrote:

> On 09/12/2022 16:12, Martijn Visser wrote:
> > That was because the number of changes between 1.16.0 and master were
> > limited.
> >
> Doesn't the filter pushdown in FLINK-16024 affect topologies and thus
> potentially savepoint compatibility?
>
>


[jira] [Created] (FLINK-30415) Sync JDBC connector to match with previously released 1.16 version

2022-12-14 Thread Martijn Visser (Jira)
Martijn Visser created FLINK-30415:
--

 Summary: Sync JDBC connector to match with previously released 
1.16 version
 Key: FLINK-30415
 URL: https://issues.apache.org/jira/browse/FLINK-30415
 Project: Flink
  Issue Type: Bug
  Components: Connectors / JDBC
Affects Versions: jdbc-3.0.0
Reporter: Martijn Visser
Assignee: Martijn Visser


The current {{3.0}} branch of the externalized JDBC connector contains commits 
that were synced from Flink's {{master}} branch. Those commits shouldn't be 
there; they should be in the {{main}} branch of the externalized JDBC connector 
(which could be released in a future {{3.1}} release).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30416) Add configureSession REST API in the SQL Gateway

2022-12-14 Thread Shengkai Fang (Jira)
Shengkai Fang created FLINK-30416:
-

 Summary: Add configureSession REST API in the SQL Gateway
 Key: FLINK-30416
 URL: https://issues.apache.org/jira/browse/FLINK-30416
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Gateway
Affects Versions: 1.17.0
Reporter: Shengkai Fang
 Fix For: 1.17.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] FLIP-275: Support Remote SQL Client Based on SQL Gateway

2022-12-14 Thread yu zelin
Hi, all,

Thanks for all your feedback so far. Through the discussion in this 
thread[1], I think we have come to a consensus, so I'd like to start a 
vote on FLIP-275[2].

The vote will last for at least 72 hours (until Dec 19th, 13:00 GMT, excluding 
weekend days) unless there is an objection or an insufficient number of votes.

Best,
Yu Zelin

[1] https://lists.apache.org/thread/zpx64l0z91b0sz0scv77h0g13ptj4xxo 
[2] https://cwiki.apache.org/confluence/x/T48ODg

[jira] [Created] (FLINK-30417) Rabbit error is not clear

2022-12-14 Thread Yaron Shani (Jira)
Yaron Shani created FLINK-30417:
---

 Summary: Rabbit error is not clear
 Key: FLINK-30417
 URL: https://issues.apache.org/jira/browse/FLINK-30417
 Project: Flink
  Issue Type: Improvement
  Components: Connectors/ RabbitMQ
Affects Versions: 1.14.6
Reporter: Yaron Shani


Hey,

When there is no connection to RabbitMQ, Flink prints:
{code:java}
java.lang.RuntimeException: Error while creating the channel{code}
or
{code:java}
java.net.SocketTimeoutException: connect timeout{code}
 

This makes it very difficult for people to understand what exactly is failing. 
Is it the connection between the job/task managers? Is it Kafka? RabbitMQ? 
Something else?

I am going to make a PR that states that the error is about the RabbitMQ connection.
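
For example, a minimal sketch of the kind of wrapping the PR could do, using only the 
standard RabbitMQ client API (variable names are illustrative):
{code:java}
// Hedged sketch: wrap the connection failure so the error names RabbitMQ and
// the host/port, instead of a bare "Error while creating the channel".
try {
    connection = connectionFactory.newConnection();
    channel = connection.createChannel();
} catch (java.io.IOException | java.util.concurrent.TimeoutException e) {
    throw new RuntimeException(
            "Cannot create channel: failed to connect to RabbitMQ at "
                    + connectionFactory.getHost() + ":" + connectionFactory.getPort(),
            e);
}
{code}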



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] Release flink-connector-jdbc, release candidate #2

2022-12-14 Thread Martijn Visser
Hi everyone,
Please review and vote on the release candidate #2 for the version 3.0.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which are signed with the key with fingerprint
A5F3BCE4CBE993573EC5966A65321B8382B219AF [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag v3.0.0-rc2 [5],
* website pull request listing the new release [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352590
[2]
https://dist.apache.org/repos/dist/dev/flink/flink-connector-jdbc-3.0.0-rc2
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1565/
[5] https://github.com/apache/flink-connector-jdbc/releases/tag/v3.0.0-rc2
[6] https://github.com/apache/flink-web/pull/590


[jira] [Created] (FLINK-30418) Implement synchronous KinesisClient in EFO

2022-12-14 Thread Daren Wong (Jira)
Daren Wong created FLINK-30418:
--

 Summary: Implement synchronous KinesisClient in EFO
 Key: FLINK-30418
 URL: https://issues.apache.org/jira/browse/FLINK-30418
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / AWS
Affects Versions: aws-connector-4.1.0
Reporter: Daren Wong
 Fix For: aws-connector-4.1.0


Use a synchronous KinesisClient for consumer registration/de-registration.
h3. Why?

The current async client can cause the following RuntimeException when the 
application parallelism is high:
{quote}Unable to execute HTTP request: The channel was closed before the 
protocol could be determined.
{quote}
In addition, it can also cause a deadlock situation, as described in 
https://issues.apache.org/jira/browse/FLINK-30304
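
For reference, a minimal sketch of a synchronous registration call with the AWS SDK v2 
(software.amazon.awssdk.services.kinesis); the stream ARN and consumer name are assumed 
to be supplied by the caller, and this is not the connector's actual code:
{code:java}
// Hedged sketch: synchronous registration of an EFO consumer with the
// AWS SDK v2 KinesisClient instead of the KinesisAsyncClient.
KinesisClient kinesis = KinesisClient.create();
RegisterStreamConsumerResponse response =
        kinesis.registerStreamConsumer(
                RegisterStreamConsumerRequest.builder()
                        .streamARN(streamArn)        // assumed to be provided
                        .consumerName(consumerName)  // assumed to be provided
                        .build());
{code}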



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


I want to join

2022-12-14 Thread Ruslan Maznytsia



Slack Invite

2022-12-14 Thread Grant Wade
Hi,
I'd like to join the Flink Slack Community, but the invite link on
https://flink.apache.org/community.html#slack is expired. Can someone
please send me an invite link?

Thanks,
Grant


Slack link expired

2022-12-14 Thread Ruben van Vreeland
Hi,

I would love to join the Slack community, yet the link is expired.

Could someone renew the link, or invite me?

Looking forward to meeting you there!


Invitation link to Community Slack

2022-12-14 Thread Lauri Suurväli
Hello!

The invitation link to Flink Community Slack has expired on the website. It
was recommended to write to the dev mailing list in case this had happened.
Would anyone be so kind and send me an invitation link?

Best regards,
Lauri Suurväli



Re: Slack Invite

2022-12-14 Thread Martijn Visser
Hi Grant,

I've just updated the invite, so it should work again. Can you check?

Best regards,

Martijn

On Wed, Dec 14, 2022 at 2:45 PM Grant Wade 
wrote:

> Hi,
> I'd like to join the Flink Slack Community, but the invite link on
> https://flink.apache.org/community.html#slack is expired. Can someone
> please send me an invite link?
>
> Thanks,
> Grant
>


Re: Slack link expired

2022-12-14 Thread Martijn Visser
Hi Ruben,

I've just updated the invite, so it should work again. Can you check?

Best regards,

Martijn

On Wed, Dec 14, 2022 at 2:46 PM Ruben van Vreeland <
rubenvanvreel...@gmail.com> wrote:

> Hi,
>
> I would love to join the Slack community, yet the link is expired.
>
> Could someone renew the link, or invite me?
>
> Looking forward to meeting you there!
>


Re: I want to join

2022-12-14 Thread Martijn Visser
Hi Ruslan,

Check https://flink.apache.org/community.html and
https://flink.apache.org/contributing/how-to-contribute.html for all
information.

Best regards,

Martijn

On Wed, Dec 14, 2022 at 2:45 PM Ruslan Maznytsia 
wrote:

>
>


Re: Invitation link to Community Slack

2022-12-14 Thread Martijn Visser
Hi Lauri,

I've just updated the invite link, so it hopefully should work again.

Best regards,

Martijn

On Wed, Dec 14, 2022 at 2:46 PM Lauri Suurväli
 wrote:

> Hello!
>
> The invitation link to Flink Community Slack has expired on the website. It
> was recommended to write to the dev mailing list in case this had happened.
> Would anyone be so kind and send me an invitation link?
>
> Best regards,
> Lauri Suurväli
>


Re: [DISCUSS] FLIP-274 : Introduce metric group for OperatorCoordinator

2022-12-14 Thread Zhu Zhu
Hi Hang & MengYue,

Thanks for creating this FLIP!

I think it is very useful, mainly in two aspects:
1. Enables OperatorCoordinators to register metrics. Currently
the coordinators have no way to do this. The operator coordinator
metric group further enables the SplitEnumerator to have access
to a registered metric group (via the existing public interface
SplitEnumeratorContext#metricGroup()), which is null at the moment;
see the sketch after point 2 below.

2. Defines the scope of operator coordinator metrics. A clear definition
makes it easy for users to find the metrics they want. The definition
also helps to avoid conflicts between metrics from multiple OperatorCoordinators
of the same kind. E.g. each SourceCoordinator may have its own
numSourceSplits metric; these metrics should not be directly registered
to the job metric group.
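
As a rough illustration of point 1 (a sketch only, assuming the enumerator
keeps its assigned splits in a local collection; the metric name is made up),
a SplitEnumerator could then do something like:

    // Sketch: uses the existing SplitEnumeratorContext#metricGroup(),
    // which this FLIP would make non-null.
    SplitEnumeratorMetricGroup metrics = enumeratorContext.metricGroup();
    if (metrics != null) {
        metrics.gauge("numSourceSplits", (Gauge<Integer>) () -> assignedSplits.size());
    }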

What I'm a bit concerned about is the necessity of the introduced common metrics
numEventsInCounter & numEventsOutCounter. Is there any case that strongly
requires them?

Regarding the concerns of Chesnay,
> A dedicated coordinator MG implementation is overkill
Directly using the job metric group can result in metric conflicts, as mentioned
in #2 above.

Thanks,
Zhu

Dong Lin  于2022年12月10日周六 14:16写道:

>
> Hi Chesney,
>
> Just to double check with you, OperatorCoordinatorMetricGroup (annotated as
> @PublicEvolving) has already been introduced into Flink by FLIP-179
> .
> And that FLIP has got your +1. Do you mean we should remove this
> OperatorCoordinatorMetricGroup?
>
> Regards,
> Dong
>
> On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler  wrote:
>
> > As a whole I feel like this FLIP is overly complicated. A dedicated
> > coordinator MG implementation is overkill; it could just re-use the
> > existing Task/OperatorMGs to create the same structure we have on TMs,
> > similar to what we did with the Job MG.
> >
> > However, I'm not convinced that this is required anyway, because all the
> > example metrics you listed can be implemented on the TM side +
> > aggregating them in the external metrics backend.
> >
> > Since I'm on holidays soon, just so no one tries to pull a fast one on
> > me, if this were to go to a vote as-is I'd be against it.
> >
> >
> > On 09/12/2022 15:30, Dong Lin wrote:
> > > Hi Hang,
> > >
> > > Thanks for the FLIP! The FLIP looks good and it is pretty informative.
> > >
> > > I have just two minor comments regarding names:
> > > - Would it be useful to rename the config key as
> > > *metrics.scope.jm.job.operator-coordinator* for consistency with
> > > *metrics.scope.jm.job
> > > *(which is not named as *jm-job)?
> > > - Maybe rename the variable as SCOPE_NAMING_OPERATOR_COORDINATOR for
> > > simplicity and consistency with SCOPE_NAMING_OPERATOR (which is not named
> > > as SCOPE_NAMING_TM_JOB_OPERATOR)?
> > >
> > > Cheers,
> > > Dong
> > >
> > >
> > >
> > > On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan  wrote:
> > >
> > >> Hi all,
> > >>
> > >> MengYue and I created FLIP-274[1], Introduce metric group for
> > >> OperatorCoordinator. OperatorCoordinator is the coordinator for runtime
> > >> operators and runs on the Job Manager. The coordination mechanism is
> > >> operator events between the OperatorCoordinator and all of its operators. This
> > >> coordination is used more and more in Flink; for example, many Sources and
> > >> Sinks depend on the mechanism to assign splits and coordinate commits to
> > >> external systems. The OperatorCoordinator is widely used in the flink kafka
> > >> connector, flink pulsar connector, flink cdc connector, flink hudi
> > >> connector and so on.
> > >>
> > >> But there is not a suitable metric group scope for the OperatorCoordinator,
> > >> nor an implementation of the interface OperatorCoordinatorMetricGroup.
> > >> The metrics in an OperatorCoordinator could be how many splits/partitions
> > >> have been assigned to source readers, or how many files have been written
> > >> out by sink writers; these metrics not only help users to know the job
> > >> progress but also make maintaining big jobs easier. Thus we propose FLIP-274
> > >> to introduce a new metric group scope for OperatorCoordinator and provide
> > >> an internal implementation of OperatorCoordinatorMetricGroup.
> > >>
> > >> Could you help review this FLIP when you get time? Any feedback is
> > >> appreciated!
> > >>
> > >> Best,
> > >> Hang
> > >>
> > >> [1]
> > >>
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
> > >>
> >
> >


Re: [DISCUSS] FLIP-276: Data Consistency of Streaming and Batch ETL in Flink and Table Store

2022-12-14 Thread Shammon FY
Hi Piotr,

Thanks for your valuable input, which made me consider the core point of
data consistency in depth. I'd like to define data consistency for the
whole streaming & batch processing as follows, and I hope that we can reach
an agreement on it:

BOutput = Fn(BInput), where BInput is a bounded input split from the
unbounded stream, Fn is the computation of a node or ETL, and BOutput is the
bounded output for BInput. All the data in BInput and BOutput are unordered,
and BInput and BOutput are consistent with each other.

The key points above include 1) the segment semantics of BInput; 2) the
computation semantics of Fn

1. The segment semantics of BInput
a) Transactionality of data. It is necessary to ensure the transactional
semantics of the bounded data set when it is split from the unbounded
stream. For example, we cannot split multiple records of one transaction
into different bounded data sets.
b) Timeliness of data. Some data is related to time, such as boundary
data for a window. It is necessary to consider whether the bounded data set
needs to include a watermark which can trigger the window result.
c) Constraints of data. The Timestamp Barrier should perform some specific
operations after computation in operators, for example, force flushing data.

The Checkpoint Barrier misses all of the semantics above, and we should later
support users defining the Timestamp for data based on Event Time or System Time
according to the job and computation.

2. The computation semantics of Fn
a) Deterministic computation
Most computations are deterministic, such as map, filter, count, sum,
etc. They generate the same unordered result from the same unordered input
every time, and we can easily define data consistency on the input and
output for them.

b) Non-deterministic computation
Some computations are non-deterministic. They will produce different
results from the same input every time. I try to divide them into the
following types:
1) Non-deterministic computation semantics, such as the rank operator. When it
computes multiple times (for example, on failover), either the first or the last
output result can be the final result, which leads to different failover
handling for downstream jobs. I will expand on this later.
2) Non-deterministic computation optimizations, such as async I/O. It is
necessary to sync these operations when the barrier of the input arrives.
3) Deviation caused by data segmentation and computation semantics, such as
Window. This requires that users customize the data segmentation
according to their needs correctly.

The Checkpoint Barrier matches a), while the Timestamp Barrier matches both a) and b).

We define the data consistency of BInput and BOutput based on all of the above.
The BOutput of an upstream ETL will be the BInput of the next ETL, and multiple
ETL jobs form a complex "ETL Topology".

Based on the above definitions, I'd like to give the general proposal I have in
mind for the "Timestamp Barrier". It is not very detailed yet, so please help to
review it and feel free to comment, @David, @Piotr.

1. Data segmentation with Timestamp
a) Users can define the Timestamp Barrier with System Time or Event Time.
b) Source nodes generate the same Timestamp Barrier after reading data from the
RootTable.
c) Each record carries the same Timestamp value according to the Timestamp
Barrier, such as (a, T), (b, T), (c, T), (T, barrier).

2. Computation with Timestamp
a) Records are unordered within the same Timestamp. Stateless operators such
as map/flatmap/filter can process data without aligning the Timestamp Barrier,
which is different from the Checkpoint Barrier.
b) Records between Timestamps are ordered. Stateful operators must align
data and compute per Timestamp, then compute in Timestamp sequence.
c) Stateful operators will output the results of a specific Timestamp after
computation.
d) The sink operator "commits records" with a specific Timestamp and reports the
status to the JobManager.

3. Reading data with Timestamp
a) A downstream ETL reads data according to the Timestamp after the upstream ETL
has "committed" it.
b) Stateful operators interact with state when computing the data of a Timestamp,
but they won't trigger a checkpoint for every Timestamp. Therefore the source ETL
job can generate a Timestamp every few seconds or even every few hundred
milliseconds.
c) Based on the Timestamp, the delay between ETL jobs will be very small, and in
the best case the E2E latency may be only tens of seconds.

4. Failover and Recovery
ETL jobs are cascaded through the Intermediate Table. After a single ETL
job fails, it needs to replay the input data and recompute the results. As
you mentioned, whether the cascaded ETL jobs are restarted depends on the
determinism of the intermediate data between them.
a) An ETL job will roll back and re-read data from the upstream ETL by a specific
Timestamp according to the Checkpoint.
b) With the management of Checkpoints and Timestamps, an ETL can replay
all Timestamps and data after failover, which means BInput is the same
before and after failover.

c) For a deterministic Fn, it generates the same BOutput from the same BInput

[jira] [Created] (FLINK-30419) Allow tuning of transaction timeout

2022-12-14 Thread Vicky Papavasileiou (Jira)
Vicky Papavasileiou created FLINK-30419:
---

 Summary: Allow tuning of transaction timeout
 Key: FLINK-30419
 URL: https://issues.apache.org/jira/browse/FLINK-30419
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Vicky Papavasileiou


FTS sets the producer transaction timeout to 1 hour. The maximum allowed by a 
Kafka broker is 15 minutes by default. This causes exceptions to be thrown and 
the job to die when the Kafka log is enabled on a table.
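
For context, the relevant knobs are the producer-side {{transaction.timeout.ms}} and 
the broker-side {{transaction.max.timeout.ms}} (which defaults to 15 minutes). A hedged 
illustration, not FTS code:
{code:java}
// Hedged sketch of the producer-side setting involved; 900000 ms matches the
// broker's default transaction.max.timeout.ms (15 minutes).
Properties producerProps = new Properties();
producerProps.setProperty("transaction.timeout.ms", "900000");
{code}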



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30420) NPE thrown when using window time in Table API

2022-12-14 Thread Jiang Xin (Jira)
Jiang Xin created FLINK-30420:
-

 Summary: NPE thrown when using window time in Table API
 Key: FLINK-30420
 URL: https://issues.apache.org/jira/browse/FLINK-30420
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Reporter: Jiang Xin


Run the following unit test and it will fail with an NPE.
{code:java}
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.ExecutionCheckpointingOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import 
org.apache.flink.streaming.api.functions.source.datagen.DataGeneratorSource;
import 
org.apache.flink.streaming.api.functions.source.datagen.SequenceGenerator;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.functions.AggregateFunction;
import org.apache.flink.table.functions.TableFunction;

import org.junit.Before;
import org.junit.Test;

import java.sql.Timestamp;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

import static org.apache.flink.table.api.DataTypes.TIMESTAMP;
import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.call;
import static org.apache.flink.table.api.Expressions.currentTimestamp;
import static org.apache.flink.table.api.Expressions.lit;

public class TestAggWithSourceWatermark {

private StreamTableEnvironment tEnv;
private StreamExecutionEnvironment env;

@Before
public void before() {
Configuration config = new Configuration();
env = StreamExecutionEnvironment.getExecutionEnvironment(config);
env.setParallelism(1);
tEnv = StreamTableEnvironment.create(env);
}

@Test
public void testWindowTime() {
DataStream<Integer> stream =
env.addSource(
new DataGeneratorSource<>(
SequenceGenerator.intGenerator(0, 30),
1, 30L))
.returns(Integer.class);
DataStream<Tuple2<Integer, Long>> streamWithTime =
stream.map(x -> Tuple2.of(x, System.currentTimeMillis()))
.returns(Types.TUPLE(Types.INT, Types.LONG))
.assignTimestampsAndWatermarks(
WatermarkStrategy.<Tuple2<Integer, Long>>forBoundedOutOfOrderness(
Duration.ofSeconds(2))
.withTimestampAssigner(
(ctx) -> (element, recordTimestamp) -> element.f1));
Schema schema =
Schema.newBuilder()
.column("f0", DataTypes.INT())
.column("f1", DataTypes.BIGINT())
.columnByMetadata("rowtime", "TIMESTAMP_LTZ(3)")
.watermark("rowtime", "SOURCE_WATERMARK()")
.build();

Table table = tEnv.fromDataStream(streamWithTime, schema);
table = table.select($("rowtime"));

Table windowedTable =
table.window(Tumble.over("5.seconds").on("rowtime").as("w"))
.groupBy($("w"))
.select(
call(UDAF.class, $("rowtime")).as("row_times"),
$("w").rowtime().as("window_time"),
currentTimestamp().as("current_timestamp"));

windowedTable =
windowedTable
.joinLateral(call(SplitFunction.class, 
$("row_times")).as("rowtime"))
.select(
$("rowtime").cast(TIMESTAMP(3)).as("rowtime"),
$("window_time"),
$("current_timestamp"));
windowedTable.printSchema();
windowedTable.execute().print();
}

public static class SplitFunction extends TableFunction<Timestamp> {

public void eval(List<Timestamp> times) {
for (int i = 0; i < times.size(); i++) {
collect(times.get(i));
}
}
}

public static class UDAF extends AggregateFunction<List<Timestamp>, List<Timestamp>> {

public UDAF() {}

@Override
public List<Timestamp> createAccumulator() {
return new ArrayList<>();
}

public void accumulate(List<Timestamp> accumulator, Timestamp num) {
accumulator.add(num);
}

@Override
public List<Timestamp> getValue(List<Timestamp> accum

Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

2022-12-14 Thread Samrat Deb
Thank you Danny for more insights on the flink-connector-aws-base[1].

It looks like localstack supports glue [2], we already use localstack for
> integration tests so we can follow suite here.


As GlueCatalog will be a part of flink-connector-aws-base, we will, as per the
suggestion, reuse code and resources as much as possible and add the extra
things required in an extensible manner.

Bests,
Samrat


[1]
https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
[2] https://docs.localstack.cloud/user-guide/aws/glue/




On Tue, Dec 13, 2022 at 9:32 PM Danny Cranmer 
wrote:

> Hello Samrat,
>
> Sorry for the late response.
>
> +1 for a native Glue Data Catalog integration. We have
> internally developed a Glue Data Catalog catalog implementation that shims
> hive. We have been meaning to contribute, but this solution can replace our
> internal one.
>
> +1 for putting this in the flink-connector-aws. With regards to
> configuration, we have a flink-connector-aws-base [1] module where all the
> common configurations should go. Anything common, such as authentication
> providers, please use. Additionally any new configurations you need to add
> please consider them going into aws-base if they might be reusable for
> other AWS integrations.
>
> > We will create an e2e integration test cases capturing all the
> implementation in a mock environment.
>
> It looks like localstack supports glue [2], we already use localstack for
> integration tests so we can follow suite here.
>
> Thanks,
> Danny
>
> [1]
> https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
> [2] https://docs.localstack.cloud/user-guide/aws/glue/
>
> On Mon, Dec 12, 2022 at 12:18 PM Samrat Deb  wrote:
>
>> Hi Konstantin Knauf,
>>
>> Can you explain how users are expected to authenticate with AWS Glue? I
>>> don't see any catalog options regardng authx. So I assume the credentials
>>> are taken from the environment?
>>
>>
>> We are planning to put GlueCatalog in flink-connector-aws[1].
>> flink-connector-aws already provides base and already built AwsConfigs[2].
>> These configs can be reused for the Catalog purpose also.
>> I will update the FLIP-277[3] with the auth related configs in the
>> Configuration Section.
>>
>> Users can pass these values as a part of config in catalog creation and
>> if not provided it will try to fetch from the environment.
>> This will allow users to create multiple catalog instances on the same
>> session pointing to different accounts. ( I haven't tested multi
>> account glue catalog instances during POC) .
>>
>> [1] https://github.com/apache/flink-connector-aws
>> 
>> [2]
>> https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws-base/src/main/java/org/apache/flink/connector/aws/config/AWSConfigConstants.java
>> [3]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>
>> Bests,
>> Samrat
>>
>> On Mon, Dec 12, 2022 at 5:32 PM Samrat Deb  wrote:
>>
>>> Hi Jark,
>>> Apologies for late reply.
>>> Thank you for your valuable input.
>>>
>>> Besides, I have a question about Glue Namespace. Could you share the
 documentation of the Glue
  Namespaces? (Sorry, I didn't find it.) According to the "Flink Glue
 Metaspace Mapping" section,
 if there is a database "mydb" under namespace "ns1", is that mean the
 database name in Flink is "ns1.mydb"?
>>>
>>> There is no concept of namespace in glue data catalog.
>>> There are 3 levels in glue data catalog
>>> - catalog
>>> - database
>>> - table
>>>
>>> I have added the mapping in FLIP-277[1]. and updated it .
>>> it is directly database name from flink to database name in glue
>>> Please ignore the typo leftover in doc previously.
>>>
>>> Best,
>>> Samrat
>>>
>>>
>>> On Fri, Dec 9, 2022 at 8:38 PM Jark Wu  wrote:
>>>
 Hi Samrat,

 Thanks a lot for driving the new catalog, and sorry for jumping into the
 discussion late.

 As Flink SQL is becoming the first-class citizen of the Flink API, we
 are
 planning to push Catalog
 to become the first-class citizen of the connector instead of Source &
 Sink. For Flink SQL users,
 using Catalog is as natural and user-friendly as working with databases,
 rather than having to define
 DDL and schemas over and over again. This is also how Trino/Presto does.

 Regarding the repo for the Glue catalog, I think we can add it to
 flink-connector-aws. We don't need
 separate repos for Catalogs because Catalog is a kind of connector
 (others
 are sources & sinks).
 For example, MySqlCatalog[1] and PostgresCatalog[2] are in
 flink-connector-jdbc, and HiveCatalog is
 in flink-connector-hive. This can reduce repository maintenance, and I
 think maybe some common
 AWS utils ca

[jira] [Created] (FLINK-30421) Move TGT renewal to hadoop module

2022-12-14 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-30421:
-

 Summary: Move TGT renewal to hadoop module
 Key: FLINK-30421
 URL: https://issues.apache.org/jira/browse/FLINK-30421
 Project: Flink
  Issue Type: Sub-task
Reporter: Gabor Somogyi






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-277: Native GlueCatalog Support in Flink

2022-12-14 Thread Samrat Deb
Hi All ,

Thank you for all your valuable suggestions and questions regarding the
proposals.

In case there are more queries or questions from the community, I will
keep this discussion thread open for a couple more days and then proceed with
the next steps.

Bests
Samrat

On Wed, Dec 14, 2022 at 9:41 PM Samrat Deb  wrote:

>
>
> Thank you Danny for more insights on the flink-connector-aws-base[1].
>
> It looks like localstack supports glue [2], we already use localstack for
>> integration tests so we can follow suite here.
>
>
> As GlueCatalog will be a part of flink-connector-aws-base. As per
> suggestion, we will reuse code and resources as much as possible and add
> extra things required in extensible manner.
>
> Bests,
> Samrat
>
>
> [1]
> https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
> [2] https://docs.localstack.cloud/user-guide/aws/glue/
>
>
>
>
> On Tue, Dec 13, 2022 at 9:32 PM Danny Cranmer 
> wrote:
>
>> Hello Samrat,
>>
>> Sorry for the late response.
>>
>> +1 for a native Glue Data Catalog integration. We have
>> internally developed a Glue Data Catalog catalog implementation that shims
>> hive. We have been meaning to contribute, but this solution can replace our
>> internal one.
>>
>> +1 for putting this in the flink-connector-aws. With regards to
>> configuration, we have a flink-connector-aws-base [1] module where all the
>> common configurations should go. Anything common, such as authentication
>> providers, please use. Additionally any new configurations you need to add
>> please consider them going into aws-base if they might be reusable for
>> other AWS integrations.
>>
>> > We will create an e2e integration test cases capturing all the
>> implementation in a mock environment.
>>
>> It looks like localstack supports glue [2], we already use localstack for
>> integration tests so we can follow suite here.
>>
>> Thanks,
>> Danny
>>
>> [1]
>> https://github.com/apache/flink-connector-aws/tree/main/flink-connector-aws-base
>> [2] https://docs.localstack.cloud/user-guide/aws/glue/
>>
>> On Mon, Dec 12, 2022 at 12:18 PM Samrat Deb 
>> wrote:
>>
>>> Hi Konstantin Knauf,
>>>
>>> Can you explain how users are expected to authenticate with AWS Glue? I
 don't see any catalog options regardng authx. So I assume the
 credentials
 are taken from the environment?
>>>
>>>
>>> We are planning to put GlueCatalog in flink-connector-aws[1].
>>> flink-connector-aws already provides base and already built AwsConfigs[2].
>>> These configs can be reused for the Catalog purpose also.
>>> I will update the FLIP-277[3] with the auth related configs in the
>>> Configuration Section.
>>>
>>> Users can pass these values as a part of config in catalog creation and
>>> if not provided it will try to fetch from the environment.
>>> This will allow users to create multiple catalog instances on the same
>>> session pointing to different accounts. ( I haven't tested multi
>>> account glue catalog instances during POC) .
>>>
>>> [1] https://github.com/apache/flink-connector-aws
>>> 
>>> [2]
>>> https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws-base/src/main/java/org/apache/flink/connector/aws/config/AWSConfigConstants.java
>>> [3]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-277%3A+Native+GlueCatalog+Support+in+Flink
>>>
>>> Bests,
>>> Samrat
>>>
>>> On Mon, Dec 12, 2022 at 5:32 PM Samrat Deb 
>>> wrote:
>>>
 Hi Jark,
 Apologies for late reply.
 Thank you for your valuable input.

 Besides, I have a question about Glue Namespace. Could you share the
> documentation of the Glue
>  Namespaces? (Sorry, I didn't find it.) According to the "Flink Glue
> Metaspace Mapping" section,
> if there is a database "mydb" under namespace "ns1", is that mean the
> database name in Flink is "ns1.mydb"?

 There is no concept of namespace in glue data catalog.
 There are 3 levels in glue data catalog
 - catalog
 - database
 - table

 I have added the mapping in FLIP-277[1]. and updated it .
 it is directly database name from flink to database name in glue
 Please ignore the typo leftover in doc previously.

 Best,
 Samrat


 On Fri, Dec 9, 2022 at 8:38 PM Jark Wu  wrote:

> Hi Samrat,
>
> Thanks a lot for driving the new catalog, and sorry for jumping into
> the
> discussion late.
>
> As Flink SQL is becoming the first-class citizen of the Flink API, we
> are
> planning to push Catalog
> to become the first-class citizen of the connector instead of Source &
> Sink. For Flink SQL users,
> using Catalog is as natural and user-friendly as working with
> databases,
> rather than having to define
> DDL and schemas over and over again. This is a

[jira] [Created] (FLINK-30422) Generalize token framework provider API

2022-12-14 Thread Gabor Somogyi (Jira)
Gabor Somogyi created FLINK-30422:
-

 Summary: Generalize token framework provider API
 Key: FLINK-30422
 URL: https://issues.apache.org/jira/browse/FLINK-30422
 Project: Flink
  Issue Type: Sub-task
Reporter: Gabor Somogyi






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[ANNOUNCE] Apache Flink Kubernetes Operator 1.3.0 released

2022-12-14 Thread Őrhidi Mátyás
The Apache Flink community is very happy to announce the release of Apache
Flink Kubernetes Operator 1.3.0.

Release highlights:

   - Upgrade to Fabric8 6.x.x and JOSDK 4.x.x
   - Restart unhealthy Flink clusters
   - Contribute the Flink Kubernetes Operator to OperatorHub
   - Publish flink-kubernetes-operator-api module separately

Please check out the release blog post for an overview of the release:
https://flink.apache.org/news/2022/12/14/release-kubernetes-operator-1.3.0.html

The release is available for download at:
https://flink.apache.org/downloads.html

Maven artifacts for Flink Kubernetes Operator can be found at:
https://search.maven.org/artifact/org.apache.flink/flink-kubernetes-operator

Official Docker image for Flink Kubernetes Operator applications can be
found at: https://hub.docker.com/r/apache/flink-kubernetes-operator

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352322

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Regards,

Matyas Orhidi


Re: [DISCUSS] FLIP-271: Autoscaling

2022-12-14 Thread Maximilian Michels
A heads-up: Gyula just opened a PR with the code contribution based on the
design: https://github.com/apache/flink-kubernetes-operator/pull/484

We have run some tests based on the current state and achieved very good
results thus far. We were able to cut the resources of some of the
deployments by 50% yielding very stable configurations for mostly static
data rates. Also, we could achieve good scaling decisions on high-volume
pipelines with fluctuating traffic which remained backlog free despite many
adjustments due to the varying traffic.

One of the most pressing issues we will have to solve is an integration
with the K8s scheduler to reserve resources upfront so that we do not hit any
resource limits after scaling. Scaling currently redeploys the entire
application, which carries some risk because we surrender the pods for each
scaling. This can perhaps be achieved with the Rescale API.

-Max

On Sat, Nov 26, 2022 at 3:02 AM Prasanna kumar <
prasannakumarram...@gmail.com> wrote:

> Thanks for the reply.
>
> Gyula and Max.
>
> Prasanna
>
>
> On Sat, 26 Nov 2022, 00:24 Maximilian Michels,  wrote:
>
> > Hi John, hi Prasanna, hi Rui,
> >
> > Gyula already gave great answers to your questions, just adding to it:
> >
> > >What’s the reason to add auto scaling to the Operator instead of to the
> > JobManager?
> >
> > As Gyula mentioned, the JobManager is not the ideal place, at least not
> > until Flink supports in-place autoscaling which is a related but
> ultimately
> > very different problem because it involves solving job reconfiguration at
> > runtime. I believe the AdaptiveScheduler has moved into this direction
> and
> > there is nothing preventing us from using it in the future once it has
> > evolved further. For now, going through the job redeployment route seems
> > like the easiest and safest way.
> >
> > >Could we finally use the autoscaler as a outside tool? or run it as a
> > separate java process?
> >
> > I think we could but I wouldn't make it a requirement for the first
> > version. There is nothing preventing the autoscaler from running as a
> > separate k8s/yarn deployment which would provide some of the same
> > availability guarantees as the operator or any deployment has on
> k8s/yarn.
> > However, I think this increases complexity by a fair bit because the
> > operator already has all the configuration and tooling to manage Flink
> > jobs. I'm not at all opposed to coming up with a way to allow the
> > autoscaler to run separately as well as with the k8s operator. I just
> think
> > it is out of scope for the first version to keep the complexity and scope
> > under control.
> >
> > -Max
> >
> >
> > On Fri, Nov 25, 2022 at 8:16 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Hi Gyula
> > >
> > > Thanks for the clarification!
> > >
> > > Best
> > > Rui Fan
> > >
> > > On Fri, Nov 25, 2022 at 1:50 PM Gyula Fóra 
> wrote:
> > >
> > > > Rui, Prasanna:
> > > >
> > > > I am afraid that creating a completely independent autoscaler process
> > > that
> > > > works with any type of Flink clusters is out of scope right now due
> to
> > > the
> > > > following reasons:
> > > >
> > > > If we were to create a new general process, we would have to
> implement
> > > high
> > > > availability and a pluggable mechanism to durably store metadata etc.
> > The
> > > > process itself would also have to run somewhere so we would have to
> > > provide
> > > > integrations.
> > > >
> > > >  It would also not be able to scale clusters easily without adding
> > > > Kubernetes-operator-like functionality to it, and if the user has to
> do
> > > it
> > > > manually most of the value is already lost.
> > > >
> > > > Last but not least this would have the potential of interfering with
> > > other
> > > > actions the user might be currently doing, making the autoscaler
> itself
> > > > complex and more unreliable.
> > > >
> > > > These are all prohibitive reasons at this point. We already have a
> > > > prototype that tackle these smoothly as part of the Kubernetes
> > operator.
> > > >
> > > > Instead of trying to put the autoscaler somewhere else we might also
> > > > consider supporting different cluster types within the Kubernetes
> > > operator.
> > > > While that might sound silly at first, it is of similar scope to your
> > > > suggestions and could help the problem.
> > > >
> > > > As for the new config question, we could collectively decide to
> > backport
> > > > this feature to enable the autoscaler as it is a very minor change.
> > > >
> > > > Gyula
> > > >
> > > > On Fri, 25 Nov 2022 at 06:21, John Roesler 
> > wrote:
> > > >
> > > > > Thanks for this answer, Gyula!
> > > > > -John
> > > > >
> > > > > On Thu, Nov 24, 2022, at 14:53, Gyula Fóra wrote:
> > > > > > Hi John!
> > > > > >
> > > > > > Thank you for the excellent question.
> > > > > >
> > > > > > There are few reasons why we felt that the operator is the right
> > > place
> > > > > for
> > > > > > this component:
> > > > > >
> > > > > >  - Ideally the autos

[jira] [Created] (FLINK-30423) Introduce cast executor codegen for column type evolution

2022-12-14 Thread Shammon (Jira)
Shammon created FLINK-30423:
---

 Summary: Introduce cast executor codegen for column type evolution
 Key: FLINK-30423
 URL: https://issues.apache.org/jira/browse/FLINK-30423
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Affects Versions: table-store-0.3.0
Reporter: Shammon


Introduce cast executor codegen for column type evolution



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30424) Add source operator restore readerState log to distinguish split is from newPartitions or split state

2022-12-14 Thread Ran Tao (Jira)
Ran Tao created FLINK-30424:
---

 Summary: Add source operator restore readerState log to 
distinguish split is from newPartitions or split state
 Key: FLINK-30424
 URL: https://issues.apache.org/jira/browse/FLINK-30424
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Affects Versions: 1.15.3, 1.16.0, 1.16.1
Reporter: Ran Tao


When a job starts for the first time, we can find 'assignPartitions' in the log, 
but if the source recovers from state, we cannot tell whether the new partitions 
come from the periodic discovery thread or from the reader task state.

We can add a helper log line to distinguish these cases and confirm that the 
reader is using split state in the recovery situation.
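
For example, the helper log could look roughly like this (illustrative only;
the variable names are made up and this is not an actual code change yet):
{code:java}
// Hedged sketch of the proposed log line in the source operator's state
// restore path; wording and variable names are illustrative.
LOG.info(
        "Restoring {} splits from reader state (not from partition discovery): {}",
        restoredSplits.size(),
        restoredSplits);
{code}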

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)