Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Yun Gao
Hi,

Many thanks, Dawid, for proposing the FLIP to clarify the ownership of the 
states. +1 for the overall changes, since they make the behavior clear and 
give users a well-defined way to finally clean up savepoints / retained 
checkpoints.

Regarding the changes to the public interface, it seems the changes are 
currently all bound to savepoints, but from the FLIP it seems we might also 
need to support the claim declaration for retained checkpoints, like on the 
CLI side[1]? If so, might it be better to change the option name from 
`execution.savepoint.restore-mode` to something like 
`execution.restore-mode`?
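
To make the question concrete, a minimal sketch of setting the option
programmatically (the keys and the "CLAIM" value follow the FLIP discussion
and are not a released API):

```
import org.apache.flink.configuration.Configuration;

public class RestoreModeOptionSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Option key as currently proposed in the FLIP:
        conf.setString("execution.savepoint.restore-mode", "CLAIM");
        // Suggested generalization, since the mode also applies to retained checkpoints:
        conf.setString("execution.restore-mode", "CLAIM");
        System.out.println(conf);
    }
}
```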

Best,
Yun


[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpoints/#resuming-from-a-retained-checkpoint


--
From:Konstantin Knauf 
Send Time:2021 Nov. 19 (Fri.) 16:00
To:dev 
Subject:Re: [DISCUSS] FLIP-193: Snapshots ownership

Hi Dawid,

Thanks for working on this FLIP. Clarifying the differences and
guarantees around savepoints and checkpoints will make it easier and safer
for users and downstream projects and platforms to work with them.

+1 to changing the current (undefined) behavior when recovering from
retained checkpoints. Users can now choose between claiming and not
claiming, which I think will make the current mixed behavior obsolete.

Cheers,

Konstantin

On Fri, Nov 19, 2021 at 8:19 AM Dawid Wysakowicz 
wrote:

> Hi devs,
>
> I'd like to bring up for a discussion a proposal to clean up ownership
> of snapshots, both checkpoints and savepoints.
>
> The goal here is to make it clear who is responsible for deleting
> checkpoints/savepoints files and when can that be done in a safe manner.
>
> Looking forward for your feedback!
>
> Best,
>
> Dawid
>
> [1] https://cwiki.apache.org/confluence/x/bIyqCw
>
>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk



[jira] [Created] (FLINK-24977) validatePKConstraints in KafkaDynamicTableFactory throws wrong exception message

2021-11-22 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-24977:


 Summary: validatePKConstraints in KafkaDynamicTableFactory throws 
wrong exception message
 Key: FLINK-24977
 URL: https://issues.apache.org/jira/browse/FLINK-24977
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Reporter: Jingsong Lee


options.get(VALUE_FORMAT) should be configuration.get(VALUE_FORMAT).
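
As context for the one-line fix above, a hypothetical sketch (not the connector 
code) of how a typed configuration.get(option) lookup and a raw map lookup can 
disagree, which is the kind of mismatch that produces a misleading exception message:

{code:java}
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
import org.apache.flink.configuration.Configuration;

public class OptionLookupSketch {
    // Hypothetical option with a fallback key, loosely mirroring 'value.format'.
    private static final ConfigOption<String> VALUE_FORMAT =
            ConfigOptions.key("value.format").stringType().noDefaultValue().withFallbackKeys("format");

    public static void main(String[] args) {
        Configuration configuration = new Configuration();
        configuration.setString("format", "json"); // only the fallback key is set

        // The typed lookup resolves fallback keys and defaults ...
        String typed = configuration.get(VALUE_FORMAT); // "json"
        // ... while a raw lookup of the primary key misses them.
        String raw = configuration.toMap().get(VALUE_FORMAT.key()); // null
        System.out.println(typed + " vs " + raw);
    }
}
{code}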



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24978) Upgrade Asm to 9.2

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24978:


 Summary: Upgrade Asm to 9.2
 Key: FLINK-24978
 URL: https://issues.apache.org/jira/browse/FLINK-24978
 Project: Flink
  Issue Type: Sub-task
  Components: BuildSystem / Shaded
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0, shaded-15.0


As per usual we need to bump flink-shaded-asm.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Definition of Done for Apache Flink

2021-11-22 Thread Matthias Pohl
I also like the checklist provided by our current PR template. One annoying
thing in my opinion, though, is that we do not rely on checkboxes. Ingo
already proposed such a change in [1]. Chesnay had some good points on
certain items that were meant to be added to the template but are actually
already checked automatically [2]. In the end it comes down to noticing
these checks and acting on it if one of them fails. I see the benefits of
having an explicit check for something like that in the PR. But again,
adding more items increases the risk of users just ignoring it.

One other thing to raise awareness for users might be to move the
CONTRIBUTING.md into the root folder. Github still recognizes the file if
it is located in the project's root [3]. Hence, I don't see a need to
"hide" it in the .github subfolder. Or was there another reason to put the
file into that folder?

Matthias

[1] https://github.com/apache/flink/pull/17801#issuecomment-969363303
[2] https://github.com/apache/flink/pull/17801#issuecomment-970048058
[3]
https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors#about-contributing-guidelines

On Thu, Nov 18, 2021 at 12:03 PM Yun Tang  wrote:

> Hi Joe,
>
> Thanks for bringing this to our attention.
>
> In general, I agree with Chesnay's reply on PR [1]. For rule 3, we have
> indeed sometimes created a separate PR to add the documentation in the past.
> I think requiring the documentation to be included in the same PR
> could benefit the review process. Thus, I am not against this rule.
>
> For the rule related to the PR description, the current flinkbot already has
> tooling that lets a committer run a command like "@flinkbot approve description".
> However, many committers do not leverage this, which makes the bot
> useless most of the time. I think this discussion raises the question of
> whether we should strictly follow the review process via flinkbot
> or keep not forcing committers to leverage it.
>
> [1] https://github.com/apache/flink/pull/17801#issuecomment-970048058
>
> Best
> Yun Tang
>
> On 2021/11/16 10:38:39 Ingo Bürk wrote:
> > > On the other hand I am a silent fan of the current PR template because
> > > it also provides a summary of the PR to make it easier for committers
> > > to determine the impacts.
> >
> > I 100% agree that part of a PR (and thus the template) should be the
> > summary of the what, why, and how of the changes. I also see value in
> > marking a PR as a breaking change if the author is aware of it being one
> > (of course a committer needs to verify this nonetheless).
> >
> > But apart from that, there's a lot of questions in there that no one
> seems
> > to care about, and e.g. the question of how a change can be verified
> seems
> > fairly useless to me: if tests have been changed, that can trivially be
> > seen in the PR. The CI runs on top of that anyway as well. So I never
> > really understood why I need to manually list all the tests I have
> touched
> > here (or maybe I misunderstood this question the entire time?).
> >
> > If the template is supposed to be useful for the committer rather than
> the
> > author, it would have to be mandatory to fill it out, which de-facto it
> > isn't.
> >
> > Also, even if we keep all the same information, I would still love to see
> > it converted into checkboxes. I know it's a small detail, but it's much
> > less annoying than the current template. Something like
> >
> > ```
> > - [ ] This pull requests changes the public API (i.e., any class
> annotated
> > with `@Public(Evolving)`)
> > - [ ] This pull request adds, removes, or updates dependencies
> > - [ ] I have updated the documentation to reflect the changes made in
> this
> > pull request
> > ```
> >
> > On Tue, Nov 16, 2021 at 10:28 AM Fabian Paul  wrote:
> >
> > > Hi all,
> > >
> > > Maybe I am the devil's advocate but I see the stability of master and
> > > the definition of done as disjunct properties. I think it is more a
> > > question of prioritization that test instabilities are treated as
> > > critical tickets and have to be addressed before continuing any other
> > > work. It will always happen that we merge code that is not 100%
> > > stable; that is probably the nature of software development. I agree
> > > when it comes to documentation that PRs are only mergeable if the
> > > documentation has also been updated.
> > >
> > > On the other hand I am a silent fan of the current PR template because
> > > it also provides a summary of the PR to make it easier for committers
> > > to determine the impacts. It also reminds the contributors of our
> > > principles i.e. how do you verify the change should probably not be
> > > answered with "test were not possible".
> > >
> > > I agree with @Martijn Visser that we can improve the CI i.e.
> > > performance regression test, execute s3 test but these things should
> > > be addressed in another discussion.
> 

[jira] [Created] (FLINK-24979) Remove MaxPermSize configuration in HBase surefire config

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24979:


 Summary: Remove MaxPermSize configuration in HBase surefire config
 Key: FLINK-24979
 URL: https://issues.apache.org/jira/browse/FLINK-24979
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Connectors / HBase
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


The MaxPermSize parameter has no effect since JDK 8, and is actively rejected 
in Java 17. Given that everything has worked just fine without it for years, we 
can just remove it.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24980) Collect sizes of result partitions when tasks finish

2021-11-22 Thread Lijie Wang (Jira)
Lijie Wang created FLINK-24980:
--

 Summary: Collect sizes of result partitions when tasks finish
 Key: FLINK-24980
 URL: https://issues.apache.org/jira/browse/FLINK-24980
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Metrics
Reporter: Lijie Wang


The adaptive batch job scheduler needs to know the size of each result 
partition when the task is finished.

This issue will introduce the *numBytesProduced* counter and register it into 
*TaskIOMetricGroup*, to record the size of each result partition. 
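
A minimal sketch (not the actual implementation) of the counter this introduces: 
a per-partition byte counter that is incremented while writing and read out when 
the task finishes:

{code:java}
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.SimpleCounter;

public class NumBytesProducedSketch {
    private final Counter numBytesProduced = new SimpleCounter();

    // Hypothetical hook, called for every buffer written to the result partition.
    void onBufferWritten(int bytes) {
        numBytesProduced.inc(bytes);
    }

    // Read out when the task finishes, so the adaptive batch scheduler can size
    // downstream parallelism based on the produced data volume.
    long bytesProducedOnFinish() {
        return numBytesProduced.getCount();
    }
}
{code}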



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24981) Fix JDK12+ profile activations

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24981:


 Summary: Fix JDK12+ profile activations
 Key: FLINK-24981
 URL: https://issues.apache.org/jira/browse/FLINK-24981
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


All Java11 profiles should be active for Java 11 and all subsequent Java 
versions, but this currently only applies to the "main" profile in the root pom.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24982) File leak in AbstractRecoverableWriterTest

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24982:


 Summary: File leak in AbstractRecoverableWriterTest
 Key: FLINK-24982
 URL: https://issues.apache.org/jira/browse/FLINK-24982
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24983) Upgrade surefire to 3.0.0-M5

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24983:


 Summary: Upgrade surefire to 3.0.0-M5
 Key: FLINK-24983
 URL: https://issues.apache.org/jira/browse/FLINK-24983
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


Surefire 3.0.0-M5 comes with a new TCP/IP communication channel between 
surefire and JVM forks.
This will allow us to resolve "Corrupted STDOUT" issues when the JVM is 
printing warnings due to unsafe accesses.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Roman Khachatryan
Hi,

Thanks for the proposal Dawid, I have some questions and remarks:

1. How will stop-with-savepoint be handled?
Shouldn't side effects be enforced in this case? (i.e. send
notifyCheckpointComplete)

2. Instead of forcing re-upload, can we "inverse control" in no-claim mode?
Anyways, any external tool will have to poll Flink API waiting for the
next (full) checkpoint, before deleting the retained checkpoint,
right?
Instead, we can provide an API which tells whether the 1st checkpoint
is still in use (and not force re-upload it).

Under the hood, it can work like this:
- for the checkpoint Flink recovers from, remember all shared state
handles it is adding
- when unregistering shared state handles, remove them from the set above
- when the set becomes empty the 1st checkpoint can be deleted externally

Besides not requiring re-upload, it seems much simpler and less invasive.
On the downside, state deletion can be delayed; but I think this is a
reasonable trade-off.
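
A rough sketch of the bookkeeping this would require (illustration only, not
actual Flink code; the names are made up):

```
import java.util.HashSet;
import java.util.Set;

// Tracks which shared state handles of the restored checkpoint are still referenced.
public class InitialCheckpointTracker {
    private final Set<String> initialSharedHandles = new HashSet<>();

    // Called on recovery for every shared handle of the checkpoint Flink restored from.
    void registerInitialHandle(String handleKey) {
        initialSharedHandles.add(handleKey);
    }

    // Called whenever a shared handle is unregistered because no checkpoint uses it anymore.
    void onHandleUnregistered(String handleKey) {
        initialSharedHandles.remove(handleKey);
    }

    // Exposed via an API: once this returns false, the retained checkpoint
    // Flink restored from can be deleted externally.
    boolean initialCheckpointStillInUse() {
        return !initialSharedHandles.isEmpty();
    }
}
```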

3. Alternatively, re-upload not necessarily on 1st checkpoint, but
after a configured number of checkpoints?
There is a high chance that after some more checkpoints, initial state
will not be used (because of compaction),
so backends won't have to re-upload anything (or small part).

4. Re-uploaded artifacts must not be deleted on checkpoint abortion
This should be addressed in https://issues.apache.org/jira/browse/FLINK-24611.
If not, I think the FLIP should consider this case.

5. Enforcing re-upload by a single task and Changelog state backend
With Changelog state backend, a file can be shared by multiple operators.
Therefore, getIntersection() is irrelevant here, because operators
might not be sharing any key groups.
(so we'll have to analyze "raw" file usage I think).

6. Enforcing re-upload by a single task and skew
If we use some greedy logic like subtask 0 always re-uploads then it
might be overloaded.
So we'll have to obtain a full list of subtasks first (then probably
choose randomly or round-robin).
However, that requires rebuilding Task snapshot, which is doable but
not trivial (which I think supports "reverse API option").

7. I think it would be helpful to list file systems / object stores
that support "fast" copy (ideally with latency numbers).

Regards,
Roman

On Mon, Nov 22, 2021 at 9:24 AM Yun Gao  wrote:
>
> Hi,
>
> Very thanks Dawid for proposing the FLIP to clarify the ownership for the
> states. +1 for the overall changes since it makes the behavior clear and
> provide users a determined method to finally cleanup savepoints / retained 
> checkpoints.
>
> Regarding the changes to the public interface, it seems currently the changes 
> are all bound
> to the savepoint, but from the FLIP it seems perhaps we might also need to 
> support the claim declaration
> for retained checkpoints like in the cli side[1] ? If so, then might it be 
> better to change the option name
> from `execution.savepoint.restore-mode` to something like 
> `execution.restore-mode`?
>
> Best,
> Yun
>
>
> [1] 
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpoints/#resuming-from-a-retained-checkpoint
>
>
> --
> From:Konstantin Knauf 
> Send Time:2021 Nov. 19 (Fri.) 16:00
> To:dev 
> Subject:Re: [DISCUSS] FLIP-193: Snapshots ownership
>
> Hi Dawid,
>
> Thanks for working on this FLIP. Clarifying the differences and
> guarantees around savepoints and checkpoints will make it easier and safer
> for users and downstream projects and platforms to work with them.
>
> +1 to the changing the current (undefined) behavior when recovering from
> retained checkpoints. Users can now choose between claiming and not
> claiming, which I think will make the current mixed behavior obsolete.
>
> Cheers,
>
> Konstantin
>
> On Fri, Nov 19, 2021 at 8:19 AM Dawid Wysakowicz 
> wrote:
>
> > Hi devs,
> >
> > I'd like to bring up for a discussion a proposal to clean up ownership
> > of snapshots, both checkpoints and savepoints.
> >
> > The goal here is to make it clear who is responsible for deleting
> > checkpoints/savepoints files and when can that be done in a safe manner.
> >
> > Looking forward for your feedback!
> >
> > Best,
> >
> > Dawid
> >
> > [1] https://cwiki.apache.org/confluence/x/bIyqCw
> >
> >
> >
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


[jira] [Created] (FLINK-24984) Rename MAVEN_OPTS variable in azure yaml

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24984:


 Summary: Rename MAVEN_OPTS variable in azure yaml
 Key: FLINK-24984
 URL: https://issues.apache.org/jira/browse/FLINK-24984
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / Azure Pipelines
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


The build-apache-repo.yml defines a MAVEN_OPTS variable, which is later 
passed to Maven.
This is a bit misleading because MAVEN_OPTS already has semantics attached to 
it by Maven itself, in that Maven passes the value of such an environment 
variable to the JVM.
As such it can, and should, only be used to control JVM parameters (e.g., 
memory), and not Maven-specific options. In fact, the JVM would fail if, for 
example, it were to contain a profile activation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24985) LocalBufferPoolDestroyTest#isInBlockingBufferRequest does not work on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24985:


 Summary: LocalBufferPoolDestroyTest#isInBlockingBufferRequest does 
not work on Java 17
 Key: FLINK-24985
 URL: https://issues.apache.org/jira/browse/FLINK-24985
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


LocalBufferPoolDestroyTest#isInBlockingBufferRequest does some nasty stacktrace 
analysis to find out whether a thread is currently waiting for a buffer or 
not.
The assumptions it makes about the stacktrace layout no longer apply on Java 
17, and need to be adjusted.
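
Roughly, the fragile pattern looks like this (illustration only, not the actual 
test code; the class/method names in the frame check are assumptions):

{code:java}
public class BlockingBufferRequestCheckSketch {
    // Decide whether a thread is blocked requesting a buffer by inspecting its stack trace.
    // The real test additionally relies on frames sitting at fixed positions, which is
    // exactly what shifted on Java 17.
    static boolean isInBlockingBufferRequest(StackTraceElement[] stackTrace) {
        for (StackTraceElement frame : stackTrace) {
            if (frame.getClassName().endsWith("LocalBufferPool")
                    && frame.getMethodName().contains("requestMemorySegment")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isInBlockingBufferRequest(Thread.currentThread().getStackTrace()));
    }
}
{code}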



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24986) StateHandleSerializationTest fails on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24986:


 Summary: StateHandleSerializationTest fails on Java 17
 Key: FLINK-24986
 URL: https://issues.apache.org/jira/browse/FLINK-24986
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


It appears the classpath searches via reflection have become more powerful on 
Java 17, as the test now also detects an anonymous implementation in the 
RocksDBStateDownloaderTest.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24987) Enhance ExternalizedCheckpointCleanup enum

2021-11-22 Thread Nicolaus Weidner (Jira)
Nicolaus Weidner created FLINK-24987:


 Summary: Enhance ExternalizedCheckpointCleanup enum
 Key: FLINK-24987
 URL: https://issues.apache.org/jira/browse/FLINK-24987
 Project: Flink
  Issue Type: Improvement
  Components: API / DataStream
Affects Versions: 1.14.0
Reporter: Nicolaus Weidner


We use the config setting 
[execution.checkpointing.externalized-checkpoint-retention|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/ExecutionCheckpointingOptions.java#L90-L119]
 to distinguish three cases:
- delete on cancellation
- retain on cancellation
- no externalized checkpoints (if no value is set)

It would be easier to understand if we had an explicit enum value 
NO_EXTERNALIZED_CHECKPOINTS for the third case in 
[ExternalizedCheckpointCleanup|https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L702-L742].
 This would also avoid potential issues for clients with handling null values 
(for example, null values being dropped on serialization could be annoying when 
trying to change from RETAIN_ON_CANCELLATION to no external checkpoints).
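
A sketch of the proposed shape of the enum (the first two values exist today; 
NO_EXTERNALIZED_CHECKPOINTS is the suggested addition):

{code:java}
public enum ExternalizedCheckpointCleanup {
    // Existing: delete the externalized checkpoint when the job is cancelled.
    DELETE_ON_CANCELLATION,
    // Existing: retain the externalized checkpoint when the job is cancelled.
    RETAIN_ON_CANCELLATION,
    // Proposed: explicit value instead of representing "no externalized checkpoints" as null.
    NO_EXTERNALIZED_CHECKPOINTS
}
{code}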



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24988) Upgrade lombok to 1.18.22

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24988:


 Summary: Upgrade lombok to 1.18.22
 Key: FLINK-24988
 URL: https://issues.apache.org/jira/browse/FLINK-24988
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


Our current Lombok version fails on Java 17 due to illegal accesses to JDK 
internals. Newer versions of Lombok no longer do this.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Dawid Wysakowicz
@Yun

I think it is a good comment which I agree with in principle. However, we
already use --fromSavepoint (cli), savepointPath (REST API), and
SavepointRestoreSettings for both restoring from a savepoint and from an
externalized checkpoint. I just wanted to voice that concern.
Nevertheless I am fine with changing it to execution.restore-mode; if
there are no other comments on that matter, I will change it.

@Roman:

Re 1. Correct, stop-with-savepoint should commit side-effects. Will add
that to the doc.

Re.2 What I don't like about this counter proposal is that it still has
no clearly defined point in time when it is safe to delete the original
checkpoint. Users would have a hard time reasoning about it and
debugging. Even worse, in the worst case it might never happen that all
of the original files go out of use (I am not too familiar with
RocksDB compaction, but what happens if there are key ranges that are
never accessed again?). I agree it is unlikely, but possible, isn't it?
It can definitely take a significant time and many checkpoints.

Re. 3 I believe where you are coming from is that you'd like to keep the
checkpointing time minimal and reuploading files may increase it. The
proposal so far builds on the assumption we could in most cases use a
cheap duplicate API instead of re-upload. I could see this as a
follow-up if it becomes a bottleneck. It would be a bit invasive though,
as we would have to somehow keep track which files should not be reused
on TMs.

Re. 2 & 3 Neither of the counter proposals works well for taking
incremental savepoints. We were thinking of building incremental
savepoints on the same concept. I think delaying the completion of an
independent savepoint to some undefined point in the future is not a nice
property of savepoints.

Re 4. Good point. We should make sure the first *completed* checkpoint
has the independent/full checkpoint property rather than just the first
triggered.

Re. 5 & 6 I need a bit more time to look into it.

Best,

Dawid

On 22/11/2021 11:40, Roman Khachatryan wrote:
> Hi,
>
> Thanks for the proposal Dawid, I have some questions and remarks:
>
> 1. How will stop-with-savepoint be handled?
> Shouldn't side effects be enforced in this case? (i.e. send
> notifyCheckpointComplete)
>
> 2. Instead of forcing re-upload, can we "inverse control" in no-claim mode?
> Anyways, any external tool will have to poll Flink API waiting for the
> next (full) checkpoint, before deleting the retained checkpoint,
> right?
> Instead, we can provide an API which tells whether the 1st checkpoint
> is still in use (and not force re-upload it).
>
> Under the hood, it can work like this:
> - for the checkpoint Flink recovers from, remember all shared state
> handles it is adding
> - when unregistering shared state handles, remove them from the set above
> - when the set becomes empty the 1st checkpoint can be deleted externally
>
> Besides not requiring re-upload, it seems much simpler and less invasive.
> On the downside, state deletion can be delayed; but I think this is a
> reasonable trade-off.
>
> 3. Alternatively, re-upload not necessarily on 1st checkpoint, but
> after a configured number of checkpoints?
> There is a high chance that after some more checkpoints, initial state
> will not be used (because of compaction),
> so backends won't have to re-upload anything (or small part).
>
> 4. Re-uploaded artifacts must not be deleted on checkpoin abortion
> This should be addressed in https://issues.apache.org/jira/browse/FLINK-24611.
> If not, I think the FLIP should consider this case.
>
> 5. Enforcing re-upload by a single task and Changelog state backend
> With Changelog state backend, a file can be shared by multiple operators.
> Therefore, getIntersection() is irrelevant here, because operators
> might not be sharing any key groups.
> (so we'll have to analyze "raw" file usage I think).
>
> 6. Enforcing re-upload by a single task and skew
> If we use some greedy logic like subtask 0 always re-uploads then it
> might be overloaded.
> So we'll have to obtain a full list of subtasks first (then probably
> choose randomly or round-robin).
> However, that requires rebuilding Task snapshot, which is doable but
> not trivial (which I think supports "reverse API option").
>
> 7. I think it would be helpful to list file systems / object stores
> that support "fast" copy (ideally with latency numbers).
>
> Regards,
> Roman
>
> On Mon, Nov 22, 2021 at 9:24 AM Yun Gao  wrote:
>> Hi,
>>
>> Very thanks Dawid for proposing the FLIP to clarify the ownership for the
>> states. +1 for the overall changes since it makes the behavior clear and
>> provide users a determined method to finally cleanup savepoints / retained 
>> checkpoints.
>>
>> Regarding the changes to the public interface, it seems currently the 
>> changes are all bound
>> to the savepoint, but from the FLIP it seems perhaps we might also need to 
>> support the claim declaration
>> for retained checkpoints like in the cli side[1] ? If s

[jira] [Created] (FLINK-24989) Upgrade shade-plugin to 3.2.4

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24989:


 Summary: Upgrade shade-plugin to 3.2.4
 Key: FLINK-24989
 URL: https://issues.apache.org/jira/browse/FLINK-24989
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24990) LookupJoinITCase fails on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24990:


 Summary: LookupJoinITCase fails on Java 17
 Key: FLINK-24990
 URL: https://issues.apache.org/jira/browse/FLINK-24990
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


The UserDefinedFunctionHelper validates that a given function is public.

The {{InMemory[Async]LookupFunction}} classes are not public however. Probably 
some Java<->Scala interplay that causes this to not be detected at this time.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24991) FlinkStatistic does not compile on later Scala versions

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24991:


 Summary: FlinkStatistic does not compile on later Scala versions
 Key: FLINK-24991
 URL: https://issues.apache.org/jira/browse/FLINK-24991
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


After upgrading Scala to 2.12.15 for testing purposes (because 2.12.7 does not 
work on Java 17), I got a compile error in the FlinkStatistic.

{code}
[ERROR] FlinkStatistic.scala:146: error: Int does not take parameters
[ERROR] if (builder.nonEmpty && builder.length() > 2) {
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24992) StreamingWithStateTestBase does not compile on later Scala versions

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24992:


 Summary: StreamingWithStateTestBase does not compile on later 
Scala versions
 Key: FLINK-24992
 URL: https://issues.apache.org/jira/browse/FLINK-24992
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


After upgrading Scala to 2.12.15 for testing purposes (because 2.12.7 does not 
work on Java 17), I got a compile error in the StreamingWithStateTestBase.

The {{subSequence}} method is no longer available for arrays.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-11-22 Thread wenlong.lwl
Hi Timo, thanks for driving the discussion and the preparation of the
FLIP. This is a pain point of Flink SQL that our users complain about badly. I
have seen many cases where our users suffer while trying to upgrade the
Flink version in order to take advantage of the bug fixes and performance
improvements in the new version. It often takes a long time to verify the
new plan, reoptimize the config, recompute the state, wait for a
safe point to make the new job active in production, etc. Many times
new problems show up during the upgrade.

I have a question on COMPILE AND EXECUTE. It doesn't look good that we
just execute the plan and ignore the statement when the plan already
exists but the plan and the SQL do not match. The result would be quite
confusing if we still executed the plan directly; we may need to add a
validation. Personally I would prefer not to provide such a shortcut and let
users use COMPILE PLAN IF NOT EXISTS and EXECUTE explicitly, which can be
understood by new users even without consulting the docs.

Best,
Wenlong


[jira] [Created] (FLINK-24993) ExceptionUtils#isDirectOutOfMemoryError doesn't work on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24993:


 Summary: ExceptionUtils#isDirectOutOfMemoryError doesn't work on 
Java 17
 Key: FLINK-24993
 URL: https://issues.apache.org/jira/browse/FLINK-24993
 Project: Flink
  Issue Type: Sub-task
  Components: API / Core
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


ExceptionUtils#isDirectOutOfMemoryError doesn't work on Java 17 because the 
error message has changed.
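
For illustration (not the exact Flink code), a message-based check like the 
following breaks as soon as the JDK rewords its direct-memory OOM message:

{code:java}
public class DirectOomCheckSketch {
    // Java 8/11 throw OutOfMemoryError("Direct buffer memory"); an exact string
    // match therefore stops working once the message changes.
    static boolean isDirectOutOfMemoryError(Throwable t) {
        return t instanceof OutOfMemoryError && "Direct buffer memory".equals(t.getMessage());
    }

    public static void main(String[] args) {
        System.out.println(isDirectOutOfMemoryError(new OutOfMemoryError("Direct buffer memory")));
    }
}
{code}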



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Roman Khachatryan
Thanks Dawid,

Regarding clarity,
I think that all proposals require waiting for some event: re-upload /
checkpoint completion / api response.
But with the current one, there is an assumption: "initial checkpoint
can be deleted once a new one completes" (instead of just "initial
checkpoint can be deleted once the API says it can be deleted").
So I think it's actually more clear to offer this explicit API and rely on it.

Regarding delaying the deletion,
I agree that it can delay deletion, but how important is it?
Checkpoints are usually stored on relatively cheap storage like S3, so
some delay shouldn't be an issue (especially taking rounding into
account); it can even be cheaper or comparable to paying for
re-upload/duplicate calls.

Infinite delay can be an issue though, I agree.
Maybe @Yun can clarify the likelihood of never deleting some SST files
by RocksDB?
For the changelog backend, old files won't be used once
materialization succeeds.

Yes, my concern is checkpointing time, but also added complexity:
> It would be a bit invasive though, as we would have to somehow keep track 
> which files should not be reused on TMs.
I think we need this anyway if we choose to re-upload files once the
job is running.
The new checkpoint must be formed by re-uploaded old artifacts AND
uploaded new artifacts.


Regards,
Roman


On Mon, Nov 22, 2021 at 12:42 PM Dawid Wysakowicz
 wrote:
>
> @Yun
>
> I think it is a good comment with I agree in principal. However, we use 
> --fromSavepoint (cli), savepointPath (REST API), SavepointRestoreSettings for 
> both restoring from a savepoint and an externalized checkpoint already. I 
> wanted to voice that concern. Nevertheless I am fine with changing it to 
> execution.restore-mode, if there are no other comments on that matter, I will 
> change it.
>
> @Roman:
>
> Re 1. Correct, stop-with-savepoint should commit side-effects. Will add that 
> to the doc.
>
> Re.2 What I don't like about this counter proposal is that it still has no 
> clearly defined point in time when it is safe to delete the original 
> checkpoint. Users would have a hard time reasoning about it and debugging. 
> Even worse, I think worst case it might never happen that all the original 
> files are no longer in use (I am not too familiar with RocksDB compaction, 
> but what happens if there are key ranges that are never accessed again?) I 
> agree it is unlikely, but possible, isn't it? Definitely it can take a 
> significant time and many checkpoints to do so.
>
> Re. 3 I believe where you are coming from is that you'd like to keep the 
> checkpointing time minimal and reuploading files may increase it. The 
> proposal so far builds on the assumption we could in most cases use a cheap 
> duplicate API instead of re-upload. I could see this as a follow-up if it 
> becomes a bottleneck. It would be a bit invasive though, as we would have to 
> somehow keep track which files should not be reused on TMs.
>
> Re. 2 & 3 Neither of the counter proposals work well for taking incremental 
> savepoints. We were thinking of building incremental savepoints on the same 
> concept. I think delaying the completion of an independent savepoint to a 
> closer undefined future is not a nice property of savepoints.
>
> Re 4. Good point. We should make sure the first completed checkpoint has the 
> independent/full checkpoint property rather than just the first triggered.
>
> Re. 5 & 6 I need a bit more time to look into it.
>
> Best,
>
> Dawid
>
> On 22/11/2021 11:40, Roman Khachatryan wrote:
>
> Hi,
>
> Thanks for the proposal Dawid, I have some questions and remarks:
>
> 1. How will stop-with-savepoint be handled?
> Shouldn't side effects be enforced in this case? (i.e. send
> notifyCheckpointComplete)
>
> 2. Instead of forcing re-upload, can we "inverse control" in no-claim mode?
> Anyways, any external tool will have to poll Flink API waiting for the
> next (full) checkpoint, before deleting the retained checkpoint,
> right?
> Instead, we can provide an API which tells whether the 1st checkpoint
> is still in use (and not force re-upload it).
>
> Under the hood, it can work like this:
> - for the checkpoint Flink recovers from, remember all shared state
> handles it is adding
> - when unregistering shared state handles, remove them from the set above
> - when the set becomes empty the 1st checkpoint can be deleted externally
>
> Besides not requiring re-upload, it seems much simpler and less invasive.
> On the downside, state deletion can be delayed; but I think this is a
> reasonable trade-off.
>
> 3. Alternatively, re-upload not necessarily on 1st checkpoint, but
> after a configured number of checkpoints?
> There is a high chance that after some more checkpoints, initial state
> will not be used (because of compaction),
> so backends won't have to re-upload anything (or small part).
>
> 4. Re-uploaded artifacts must not be deleted on checkpoin abortion
> This should be addressed in https://issues.apache.org/jira/bro

[jira] [Created] (FLINK-24994) WindowedStream#reduce(ReduceFunction function) gives useless suggestion when reducefunction is richfunction.

2021-11-22 Thread bx123 (Jira)
bx123 created FLINK-24994:
-

 Summary: WindowedStream#reduce(ReduceFunction function) gives 
useless suggestion when reducefunction is richfunction.
 Key: FLINK-24994
 URL: https://issues.apache.org/jira/browse/FLINK-24994
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Reporter: bx123


1. When the ReduceFunction is actually a RichFunction, the exception suggests using 
reduce(ReduceFunction, WindowFunction) instead. This suggestion is useless; we can 
verify that by simply calling reduce(richReduceFunction, new PassThroughWindowFunction<>()) 
ourselves.

This wrong suggestion appears in AllWindowedStream too.

2. An AggregateFunction must not be a RichFunction not only when used in 
WindowedStream/AllWindowedStream#aggregate(); this should also be checked when it is 
used in a StateDescriptor.

3. Typically, when a ReduceFunction/AggregateFunction used in a StateDescriptor is a 
RichFunction, the lifecycle methods open()/close() are never called, so any logic in 
them is not executed, and the RuntimeContext is always null because we cannot set it.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24995) EnumValueSerializerCompatibilityTest fails to compile on later Scala versions

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24995:


 Summary: EnumValueSerializerCompatibilityTest fails to compile on 
later Scala versions
 Key: FLINK-24995
 URL: https://issues.apache.org/jira/browse/FLINK-24995
 Project: Flink
  Issue Type: Sub-task
  Components: API / Scala
Reporter: Chesnay Schepler


After upgrading Scala to 2.12.15 for testing purposes (because 2.12.7 does not 
work on Java 17), I got a compile error in the 
EnumValueSerializerCompatibilityTest#compileScalaFile.

The printSummary() method of the CompileReporter was apparently removed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24996) Update CI image to contain Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24996:


 Summary: Update CI image to contain Java 17
 Key: FLINK-24996
 URL: https://issues.apache.org/jira/browse/FLINK-24996
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24997) count(null) not supported in flink sql query

2021-11-22 Thread zouyunhe (Jira)
zouyunhe created FLINK-24997:


 Summary: count(null) not supported in flink sql query
 Key: FLINK-24997
 URL: https://issues.apache.org/jira/browse/FLINK-24997
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client, Table SQL / Planner
Affects Versions: 1.14.0
Reporter: zouyunhe


I used the SQL client to submit a query to a Flink session cluster. The SQL is 
`select count(null)`; it failed and the following exception was thrown:

```

org.apache.flink.table.client.gateway.SqlExecutionException: Could not execute 
SQL statement.
        at 
org.apache.flink.table.client.gateway.local.LocalExecutor.executeOperation(LocalExecutor.java:211)
 ~[flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.client.gateway.local.LocalExecutor.executeQuery(LocalExecutor.java:231)
 ~[flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.client.cli.CliClient.callSelect(CliClient.java:532) 
~[flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.client.cli.CliClient.callOperation(CliClient.java:423) 
~[flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.client.cli.CliClient.lambda$executeStatement$1(CliClient.java:332)
 [flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at java.util.Optional.ifPresent(Optional.java:183) ~[?:?]
        at 
org.apache.flink.table.client.cli.CliClient.executeStatement(CliClient.java:325)
 [flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:297)
 [flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:221)
 [flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:151) 
[flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at org.apache.flink.table.client.SqlClient.start(SqlClient.java:95) 
[flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:187) 
[flink-sql-client_2.12-1.14.0.jar:1.14.0]
        at org.apache.flink.table.client.SqlClient.main(SqlClient.java:161) 
[flink-sql-client_2.12-1.14.0.jar:1.14.0]
Caused by: java.lang.UnsupportedOperationException: Unsupported type 'NULL' to 
get internal serializer
        at 
org.apache.flink.table.runtime.typeutils.InternalSerializers.createInternal(InternalSerializers.java:125)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.runtime.typeutils.InternalSerializers.create(InternalSerializers.java:55)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
        at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) 
~[?:?]
        at 
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) 
~[?:?]
        at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:550) ~[?:?]
        at 
java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
 ~[?:?]
        at 
java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:517) ~[?:?]
        at 
org.apache.flink.table.runtime.typeutils.RowDataSerializer.(RowDataSerializer.java:73)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.runtime.typeutils.InternalSerializers.createInternal(InternalSerializers.java:109)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.runtime.typeutils.InternalSerializers.create(InternalSerializers.java:55)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.runtime.typeutils.InternalTypeInfo.of(InternalTypeInfo.java:83)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecCalc.translateToPlanInternal(CommonExecCalc.java:106)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:134)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:250)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.planner.plan.nodes.exec.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.java:75)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:134)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]
        at 
org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:250)
 ~[flink-table_2.12-1.14.0.jar:1.14.0]

```
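
For reference, a minimal Table API program that should hit the same code path 
(sketch only; casting the literal, e.g. COUNT(CAST(NULL AS INT)), is an untested 
possible workaround):

```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CountNullRepro {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());
        // Expected to fail with "Unsupported type 'NULL' to get internal serializer" as reported above.
        tEnv.executeSql("SELECT COUNT(NULL)").print();
    }
}
```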



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24998) SIGSEGV in Kryo / C2 CompilerThread on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24998:


 Summary: SIGSEGV in Kryo / C2 CompilerThread on Java 17
 Key: FLINK-24998
 URL: https://issues.apache.org/jira/browse/FLINK-24998
 Project: Flink
  Issue Type: Sub-task
  Components: API / Type Serialization System
Reporter: Chesnay Schepler
 Attachments: -__w-1-s-flink-tests-target-hs_err_pid470059.log

While running our tests on CI with Java 17 they failed infrequently with a 
SIGSEGV error.

All occurrences were related to Kryo and happened in the C2 CompilerThread.

{code:java}
Current thread (0x7f1394165c00):  JavaThread "C2 CompilerThread0" daemon 
[_thread_in_native, id=470077, stack(0x7f1374361000,0x7f1374462000)]

Current CompileTask:
C2:  14251 6333   4   com.esotericsoftware.kryo.io.Input::readString 
(127 bytes)
{code}

The full error is attached to the ticket. I can also provide the core dump if 
needed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-24999) flink-python doesn't work on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-24999:


 Summary: flink-python doesn't work on Java 17
 Key: FLINK-24999
 URL: https://issues.apache.org/jira/browse/FLINK-24999
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Chesnay Schepler


{code:java}
java.lang.UnsupportedOperationException: sun.misc.Unsafe or 
java.nio.DirectByteBuffer.<init>(long, int) not available
 at 
io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:490)
 at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:257)
 at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:247)
 at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:248)
 at 
org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:228)
 at 
org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:242)
 at 
org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:132)
 at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:120)
 at 
org.apache.flink.table.runtime.arrow.ArrowUtilsTest.testReadArrowBatches(ArrowUtilsTest.java:389)
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25000) Scala 2.12.7 doesn't compile on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25000:


 Summary: Scala 2.12.7 doesn't compile on Java 17
 Key: FLINK-25000
 URL: https://issues.apache.org/jira/browse/FLINK-25000
 Project: Flink
  Issue Type: Sub-task
  Components: API / Scala
Reporter: Chesnay Schepler


Fails with "/packages cannot be represented as URI" during 
compilation.

2.12.5 was working fine.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25001) Zookeeper 3.4 fails on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25001:


 Summary: Zookeeper 3.4 fails on Java 17
 Key: FLINK-25001
 URL: https://issues.apache.org/jira/browse/FLINK-25001
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Chesnay Schepler


See https://issues.apache.org/jira/browse/ZOOKEEPER-3779



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25002) Setup required --add-opens/--add-exports

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25002:


 Summary: Setup required --add-opens/--add-exports
 Key: FLINK-25002
 URL: https://issues.apache.org/jira/browse/FLINK-25002
 Project: Flink
  Issue Type: Sub-task
  Components: Build System, Build System / CI, Documentation, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler


Java 17 actually enforces the encapsulation of the JDK (as opposed to Java 11, 
which just printed warnings), requiring us to explicitly open/export any 
package that we access illegally.

The following is a list of opens/exports that I needed to get most tests to 
pass, along with some comments on which component needed them. Overall the 
ClosureCleaner and FieldSerializer result in the most offenses, as they try to 
access private fields.

These properties need to be set _for all JVMs in which we run Flink_, including 
surefire forks, other test processes (TestJvmProcess/TestProcessBuilder/Yarn) 
and the distribution.
This needs some thought on how we can share this list across poms (surefire), 
code (test processes / yarn) and the configuration (distribution).

{code:xml}
--add-exports java.base/sun.net.util=ALL-UNNAMED
--add-exports java.rmi/sun.rmi.registry=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED
--add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED
--add-exports java.security.jgss/sun.security.krb5=ALL-UNNAMED
--add-opens java.base/java.lang=ALL-UNNAMED
--add-opens java.base/java.lang.invoke=ALL-UNNAMED
--add-opens java.base/java.math=ALL-UNNAMED
--add-opens java.base/java.net=ALL-UNNAMED
--add-opens java.base/java.io=ALL-UNNAMED
--add-opens java.base/java.lang.reflect=ALL-UNNAMED
--add-opens java.sql/java.sql=ALL-UNNAMED
--add-opens java.base/java.nio=ALL-UNNAMED
--add-opens java.base/java.text=ALL-UNNAMED
--add-opens java.base/java.time=ALL-UNNAMED
--add-opens java.base/java.util=ALL-UNNAMED
--add-opens java.base/java.util.concurrent=ALL-UNNAMED
--add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens java.base/java.util.concurrent.locks=ALL-UNNAMED
--add-opens java.base/java.util.stream=ALL-UNNAMED
--add-opens java.base/sun.util.calendar=ALL-UNNAMED

{code}

Additionally, the following JVM arguments must be supplied when running Maven:

{code}
export MAVEN_OPTS="\
--add-exports jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED \
--add-exports jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED \
--add-exports jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED \
--add-exports jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED \
--add-exports jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED \
--add-exports java.security.jgss/sun.security.krb5=ALL-UNNAMED"
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25003) RestClientTest#testConnectionTimeout fails on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25003:


 Summary: RestClientTest#testConnectionTimeout fails on Java 17
 Key: FLINK-25003
 URL: https://issues.apache.org/jira/browse/FLINK-25003
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / REST, Tests
Reporter: Chesnay Schepler


The test fails because the exception type has changed; with the returned 
exception no longer being a ConnectException but just a plain SocketException, 
presumably because the JDK fails earlier since it can't find a route.

We could change the assertion, or change the test somehow to actually work 
against a server which doesn't allow the establishment of a connection.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25004) Disable spotless on Java 17

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25004:


 Summary: Disable spotless on Java 17
 Key: FLINK-25004
 URL: https://issues.apache.org/jira/browse/FLINK-25004
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


The current spotless version doesn't work on Java 17. An upgrade is not possible 
at this time because compatible versions no longer run on Java 8.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25005) Add java17-target profile

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25005:


 Summary: Add java17-target profile
 Key: FLINK-25005
 URL: https://issues.apache.org/jira/browse/FLINK-25005
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.15.0


Add a new profile analogous to the java11-target profile.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25006) Investigate memory issues on CI

2021-11-22 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-25006:


 Summary: Investigate memory issues on CI
 Key: FLINK-25006
 URL: https://issues.apache.org/jira/browse/FLINK-25006
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Reporter: Chesnay Schepler


During my tests with Java 17 I noticed significant speed regressions in tests 
that actually run jobs, to the point where they effectively stalled. If they 
managed to complete, then subsequent tests would also experience a significant 
slowdown.

As an experiment I disabled fork reuse, which very quickly resulted in direct 
memory OOMs.

We need to investigate what is going on. I can provide logs and a branch with 
Java 17 to anyone who is interested.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Dawid Wysakowicz
I don't think there is anything wrong with the API. My comment was about
what's behind the API. With tracking the shared files on the JM, you cannot
say whether you will be able to clear the files after a couple of checkpoints,
or after 10s, 100s or 1000s of them, which translates into
minutes/hours/days/weeks of processing.

I think we need this anyway if we choose to re-upload files once the
job is running.

Not in the same way. If you assume the 1st checkpoint needs to be "full"
you know you are not allowed to use any shared files. It's true you
should know about the shared files of the previous checkpoint, but e.g.
RocksDB already tracks that. However if you go with multiple
checkpoints, you would need to remember 1) the id of the initial
checkpoint and 2) which files it used. Assuming we restored from
chk-42 and configured "force full snapshot" after 3 checkpoints,
we would have to remember which files were used by chk-42 at the
time of chk-45 (which we don't do so far).

Best,

Dawid

On 22/11/2021 14:09, Roman Khachatryan wrote:
> Thanks Dawid,
>
> Regarding clarity,
> I think that all proposals require waiting for some event: re-upload /
> checkpoint completion / api response.
> But with the current one, there is an assumption: "initial checkpoint
> can be deleted once a new one completes" (instead of just "initial
> checkpoint can be deleted once the API says it can be deleted").
> So I think it's actually more clear to offer this explicit API and rely on it.
>
> Regarding delaying the deletion,
> I agree that it can delay deletion, but how important is it?
> Checkpoints are usually stored on relatively cheap storage like S3, so
> some delay shouldn't be an issue (especially taking rounding into
> account); it can even be cheaper or comparable to paying for
> re-upload/duplicate calls.
>
> Infinite delay can be an issue though, I agree.
> Maybe @Yun can clarify the likelihood of never deleting some SST files
> by RocksDB?
> For the changelog backend, old files won't be used once
> materialization succeeds.
>
> Yes, my concern is checkpointing time, but also added complexity:
>> It would be a bit invasive though, as we would have to somehow keep track 
>> which files should not be reused on TMs.
> I think we need this anyway if we choose to re-upload files once the
> job is running.
> The new checkpoint must be formed by re-uploaded old artifacts AND
> uploaded new artifacts.
>
>
> Regards,
> Roman
>
>
> On Mon, Nov 22, 2021 at 12:42 PM Dawid Wysakowicz
>  wrote:
>> @Yun
>>
>> I think it is a good comment with I agree in principal. However, we use 
>> --fromSavepoint (cli), savepointPath (REST API), SavepointRestoreSettings 
>> for both restoring from a savepoint and an externalized checkpoint already. 
>> I wanted to voice that concern. Nevertheless I am fine with changing it to 
>> execution.restore-mode, if there are no other comments on that matter, I 
>> will change it.
>>
>> @Roman:
>>
>> Re 1. Correct, stop-with-savepoint should commit side-effects. Will add that 
>> to the doc.
>>
>> Re.2 What I don't like about this counter proposal is that it still has no 
>> clearly defined point in time when it is safe to delete the original 
>> checkpoint. Users would have a hard time reasoning about it and debugging. 
>> Even worse, I think worst case it might never happen that all the original 
>> files are no longer in use (I am not too familiar with RocksDB compaction, 
>> but what happens if there are key ranges that are never accessed again?) I 
>> agree it is unlikely, but possible, isn't it? Definitely it can take a 
>> significant time and many checkpoints to do so.
>>
>> Re. 3 I believe where you are coming from is that you'd like to keep the 
>> checkpointing time minimal and reuploading files may increase it. The 
>> proposal so far builds on the assumption we could in most cases use a cheap 
>> duplicate API instead of re-upload. I could see this as a follow-up if it 
>> becomes a bottleneck. It would be a bit invasive though, as we would have to 
>> somehow keep track which files should not be reused on TMs.
>>
>> Re. 2 & 3 Neither of the counter proposals work well for taking incremental 
>> savepoints. We were thinking of building incremental savepoints on the same 
>> concept. I think delaying the completion of an independent savepoint to a 
>> closer undefined future is not a nice property of savepoints.
>>
>> Re 4. Good point. We should make sure the first completed checkpoint has the 
>> independent/full checkpoint property rather than just the first triggered.
>>
>> Re. 5 & 6 I need a bit more time to look into it.
>>
>> Best,
>>
>> Dawid
>>
>> On 22/11/2021 11:40, Roman Khachatryan wrote:
>>
>> Hi,
>>
>> Thanks for the proposal Dawid, I have some questions and remarks:
>>
>> 1. How will stop-with-savepoint be handled?
>> Shouldn't side effects be enforced in this case? (i.e. send
>> notifyCheckpointComplete)
>>
>> 2. Instead of forcing re-upload, can we "inverse control" in no-claim mode?
>> Anyways, an

[jira] [Created] (FLINK-25007) Session window with dynamic gap doesn't work

2021-11-22 Thread Ori Popowski (Jira)
Ori Popowski created FLINK-25007:


 Summary: Session window with dynamic gap doesn't work
 Key: FLINK-25007
 URL: https://issues.apache.org/jira/browse/FLINK-25007
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.12.0
 Environment: Local environment
Reporter: Ori Popowski


I am creating a simple application with events firing every 15 seconds. I 
created a {{SessionWindowTimeGapExtractor}} which returns 90 minutes, but after 
the 4th event it returns 1 millisecond. I expected that after the 4th event a 
session window would trigger, but that is not what happens. In reality the 
session window never triggers, even though after the 4th event the session gap 
is effectively 1 millisecond and the interval between events is 15 seconds.

 
{code:java}
import java.time.Instant

import scala.concurrent.duration._

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.source.{RichSourceFunction, SourceFunction}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.assigners.{DynamicEventTimeSessionWindows, SessionWindowTimeGapExtractor}
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object Main {

  def main(args: Array[String]): Unit = {
val senv = StreamExecutionEnvironment.getExecutionEnvironment
senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val now = Instant.now()

senv
  .addSource(new Source(now))
  .assignAscendingTimestamps(_.time.toEpochMilli)
  .keyBy(_ => 1)
  .window(DynamicEventTimeSessionWindows.withDynamicGap(new 
SessionWindowTimeGapExtractor[Element] {
override def extract(element: Element): Long = {
  if (element.sessionEnd) 1
  else 90.minutes.toMillis
}
  }))
  .process(new ProcessWindowFunction[Element, Vector[Element], Int, 
TimeWindow] {
override def process(k: Int, context: Context, elements: 
Iterable[Element], out: Collector[Vector[Element]]): Unit = {
  out.collect(elements.toVector)
}
  })
  .print()

senv.execute()
  }
}

case class Element(id: Int, time: Instant, sessionEnd: Boolean = false)

class Source(now: Instant) extends RichSourceFunction[Element] {
  @volatile private var isRunning = true
  private var totalInterval = 0L
  private var i = 0

  override def run(ctx: SourceFunction.SourceContext[Element]): Unit = {
while (isRunning) {
  val element = Element(i, now.plusMillis(totalInterval))

  if (i >= 4) ctx.collect(element.copy(sessionEnd = true))
  else ctx.collect(element)

  i += 1
  totalInterval += 15.seconds.toMillis
  Thread.sleep(15.seconds.toMillis)
}
  }

  override def cancel(): Unit = {
isRunning = false
  }
}{code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[DISCUSS] Deprecate Java 8 support

2021-11-22 Thread Chesnay Schepler
As with any Java version there comes a time when one needs to ask 
themselves for how long one intends to stick with it.


With Java 17 being released 2 months ago, and the active support for 
Java 8 ending in 4 months, it is time for us to think about that with 
regard to Java 8.


As such I'd like to start a discussion about deprecating support for 
Java 8 in 1.15.
We do not need to arrive with an exact date for the removal in this 
discussion; the main goal is to signal to users that they should 
(prepare to) migrate to Java 11 _now_.
That said, we should consider dropping the support entirely in the next 
2-3 releases.


There are of course some problems that we are already aware of, like the 
Hive/Hbase connectors that currently do not support Java 11.
However, we mustn't hold back the entire project because of external 
projects that are (way too) slow to adapt.
Maybe us deprecating Java 8 would also add a bit of pressure to get on 
with it.


There are numerous advantages that properly migrating to Java 11 would 
bring us (simplify build system, easier support for Java 17, all the API 
goodies of Java 9-11, new garbage collectors (Epsilon/ZGC/Shenandoah)).


Let me know what you think.



Re: [DISCUSS] Shall casting functions return null or throw exceptions for invalid input

2021-11-22 Thread Francesco Guardiani
> NULL in SQL essentially means "UNKNOWN", it's not as scary as a null in
java which will easily cause a NPE or some random behavior with a c++
function call.

This is true from the user point of view, except our runtime doesn't treat
null as some value where you can safely execute operations and get "noop"
results. In our runtime null is Java's null, hence causing issues and
generating NPEs here and there when nulls are not expected.

> It will really create a big mess after users upgrade their SQL jobs

This is what I don't really understand here: how does adding a configuration
option cause issues? We make it very clear in our release notes that
you need to switch that flag if you're relying on this behavior, and that's
it: if you reprocess jobs every time you upgrade, you just flip the switch
before reprocessing and you won't have any issues. If you don't reprocess,
e.g. because you use the hybrid source, either you upgrade your query or you
flip the flag, and in both cases this shouldn't generate any issue.
Since it's a big change, I also expect to keep this flag for some releases,
at least up to Flink 2.
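
To make this a bit more concrete, this is roughly what I'd expect user code to
look like once the proposal is implemented (TRY_CAST and the option name below
do not exist yet, they are only for illustration):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CastBehaviourExample {
    public static void main(String[] args) {
        TableEnvironment env =
                TableEnvironment.create(EnvironmentSettings.newInstance().build());

        // With the proposed default this fails at runtime instead of returning NULL:
        // env.executeSql("SELECT CAST('abc' AS INT)").print();

        // The old "return NULL" behaviour stays available explicitly:
        env.executeSql("SELECT TRY_CAST('abc' AS INT)").print(); // prints NULL

        // Whoever relies on the old behaviour only flips one (hypothetical) switch:
        env.getConfig().getConfiguration()
                .setString("table.exec.cast.legacy-behaviour", "true");
    }
}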

On Sat, Nov 20, 2021 at 7:25 AM Kurt Young  wrote:

> Hi Francesco,
>
> Thanks for sharing your opinion about this and examples with other
> programming
> languages. I just want to mention, that NULL in SQL world is a bit
> different with the
> meaning in programming languages like java.
>
> NULL in SQL essentially means "UNKNOWN", it's not as scary as a null in
> java which
> will easily cause a NPE or some random behavior with a c++ function call.
> UNKNOWN
> means it could be any value. In java, the condition "null == null" always
> return true. But
> in SQL, it returns NULL, which means UNKNOWN.
>
> Another example, if you run following statements:
> select 'true' where 3 in (1, 2, 3, null) // this will print true
> select 'true' where 3 not in (1, 2, null) // this won't print anything
>
> In summary, SQL's NULL is a bit different from others, it has its own
> meaning. So I won't
> compare the behavior of returning NULL with programming languages and then
> judge it
> as bad behavior. And it's not a very big deal if we return NULL when trying
> to cast "abc"
> to an integer, which means we don't know the correct value.
>
> But still, I'm ok to change the behavior, but just not now. It will really
> create a big mess after
> users upgrade their SQL jobs. I'm either fine to do it in some really big
> version change like
> Flink 2.0, or we can do it after we have some universal error records
> handling mechanism, so
> in that way, users could have a chance to handle such a situation.
>
> Best,
> Kurt
>
>
> On Fri, Nov 19, 2021 at 7:29 PM Francesco Guardiani <
> france...@ververica.com>
> wrote:
>
> > Hi all,
> >
> > tl;dr:
> >
> > I think Timo pretty much said it all. As described in the issue, my
> > proposal is:
> >
> > * Let's switch the default behavior of CAST to fail
> > * Let's add TRY_CAST to have the old behavior
> > * Let's add a rule (disabled by default) that wraps every CAST in a TRY,
> in
> > order to keep the old behavior.
> > * Let's put a giant warning in the release notes explaining to enable the
> > rule, in case you're depending on the old behavior
> >
> > This way, we break no SQL scripts, as you can apply this flag to every
> > previously running script. We can also think to another strategy, more
> than
> > the planner rule, to keep the old behavior, always behind a flag disabled
> > by default.
> >
> > Timing of this proposal is also crucial, since CAST is a basic primitive
> of
> > our language and, after we have the upgrade story in place, this is going
> > to be a whole more harder to deal with.
> >
> > And I would say that in the next future, we should start thinking to
> > support proper error handling strategies, that is:
> >
> > * How users are supposed to handle records that fails an expression
> > computation, aggregation, etc?
> > * Can we provide some default strategies, like log and discard, send to a
> > dead letter queue?
> >
> > Now let me go a bit more deep in the reason we really need such change:
> >
> > For me the issue is not really about being compliant with the SQL
> standard
> > or not, or the fact that other databases behaves differently from us, but
> > the fact that the CAST function we have right now is effectively a
> footgun
> >  for our users.
> > The concept of casting one value to another inherently involves some
> > concept of failure, this is something as a programmer I expect, exactly
> > like when dividing a value by 0 or when sending a message to an external
> > system. And this is why every programming language has some explicit way
> to
> > signal to you such failures exist and, some of them, even force you to
> deal
> > with such failures, e.g. Java has the Exceptions and the try catch block,
> > Rust has the ? operator, Golang returns you an error together with the
> > result. Not failing when a failure is inher

Re: [DISCUSS] Deprecate Java 8 support

2021-11-22 Thread Martijn Visser
Hi Chesnay,

Thanks for bringing this up for discussion. Big +1 for dropping Java 8 and
deprecating it in 1.15, given that Java 8 support will end. We already see
other dependencies that Flink uses which have either dropped support for Java 8
(Trino) or are going to drop it (Kafka).

Best regards,

Martijn

On Mon, 22 Nov 2021 at 16:28, Chesnay Schepler  wrote:

> As with any Java version there comes a time when one needs to ask
> themselves for how long one intends to stick with it.
>
> With Java 17 being released 2 months ago, and the active support for
> Java 8 ending in 4 months, it is time for us to think about that with
> regard to Java 8.
>
> As such I'd like to start a discussion about deprecating support for
> Java 8 in 1.15.
> We do not need to arrive with an exact date for the removal in this
> discussion; the main goal is to signal to users that they should
> (prepare to) migrate to Java 11 _now_.
> That said, we should consider dropping the support entirely in the next
> 2-3 releases.
>
> There are of course some problems that we are already aware of, like the
> Hive/Hbase connectors that currently do not support Java 11.
> However, we mustn't hold back the entire project because of external
> projects that are (way to) slow to adapt.
> Maybe us deprecating Java 8 would also add a bit of pressure to get on
> with it.
>
> There are numerous advantages that properly migrating to Java 11 would
> bring us (simplify build system, easier support for Java 17, all the API
> goodies of Java 9-11, new garbage collectors (Epsilon/ZGC/Shenandoah)).
>
> Let me know what you think.
>
>


Re: [DISCUSS] Deprecate Java 8 support

2021-11-22 Thread Ingo Bürk
Hi,

also a +1 from me because of everything Chesnay already said.


Ingo

On Mon, Nov 22, 2021 at 4:41 PM Martijn Visser 
wrote:

> Hi Chesnay,
>
> Thanks for bringing this up for discussion. Big +1 for dropping Java 8 and
> deprecating it in 1.15, given that Java 8 support will end. We already see
> other dependencies that Flink use either have dropped support for Java 8
> (Trino) or are going to drop it (Kafka).
>
> Best regards,
>
> Martijn
>
> On Mon, 22 Nov 2021 at 16:28, Chesnay Schepler  wrote:
>
> > As with any Java version there comes a time when one needs to ask
> > themselves for how long one intends to stick with it.
> >
> > With Java 17 being released 2 months ago, and the active support for
> > Java 8 ending in 4 months, it is time for us to think about that with
> > regard to Java 8.
> >
> > As such I'd like to start a discussion about deprecating support for
> > Java 8 in 1.15.
> > We do not need to arrive with an exact date for the removal in this
> > discussion; the main goal is to signal to users that they should
> > (prepare to) migrate to Java 11 _now_.
> > That said, we should consider dropping the support entirely in the next
> > 2-3 releases.
> >
> > There are of course some problems that we are already aware of, like the
> > Hive/Hbase connectors that currently do not support Java 11.
> > However, we mustn't hold back the entire project because of external
> > projects that are (way to) slow to adapt.
> > Maybe us deprecating Java 8 would also add a bit of pressure to get on
> > with it.
> >
> > There are numerous advantages that properly migrating to Java 11 would
> > bring us (simplify build system, easier support for Java 17, all the API
> > goodies of Java 9-11, new garbage collectors (Epsilon/ZGC/Shenandoah)).
> >
> > Let me know what you think.
> >
> >
>


Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Dawid Wysakowicz
There is one more fundamental issue with either of your two proposals
that has just come to my mind. What happens if you have externalized
checkpoints and the job fails before the initial checkpoint can be
safely removed? You end up with a retained checkpoint that was built on
top of the original one. Basically you are back in the situation we have
right now, where you never know when it is safe to delete a retained
checkpoint.

BTW, the intention for the "claim" mode was to support cases where users
are concerned about the performance of the first checkpoint. In those
cases they can claim the checkpoint and not pay the additional cost of
the first checkpoint.

Best,

Dawid

On 22/11/2021 14:09, Roman Khachatryan wrote:
> Thanks Dawid,
>
> Regarding clarity,
> I think that all proposals require waiting for some event: re-upload /
> checkpoint completion / api response.
> But with the current one, there is an assumption: "initial checkpoint
> can be deleted once a new one completes" (instead of just "initial
> checkpoint can be deleted once the API says it can be deleted").
> So I think it's actually more clear to offer this explicit API and rely on it.
>
> Regarding delaying the deletion,
> I agree that it can delay deletion, but how important is it?
> Checkpoints are usually stored on relatively cheap storage like S3, so
> some delay shouldn't be an issue (especially taking rounding into
> account); it can even be cheaper or comparable to paying for
> re-upload/duplicate calls.
>
> Infinite delay can be an issue though, I agree.
> Maybe @Yun can clarify the likelihood of never deleting some SST files
> by RocksDB?
> For the changelog backend, old files won't be used once
> materialization succeeds.
>
> Yes, my concern is checkpointing time, but also added complexity:
>> It would be a bit invasive though, as we would have to somehow keep track 
>> which files should not be reused on TMs.
> I think we need this anyway if we choose to re-upload files once the
> job is running.
> The new checkpoint must be formed by re-uploaded old artifacts AND
> uploaded new artifacts.
>
>
> Regards,
> Roman
>
>
> On Mon, Nov 22, 2021 at 12:42 PM Dawid Wysakowicz
>  wrote:
>> @Yun
>>
>> I think it is a good comment with I agree in principal. However, we use 
>> --fromSavepoint (cli), savepointPath (REST API), SavepointRestoreSettings 
>> for both restoring from a savepoint and an externalized checkpoint already. 
>> I wanted to voice that concern. Nevertheless I am fine with changing it to 
>> execution.restore-mode, if there are no other comments on that matter, I 
>> will change it.
>>
>> @Roman:
>>
>> Re 1. Correct, stop-with-savepoint should commit side-effects. Will add that 
>> to the doc.
>>
>> Re.2 What I don't like about this counter proposal is that it still has no 
>> clearly defined point in time when it is safe to delete the original 
>> checkpoint. Users would have a hard time reasoning about it and debugging. 
>> Even worse, I think worst case it might never happen that all the original 
>> files are no longer in use (I am not too familiar with RocksDB compaction, 
>> but what happens if there are key ranges that are never accessed again?) I 
>> agree it is unlikely, but possible, isn't it? Definitely it can take a 
>> significant time and many checkpoints to do so.
>>
>> Re. 3 I believe where you are coming from is that you'd like to keep the 
>> checkpointing time minimal and reuploading files may increase it. The 
>> proposal so far builds on the assumption we could in most cases use a cheap 
>> duplicate API instead of re-upload. I could see this as a follow-up if it 
>> becomes a bottleneck. It would be a bit invasive though, as we would have to 
>> somehow keep track which files should not be reused on TMs.
>>
>> Re. 2 & 3 Neither of the counter proposals work well for taking incremental 
>> savepoints. We were thinking of building incremental savepoints on the same 
>> concept. I think delaying the completion of an independent savepoint to a 
>> closer undefined future is not a nice property of savepoints.
>>
>> Re 4. Good point. We should make sure the first completed checkpoint has the 
>> independent/full checkpoint property rather than just the first triggered.
>>
>> Re. 5 & 6 I need a bit more time to look into it.
>>
>> Best,
>>
>> Dawid
>>
>> On 22/11/2021 11:40, Roman Khachatryan wrote:
>>
>> Hi,
>>
>> Thanks for the proposal Dawid, I have some questions and remarks:
>>
>> 1. How will stop-with-savepoint be handled?
>> Shouldn't side effects be enforced in this case? (i.e. send
>> notifyCheckpointComplete)
>>
>> 2. Instead of forcing re-upload, can we "inverse control" in no-claim mode?
>> Anyways, any external tool will have to poll Flink API waiting for the
>> next (full) checkpoint, before deleting the retained checkpoint,
>> right?
>> Instead, we can provide an API which tells whether the 1st checkpoint
>> is still in use (and not force re-upload it).
>>
>> Under the hood, it can wo

Re: [DISCUSS] Conventions on assertions to use in tests

2021-11-22 Thread Francesco Guardiani
Hi all,

Given I see general consensus around having a convention and using
assertj, I propose to merge these 2 PRs:

* Add the explanation of this convention in our code quality guide:
https://github.com/apache/flink-web/pull/482
* Add assertj to dependency management in the parent pom and link in the PR
template the code quality guide: https://github.com/apache/flink/pull/17871

WDYT?

Once we merge those, I'll work in the next days to add some custom
assertions in table-common for RowData and Row (commonly asserted
everywhere in the table codebase).
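
To give an idea of what I mean, a minimal sketch of such a custom assertion
could look like the following (the class and method names are made up, nothing
like this exists in table-common yet):

import org.apache.flink.table.data.RowData;
import org.assertj.core.api.AbstractAssert;

public class RowDataAssert extends AbstractAssert<RowDataAssert, RowData> {

    private RowDataAssert(RowData actual) {
        super(actual, RowDataAssert.class);
    }

    public static RowDataAssert assertThatRowData(RowData actual) {
        return new RowDataAssert(actual);
    }

    public RowDataAssert hasArity(int expected) {
        isNotNull();
        if (actual.getArity() != expected) {
            failWithMessage("Expected arity <%s> but was <%s>", expected, actual.getArity());
        }
        return this;
    }
}

Tests would then read like: assertThatRowData(row).hasArity(3);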

@Matthias Pohl  about the confluence page, it seems
a bit outdated, judging from the last modified date. I propose to continue
to use this guide
https://flink.apache.org/contributing/code-style-and-quality-common.html as
it seems more complete.


On Mon, Nov 22, 2021 at 8:58 AM Matthias Pohl 
wrote:

> Agree. Clarifying once more what our preferred option is here, is a good
> idea. So, +1 for unification. I don't have a strong opinion on what
> framework to use. But we may want to add this at the end of the discussion
> to our documentation (e.g. [1] or maybe the PR description?) to make users
> aware of it and be able to provide a reference in case it comes up again
> (besides this ML thread). Or do we already have something like that
> somewhere in the docs where I missed it?
>
> Matthias
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Best+Practices+and+Lessons+Learned
>
> On Wed, Nov 17, 2021 at 11:13 AM Marios Trivyzas  wrote:
>
>> I'm also +1 both for unification and specifically for assertJ.
>> I think it covers a wide variety of assertions and as Francesco mentioned
>> it's easily extensible, so that
>> we can create custom assertions where needed, and avoid repeating test
>> code.
>>
>> On Tue, Nov 16, 2021 at 9:57 AM David Morávek  wrote:
>>
>> > I don't have any strong opinions on the asserting framework that we use,
>> > but big +1 for the unification.
>> >
>> > Best,
>> > D.
>> >
>> > On Tue, Nov 16, 2021 at 9:37 AM Till Rohrmann 
>> > wrote:
>> >
>> > > Using JUnit5 with assertJ is fine with me if the community agrees.
>> Having
>> > > guides for best practices would definitely help with the transition.
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > > On Mon, Nov 15, 2021 at 5:34 PM Francesco Guardiani <
>> > > france...@ververica.com>
>> > > wrote:
>> > >
>> > > > > It is a bit unfortunate that we have tests that follow different
>> > > > patterns.
>> > > > This, however, is mainly due to organic growth. I think the
>> community
>> > > > started with Junit4, then we chose to use Hamcrest because of its
>> > better
>> > > > expressiveness.
>> > > >
>> > > > That is fine, I'm sorry if my mail felt like a rant :)
>> > > >
>> > > > > Personally, I don't have a strong preference for which testing
>> tools
>> > to
>> > > > use. The important bit is that we agree as a community, then
>> document
>> > the
>> > > > choice and finally stick to it. So before starting to use assertj,
>> we
>> > > > should probably align with the folks working on the Junit5 effort
>> > first.
>> > > >
>> > > > As Arvid pointed out, using assertj might help the people working on
>> > the
>> > > > junit5 effort as well, since assertj works seamlessly with junit4,
>> > junit5
>> > > > and even other java testing frameworks.
>> > > >
>> > > > > But I'm not sure if it's wise to change everything at once also
>> > > > from the perspective of less active contributors. We may alleviate
>> that
>> > > > pain by providing good guides though. So maybe, we should also
>> include
>> > a
>> > > > temporal dimension into the discussion.
>> > > >
>> > > > This is why I'm proposing a convention and not a rewrite of all the
>> > tests
>> > > > at once, that's unfeasible. As you sad, we can provide guides, like
>> in
>> > > our
>> > > > contribution guides, explaining our assertion convention, that is
>> use
>> > > > assertj or whatever other library we want to use and how. So then we
>> > can
>> > > > ask contributors to use such assertion convention when they PR new
>> > tests
>> > > or
>> > > > when they modify existing ones. Something like that:
>> > > >
>> > >
>> >
>> https://github.com/apache/flink-web/commit/87c572ccd4e0ae48eeff3eb15ad9847d302e659d
>> > > >
>> > > > On Fri, Nov 12, 2021 at 5:07 PM Arvid Heise 
>> wrote:
>> > > >
>> > > >> JUnit5 migration is currently mostly prepared. The rules are being
>> > > >> migrated
>> > > >> [1] and Hang and Qingsheng have migrated most tests in their branch
>> > > afaik
>> > > >> (Kudos to them!).
>> > > >>
>> > > >> Using assertj would make migration easier as it's independent of
>> the
>> > > JUnit
>> > > >> version. But the same can be said about hamcrest, albeit less
>> > > expressive.
>> > > >>
>> > > >> I'm personally in favor of assertj (disclaimer I contributed to the
>> > > >> project
>> > > >> a bit). But I'm not sure if it's wise to change everything at once
>> > also
>> > > >> from the perspective of less active cont

[jira] [Created] (FLINK-25008) Improve behaviour of NotNullEnforcer when dropping records

2021-11-22 Thread Marios Trivyzas (Jira)
Marios Trivyzas created FLINK-25008:
---

 Summary: Improve behaviour of NotNullEnforcer when dropping records
 Key: FLINK-25008
 URL: https://issues.apache.org/jira/browse/FLINK-25008
 Project: Flink
  Issue Type: Improvement
Reporter: Marios Trivyzas


By default *NotNullEnforcer* is configured as *ERROR*, so if a record with 
*null* value(s) for the corresponding column(s) marked as *NOT NULL* is 
processed, an error is thrown. Users can change the configuration and choose 
*DROP*, so that such records are silently dropped and do not end up in the sink.

Maybe it is worth adding another option, like *LOG_AND_DROP*, so that those 
records are not silently dropped but instead end up in some log, facilitating 
easier debugging or post-processing of the pipeline.
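
For reference, if I remember the option key correctly, the existing enforcer is
configured via {{table.exec.sink.not-null-enforcer}} as shown below; the
*LOG_AND_DROP* value in the comment is only the proposed, not yet existing,
addition:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class NotNullEnforcerExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().build());

        // Existing values are ERROR (default) and DROP:
        tEnv.getConfig().getConfiguration()
                .setString("table.exec.sink.not-null-enforcer", "DROP");

        // Proposed additional value (name to be decided):
        // tEnv.getConfig().getConfiguration()
        //         .setString("table.exec.sink.not-null-enforcer", "LOG_AND_DROP");
    }
}
{code}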



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Roman Khachatryan
> If you assume the 1st checkpoint needs to be "full" you know you are not 
> allowed to use any shared files.
> It's true you should know about the shared files of the previous checkpoint, 
> but e.g. RocksDB already tracks that.

I mean that the design described by FLIP implies the following (PCIIW):
1. treat SST files from the initial checkpoint specially: re-upload or
send placeholder - depending on those attributes in state handle
2. (SST files from newer checkpoints are re-uploaded depending on
confirmation currently; so yes there is tracking, but it's different)
3. SharedStateRegistry must allow replacing state under the existing
key; otherwise, if a new key is used then other parallel subtasks
should learn somehow this key and use it; However, allowing
replacement must be limited to this scenario, otherwise it can lead to
previous checkpoint corruption in normal cases

Forcing a full checkpoint after completing N checkpoints instead of
immediately would only require enabling (1) after N checkpoints.
And with the "poll API until checkpoint released" approach, those
changes aren't necessary.

> There is one more fundamental issue with either of your two proposals that've 
> just came to my mind.
> What happens if you have externalized checkpoints and the job fails before 
> the initial checkpoint can be safely removed?

You start the job from the latest created checkpoint and wait for it
to be allowed for deletion. Then you can delete it, and all previous
checkpoints (or am I missing something?)

> With tracking the shared files on JM you can not say if you can clear the 
> files after couple of checkpoints or 10s, 100s or 1000s,
> which translates into minutes/hours/days/weeks of processing.
This doesn't necessarily translate into higher cost (because of saved
RPC etc., as I mentioned above).
However, I do agree that an infinite or arbitrarily high delay is unacceptable.

The added complexity above doesn't seem negligible to me (especially
in SharedStateHandle); it should therefore be weighed against those
operational disadvantages (given that the number of checkpoints to
wait for is bounded in practice).

Regards,
Roman




On Mon, Nov 22, 2021 at 5:05 PM Dawid Wysakowicz  wrote:
>
> There is one more fundamental issue with either of your two proposals
> that've just came to my mind. What happens if you have externalized
> checkpoints and the job fails before the initial checkpoint can be
> safely removed? You have a situation where you have a retained
> checkpoint that was built on top of the original one. Basically ending
> in a situation we have right now that you never know when it is safe to
> delete a retained checkpoint.
>
> BTW, the intention for the "claim" mode was to support cases when users
> are concerned with the performance of the first checkpoint. In those
> cases they can claim the checkpoint on don't pay the additional cost of
> the first checkpoint.
>
> Best,
>
> Dawid
>
> On 22/11/2021 14:09, Roman Khachatryan wrote:
> > Thanks Dawid,
> >
> > Regarding clarity,
> > I think that all proposals require waiting for some event: re-upload /
> > checkpoint completion / api response.
> > But with the current one, there is an assumption: "initial checkpoint
> > can be deleted once a new one completes" (instead of just "initial
> > checkpoint can be deleted once the API says it can be deleted").
> > So I think it's actually more clear to offer this explicit API and rely on 
> > it.
> >
> > Regarding delaying the deletion,
> > I agree that it can delay deletion, but how important is it?
> > Checkpoints are usually stored on relatively cheap storage like S3, so
> > some delay shouldn't be an issue (especially taking rounding into
> > account); it can even be cheaper or comparable to paying for
> > re-upload/duplicate calls.
> >
> > Infinite delay can be an issue though, I agree.
> > Maybe @Yun can clarify the likelihood of never deleting some SST files
> > by RocksDB?
> > For the changelog backend, old files won't be used once
> > materialization succeeds.
> >
> > Yes, my concern is checkpointing time, but also added complexity:
> >> It would be a bit invasive though, as we would have to somehow keep track 
> >> which files should not be reused on TMs.
> > I think we need this anyway if we choose to re-upload files once the
> > job is running.
> > The new checkpoint must be formed by re-uploaded old artifacts AND
> > uploaded new artifacts.
> >
> >
> > Regards,
> > Roman
> >
> >
> > On Mon, Nov 22, 2021 at 12:42 PM Dawid Wysakowicz
> >  wrote:
> >> @Yun
> >>
> >> I think it is a good comment with I agree in principal. However, we use 
> >> --fromSavepoint (cli), savepointPath (REST API), SavepointRestoreSettings 
> >> for both restoring from a savepoint and an externalized checkpoint 
> >> already. I wanted to voice that concern. Nevertheless I am fine with 
> >> changing it to execution.restore-mode, if there are no other comments on 
> >> that matter, I will change it.
> >>
> >> @Roman:
> >>

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Dawid Wysakowicz
There is one more fundamental issue with either of your two
proposals that've just came to my mind.
What happens if you have externalized checkpoints and the job fails
before the initial checkpoint can be safely removed?

You start the job from the latest created checkpoint and wait for it
to be allowed for deletion. Then you can delete it, and all previous
checkpoints (or am I missing something?)


Let me clarify it with an example. You start with chk-42, and Flink takes
e.g. three checkpoints chk-43, chk-44, chk-45, which all still reference chk-42
files. After that the job fails. We have externalized checkpoints enabled,
therefore all checkpoints are retained. The user starts a new program
from, let's say, chk-45. At this point your proposal does not give the
user any help in regard to when chk-42 can be safely removed. (This is
also how Flink works right now.)

To make it even harder, you can arbitrarily complicate it: 1) start a job
from chk-44, 2) start a job from a chk-47 which depends on chk-45, 3)
never start a job from chk-44, so it is not claimed by any job, thus it is
never deleted, and users must remember themselves that chk-44 originated
from chk-42, etc. Users would be forced to build a lineage system for
checkpoints to track which checkpoints depend on each other.

I mean that the design described by FLIP implies the following (PCIIW):
1. treat SST files from the initial checkpoint specially: re-upload or
send placeholder - depending on those attributes in state handle
2. (SST files from newer checkpoints are re-uploaded depending on
confirmation currently; so yes there is tracking, but it's different)
3. SharedStateRegistry must allow replacing state under the existing
key; otherwise, if a new key is used then other parallel subtasks
should learn somehow this key and use it; However, allowing
replacement must be limited to this scenario, otherwise it can lead to
previous checkpoint corruption in normal cases

I might not understand your points, but I don't think the FLIP implies any
of this. The FLIP suggests sending a "force full checkpoint" flag along with
the CheckpointBarrier. The state backend should then respect it and not use
any of the previous shared handles. Now let me explain how that would work
for RocksDB incremental checkpoints.

 1. Simplest approach: upload all local RocksDB files. This works
    exactly the same as the first incremental checkpoint for a fresh start.
 2. Improvement on 1): we already know which files were uploaded for
    the initial checkpoint. Therefore, instead of uploading the local
    files that are the same as the files uploaded for the initial
    checkpoint, we call duplicate for those files and upload just the diff.

It does not require any changes to the SharedStateRegistry nor to state
handles, at least for RocksDB.
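
If it helps, here is a purely illustrative sketch of the per-file decision for
point 2) above (none of these types exist, this is not the actual RocksDB
snapshot code):

import java.util.Set;

final class ForcedFullSnapshotSketch {

    enum Action { DUPLICATE_ON_DFS, UPLOAD_LOCAL_FILE }

    // For a forced "full" checkpoint we never send a placeholder for a file of
    // the initial (restored) checkpoint: we either duplicate it cheaply on the
    // DFS, if the file system supports that, or re-upload it like any new file.
    static Action decide(
            String localSstFileName,
            Set<String> filesOfInitialCheckpoint,
            boolean duplicateSupported) {
        if (filesOfInitialCheckpoint.contains(localSstFileName) && duplicateSupported) {
            return Action.DUPLICATE_ON_DFS;
        }
        return Action.UPLOAD_LOCAL_FILE;
    }
}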

Best,

Dawid


On 22/11/2021 19:33, Roman Khachatryan wrote:
>> If you assume the 1st checkpoint needs to be "full" you know you are not 
>> allowed to use any shared files.
>> It's true you should know about the shared files of the previous checkpoint, 
>> but e.g. RocksDB already tracks that.
> I mean that the design described by FLIP implies the following (PCIIW):
> 1. treat SST files from the initial checkpoint specially: re-upload or
> send placeholder - depending on those attributes in state handle
> 2. (SST files from newer checkpoints are re-uploaded depending on
> confirmation currently; so yes there is tracking, but it's different)
> 3. SharedStateRegistry must allow replacing state under the existing
> key; otherwise, if a new key is used then other parallel subtasks
> should learn somehow this key and use it; However, allowing
> replacement must be limited to this scenario, otherwise it can lead to
> previous checkpoint corruption in normal cases
>
> Forcing a full checkpoint after completing N checkpoints instead of
> immediately would only require enabling (1) after N checkpoints.
> And with the "poll API until checkpoint released" approach, those
> changes aren't necessary.
>
>> There is one more fundamental issue with either of your two proposals 
>> that've just came to my mind.
>> What happens if you have externalized checkpoints and the job fails before 
>> the initial checkpoint can be safely removed?
> You start the job from the latest created checkpoint and wait for it
> to be allowed for deletion. Then you can delete it, and all previous
> checkpoints (or am I missing something?)
>
>> With tracking the shared files on JM you can not say if you can clear the 
>> files after couple of checkpoints or 10s, 100s or 1000s,
>> which translates into minutes/hours/days/weeks of processing.
> This doesn't necessarily translate into higher cost (because of saved
> RPC etc., as I mentioned above).
> However, I do agree that an infinite or arbitrary high delay is unacceptable.
>
> The added complexity above doesn't seem negligible to me (especially
> in SharedStateHandle); and should th

[jira] [Created] (FLINK-25009) Output slotSharingGroup as part of JsonGraph

2021-11-22 Thread Xinbin Huang (Jira)
Xinbin Huang created FLINK-25009:


 Summary: Output slotSharingGroup as part of JsonGraph
 Key: FLINK-25009
 URL: https://issues.apache.org/jira/browse/FLINK-25009
 Project: Flink
  Issue Type: Improvement
Reporter: Xinbin Huang


`flink info` currently doesn't output the slotSharingGroup information for each 
operator. This makes it impossible to derive the total parallelism from the JSON 
graph when slot sharing groups are explicitly set for the job.
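
For example, with a job like the one below (the slotSharingGroup/setParallelism
calls are the existing DataStream API, the job itself is just an illustration),
the JSON graph printed by `flink info` does not contain the slot sharing groups,
so the total number of required slots cannot be derived from it:

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(1, 2, 3)
                .filter(i -> i > 0).slotSharingGroup("group-a").setParallelism(4)
                .print().slotSharingGroup("group-b").setParallelism(2);
        env.execute("slot-sharing-example");
    }
}
{code}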



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Shall casting functions return null or throw exceptions for invalid input

2021-11-22 Thread Kurt Young
> This is what I don't really understand here: how adding a configuration
option causes issues here?
This is why: for most Flink production use cases I see, it's not like a
couple of people manage ~5 Flink jobs and can easily track all the big
changes in every minor Flink version. The typical use case is a group of
people managing some streaming platform, which provides Flink as an
execution engine to their users. Alibaba has more than 40K online streaming
SQL jobs, and ByteDance also has a similar number. Most of the time, whether
to upgrade the Flink version is controlled by the users of the platform, not
the platform provider, and the platform will most likely support multiple
Flink versions.

Even if you can count on the platform providers to read all the release
notes carefully, their users won't. So we are kind of throwing the
responsibility onto the platform providers, making them take care of the
semantic changes. They have to find some good way to control the impact when
their users upgrade Flink's version. And if they don't find a good solution
around this and their users encounter some online issues, they will be
blamed. And you can guess who they would blame.

Flink is a very popular engine now, every decision we make will affect the
users a lot. If you want them to make
some changes, I would argue we should make them think it's worth it.

Best,
Kurt


On Mon, Nov 22, 2021 at 11:29 PM Francesco Guardiani <
france...@ververica.com> wrote:

> > NULL in SQL essentially means "UNKNOWN", it's not as scary as a null in
> java which will easily cause a NPE or some random behavior with a c++
> function call.
>
> This is true from the user point of view, except our runtime doesn't treat
> null as some value where you can safely execute operations and get "noop"
> results. In our runtime null is Java's null, hence causing issues and
> generating NPEs here and there when nulls are not expected.
>
> > It will really create a big mess after users upgrade their SQL jobs
>
> This is what I don't really understand here: how adding a configuration
> option causes issues here? We make it very clear in our release notes that
> you need to switch that flag if you're relying on this behavior and that's
> it: if you reprocess jobs every time you upgrade, you just flip the switch
> before reprocessing and you won't have any issues. If you don't because you
> use the hybrid source, either you upgrade your query or you flip the flag
> and in both cases this shouldn't generate any issue.
> Since it's a big change, I also expect to keep this flag for some releases,
> at least up to Flink 2.
>
> On Sat, Nov 20, 2021 at 7:25 AM Kurt Young  wrote:
>
> > Hi Francesco,
> >
> > Thanks for sharing your opinion about this and examples with other
> > programming
> > languages. I just want to mention, that NULL in SQL world is a bit
> > different with the
> > meaning in programming languages like java.
> >
> > NULL in SQL essentially means "UNKNOWN", it's not as scary as a null in
> > java which
> > will easily cause a NPE or some random behavior with a c++ function call.
> > UNKNOWN
> > means it could be any value. In java, the condition "null == null" always
> > return true. But
> > in SQL, it returns NULL, which means UNKNOWN.
> >
> > Another example, if you run following statements:
> > select 'true' where 3 in (1, 2, 3, null) // this will print true
> > select 'true' where 3 not in (1, 2, null) // this won't print anything
> >
> > In summary, SQL's NULL is a bit different from others, it has its own
> > meaning. So I won't
> > compare the behavior of returning NULL with programming languages and
> then
> > judge it
> > as bad behavior. And it's not a very big deal if we return NULL when
> trying
> > to cast "abc"
> > to an integer, which means we don't know the correct value.
> >
> > But still, I'm ok to change the behavior, but just not now. It will
> really
> > create a big mess after
> > users upgrade their SQL jobs. I'm either fine to do it in some really big
> > version change like
> > Flink 2.0, or we can do it after we have some universal error records
> > handling mechanism, so
> > in that way, users could have a chance to handle such a situation.
> >
> > Best,
> > Kurt
> >
> >
> > On Fri, Nov 19, 2021 at 7:29 PM Francesco Guardiani <
> > france...@ververica.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > tl;dr:
> > >
> > > I think Timo pretty much said it all. As described in the issue, my
> > > proposal is:
> > >
> > > * Let's switch the default behavior of CAST to fail
> > > * Let's add TRY_CAST to have the old behavior
> > > * Let's add a rule (disabled by default) that wraps every CAST in a
> TRY,
> > in
> > > order to keep the old behavior.
> > > * Let's put a giant warning in the release notes explaining to enable
> the
> > > rule, in case you're depending on the old behavior
> > >
> > > This way, we break no SQL scripts, as you can apply this flag to every
> > > previously running script. We ca

Re: [DISCUSS] Deprecate Java 8 support

2021-11-22 Thread Jingsong Li
Hi Chesnay,

Thanks for bringing this for discussion.

We should dig deeper into which Java versions Flink users currently run. At
least we should make sure Java 8 is no longer a mainstream version.

Receiving this signal, users may be unhappy because their applications
may still be entirely on Java 8. Upgrading is a big job; after all, many
systems have not been upgraded yet (like you said, HBase and Hive).

In my opinion, it is too early to deprecate support for Java 8. We
should wait for a safer point in time.

On Mon, Nov 22, 2021 at 11:45 PM Ingo Bürk  wrote:
>
> Hi,
>
> also a +1 from me because of everything Chesnay already said.
>
>
> Ingo
>
> On Mon, Nov 22, 2021 at 4:41 PM Martijn Visser 
> wrote:
>
> > Hi Chesnay,
> >
> > Thanks for bringing this up for discussion. Big +1 for dropping Java 8 and
> > deprecating it in 1.15, given that Java 8 support will end. We already see
> > other dependencies that Flink use either have dropped support for Java 8
> > (Trino) or are going to drop it (Kafka).
> >
> > Best regards,
> >
> > Martijn
> >
> > On Mon, 22 Nov 2021 at 16:28, Chesnay Schepler  wrote:
> >
> > > As with any Java version there comes a time when one needs to ask
> > > themselves for how long one intends to stick with it.
> > >
> > > With Java 17 being released 2 months ago, and the active support for
> > > Java 8 ending in 4 months, it is time for us to think about that with
> > > regard to Java 8.
> > >
> > > As such I'd like to start a discussion about deprecating support for
> > > Java 8 in 1.15.
> > > We do not need to arrive with an exact date for the removal in this
> > > discussion; the main goal is to signal to users that they should
> > > (prepare to) migrate to Java 11 _now_.
> > > That said, we should consider dropping the support entirely in the next
> > > 2-3 releases.
> > >
> > > There are of course some problems that we are already aware of, like the
> > > Hive/Hbase connectors that currently do not support Java 11.
> > > However, we mustn't hold back the entire project because of external
> > > projects that are (way to) slow to adapt.
> > > Maybe us deprecating Java 8 would also add a bit of pressure to get on
> > > with it.
> > >
> > > There are numerous advantages that properly migrating to Java 11 would
> > > bring us (simplify build system, easier support for Java 17, all the API
> > > goodies of Java 9-11, new garbage collectors (Epsilon/ZGC/Shenandoah)).
> > >
> > > Let me know what you think.
> > >
> > >
> >



-- 
Best, Jingsong Lee


Re: [DISCUSS] FLIP-190: Support Version Upgrades for Table API & SQL Programs

2021-11-22 Thread godfrey he
Hi Timo,

Thanks for driving this discussion; upgrade compatibility for SQL jobs has
always been a big pain point. In the last version we completed some work, and
this FLIP will make the whole upgrade story possible.

I have a few comments:
1) "EXPLAIN PLAN EXECUTE STATEMENT SET BEGIN ... END" is missing.
It would be better to add this syntax and make the API more complete.

2) About the annotation of the ExecNode: it's hard to maintain the supported
versions for "supportedPlanChanges" and "supportedSavepointChanges".
Imagine that, when we are upgrading Flink from 1.15 to 1.16, most ExecNodes are
not changed (a high-probability scenario), but we would need to add the
supported version (1.16) to most (even all) ExecNodes manually. Considering
that the supported versions are continuous, we only need to annotate the start
version (when the ExecNode is introduced) and the end version (when a change is
no longer compatible and a new ExecNode with a new version needs to be
introduced) for supportedPlanChanges and supportedSavepointChanges,
e.g. supportedSavepointChanges = {start=1_15, end=1_16}
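
Just to illustrate the shape I have in mind (all names below are made up, this
is not a concrete API proposal):

// purely illustrative, none of these types exist
enum FlinkVersion { V1_15, V1_16, CURRENT }

@interface SupportedSavepointChanges {
    FlinkVersion start(); // version in which the ExecNode was introduced
    FlinkVersion end() default FlinkVersion.CURRENT; // last version before a new ExecNode version is needed
}

// usage on an ExecNode:
// @SupportedSavepointChanges(start = FlinkVersion.V1_15, end = FlinkVersion.V1_16)
// class StreamExecCalc { ... }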

Best,
Godfrey

wenlong.lwl  于2021年11月22日周一 下午9:07写道:
>
> Hi, Timo, thanks for driving the discussion and the preparation on the
> FLIP. This is a pain point of Flink SQL complaining by our users badly. I
> have  seen many cases where our users suffer while trying to upgrade the
> flink  version in order to take advantage of the bug fixes and performance
> improvements on the new version. It often takes a long time verifying the
> new plan,  reoptimizing the config, recomputing the state,  waiting for a
> safe point to make the new job active in production, etc. There are many
> times that new problems show up in upgrading.
>
> I have a question on COMPILE AND EXECUTE. It doesn't look so good that we
> just execute the plan and ignore the statement when the plan already
> exists, but the plan and SQL are not matched. The result would be quite
> confusing if we still execute the plan directly, we may need to add a
> validation. Personally I would prefer not to provide such a shortcut, let
> users use  COMPILE PLAN IF NOT EXISTS and EXECUTE explicitly, which can be
> understood by new users even without inferring the docs.
>
> Best,
> Wenlong


[jira] [Created] (FLINK-25010) Speed up hive's createMRSplits by multi thread

2021-11-22 Thread Liu (Jira)
Liu created FLINK-25010:
---

 Summary: Speed up hive's createMRSplits by multi thread
 Key: FLINK-25010
 URL: https://issues.apache.org/jira/browse/FLINK-25010
 Project: Flink
  Issue Type: Improvement
Reporter: Liu


We have thousands of Hive partitions and the method createMRSplits can take a 
lot of time, for example ten minutes. We can speed up the process by using 
multiple threads to handle different partitions in parallel.
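
A rough sketch of the idea (purely illustrative, not the actual Hive connector code):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelSplitCreation {

    /** Creates the splits of all partitions in parallel instead of in a sequential loop. */
    public static <P, S> List<S> createSplits(
            List<P> partitions, Function<P, List<S>> splitsForPartition, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<S>>> futures = new ArrayList<>();
            for (P partition : partitions) {
                Callable<List<S>> task = () -> splitsForPartition.apply(partition);
                futures.add(pool.submit(task));
            }
            List<S> splits = new ArrayList<>();
            for (Future<List<S>> future : futures) {
                splits.addAll(future.get());
            }
            return splits;
        } finally {
            pool.shutdown();
        }
    }
}
{code}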



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-25011) Introduce VertexParallelismDecider

2021-11-22 Thread Lijie Wang (Jira)
Lijie Wang created FLINK-25011:
--

 Summary: Introduce VertexParallelismDecider
 Key: FLINK-25011
 URL: https://issues.apache.org/jira/browse/FLINK-25011
 Project: Flink
  Issue Type: Sub-task
Reporter: Lijie Wang


Introduce VertexParallelismDecider and provide a default implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Improve the name and structure of job vertex and operator name for job

2021-11-22 Thread Yun Tang
Hi Wenlong,

Thanks for the update. And +1 for this proposal.

Best
Yun Tang

On 2021/11/22 07:28:25 "wenlong.lwl" wrote:
> Hi,  Yun Gao, Yun Tang, and Aitozi,
> thanks for the suggestion again, I have added following section in proposed
> changes at
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-195%3A+Improve+the+name+and+structure+of+vertex+and+operator+name+for+job
> to
> add a prefix  [vertex-idx] for vertex name. please checkout to see whether
> you have more comments, thanks.
> 
> Add a prefix [vertex-idx]  for vertex name, so that we can easily match the
> name at WebUI and the content at log
> 
>1. idx will be the index of the job vertex order by node id of its head
>operator at StreamGraph, so the idx will not change when the program does
>not change.
> 
> 
> On Mon, 22 Nov 2021 at 14:20, Yun Gao  wrote:
> 
> > Hi,
> >
> > Very thanks Wenlong for bringing up this issue and very thanks for the
> > warm discussion! Big +1 to improve the name and structure of job vertex for
> > it make the life of the developers much easier and shorten the operator
> > name would saves a lot of money for reducing the amount of logs to store.
> >
> > And for "[vertex-x]", perhaps we could distinguish between the different
> > use cases:
> > 1. For debugging and locate the tasks / operators, it would indeed help a
> > lot, especially when there are tasks with similar name.
> > 2. For metric names, perhaps users want to keep the same name across
> > multiple runs, even if the job graph have changed.
> >
> > Thus might it be possible we instead add the [vertex-x] to the task level,
> > and have a separate index field for the JobVertex, the index would be
> > presented in the UI / thread name and some critical logs, while we still
> > leave users to have full control of the operator names?
> >
> > Best,
> > Yun
> >
> >
> > --
> > From:Aitozi 
> > Send Time:2021 Nov. 22 (Mon.) 00:01
> > To:dev 
> > Subject:Re: [DISCUSS] Improve the name and structure of job vertex and
> > operator name for job
> >
> > Hi Wenlong
> >
> > I think it's a nice work, big +1 for it (non-binding).
> >
> > Thanks to Yun Tang for mention our internal version improvements on vertex
> > name display.
> > I think add a "[vertex-x]" prefix can tell people the information about the
> > vertex sequence. The vertex-id and node-id are at different abstract
> > level after all.
> > And add an "[vertex-x]" prefix will not cause too much consume of word
> > count :) . So I'm + 1 to add "[vertex-x]" prefix to the vertex name.
> >
> > Best,
> > Aitozi
> >
> >
> > 刘建刚  于2021年11月20日周六 下午6:27写道:
> >
> > > +1 for the FLIP. We have met the problem that a long name stuck the
> > metric
> > > collection for SQL jobs.
> > >
> > > wenlong.lwl  于2021年11月19日周五 下午10:29写道:
> > >
> > > > hi, yun,
> > > > Thanks for the suggestion, but I am not sure whether we need such a
> > > prefix
> > > > or not, because the log has included vertex id, when the name is
> > concise
> > > > enough, we can get the vertex id easily.
> > > > Does anyone have some comments on this?
> > > >
> > > >
> > > > Best,
> > > > Wenlong Lyu
> > > >
> > > > On Thu, 18 Nov 2021 at 19:03, Yun Tang  wrote:
> > > >
> > > > > Hi Wenlong,
> > > > >
> > > > > Thanks for bringing up this discussion and I believe many guys have
> > > ever
> > > > > suffered from the long and unreadable operator name for long time.
> > > > >
> > > > > I have another suggestion which inspired by Aitozi, that we could add
> > > > some
> > > > > hint to tell the vertex index. Such as make the pipeline from "source
> > > -->
> > > > > flatMap --> sink" to "[vertex-0] souce --> [vertex-1] flatMap -->
> > > > > [vertex-2] sink".
> > > > > This could make user or developer much easier to know which vertex is
> > > > > wrong when meeting exceptions.
> > > > >
> > > > > Best
> > > > > Yun Tang
> > > > >
> > > > > On 2021/11/17 07:42:28 godfrey he wrote:
> > > > > > Hi Wenlong, I'm fine with the config options.
> > > > > >
> > > > > > Best,
> > > > > > Godfrey
> > > > > >
> > > > > > wenlong.lwl  于2021年11月17日周三 下午3:13写道:
> > > > > >
> > > > > > >
> > > > > > > Hi Chesney and Konstantin,
> > > > > > > thanks for your feedback, I have added a section about How we
> > > support
> > > > > set
> > > > > > > description at DataStream API in the doc.
> > > > > > >
> > > > > > >
> > > > > > > Bests,
> > > > > > > Wenlong
> > > > > > >
> > > > > > > On Tue, 16 Nov 2021 at 21:05, Konstantin Knauf <
> > kna...@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > Thanks for starting this discussion. I am in favor of solving
> > > this
> > > > > for
> > > > > > > > DataStream and Table API at the same time, using the same
> > > > > configuration
> > > > > > > > keys. IMO we shouldn't introduce any additional fragmentation
> > if
> > > we
> > > > > can
> > > > > > > > avoid it.
> > > > > > > >
> > > > > > > > Cheers,
> 

Re: [DISCUSS] FLIP-193: Snapshots ownership

2021-11-22 Thread Yun Tang
Hi,

Regarding the likelihood of RocksDB never deleting some SST files: unfortunately, 
it could happen, as the current level compaction strategy in RocksDB is triggered 
by the upper input level size reaching a threshold, and the compaction priority 
cannot guarantee that all files will be chosen during several rounds of compaction.

Actually, I am a bit in favor of this FLIP making Flink manage checkpoints, as we 
have heard from many users that they cannot delete older checkpoints after several 
rounds of re-launching Flink jobs. Currently Flink does not delete older checkpoints 
automatically when restoring from an older retained checkpoint, which makes the base 
checkpoint directory grow larger and larger. However, if users decide to delete the 
older checkpoint directories of other job ids, they might not be able to recover from 
the last completed checkpoint, as it might depend on some artifacts in an older 
checkpoint directory.

And I think re-uploading would indeed increase the duration of the 1st checkpoint 
after restoring. For Aliyun OSS, the developers said that copying files (larger than 
32MB) from one location to another within the same bucket on the DFS could take 
hundreds of milliseconds. However, from my experience, copying on HDFS might not be 
so quick. Maybe some concrete numbers here would help.

I have two questions here:
1. If the 1st full checkpoint does not complete in the end, the next checkpoints 
have to try to re-upload all artifacts again. I think this problem could be 
mitigated if the task knows that some files have already been uploaded before.
2. Some users restore several different jobs from the same externalized 
checkpoint. I think this usage would break after introducing this FLIP, and we 
must tell users explicitly if we choose to make Flink manage the checkpoints by 
default.

Best
Yun Tang


On 2021/11/22 19:49:11 Dawid Wysakowicz wrote:
> There is one more fundamental issue with either of your two
> proposals that've just came to my mind.
> What happens if you have externalized checkpoints and the job fails
> before the initial checkpoint can be safely removed?
> 
> You start the job from the latest created checkpoint and wait for it
> to be allowed for deletion. Then you can delete it, and all previous
> checkpoints (or am I missing something?)
> 
> 
> Let me clarify it with an example. You start with chk-42, Flink takes
> e.g. three checkpoints chk-43, chk-44, chk-45 all still reference chk-42
> files. After that it fails. We have externalized checkpoints enabled,
> therefore we have retained all checkpoints. Users starts a new program
> from let's say chk-45. At this point your proposal does not give the
> user any help in regards when chk-42 can be safely removed. (This is
> also how Flink works right now).
> 
> To make it even harder you can arbitrarily complicate it, 1) start a job
> from chk-44, 2) start a job from a chk-47 which depends on chk-45, 3)
> never start a job from chk-44, it is not claimed by any job, thus it is
> never deleted, users must remember themselves that chk-44 originated
> from chk-42 etc.) User would be forced to build a lineage system for
> checkpoints to track which checkpoints depend on each other.
> 
> I mean that the design described by FLIP implies the following (PCIIW):
> 1. treat SST files from the initial checkpoint specially: re-upload or
> send placeholder - depending on those attributes in state handle
> 2. (SST files from newer checkpoints are re-uploaded depending on
> confirmation currently; so yes there is tracking, but it's different)
> 3. SharedStateRegistry must allow replacing state under the existing
> key; otherwise, if a new key is used then other parallel subtasks
> should learn somehow this key and use it; However, allowing
> replacement must be limited to this scenario, otherwise it can lead to
> previous checkpoint corruption in normal cases
> 
> I might not understand your points, but I don't think FLIP implies any
> of this. The FLIP suggests to send along with the CheckpointBarrier a
> flag "force full checkpoint". Then the state backend should respect it
> and should not use any of the previous shared handles. Now let me
> explain how that would work for RocksDB incremental checkpoints.
> 
>  1. Simplest approach: upload all local RocksDB files. This works
> exactly the same as the first incremental checkpoint for a fresh start.
>  2. Improvement on 1) we already do know which files were uploaded for
> the initial checkpoint. Therefore instead of uploading the local
> files that are same with files uploaded for the initial checkpoint
> we call duplicate for those files and upload just the diff.
> 
> It does not require any changes to the SharedStateRegistry nor to state
> handles, at least for RocksDB.
> 
> Best,
> 
> Dawid
> 
> 
> On 22/11/2021 19:33, Roman Khachatryan wrote:
> >> If you assume the 1st checkpoint needs to be "full" you know you are not 
> >> allowed to use a

Re: [DISCUSS] Deprecate Java 8 support

2021-11-22 Thread David Morávek
Thank you Chesnay for starting the discussion! This will generate a bit of
work for some users, but it's a good thing to keep moving the project
forward. Big +1 for this.

Jingsong:

Receiving this signal, the user may be unhappy because his application
> may be all on Java 8. Upgrading is a big job, after all, many systems
> have not been upgraded yet. (Like you said, HBase and Hive)
>

The whole point of deprecation is to raise awareness that this will be
happening eventually and that users should take steps to address it in the
medium term. If I understand Chesnay correctly, we'd still keep Java 8
around for quite some time to give users enough time to upgrade, but
without raising awareness we'd fight the very same argument later in time.

All of the prerequisites from 3rd party projects for both HBase [1] and
Hive [2] to fully support Java 11 have been completed, so the ball is on
their side and there doesn't seem to be much activity. Generating a bit more
pressure on these efforts might be a good thing.

It would be great to identify some of these users and learn a bit more about
their situation. Are they keeping up with the latest Flink developments or are
they lagging behind (this would also give them way more time for eventual
upgrade)?

[1] https://issues.apache.org/jira/browse/HBASE-22972
[2] https://issues.apache.org/jira/browse/HIVE-22415

Best,
D.

On Tue, Nov 23, 2021 at 3:08 AM Jingsong Li  wrote:

> Hi Chesnay,
>
> Thanks for bringing this for discussion.
>
> We should dig deeper into the current Java version of Flink users. At
> least make sure Java 8 is not a mainstream version.
>
> Receiving this signal, the user may be unhappy because his application
> may be all on Java 8. Upgrading is a big job, after all, many systems
> have not been upgraded yet. (Like you said, HBase and Hive)
>
> In my opinion, it is too early to deprecate support for Java 8. We
> should wait for a safer point in time.
>
> On Mon, Nov 22, 2021 at 11:45 PM Ingo Bürk  wrote:
> >
> > Hi,
> >
> > also a +1 from me because of everything Chesnay already said.
> >
> >
> > Ingo
> >
> > On Mon, Nov 22, 2021 at 4:41 PM Martijn Visser 
> > wrote:
> >
> > > Hi Chesnay,
> > >
> > > Thanks for bringing this up for discussion. Big +1 for dropping Java 8
> and
> > > deprecating it in 1.15, given that Java 8 support will end. We already
> see
> > > other dependencies that Flink use either have dropped support for Java
> 8
> > > (Trino) or are going to drop it (Kafka).
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Mon, 22 Nov 2021 at 16:28, Chesnay Schepler 
> wrote:
> > >
> > > > As with any Java version there comes a time when one needs to ask
> > > > themselves for how long one intends to stick with it.
> > > >
> > > > With Java 17 being released 2 months ago, and the active support for
> > > > Java 8 ending in 4 months, it is time for us to think about that with
> > > > regard to Java 8.
> > > >
> > > > As such I'd like to start a discussion about deprecating support for
> > > > Java 8 in 1.15.
> > > > We do not need to arrive with an exact date for the removal in this
> > > > discussion; the main goal is to signal to users that they should
> > > > (prepare to) migrate to Java 11 _now_.
> > > > That said, we should consider dropping the support entirely in the
> next
> > > > 2-3 releases.
> > > >
> > > > There are of course some problems that we are already aware of, like
> the
> > > > Hive/Hbase connectors that currently do not support Java 11.
> > > > However, we mustn't hold back the entire project because of external
> > > > projects that are (way to) slow to adapt.
> > > > Maybe us deprecating Java 8 would also add a bit of pressure to get
> on
> > > > with it.
> > > >
> > > > There are numerous advantages that properly migrating to Java 11
> would
> > > > bring us (simplify build system, easier support for Java 17, all the
> API
> > > > goodies of Java 9-11, new garbage collectors
> (Epsilon/ZGC/Shenandoah)).
> > > >
> > > > Let me know what you think.
> > > >
> > > >
> > >
>
>
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] Deprecate Java 8 support

2021-11-22 Thread Matthias Pohl
Thanks for constantly driving these maintenance topics, Chesnay. +1 from my
side for deprecating Java 8. I see the point Jingsong is raising. But I
agree with what David already said here. Deprecating the Java version is a
tool to make users aware of it (same as starting this discussion thread).
If there's no major opposition to deprecating it in the community, we
should move forward in this regard to make the users who do not
regularly browse the mailing list aware of it. That said, deprecating Java
8 in 1.15 does not necessarily mean that it is dropped in 1.16.

Best,
Matthias

On Tue, Nov 23, 2021 at 8:46 AM David Morávek  wrote:

> Thank you Chesnay for starting the discussion! This will generate bit of a
> work for some users, but it's a good thing to keep moving the project
> forward. Big +1 for this.
>
> Jingsong:
>
> Receiving this signal, the user may be unhappy because his application
> > may be all on Java 8. Upgrading is a big job, after all, many systems
> > have not been upgraded yet. (Like you said, HBase and Hive)
> >
>
> The whole point of deprecation is to raise awareness, that this will be
> happening eventually and users should take some steps to address this in
> medium-term. If I understand Chesnay correctly, we'd still keep Java 8
> around for quite some time to give users enough time to upgrade, but
> without raising awareness we'd fight the very same argument later in time.
>
> All of the prerequisites from 3rd party projects for both HBase [1] and
> Hive [2] to fully support Java 11 have been completed, so the ball is on
> their side and there doesn't seem to be much activity. Generating bit more
> pressure on these efforts might be a good thing.
>
> It would be great to identify some of these users and learn bit more about
> their situation. Are they keeping up with latest Flink developments or are
> they lagging behind (this would also give them way more time for eventual
> upgrade)?
>
> [1] https://issues.apache.org/jira/browse/HBASE-22972
> [2] https://issues.apache.org/jira/browse/HIVE-22415
>
> Best,
> D.
>
> On Tue, Nov 23, 2021 at 3:08 AM Jingsong Li 
> wrote:
>
> > Hi Chesnay,
> >
> > Thanks for bringing this for discussion.
> >
> > We should dig deeper into the current Java version of Flink users. At
> > least make sure Java 8 is not a mainstream version.
> >
> > Receiving this signal, the user may be unhappy because his application
> > may be all on Java 8. Upgrading is a big job, after all, many systems
> > have not been upgraded yet. (Like you said, HBase and Hive)
> >
> > In my opinion, it is too early to deprecate support for Java 8. We
> > should wait for a safer point in time.
> >
> > On Mon, Nov 22, 2021 at 11:45 PM Ingo Bürk  wrote:
> > >
> > > Hi,
> > >
> > > also a +1 from me because of everything Chesnay already said.
> > >
> > >
> > > Ingo
> > >
> > > On Mon, Nov 22, 2021 at 4:41 PM Martijn Visser 
> > > wrote:
> > >
> > > > Hi Chesnay,
> > > >
> > > > Thanks for bringing this up for discussion. Big +1 for dropping Java
> 8
> > and
> > > > deprecating it in 1.15, given that Java 8 support will end. We
> already
> > see
> > > > other dependencies that Flink use either have dropped support for
> Java
> > 8
> > > > (Trino) or are going to drop it (Kafka).
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Mon, 22 Nov 2021 at 16:28, Chesnay Schepler 
> > wrote:
> > > >
> > > > > As with any Java version there comes a time when one needs to ask
> > > > > themselves for how long one intends to stick with it.
> > > > >
> > > > > With Java 17 being released 2 months ago, and the active support
> for
> > > > > Java 8 ending in 4 months, it is time for us to think about that
> with
> > > > > regard to Java 8.
> > > > >
> > > > > As such I'd like to start a discussion about deprecating support
> for
> > > > > Java 8 in 1.15.
> > > > > We do not need to arrive with an exact date for the removal in this
> > > > > discussion; the main goal is to signal to users that they should
> > > > > (prepare to) migrate to Java 11 _now_.
> > > > > That said, we should consider dropping the support entirely in the
> > next
> > > > > 2-3 releases.
> > > > >
> > > > > There are of course some problems that we are already aware of,
> like
> > the
> > > > > Hive/Hbase connectors that currently do not support Java 11.
> > > > > However, we mustn't hold back the entire project because of
> external
> > > > > projects that are (way to) slow to adapt.
> > > > > Maybe us deprecating Java 8 would also add a bit of pressure to get
> > on
> > > > > with it.
> > > > >
> > > > > There are numerous advantages that properly migrating to Java 11
> > would
> > > > > bring us (simplify build system, easier support for Java 17, all
> the
> > API
> > > > > goodies of Java 9-11, new garbage collectors
> > (Epsilon/ZGC/Shenandoah)).
> > > > >
> > > > > Let me know what you think.
> > > > >
> > > > >
> > > >
> >
> >
> >
> > --
> > Best, Jingsong Lee
> >


Re: [ANNOUNCE] New Apache Flink Committer - Yangze Guo

2021-11-22 Thread Matthias Pohl
Congratulations :-)

On Tue, Nov 16, 2021 at 2:33 PM Benchao Li  wrote:

> Congratulations Yangze~
>
> Dawid Wysakowicz  于2021年11月15日周一 下午10:28写道:
>
> > Congrats!
> >
> > On 15/11/2021 15:22, Marios Trivyzas wrote:
> > > Congrats Yangze!
> > >
> > > On Mon, Nov 15, 2021 at 1:34 PM Martijn Visser 
> > > wrote:
> > >
> > >> Congrats Yangze!
> > >>
> > >> On Mon, 15 Nov 2021 at 09:13, Yangze Guo  wrote:
> > >>
> > >>> Thank you all very much!
> > >>> It's my honor to work with you in such a great community.
> > >>>
> > >>> Best,
> > >>> Yangze Guo
> > >>>
> > >>> On Mon, Nov 15, 2021 at 2:08 PM Yang Wang 
> > wrote:
> >  Congratulations Yangze!
> > 
> >  Best,
> >  Yang
> > 
> >  Yu Li  于2021年11月15日周一 下午2:01写道:
> > > Congratulations, Yangze!
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Mon, 15 Nov 2021 at 12:30, Rui Li 
> wrote:
> > >
> > >> Congratulations Yangze!
> > >>
> > >> On Mon, Nov 15, 2021 at 11:44 AM wenlong.lwl <
> > >> wenlong88@gmail.com
> > >> wrote:
> > >>
> > >>> Congratulations, Yangze!
> > >>>
> > >>> On Mon, 15 Nov 2021 at 10:37, Zhilong Hong  >
> > >>> wrote:
> >  Congratulations, Yangze!
> > 
> >  On Mon, Nov 15, 2021 at 10:13 AM Qingsheng Ren <
> > >>> renqs...@gmail.com>
> > >>> wrote:
> > > Congratulations Yangze!
> > >
> > > --
> > > Best Regards,
> > >
> > > Qingsheng Ren
> > > Email: renqs...@gmail.com
> > > On Nov 12, 2021, 10:11 AM +0800, Xintong Song <
> > >>> tonysong...@gmail.com
> > >>> ,
> > > wrote:
> > >> Hi everyone,
> > >>
> > >> On behalf of the PMC, I'm very happy to announce Yangze Guo
> > >>> as a
> > >> new
> > > Flink
> > >> committer.
> > >>
> > >> Yangze has been consistently contributing to this project
> > >> for
> > >> almost
> > >>> 3
> > >> years. His contributions are mainly in the resource
> > >>> management and
> > >> deployment areas, represented by the fine-grained resource
> > >> management
> >  and
> > >> external resource framework. In addition to feature works,
> > >>> he's
> > >> also
> > > active
> > >> in miscellaneous contributions, including PR reviews,
> > >> document
> > > enhancement,
> > >> mailing list services and meetup/FF talks.
> > >>
> > >> Please join me in congratulating Yangze Guo for becoming a
> > >>> Flink
> > > committer!
> > >> Thank you~
> > >>
> > >> Xintong Song
> > >>
> > >> --
> > >> Best regards!
> > >> Rui Li
> > >>
> > >
> >
> >
>
> --
>
> Best,
> Benchao Li


Re: [ANNOUNCE] New Apache Flink Committer - Fabian Paul

2021-11-22 Thread Matthias Pohl
Congratulations, Fabian. :-)

On Tue, Nov 16, 2021 at 5:08 PM Zakelly Lan  wrote:

> Congratulations, Fabian
>
> Best,
> Zakelly
>
> On Tue, Nov 16, 2021 at 11:21 PM liwei li  wrote:
>
> > Congratulations Fabian :-)
> >
> > Piotr Nowojski  于2021年11月16日周二 下午11:16写道:
> >
> > > Congratulations :)
> > >
> > > wt., 16 lis 2021 o 16:04 Yang Wang  napisał(a):
> > >
> > > > Congratulations Fabian!
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > Fabian Paul  于2021年11月16日周二 下午3:57写道:
> > > >
> > > > > Thanks for the warm welcome, I am looking forward to continuing
> > > > > working with you all.
> > > > >
> > > > > Best,
> > > > > Fabian
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-24976) sink utils not check the schema info between query and sink table

2021-11-22 Thread xiaodao (Jira)
xiaodao created FLINK-24976:
---

 Summary: sink utils not check the schema info between  query and 
sink table
 Key: FLINK-24976
 URL: https://issues.apache.org/jira/browse/FLINK-24976
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Planner
Affects Versions: 1.12.5
Reporter: xiaodao


The SQL looks like this:
{code:sql}
CREATE TABLE source (
    id        INT,
    name      STRING,
    PROCTIME AS PROCTIME()
) WITH (
    'connector' = 'kafka',
    'topic' = 'da',
    'properties.bootstrap.servers' = 'localhost:9092',
    'properties.group.id' = 'test',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json',
    'json.timestamp-format.standard' = 'SQL'
);

CREATE TABLE MyResultTable (
    id INT,
    name STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://localhost:3306/test',
    'table-name' = 'users',
    'username' = 'root',
    'password' = 'abc123'
);

INSERT INTO MyResultTable SELECT id AS idx, name, age FROM source;
{code}
In this SQL, the sink table has the fields "id" and "name", but the query
result exposes "idx" and "name".

The SQL executes without errors.

My question is: why are the field names of the query not validated against
the field names of the sink table? Not doing so can easily lead to mistakes
when there are many fields.
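
Purely as an illustration of the kind of check being asked for (this is not the
actual planner code, and the class/method names below are made up), such a name
validation could look roughly like this:

{code:java}
import java.util.Arrays;
import java.util.List;

/** Standalone sketch: compare query column names against sink column names by position. */
public final class SinkFieldNameCheck {

    static void validateFieldNames(List<String> queryFields, List<String> sinkFields) {
        if (queryFields.size() != sinkFields.size()) {
            throw new IllegalStateException(
                    "Column count mismatch: query has " + queryFields.size()
                            + " columns, sink has " + sinkFields.size());
        }
        for (int i = 0; i < queryFields.size(); i++) {
            if (!queryFields.get(i).equalsIgnoreCase(sinkFields.get(i))) {
                throw new IllegalStateException(
                        "Field name mismatch at position " + i + ": query column '"
                                + queryFields.get(i) + "' vs sink column '"
                                + sinkFields.get(i) + "'");
            }
        }
    }

    public static void main(String[] args) {
        // Mirrors the example above: the query exposes "idx" where the sink declares "id".
        validateFieldNames(Arrays.asList("idx", "name"), Arrays.asList("id", "name"));
    }
}
{code}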

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [ANNOUNCE] New Apache Flink Committer - Jing Zhang

2021-11-22 Thread Matthias Pohl
Congratulations :-)

On Thu, Nov 18, 2021 at 3:23 AM Jingsong Li  wrote:

> Congratulations, Jing! Well deserved!
>
> On Wed, Nov 17, 2021 at 3:00 PM Lincoln Lee 
> wrote:
> >
> > Congratulations, Jing!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jing Zhang  于2021年11月17日周三 上午10:24写道:
> >
> > > Thanks to everyone. It's my honor to work in community with you all.
> > >
> > > Best,
> > > Jing Zhang
> > >
> > > Zakelly Lan  于2021年11月17日周三 上午12:06写道:
> > >
> > > > Congratulations,  Jing!
> > > >
> > > > Best,
> > > > Zakelly
> > > >
> > > > On Tue, Nov 16, 2021 at 11:03 PM Yang Wang 
> > > wrote:
> > > >
> > > >> Congratulations,  Jing!
> > > >>
> > > >> Best,
> > > >> Yang
> > > >>
> > > >> Benchao Li  于2021年11月16日周二 下午9:31写道:
> > > >>
> > > >> > Congratulations Jing~
> > > >> >
> > > >> > OpenInx  于2021年11月16日周二 下午1:58写道:
> > > >> >
> > > >> > > Congrats Jing!
> > > >> > >
> > > >> > > On Tue, Nov 16, 2021 at 11:59 AM Terry Wang  >
> > > >> wrote:
> > > >> > >
> > > >> > > > Congratulations,  Jing!
> > > >> > > > Well deserved!
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Terry Wang
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > > 2021年11月16日 上午11:27,Zhilong Hong  写道:
> > > >> > > > >
> > > >> > > > > Congratulations, Jing!
> > > >> > > > >
> > > >> > > > > Best regards,
> > > >> > > > > Zhilong Hong
> > > >> > > > >
> > > >> > > > > On Mon, Nov 15, 2021 at 9:41 PM Martijn Visser <
> > > >> > mart...@ververica.com>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > >> Congratulations Jing!
> > > >> > > > >>
> > > >> > > > >> On Mon, 15 Nov 2021 at 14:39, Timo Walther <
> twal...@apache.org
> > > >
> > > >> > > wrote:
> > > >> > > > >>
> > > >> > > > >>> Hi everyone,
> > > >> > > > >>>
> > > >> > > > >>> On behalf of the PMC, I'm very happy to announce Jing
> Zhang
> > > as a
> > > >> > new
> > > >> > > > >>> Flink committer.
> > > >> > > > >>>
> > > >> > > > >>> Jing has been very active in the Flink community esp. in
> the
> > > >> > > Table/SQL
> > > >> > > > >>> area for quite some time: 81 PRs [1] in total and is also
> > > >> active on
> > > >> > > > >>> answering questions on the user mailing list. She is
> currently
> > > >> > > > >>> contributing a lot around the new windowing table-valued
> > > >> functions
> > > >> > > [2].
> > > >> > > > >>>
> > > >> > > > >>> Please join me in congratulating Jing Zhang for becoming a
> > > Flink
> > > >> > > > >> committer!
> > > >> > > > >>>
> > > >> > > > >>> Thanks,
> > > >> > > > >>> Timo
> > > >> > > > >>>
> > > >> > > > >>> [1] https://github.com/apache/flink/pulls/beyond1920
> > > >> > > > >>> [2] https://issues.apache.org/jira/browse/FLINK-23997
> > > >> > > > >>>
> > > >> > > > >>
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> >
> > > >> > Best,
> > > >> > Benchao Li
> > > >> >
> > > >>
> > > >
> > >
>
>
>
> --
> Best, Jingsong Lee
>


Re: [ANNOUNCE] New Apache Flink Committer - Jing Zhang

2021-11-22 Thread 刘建刚
Congratulations!

Matthias Pohl  于2021年11月22日周一 下午4:10写道:

> Congratulations :-)
>
> On Thu, Nov 18, 2021 at 3:23 AM Jingsong Li 
> wrote:
>
> > Congratulations, Jing! Well deserved!
> >
> > On Wed, Nov 17, 2021 at 3:00 PM Lincoln Lee 
> > wrote:
> > >
> > > Congratulations, Jing!
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Jing Zhang  于2021年11月17日周三 上午10:24写道:
> > >
> > > > Thanks to everyone. It's my honor to work in community with you all.
> > > >
> > > > Best,
> > > > Jing Zhang
> > > >
> > > > Zakelly Lan  于2021年11月17日周三 上午12:06写道:
> > > >
> > > > > Congratulations,  Jing!
> > > > >
> > > > > Best,
> > > > > Zakelly
> > > > >
> > > > > On Tue, Nov 16, 2021 at 11:03 PM Yang Wang 
> > > > wrote:
> > > > >
> > > > >> Congratulations,  Jing!
> > > > >>
> > > > >> Best,
> > > > >> Yang
> > > > >>
> > > > >> Benchao Li  于2021年11月16日周二 下午9:31写道:
> > > > >>
> > > > >> > Congratulations Jing~
> > > > >> >
> > > > >> > OpenInx  于2021年11月16日周二 下午1:58写道:
> > > > >> >
> > > > >> > > Congrats Jing!
> > > > >> > >
> > > > >> > > On Tue, Nov 16, 2021 at 11:59 AM Terry Wang <
> zjuwa...@gmail.com
> > >
> > > > >> wrote:
> > > > >> > >
> > > > >> > > > Congratulations,  Jing!
> > > > >> > > > Well deserved!
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Terry Wang
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > > 2021年11月16日 上午11:27,Zhilong Hong 
> 写道:
> > > > >> > > > >
> > > > >> > > > > Congratulations, Jing!
> > > > >> > > > >
> > > > >> > > > > Best regards,
> > > > >> > > > > Zhilong Hong
> > > > >> > > > >
> > > > >> > > > > On Mon, Nov 15, 2021 at 9:41 PM Martijn Visser <
> > > > >> > mart...@ververica.com>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > >> Congratulations Jing!
> > > > >> > > > >>
> > > > >> > > > >> On Mon, 15 Nov 2021 at 14:39, Timo Walther <
> > twal...@apache.org
> > > > >
> > > > >> > > wrote:
> > > > >> > > > >>
> > > > >> > > > >>> Hi everyone,
> > > > >> > > > >>>
> > > > >> > > > >>> On behalf of the PMC, I'm very happy to announce Jing
> > Zhang
> > > > as a
> > > > >> > new
> > > > >> > > > >>> Flink committer.
> > > > >> > > > >>>
> > > > >> > > > >>> Jing has been very active in the Flink community esp. in
> > the
> > > > >> > > Table/SQL
> > > > >> > > > >>> area for quite some time: 81 PRs [1] in total and is
> also
> > > > >> active on
> > > > >> > > > >>> answering questions on the user mailing list. She is
> > currently
> > > > >> > > > >>> contributing a lot around the new windowing table-valued
> > > > >> functions
> > > > >> > > [2].
> > > > >> > > > >>>
> > > > >> > > > >>> Please join me in congratulating Jing Zhang for
> becoming a
> > > > Flink
> > > > >> > > > >> committer!
> > > > >> > > > >>>
> > > > >> > > > >>> Thanks,
> > > > >> > > > >>> Timo
> > > > >> > > > >>>
> > > > >> > > > >>> [1] https://github.com/apache/flink/pulls/beyond1920
> > > > >> > > > >>> [2] https://issues.apache.org/jira/browse/FLINK-23997
> > > > >> > > > >>>
> > > > >> > > > >>
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> >
> > > > >> > Best,
> > > > >> > Benchao Li
> > > > >> >
> > > > >>
> > > > >
> > > >
> >
> >
> >
> > --
> > Best, Jingsong Lee
> >
>