[jira] [Created] (FLINK-33491) Support JSON column validation

2023-11-09 Thread ouyangwulin (Jira)
ouyangwulin created FLINK-33491:
---

 Summary: Support JSON column validation
 Key: FLINK-33491
 URL: https://issues.apache.org/jira/browse/FLINK-33491
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / Runtime
Affects Versions: 1.8.4, 1.9.4
Reporter: ouyangwulin
 Fix For: 1.8.4, 1.9.4








[jira] [Created] (FLINK-33492) Fix unavailable links in connector download page

2023-11-09 Thread Yubin Li (Jira)
Yubin Li created FLINK-33492:


 Summary: Fix unavailable links in connector download page
 Key: FLINK-33492
 URL: https://issues.apache.org/jira/browse/FLINK-33492
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.19.0
Reporter: Yubin Li


There are several unavailable connector download links (HBase, Kafka, etc.):

https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/downloads/





Re: [DISCUSS] Release flink-connector-pulsar 4.1.0

2023-11-09 Thread Martijn Visser
Hi Tison,

I would be +1 for releasing it, but does it include support for Flink
1.18? I think that the tests still failed for it, and I think that
support should be in place before releasing a new version of the
connector. What do you think?

Best regards,

Martijn

On Thu, Nov 9, 2023 at 7:09 AM Leonard Xu  wrote:
>
> Hey, Tison.
>
> +1 to release  flink-connector-pulsar 4.1.0.
>
> I’m glad to offer help for the release.
>
>
> Best,
> Leonard
>
>
>
> > On Nov 9, 2023, at 1:30 PM, tison wrote:
> >
> > Hi,
> >
> > I'd propose to cut a new release for flink-connector-pulsar 4.1.0[1].
> >
> > From the last release (4.0.0), we mainly achieved:
> >
> > 1. Implement table connector (integrated with Flink SQL)
> > 2. Drop the requirement for using adminURL
> > 3. Support JDK 11
> >
> > I can help in driving the release but perhaps we need some more PMC
> > members' attention and help.
> >
> > What do you think?
> >
> > Best,
> > tison.
> >
> > [1] https://github.com/apache/flink-connector-pulsar
>


[DISCUSSION] flink-connector-shared-utils release process

2023-11-09 Thread Etienne Chauchot

Hi all,

flink-connector-shared-utils contains utilities for connectors (parent 
pom, CI scripts, a template test connector project, etc.). It is divided 
into 2 main branches:


- parent_pom [1], containing just a pom.xml

- ci_utils [2], containing the test project that uses the parent pom and the 
CI scripts.



The problem is that when we want to test changes to the parent pom in the 
test project, we need to release org.apache.flink:flink-connector-parent 
so that the test project in the other branch can use the updated parent. 
Triggering a release just for testing seems bad.


So I would like to propose setting up a snapshot for 
org.apache.flink:flink-connector-parent with regular deployment to 
https://repository.apache.org/content/repositories/snapshots, like we do 
for Flink.


An alternative could be to keep using 
io.github.zentol.flink:flink-connector-parent for testing and release 
this special artifact when needed by the ci_utils test project.


WDYT?


[1] https://github.com/apache/flink-connector-shared-utils/tree/parent_pom

[2] https://github.com/apache/flink-connector-shared-utils/tree/ci_utils

Best

Etienne


[jira] [Created] (FLINK-33493) Elasticsearch connector ElasticsearchWriterITCase test failed

2023-11-09 Thread Yuxin Tan (Jira)
Yuxin Tan created FLINK-33493:
-

 Summary: Elasticsearch connector ElasticsearchWriterITCase test 
failed
 Key: FLINK-33493
 URL: https://issues.apache.org/jira/browse/FLINK-33493
 Project: Flink
  Issue Type: Bug
  Components: Connectors / ElasticSearch
Reporter: Yuxin Tan


When I ran the tests, this test failed. The failure reason is:

{code:java}
Error:  
/home/runner/work/flink-connector-elasticsearch/flink-connector-elasticsearch/flink-connector-elasticsearch-base/src/test/java/org/apache/flink/connector/elasticsearch/sink/ElasticsearchWriterITCase.java:[197,46]
 cannot find symbol
  symbol:   method 
mock(org.apache.flink.metrics.MetricGroup,org.apache.flink.metrics.groups.OperatorIOMetricGroup)
  location: class 
org.apache.flink.runtime.metrics.groups.InternalSinkWriterMetricGroup
Error:  
/home/runner/work/flink-connector-elasticsearch/flink-connector-elasticsearch/flink-connector-elasticsearch-base/src/test/java/org/apache/flink/connector/elasticsearch/sink/ElasticsearchWriterITCase.java:[273,46]
 cannot find symbol
  symbol:   method mock(org.apache.flink.metrics.MetricGroup)
{code}

https://github.com/apache/flink-connector-elasticsearch/actions/runs/6809899863/job/18517273714?pr=77#step:13:134.

ElasticsearchWriterITCase calls Flink's "InternalSinkWriterMetricGroup#mock", 
which was renamed in https://github.com/apache/flink/pull/23541 (in Flink 
1.19), so the test fails.





[jira] [Created] (FLINK-33494) FLIP-376: Add DISTRIBUTED BY clause

2023-11-09 Thread Timo Walther (Jira)
Timo Walther created FLINK-33494:


 Summary: FLIP-376: Add DISTRIBUTED BY clause
 Key: FLINK-33494
 URL: https://issues.apache.org/jira/browse/FLINK-33494
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Timo Walther








[jira] [Created] (FLINK-33495) Add catalog and connector ability API and validation

2023-11-09 Thread Timo Walther (Jira)
Timo Walther created FLINK-33495:


 Summary: Add catalog and connector ability API and validation
 Key: FLINK-33495
 URL: https://issues.apache.org/jira/browse/FLINK-33495
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther


Add API infra before adjusting the parser:

- CatalogTable
- CatalogTable.Builder
- TableDistribution
- SupportsBucketing

This includes validation.
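
For orientation, a rough sketch of how these pieces could fit together once 
implemented (all signatures below are assumptions based on the names above, 
not the final API):

{code:java}
// Hypothetical usage of the proposed API surface; names from the list above,
// signatures assumed.
CatalogTable table =
        CatalogTable.newBuilder()
                .schema(schema) // an org.apache.flink.table.api.Schema, assumed given
                .distribution(
                        // e.g. hash-distribute into 4 buckets by "user_id" (assumed factory)
                        TableDistribution.ofHash(Collections.singletonList("user_id"), 4))
                .build();

// A sink that supports the clause would opt in via the ability interface,
// and validation would reject DISTRIBUTED BY for sinks that don't:
class MyTableSink implements DynamicTableSink, SupportsBucketing { /* ... */ }
{code}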





[jira] [Created] (FLINK-33496) Expose DISTRIBUTED BY clause via parser

2023-11-09 Thread Timo Walther (Jira)
Timo Walther created FLINK-33496:


 Summary: Expose DISTRIBUTED BY clause via parser
 Key: FLINK-33496
 URL: https://issues.apache.org/jira/browse/FLINK-33496
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther
Assignee: Timo Walther


Expose DISTRIBUTED BY clause via parser and TableDescriptor.





[jira] [Created] (FLINK-33497) Update the Kafka connector to support DISTRIBUTED BY clause

2023-11-09 Thread Timo Walther (Jira)
Timo Walther created FLINK-33497:


 Summary: Update the Kafka connector to support DISTRIBUTED BY 
clause
 Key: FLINK-33497
 URL: https://issues.apache.org/jira/browse/FLINK-33497
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Kafka
Reporter: Timo Walther


The Kafka connector can be one of the first connectors supporting the 
DISTRIBUTED BY clause. The clause can be translated into 'key.fields' and 
'properties.num.partitions' in the WITH clause.
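
For illustration, the end state could look like this (a sketch of the FLIP-376 
syntax and the mapping described above; nothing here is implemented yet):

{code:java}
// Proposed DDL (FLIP-376 syntax) issued via an existing TableEnvironment `tableEnv`:
tableEnv.executeSql(
        "CREATE TABLE Orders (order_id BIGINT, payload STRING) "
                + "DISTRIBUTED BY (order_id) INTO 4 BUCKETS "
                + "WITH ("
                + "  'connector' = 'kafka',"
                + "  'topic' = 'orders',"
                + "  'properties.bootstrap.servers' = 'localhost:9092',"
                + "  'format' = 'json')");

// For Kafka, the clause would be roughly equivalent to configuring today:
//   'key.fields' = 'order_id'
//   'properties.num.partitions' = '4'
{code}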





[jira] [Created] (FLINK-33498) Do not allow manually triggering incremental checkpoints with full checkpoint configured

2023-11-09 Thread Zakelly Lan (Jira)
Zakelly Lan created FLINK-33498:
---

 Summary: Do not allow manually triggering incremental checkpoints 
with full checkpoint configured
 Key: FLINK-33498
 URL: https://issues.apache.org/jira/browse/FLINK-33498
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Reporter: Zakelly Lan
Assignee: Zakelly Lan
 Fix For: 1.19.0


Currently, when a job is configured to run with incremental checkpoints 
disabled, a user who manually triggers an incremental checkpoint will 
actually get a full checkpoint. That is because the files from a full 
checkpoint cannot be shared with an incremental checkpoint. So we'd better 
throw an exception somewhere around {{CheckpointCoordinator}} and fail the 
request in this case.
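
A sketch of the intended guard (illustrative only; the actual field and method 
names in {{CheckpointCoordinator}} will differ):

{code:java}
// Reject manual incremental-checkpoint requests when the job runs with
// full checkpoints; all names here are assumptions.
public CompletableFuture<CompletedCheckpoint> triggerCheckpoint(CheckpointType type) {
    if (type == CheckpointType.INCREMENTAL && !incrementalCheckpointsEnabled) {
        return FutureUtils.completedExceptionally(
                new IllegalArgumentException(
                        "Cannot trigger an incremental checkpoint: the job is configured "
                                + "for full checkpoints, whose files cannot be shared with "
                                + "incremental checkpoints."));
    }
    return triggerCheckpointInternal(type); // existing trigger path (assumed name)
}
{code}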





[RESULT][VOTE] FLIP-376: Add DISTRIBUTED BY clause

2023-11-09 Thread Timo Walther

Hi everyone,

The voting time for [VOTE] FLIP-376: Add DISTRIBUTED BY clause[1] has 
passed. I'm closing the vote now.


There were 10 +1 votes, of which 8 were binding:

- Jing Ge (binding)
- Martijn Visser (binding)
- Lincoln Lee (binding)
- Benchao Li (binding)
- Dawid Wysakowicz (binding)
- Jim Hughes (non-binding)
- Jingsong Li (binding)
- Zhanghao Chen (non-binding)
- Sergey Nuyanzin (binding)
- Leonard Xu (binding)

There were no -1 votes.

Thus, FLIP-376 has been accepted.

Thanks everyone for joining the discussion and giving feedback!

[1] https://lists.apache.org/thread/cft2d9jc2lr8gv6dyyz7b62188mf07sj

Cheers,
Timo


On 08.11.23 10:14, Leonard Xu wrote:

+1(binding)

Best,
Leonard


On Nov 8, 2023, at 1:05 PM, Sergey Nuyanzin wrote:

+1 (binding)

On Wed, Nov 8, 2023 at 6:02 AM Zhanghao Chen 
wrote:


+1 (non-binding)

Best,
Zhanghao Chen

From: Timo Walther 
Sent: Monday, November 6, 2023 19:38
To: dev 
Subject: [VOTE] FLIP-376: Add DISTRIBUTED BY clause

Hi everyone,

I'd like to start a vote on FLIP-376: Add DISTRIBUTED BY clause[1] which
has been discussed in this thread [2].

The vote will be open for at least 72 hours unless there is an objection
or not enough votes.

[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
[2] https://lists.apache.org/thread/z1p4f38x9ohv29y4nlq8g1wdxwfjwxd1

Cheers,
Timo




--
Best regards,
Sergey






Re: Request a release of flink-connector-kafka version 3.1.0 (to consume kafka 3.4.0 with Flink 1.18)

2023-11-09 Thread Martijn Visser
Hi,

The CVE is related to the Kafka Connect API and I think of that as a
false-positive for the Flink Kafka connector. I would be inclined to
preferably get https://issues.apache.org/jira/browse/FLINK-32197 in,
and then do a release afterwards. But I would like to understand from
Mason if he thinks that's feasible.

Best regards,

Martijn

On Tue, Nov 7, 2023 at 9:45 AM Jean-Marc Paulin  wrote:
>
> Hi,
>
> I had a chat on "[FLINK-31599] Update kafka version to 3.4.0" (PR #11 in 
> apache/flink-connector-kafka on GitHub).
>
> We are consuming Flink 1.18 and flink-connector-kafka 3.0.1.
> The Kafka client 3.2.3 currently in use has the CVE-2023-25194 
> vulnerability, which is addressed in Kafka 3.4.0. We will need to move to 
> Kafka 3.4.0 for our customers. I have tried to consume Kafka client 3.4.0, 
> but that fails after a while. I tracked that down to a change required in 
> the flink-connector-kafka source code. PR #11 above has the required 
> changes and is merged into main, but it is not currently released.
>
> I would really appreciate if you could release a newer version of the 
> flink-connector-kafka that would enable us to use Kafka 3.4.0.
>
> Many thanks
>
> JM
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU


[jira] [Created] (FLINK-33499) Addressing the stability of org.apache.flink.cep.operator.CEPOperatorTest.testCEPOperatorCleanupEventTime

2023-11-09 Thread Krishna Anandan Ganesan (Jira)
Krishna Anandan Ganesan created FLINK-33499:
---

 Summary: Addressing the stability of 
org.apache.flink.cep.operator.CEPOperatorTest.testCEPOperatorCleanupEventTime
 Key: FLINK-33499
 URL: https://issues.apache.org/jira/browse/FLINK-33499
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Reporter: Krishna Anandan Ganesan


I am proposing to fix the flaky behavior observed in:

`org.apache.flink.cep.operator.CEPOperatorTest.testCEPOperatorCleanupEventTime`

*STEPS TO REPRODUCE THE ISSUE:*

Using the [NonDex|https://github.com/TestingResearchIllinois/NonDex] plugin, 
the test can be executed as follows:

{code}
mvn -pl flink-libraries/flink-cep edu.illinois:nondex-maven-plugin:2.1.1:nondex 
-Dtest=org.apache.flink.cep.operator.CEPOperatorTest#testCEPOperatorCleanupEventTime
{code}

The following error was seen:

{code}
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.561 s 
<<< FAILURE! - in org.apache.flink.cep.operator.CEPOperatorTest
[ERROR] 
org.apache.flink.cep.operator.CEPOperatorTest.testCEPOperatorCleanupEventTime  
Time elapsed: 0.536 s  <<< FAILURE!
java.lang.AssertionError: expected: but 
was:
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:120)
at org.junit.Assert.assertEquals(Assert.java:146)
at 
org.apache.flink.cep.operator.CEPOperatorTest.verifyPattern(CEPOperatorTest.java:1177)
at 
org.apache.flink.cep.operator.CEPOperatorTest.testCEPOperatorCleanupEventTime(CEPOperatorTest.java:644)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
at 
org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42)
at 
org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80)
at 
org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:147)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:127)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:90)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:55)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:102)
at 
org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:54)
at 
org.junit.platform.launcher.core.DefaultLauncher.execute(De

[jira] [Created] (FLINK-33500) Make storing the JobGraph an asynchronous operation

2023-11-09 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33500:
-

 Summary: Make storing the JobGraph an asynchronous operation
 Key: FLINK-33500
 URL: https://issues.apache.org/jira/browse/FLINK-33500
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.17.1, 1.18.0, 1.19.0
Reporter: Matthias Pohl


Currently, submitting a job starts with storing the JobGraph (in HA setups) in 
the {{{}JobGraphStore{}}}. This includes writing the file to S3 (or some other 
remote file system). The job submission is done in the {{{}Dispatcher{}}}'s 
main thread. If writing the {{JobGraph}} is slow, it would block any other 
operation on the {{{}Dispatcher{}}}. See 
[Dispatcher#persistAndRunJob|https://github.com/apache/flink/blob/52cbeb90f32ca36c59590df1daa6748995c9b7f8/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L645]
 as code reference.

This Jira issue is about moving the job submission into the {{ioExecutor}} as 
an asynchronous call.
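
A minimal sketch of the intended shape (field and method names are assumed, 
not the actual {{Dispatcher}} code):

{code:java}
// Before (simplified): the blocking write happens on the Dispatcher's main thread.
//   jobGraphStore.putJobGraph(jobGraph);
//   runJob(jobGraph);

// After: run the blocking write on the ioExecutor, then continue on the main thread.
CompletableFuture<Void> persistFuture =
        CompletableFuture.runAsync(
                () -> {
                    try {
                        jobGraphStore.putJobGraph(jobGraph); // slow remote write, e.g. to S3
                    } catch (Exception e) {
                        throw new CompletionException(e);
                    }
                },
                ioExecutor);
persistFuture.thenRunAsync(() -> runJob(jobGraph), getMainThreadExecutor());
{code}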





Re: [DISCUSS] FLIP-390: Support System out and err to be redirected to LOG or discarded

2023-11-09 Thread Piotr Nowojski
Hi Rui,

> I see java8 doesn't have `Arrays.equals(int[] a, int aFromIndex, int
> aToIndex, int[] b, int bFromIndex, int bToIndex)`,
> and java11 has it. Do you have any other suggestions for java8?

Maybe use `ByteBuffer.wrap`?

ByteBuffer.wrap(array, ..., ...).equals(ByteBuffer.wrap(array2, ..., ...))

This shouldn't have overheads as far as I remember.

Or implement your own loop? It shouldn't be more than a couple of lines.
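
Both options as a minimal Java 8 sketch (array/offset names made up):

// Option 1: java.nio.ByteBuffer, no copying; equals() compares the wrapped ranges.
boolean equal = ByteBuffer.wrap(a, aFrom, len).equals(ByteBuffer.wrap(b, bFrom, len));

// Option 2: a hand-written loop.
static boolean rangeEquals(byte[] a, int aFrom, byte[] b, int bFrom, int len) {
    for (int i = 0; i < len; i++) {
        if (a[aFrom + i] != b[bFrom + i]) {
            return false;
        }
    }
    return true;
}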

Best,
Piotrek

On Thu, Nov 9, 2023 at 06:43 Rui Fan <1996fan...@gmail.com> wrote:

> Hi Piotr, Archit, Feng and Hangxiang:
>
> Thanks a lot for your feedback!
>
> Following is my comment, please correct me if I misunderstood anything!
>
> To Piotr:
>
> > Is there a reason why you are suggesting to copy out bytes from `buf` to
> `bytes`,
> > instead of using `Arrays.equals(int[] a, int aFromIndex, int aToIndex,
> int[] b, int bFromIndex, int bToIndex)`?
>
> I see java8 doesn't have `Arrays.equals(int[] a, int aFromIndex, int
> aToIndex, int[] b, int bFromIndex, int bToIndex)`,
> and java11 has it. Do you have any other suggestions for java8?
>
> Also, this code doesn't run in production. As the comment of
> System.lineSeparator():
>
> > On UNIX systems, it returns {@code "\n"}; on Microsoft
> > Windows systems it returns {@code "\r\n"}.
>
> So Mac and Linux just return one character, we will compare
> one byte directly.
>
>
>
> To Feng:
>
> > Will they be written to the taskManager.log file by default
> > or the taskManager.out file?
>
> I prefer LOG as the default value for taskmanager.system-out.mode.
> It's useful for job stability and doesn't introduce significant impact to
> users. Also, our production has already used this feature for
> more than 1 years, it works well.
>
> However, I write the DEFAULT as the default value for
> taskmanager.system-out.mode, because when the community introduces
> new options, the default value often selects the original behavior.
>
> Looking forward to hear more thoughts from community about this
> default value.
>
> > If we can make taskmanager.out splittable and rolling, would it be
> > easier for users to use this feature?
>
> Making taskmanager.out splittable and rolling is a good choice!
> I have some concerns about it:
>
> 1. Users may also want to use LOG.info in their code and just
>   accidentally use System.out.println. It is possible that they will
>   also find the logs directly in taskmanager.log.
> 2. I'm not sure whether the rolling strategy is easy to implement.
>   If we do it, it's necessary to define a series of flink options similar
>   to log options, such as: fileMax(how many files should be retained),
>   fileSize(The max size each file), fileNamePatten (The suffix of file
> name),
> 3. Check the file size periodically: all logs are written by log plugin,
>   they can check the log file size after writing. However, System.out
>   are written directly. And flink must start a thread to check the latest
>   taskmanager.out size periodically. If it's too quick, most of job aren't
>   necessary. If it's too slow, the file size cannot be controlled properly.
>
> Redirect it to LOG.info may be a reasonable and easy choice.
> The user didn't really want to log into taskmanager.out, it just
> happened by accident.
>
>
>
> To Hangxiang:
>
> > 1. I have a similar concern as Feng. Will we redirect to another log file
> > not taskManager.log ?
>
> Please see my last comment, thanks!
>
> > taskManager.log contains lots of important information like init log. It
> > will be rolled quickly if we redirect out and error here.
>
> IIUC, this issue isn't caused by System.out, and it can happen if user
> call a lot of LOG.info. As I mentioned before: the user didn't really want
> to log into taskmanager.out, it just happened by accident.
> So, if users change the System.out to LOG.info, it still happen.
>
> > 2. Since we have redirected to LOG mode, Could we also log the subtask
> info
> > ? It may help us to debug granularly.
>
> I'm not sure what `log the subtask info` means. Let me confirm with you
> first.
> Do you mean like this: LOG.info("taskName {} : {}", taskName,
> userLogContext)?
>
> Best,
> Rui
>
> On Thu, Nov 9, 2023 at 11:47 AM Hangxiang Yu  wrote:
>
> > Hi, Rui.
> > Thanks for the proposal. It sounds reasonable.
> > I have some questions, PTAL:
> > 1. I have a similar concern as Feng. Will we redirect to another log file
> > not taskManager.log ?
> > taskManager.log contains lots of important information like init log. It
> > will be rolled quickly if we redirect out and error here.
> > 2. Since we have redirected to LOG mode, Could we also log the subtask
> info
> > ? It may help us to debug granularly.
> >
> > On Thu, Nov 9, 2023 at 9:47 AM Feng Jin  wrote:
> >
> > > Hi, Rui.
> > >
> > > Thank you for initiating this proposal.
> > >
> > > I have a question regarding redirecting stdout and stderr to LOG:
> > >
> > > Will they be written to the taskManager.log file by default or the
> > > taskManager.out file?
> > > If we ca

[jira] [Created] (FLINK-33501) Rely on Maven wrapper instead of having custom Maven installation logic

2023-11-09 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33501:
-

 Summary: Rely on Maven wrapper instead of having custom Maven 
installation logic
 Key: FLINK-33501
 URL: https://issues.apache.org/jira/browse/FLINK-33501
 Project: Flink
  Issue Type: Improvement
  Components: Build System / CI
Affects Versions: 1.17.1, 1.18.0, 1.19.0
Reporter: Matthias Pohl


I noticed that we could use the Maven wrapper instead of having custom setup 
logic for Maven in CI.





Re: [DISCUSS] FLIP-390: Support System out and err to be redirected to LOG or discarded

2023-11-09 Thread Rui Fan
Hi Piotr,

Thanks for your feedback!

> Or implement your own loop? It shouldn't be more than a couple of lines.

Implementing it directly is fine; I have updated the FLIP.
This logic can be found in the `isLineEnded` method.
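
Roughly, the idea is the following (a sketch of the shape, not the exact FLIP code):

// Does buf[0..count) end with the platform line separator?
private static boolean isLineEnded(byte[] buf, int count) {
    byte[] sep = System.lineSeparator().getBytes(java.nio.charset.StandardCharsets.UTF_8);
    if (count < sep.length) {
        return false;
    }
    for (int i = 0; i < sep.length; i++) {
        if (buf[count - sep.length + i] != sep[i]) {
            return false;
        }
    }
    return true;
}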

Best,
Rui

On Thu, Nov 9, 2023 at 11:00 PM Piotr Nowojski 
wrote:

> Hi Rui,
>
> > I see java8 doesn't have `Arrays.equals(int[] a, int aFromIndex, int
> > aToIndex, int[] b, int bFromIndex, int bToIndex)`,
> > and java11 has it. Do you have any other suggestions for java8?
>
> Maybe use `ByteBuffer.wrap`?
>
> ByteBuffer.wrap(array, ..., ...).equals(ByteBuffer.wrap(array2, ..., ...))
>
> This shouldn't have overheads as far as I remember.
>
> Or implement your own loop? It shouldn't be more than a couple of lines.
>
> Best,
> Piotrek
>
> On Thu, Nov 9, 2023 at 06:43 Rui Fan <1996fan...@gmail.com> wrote:
>
> > Hi Piotr, Archit, Feng and Hangxiang:
> >
> > Thanks a lot for your feedback!
> >
> > Following is my comment, please correct me if I misunderstood anything!
> >
> > To Piotr:
> >
> > > Is there a reason why you are suggesting to copy out bytes from `buf`
> to
> > `bytes`,
> > > instead of using `Arrays.equals(int[] a, int aFromIndex, int aToIndex,
> > int[] b, int bFromIndex, int bToIndex)`?
> >
> > I see java8 doesn't have `Arrays.equals(int[] a, int aFromIndex, int
> > aToIndex, int[] b, int bFromIndex, int bToIndex)`,
> > and java11 has it. Do you have any other suggestions for java8?
> >
> > Also, this code doesn't run in production. As the comment of
> > System.lineSeparator():
> >
> > > On UNIX systems, it returns {@code "\n"}; on Microsoft
> > > Windows systems it returns {@code "\r\n"}.
> >
> > So Mac and Linux just return one character, we will compare
> > one byte directly.
> >
> >
> >
> > To Feng:
> >
> > > Will they be written to the taskManager.log file by default
> > > or the taskManager.out file?
> >
> > I prefer LOG as the default value for taskmanager.system-out.mode.
> > It's useful for job stability and doesn't introduce significant impact to
> > users. Also, our production has already used this feature for
> > more than 1 years, it works well.
> >
> > However, I write the DEFAULT as the default value for
> > taskmanager.system-out.mode, because when the community introduces
> > new options, the default value often selects the original behavior.
> >
> > Looking forward to hear more thoughts from community about this
> > default value.
> >
> > > If we can make taskmanager.out splittable and rolling, would it be
> > > easier for users to use this feature?
> >
> > Making taskmanager.out splittable and rolling is a good choice!
> > I have some concerns about it:
> >
> > 1. Users may also want to use LOG.info in their code and just
> >   accidentally use System.out.println. It is possible that they will
> >   also find the logs directly in taskmanager.log.
> > 2. I'm not sure whether the rolling strategy is easy to implement.
> >   If we do it, it's necessary to define a series of flink options similar
> >   to log options, such as: fileMax(how many files should be retained),
> >   fileSize(The max size each file), fileNamePatten (The suffix of file
> > name),
> > 3. Check the file size periodically: all logs are written by log plugin,
> >   they can check the log file size after writing. However, System.out
> >   are written directly. And flink must start a thread to check the latest
> >   taskmanager.out size periodically. If it's too quick, most of job
> aren't
> >   necessary. If it's too slow, the file size cannot be controlled
> properly.
> >
> > Redirect it to LOG.info may be a reasonable and easy choice.
> > The user didn't really want to log into taskmanager.out, it just
> > happened by accident.
> >
> >
> >
> > To Hangxiang:
> >
> > > 1. I have a similar concern as Feng. Will we redirect to another log
> file
> > > not taskManager.log ?
> >
> > Please see my last comment, thanks!
> >
> > > taskManager.log contains lots of important information like init log.
> It
> > > will be rolled quickly if we redirect out and error here.
> >
> > IIUC, this issue isn't caused by System.out, and it can happen if user
> > call a lot of LOG.info. As I mentioned before: the user didn't really
> want
> > to log into taskmanager.out, it just happened by accident.
> > So, if users change the System.out to LOG.info, it still happen.
> >
> > > 2. Since we have redirected to LOG mode, Could we also log the subtask
> > info
> > > ? It may help us to debug granularly.
> >
> > I'm not sure what `log the subtask info` means. Let me confirm with you
> > first.
> > Do you mean like this: LOG.info("taskName {} : {}", taskName,
> > userLogContext)?
> >
> > Best,
> > Rui
> >
> > On Thu, Nov 9, 2023 at 11:47 AM Hangxiang Yu 
> wrote:
> >
> > > Hi, Rui.
> > > Thanks for the proposal. It sounds reasonable.
> > > I have some questions, PTAL:
> > > 1. I have a similar concern as Feng. Will we redirect to another log
> file
> > > not taskManager.log ?
> > > taskManager.log contains lot

Re: [VOTE] Release flink-connector-gcp-pubsub v3.0.2, release candidate #1

2023-11-09 Thread Martijn Visser
I agree with Leonard. We should also not update 1.16.0 to 1.18.0, but
to the lowest supported Flink version of this release (in this case,
1.17.0)

On Thu, Nov 9, 2023 at 3:08 AM Leonard Xu  wrote:
>
> Thanks Danny for the reply.
>
> -1 (binding)
>
> Let’s fix the outdated version in the source code and spin a new rc2.
>
> I’d like to open a PR to fix it, and hope everything goes well on your Flink 
> Forward trip.
>
> Best,
> Leonard
>
>
>
> > Thanks for helping to verify the release. The 1.16.0 Flink version in the
> > pom is a miss, ideally it should be updated to 1.18.0, additionally this
> > should have been updated to 1.17.x previously. I would not consider it a
> > hard blocker since the Maven build overrides this variable based on the
> > provided -Dflink.version and the 1.17/1.18 binaries are valid. However it
> > is not ideal that the default version in the source is outdated. Given that
> > we are yet to receive any binding votes I am happy to spin an rc2 however
> > am a bit busy this week at Flink Forward. I will consider this vote
> > open for now unless you make your -1 binding.
> >
> > Thanks,
> > Danny
> >
> > On Tue, Nov 7, 2023 at 8:02 PM Leonard Xu  wrote:
> >
> >> Thanks Danny for driving this.  I'm considering -1, please correct me if I
> >> understand wrong.
> >>
>  * The sources can be compiled and unit tests pass with flink.version
> >> 1.17.1
>  and flink.version 1.18.0
> 
>  * Nexus has two staged artifact ids for 3.0.2-1.17 and 3.0.2-1.18
>  - flink-connector-gcp-pubsub (.jar, -javadoc.jar, -sources.jar and .pom)
>  - flink-connector-gcp-pubsub-parent (only .pom)
> >>
> >>
> >> This release aims to support Flink 1.17 and new released Flink 1.18,but
> >> why is the version in the pom file [1] still 1.16.0? IIUC, it should be
> >> 1.17.0 according to the process [2].
> >>
> >> Best,
> >> Leonard
> >>
> >> [1]
> >> https://github.com/apache/flink-connector-gcp-pubsub/blob/v3.0.2-rc1/pom.xml#L51
> >> [2]
> >> https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
> >>
> >>
> >>> On Nov 7, 2023, at 12:03 PM, Samrat Deb wrote:
> >>>
> >>> +1(non-binding)
> >>>
> >>> - Checked release notes
> >>> - Verified checksums and signatures
> >>> - Verified no binaries in release
> >>> - Build connector from source
> >>>
> >>> Bests,
> >>> Samrat
> >>>
> >>> On Mon, 6 Nov 2023 at 8:20 PM, Ryan Skraba wrote:
> >>>
>  Hello! +1 (non-binding)
> 
>  One note: the parent pom still has 1.16.0 for the Maven property of
>  flink.version for both 1.17 and 1.18 releases.
> 
>  I've validated the source for the RC1:
>  flink-connector-gcp-pubsub-3.0.2-src.tgz at r65060
>  * The sha512 checksum is OK.
>  * The source file is signed correctly.
>  * The signature 0F79F2AFB2351BC29678544591F9C1EC125FD8DB is found in the
>  KEYS file, and on https://keyserver.ubuntu.com/
>  * The source file is consistent with the GitHub tag v3.0.2-rc1, which
>  corresponds to commit 4c6be836e6c0f36ef5711f12d7b935254e7d248d
>  - The files explicitly excluded by create_pristine_sources (such as
>  .gitignore and the submodule tools/releasing/shared) are not present.
>  * Has a LICENSE file and a NOTICE file
>  * Does not contain any compiled binaries.
> 
> 
> 
>  I did a simple smoke test on an emulated Pub/Sub with the 1.18 version.
> 
>  All my best, Ryan Skraba
> 
> >>
> >>
>


Re: [DISCUSS] FLIP-389: Annotate SingleThreadFetcherManager and FutureCompletingBlockingQueue as PublicEvolving

2023-11-09 Thread Martijn Visser
Hi all,

I'm looking at the original Jira that introduced these stability
designations [1] and I'm just curious if it was intended that these
Internal classes would be used directly, or if we just haven't created
the right abstractions? The reason for asking is because moving
something from Internal to a public designation is an easy fix, but I
want to make sure that it's also the right fix. If we are missing good
abstractions, then I would rather invest in those.

Best regards,

Martijn

[1] https://issues.apache.org/jira/browse/FLINK-22358

On Wed, Nov 8, 2023 at 12:40 PM Leonard Xu  wrote:
>
> Thanks Hongshun for starting this discussion.
>
> +1 from my side.
>
> IIRC, @Jiangjie(Becket) also mentioned this in FLINK-31324 comment[1].
>
> Best,
> Leonard
>
> [1] 
> https://issues.apache.org/jira/browse/FLINK-31324?focusedCommentId=17696756&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17696756
>
>
>
> > On Nov 8, 2023, at 5:42 PM, Hongshun Wang wrote:
> >
> > Hi devs,
> >
> > I would like to start a discussion on FLIP-389: Annotate
> > SingleThreadFetcherManager and FutureCompletingBlockingQueue as
> > PublicEvolving [1].
> >
> > Though the SingleThreadFetcherManager is annotated as Internal, it actually
> > acts as a de-facto public API, which is widely used in many connector
> > projects: flink-cdc-connector, flink-connector-mongodb, and so on.
> >
> > Moreover, even the constructor of SingleThreadMultiplexSourceReaderBase
> > (which is PublicEvolving) includes the params of SingleThreadFetcherManager
> > and FutureCompletingBlockingQueue.  That means that the
> > SingleThreadFetcherManager  and FutureCompletingBlockingQueue have already
> > been exposed to users for a long time and are widely used.
> >
> > Considering that all source implementations are using them de facto, why
> > not annotate SingleThreadFetcherManager and FutureCompletingBlockingQueue
> > as PublicEvolving so that developers will modify it more carefully to avoid
> > any potential issues? As shown in FLINK-31324[2], FLINK-28853[3]
> > changed the default constructor of SingleThreadFetcherManager. However,
> > it had a wide impact; in the end, the former constructor was added back and
> > marked as Deprecated.
> >
> > In conclusion, the goal of this FLIP is to annotate
> > SingleThreadFetcherManager(includes its parent class) and
> > FutureCompletingBlockingQueue as PublicEvolving.
> >
> > Looking forward to hearing from you.
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465498
> >
> > [2] https://issues.apache.org/jira/browse/FLINK-31324
> >
> > [3] https://issues.apache.org/jira/browse/FLINK-28853
>


[jira] [Created] (FLINK-33502) HybridShuffleITCase caused a fatal error

2023-11-09 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33502:
-

 Summary: HybridShuffleITCase caused a fatal error
 Key: FLINK-33502
 URL: https://issues.apache.org/jira/browse/FLINK-33502
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Affects Versions: 1.19.0
Reporter: Matthias Pohl


[https://github.com/XComp/flink/actions/runs/6789774296/job/18458197040#step:12:9177]
{code:java}
Error: 21:21:35 21:21:35.379 [ERROR] Error occurred in starting fork, check 
output in log
9168Error: 21:21:35 21:21:35.379 [ERROR] Process Exit Code: 239
9169Error: 21:21:35 21:21:35.379 [ERROR] Crashed tests:
9170Error: 21:21:35 21:21:35.379 [ERROR] 
org.apache.flink.test.runtime.HybridShuffleITCase
9171Error: 21:21:35 21:21:35.379 [ERROR] 
org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
9172Error: 21:21:35 21:21:35.379 [ERROR] Command was /bin/sh -c cd 
/root/flink/flink-tests && /usr/lib/jvm/jdk-11.0.19+7/bin/java -XX:+UseG1GC 
-Xms256m -XX:+IgnoreUnrecognizedVMOptions 
--add-opens=java.base/java.util=ALL-UNNAMED 
--add-opens=java.base/java.io=ALL-UNNAMED -Xmx1536m -jar 
/root/flink/flink-tests/target/surefire/surefirebooter10811559899200556131.jar 
/root/flink/flink-tests/target/surefire 2023-11-07T20-32-50_466-jvmRun4 
surefire6242806641230738408tmp surefire_1603959900047297795160tmp
9173Error: 21:21:35 21:21:35.379 [ERROR] Error occurred in starting fork, check 
output in log
9174Error: 21:21:35 21:21:35.379 [ERROR] Process Exit Code: 239
9175Error: 21:21:35 21:21:35.379 [ERROR] Crashed tests:
9176Error: 21:21:35 21:21:35.379 [ERROR] 
org.apache.flink.test.runtime.HybridShuffleITCase
9177Error: 21:21:35 21:21:35.379 [ERROR]at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:532)
9178Error: 21:21:35 21:21:35.379 [ERROR]at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:479)
9179Error: 21:21:35 21:21:35.379 [ERROR]at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:322)
9180Error: 21:21:35 21:21:35.379 [ERROR]at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:266)
[...] {code}





[jira] [Created] (FLINK-33503) Upgrade Maven wrapper to 3.2.0

2023-11-09 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33503:
-

 Summary: Upgrade Maven wrapper to 3.2.0
 Key: FLINK-33503
 URL: https://issues.apache.org/jira/browse/FLINK-33503
 Project: Flink
  Issue Type: Improvement
  Components: Build System / CI
Affects Versions: 1.17.1, 1.18.0, 1.19.0
Reporter: Matthias Pohl


The Maven wrapper downloads binaries to execute. maven-wrapper 3.2.0 added a 
checksum check; should we also leverage this feature [1] before executing the 
downloaded binaries?
[1] https://issues.apache.org/jira/browse/MWRAPPER-75





[jira] [Created] (FLINK-33504) Supported parallel jobs

2023-11-09 Thread Matthias Pohl (Jira)
Matthias Pohl created FLINK-33504:
-

 Summary: Supported parallel jobs
 Key: FLINK-33504
 URL: https://issues.apache.org/jira/browse/FLINK-33504
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / CI
Reporter: Matthias Pohl


{quote}
Up to 10 free Microsoft-hosted parallel jobs that can run for up to 360 minutes 
(6 hours) each time
{quote}
Azure CI allows up to 10 parallel jobs for public repos 
([source|https://learn.microsoft.com/en-us/azure/devops/pipelines/licensing/concurrent-jobs?view=azure-devops&tabs=ms-hosted]).

Looks like GHA allows up to 256 parallel jobs 
([source|https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs]):

{quote}
A matrix will generate a maximum of 256 jobs per workflow run.
{quote}





[jira] [Created] (FLINK-33505) switch away from using netty 3 based Pekko Classic Remoting

2023-11-09 Thread PJ Fanning (Jira)
PJ Fanning created FLINK-33505:
--

 Summary: switch away from using netty 3 based Pekko Classic 
Remoting
 Key: FLINK-33505
 URL: https://issues.apache.org/jira/browse/FLINK-33505
 Project: Flink
  Issue Type: Improvement
Reporter: PJ Fanning


It is my understanding that Flink uses the Netty 3 based Pekko Classic Remoting.

Netty 3 has a lot of security issues.

It will be months before Pekko 1.1.0 is released but that switches Classic 
Remoting to use Netty 4.

Akka and Pekko actually recommend that users switch to using Artery based 
communications.

Even if you wait for Pekko 1.1.0, the new Netty 4 based classic remoting will 
need to be tested.

There is also the option of dropping Pekko - FLINK-29281

If you don't want to try Artery and don't want to wait for Pekko 1.1.0, you 
might be able to copy over 5 classes that add Netty 4 support and update your 
application.conf. This would be approximately 
https://github.com/apache/incubator-pekko/pull/778. There is a bit more work to 
do in terms of debugging the test failure and it seems that this change is 
unlikely to be merged back to the Pekko 1.0.x line.





[jira] [Created] (FLINK-33506) Make AWS connectors compilable with jdk17

2023-11-09 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-33506:
---

 Summary: Make AWS connectors compilable with jdk17
 Key: FLINK-33506
 URL: https://issues.apache.org/jira/browse/FLINK-33506
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / AWS
Reporter: Sergey Nuyanzin
Assignee: Sergey Nuyanzin


Since Flink 1.18 with JDK 17 support has been released, it would make sense to 
add such support for the connectors.





Re: [DISCUSS] Release flink-shaded 16.2

2023-11-09 Thread Yuxin Tan
Hi, Weijie,

Thank you for volunteering to be a release manager.

So far, no objections have been received. If there are none, we
will release flink-shaded 16.2 next week.

Best,
Yuxin


On Wed, Nov 8, 2023 at 11:11, weijie guo wrote:

> Thanks Yuxin for driving this!
>
> I am willing to help release this. :)
>
> Best regards,
>
> Weijie
>
>
> On Mon, Nov 6, 2023 at 15:45, Yuxin Tan wrote:
>
>> Hi, all,
>>
>> I would like to discuss creating a new 16.2 patch release for
>> flink-shaded[1].
>>
>> In our ARM environment, we recently encountered a critical bug
>> (FLINK-33417[2])
>> while using flink-shaded 16.1 in flink 1.17. The issue has been discussed
>> in the
>> discussion[3]. Fortunately, we could resolve this bug by updating the
>> netty version.
>>
>> To address this issue comprehensively, we need to release a new minor
>> version for
>> flink-shaded. As a result, Flink 1.17 should update its dependent
>> flink-shaded version
>> accordingly. However, there is no need for Flink 1.18+ to update the
>> flink-shaded
>> version since these newer versions already depend on flink-shaded 17.0+
>> and have
>> updated their netty version, thereby eliminating the existence of this
>> bug.
>>
>> Considering the low probability of this issue occurring, I do not
>> recommend
>> considering it as a blocker for the Flink 1.17 release. Furthermore, the
>> external connector has been decoupled from flink-shaded in FLINK-33190[4],
>> and the compatibility of the newer netty version 4.1.91 has been verified
>> in the new
>> Flink 1.18+(which is dependent on flink-shaded 17.0 and it is dependent
>> on
>> netty 4.1.91). Hence, we believe this update will not cause any
>> compatibility issues.
>>
>> If there are no objections, we would greatly appreciate it if a committer
>> could
>> volunteer as the release manager. Thank you.
>>
>> [1] https://github.com/apache/flink-shaded
>> [2] https://issues.apache.org/jira/browse/FLINK-33417
>> [3] https://lists.apache.org/thread/y1c8545bcsx2836d9pgfdzj65knvw7kb
>> [4] https://issues.apache.org/jira/browse/FLINK-33190
>>
>> Best,
>> Yuxin
>>
>


[jira] [Created] (FLINK-33507) JsonToRowDataConverters can't parse zero timestamp '0000-00-00 00:00:00'

2023-11-09 Thread jinzhuguang (Jira)
jinzhuguang created FLINK-33507:
---

 Summary: JsonToRowDataConverters can't parse zero timestamp 
'0000-00-00 00:00:00'
 Key: FLINK-33507
 URL: https://issues.apache.org/jira/browse/FLINK-33507
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.16.0
 Environment: Flink 1.16.0
Reporter: jinzhuguang


When I use Flink CDC to synchronize data from MySQL, Kafka is used to store 
data in JSON format. But when I read data from Kafka, I found that the 
Timestamp type data "0000-00-00 00:00:00" in MySQL could not be parsed by 
Flink, and the error was reported as follows:

Caused by: 
org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: Fail 
to deserialize at field: data.
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
    at 
org.apache.flink.formats.json.JsonRowDataDeserializationSchema.convertToRowData(JsonRowDataDeserializationSchema.java:131)
    at 
org.apache.flink.formats.json.canal.CanalJsonDeserializationSchema.deserialize(CanalJsonDeserializationSchema.java:234)
    ... 17 more
Caused by: 
org.apache.flink.formats.json.JsonToRowDataConverters$JsonParseException: Fail 
to deserialize at field: update_time.
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:354)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createArrayConverter$94141d67$1(JsonToRowDataConverters.java:304)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
    ... 20 more
Caused by: java.time.format.DateTimeParseException: Text '0000-00-00 00:00:00' 
could not be parsed: Invalid value for MonthOfYear (valid values 1 - 12): 0
    at 
java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
    at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1781)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.convertToTimestamp(JsonToRowDataConverters.java:224)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.lambda$wrapIntoNullableConverter$de0b9253$1(JsonToRowDataConverters.java:380)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.convertField(JsonToRowDataConverters.java:370)
    at 
org.apache.flink.formats.json.JsonToRowDataConverters.lambda$createRowConverter$ef66fe9a$1(JsonToRowDataConverters.java:350)
    ... 25 more
Caused by: java.time.DateTimeException: Invalid value for MonthOfYear (valid 
values 1 - 12): 0
    at java.time.temporal.ValueRange.checkValidIntValue(ValueRange.java:330)
    at java.time.temporal.ChronoField.checkValidIntValue(ChronoField.java:722)
    at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:550)
    at java.time.chrono.IsoChronology.resolveYMD(IsoChronology.java:123)
    at 
java.time.chrono.AbstractChronology.resolveDate(AbstractChronology.java:472)
    at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:492)
    at java.time.chrono.IsoChronology.resolveDate(IsoChronology.java:123)
    at java.time.format.Parsed.resolveDateFields(Parsed.java:351)
    at java.time.format.Parsed.resolveFields(Parsed.java:257)
    at java.time.format.Parsed.resolve(Parsed.java:244)
    at 
java.time.format.DateTimeParseContext.toResolved(DateTimeParseContext.java:331)
    at 
java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1955)
    at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777)
    ... 29 more

Usually MySQL allows the server and client to parse this type of data and treat 
it as NULL, so I think Flink should also support it.
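
One possible direction (an illustrative sketch only, not the actual converter 
code): map the MySQL zero-timestamp to NULL before parsing.

{code:java}
// Sketch of a lenient timestamp conversion; SQL_TIMESTAMP_FORMAT and the
// surrounding converter plumbing are assumed, not the real Flink internals.
private static TimestampData convertToTimestampLenient(String text) {
    if ("0000-00-00 00:00:00".equals(text)) {
        // Mirror the common MySQL client behavior: treat the zero timestamp as NULL.
        return null;
    }
    TemporalAccessor parsed = SQL_TIMESTAMP_FORMAT.parse(text);
    return TimestampData.fromLocalDateTime(LocalDateTime.from(parsed));
}
{code}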





[jira] [Created] (FLINK-33508) Support for wildcard paths in Flink History Server for multi cluster environment

2023-11-09 Thread Jayadeep Jayaraman (Jira)
Jayadeep Jayaraman created FLINK-33508:
--

 Summary: Support for wildcard paths in Flink History Server for 
multi cluster environment
 Key: FLINK-33508
 URL: https://issues.apache.org/jira/browse/FLINK-33508
 Project: Flink
  Issue Type: Improvement
Reporter: Jayadeep Jayaraman


In the cloud, users typically create multiple ephemeral clusters and want a 
single history server to look at historical jobs.

To implement this, the history server needs to support wildcard paths; this 
change adds support for such wildcard paths.
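
For illustration, the kind of configuration this would enable (the option key 
exists today; the wildcard semantics are what this issue proposes):

{code:java}
// Point a single history server at many ephemeral clusters' archive dirs.
Configuration conf = new Configuration();
conf.setString("historyserver.archive.fs.dir", "gs://my-bucket/clusters/*/completed-jobs");
{code}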





[VOTE] FLIP-381: Deprecate configuration getters/setters that return/set complex Java objects

2023-11-09 Thread Junrui Lee
Hi everyone,

Thank you to everyone for the feedback on FLIP-381: Deprecate configuration
getters/setters that return/set complex Java objects[1] which has been
discussed in this thread [2].

I would like to start a vote for it. The vote will be open for at least 72
hours (excluding weekends) unless there is an objection or not enough votes.

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278464992
[2]https://lists.apache.org/thread/y5owjkfxq3xs9lmpdbl6d6jmqdgbjqxo


[jira] [Created] (FLINK-33509) flaky test testNodeAffinity() in InitTaskManagerDecoratorTest.java

2023-11-09 Thread Ruby (Jira)
Ruby created FLINK-33509:


 Summary: flaky test testNodeAffinity() in 
InitTaskManagerDecoratorTest.java
 Key: FLINK-33509
 URL: https://issues.apache.org/jira/browse/FLINK-33509
 Project: Flink
  Issue Type: Bug
 Environment: Java 11
Reporter: Ruby


When applying NonDex to the test, the NodeSelectorRequirement object shows 
nondeterminism. The test assumes that requirement will be equal to 
expected_requirement, both of which are instances of NodeSelectorRequirement. 
The NodeSelectorRequirement object has three attributes: key, operator, and a 
values list. It is possible to get the values list's elements in the order 
`[blockedNode1, blockedNode2]` while the expected result is 
`[blockedNode2, blockedNode1]`, so the assertion fails.

 

The root cause is in line 56 of `KubernetesTaskManagerTestBase.java` 
(flink-kubernetes/src/test/java/org/apache/flink/kubernetes/kubeclient/KubernetesTaskManagerTestBase.java),
 where `BLOCKED_NODES` is defined as a new `HashSet`. In 
`InitTaskManagerDecoratorTest.java`, when initializing the 
`expected_requirement` in the test, the values being passed are this 
`BLOCKED_NODES`, which is an **unordered Set**. Later, the code converts 
this **HashSet** into an **ArrayList**, which leads to the unstable order of 
the values list.
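
A typical fix is to make the assertion order-insensitive instead of relying on 
HashSet iteration order, e.g. (a sketch; variable names are assumed):

{code:java}
// AssertJ: compare the values lists while ignoring element order.
assertThat(requirement.getValues())
        .containsExactlyInAnyOrderElementsOf(expectedRequirement.getValues());
{code}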





Re: [DISCUSS] FLIP-389: Annotate SingleThreadFetcherManager and FutureCompletingBlockingQueue as PublicEvolving

2023-11-09 Thread Hongshun Wang
@Martijn, I agree with you.


I also have two questions at the beginning:

   - Why is an Internal class
   exposed as a constructor param of a Public class?
   - Should these classes be exposed as public?

For the first question, I noticed that before the original Jira [1],
all these classes were missing annotations, so it was not abnormal that
FutureCompletingBlockingQueue and SingleThreadFetcherManager were
constructor params of SingleThreadMultiplexSourceReaderBase.
However, that Jira marked FutureCompletingBlockingQueue and
SingleThreadFetcherManager as Internal, while marking
SingleThreadMultiplexSourceReaderBase as Public. That was a good choice,
but it overlooked that FutureCompletingBlockingQueue and
SingleThreadFetcherManager had already been exposed by
SingleThreadMultiplexSourceReaderBase.
Thus, this problem occurs because we didn't clearly define the
boundaries in the original design. We should pay more attention to this
when creating a new class (see the sketch below for the exposure in
question).
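
For context, the exposure in question looks like this (abbreviated from
SingleThreadMultiplexSourceReaderBase; generics trimmed, and the exact
signature may differ by version):

// PublicEvolving class whose public constructor takes the two @Internal types:
public SingleThreadMultiplexSourceReaderBase(
        FutureCompletingBlockingQueue<RecordsWithSplitIds<E>> elementsQueue, // @Internal
        SingleThreadFetcherManager<E, SplitT> splitFetcherManager,           // @Internal
        RecordEmitter<E, T, SplitStateT> recordEmitter,
        Configuration config,
        SourceReaderContext context) { /* ... */ }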


For the second question, I think at least SplitFetcherManager
should be Public. There are a few reasons:

   - Connector developers want to decide their own thread mode, for
   example, whether to recycle fetchers by overriding
   SplitFetcherManager#maybeShutdownFinishedFetchers when idle.
   Sometimes, developers want SplitFetcherManager to act as a
   FixedThreadPool, because each time a thread is recycled and then
   recreated, the context resources need to be rebuilt. I met a related
   issue in flink-cdc [2].
   - KafkaSourceFetcherManager [3] also extends
   SingleThreadFetcherManager to commit offsets. But now the Kafka source
   is not in the Flink repository, so this is not allowed any more.

[1] https://issues.apache.org/jira/browse/FLINK-22358

[2]
https://github.com/ververica/flink-cdc-connectors/pull/2571#issuecomment-1797585418

[3]
https://github.com/apache/flink-connector-kafka/blob/979791c4c71e944c16c51419cf9a84aa1f8fea4c/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/fetcher/KafkaSourceFetcherManager.java#L52

Looking forward to hearing from you.

Best regards,
Hongshun

On Thu, Nov 9, 2023 at 11:46 PM Martijn Visser 
wrote:

> Hi all,
>
> I'm looking at the original Jira that introduced these stability
> designations [1] and I'm just curious if it was intended that these
> Internal classes would be used directly, or if we just haven't created
> the right abstractions? The reason for asking is because moving
> something from Internal to a public designation is an easy fix, but I
> want to make sure that it's also the right fix. If we are missing good
> abstractions, then I would rather invest in those.
>
> Best regards,
>
> Martijn
>
> [1] https://issues.apache.org/jira/browse/FLINK-22358
>
> On Wed, Nov 8, 2023 at 12:40 PM Leonard Xu  wrote:
> >
> > Thanks Hongshun for starting this discussion.
> >
> > +1 from my side.
> >
> > IIRC, @Jiangjie(Becket) also mentioned this in FLINK-31324 comment[1].
> >
> > Best,
> > Leonard
> >
> > [1]
> https://issues.apache.org/jira/browse/FLINK-31324?focusedCommentId=17696756&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17696756
> >
> >
> >
> > > On Nov 8, 2023, at 5:42 PM, Hongshun Wang wrote:
> > >
> > > Hi devs,
> > >
> > > I would like to start a discussion on FLIP-389: Annotate
> > > SingleThreadFetcherManager and FutureCompletingBlockingQueue as
> > > PublicEvolving.[
> > > <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465498
> >
> > > 1].
> > >
> > > Though the SingleThreadFetcherManager is annotated as Internal, it
> actually
> > > acts as some-degree public API, which is widely used in many connector
> > > projects: flink-cdc-connector
> > > <
> https://github.com/ververica/flink-cdc-connectors/blob/release-2.3.0/flink-connector-mysql-cdc/src/main/java/com/ververica/cdc/connectors/mysql/source/reader/MySqlSourceReader.java#L93
> >
> > > , flink-connector-mongodb
> > > <
> https://github.com/apache/flink-connector-mongodb/blob/main/flink-connector-mongodb/src/main/java/org/apache/flink/connector/mongodb/source/reader/MongoSourceReader.java#L58
> >
> > > and
> > > soon.
> > >
> > > Moreover, even the constructor of SingleThreadMultiplexSourceReaderBase
> > > (which is PublicEvolving) includes the params of
> SingleThreadFetcherManager
> > > and FutureCompletingBlockingQueue.  That means that the
> > > SingleThreadFetcherManager  and FutureCompletingBlockingQueue have
> already
> > > been exposed to users for a long time and are widely used.
> > >
> > > Considering that all source implementations are using them de facto,
> why
> > > not annotate SingleThreadFetcherManager and
> FutureCompletingBlockingQueue
> > > as PublicEvolving so that developers will modify it more carefully to
> avoid
> > > any potential issues.  As shown in FLINK-31324[2], FLINK-28853[3] used
> > > to change the default constructor of SingleThreadFetcherManager.
> However,
> > > it influenced a lot. Finally, the former constructor was added back and
> > > marked as De

[jira] [Created] (FLINK-33510) Update plugin for SBOM generation to 2.7.10

2023-11-09 Thread Vinod Anandan (Jira)
Vinod Anandan created FLINK-33510:
-

 Summary: Update plugin for SBOM generation to 2.7.10
 Key: FLINK-33510
 URL: https://issues.apache.org/jira/browse/FLINK-33510
 Project: Flink
  Issue Type: Improvement
Reporter: Vinod Anandan


Update the CycloneDX Maven plugin for SBOM generation to 2.7.10





[jira] [Created] (FLINK-33511) flink SqlGateway select bigint type column get cast exception

2023-11-09 Thread xiaodao (Jira)
xiaodao created FLINK-33511:
---

 Summary: flink SqlGateway select bigint type column get cast 
exception
 Key: FLINK-33511
 URL: https://issues.apache.org/jira/browse/FLINK-33511
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Gateway
Affects Versions: 1.18.0
Reporter: xiaodao


I open a Beeline client connected to the Flink SqlGateway.

I create a table like:
{code:java}
CREATE TABLE Orders (
    order_number BIGINT,
    price        DECIMAL(32,2),
    buyer        ROW<first_name STRING, last_name STRING>,
    order_time   TIMESTAMP(3)
) WITH (
  'connector' = 'datagen'
) {code}
and then run "select * from Orders;".

I get the exception:

java.lang.Long cannot be cast to org.apache.flink.table.data.StringData





Re: [DISCUSS] FLIP-379: Dynamic source parallelism inference for batch jobs

2023-11-09 Thread Xia Sun
 Thanks Leonard for the feedback and sorry for my late response.

> `How can users disable the parallelism inference if they want to use a fixed
source parallelism?`
> `Could you explain the priority between the static parallelism set from the
table layer and the proposed dynamic source parallelism?`

From the user's perspective, if the user specifies a fixed parallelism for
the source, dynamic source parallelism inference will be automatically
disabled. In terms of priority: the user-specified parallelism > the static
parallelism inference > the dynamic parallelism inference. This is because
dynamic source parallelism inference takes effect at the runtime stage, and
only when (1) the current ExecutionGraph is a dynamic graph, and (2) the
parallelism of the source vertex is not specified (that is, the parallelism
is -1).
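
To make that ordering concrete, the effective decision can be sketched as
the following pseudo-code (illustrative only, not actual Flink code; all
names are made up):

    // Runs when a source vertex of a dynamic (batch) execution graph is scheduled.
    int decideSourceParallelism(int userSpecified, int staticInferred, int dynamicInferred) {
        if (userSpecified > 0) {
            return userSpecified;  // an explicit user setting always wins
        }
        if (staticInferred > 0) {
            return staticInferred; // e.g. table-layer inference at planning time
        }
        return dynamicInferred;    // runtime inference; only reached while parallelism is still -1
    }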

> `the workflow for streaming job may look like ... which is totally
different, the latter lacks a lot of infra in Flink, right?`

Indeed, as of now, the dynamic parallelism inference is exclusively for
batch jobs, so it only takes into account the necessary information for
batch scenarios. In the future, when we introduce support for automatic
parallelism inference in streaming jobs, we can include the required
information for streaming jobs to avoid unnecessarily complicating the
current design.
Moreover, the workflow you mentioned seems a bit complicated. Our current
idea is to perform the parallelism inference during the initialization
phase of streaming jobs and proceed to schedule the entire job once the
source parallelism is determined. This process will naturally occur during
job startup, eliminating the need for additional restarts.

> `So, could we consider the boundedness info when designing the interface?
Both FileSource and Hive Source offer streaming read ability, imagine this
case: Flink Streaming Hive Source should not apply the dynamic source
parallelism even if it implemented the feature, as it is serving a streaming
job.`

Thanks for your feedback, it is really good input. Currently, the
dynamic parallelism inference logic is only triggered in batch jobs.
Therefore, the logic will not be called in streaming jobs.
In the future, if streaming jobs also support runtime parallelism
inference, then at the runtime stage sources can, in theory, no longer be
distinguished as serving streaming or batch jobs. However, since the new
interface is implemented together with the Source interface, the
Source::getBoundedness() method can also be consulted when inferring
parallelism.
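
As a rough sketch of that guard (the interface and its Context methods
follow the FLIP draft and may differ in the final API; estimateInputSize()
is a hypothetical helper):

    // Inside a source that implements the proposed dynamic-inference interface:
    @Override
    public int inferParallelism(Context context) {
        if (getBoundedness() == Boundedness.CONTINUOUS_UNBOUNDED) {
            // Leave streaming reads untouched: -1 means "not inferred".
            return ExecutionConfig.PARALLELISM_DEFAULT;
        }
        long inputBytes = estimateInputSize(); // hypothetical helper
        int inferred = (int) Math.max(1, inputBytes / context.getDataVolumePerTask());
        return Math.min(inferred, context.getParallelismInferenceUpperBound());
    }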

Best regards,
Xia

Leonard Xu  于2023年11月8日周三 16:19写道:

> Thanks Xia and Zhu Zhu for kickoff this discussion.
>
> The dynamic source parallelism inference is a useful feature for the batch
> story. I have some comments about the current design.
>
> 1. How do users disable the parallelism inference if they want to use fixed
> source parallelism? They can currently configure a fixed parallelism in the
> table layer, as you explained above.
>
> 2. Could you explain the priority between the static parallelism set from
> the table layer and the proposed dynamic source parallelism? Also, changing
> the default value of `table.exec.hive.infer-source-parallelism` as a
> sub-task does not resolve all cases, because other Sources can set their
> own parallelism too.
>
> 3. The current design only works for batch jobs; the workflow for a
> streaming job may look like (1) infer parallelism for a streaming source
> like Kafka, (2) stop the job with a savepoint, (3) apply the new
> parallelism to the job, (4) schedule the streaming job from the savepoint,
> which is totally different, and the latter lacks a lot of infra in Flink,
> right? So, could we consider the boundedness info when designing the
> interface? Both FileSource and Hive Source offer streaming read ability,
> imagine this case: Flink Streaming Hive Source should not apply the
> dynamic source parallelism even if it implemented the feature, as it is
> serving a streaming job.
>
> Best,
> Leonard
>
>
> > 2023年11月1日 下午6:21,Xia Sun  写道:
> >
> > Thanks Lijie for the comments!
> > 1. For Hive source, dynamic parallelism inference in batch scenarios is a
> > superset of static parallelism inference. As a follow-up task, we can
> > consider changing the default value of
> > 'table.exec.hive.infer-source-parallelism' to false.
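
(For context: this static inference can already be toggled per job today,
e.g. in the SQL client:

    SET 'table.exec.hive.infer-source-parallelism' = 'false';

so changing the default would only affect the out-of-the-box behavior.)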
> >
> > 2. I think that both dynamic parallelism inference and static parallelism
> > inference have their own use cases. Currently, for streaming sources and
> > other sources that are not sensitive to dynamic information, the benefits
> > of dynamic parallelism inference may not be significant. In such cases,
> > we can continue to use static parallelism inference.
> >
> > Thanks,
> > Xia
> >
> > Lijie Wang  于2023年11月1日周三 14:52写道:
> >
> >> Hi Xia,
> >>
> >> Thanks for driving this FLIP, +1 for the proposal.
> >>
> >> I have 2 questions about the relationship between static inference and
> >> dynamic inference:
> >>
> >> 1. AFAIK, currently the hive table source enables static inference by
> >> default via the option 'table.exec.hive.infer-source-parallelism'.

Re: Adding a new channel to Apache Flink slack workspace

2023-11-09 Thread Yun Tang
+1 and glad to see more guys are using PyFlink.

Best
Yun Tang

From: Jing Ge 
Sent: Wednesday, November 8, 2023 3:18
To: ro...@decodable.co.invalid 
Cc: dev@flink.apache.org 
Subject: Re: Adding a new channel to Apache Flink slack workspace

+1 since there are so many questions wrt PyFlink.

Best regards,
Jing

On Tue, Nov 7, 2023 at 2:23 AM Robin Moffatt 
wrote:

> Since there have been no objections, can we go ahead and get this channel
> created please?
>
> thanks :)
>
> On Thu, 26 Oct 2023 at 16:00, Alexander Fedulov <
> alexander.fedu...@gmail.com>
> wrote:
>
> > +1
> > I agree that the topic is distinct enough to justify a dedicated channel
> > and this could lead to more active participation from people who work on
> > it.
> >
> > Best,
> > Alexander
> >
> > On Wed, 25 Oct 2023 at 20:03, Robin Moffatt  wrote:
> >
> > > Hi,
> > >
> > > I'd like to propose adding a PyFlink channel to the Apache Flink slack
> > > workspace.
> > >
> > > Creating a channel focussed on this will help people find previous
> > > discussions, as well as direct new discussions and questions to the
> > > correct place. PyFlink is a sufficiently distinct component to make a
> > > dedicated channel viable and useful IMHO.
> > >
> > > There was a brief discussion of this on Slack already, the archive for
> > > which can be found here:
> > >
> > >
> >
> https://www.linen.dev/s/apache-flink/t/16006099/re-surfacing-for-the-admins-https-apache-flink-slack-com-arc#1c7e-177a-4c37-8a34-a917883152ac
> > >
> > > thanks,
> > >
> > > Robin.
> > >
> >
>


[jira] [Created] (FLINK-33512) Update download link in doc of Kafka connector

2023-11-09 Thread Qingsheng Ren (Jira)
Qingsheng Ren created FLINK-33512:
-

 Summary: Update download link in doc of Kafka connector
 Key: FLINK-33512
 URL: https://issues.apache.org/jira/browse/FLINK-33512
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka, Documentation
Affects Versions: kafka-3.0.1
Reporter: Qingsheng Ren


Currently the download links of the Kafka connector in the documentation
point to a non-existent version `1.18.0`:

DataStream API: 
[https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/datastream/kafka/]

Table API Kafka: 
[https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/kafka/]

Table API Upsert Kafka: 
[https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/upsert-kafka/]

The latest versions should be 3.0.1-1.17 and 3.0.1-1.18.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)