[jira] [Assigned] (FLINK-32445) BlobStore.closeAndCleanupAllData doesn't do any close action

2023-10-04 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-32445:
-------------------------------------

Assignee: Jiabao Sun

> BlobStore.closeAndCleanupAllData doesn't do any close action
> ------------------------------------------------------------
>
> Key: FLINK-32445
> URL: https://issues.apache.org/jira/browse/FLINK-32445
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Runtime / Coordination
>Affects Versions: 1.18.0
>Reporter: Matthias Pohl
>Assignee: Jiabao Sun
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available, starter
>
> We might want to refactor {{BlobStore.closeAndCleanupAllData}}: it doesn't 
> close any resources (and doesn't need to). Therefore, renaming the interface's 
> method to {{cleanAllData}} seems more appropriate.
> This enables us to remove redundant code in 
> {{AbstractHaServices.closeAndCleanupAllData}} and {{AbstractHaServices.close}}
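(For illustration, the rename, called {{cleanupAllData}} in the follow-up PR below, 
could look roughly like this; the surrounding interface shape is assumed, not taken 
from the actual codebase:)
{code:java}
// Sketch of the proposed rename; the BlobStoreService/Closeable structure is assumed.
public interface BlobStoreService extends BlobStore, java.io.Closeable {

    /**
     * Formerly closeAndCleanupAllData(): removes all data stored in the blob
     * store. No resources are closed here; closing stays in close().
     */
    void cleanupAllData();
}
{code}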



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-32445][runtime] Refactor BlobStoreService's closeAndCleanupAllData to cleanupAllData [flink]

2023-10-04 Thread via GitHub


XComp commented on code in PR #23424:
URL: https://github.com/apache/flink/pull/23424#discussion_r1345334927


##
flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/AbstractHaServices.java:
##
@@ -192,32 +192,18 @@ public void closeAndCleanupAllData() throws Exception {
 try {
 internalCleanup();
 deletedHAData = true;
+blobStoreService.cleanupAllData();
 } catch (Exception t) {
 exception = t;
 }
 
-try {
-if (leaderElectionService != null) {
-leaderElectionService.close();
-}
-} catch (Throwable t) {
-exception = ExceptionUtils.firstOrSuppressed(t, exception);
+if (!deletedHAData) {
+logger.info(
+"Cannot delete HA blobs because we failed to delete the pointers in the HA store.");
 }
 
 try {
-internalClose();
-} catch (Throwable t) {
-exception = ExceptionUtils.firstOrSuppressed(t, exception);
-}
-
-try {
-if (deletedHAData) {
-blobStoreService.closeAndCleanupAllData();
-} else {
-logger.info(
-"Cannot delete HA blobs because we failed to delete the pointers in the HA store.");
-blobStoreService.close();
-}
+close();

Review Comment:
   I'm wondering whether we should apply the same pattern to the 
`HighAvailabilityServices` interface. There we have the two methods `close()` and 
`closeAndCleanupAllData()`, which are then called in 
[MiniCluster#terminateMiniClusterServices(boolean):1281ff](https://github.com/apache/flink/blob/603181da811edb47c0d573492639a381fbbedc28/flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java#L1281)
 and 
[ClusterEntrypoint#stopClusterServices(boolean):503ff](https://github.com/apache/flink/blob/603181da811edb47c0d573492639a381fbbedc28/flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/ClusterEntrypoint.java#L503),
 respectively.
   
   Instead, `HighAvailabilityServices` could implement `AutoCloseable`, and 
`closeAndCleanupAllData` could be refactored to `cleanupAllData()`. WDYT?
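   A rough sketch of the shape this would imply (signatures assumed for
   illustration, not copied from the actual interfaces):
   
   ```java
   // Hypothetical shape of the proposal; not the actual Flink interface.
   public interface HighAvailabilityServices extends AutoCloseable {
   
       /** Releases the services without deleting any HA data. */
       @Override
       void close() throws Exception;
   
       /** Deletes all HA data; callers combine this with close() as needed. */
       void cleanupAllData() throws Exception;
   }
   ```
   
   Callers such as `MiniCluster#terminateMiniClusterServices` would then invoke
   `cleanupAllData()` followed by `close()` instead of the combined method.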



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33007] Integrate autoscaler config validation into the general validator flow [flink-kubernetes-operator]

2023-10-04 Thread via GitHub


gyfora commented on code in PR #682:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/682#discussion_r1345402764


##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/validation/AutoScalerValidator.java:
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.kubernetes.operator.autoscaler.validation;
+
+import org.apache.flink.configuration.ConfigOption;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.kubernetes.operator.api.FlinkDeployment;
+import org.apache.flink.kubernetes.operator.api.FlinkSessionJob;
+import org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentSpec;
+import org.apache.flink.kubernetes.operator.autoscaler.config.AutoScalerOptions;
+import org.apache.flink.kubernetes.operator.utils.FlinkUtils;
+import org.apache.flink.kubernetes.operator.validation.FlinkResourceValidator;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Optional;
+
+/** Validator implementation for the AutoScaler from {@link Configuration}. */
+public class AutoScalerValidator implements FlinkResourceValidator {
+
+private static final Logger LOG = LoggerFactory.getLogger(FlinkUtils.class);
+
+@Override
+public Optional<String> validateSessionJob(
+FlinkSessionJob sessionJob, Optional<FlinkDeployment> session) {
+LOG.info("AutoScaler Configurations validator {}", sessionJob);
+var spec = sessionJob.getSpec();
+if (spec.getFlinkConfiguration() != null) {
+var flinkConfiguration = Configuration.fromMap(spec.getFlinkConfiguration());
+return validateAutoScalerFlinkConfiguration(flinkConfiguration);
+}
+return Optional.empty();
+}
+
+@Override
+public Optional<String> validateDeployment(FlinkDeployment deployment) {
+LOG.info("AutoScaler Configurations validator {}", deployment);
+FlinkDeploymentSpec spec = deployment.getSpec();
+if (spec.getFlinkConfiguration() != null) {
+var flinkConfiguration = Configuration.fromMap(spec.getFlinkConfiguration());
+return validateAutoScalerFlinkConfiguration(flinkConfiguration);
+}
+return Optional.empty();
+}
+
+private Optional<String> validateAutoScalerFlinkConfiguration(
+Configuration flinkConfiguration) {
+return firstPresent(
+validateNumber(flinkConfiguration, AutoScalerOptions.MAX_SCALE_DOWN_FACTOR, 0.0d),
+validateNumber(flinkConfiguration, AutoScalerOptions.MAX_SCALE_UP_FACTOR, 1.0d),
+validateNumber(flinkConfiguration, AutoScalerOptions.TARGET_UTILIZATION, 0.0d),
+validateNumber(
+flinkConfiguration,
+AutoScalerOptions.TARGET_UTILIZATION_BOUNDARY,
+0.0d,
+1.0d));

Review Comment:
   I think we should not validate the max here.
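   That is, the call would keep only the lower bound (a sketch based on the call
   shape above; `validateNumber` is the PR's own helper):
   
   ```java
   // Only validate the minimum; drop the 1.0d upper bound.
   validateNumber(flinkConfiguration, AutoScalerOptions.TARGET_UTILIZATION_BOUNDARY, 0.0d)
   ```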



##
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/validation/AutoScalerValidator.java:
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.kubernetes.operator.autoscaler.validation;
+
+import org.apache.flink.configuration.ConfigOption;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.kubernetes.

Re: [PR] [FLINK-33095] Job jar issue should be reported as BAD_REQUEST instead of INTERNAL_SERVER_ERROR. [flink]

2023-10-04 Thread via GitHub


XComp commented on code in PR #23428:
URL: https://github.com/apache/flink/pull/23428#discussion_r1345443798


##
flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/utils/JarHandlerUtils.java:
##
@@ -189,7 +189,11 @@ public PackagedProgram toPackagedProgram(Configuration configuration) {
 .setArguments(programArgs.toArray(new String[0]))
 .build();
 } catch (final ProgramInvocationException e) {
-throw new CompletionException(e);
+throw new CompletionException(

Review Comment:
   I'm not sure whether we assume here that the `ProgramInvocationException` is 
caused by something the user provided (which would allow the BAD_REQUEST 
response). 
[PackagedProgram#extractContainedLibraries](https://github.com/apache/flink/blob/c08bef1f7913eb1c416c278354fd62b82b172549/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java#L554)
 throws a `ProgramInvocationException` for unknown IO errors, for instance, 
which would actually be an internal issue, wouldn't it?



##
flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/AbstractHandler.java:
##
@@ -242,8 +242,14 @@ private CompletableFuture<Void> handleException(
 return CompletableFuture.completedFuture(null);
 }
int maxLength = flinkHttpObjectAggregator.maxContentLength() - OTHER_RESP_PAYLOAD_OVERHEAD;
-if (throwable instanceof RestHandlerException) {
-RestHandlerException rhe = (RestHandlerException) throwable;
+if (throwable instanceof RestHandlerException
+|| throwable.getCause() instanceof RestHandlerException) {

Review Comment:
   That's a strange change (checking for both the exception and the 
exception's cause), don't you think? It might be an indication that we should 
do the change somewhere else. WDYT? :thinking: 
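   One alternative, sketched here under the assumption that unwrapping is
   acceptable, would be to search the cause chain via Flink's `ExceptionUtils`
   instead of checking exactly two levels:
   
   ```java
   // Sketch: find a possibly nested RestHandlerException anywhere in the chain.
   Optional<RestHandlerException> rhe =
           ExceptionUtils.findThrowable(throwable, RestHandlerException.class);
   if (rhe.isPresent()) {
       // ... existing RestHandlerException handling ...
   }
   ```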



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (FLINK-30025) Unified the max display column width for SqlClient and Table API in both Streaming and Batch execMode

2023-10-04 Thread Jing Ge (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Ge resolved FLINK-30025.
-----------------------------
Resolution: Fixed

> Unified the max display column width for SqlClient and Table API in both 
> Streaming and Batch execMode
> ---------------------------------------------------------------------------------------------------
>
> Key: FLINK-30025
> URL: https://issues.apache.org/jira/browse/FLINK-30025
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Affects Versions: 1.16.0, 1.15.2
>Reporter: Jing Ge
>Assignee: Jing Ge
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.0
>
>
> FLIP-279 
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-279+Unified+the+max+display+column+width+for+SqlClient+and+Table+APi+in+both+Streaming+and+Batch+execMode]
>  
> Background info:
> table.execute().print() can only use the default max column width. When 
> running the Table API program "table.execute().print();", columns with long 
> string values are truncated to 30 chars. E.g.:
> !https://static.dingtalk.com/media/lALPF6XTM7ZO1FXNASrNBEI_1090_298.png_620x1q90.jpg?auth_bizType=%27IM%27&bizType=im|width=457,height=125!
> I tried to set the max width with: 
> tEnv.getConfig.getConfiguration.setInteger("sql-client.display.max-column-width",
>  100); It has no effect. How can I set the max width?
> Here is the example code:
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> val tEnv = StreamTableEnvironment.create(env)
> tEnv.getConfig.getConfiguration.setInteger("sql-client.display.max-column-width",
>  100)
> val orderA = env
>   .fromCollection(Seq(Order(1L, "beer", 3), Order(1L, 
> "diaper-{-}{{-}}.diaper{{-}}{-}{-}.diaper{-}{-}{{-}}.diaper{{-}}{-}-.", 4), 
> Order(3L, "rubber", 2)))
>   .toTable(tEnv)
> orderA.execute().print()
>  
> "sql-client.display.max-column-width" seems only work in cli: SET 
> 'sql-client.display.max-column-width' = '40';
> While using Table API, by default, the DEFAULT_MAX_COLUMN_WIDTH in PrintStyle 
> is used now. It should be configurable. 
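(For reference, the fix makes the width configurable from the Table API as well; 
assuming the unified option name introduced by FLIP-279, usage would look roughly 
like:)
{code:java}
// Sketch, assuming the unified option from FLIP-279:
TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
tEnv.getConfig().set("table.display.max-column-width", "100");
tEnv.executeSql("SELECT ...").print(); // long values now truncate at 100 chars
{code}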



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33053][zookeeper] Manually remove the leader watcher after ret… [flink]

2023-10-04 Thread via GitHub


XComp commented on code in PR #23415:
URL: https://github.com/apache/flink/pull/23415#discussion_r1345486880


##
flink-runtime/src/main/java/org/apache/flink/runtime/leaderretrieval/ZooKeeperLeaderRetrievalDriver.java:
##
@@ -122,6 +125,19 @@ public void close() throws Exception {
 
client.getConnectionStateListenable().removeListener(connectionStateListener);
 
 cache.close();
+
+try {
+if (client.getZookeeperClient().isConnected()
+&& !connectionInformationPath.contains(RESOURCE_MANAGER_NODE)) {

Review Comment:
   @KarmaGYZ Sorry for jumping in this late. I was on vacation for the past 
three weeks. I have a question about this change: why do we have to filter out 
the ResourceManager node here? It feels odd to refer to the ResourceManager in 
the `ZooKeeperLeaderRetrievalDriver`. Knowledge about the ResourceManager 
component should be out of scope for the driver implementation. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33113) YARNSessionFIFOITCase.checkForProhibitedLogContents: Interrupted waiting to send RPC request to server

2023-10-04 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-33113:
----------------------------------
Description: 
This build 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53197&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=27683

fails as 

{code}
Sep 14 05:52:35 2023-09-14 05:51:31,456 ERROR 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl [] - Exception 
on heartbeat
Sep 14 05:52:35 java.io.InterruptedIOException: Interrupted waiting to send RPC 
request to server
Sep 14 05:52:35 java.io.InterruptedIOException: Interrupted waiting to send RPC 
request to server
Sep 14 05:52:35 at org.apache.hadoop.ipc.Client.call(Client.java:1461) 
~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at org.apache.hadoop.ipc.Client.call(Client.java:1403) 
~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
 ~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
 ~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at com.sun.proxy.$Proxy31.allocate(Unknown Source) 
~[?:?]
Sep 14 05:52:35 at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
 ~[hadoop-yarn-common-2.10.2.jar:?]
Sep 14 05:52:35 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) ~[?:1.8.0_292]
Sep 14 05:52:35 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_292]
Sep 14 05:52:35 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_292]
Sep 14 05:52:35 at java.lang.reflect.Method.invoke(Method.java:498) 
~[?:1.8.0_292]
Sep 14 05:52:35 at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
 ~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
 ~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
 ~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
 ~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
 ~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at com.sun.proxy.$Proxy32.allocate(Unknown Source) 
~[?:?]
Sep 14 05:52:35 at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:297)
 ~[hadoop-yarn-client-2.10.2.jar:?]
Sep 14 05:52:35 at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:274)
 [hadoop-yarn-client-2.10.2.jar:?]
Sep 14 05:52:35 Caused by: java.lang.InterruptedException
Sep 14 05:52:35 at 
java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) ~[?:1.8.0_292]
Sep 14 05:52:35 at 
java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:1.8.0_292]
Sep 14 05:52:35 at 
org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1177) 
~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at org.apache.hadoop.ipc.Client.call(Client.java:1456) 
~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 ... 17 more
{code}

  was:
This build 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53197&view=logs&j=5cae8624-c7eb-5c51-92d3-4d2dacedd221&t=5acec1b4-945b-59ca-34f8-168928ce5199&l=27683

fails as {noformat}

Sep 14 05:52:35 at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
 ~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
 ~[hadoop-common-2.10.2.jar:?]
Sep 14 05:52:35 at com.sun.proxy.$Proxy32.allocate(Unknown Source) 
~[?:?]
Sep 14 05:52:35 at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:297)
 ~[hadoop-yarn-client-2.10.2.jar:?]
Sep 14 05:52:35 at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:274)
 [hadoop-yarn-client-2.10.2.jar:?]
Sep 14 05:52:35 Caused by: java.lang.InterruptedException
Sep 14 05:52:35 at 
java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404) ~[?:1.8.0_292]
Sep 14 05:52:35 at 
java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:1.8.0_292]
Sep 14 05:52:35 at 
org.apache.hadoop.ipc.Client$Connection.s

Re: [PR] [FLINK-33179] Throw exception when serialising or deserialising ExecNode with invalid type [flink]

2023-10-04 Thread via GitHub


twalthr commented on code in PR #23488:
URL: https://github.com/apache/flink/pull/23488#discussion_r1345505603


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/ExecNodeContext.java:
##
@@ -104,9 +104,16 @@ private ExecNodeContext(@Nullable Integer id, String name, 
Integer version) {
 @JsonCreator
 public ExecNodeContext(String value) {
 this.id = null;
-String[] split = value.split("_");
-this.name = split[0];
-this.version = Integer.valueOf(split[1]);
+try {
+String[] split = value.split("_");
+this.name = split[0];
+this.version = Integer.valueOf(split[1]);
+} catch (Exception e) {
+throw new TableException(

Review Comment:
   Not sure if this generic catch-all is helpful, especially instructing users 
to file a ticket although we already know that we serialized an invalid node. 
Should we explicitly check for `name == "null"` and tell users that the node is 
unsupported?
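   Something along these lines, as a sketch of the suggested check (message
   wording illustrative only):
   
   ```java
   String[] split = value.split("_");
   if ("null".equals(split[0])) {
       // The plan was written without a type: point users at the unsupported
       // node instead of asking them to file a bug.
       throw new TableException("The plan contains an unsupported ExecNode: " + value);
   }
   this.name = split[0];
   this.version = Integer.valueOf(split[1]);
   ```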



##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/ExecNodeContext.java:
##
@@ -167,12 +174,19 @@ public ExecNodeContext withId(int id) {
  */
 @JsonValue
 public String getTypeAsString() {
+if (name == null || version == null) {
+throw new TableException(
+String.format(
+"Can not serialise ExecNode with id: %d. Missing 
type, this is a bug,"

Review Comment:
   nit: use american english (`Can not serialize`)



##
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/serde/ExecNodeGraphJsonSerializerTest.java:
##
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.nodes.exec.serde;
+
+import org.apache.flink.FlinkVersion;
+import org.apache.flink.api.dag.Transformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.table.api.DataTypes;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.planner.delegation.PlannerBase;
+import org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase;
+import org.apache.flink.table.planner.plan.nodes.exec.ExecNodeConfig;
+import org.apache.flink.table.planner.plan.nodes.exec.ExecNodeContext;
+import org.apache.flink.table.planner.plan.nodes.exec.ExecNodeGraph;
+
+import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.ObjectWriter;
+
+import org.junit.jupiter.api.Test;
+
+import java.util.Collections;
+
+import static org.assertj.core.api.Assertions.assertThatThrownBy;
+
+/** Tests for {@link ExecNodeGraphJsonSerializer}. */
+class ExecNodeGraphJsonSerializerTest {
+
+@Test
+void testSerialisingUnsupportedNode() {

Review Comment:
   `testSerializing`



##
flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/plan/nodes/exec/UnsupportedNodesInPlanTest.java:
##
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.plan.nodes.exec;
+
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.PlanReference;
+import org.apache.flink.table.api.TableEnvironment;
+import org.apache.flink.table.planner.utils.TableTestBase;
+
+import org.junit.Test;
+
+import static org

Re: [PR] [hotfix][docs] Update KDA to MSF in vendor solutions docs (1.15) [flink]

2023-10-04 Thread via GitHub


dannycranmer merged PR #23487:
URL: https://github.com/apache/flink/pull/23487


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33178) Highly parallel apps suffer from bottleneck in NativePRNG

2023-10-04 Thread Emre Kartoglu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771788#comment-17771788
 ] 

Emre Kartoglu commented on FLINK-33178:
---------------------------------------

[~martijnvisser] Thanks for the comment. You're right to point that out.

I just double-checked this and can see the same code path to the same Netty call 
on the master branch: 
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/netty/PartitionRequestQueue.java#L320C21-L320C28]

Unless Netty has done something to improve it, I believe the bottleneck might 
still be there. Looking at the code 
[https://github.com/netty/netty/blob/d773f37e3422b8bc38429bbde94583173c3b7e4a/handler/src/main/java/io/netty/handler/ssl/SslHandler.java#L809]
 I can see the same call to SSLEngine, so I believe the same bottleneck is 
likely present at the current master head.

It also depends on the JDK being used, so users could work around the issue by 
upgrading to a newer JDK version, etc.

I am happy to report the issue, but please feel free to close the ticket if you 
think it is best addressed elsewhere, or if you think we should first observe 
it in practice on newer Flink versions. Judging by the code path, I believe we 
would see the issue in newer versions too.
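(For anyone reproducing this: a quick way to see which PRNG the JDK picks by 
default, which is where the blocked threads below end up, is something like:)
{code:java}
import java.security.SecureRandom;

public class PrngCheck {
    public static void main(String[] args) {
        // On Linux JDKs the default is typically NativePRNG, whose
        // implNextBytes() holds a global lock (the contention shown below).
        SecureRandom rnd = new SecureRandom();
        System.out.println(rnd.getAlgorithm() + " / " + rnd.getProvider().getName());
    }
}
{code}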

> Highly parallel apps suffer from bottleneck in NativePRNG 
> ----------------------------------------------------------
>
> Key: FLINK-33178
> URL: https://issues.apache.org/jira/browse/FLINK-33178
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Affects Versions: 1.13.2
>Reporter: Emre Kartoglu
>Priority: Major
>
> I observed the below thread dumps that highlighted a potential bottleneck in 
> Flink/Netty/JDK. The application (Flink 1.13) from which I took the thread 
> dumps had very high parallelism and was distributed on nodes with >150GB 
> random access memory.
> It appears that there is a call to "Arrays.copyOfRange" in a synchronized 
> block in "sun.security.provider.NativePRNG", which blocks other threads 
> waiting for the lock on the same synchronized block. This appears to be a 
> problem only for highly parallel applications. I don't know exactly at what 
> parallelism it starts becoming a problem, or how much of a bottleneck it 
> actually is.
> I was also slightly hesitant about creating a Flink ticket, as the improvement 
> could well be made in Netty or even the JDK. But I believe we should have a 
> record of the issue in Flink Jira.
> Related: [https://bugs.openjdk.org/browse/JDK-8278371]
>  
>  
> {code:java}
> "Flink Netty Server (6121) Thread 43" #930 daemon prio=5 os_prio=0 
> cpu=2298176.43ms elapsed=44352.31s allocated=155G defined_classes=0 
> tid=0x7f0a3397f800 nid=0x519 waiting for monitor entry  
> [0x7efc5d549000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at 
> sun.security.provider.NativePRNG$RandomIO.implNextBytes(java.base@11.0.18/NativePRNG.java:544)
>         - waiting to lock <0x7f0b62c2eee8> (a java.lang.Object)
>         at 
> sun.security.provider.NativePRNG.engineNextBytes(java.base@11.0.18/NativePRNG.java:220)
>         at 
> java.security.SecureRandom.nextBytes(java.base@11.0.18/SecureRandom.java:751)
>         at 
> sun.security.ssl.SSLCipher$T11BlockWriteCipherGenerator$BlockWriteCipher.encrypt(java.base@11.0.18/SSLCipher.java:1498)
>         at 
> sun.security.ssl.OutputRecord.t10Encrypt(java.base@11.0.18/OutputRecord.java:441)
>         at 
> sun.security.ssl.OutputRecord.encrypt(java.base@11.0.18/OutputRecord.java:345)
>         at 
> sun.security.ssl.SSLEngineOutputRecord.encode(java.base@11.0.18/SSLEngineOutputRecord.java:287)
>         at 
> sun.security.ssl.SSLEngineOutputRecord.encode(java.base@11.0.18/SSLEngineOutputRecord.java:189)
>         at 
> sun.security.ssl.SSLEngineImpl.encode(java.base@11.0.18/SSLEngineImpl.java:285)
>         at 
> sun.security.ssl.SSLEngineImpl.writeRecord(java.base@11.0.18/SSLEngineImpl.java:231)
>         at 
> sun.security.ssl.SSLEngineImpl.wrap(java.base@11.0.18/SSLEngineImpl.java:136)
>         - eliminated <0x7f0b6aab70c8> (a sun.security.ssl.SSLEngineImpl)
>         at 
> sun.security.ssl.SSLEngineImpl.wrap(java.base@11.0.18/SSLEngineImpl.java:116)
>         - locked <0x7f0b6aab70c8> (a sun.security.ssl.SSLEngineImpl)
>         at javax.net.ssl.SSLEngine.wrap(java.base@11.0.18/SSLEngine.java:522)
>         at 
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:1071)
>         at 
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.wrap(SslHandler.java:843)
>         at 
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.wrapAndFlush(SslHandler.java:811)
>         at 
> org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.flush(SslHandler.java:792)
>

Re: [PR] [hotfix][docs] Update KDA to MSF in vendor solutions docs (1.17) [flink]

2023-10-04 Thread via GitHub


dannycranmer merged PR #23486:
URL: https://github.com/apache/flink/pull/23486


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix][docs] Update KDA to MSF in vendor solutions docs (1.16) [flink]

2023-10-04 Thread via GitHub


dannycranmer merged PR #23485:
URL: https://github.com/apache/flink/pull/23485


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix][docs] Update KDA to MSF in vendor solutions docs (1.18) [flink]

2023-10-04 Thread via GitHub


dannycranmer merged PR #23484:
URL: https://github.com/apache/flink/pull/23484


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [hotfix][docs] Update KDA to MSF in vendor solutions docs [flink]

2023-10-04 Thread via GitHub


dannycranmer merged PR #23483:
URL: https://github.com/apache/flink/pull/23483


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33181) Table using `kinesis` connector can not be used for both read & write operations if it's defined with unsupported sink property

2023-10-04 Thread Khanh Vu (Jira)
Khanh Vu created FLINK-33181:


 Summary: Table using `kinesis` connector can not be used for both 
read & write operations if it's defined with unsupported sink property
 Key: FLINK-33181
 URL: https://issues.apache.org/jira/browse/FLINK-33181
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis, Table SQL / Runtime
Affects Versions: 1.15.4
Reporter: Khanh Vu


First, I define a table which uses the `kinesis` connector with a property that 
is unsupported for the sink, e.g. `scan.stream.initpos`:

```
%flink.ssql(type=update)

-- Create input

DROP TABLE IF EXISTS `kds_input`;
CREATE TABLE `kds_input` (
  `some_string` STRING,
  `some_int` BIGINT,
  `time` AS PROCTIME()
) WITH (
  'connector' = 'kinesis',
  'stream' = 'ExampleInputStream',
  'aws.region' = 'us-east-1',
  'scan.stream.initpos' = 'LATEST',
  'format' = 'csv'
);
```

I can read from my table (kds_input) without any issue, but it throws an 
exception if I try to write to the table:
```
%flink.ssql(type=update)
--  Use to generate data in the input table

DROP TABLE IF EXISTS connector_cve_datagen;
CREATE TABLE connector_cve_datagen(
  `some_string` STRING,
  `some_int` BIGINT
) WITH (
'connector' = 'datagen',
'rows-per-second' = '1',
'fields.some_string.length' = '2');
INSERT INTO kds_input SELECT some_string, some_int from connector_cve_datagen
```

Exception observed:
```
Caused by: org.apache.flink.table.api.ValidationException: Unsupported options 
found for 'kinesis'.

Unsupported options:

scan.stream.initpos

Supported options:

aws.region
connector
csv.allow-comments
csv.array-element-delimiter
csv.disable-quote-character
csv.escape-character
csv.field-delimiter
csv.ignore-parse-errors
csv.null-literal
csv.quote-character
format
property-version
sink.batch.max-size
sink.fail-on-error
sink.flush-buffer.size
sink.flush-buffer.timeout
sink.partitioner
sink.partitioner-field-delimiter
sink.producer.collection-max-count (deprecated)
sink.producer.collection-max-size (deprecated)
sink.producer.fail-on-error (deprecated)
sink.producer.record-max-buffered-time (deprecated)
sink.requests.max-buffered
sink.requests.max-inflight
stream
at 
org.apache.flink.table.factories.FactoryUtil.validateUnconsumedKeys(FactoryUtil.java:624)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validate(FactoryUtil.java:914)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validate(FactoryUtil.java:978)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validateExcept(FactoryUtil.java:938)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validateExcept(FactoryUtil.java:978)
at 
org.apache.flink.connector.kinesis.table.KinesisDynamicTableSinkFactory.createDynamicTableSink(KinesisDynamicTableSinkFactory.java:65)
at 
org.apache.flink.table.factories.FactoryUtil.createDynamicTableSink(FactoryUtil.java:259)
... 36 more
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33181) Table using `kinesis` connector can not be used for both read & write operations if it's defined with unsupported sink property

2023-10-04 Thread Khanh Vu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khanh Vu updated FLINK-33181:
-----------------------------
Description: 
First, I define a table which uses the `kinesis` connector with a property that 
is unsupported for the sink, e.g. `scan.stream.initpos`:

 

```
%flink.ssql(type=update)

– Create input

DROP TABLE IF EXISTS `kds_input`;
CREATE TABLE `kds_input` (
`some_string` STRING,
`some_int` BIGINT,
`time` AS PROCTIME()
) WITH (
'connector' = 'kinesis',
'stream' = 'ExampleInputStream',
'aws.region' = 'us-east-1',
'scan.stream.initpos' = 'LATEST',
'format' = 'csv'
);
```

I can read from my table (kds_input) without any issue, but it throws an 
exception if I try to write to the table:
```
%flink.ssql(type=update)
– Use to generate data in the input table

DROP TABLE IF EXISTS connector_cve_datagen;
CREATE TABLE connector_cve_datagen(
`some_string` STRING,
`some_int` BIGINT
) WITH (
'connector' = 'datagen',
'rows-per-second' = '1',
'fields.some_string.length' = '2');
INSERT INTO kds_input SELECT some_string, some_int from connector_cve_datagen
```

Exception observed:
```
Caused by: org.apache.flink.table.api.ValidationException: Unsupported options 
found for 'kinesis'.

Unsupported options:

scan.stream.initpos

Supported options:

aws.region
connector
csv.allow-comments
csv.array-element-delimiter
csv.disable-quote-character
csv.escape-character
csv.field-delimiter
csv.ignore-parse-errors
csv.null-literal
csv.quote-character
format
property-version
sink.batch.max-size
sink.fail-on-error
sink.flush-buffer.size
sink.flush-buffer.timeout
sink.partitioner
sink.partitioner-field-delimiter
sink.producer.collection-max-count (deprecated)
sink.producer.collection-max-size (deprecated)
sink.producer.fail-on-error (deprecated)
sink.producer.record-max-buffered-time (deprecated)
sink.requests.max-buffered
sink.requests.max-inflight
stream
at 
org.apache.flink.table.factories.FactoryUtil.validateUnconsumedKeys(FactoryUtil.java:624)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validate(FactoryUtil.java:914)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validate(FactoryUtil.java:978)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validateExcept(FactoryUtil.java:938)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validateExcept(FactoryUtil.java:978)
at 
org.apache.flink.connector.kinesis.table.KinesisDynamicTableSinkFactory.createDynamicTableSink(KinesisDynamicTableSinkFactory.java:65)
at 
org.apache.flink.table.factories.FactoryUtil.createDynamicTableSink(FactoryUtil.java:259)
... 36 more
```

  was:
First, I define a table which uses `kinesis` connector with an unsupported 
property for sink, e.g. `scan.stream.initpos`:

```
%flink.ssql(type=update)

-- Create input

DROP TABLE IF EXISTS `kds_input`;
CREATE TABLE `kds_input` (
  `some_string` STRING,
  `some_int` BIGINT,
  `time` AS PROCTIME()
) WITH (
  'connector' = 'kinesis',
  'stream' = 'ExampleInputStream',
  'aws.region' = 'us-east-1',
  'scan.stream.initpos' = 'LATEST',
  'format' = 'csv'
);
```

I can read from my table (kds_input) without any issue, but it will throw 
exception if I try to write to the table:
```
%flink.ssql(type=update)
--  Use to generate data in the input table

DROP TABLE IF EXISTS connector_cve_datagen;
CREATE TABLE connector_cve_datagen(
  `some_string` STRING,
  `some_int` BIGINT
) WITH (
'connector' = 'datagen',
'rows-per-second' = '1',
'fields.some_string.length' = '2');
INSERT INTO kds_input SELECT some_string, some_int from connector_cve_datagen
```

Exception observed:
```
Caused by: org.apache.flink.table.api.ValidationException: Unsupported options 
found for 'kinesis'.

Unsupported options:

scan.stream.initpos

Supported options:

aws.region
connector
csv.allow-comments
csv.array-element-delimiter
csv.disable-quote-character
csv.escape-character
csv.field-delimiter
csv.ignore-parse-errors
csv.null-literal
csv.quote-character
format
property-version
sink.batch.max-size
sink.fail-on-error
sink.flush-buffer.size
sink.flush-buffer.timeout
sink.partitioner
sink.partitioner-field-delimiter
sink.producer.collection-max-count (deprecated)
sink.producer.collection-max-size (deprecated)
sink.producer.fail-on-error (deprecated)
sink.producer.record-max-buffered-time (deprecated)
sink.requests.max-buffered
sink.requests.max-inflight
stream
at 
org.apache.flink.table.factories.FactoryUtil.validateUnconsumedKeys(FactoryUtil.java:624)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validate(FactoryUtil.java:914)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validate(FactoryUtil.java:978)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validateExcept(FactoryUtil.java:938)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validateExcept(FactoryUtil.java:978)
at 
org.apac

Re: [PR] [FLINK-33179] Throw exception when serialising or deserialising ExecNode with invalid type [flink]

2023-10-04 Thread via GitHub


dawidwys commented on code in PR #23488:
URL: https://github.com/apache/flink/pull/23488#discussion_r1345519922


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/ExecNodeContext.java:
##
@@ -104,9 +104,16 @@ private ExecNodeContext(@Nullable Integer id, String name, 
Integer version) {
 @JsonCreator
 public ExecNodeContext(String value) {
 this.id = null;
-String[] split = value.split("_");
-this.name = split[0];
-this.version = Integer.valueOf(split[1]);
+try {
+String[] split = value.split("_");
+this.name = split[0];
+this.version = Integer.valueOf(split[1]);
+} catch (Exception e) {
+throw new TableException(

Review Comment:
   Initially I also thought of just null-checking. I went for `try catch` 
instead of a null check because theoretically there might be a few reasons for 
an exception:
   1. Error while splitting: we don't have two parts
   2. Error while parsing the Integer: the second part is not a number
   3. Error while parsing the Integer: the part is null
   
   I understand 1 & 2 are unlikely atm, but I guess we thought the same about 
the `null_null` situation.
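   A defensive variant covering those three cases explicitly could look like
   this (sketch only, error messages illustrative):
   
   ```java
   String[] split = value.split("_");
   if (split.length != 2) {
       throw new TableException("Invalid ExecNode type string: " + value); // case 1
   }
   this.name = split[0];
   try {
       this.version = Integer.valueOf(split[1]); // cases 2 and 3
   } catch (NumberFormatException e) {
       throw new TableException("Invalid ExecNode version in: " + value, e);
   }
   ```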



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33181) Table using `kinesis` connector can not be used for both read & write operations if it's defined with unsupported sink property

2023-10-04 Thread Khanh Vu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khanh Vu updated FLINK-33181:
-----------------------------
Description: 
First, I define a table which uses the `kinesis` connector with a property that 
is unsupported for the sink, e.g. `scan.stream.initpos`:
{code:sql}
%flink.ssql(type=update)
– Create input
DROP TABLE IF EXISTS `kds_input`;
CREATE TABLE `kds_input` (
`some_string` STRING,
`some_int` BIGINT,
`time` AS PROCTIME()
) WITH (
'connector' = 'kinesis',
'stream' = 'ExampleInputStream',
'aws.region' = 'us-east-1',
'scan.stream.initpos' = 'LATEST',
'format' = 'csv'
);
{code}

I can read from my table (kds_input) without any issue, but it throws an 
exception if I try to write to the table:

{code:sql}
%flink.ssql(type=update)
– Use to generate data in the input table

DROP TABLE IF EXISTS connector_cve_datagen;
CREATE TABLE connector_cve_datagen(
`some_string` STRING,
`some_int` BIGINT
) WITH (
'connector' = 'datagen',
'rows-per-second' = '1',
'fields.some_string.length' = '2');
INSERT INTO kds_input SELECT some_string, some_int from connector_cve_datagen
{code}

Exception observed:
{code:java}
Caused by: org.apache.flink.table.api.ValidationException: Unsupported options 
found for 'kinesis'.

Unsupported options:

scan.stream.initpos

Supported options:

aws.region
connector
csv.allow-comments
csv.array-element-delimiter
csv.disable-quote-character
csv.escape-character
csv.field-delimiter
csv.ignore-parse-errors
csv.null-literal
csv.quote-character
format
property-version
sink.batch.max-size
sink.fail-on-error
sink.flush-buffer.size
sink.flush-buffer.timeout
sink.partitioner
sink.partitioner-field-delimiter
sink.producer.collection-max-count (deprecated)
sink.producer.collection-max-size (deprecated)
sink.producer.fail-on-error (deprecated)
sink.producer.record-max-buffered-time (deprecated)
sink.requests.max-buffered
sink.requests.max-inflight
stream
at 
org.apache.flink.table.factories.FactoryUtil.validateUnconsumedKeys(FactoryUtil.java:624)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validate(FactoryUtil.java:914)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validate(FactoryUtil.java:978)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validateExcept(FactoryUtil.java:938)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validateExcept(FactoryUtil.java:978)
at 
org.apache.flink.connector.kinesis.table.KinesisDynamicTableSinkFactory.createDynamicTableSink(KinesisDynamicTableSinkFactory.java:65)
at 
org.apache.flink.table.factories.FactoryUtil.createDynamicTableSink(FactoryUtil.java:259)
... 36 more
{code}

  was:
First, I define a table which uses `kinesis` connector with an unsupported 
property for sink, e.g. `scan.stream.initpos`:

 

```
%flink.ssql(type=update)

– Create input

DROP TABLE IF EXISTS `kds_input`;
CREATE TABLE `kds_input` (
`some_string` STRING,
`some_int` BIGINT,
`time` AS PROCTIME()
) WITH (
'connector' = 'kinesis',
'stream' = 'ExampleInputStream',
'aws.region' = 'us-east-1',
'scan.stream.initpos' = 'LATEST',
'format' = 'csv'
);
```

I can read from my table (kds_input) without any issue, but it will throw 
exception if I try to write to the table:
```
%flink.ssql(type=update)
– Use to generate data in the input table

DROP TABLE IF EXISTS connector_cve_datagen;
CREATE TABLE connector_cve_datagen(
`some_string` STRING,
`some_int` BIGINT
) WITH (
'connector' = 'datagen',
'rows-per-second' = '1',
'fields.some_string.length' = '2');
INSERT INTO kds_input SELECT some_string, some_int from connector_cve_datagen
```

Exception observed:
```
Caused by: org.apache.flink.table.api.ValidationException: Unsupported options 
found for 'kinesis'.

Unsupported options:

scan.stream.initpos

Supported options:

aws.region
connector
csv.allow-comments
csv.array-element-delimiter
csv.disable-quote-character
csv.escape-character
csv.field-delimiter
csv.ignore-parse-errors
csv.null-literal
csv.quote-character
format
property-version
sink.batch.max-size
sink.fail-on-error
sink.flush-buffer.size
sink.flush-buffer.timeout
sink.partitioner
sink.partitioner-field-delimiter
sink.producer.collection-max-count (deprecated)
sink.producer.collection-max-size (deprecated)
sink.producer.fail-on-error (deprecated)
sink.producer.record-max-buffered-time (deprecated)
sink.requests.max-buffered
sink.requests.max-inflight
stream
at 
org.apache.flink.table.factories.FactoryUtil.validateUnconsumedKeys(FactoryUtil.java:624)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validate(FactoryUtil.java:914)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validate(FactoryUtil.java:978)
at 
org.apache.flink.table.factories.FactoryUtil$FactoryHelper.validateExcept(FactoryUtil.java:938)
at 
org.apache.flink.table.factories.FactoryUtil$TableFactoryHelper.validateExcept(FactoryUtil.java:978)
at 
org.apache.flink.connector.kinesis.table.KinesisDy

[jira] [Commented] (FLINK-30649) Shutting down MiniCluster times out

2023-10-04 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771793#comment-17771793
 ] 

Matthias Pohl commented on FLINK-30649:
---------------------------------------

Yeah, essentially, it's hard to investigate if it's not reproducible. That's 
why I was curious whether you have an idea what's going on. It could also be a 
bug in Minikube that might be resolved by upgrading to a newer version.

> Shutting down MiniCluster times out
> -----------------------------------
>
> Key: FLINK-30649
> URL: https://issues.apache.org/jira/browse/FLINK-30649
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, Test Infrastructure
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Priority: Critical
>  Labels: stale-assigned, starter, test-stability
>
> {{Run kubernetes session test (default input)}} failed with a timeout.
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44748&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=6317]
> It appears that there was some issue with shutting down the pods of the 
> MiniCluster:
> {code:java}
> 2023-01-12T08:22:13.1388597Z timed out waiting for the condition on 
> pods/flink-native-k8s-session-1-7dc9976688-gq788
> 2023-01-12T08:22:13.1390040Z timed out waiting for the condition on 
> pods/flink-native-k8s-session-1-taskmanager-1-1 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-30883) Missing JobID caused the k8s e2e test to fail

2023-10-04 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683432#comment-17683432
 ] 

Matthias Pohl edited comment on FLINK-30883 at 10/4/23 10:01 AM:
-----------------------------------------------------------------

I extracted the available logs into dedicated files. It appears that the 
jobmanager restarted at least once. The strange thing is that even in 
{{jobmanager.0.log}} it states that the job was recovered.

Source: jobmanager.0.log
{code:java}
Feb 01 15:03:03 2023-02-01 14:55:28,715 INFO  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
Job 68e961ca was recovered successfully.
{code}
It looks like there was more than one JobManager restart happening, with the 
expected log line "Job \{jobId} is submitted" probably being located in the 
logs of the missing JobManager run. But I struggle to find evidence for this 
theory: no additional logs are provided.


was (Author: mapohl):
I extracted the available logs into dedicated files. It appears that the 
jobmanager restarted at least once. The strange thing is that even in 
{{jobmanager.0.log}} it states that the job was recovered.

Source: jobmanager.0.log
{code}
Feb 01 15:03:03 2023-02-01 14:55:28,715 INFO  
org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - 
Job 68e961ca was recovered successfully.
{code}

 It looks like there was more than one JobManager restart happening with the 
expected log line "Job {jobId} is submitted" probably being located in the logs 
of the missing JobManager run. But I struggle to find evidence for this theory: 
No additional logs are provided.

> Missing JobID caused the k8s e2e test to fail
> ---------------------------------------------
>
> Key: FLINK-30883
> URL: https://issues.apache.org/jira/browse/FLINK-30883
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, Runtime / Coordination
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: auto-deprioritized-critical, test-stability
> Attachments: e2e_test_failure.log, 
> flink-vsts-client-fv-az378-840.log, jobmanager.0.log, jobmanager.1.log, 
> taskmanager.log
>
>
> We've experienced a test failure in {{Run kubernetes application HA test}} 
> due to a {{CliArgsException}}:
> {code}
> Feb 01 15:03:15 org.apache.flink.client.cli.CliArgsException: Missing JobID. 
> Specify a JobID to cancel a job.
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:689) 
> ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1107) 
> ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>  ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>  [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189) 
> [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157) 
> [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=45569&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&s=ae4f8708-9994-57d3-c2d7-b892156e7812&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=9866



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30883) Missing JobID caused the k8s e2e test to fail

2023-10-04 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771796#comment-17771796
 ] 

Matthias Pohl commented on FLINK-30883:
---------------------------------------

Hi [~Fei Feng] , could you provide logs of your scenario? Is it reproducible?

> Missing JobID caused the k8s e2e test to fail
> ---------------------------------------------
>
> Key: FLINK-30883
> URL: https://issues.apache.org/jira/browse/FLINK-30883
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes, Runtime / Coordination
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: auto-deprioritized-critical, test-stability
> Attachments: e2e_test_failure.log, 
> flink-vsts-client-fv-az378-840.log, jobmanager.0.log, jobmanager.1.log, 
> taskmanager.log
>
>
> We've experienced a test failure in {{Run kubernetes application HA test}} 
> due to a {{CliArgsException}}:
> {code}
> Feb 01 15:03:15 org.apache.flink.client.cli.CliArgsException: Missing JobID. 
> Specify a JobID to cancel a job.
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:689) 
> ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1107) 
> ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
>  ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>  [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189) 
> [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Feb 01 15:03:15   at 
> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157) 
> [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=45569&view=logs&j=bea52777-eaf8-5663-8482-18fbc3630e81&s=ae4f8708-9994-57d3-c2d7-b892156e7812&t=b2642e3a-5b86-574d-4c8a-f7e2842bfb14&l=9866



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33000) SqlGatewayServiceITCase should utilize TestExecutorExtension instead of using a ThreadFactory

2023-10-04 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771803#comment-17771803
 ] 

Matthias Pohl commented on FLINK-33000:
---------------------------------------

[~fsk119] are you taking care of the backports as well?

> SqlGatewayServiceITCase should utilize TestExecutorExtension instead of using 
> a ThreadFactory
> -------------------------------------------------------------------------------------------
>
> Key: FLINK-33000
> URL: https://issues.apache.org/jira/browse/FLINK-33000
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Gateway, Tests
>Affects Versions: 1.16.2, 1.18.0, 1.17.1, 1.19.0
>Reporter: Matthias Pohl
>Assignee: Jiabao Sun
>Priority: Major
>  Labels: pull-request-available, starter
>
> {{SqlGatewayServiceITCase}} uses an {{ExecutorThreadFactory}} for its 
> asynchronous operations. Instead, one should use {{TestExecutorExtension}} to 
> ensure proper cleanup of threads.
> We might also want to remove the {{AbstractTestBase}} parent class because 
> that uses JUnit4, whereas {{SqlGatewayServiceITCase}} is already based on 
> JUnit5.
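(For illustration, the replacement pattern would look roughly like the following; 
the extension and factory names are assumed from Flink's test utilities:)
{code:java}
// Sketch, assuming Flink's TestExecutorExtension test utility:
@RegisterExtension
static final TestExecutorExtension<ExecutorService> EXECUTOR_EXTENSION =
        new TestExecutorExtension<>(Executors::newCachedThreadPool);

@Test
void testOperation() throws Exception {
    // Threads created here are shut down by the extension after each test run.
    EXECUTOR_EXTENSION.getExecutor().submit(() -> { /* async operation */ }).get();
}
{code}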



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-20625][pubsub,e2e] Add PubSubSource connector using FLIP-27 [flink-connector-gcp-pubsub]

2023-10-04 Thread via GitHub


XComp commented on PR #2:
URL: 
https://github.com/apache/flink-connector-gcp-pubsub/pull/2#issuecomment-1746591605

   @dchristle how about you? ;-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33179] Throw exception when serialising or deserialising ExecNode with invalid type [flink]

2023-10-04 Thread via GitHub


twalthr commented on code in PR #23488:
URL: https://github.com/apache/flink/pull/23488#discussion_r1345578855


##
flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/nodes/exec/ExecNodeContext.java:
##
@@ -104,9 +104,16 @@ private ExecNodeContext(@Nullable Integer id, String name, 
Integer version) {
 @JsonCreator
 public ExecNodeContext(String value) {
 this.id = null;
-String[] split = value.split("_");
-this.name = split[0];
-this.version = Integer.valueOf(split[1]);
+try {
+String[] split = value.split("_");
+this.name = split[0];
+this.version = Integer.valueOf(split[1]);
+} catch (Exception e) {
+throw new TableException(

Review Comment:
   The null_null situation was a bug in the serialization method. During JSON 
deserialization, many things can go wrong if users modify the JSON plan 
themselves. I vote for only implementing what we need for this particular use 
case. Also, add a comment there explaining why we need to support plans older 
than 1.18.1.
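
   For illustration, the guarded parsing under discussion might end up looking 
roughly like this (a sketch only; the exact exception message and comment 
wording are assumptions):

   ```java
   // Sketch: parse the serialized "<name>_<version>" form defensively and fail
   // with a readable error instead of an opaque exception (e.g. a
   // NumberFormatException for the malformed "null_null" written by the old bug).
   @JsonCreator
   public ExecNodeContext(String value) {
       this.id = null;
       try {
           String[] split = value.split("_");
           this.name = split[0];
           this.version = Integer.valueOf(split[1]);
       } catch (Exception e) {
           throw new TableException(
                   String.format(
                           "Invalid ExecNode context '%s'. Expected format '<name>_<version>'.",
                           value),
                   e);
       }
   }
   ```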



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33116) CliClientTest.testCancelExecutionInteractiveMode fails with NPE on AZP

2023-10-04 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-33116:
--
Description: 
This build 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53309&view=logs&j=ce3801ad-3bd5-5f06-d165-34d37e757d90&t=5e4d9387-1dcc-5885-a901-90469b7e6d2f&l=12264]

fails as
{noformat}
Sep 18 02:26:15 02:26:15.743 [ERROR] 
org.apache.flink.table.client.cli.CliClientTest.testCancelExecutionInteractiveMode
  Time elapsed: 0.1 s  <<< ERROR!
Sep 18 02:26:15 java.lang.NullPointerException
Sep 18 02:26:15 at 
org.apache.flink.table.client.cli.CliClient.closeTerminal(CliClient.java:284)
Sep 18 02:26:15 at 
org.apache.flink.table.client.cli.CliClient.close(CliClient.java:108)
Sep 18 02:26:15 at 
org.apache.flink.table.client.cli.CliClientTest.testCancelExecutionInteractiveMode(CliClientTest.java:314)
Sep 18 02:26:15 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
{noformat}

  was:
This build 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53309&view=logs&j=ce3801ad-3bd5-5f06-d165-34d37e757d90&t=5e4d9387-1dcc-5885-a901-90469b7e6d2f&l=12264

fails as
{noformat}

Sep 18 02:26:15 at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
Sep 18 02:26:15 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
Sep 18 02:26:15 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
Sep 18 02:26:15 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
Sep 18 02:26:15 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
Sep 18 02:26:15 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)

...
{noformat}


> CliClientTest.testCancelExecutionInteractiveMode fails with NPE on AZP
> --
>
> Key: FLINK-33116
> URL: https://issues.apache.org/jira/browse/FLINK-33116
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: stale-critical, test-stability
>
> This build 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53309&view=logs&j=ce3801ad-3bd5-5f06-d165-34d37e757d90&t=5e4d9387-1dcc-5885-a901-90469b7e6d2f&l=12264]
> fails as
> {noformat}
> Sep 18 02:26:15 02:26:15.743 [ERROR] 
> org.apache.flink.table.client.cli.CliClientTest.testCancelExecutionInteractiveMode
>   Time elapsed: 0.1 s  <<< ERROR!
> Sep 18 02:26:15 java.lang.NullPointerException
> Sep 18 02:26:15   at 
> org.apache.flink.table.client.cli.CliClient.closeTerminal(CliClient.java:284)
> Sep 18 02:26:15   at 
> org.apache.flink.table.client.cli.CliClient.close(CliClient.java:108)
> Sep 18 02:26:15   at 
> org.apache.flink.table.client.cli.CliClientTest.testCancelExecutionInteractiveMode(CliClientTest.java:314)
> Sep 18 02:26:15   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33116) CliClientTest.testCancelExecutionInteractiveMode fails with NPE on AZP

2023-10-04 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771813#comment-17771813
 ] 

Matthias Pohl commented on FLINK-33116:
---

Looks like the {{CliClient}} is not thread-safe: The close call fails with a 
{{NullPointerException}} because {{CliClient.close()}} calls 
{{closeTerminal()}} if a terminal is set. But the test finishes the 
{{client.executeInInteractiveMode()}} in a separate thread, which calls 
{{closeTerminal()}} at the end as well. [~fsk119] can you delegate this issue?
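
A minimal sketch of one possible fix, assuming the race is on a shared terminal 
field (illustrative only; the class and field names are hypothetical, not the 
actual {{CliClient}} code):

{code:java}
import java.io.IOException;

import org.jline.terminal.Terminal;

/** Sketch: make terminal shutdown idempotent so concurrent close calls are safe. */
final class TerminalGuard {
    private final Object lock = new Object();
    private Terminal terminal; // set when the interactive session starts

    TerminalGuard(Terminal terminal) {
        this.terminal = terminal;
    }

    void closeTerminal() {
        synchronized (lock) {
            if (terminal == null) {
                return; // already closed by the other thread
            }
            try {
                terminal.close();
            } catch (IOException e) {
                // best-effort shutdown; ignore
            }
            terminal = null;
        }
    }
}
{code}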

> CliClientTest.testCancelExecutionInteractiveMode fails with NPE on AZP
> --
>
> Key: FLINK-33116
> URL: https://issues.apache.org/jira/browse/FLINK-33116
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: stale-critical, test-stability
>
> This build 
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53309&view=logs&j=ce3801ad-3bd5-5f06-d165-34d37e757d90&t=5e4d9387-1dcc-5885-a901-90469b7e6d2f&l=12264]
> fails as
> {noformat}
> Sep 18 02:26:15 02:26:15.743 [ERROR] 
> org.apache.flink.table.client.cli.CliClientTest.testCancelExecutionInteractiveMode
>   Time elapsed: 0.1 s  <<< ERROR!
> Sep 18 02:26:15 java.lang.NullPointerException
> Sep 18 02:26:15   at 
> org.apache.flink.table.client.cli.CliClient.closeTerminal(CliClient.java:284)
> Sep 18 02:26:15   at 
> org.apache.flink.table.client.cli.CliClient.close(CliClient.java:108)
> Sep 18 02:26:15   at 
> org.apache.flink.table.client.cli.CliClientTest.testCancelExecutionInteractiveMode(CliClientTest.java:314)
> Sep 18 02:26:15   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-32108][test] KubernetesExtension calls assumeThat in @BeforeAll callback which doesn't print the actual failure message [flink]

2023-10-04 Thread via GitHub


XComp commented on PR #23472:
URL: https://github.com/apache/flink/pull/23472#issuecomment-1746649051

   How is this different to the solution before? :thinking: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-32582) Move TypeSerializerUpgradeTestBase from Kafka connector into flink-connector-common

2023-10-04 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-32582:
--
Description: 
The externalization of connectors caused problems with Flink's test data 
generation. The Kafka connector relied on TypeSerializerUpgradeTestBase for 
some test cases, which was fine prior to FLINK-27518, where the test data 
generation was handled individually.

With FLINK-27518 the process was automated in Flink 1.18. For now, the 
TypeSerializerUpgradeTestBase class was just copied over into the Kafka 
connector since it was the only connector that would utilize this test base.

But we might want to provide a more generalized solution where the test base is 
provided by {{flink-connector-common}} to offer a generalized approach for any 
connector.

  was:
The externalization of connectors made caused problems with the Flink's test 
data generation. The Kafka connector relied on TypeSerializerUpgradeTestBase 
for some test cases which was fine prior to FLINK-27518 where the test data 
generation was handled individually.

With FLINK-27518 the process was automated in Flink 1.18. For now, the 
TypeSerializerUpgradeTestBase class was just copied over into the Kafka 
connector since it was the only connector that would utilize this test base.

But we might want to provide a more generalized solution where the test base is 
provided by {{flink-connector-common}} to offer a generalized approach for any 
connector.


> Move TypeSerializerUpgradeTestBase from Kafka connector into 
> flink-connector-common
> ---
>
> Key: FLINK-32582
> URL: https://issues.apache.org/jira/browse/FLINK-32582
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Common
>Reporter: Matthias Pohl
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> The externalization of connectors caused problems with Flink's test data 
> generation. The Kafka connector relied on TypeSerializerUpgradeTestBase for 
> some test cases, which was fine prior to FLINK-27518, where the test data 
> generation was handled individually.
> With FLINK-27518 the process was automated in Flink 1.18. For now, the 
> TypeSerializerUpgradeTestBase class was just copied over into the Kafka 
> connector since it was the only connector that would utilize this test base.
> But we might want to provide a more generalized solution where the test base 
> is provided by {{flink-connector-common}} to offer a generalized approach for 
> any connector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33176][Connectors/Kinesis] Handle null value in RowDataKinesisDeserializationSchema [flink-connector-aws]

2023-10-04 Thread via GitHub


dannycranmer commented on code in PR #102:
URL: 
https://github.com/apache/flink-connector-aws/pull/102#discussion_r1345737928


##
flink-connector-aws/flink-connector-kinesis/src/test/java/org/apache/flink/streaming/connectors/kinesis/table/RowDataKinesisDeserializationSchemaTest.java:
##
@@ -0,0 +1,90 @@
+package org.apache.flink.streaming.connectors.kinesis.table;

Review Comment:
   Missing copyright header



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32445][runtime] Refactor BlobStoreService's closeAndCleanupAllData to cleanupAllData [flink]

2023-10-04 Thread via GitHub


Jiabao-Sun commented on code in PR #23424:
URL: https://github.com/apache/flink/pull/23424#discussion_r1345742478


##
flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/AbstractHaServices.java:
##
@@ -192,32 +192,18 @@ public void closeAndCleanupAllData() throws Exception {
 try {
 internalCleanup();
 deletedHAData = true;
+blobStoreService.cleanupAllData();
 } catch (Exception t) {
 exception = t;
 }
 
-try {
-if (leaderElectionService != null) {
-leaderElectionService.close();
-}
-} catch (Throwable t) {
-exception = ExceptionUtils.firstOrSuppressed(t, exception);
+if (!deletedHAData) {
+logger.info(
+"Cannot delete HA blobs because we failed to delete the 
pointers in the HA store.");
 }
 
 try {
-internalClose();
-} catch (Throwable t) {
-exception = ExceptionUtils.firstOrSuppressed(t, exception);
-}
-
-try {
-if (deletedHAData) {
-blobStoreService.closeAndCleanupAllData();
-} else {
-logger.info(
-"Cannot delete HA blobs because we failed to delete 
the pointers in the HA store.");
-blobStoreService.close();
-}
+close();

Review Comment:
   Thanks @XComp. It makes sense to me. 
   Could you help review it again when you have time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33176][Connectors/Kinesis] Handle null value in RowDataKinesisDeserializationSchema [flink-connector-aws]

2023-10-04 Thread via GitHub


z3d1k commented on code in PR #102:
URL: 
https://github.com/apache/flink-connector-aws/pull/102#discussion_r1345798939


##
flink-connector-aws/flink-connector-kinesis/src/test/java/org/apache/flink/streaming/connectors/kinesis/table/RowDataKinesisDeserializationSchemaTest.java:
##
@@ -0,0 +1,90 @@
+package org.apache.flink.streaming.connectors.kinesis.table;

Review Comment:
   Added missing header



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33149] Bump snappy to 1.1.10.4 [flink-statefun]

2023-10-04 Thread via GitHub


davidradl commented on code in PR #341:
URL: https://github.com/apache/flink-statefun/pull/341#discussion_r1345814020


##
statefun-kafka-io/pom.xml:
##
@@ -43,9 +43,7 @@ under the License.
 ${kafka.version}
 
 
+org.apache.flink:flink-streaming-java -->

Review Comment:
   It would be good to leave a comment here. As far as I can see, Flink 1.16.2 
has snappy-java 1.1.8.3, which is vulnerable - so you want to exclude it here. 
But Flink 1.17 and above use snappy-java 1.1.10.4. So this is a point-in-time 
change, because of your dependency on the back-level Flink. I assume we would 
want to move to a provided dependency when we depend on Flink 1.17 or above. 
Have I understood this correctly?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bump org.xerial.snappy:snappy-java from 1.1.10.1 to 1.1.10.4 in /statefun-flink [flink-statefun]

2023-10-04 Thread via GitHub


davidradl commented on PR #340:
URL: https://github.com/apache/flink-statefun/pull/340#issuecomment-1746899156

   See comment in https://github.com/apache/flink-statefun/pull/341


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Max Retries [flink-statefun]

2023-10-04 Thread via GitHub


davidradl commented on PR #337:
URL: https://github.com/apache/flink-statefun/pull/337#issuecomment-1746901840

   Is there a Jira associated with this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31049] [flink-connector-kafka]Add support for Kafka record headers to KafkaSink [flink]

2023-10-04 Thread via GitHub


davidradl commented on PR #8:
URL: https://github.com/apache/flink/pull/8#issuecomment-1746926177

   @tzulitai @MartijnVisser I see https://github.com/apache/flink/pull/22797; 
this PR is no longer relevant to this repository, so I suggest closing it and 
porting the change to the [new Flink connector kafka repo]( 
https://github.com/apache/flink-connector-kafka) if it is still relevant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-28185][Connector/Kafka] Handle missing timestamps when there a… [flink]

2023-10-04 Thread via GitHub


davidradl commented on PR #22416:
URL: https://github.com/apache/flink/pull/22416#issuecomment-1746924764

   @junjiem See https://github.com/apache/flink/pull/22797; this PR is no 
longer relevant to this repository, so I suggest closing it and porting the 
change to the [new Flink connector kafka repo]( 
https://github.com/apache/flink-connector-kafka) if it is still relevant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31262][kafka] Move kafka sql connector fat jar test to SmokeKafkaITCase [flink]

2023-10-04 Thread via GitHub


davidradl commented on PR #22060:
URL: https://github.com/apache/flink/pull/22060#issuecomment-1746927104

I see https://github.com/apache/flink/pull/22797; this PR is no longer 
relevant to this repository, so I suggest closing it and porting the change 
to the [new Flink connector kafka repo]( 
https://github.com/apache/flink-connector-kafka) if it is still relevant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31006][connector/kafka] Fix noMoreNewPartitionSplits is not se… [flink]

2023-10-04 Thread via GitHub


davidradl commented on PR #21909:
URL: https://github.com/apache/flink/pull/21909#issuecomment-1746928308

   @liuyongvs I see https://github.com/apache/flink/pull/22797; this PR is no 
longer relevant to this repository, so I suggest closing it and porting the 
change to the [new Flink connector kafka repo]( 
https://github.com/apache/flink-connector-kafka) if it is still relevant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-29492] Return Kafka producer to the pool when the Kafka sink is not the end of the chain [flink]

2023-10-04 Thread via GitHub


davidradl commented on PR #21226:
URL: https://github.com/apache/flink/pull/21226#issuecomment-1746929409

   @ruanhang1993 I see https://github.com/apache/flink/pull/22797; this PR is 
no longer relevant to this repository, so I suggest closing it and porting 
the change to the [new Flink connector kafka repo]( 
https://github.com/apache/flink-connector-kafka) if it is still relevant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-29492) Kafka exactly-once sink causes OutOfMemoryError

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-29492:
---
Labels: pull-request-available  (was: )

> Kafka exactly-once sink causes OutOfMemoryError
> ---
>
> Key: FLINK-29492
> URL: https://issues.apache.org/jira/browse/FLINK-29492
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.15.3
>Reporter: Robert Metzger
>Assignee: Hang Ruan
>Priority: Critical
>  Labels: pull-request-available
>
> My Kafka exactly-once sinks are periodically failing with a 
> {{OutOfMemoryError: Java heap space}}.
> This looks very similar to FLINK-28250. But I am running 1.15.2, which 
> contains a fix for FLINK-28250.
> Exception:
> {code:java}
> java.io.IOException: Could not perform checkpoint 2281 for operator 
> http_events[3]: Writer (1/1)#1.
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1210)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:147)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.triggerCheckpoint(SingleCheckpointBarrierHandler.java:287)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.access$100(SingleCheckpointBarrierHandler.java:64)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler$ControllerImpl.triggerGlobalCheckpoint(SingleCheckpointBarrierHandler.java:493)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.triggerGlobalCheckpoint(AbstractAlignedBarrierHandlerState.java:74)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.barrierReceived(AbstractAlignedBarrierHandlerState.java:66)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.lambda$processBarrier$2(SingleCheckpointBarrierHandler.java:234)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.markCheckpointAlignedAndTransformState(SingleCheckpointBarrierHandler.java:262)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:231)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:181)
>   at 
> org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:159)
>   at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
>   at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:519)
>   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:804)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:753)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:948)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:741)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:563)
>   at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: Could not 
> complete snapshot 2281 for operator http_events[3]: Writer (1/1)#1. Failure 
> reason: Checkpoint was declined.
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:269)
>   at 
> org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:173)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:348)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.checkpointStreamOperator(RegularOperatorChain.java:227)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.buildOperatorSnapshotFutures(RegularOperatorChain.java:212)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:192)
>   at 
> org.apache.flink.streaming.runtime.tasks.Sub

Re: [PR] [FLINK-25538][flink-connector-kafka] JUnit5 Migration [flink]

2023-10-04 Thread via GitHub


davidradl commented on PR #20991:
URL: https://github.com/apache/flink/pull/20991#issuecomment-1746931693

   @snuyanzin I see https://github.com/apache/flink/pull/22797; this PR is no 
longer relevant to this repository, so I suggest closing it and porting the 
change to the [new Flink connector kafka repo]( 
https://github.com/apache/flink-connector-kafka) if it is still relevant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-13226] [connectors / kafka] Fix race condition between transaction commit and produc… [flink]

2023-10-04 Thread via GitHub


davidradl commented on PR #9224:
URL: https://github.com/apache/flink/pull/9224#issuecomment-1746934218

   @pnowojski I see https://github.com/apache/flink/pull/22797; this PR is no 
longer relevant to this repository, so I suggest closing it and porting the 
change to the [new Flink connector kafka repo]( 
https://github.com/apache/flink-connector-kafka) if it is still relevant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-26614]Upsert Kafka SQL Connector:Support startup mode of timestamp and specific offsets [flink]

2023-10-04 Thread via GitHub


davidradl commented on PR #19067:
URL: https://github.com/apache/flink/pull/19067#issuecomment-1746937641

   @slinkydeveloper I see https://github.com/apache/flink/pull/22797; this PR 
is no longer relevant to this repository, so I suggest closing it and porting 
the change to the [new Flink connector kafka repo]( 
https://github.com/apache/flink-connector-kafka) if it is still relevant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31049] [flink-connector-kafka]Add support for Kafka record headers to KafkaSink [flink]

2023-10-04 Thread via GitHub


AlexAxeman commented on PR #8:
URL: https://github.com/apache/flink/pull/8#issuecomment-1746945765

   It was implemented in the new repo in [this 
PR](https://github.com/apache/flink-connector-kafka/pull/18).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-31049] [flink-connector-kafka]Add support for Kafka record headers to KafkaSink [flink]

2023-10-04 Thread via GitHub


AlexAxeman closed pull request #8: [FLINK-31049] [flink-connector-kafka]Add 
support for Kafka record headers to KafkaSink
URL: https://github.com/apache/flink/pull/8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-13226] [connectors / kafka] Fix race condition between transaction commit and produc… [flink]

2023-10-04 Thread via GitHub


flinkbot commented on PR #9224:
URL: https://github.com/apache/flink/pull/9224#issuecomment-1746946522

   
   ## CI report:
   
   * 838d369f5c230c4debd3eac7a28bd2bc75711a01 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-33182) Allow metadata columns in NduAnalyzer with ChangelogNormalize

2023-10-04 Thread Timo Walther (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771868#comment-17771868
 ] 

Timo Walther commented on FLINK-33182:
--

[~lincoln.86xy] do you agree with this assumption?

> Allow metadata columns in NduAnalyzer with ChangelogNormalize
> -
>
> Key: FLINK-33182
> URL: https://issues.apache.org/jira/browse/FLINK-33182
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Timo Walther
>Priority: Major
>
> Currently, the NduAnalyzer is very strict about metadata columns in updating 
> sources. However, for upsert sources (like Kafka) that contain an incomplete 
> changelog, the planner always adds a ChangelogNormalize node. 
> ChangelogNormalize will make sure that metadata columns can be considered 
> deterministic. So the NduAnalyzer should be satisfied in this case. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33182) Allow metadata columns in NduAnalyzer with ChangelogNormalize

2023-10-04 Thread Timo Walther (Jira)
Timo Walther created FLINK-33182:


 Summary: Allow metadata columns in NduAnalyzer with 
ChangelogNormalize
 Key: FLINK-33182
 URL: https://issues.apache.org/jira/browse/FLINK-33182
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther


Currently, the NduAnalyzer is very strict about metadata columns in updating 
sources. However, for upsert sources (like Kafka) that contain an incomplete 
changelog, the planner always adds a ChangelogNormalize node. 
ChangelogNormalize will make sure that metadata columns can be considered 
deterministic. So the NduAnalyzer should be satisfied in this case. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-32445][runtime] Refactor BlobStoreService's closeAndCleanupAllData to cleanupAllData [flink]

2023-10-04 Thread via GitHub


XComp commented on code in PR #23424:
URL: https://github.com/apache/flink/pull/23424#discussion_r1345865694


##
flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/ClusterEntrypoint.java:
##
@@ -498,12 +498,15 @@ protected CompletableFuture 
stopClusterServices(boolean cleanupHaData) {
 }
 
 if (haServices != null) {
-try {
-if (cleanupHaData) {
-haServices.closeAndCleanupAllData();
-} else {
-haServices.close();
+if (cleanupHaData) {

Review Comment:
   Initially, I thought that the code would be nicer afterwards by just removing 
the if/else. But you're right about having separate try/catch blocks as well 
(which, unfortunately, makes the code less readable).
   We could introduce a default implementation 
{{HighAvailabilityServices#closeWithOptionalClean(boolean)}} that does this 
optional cleanup while closing the services. That would remove the redundant 
code from {{MiniCluster}} and {{ClusterEntrypoint}}.
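
   As a rough sketch of that idea (the method name follows the proposal above; 
the body is an assumption, not a final implementation):

   ```java
   // Hypothetical default method on HighAvailabilityServices: close the services
   // and optionally clean up the HA data first, preserving the first failure.
   default void closeWithOptionalClean(boolean cleanupData) throws Exception {
       Throwable exception = null;
       if (cleanupData) {
           try {
               cleanupAllData();
           } catch (Throwable t) {
               exception = t;
           }
       }
       try {
           close();
       } catch (Throwable t) {
           exception = ExceptionUtils.firstOrSuppressed(t, exception);
       }
       if (exception != null) {
           ExceptionUtils.rethrowException(exception);
       }
   }
   ```

   Callers like {{MiniCluster}} and {{ClusterEntrypoint}} would then shrink to a 
single {{haServices.closeWithOptionalClean(cleanupHaData)}} call.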



##
flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/HighAvailabilityServices.java:
##
@@ -212,19 +212,17 @@ default LeaderRetrievalService 
getClusterRestEndpointLeaderRetriever() {
 void close() throws Exception;
 
 /**
- * Closes the high availability services (releasing all resources) and 
deletes all data stored
- * by these services in external stores.
+ * Deletes all data stored by high availability services in external 
stores.
  *
  * After this method was called, the any job or session that was 
managed by these high

Review Comment:
   ```suggestion
* After this method was called, any job or session that was managed 
by these high
   ```
   nit



##
flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/HighAvailabilityServices.java:
##
@@ -212,19 +212,17 @@ default LeaderRetrievalService 
getClusterRestEndpointLeaderRetriever() {
 void close() throws Exception;

Review Comment:
   Keep the method to keep the JavaDoc. But we could make 
{{HighAvailabilityServices}} extend {{AutoCloseable}}.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-33183) Enable metadata columns in NduAnalyzer with retract if non-virtual

2023-10-04 Thread Timo Walther (Jira)
Timo Walther created FLINK-33183:


 Summary: Enable metadata columns in NduAnalyzer with retract if 
non-virtual
 Key: FLINK-33183
 URL: https://issues.apache.org/jira/browse/FLINK-33183
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Timo Walther


Currently, the NduAnalyzer is very strict about metadata columns in updating 
sources. Compared to append and upsert sources (see also FLINK-33182), retract 
sources are tricky. And the analyzer is actually correct.

However, for retract sources we should expose more functionality to the user 
and add a warning to the documentation that retract mode could potentially 
cause NDU problems if not enough attention is paid. We should only throw an 
error on virtual metadata columns. Persisted metadata columns can be considered 
"safe". When a metadata column is persisted, we can assume that an upstream 
Flink job fills its content and thus likely also fills in its correct retraction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33183) Enable metadata columns in NduAnalyzer with retract if non-virtual

2023-10-04 Thread Timo Walther (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771876#comment-17771876
 ] 

Timo Walther commented on FLINK-33183:
--

[~lincoln.86xy] what is your opinion here? This might be more controversial than 
FLINK-33182. But I think that whether issues arise actually depends on VIRTUAL 
vs. non-VIRTUAL.

> Enable metadata columns in NduAnalyzer with retract if non-virtual
> --
>
> Key: FLINK-33183
> URL: https://issues.apache.org/jira/browse/FLINK-33183
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Timo Walther
>Priority: Major
>
> Currently, the NduAnalyzer is very strict about metadata columns in updating 
> sources. Compared to append and upsert sources (see also FLINK-33182), 
> retract sources are tricky. And the analyzer is actually correct.
> However, for retract sources we should expose more functionality to the user 
> and add a warning to the documentation that retract mode could potentially 
> cause NDU problems if not enough attention is paid. We should only throw an 
> error on virtual metadata columns. Persisted metadata columns can be 
> considered "safe". When a metadata column is persisted, we can assume that an 
> upstream Flink job fills its content and thus likely also fills in its correct 
> retraction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-30831) Improving e2e test output in case errors/exceptions are found

2023-10-04 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-30831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl reassigned FLINK-30831:
-

Assignee: Flaviu Cicio

> Improving e2e test output in case errors/exceptions are found
> -
>
> Key: FLINK-30831
> URL: https://issues.apache.org/jira/browse/FLINK-30831
> Project: Flink
>  Issue Type: Improvement
>  Components: Test Infrastructure
>Affects Versions: 1.17.0, 1.15.3, 1.16.1
>Reporter: Matthias Pohl
>Assignee: Flaviu Cicio
>Priority: Minor
>  Labels: auto-deprioritized-major, starter
>
> Some e2e tests parse the Flink logs for exceptions using {{grep}}. We then 
> print the first 500 lines of each log file in case an exception that 
> shouldn't be ignored is found (see 
> [internal_check_logs_for_exceptions|https://github.com/apache/flink/blob/c9e87fe410c42f7e7c19c81456d4212a58564f5e/flink-end-to-end-tests/test-scripts/common.sh#L449]
>  or 
> [check_logs_for_errors|https://github.com/apache/flink/blob/c9e87fe410c42f7e7c19c81456d4212a58564f5e/flink-end-to-end-tests/test-scripts/common.sh#L387]).
>  Instead, we could use {{grep -C200}} to actually print the context of the 
> exception.
> This would help especially in those situations where the exception doesn't 
> appear in the first 500 lines.
> This issue does not necessarily only include the two aforementioned code 
> locations. One should check the scripts for other code with a similar issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30831) Improving e2e test output in case errors/exceptions are found

2023-10-04 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771882#comment-17771882
 ] 

Matthias Pohl commented on FLINK-30831:
---

Thanks for volunteering. I assigned the issue to you.

> Improving e2e test output in case errors/exceptions are found
> -
>
> Key: FLINK-30831
> URL: https://issues.apache.org/jira/browse/FLINK-30831
> Project: Flink
>  Issue Type: Improvement
>  Components: Test Infrastructure
>Affects Versions: 1.17.0, 1.15.3, 1.16.1
>Reporter: Matthias Pohl
>Assignee: Flaviu Cicio
>Priority: Minor
>  Labels: auto-deprioritized-major, starter
>
> Some e2e tests parse the Flink logs for exceptions using {{grep}}. We then 
> print the first 500 lines of each log file in case an exception that 
> shouldn't be ignored is found (see 
> [internal_check_logs_for_exceptions|https://github.com/apache/flink/blob/c9e87fe410c42f7e7c19c81456d4212a58564f5e/flink-end-to-end-tests/test-scripts/common.sh#L449]
>  or 
> [check_logs_for_errors|https://github.com/apache/flink/blob/c9e87fe410c42f7e7c19c81456d4212a58564f5e/flink-end-to-end-tests/test-scripts/common.sh#L387]).
>  Instead, we could use {{grep -C200}} to actually print the context of the 
> exception.
> This would help especially in those situations where the exception doesn't 
> appear in the first 500 lines.
> This issue does not necessarily only include the two aforementioned code 
> locations. One should check the scripts for other code with a similar issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (FLINK-32380) Support Java records

2023-10-04 Thread Gyula Fora (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-32380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora reassigned FLINK-32380:
--

Assignee: Gyula Fora

> Support Java records
> 
>
> Key: FLINK-32380
> URL: https://issues.apache.org/jira/browse/FLINK-32380
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Reporter: Chesnay Schepler
>Assignee: Gyula Fora
>Priority: Major
>
> Reportedly Java records are not supported, because they are neither detected 
> by our Pojo serializer nor supported by Kryo 2.x
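
For context, a minimal example of the kind of type in question (illustrative 
only):

{code:java}
// A Java record has no default constructor and no settable fields, so it does
// not match the POJO serializer's rules and falls back to Kryo, where 2.x
// cannot handle it either.
public record Event(long id, String payload) {}
{code}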



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-32445][runtime] Refactor BlobStoreService's closeAndCleanupAllData to cleanupAllData [flink]

2023-10-04 Thread via GitHub


Jiabao-Sun commented on code in PR #23424:
URL: https://github.com/apache/flink/pull/23424#discussion_r1346010506


##
flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/ClusterEntrypoint.java:
##
@@ -498,12 +498,15 @@ protected CompletableFuture 
stopClusterServices(boolean cleanupHaData) {
 }
 
 if (haServices != null) {
-try {
-if (cleanupHaData) {
-haServices.closeAndCleanupAllData();
-} else {
-haServices.close();
+if (cleanupHaData) {

Review Comment:
   Thanks @XComp. Sounds great to me.
   Could you help take a look again?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-30859] Remove all Kafka connector code from main repo [flink]

2023-10-04 Thread via GitHub


davidradl commented on PR #22797:
URL: https://github.com/apache/flink/pull/22797#issuecomment-1747160496

   @zentol @tzulitai As the Kafka connector code has now been removed from the 
repo in the change merged in 
[#8](https://github.com/apache/flink/commit/149a5e34c1ed8d8943c901a98c65c70693915811), 
this and the other referenced Kafka-connector-oriented PRs need to be moved to 
the new Flink Kafka connector repository.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update filesystem.md spell error : continously to continuously [flink]

2023-10-04 Thread via GitHub


XComp commented on PR #23422:
URL: https://github.com/apache/flink/pull/23422#issuecomment-1747197573

   I see that you provided another PR with similar changes (#23431). Please 
move all the changes into a single PR. I'm gonna close the other PR without 
merging it. We should do the typo changes covering the same issue in a single 
hotfix commit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] spell error : continous to continuous [flink]

2023-10-04 Thread via GitHub


XComp commented on PR #23431:
URL: https://github.com/apache/flink/pull/23431#issuecomment-1747198710

   Closing this PR in favor of #23422. Please collect all the typo fixes in 
there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] spell error : continous to continuous [flink]

2023-10-04 Thread via GitHub


XComp closed pull request #23431: spell error : continous to continuous
URL: https://github.com/apache/flink/pull/23431


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-33176][Connectors/Kinesis] Handle null value in RowDataKinesisDeserializationSchema [flink-connector-aws]

2023-10-04 Thread via GitHub


dannycranmer merged PR #102:
URL: https://github.com/apache/flink-connector-aws/pull/102


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (FLINK-33176) Kinesis source throws NullPointerException in Table API on ignored parsing errors

2023-10-04 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer reassigned FLINK-33176:
-

Assignee: Aleksandr Pilipenko

> Kinesis source throws NullPointerException in Table API on ignored parsing 
> errors
> -
>
> Key: FLINK-33176
> URL: https://issues.apache.org/jira/browse/FLINK-33176
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.15.4, aws-connector-4.1.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
>
> Using the following example table:
>  
> {code:java}
> CREATE TABLE source (
>   text STRING,
>   `arrival_time` TIMESTAMP(3) METADATA FROM 'timestamp' VIRTUAL
> ) WITH (
>   'connector' = 'kinesis',
>   'stream' = 'test',
>   'aws.region' = 'us-east-1',
>   'json.ignore-parse-errors' = 'true',
>   'format' = 'json'
> ) {code}
> Connector throws NullPointerException when source consumes malformed json 
> message:
> {code:java}
> java.lang.NullPointerException
>     at 
> org.apache.flink.streaming.connectors.kinesis.table.RowDataKinesisDeserializationSchema.deserialize(RowDataKinesisDeserializationSchema.java:137)
>     at 
> org.apache.flink.streaming.connectors.kinesis.table.RowDataKinesisDeserializationSchema.deserialize(RowDataKinesisDeserializationSchema.java:44)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.deserializeRecordForCollectionAndUpdateState(ShardConsumer.java:202)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.lambda$run$0(ShardConsumer.java:126)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:118)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:102)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.run(ShardConsumer.java:114)
>     at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at 
> java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829) {code}
>  
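
The fix essentially amounts to a null guard around the nested format's result; 
roughly (a sketch with assumed names, not the merged change):

{code:java}
// Sketch: with 'json.ignore-parse-errors' = 'true' the nested JSON schema
// returns null for a malformed record; skip it instead of dereferencing null.
RowData physicalRow = physicalDeserializer.deserialize(recordBytes);
if (physicalRow == null) {
    return null; // drop the malformed record
}
// ...then enrich physicalRow with the requested metadata columns...
{code}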



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (FLINK-33176) Kinesis source throws NullPointerException in Table API on ignored parsing errors

2023-10-04 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer resolved FLINK-33176.
---
Resolution: Fixed

> Kinesis source throws NullPointerException in Table API on ignored parsing 
> errors
> -
>
> Key: FLINK-33176
> URL: https://issues.apache.org/jira/browse/FLINK-33176
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.15.4, aws-connector-4.1.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.2.0
>
>
> Using the following example table:
>  
> {code:java}
> CREATE TABLE source (
>   text STRING,
>   `arrival_time` TIMESTAMP(3) METADATA FROM 'timestamp' VIRTUAL
> ) WITH (
>   'connector' = 'kinesis',
>   'stream' = 'test',
>   'aws.region' = 'us-east-1',
>   'json.ignore-parse-errors' = 'true',
>   'format' = 'json'
> ) {code}
> Connector throws NullPointerException when source consumes malformed json 
> message:
> {code:java}
> java.lang.NullPointerException
>     at 
> org.apache.flink.streaming.connectors.kinesis.table.RowDataKinesisDeserializationSchema.deserialize(RowDataKinesisDeserializationSchema.java:137)
>     at 
> org.apache.flink.streaming.connectors.kinesis.table.RowDataKinesisDeserializationSchema.deserialize(RowDataKinesisDeserializationSchema.java:44)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.deserializeRecordForCollectionAndUpdateState(ShardConsumer.java:202)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.lambda$run$0(ShardConsumer.java:126)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:118)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:102)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.run(ShardConsumer.java:114)
>     at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at 
> java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33176) Kinesis source throws NullPointerException in Table API on ignored parsing errors

2023-10-04 Thread Danny Cranmer (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771905#comment-17771905
 ] 

Danny Cranmer commented on FLINK-33176:
---

merged commit 
[{{5105d48}}|https://github.com/apache/flink-connector-aws/commit/5105d48bf7334503b314d188462c3542f438f1d6]
 into apache:main

> Kinesis source throws NullPointerException in Table API on ignored parsing 
> errors
> -
>
> Key: FLINK-33176
> URL: https://issues.apache.org/jira/browse/FLINK-33176
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.15.4, aws-connector-4.1.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.2.0
>
>
> Using the following example table:
>  
> {code:java}
> CREATE TABLE source (
>   text STRING,
>   `arrival_time` TIMESTAMP(3) METADATA FROM 'timestamp' VIRTUAL
> ) WITH (
>   'connector' = 'kinesis',
>   'stream' = 'test',
>   'aws.region' = 'us-east-1',
>   'json.ignore-parse-errors' = 'true',
>   'format' = 'json'
> ) {code}
> Connector throws NullPointerException when source consumes malformed json 
> message:
> {code:java}
> java.lang.NullPointerException
>     at 
> org.apache.flink.streaming.connectors.kinesis.table.RowDataKinesisDeserializationSchema.deserialize(RowDataKinesisDeserializationSchema.java:137)
>     at 
> org.apache.flink.streaming.connectors.kinesis.table.RowDataKinesisDeserializationSchema.deserialize(RowDataKinesisDeserializationSchema.java:44)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.deserializeRecordForCollectionAndUpdateState(ShardConsumer.java:202)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.lambda$run$0(ShardConsumer.java:126)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:118)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:102)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.run(ShardConsumer.java:114)
>     at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at 
> java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-33176) Kinesis source throws NullPointerException in Table API on ignored parsing errors

2023-10-04 Thread Danny Cranmer (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Cranmer updated FLINK-33176:
--
Fix Version/s: aws-connector-4.2.0

> Kinesis source throws NullPointerException in Table API on ignored parsing 
> errors
> -
>
> Key: FLINK-33176
> URL: https://issues.apache.org/jira/browse/FLINK-33176
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kinesis
>Affects Versions: 1.15.4, aws-connector-4.1.0
>Reporter: Aleksandr Pilipenko
>Assignee: Aleksandr Pilipenko
>Priority: Major
>  Labels: pull-request-available
> Fix For: aws-connector-4.2.0
>
>
> Using the following example table:
>  
> {code:java}
> CREATE TABLE source (
>   text STRING,
>   `arrival_time` TIMESTAMP(3) METADATA FROM 'timestamp' VIRTUAL
> ) WITH (
>   'connector' = 'kinesis',
>   'stream' = 'test',
>   'aws.region' = 'us-east-1',
>   'json.ignore-parse-errors' = 'true',
>   'format' = 'json'
> ) {code}
> Connector throws NullPointerException when source consumes malformed json 
> message:
> {code:java}
> java.lang.NullPointerException
>     at 
> org.apache.flink.streaming.connectors.kinesis.table.RowDataKinesisDeserializationSchema.deserialize(RowDataKinesisDeserializationSchema.java:137)
>     at 
> org.apache.flink.streaming.connectors.kinesis.table.RowDataKinesisDeserializationSchema.deserialize(RowDataKinesisDeserializationSchema.java:44)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.deserializeRecordForCollectionAndUpdateState(ShardConsumer.java:202)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.lambda$run$0(ShardConsumer.java:126)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:118)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.publisher.polling.PollingRecordPublisher.run(PollingRecordPublisher.java:102)
>     at 
> org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.run(ShardConsumer.java:114)
>     at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>     at 
> java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829) {code}
>  
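A likely reading of the stack trace: with 'json.ignore-parse-errors' = 'true', 
the nested JSON format returns null for the malformed record, and the wrapping 
schema then tries to append the requested metadata columns to that null row. A 
minimal sketch of a null guard (hypothetical; class, method, and field names 
are illustrative, not the actual fix in the linked PR):

{code:java}
import java.io.IOException;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.table.data.RowData;

class NullGuardSketch {
    // Hypothetical sketch: skip records that the nested format dropped
    // (ignore-parse-errors) instead of appending metadata to a null row.
    static RowData deserializeWithGuard(
            DeserializationSchema<RowData> physicalDeserializer, byte[] data)
            throws IOException {
        RowData physicalRow = physicalDeserializer.deserialize(data);
        if (physicalRow == null) {
            return null; // the record was ignored by the format; propagate the skip
        }
        // ... append the requested metadata columns to physicalRow as before ...
        return physicalRow;
    }
}
{code}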



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] Update with.md definition spell error, fix with definition [flink]

2023-10-04 Thread via GitHub


XComp closed pull request #23432: Update with.md definition spell error, fix 
with definition
URL: https://github.com/apache/flink/pull/23432


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update with.md definition spell error, fix with definition [flink]

2023-10-04 Thread via GitHub


XComp commented on PR #23432:
URL: https://github.com/apache/flink/pull/23432#issuecomment-1747214634

   Thanks for your contribution, @yanfangli85, and for looking into this, 
@davidradl. David is right: we should do documentation changes in both the 
English and Chinese versions at the same time to keep them on par. I'm gonna 
close this PR in favor of your first (as far as I can see?) PR #23422 to 
reduce the clutter. Feel free to put the changes in individual hotfix commits 
within PR #23422. That helps keep the discussion in a single location.
   
   Feel free to ask if you need help with creating additional commits in the 
existing PR (if that's the reason why you created multiple PRs).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update match_recognize.md, fix preterite of seek, change seeked to sought [flink]

2023-10-04 Thread via GitHub


XComp commented on PR #23433:
URL: https://github.com/apache/flink/pull/23433#issuecomment-1747218067

   Thanks for your contribution, @yanfangli85. As pointed out in other PRs: we 
should do documentation changes in both the English and Chinese versions at 
the same time to keep them on par. I'm gonna close this PR in favor of your 
first (as far as I can see?) PR #23422 to reduce the clutter. Feel free to put 
the changes in individual hotfix commits within PR #23422. That helps keep the 
discussion in a single location.
   
   Feel free to ask if you need help with creating additional commits in the 
existing PR (if that's the reason why you created multiple PRs).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update match_recognize.md, fix preterite of seek, change seeked to sought [flink]

2023-10-04 Thread via GitHub


XComp closed pull request #23433: Update match_recognize.md, fix preterite of 
seek, change seeked to sought
URL: https://github.com/apache/flink/pull/23433


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update create.md, fix spell error, change statment to statement [flink]

2023-10-04 Thread via GitHub


XComp closed pull request #23434: Update create.md, fix spell error, change 
statment to statement
URL: https://github.com/apache/flink/pull/23434


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update create.md, fix spell error, change statment to statement [flink]

2023-10-04 Thread via GitHub


XComp commented on PR #23434:
URL: https://github.com/apache/flink/pull/23434#issuecomment-174778

   Thanks for your contribution, @yanfangli85, and for looking into this, 
@davidradl. David is right: we should do documentation changes in both the 
English and Chinese versions at the same time to keep them on par. I'm gonna 
close this PR in favor of your first (as far as I can see?) PR #23422 to 
reduce the clutter. Feel free to put the changes in individual hotfix commits 
within PR #23422. That helps keep the discussion in a single location. 
Creating individual PRs for each typo change seems to be a bit too much 
overhead.
   
   Feel free to ask if you need help with creating additional commits in the 
existing PR (if that's the reason why you created multiple PRs).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update local_installation.md [flink]

2023-10-04 Thread via GitHub


snuyanzin commented on code in PR #23240:
URL: https://github.com/apache/flink/pull/23240#discussion_r1346137247


##
docs/content/docs/try-flink/local_installation.md:
##
@@ -46,6 +46,8 @@ to have __Java 11__ installed. To check the Java version 
installed, type in your
 $ java -version
 ```
 
+**Notes**: JDK 17 and above is not supported yet!!!
+

Review Comment:
   In fact, it is supported starting from 1.18.x.
   Also, above in this doc it is already mentioned that Java 11 should be 
installed.
   Could you please give more insight into why we need it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix(sec): upgrade com.google.guava:guava to 32.0.0-jre [flink]

2023-10-04 Thread via GitHub


snuyanzin commented on PR #23150:
URL: https://github.com/apache/flink/pull/23150#issuecomment-1747230400

   @ChengDaqi2023, the build is failing.
   Could you have a look, please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32445][runtime] Refactor BlobStoreService's closeAndCleanupAllData to cleanupAllData [flink]

2023-10-04 Thread via GitHub


XComp commented on code in PR #23424:
URL: https://github.com/apache/flink/pull/23424#discussion_r1346146391


##
flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/HighAvailabilityServices.java:
##
@@ -212,19 +213,53 @@ default LeaderRetrievalService 
getClusterRestEndpointLeaderRetriever() {
 void close() throws Exception;
 
 /**
- * Closes the high availability services (releasing all resources) and 
deletes all data stored
- * by these services in external stores.
+ * Deletes all data stored by high availability services in external 
stores.
  *
- * After this method was called, the any job or session that was 
managed by these high
+ * After this method was called, any job or session that was managed by 
these high
  * availability services will be unrecoverable.
  *
  * If an exception occurs during cleanup, this method will attempt to 
continue the cleanup
  * and report exceptions only after all cleanup steps have been attempted.
  *
- * @throws Exception Thrown, if an exception occurred while closing these 
services or cleaning
- * up data stored by them.
+ * @throws Exception Thrown, if an exception occurred while cleaning up 
data stored by them.
  */
-void closeAndCleanupAllData() throws Exception;
+void cleanupAllData() throws Exception;
+
+/**
+ * Closes the high availability services (releasing all resources) and 
optionally deletes all
+ * data stored by these services in external stores.
+ *
+ * After this method was called, any job or session that was managed by 
these high
+ * availability services will be unrecoverable.
+ *
+ * If an exception occurs during cleanup or close, this method will 
attempt to continue the
+ * cleanup or close and report exceptions only after all cleanup and close 
steps have been
+ * attempted.
+ *
+ * @param cleanupData Whether cleanup of data stored by the high 
availability services should be
+ * attempted.
+ * @throws Exception Thrown, if an exception occurred while cleaning up 
data stored by the high
+ * availability services.

Review Comment:
   ```suggestion
* Calls {@link #cleanupAllData()} (if {@code true} is passed as a 
parameter) before calling {@link #close()} on this instance. Any error that 
appeared during the cleanup will be propagated after calling {@code close()}.
   ```
   nit: I'm not a big fan of duplicating documentation because it makes 
maintaining the documentation harder. What do you think of the proposal above?
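   For illustration, a minimal sketch of what such a default method could look 
like (an assumption derived from the javadoc above, not the implementation in 
this PR; ExceptionUtils is org.apache.flink.util.ExceptionUtils):
   
   ```java
// Hypothetical sketch; per the javadoc contract, errors are reported only
// after all cleanup and close steps have been attempted.
default void closeWithOptionalCleanup(boolean cleanupData) throws Exception {
    Throwable exception = null;
    if (cleanupData) {
        try {
            cleanupAllData();
        } catch (Throwable t) {
            exception = t;
        }
    }
    try {
        close();
    } catch (Throwable t) {
        exception = ExceptionUtils.firstOrSuppressed(t, exception);
    }
    if (exception != null) {
        ExceptionUtils.rethrowException(exception);
    }
}
   ```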



##
flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/HighAvailabilityServices.java:
##
@@ -212,19 +213,53 @@ default LeaderRetrievalService 
getClusterRestEndpointLeaderRetriever() {
 void close() throws Exception;
 
 /**
- * Closes the high availability services (releasing all resources) and 
deletes all data stored
- * by these services in external stores.
+ * Deletes all data stored by high availability services in external 
stores.
  *
- * After this method was called, the any job or session that was 
managed by these high
+ * After this method was called, any job or session that was managed by 
these high
  * availability services will be unrecoverable.
  *
  * If an exception occurs during cleanup, this method will attempt to 
continue the cleanup
  * and report exceptions only after all cleanup steps have been attempted.
  *
- * @throws Exception Thrown, if an exception occurred while closing these 
services or cleaning
- * up data stored by them.
+ * @throws Exception Thrown, if an exception occurred while cleaning up 
data stored by them.

Review Comment:
   ```suggestion
* @throws Exception if an error occurred while cleaning up data stored 
by them.
   ```
   nit: just to fix using "throw" twice



##
flink-runtime/src/test/java/org/apache/flink/runtime/highavailability/AbstractHaServicesTest.java:
##
@@ -97,7 +98,8 @@ void 
testCloseAndCleanupAllDataDoesNotDeleteBlobsIfCleaningUpHADataFails() throw
 },
 ignored -> {});
 
-
assertThatThrownBy(haServices::closeAndCleanupAllData).isInstanceOf(FlinkException.class);
+
assertThatThrownBy(haServices::cleanupAllData).isInstanceOf(FlinkException.class);
+haServices.close();

Review Comment:
   ```suggestion
   assertThatThrownBy(() -> 
haServices.closeWithOptionalCleanup(true)).isInstanceOf(FlinkException.class);
   ```
   This improves the test coverage (i.e. it would check the newly added default 
implementation).



##
flink-runtime/src/test/java/org/apache/flink/runtime/highavailability/AbstractHaServicesTest.java:
##
@@ -66,13 +66,14 @@ void 
testCloseAndCleanupAllDataDeletesBlobsAfterCleaningUpHAData() throws Except
 () -> 
clos

Re: [PR] [FLINK-33020] OpensearchSinkTest.testAtLeastOnceSink timed out [flink-connector-opensearch]

2023-10-04 Thread via GitHub


reta commented on PR #32:
URL: 
https://github.com/apache/flink-connector-opensearch/pull/32#issuecomment-1747262996

   Blocked by https://github.com/opensearch-project/OpenSearch/pull/9286/files


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (FLINK-33020) OpensearchSinkTest.testAtLeastOnceSink timed out

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-33020:
---
Labels: pull-request-available  (was: )

> OpensearchSinkTest.testAtLeastOnceSink timed out
> 
>
> Key: FLINK-33020
> URL: https://issues.apache.org/jira/browse/FLINK-33020
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Opensearch
>Affects Versions: opensearch-1.0.2
>Reporter: Martijn Visser
>Priority: Blocker
>  Labels: pull-request-available
>
> https://github.com/apache/flink-connector-opensearch/actions/runs/6061205003/job/16446139552#step:13:1029
> {code:java}
> Error:  Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.837 
> s <<< FAILURE! - in 
> org.apache.flink.streaming.connectors.opensearch.OpensearchSinkTest
> Error:  
> org.apache.flink.streaming.connectors.opensearch.OpensearchSinkTest.testAtLeastOnceSink
>   Time elapsed: 5.022 s  <<< ERROR!
> java.util.concurrent.TimeoutException: testAtLeastOnceSink() timed out after 
> 5 seconds
>   at 
> org.junit.jupiter.engine.extension.TimeoutInvocation.createTimeoutException(TimeoutInvocation.java:70)
>   at 
> org.junit.jupiter.engine.extension.TimeoutInvocation.proceed(TimeoutInvocation.java:59)
>   at 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
>   at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
>   at 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
>   at 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
>   at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
>   at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>   at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>   at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>   at 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>   at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
>   at 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
>   at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
>   at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
>   at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
>   at 
> org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
>   at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
>   at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
>   at 
> org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
>   at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
>   at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
>   at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
>   at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>   at 
> org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
>   at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
>   at 
> org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
>   at 
> org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
>   at 
> org.j
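The 5-second limit is enforced by JUnit 5's timeout support (visible as 
TimeoutExtension/TimeoutInvocation in the trace above). A hypothetical sketch 
of how such a per-test timeout is declared, should the limit need raising for 
slow environments (the actual annotation and value in OpensearchSinkTest may 
differ):

{code:java}
import java.util.concurrent.TimeUnit;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;

class OpensearchSinkTimeoutSketch {
    // Hypothetical sketch: JUnit 5 aborts the test and reports a
    // TimeoutException once the declared limit is exceeded.
    @Test
    @Timeout(value = 30, unit = TimeUnit.SECONDS)
    void testAtLeastOnceSink() throws Exception {
        // ... exercise the sink with at-least-once semantics ...
    }
}
{code}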

Re: [PR] [FLINK-33020] OpensearchSinkTest.testAtLeastOnceSink timed out [flink-connector-opensearch]

2023-10-04 Thread via GitHub


snuyanzin commented on PR #32:
URL: 
https://github.com/apache/flink-connector-opensearch/pull/32#issuecomment-1747270859

   jira issue https://issues.apache.org/jira/browse/FLINK-33175


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (FLINK-18356) flink-table-planner Exit code 137 returned from process

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771919#comment-17771919
 ] 

Sergey Nuyanzin commented on FLINK-18356:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53549&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=11712

> flink-table-planner Exit code 137 returned from process
> ---
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 
> 1.19.0
>Reporter: Piotr Nowojski
>Assignee: Yunhong Zheng
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Attachments: 1234.jpg, app-profiling_4.gif, 
> image-2023-01-11-22-21-57-784.png, image-2023-01-11-22-22-32-124.png, 
> image-2023-02-16-20-18-09-431.png, image-2023-07-11-19-28-52-851.png, 
> image-2023-07-11-19-35-54-530.png, image-2023-07-11-19-41-18-626.png, 
> image-2023-07-11-19-41-37-105.png
>
>
> {noformat}
> = test session starts 
> ==
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..[  
> 1%]
> pyflink/common/tests/test_execution_config.py ...[  
> 5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3
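As a side note on the diagnosis: an exit code above 128 conventionally encodes 
128 plus the terminating signal number, so 137 means the process was killed by 
signal 9 (SIGKILL), which on CI containers usually points to the kernel OOM 
killer.

{code:java}
class ExitCodeSketch {
    public static void main(String[] args) {
        // Exit-code convention for signal-terminated processes: 128 + signal number.
        int signal = 137 - 128; // == 9 -> SIGKILL, typically the OOM killer
        System.out.println("killed by signal " + signal);
    }
}
{code}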



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [FLINK-33020] OpensearchSinkTest.testAtLeastOnceSink timed out [flink-connector-opensearch]

2023-10-04 Thread via GitHub


reta commented on PR #32:
URL: 
https://github.com/apache/flink-connector-opensearch/pull/32#issuecomment-1747279406

   > jira issue https://issues.apache.org/jira/browse/FLINK-33175
   
   Wrong copy / paste on my side 🗡️ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-32445][runtime] Refactor BlobStoreService's closeAndCleanupAllData to cleanupAllData [flink]

2023-10-04 Thread via GitHub


Jiabao-Sun commented on PR #23424:
URL: https://github.com/apache/flink/pull/23424#issuecomment-1747284639

   Thanks @XComp for the thorough review. PTAL again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (FLINK-33184) HybridShuffleITCase fails with exception in resource cleanup of task Map

2023-10-04 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-33184:
---

 Summary: HybridShuffleITCase fails with exception in resource 
cleanup of task Map
 Key: FLINK-33184
 URL: https://issues.apache.org/jira/browse/FLINK-33184
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.19.0
Reporter: Sergey Nuyanzin


This build fails 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53548&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8710
{noformat} 
Map (5/10)#0] ERROR org.apache.flink.runtime.taskmanager.Task   
 [] - FATAL - exception in resource cleanup of task Map (5/10)#0 
(159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
.
java.lang.IllegalStateException: Leaking buffers.
at 
org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
at 
org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.ResultPartition.fail(ResultPartition.java:284)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.taskmanager.Task.failAllResultPartitions(Task.java:1004)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.taskmanager.Task.releaseResources(Task.java:990) 
~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:838) 
[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) 
[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
01:17:22,375 [flink-pekko.actor.default-dispatcher-5] INFO  
org.apache.flink.runtime.taskmanager.Task[] - Task Sink: 
Unnamed (3/10)#0 is already in state CANCELING
01:17:22,375 [Map (5/10)#0] ERROR 
org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - FATAL - 
exception in resource cleanup of task Map (5/10)#0 
(159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
.
java.lang.IllegalStateException: Leaking buffers.
at 
org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
at 
org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.io.network.partition.ResultPartition.fail(ResultPartition.java:284)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.taskmanager.Task.failAllResultPartitions(Task.java:1004)
 ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at 
org.apache.flink.runtime.taskmanager.Task.releaseResources(Task.java:990) 
[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:838) 
[flink-runtime
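For context on the failure mode: Preconditions.checkState throws an 
IllegalStateException with the supplied message when its condition is false. A 
hypothetical sketch of the kind of invariant behind the "Leaking buffers." 
error above (the exact check and field names in TieredStorageMemoryManagerImpl 
may differ):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.flink.util.Preconditions;

class BufferAccountingSketch {
    // Hypothetical sketch: release() asserts that every buffer handed out by
    // the memory manager was recycled before the manager is torn down.
    private final AtomicInteger numRequestedBuffers = new AtomicInteger(0);

    public void release() {
        Preconditions.checkState(numRequestedBuffers.get() == 0, "Leaking buffers.");
        // ... release pooled memory segments ...
    }
}
{code}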

[jira] [Commented] (FLINK-33000) SqlGatewayServiceITCase should utilize TestExecutorExtension instead of using a ThreadFactory

2023-10-04 Thread Jiabao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771922#comment-17771922
 ] 

Jiabao Sun commented on FLINK-33000:


Thanks [~mapohl] and [~fsk119].
The PRs for releases 1.18/1.17/1.16 are ready now.
Please help review them when you have time.

> SqlGatewayServiceITCase should utilize TestExecutorExtension instead of using 
> a ThreadFactory
> -
>
> Key: FLINK-33000
> URL: https://issues.apache.org/jira/browse/FLINK-33000
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Gateway, Tests
>Affects Versions: 1.16.2, 1.18.0, 1.17.1, 1.19.0
>Reporter: Matthias Pohl
>Assignee: Jiabao Sun
>Priority: Major
>  Labels: pull-request-available, starter
>
> {{SqlGatewayServiceITCase}} uses a {{ExecutorThreadFactory}} for its 
> asynchronous operations. Instead, one should use {{TestExecutorExtension}} to 
> ensure proper cleanup of threads.
> We might also want to remove the {{AbstractTestBase}} parent class because 
> that uses JUnit4 whereas {{SqlGatewayServiceITCase}} is already based on 
> JUnit5
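For reference, a minimal sketch of the TestExecutorExtension pattern described 
above (assuming a plain cached thread pool; the concrete executor type used by 
the test may differ):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.flink.testutils.executor.TestExecutorExtension;

import org.junit.jupiter.api.extension.RegisterExtension;

class SqlGatewayServiceITCaseSketch {
    // Hypothetical sketch: the JUnit5 extension creates the executor before the
    // tests run and shuts it down afterwards, ensuring proper cleanup of threads.
    @RegisterExtension
    static final TestExecutorExtension<ExecutorService> EXECUTOR_EXTENSION =
            new TestExecutorExtension<>(Executors::newCachedThreadPool);

    // Usage inside a test method:
    // EXECUTOR_EXTENSION.getExecutor().submit(() -> { /* asynchronous operation */ });
}
{code}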



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-30719) flink-runtime-web failed due to a corrupted nodejs dependency

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771924#comment-17771924
 ] 

Sergey Nuyanzin commented on FLINK-30719:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53545&view=logs&j=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5&t=54421a62-0c80-5aad-3319-094ff69180bb&l=9332

> flink-runtime-web failed due to a corrupted nodejs dependency
> -
>
> Key: FLINK-30719
> URL: https://issues.apache.org/jira/browse/FLINK-30719
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend, Test Infrastructure, Tests
>Affects Versions: 1.16.0, 1.17.0, 1.18.0
>Reporter: Matthias Pohl
>Assignee: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44954&view=logs&j=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5&t=54421a62-0c80-5aad-3319-094ff69180bb&l=12550
> The build failed due to a corrupted nodejs dependency:
> {code}
> [ERROR] The archive file 
> /__w/1/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz
>  is corrupted and will be deleted. Please try the build again.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33184) HybridShuffleITCase fails with exception in resource cleanup of task Map

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771925#comment-17771925
 ] 

Sergey Nuyanzin commented on FLINK-33184:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53535&view=logs&j=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3&t=712ade8c-ca16-5b76-3acd-14df33bc1cb1&l=8823

> HybridShuffleITCase fails with exception in resource cleanup of task Map
> 
>
> Key: FLINK-33184
> URL: https://issues.apache.org/jira/browse/FLINK-33184
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> This build fails 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53548&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8710
> {noformat} 
> Map (5/10)#0] ERROR org.apache.flink.runtime.taskmanager.Task 
>[] - FATAL - exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.fail(ResultPartition.java:284)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.failAllResultPartitions(Task.java:1004)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.releaseResources(Task.java:990) 
> ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:838) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> 01:17:22,375 [flink-pekko.actor.default-dispatcher-5] INFO  
> org.apache.flink.runtime.taskmanager.Task[] - Task Sink: 
> Unnamed (3/10)#0 is already in state CANCELING
> 01:17:22,375 [Map (5/10)#0] ERROR 
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - FATAL - 
> exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionMana

[jira] [Commented] (FLINK-33184) HybridShuffleITCase fails with exception in resource cleanup of task Map

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771926#comment-17771926
 ] 

Sergey Nuyanzin commented on FLINK-33184:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53530&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=9039

> HybridShuffleITCase fails with exception in resource cleanup of task Map
> 
>
> Key: FLINK-33184
> URL: https://issues.apache.org/jira/browse/FLINK-33184
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> This build fails 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53548&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8710
> {noformat} 
> Map (5/10)#0] ERROR org.apache.flink.runtime.taskmanager.Task 
>[] - FATAL - exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.fail(ResultPartition.java:284)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.failAllResultPartitions(Task.java:1004)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.releaseResources(Task.java:990) 
> ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:838) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> 01:17:22,375 [flink-pekko.actor.default-dispatcher-5] INFO  
> org.apache.flink.runtime.taskmanager.Task[] - Task Sink: 
> Unnamed (3/10)#0 is already in state CANCELING
> 01:17:22,375 [Map (5/10)#0] ERROR 
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - FATAL - 
> exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionMana

[jira] [Commented] (FLINK-18356) flink-table-planner Exit code 137 returned from process

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771929#comment-17771929
 ] 

Sergey Nuyanzin commented on FLINK-18356:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53530&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=11718

> flink-table-planner Exit code 137 returned from process
> ---
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 
> 1.19.0
>Reporter: Piotr Nowojski
>Assignee: Yunhong Zheng
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Attachments: 1234.jpg, app-profiling_4.gif, 
> image-2023-01-11-22-21-57-784.png, image-2023-01-11-22-22-32-124.png, 
> image-2023-02-16-20-18-09-431.png, image-2023-07-11-19-28-52-851.png, 
> image-2023-07-11-19-35-54-530.png, image-2023-07-11-19-41-18-626.png, 
> image-2023-07-11-19-41-37-105.png
>
>
> {noformat}
> = test session starts 
> ==
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..[  
> 1%]
> pyflink/common/tests/test_execution_config.py ...[  
> 5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33184) HybridShuffleITCase fails with exception in resource cleanup of task Map

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771930#comment-17771930
 ] 

Sergey Nuyanzin commented on FLINK-33184:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53525&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=8664

> HybridShuffleITCase fails with exception in resource cleanup of task Map
> 
>
> Key: FLINK-33184
> URL: https://issues.apache.org/jira/browse/FLINK-33184
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> This build fails 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53548&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8710
> {noformat} 
> Map (5/10)#0] ERROR org.apache.flink.runtime.taskmanager.Task 
>[] - FATAL - exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.fail(ResultPartition.java:284)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.failAllResultPartitions(Task.java:1004)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.releaseResources(Task.java:990) 
> ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:838) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> 01:17:22,375 [flink-pekko.actor.default-dispatcher-5] INFO  
> org.apache.flink.runtime.taskmanager.Task[] - Task Sink: 
> Unnamed (3/10)#0 is already in state CANCELING
> 01:17:22,375 [Map (5/10)#0] ERROR 
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - FATAL - 
> exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionMana

[jira] [Created] (FLINK-33185) HybridShuffleITCase fails with TimeoutException: Pending slot request timed out in slot pool.

2023-10-04 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-33185:
---

 Summary: HybridShuffleITCase fails with TimeoutException: Pending 
slot request timed out in slot pool.
 Key: FLINK-33185
 URL: https://issues.apache.org/jira/browse/FLINK-33185
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.19.0
Reporter: Sergey Nuyanzin


This build 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53519&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=8641
fails as 
{noformat}
Sep 29 05:13:54 Caused by: java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: 
Pending slot request with SlotRequestId{b6e57c09274f4edc50697300bc8859a8} has 
been released.
Sep 29 05:13:54 at 
org.apache.flink.runtime.scheduler.DefaultExecutionDeployer.lambda$assignResource$4(DefaultExecutionDeployer.java:226)
Sep 29 05:13:54 ... 36 more
Sep 29 05:13:54 Caused by: java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: 
Pending slot request with SlotRequestId{b6e57c09274f4edc50697300bc8859a8} has 
been released.
Sep 29 05:13:54 at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
Sep 29 05:13:54 at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
Sep 29 05:13:54 at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
Sep 29 05:13:54 at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
Sep 29 05:13:54 ... 34 more
Sep 29 05:13:54 Caused by: org.apache.flink.util.FlinkException: 
org.apache.flink.util.FlinkException: Pending slot request with 
SlotRequestId{b6e57c09274f4edc50697300bc8859a8} has been released.
Sep 29 05:13:54 at 
org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolBridge.releaseSlot(DeclarativeSlotPoolBridge.java:373)
Sep 29 05:13:54 ... 30 more
Sep 29 05:13:54 Caused by: java.util.concurrent.TimeoutException: 
java.util.concurrent.TimeoutException: Pending slot request timed out in slot 
pool.
Sep 29 05:13:54 ... 30 more

{noformat}
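If the environment is merely slow rather than deadlocked, one mitigation (an 
assumption, not a confirmed fix for this instance) is to raise the slot 
request timeout in the test's configuration:

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.JobManagerOptions;

class SlotTimeoutSketch {
    public static void main(String[] args) {
        // Hypothetical sketch: give slow CI machines more time before a pending
        // slot request in the slot pool times out.
        Configuration configuration = new Configuration();
        configuration.set(JobManagerOptions.SLOT_REQUEST_TIMEOUT, 600_000L); // 10 minutes
        // pass `configuration` to the MiniCluster / test resource under test
    }
}
{code}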



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33184) HybridShuffleITCase fails with exception in resource cleanup of task Map

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771938#comment-17771938
 ] 

Sergey Nuyanzin commented on FLINK-33184:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53519&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=b78d9d30-509a-5cea-1fef-db7abaa325ae

> HybridShuffleITCase fails with exception in resource cleanup of task Map
> 
>
> Key: FLINK-33184
> URL: https://issues.apache.org/jira/browse/FLINK-33184
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> This build fails 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53548&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8710
> {noformat} 
> Map (5/10)#0] ERROR org.apache.flink.runtime.taskmanager.Task 
>[] - FATAL - exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.fail(ResultPartition.java:284)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.failAllResultPartitions(Task.java:1004)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.releaseResources(Task.java:990) 
> ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:838) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> 01:17:22,375 [flink-pekko.actor.default-dispatcher-5] INFO  
> org.apache.flink.runtime.taskmanager.Task[] - Task Sink: 
> Unnamed (3/10)#0 is already in state CANCELING
> 01:17:22,375 [Map (5/10)#0] ERROR 
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - FATAL - 
> exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.rel

[jira] [Commented] (FLINK-18356) flink-table-planner Exit code 137 returned from process

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771939#comment-17771939
 ] 

Sergey Nuyanzin commented on FLINK-18356:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53512&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be

> flink-table-planner Exit code 137 returned from process
> ---
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 
> 1.19.0
>Reporter: Piotr Nowojski
>Assignee: Yunhong Zheng
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Attachments: 1234.jpg, app-profiling_4.gif, 
> image-2023-01-11-22-21-57-784.png, image-2023-01-11-22-22-32-124.png, 
> image-2023-02-16-20-18-09-431.png, image-2023-07-11-19-28-52-851.png, 
> image-2023-07-11-19-35-54-530.png, image-2023-07-11-19-41-18-626.png, 
> image-2023-07-11-19-41-37-105.png
>
>
> {noformat}
> = test session starts 
> ==
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..[  
> 1%]
> pyflink/common/tests/test_execution_config.py ...[  
> 5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33184) HybridShuffleITCase fails with exception in resource cleanup of task Map

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771941#comment-17771941
 ] 

Sergey Nuyanzin commented on FLINK-33184:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509&view=logs&j=5c8e7682-d68f-54d1-16a2-a09310218a49&t=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba&l=8993

> HybridShuffleITCase fails with exception in resource cleanup of task Map
> 
>
> Key: FLINK-33184
> URL: https://issues.apache.org/jira/browse/FLINK-33184
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> This build fails 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53548&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8710
> {noformat} 
> Map (5/10)#0] ERROR org.apache.flink.runtime.taskmanager.Task 
>[] - FATAL - exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.fail(ResultPartition.java:284)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.failAllResultPartitions(Task.java:1004)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.releaseResources(Task.java:990) 
> ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:838) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> 01:17:22,375 [flink-pekko.actor.default-dispatcher-5] INFO  
> org.apache.flink.runtime.taskmanager.Task[] - Task Sink: 
> Unnamed (3/10)#0 is already in state CANCELING
> 01:17:22,375 [Map (5/10)#0] ERROR 
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - FATAL - 
> exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
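
For readers unfamiliar with this check: "Leaking buffers." is a release-time 
precondition in TieredStorageMemoryManagerImpl that fails task cleanup when 
buffers requested from the tiered storage memory manager were never recycled. 
A minimal sketch of the pattern, with illustrative names only (this is not 
the actual Flink implementation):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical leak-checking pool: hands out buffers, counts them, and on
// release() asserts that every buffer came back before tearing down.
public class LeakCheckingBufferPool {

    private final AtomicInteger outstanding = new AtomicInteger();

    public byte[] requestBuffer(int size) {
        outstanding.incrementAndGet();
        return new byte[size];
    }

    public void recycleBuffer(byte[] buffer) {
        outstanding.decrementAndGet();
    }

    // Analogous to the checkState(...) call in the trace above: Flink uses
    // Preconditions.checkState(..., "Leaking buffers."), which throws
    // exactly the IllegalStateException seen in the log.
    public void release() {
        if (outstanding.get() != 0) {
            throw new IllegalStateException("Leaking buffers.");
        }
    }
}
{code}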

[jira] [Updated] (FLINK-33184) HybridShuffleITCase fails with exception in resource cleanup of task Map on AZP

2023-10-04 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin updated FLINK-33184:

Summary: HybridShuffleITCase fails with exception in resource cleanup of 
task Map on AZP  (was: HybridShuffleITCase fails with exception in resource 
cleanup of task Map)

> HybridShuffleITCase fails with exception in resource cleanup of task Map on 
> AZP
> ---
>
> Key: FLINK-33184
> URL: https://issues.apache.org/jira/browse/FLINK-33184
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> This build fails 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53548&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8710
> {noformat} 
> Map (5/10)#0] ERROR org.apache.flink.runtime.taskmanager.Task 
>[] - FATAL - exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.fail(ResultPartition.java:284)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.failAllResultPartitions(Task.java:1004)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.taskmanager.Task.releaseResources(Task.java:990) 
> ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:838) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) 
> [flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
> 01:17:22,375 [flink-pekko.actor.default-dispatcher-5] INFO  
> org.apache.flink.runtime.taskmanager.Task[] - Task Sink: 
> Unnamed (3/10)#0 is already in state CANCELING
> 01:17:22,375 [Map (5/10)#0] ERROR 
> org.apache.flink.runtime.taskexecutor.TaskExecutor   [] - FATAL - 
> exception in resource cleanup of task Map (5/10)#0 
> (159f887fbd200ea7cfa4aaeb1127c4ab_0a448493b4782967b150582570326227_4_0)
> .
> java.lang.IllegalStateException: Leaking buffers.
> at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) 
> ~[flink-core-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageMemoryManagerImpl.release(TieredStorageMemoryManagerImpl.java:236)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_292]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.storage.TieredStorageResourceRegistry.clearResourceFor(TieredStorageResourceRegistry.java:59)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.hybrid.tiered.shuffle.TieredResultPartition.releaseInternal(TieredResultPartition.java:195)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:262)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]
> at 
> org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:88)
>  ~[flink-runtime-1.19-SNAPSHOT.jar:1.19-SNAPSHOT]

[jira] [Updated] (FLINK-33185) HybridShuffleITCase fails with TimeoutException: Pending slot request timed out in slot pool on AZP

2023-10-04 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin updated FLINK-33185:

Summary: HybridShuffleITCase fails with TimeoutException: Pending slot 
request timed out in slot pool on AZP  (was: HybridShuffleITCase fails with 
TimeoutException: Pending slot request timed out in slot pool.)

> HybridShuffleITCase fails with TimeoutException: Pending slot request timed 
> out in slot pool on AZP
> ---
>
> Key: FLINK-33185
> URL: https://issues.apache.org/jira/browse/FLINK-33185
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53519&view=logs&j=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3&t=0c010d0c-3dec-5bf1-d408-7b18988b1b2b&l=8641
> fails as 
> {noformat}
> Sep 29 05:13:54 Caused by: java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkException: Pending slot request with 
> SlotRequestId{b6e57c09274f4edc50697300bc8859a8} has been released.
> Sep 29 05:13:54   at 
> org.apache.flink.runtime.scheduler.DefaultExecutionDeployer.lambda$assignResource$4(DefaultExecutionDeployer.java:226)
> Sep 29 05:13:54   ... 36 more
> Sep 29 05:13:54 Caused by: java.util.concurrent.CompletionException: 
> java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkException: Pending slot request with 
> SlotRequestId{b6e57c09274f4edc50697300bc8859a8} has been released.
> Sep 29 05:13:54   at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> Sep 29 05:13:54   at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> Sep 29 05:13:54   at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
> Sep 29 05:13:54   at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
> Sep 29 05:13:54   ... 34 more
> Sep 29 05:13:54 Caused by: org.apache.flink.util.FlinkException: 
> org.apache.flink.util.FlinkException: Pending slot request with 
> SlotRequestId{b6e57c09274f4edc50697300bc8859a8} has been released.
> Sep 29 05:13:54   at 
> org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolBridge.releaseSlot(DeclarativeSlotPoolBridge.java:373)
> Sep 29 05:13:54   ... 30 more
> Sep 29 05:13:54 Caused by: java.util.concurrent.TimeoutException: 
> java.util.concurrent.TimeoutException: Pending slot request timed out in slot 
> pool.
> Sep 29 05:13:54   ... 30 more
> {noformat}
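
For context: the TimeoutException above is raised by the slot pool when a 
pending slot request is not fulfilled within the configured slot request 
timeout, after which the scheduler releases the request and fails the 
deployment. On overloaded CI machines this tends to indicate slow resource 
allocation rather than a genuine deadlock. A minimal sketch of raising the 
timeout for a local/test setup (assuming the public JobManagerOptions API; 
illustrative only):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.JobManagerOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotTimeoutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Give slow machines more headroom before "Pending slot request
        // timed out in slot pool." is raised (value is in milliseconds).
        conf.set(JobManagerOptions.SLOT_REQUEST_TIMEOUT, 600_000L);
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(conf);
        env.fromSequence(0, 9).print();
        env.execute("slot-timeout-example");
    }
}
{code}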



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP

2023-10-04 Thread Sergey Nuyanzin (Jira)
Sergey Nuyanzin created FLINK-33186:
---

 Summary:  
CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails 
on AZP
 Key: FLINK-33186
 URL: https://issues.apache.org/jira/browse/FLINK-33186
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.19.0
Reporter: Sergey Nuyanzin


This build 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8762
fails as
{noformat}
Sep 28 01:23:43 Caused by: 
org.apache.flink.runtime.checkpoint.CheckpointException: Task local checkpoint 
failure.
Sep 28 01:23:43 at 
org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550)
Sep 28 01:23:43 at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248)
Sep 28 01:23:43 at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235)
Sep 28 01:23:43 at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817)
Sep 28 01:23:43 at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
Sep 28 01:23:43 at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
Sep 28 01:23:43 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
Sep 28 01:23:43 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
Sep 28 01:23:43 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Sep 28 01:23:43 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Sep 28 01:23:43 at java.lang.Thread.run(Thread.java:748)

{noformat}
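
For context: the "Task local checkpoint failure" abort above comes from the 
CheckpointCoordinator giving up on a pending checkpoint after a task-side 
decline or failure. The test specifically exercises checkpointing after some 
tasks have already reached FINISHED, which is governed by the 
execution.checkpointing.checkpoints-after-tasks-finish.enabled option. A 
minimal sketch of a job running in that mode (illustrative, not the failing 
test itself):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointAfterFinishExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enabled by default in recent releases; spelled out for clarity.
        conf.setBoolean(
                "execution.checkpointing.checkpoints-after-tasks-finish.enabled",
                true);
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(conf);
        // Short interval so checkpoints keep firing while the bounded
        // source below finishes early, as the integration test does.
        env.enableCheckpointing(100);
        env.fromSequence(0, 9).print();
        env.execute("checkpoint-after-finish-example");
    }
}
{code}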



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771942#comment-17771942
 ] 

Sergey Nuyanzin commented on FLINK-33186:
-

[~Jiang Xin], [~lindong] since it is very similar to FLINK-32996, could you 
please have a look?

>  CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished 
> fails on AZP
> -
>
> Key: FLINK-33186
> URL: https://issues.apache.org/jira/browse/FLINK-33186
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0
>Reporter: Sergey Nuyanzin
>Priority: Critical
>  Labels: test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509&view=logs&j=baf26b34-3c6a-54e8-f93f-cf269b32f802&t=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9&l=8762
> fails as
> {noformat}
> Sep 28 01:23:43 Caused by: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task local 
> checkpoint failure.
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817)
> Sep 28 01:23:43   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> Sep 28 01:23:43   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Sep 28 01:23:43   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-31339) PlannerScalaFreeITCase.testImperativeUdaf

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771943#comment-17771943
 ] 

Sergey Nuyanzin commented on FLINK-31339:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53504&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=15329

> PlannerScalaFreeITCase.testImperativeUdaf
> -
>
> Key: FLINK-31339
> URL: https://issues.apache.org/jira/browse/FLINK-31339
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: auto-deprioritized-critical, test-stability
>
> {{PlannerScalaFreeITCase.testImperativeUdaf}} failed:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46812&view=logs&j=87489130-75dc-54e4-1f45-80c30aa367a3&t=73da6d75-f30d-5d5a-acbe-487a9dcff678&l=15012
> {code}
> Mar 05 05:55:50 [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, 
> Time elapsed: 62.028 s <<< FAILURE! - in 
> org.apache.flink.table.sql.codegen.PlannerScalaFreeITCase
> Mar 05 05:55:50 [ERROR] PlannerScalaFreeITCase.testImperativeUdaf  Time 
> elapsed: 40.924 s  <<< FAILURE!
> Mar 05 05:55:50 org.opentest4j.AssertionFailedError: Did not get expected 
> results before timeout, actual result: 
> [{"before":null,"after":{"user_name":"Bob","order_cnt":1},"op":"c"}, 
> {"before":null,"after":{"user_name":"Alice","order_cnt":1},"op":"c"}, 
> {"before":{"user_name":"Bob","order_cnt":1},"after":null,"op":"d"}, 
> {"before":null,"after":{"user_name":"Bob","order_cnt":2},"op":"c"}]. ==> 
> expected: <true> but was: <false>
> Mar 05 05:55:50   at 
> org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
> Mar 05 05:55:50   at 
> org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
> Mar 05 05:55:50   at 
> org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
> Mar 05 05:55:50   at 
> org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
> Mar 05 05:55:50   at 
> org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:211)
> Mar 05 05:55:50   at 
> org.apache.flink.table.sql.codegen.SqlITCaseBase.checkJsonResultFile(SqlITCaseBase.java:168)
> Mar 05 05:55:50   at 
> org.apache.flink.table.sql.codegen.SqlITCaseBase.runAndCheckSQL(SqlITCaseBase.java:111)
> Mar 05 05:55:50   at 
> org.apache.flink.table.sql.codegen.PlannerScalaFreeITCase.testImperativeUdaf(PlannerScalaFreeITCase.java:43)
> [...]
> {code}
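
For context: the test asserts a Debezium-style changelog of per-user order 
counts ("op":"c" inserts and "op":"d" deletes), and the failure above is the 
expected sequence not materializing before the timeout. An "imperative" UDAF 
in this sense is a Table API AggregateFunction with a mutable accumulator; a 
minimal sketch (illustrative, not the function used by the test):

{code:java}
import org.apache.flink.table.functions.AggregateFunction;

/** Counts rows per group, e.g. orders per user_name. */
public class OrderCount extends AggregateFunction<Long, OrderCount.Acc> {

    public static class Acc {
        public long count = 0L;
    }

    @Override
    public Acc createAccumulator() {
        return new Acc();
    }

    // Invoked imperatively for every input row of the group.
    public void accumulate(Acc acc, String orderId) {
        acc.count++;
    }

    // Required so the planner can handle retractions ("op":"d") upstream.
    public void retract(Acc acc, String orderId) {
        acc.count--;
    }

    @Override
    public Long getValue(Acc acc) {
        return acc.count;
    }
}
{code}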



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-18356) flink-table-planner Exit code 137 returned from process

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771945#comment-17771945
 ] 

Sergey Nuyanzin commented on FLINK-18356:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53486&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=1000

> flink-table-planner Exit code 137 returned from process
> ---
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 
> 1.19.0
>Reporter: Piotr Nowojski
>Assignee: Yunhong Zheng
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Attachments: 1234.jpg, app-profiling_4.gif, 
> image-2023-01-11-22-21-57-784.png, image-2023-01-11-22-22-32-124.png, 
> image-2023-02-16-20-18-09-431.png, image-2023-07-11-19-28-52-851.png, 
> image-2023-07-11-19-35-54-530.png, image-2023-07-11-19-41-18-626.png, 
> image-2023-07-11-19-41-37-105.png
>
>
> {noformat}
> = test session starts 
> ==
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py ..[  
> 1%]
> pyflink/common/tests/test_execution_config.py ...[  
> 5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729&view=logs&j=9cada3cb-c1d3-5621-16da-0f718fb86602&t=8d78fe4f-d658-5c70-12f8-4921589024c3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32604) PyFlink end-to-end fails with kafka-server-stop.sh: No such file or directory

2023-10-04 Thread Sergey Nuyanzin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771947#comment-17771947
 ] 

Sergey Nuyanzin commented on FLINK-32604:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53434&view=logs&j=87489130-75dc-54e4-1f45-80c30aa367a3&t=efbee0b1-38ac-597d-6466-1ea8fc908c50&l=8897

> PyFlink end-to-end fails with kafka-server-stop.sh: No such file or 
> directory 
> ---
>
> Key: FLINK-32604
> URL: https://issues.apache.org/jira/browse/FLINK-32604
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.18.0
>Reporter: Sergey Nuyanzin
>Priority: Major
>  Labels: auto-deprioritized-critical, test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51253&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=7883
> fails as
> {noformat}
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/kafka-common.sh: line 
> 117: 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-27379214502/kafka_2.12-3.2.3/bin/kafka-server-stop.sh:
>  No such file or directory
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/kafka-common.sh: line 
> 121: 
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-27379214502/kafka_2.12-3.2.3/bin/zookeeper-server-stop.sh:
>  No such file or directory
> Jul 13 19:43:07 [FAIL] Test script contains errors.
> Jul 13 19:43:07 Checking of logs skipped.
> Jul 13 19:43:07 
> Jul 13 19:43:07 [FAIL] 'PyFlink end-to-end test' failed after 0 minutes and 
> 40 seconds! Test exited with exit code 1
> Jul 13 19:43:07 
> {noformat}
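
For context: the failing lines are the teardown hook in kafka-common.sh, 
which tries to run the Kafka and ZooKeeper stop scripts out of the per-test 
temp directory. Their absence suggests the Kafka distribution was never 
extracted there (or had already been cleaned up) before teardown ran, so the 
reported error likely masks an earlier setup failure rather than a Kafka 
runtime problem.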



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2023-10-04 Thread Sergey Nuyanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Nuyanzin updated FLINK-28440:

Affects Version/s: 1.19.0

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.18.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   ... 13 more
> Caused by: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at java.io.FileInputStream.open0(Native Method)
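
For context: the restore fails inside the changelog state backend, which 
rebuilds the keyed backend by replaying state-change handles, and one of the 
local handle files has disappeared between checkpoint and restore. A minimal 
sketch of the configuration under which this code path runs (illustrative; 
the test harness wires this up itself):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChangelogRestoreExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Wrap the configured state backend with the changelog; restore then
        // goes through ChangelogStateBackend.restore as in the trace above.
        conf.setBoolean("state.backend.changelog.enabled", true);
        conf.setString("dstl.dfs.base-path", "file:///tmp/changelog");
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(conf);
        env.enableCheckpointing(500);
        env.fromSequence(0, 99).print();
        env.execute("changelog-restore-example");
    }
}
{code}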
