[jira] [Created] (FLINK-26041) AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure hang on azure
Yun Gao created FLINK-26041:
---

Summary: AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure hang on azure
Key: FLINK-26041
URL: https://issues.apache.org/jira/browse/FLINK-26041
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Yun Gao

{code:java}
Feb 08 13:04:58 "main" #1 prio=5 os_prio=0 tid=0x7fdcf000b800 nid=0x47bd waiting on condition [0x7fdcf697b000]
Feb 08 13:04:58    java.lang.Thread.State: WAITING (parking)
Feb 08 13:04:58 	at sun.misc.Unsafe.park(Native Method)
Feb 08 13:04:58 	- parking to wait for <0x8f644330> (a java.util.concurrent.CompletableFuture$Signaller)
Feb 08 13:04:58 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
Feb 08 13:04:58 	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
Feb 08 13:04:58 	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
Feb 08 13:04:58 	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
Feb 08 13:04:58 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
Feb 08 13:04:58 	at org.apache.flink.util.AutoCloseableAsync.close(AutoCloseableAsync.java:36)
Feb 08 13:04:58 	at org.apache.flink.test.recovery.AbstractTaskManagerProcessFailureRecoveryTest.testTaskManagerProcessFailure(AbstractTaskManagerProcessFailureRecoveryTest.java:209)
Feb 08 13:04:58 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Feb 08 13:04:58 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Feb 08 13:04:58 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Feb 08 13:04:58 	at java.lang.reflect.Method.invoke(Method.java:498)
Feb 08 13:04:58 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Feb 08 13:04:58 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Feb 08 13:04:58 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Feb 08 13:04:58 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Feb 08 13:04:58 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Feb 08 13:04:58 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Feb 08 13:04:58 	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
Feb 08 13:04:58 	at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
Feb 08 13:04:58 	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
Feb 08 13:04:58 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
Feb 08 13:04:58 	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
Feb 08 13:04:58 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
Feb 08 13:04:58 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
Feb 08 13:04:58 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
Feb 08 13:04:58 	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
Feb 08 13:04:58 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
Feb 08 13:04:58 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
Feb 08 13:04:58 	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
Feb 08 13:04:58 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
Feb 08 13:04:58 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
Feb 08 13:04:58 	at org.junit.runners.Suite.runChild(Suite.java:128)
Feb 08 13:04:58 	at org.junit.runners.Suite.runChild(Suite.java:27)
{code}

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30901&view=logs&j=a57e0635-3fad-5b08-57c7-a4142d7d6fa9&t=2ef0effc-1da1-50e5-c2bd-aab434b1c5b7&l=14617

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Created] (FLINK-26042) PyFlinkEmbeddedSubInterpreterTests. test_udf_without_arguments failed on azure
Yun Gao created FLINK-26042:
---

Summary: PyFlinkEmbeddedSubInterpreterTests.test_udf_without_arguments failed on azure
Key: FLINK-26042
URL: https://issues.apache.org/jira/browse/FLINK-26042
Project: Flink
Issue Type: Bug
Components: API / Python
Affects Versions: 1.15.0
Reporter: Yun Gao

{code:java}
2022-02-08T02:55:16.0701246Z Feb 08 02:55:16 === FAILURES ===
2022-02-08T02:55:16.0702483Z Feb 08 02:55:16 PyFlinkEmbeddedSubInterpreterTests.test_udf_without_arguments _
2022-02-08T02:55:16.0703190Z Feb 08 02:55:16
2022-02-08T02:55:16.0703959Z Feb 08 02:55:16 self =
2022-02-08T02:55:16.0704967Z Feb 08 02:55:16
2022-02-08T02:55:16.0705639Z Feb 08 02:55:16     def test_udf_without_arguments(self):
2022-02-08T02:55:16.0706641Z Feb 08 02:55:16         one = udf(lambda: 1, result_type=DataTypes.BIGINT(), deterministic=True)
2022-02-08T02:55:16.0707595Z Feb 08 02:55:16         two = udf(lambda: 2, result_type=DataTypes.BIGINT(), deterministic=False)
2022-02-08T02:55:16.0713079Z Feb 08 02:55:16
2022-02-08T02:55:16.0714866Z Feb 08 02:55:16         table_sink = source_sink_utils.TestAppendSink(['a', 'b'],
2022-02-08T02:55:16.0716495Z Feb 08 02:55:16             [DataTypes.BIGINT(), DataTypes.BIGINT()])
2022-02-08T02:55:16.0717411Z Feb 08 02:55:16         self.t_env.register_table_sink("Results", table_sink)
2022-02-08T02:55:16.0718059Z Feb 08 02:55:16
2022-02-08T02:55:16.0719148Z Feb 08 02:55:16         t = self.t_env.from_elements([(1, 2), (2, 5), (3, 1)], ['a', 'b'])
2022-02-08T02:55:16.0719974Z Feb 08 02:55:16 >       t.select(one(), two()).execute_insert("Results").wait()
2022-02-08T02:55:16.0720697Z Feb 08 02:55:16
2022-02-08T02:55:16.0721294Z Feb 08 02:55:16 pyflink/table/tests/test_udf.py:252:
2022-02-08T02:55:16.0722119Z Feb 08 02:55:16 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2022-02-08T02:55:16.0722943Z Feb 08 02:55:16 pyflink/table/table_result.py:76: in wait
2022-02-08T02:55:16.0723686Z Feb 08 02:55:16     get_method(self._j_table_result, "await")()
2022-02-08T02:55:16.0725024Z Feb 08 02:55:16 .tox/py37-cython/lib/python3.7/site-packages/py4j/java_gateway.py:1322: in __call__
2022-02-08T02:55:16.0726044Z Feb 08 02:55:16     answer, self.gateway_client, self.target_id, self.name)
2022-02-08T02:55:16.0726824Z Feb 08 02:55:16 pyflink/util/exceptions.py:146: in deco
2022-02-08T02:55:16.0727569Z Feb 08 02:55:16     return f(*a, **kw)
2022-02-08T02:55:16.0728326Z Feb 08 02:55:16 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2022-02-08T02:55:16.0728995Z Feb 08 02:55:16
2022-02-08T02:55:16.0729717Z Feb 08 02:55:16 answer = 'x'
2022-02-08T02:55:16.0730447Z Feb 08 02:55:16 gateway_client =
2022-02-08T02:55:16.0731465Z Feb 08 02:55:16 target_id = 'o26503', name = 'await'
2022-02-08T02:55:16.0732045Z Feb 08 02:55:16
2022-02-08T02:55:16.0732763Z Feb 08 02:55:16     def get_return_value(answer, gateway_client, target_id=None, name=None):
2022-02-08T02:55:16.0733699Z Feb 08 02:55:16         """Converts an answer received from the Java gateway into a Python object.
2022-02-08T02:55:16.0734508Z Feb 08 02:55:16
2022-02-08T02:55:16.0735205Z Feb 08 02:55:16         For example, string representation of integers are converted to Python
2022-02-08T02:55:16.0736228Z Feb 08 02:55:16         integer, string representation of objects are converted to JavaObject
2022-02-08T02:55:16.0736974Z Feb 08 02:55:16         instances, etc.
2022-02-08T02:55:16.0737508Z Feb 08 02:55:16
2022-02-08T02:55:16.0738185Z Feb 08 02:55:16         :param answer: the string returned by the Java gateway
2022-02-08T02:55:16.0739074Z Feb 08 02:55:16         :param gateway_client: the gateway client used to communicate with the Java
2022-02-08T02:55:16.0739994Z Feb 08 02:55:16             Gateway. Only necessary if the answer is a reference (e.g., object,
2022-02-08T02:55:16.0740723Z Feb 08 02:55:16             list, map)
2022-02-08T02:55:16.0741491Z Feb 08 02:55:16         :param target_id: the name of the object from which the answer comes from
2022-02-08T02:55:16.0742350Z Feb 08 02:55:16             (e.g., *object1* in `object1.hello()`). Optional.
2022-02-08T02:55:16.0743175Z Feb 08 02:55:16         :param name: the name of the member from which the answer comes from
2022-02-08T02:55:16.0744315Z Feb 08 02:55:16             (e.g., *hello* in `object1.hello()`). Optional.
2022-02-08T02:55:16.0744973Z Feb 08 02:55:16         """
2022-02-08T02:55:16.0745608Z Feb 08 02:55:16         if is_error(answer)[0]:
2022-02-08T02:55:16.0746484Z Feb 08 02:55:16             if len(answer) > 1:
2022-02-08T02:55:16.0747162Z Feb 08 02:55:16                 type = answer[1]
2022-02-08T02:55:16
[jira] [Created] (FLINK-26043) Add periodic kerberos relogin to DelegationTokenManager
Gabor Somogyi created FLINK-26043:
---

Summary: Add periodic kerberos relogin to DelegationTokenManager
Key: FLINK-26043
URL: https://issues.apache.org/jira/browse/FLINK-26043
Project: Flink
Issue Type: Sub-task
Affects Versions: 1.15.0
Reporter: Gabor Somogyi
[jira] [Created] (FLINK-26044) Properly declare internal UNNEST function
Timo Walther created FLINK-26044:
---

Summary: Properly declare internal UNNEST function
Key: FLINK-26044
URL: https://issues.apache.org/jira/browse/FLINK-26044
Project: Flink
Issue Type: Sub-task
Components: Table SQL / Planner
Reporter: Timo Walther
Assignee: Timo Walther

UNNEST is not declared as an internal function. We should declare it similarly to REPLICATE ROWS.
Re: Statefun async http request via RequestReplyFunctionBuilder
Hi Galen, Great to hear it :-) I've assigned you to the ticket [1]. Next you can open a PR against the repository and we will review it. [1] https://issues.apache.org/jira/browse/FLINK-25933 Cheers, Till On Tue, Feb 8, 2022 at 11:58 PM Galen Warren wrote: > I'm ready to pick this one up, I have some code that's working locally. > Shall I create a PR? > > > > On Wed, Feb 2, 2022 at 3:17 PM Igal Shilman > wrote: > > > Great, ping me when you would like to pick this up. > > > > For the related issue, I think that can be a good addition indeed! > > > > On Wed, Feb 2, 2022 at 8:55 PM Galen Warren > > wrote: > > > > > Gotcha, thanks. I may be able to work on that one in a couple weeks if > > > you're looking for help. > > > > > > Unrelated question -- another thing that would be useful for me would > be > > > the ability to set a maximum backoff interval in > > BoundedExponentialBackoff > > > or the async equivalent. My situation is this. I'd like to set a long > > > maxRequestDuration, so it takes a good while for Statefun to "give up" > > on a > > > function call, i.e. perhaps several hours or even days. During that > time, > > > if the backoff interval doubles on each failure, those backoff > intervals > > > get pretty long. > > > > > > Sometimes, in exponential backoff implementations, I've seen the > concept > > of > > > a max backoff interval, i.e. once the backoff interval reaches that > > point, > > > it won't go any higher. So I could set it to, say, 60 seconds, and no > > > matter how long it would retry the function, the interval between > retries > > > wouldn't be more than that. > > > > > > Do you think that would be a useful addition? I could post something to > > the > > > dev list if you want. > > > > > > On Wed, Feb 2, 2022 at 2:39 PM Igal Shilman wrote: > > > > > > > Hi Galen, > > > > You are right, it is not possible, but there is no real reason for > > that. 
> > > > We should fix this, and I've created the following JIRA issue [1] > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-25933 > > > > > > > > On Wed, Feb 2, 2022 at 6:30 PM Galen Warren > > > > > wrote: > > > > > > > > > Is it possible to choose the async HTTP transport using > > > > > RequestReplyFunctionBuilder? It looks to me that it is not, but I > > > wanted > > > > to > > > > > double check. Thanks. > > > > > > > > > > > > > > >
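The capped-backoff idea discussed in the thread above can be sketched as follows. This is a minimal, self-contained illustration of an exponential backoff with a maximum interval; the class and method names are invented for the sketch and are NOT Statefun's actual BoundedExponentialBackoff API:

```java
import java.time.Duration;

// Sketch of an exponential backoff whose interval is capped at a maximum
// value. Names are illustrative only, not Statefun's real API.
final class CappedExponentialBackoff {
    private final long initialMillis;
    private final long maxMillis;

    CappedExponentialBackoff(Duration initial, Duration max) {
        this.initialMillis = initial.toMillis();
        this.maxMillis = max.toMillis();
    }

    // Interval before the given (1-based) retry attempt:
    // initial * 2^(attempt-1), but never more than the configured maximum.
    long intervalMillis(int attempt) {
        long interval = initialMillis;
        for (int i = 1; i < attempt; i++) {
            interval = Math.min(interval * 2, maxMillis);
            if (interval == maxMillis) {
                break; // further doubling cannot exceed the cap
            }
        }
        return Math.min(interval, maxMillis);
    }
}
```

With an initial interval of 1 second and a cap of 60 seconds, retries during a multi-hour maxRequestDuration would never be spaced more than a minute apart, which is exactly the behavior requested above.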
[jira] [Created] (FLINK-26045) Document error status codes for the REST API
David Morávek created FLINK-26045:
---

Summary: Document error status codes for the REST API
Key: FLINK-26045
URL: https://issues.apache.org/jira/browse/FLINK-26045
Project: Flink
Issue Type: Technical Debt
Components: Documentation, Runtime / REST
Reporter: David Morávek
Fix For: 1.15.0

We've introduced new error status codes in FLINK-24275, which are not yet reflected in the REST documentation and the OpenAPI spec file.
[jira] [Created] (FLINK-26046) Link to OpenAPI specification is dead
David Morávek created FLINK-26046:
---

Summary: Link to OpenAPI specification is dead
Key: FLINK-26046
URL: https://issues.apache.org/jira/browse/FLINK-26046
Project: Flink
Issue Type: Technical Debt
Components: Documentation
Reporter: David Morávek
Fix For: 1.15.0

In the REST API docs, the link to the OpenAPI specification is missing the flink prefix, so it leads nowhere.

https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/
Azure Pipelines are dealing with an incident, causing pipeline runs to fail
Hi everyone, Please keep in mind that Azure Pipelines currently is dealing with an incident [1] which causes all CI pipeline runs on Azure to fail. When the incident has been resolved, it will be required to retrigger your pipeline to see if the pipeline then passes. Best regards, Martijn Visser https://twitter.com/MartijnVisser82 [1] https://status.dev.azure.com/_event/287959626
[jira] [Created] (FLINK-26047) Support usrlib in HDFS for YARN application mode
Biao Geng created FLINK-26047:
---

Summary: Support usrlib in HDFS for YARN application mode
Key: FLINK-26047
URL: https://issues.apache.org/jira/browse/FLINK-26047
Project: Flink
Issue Type: Improvement
Components: Deployment / YARN
Reporter: Biao Geng

In YARN application mode, we currently support using the user jar and lib jars from HDFS. For example, we can run commands like:
{quote}./bin/flink run-application -t yarn-application \
-Dyarn.provided.lib.dirs="hdfs://myhdfs/my-remote-flink-dist-dir" \
hdfs://myhdfs/jars/my-application.jar{quote}
For {{usrlib}}, we currently only support local directories. I propose adding HDFS support for {{usrlib}} so that it works better with CLASSPATH_INCLUDE_USER_JAR. It would also benefit cases like submitting Flink jobs from a notebook.
[jira] [Created] (FLINK-26048) Yarn access should be separated by semicolon
MichealShin created FLINK-26048:
---

Summary: Yarn access should be separated by semicolon
Key: FLINK-26048
URL: https://issues.apache.org/jira/browse/FLINK-26048
Project: Flink
Issue Type: Bug
Reporter: MichealShin

Yarn access should be separated by semicolons; otherwise, the config cannot be read correctly.
[jira] [Created] (FLINK-26049) The tolerable-failed-checkpoints logic is invalid when checkpoint trigger failed
fanrui created FLINK-26049:
---

Summary: The tolerable-failed-checkpoints logic is invalid when checkpoint trigger failed
Key: FLINK-26049
URL: https://issues.apache.org/jira/browse/FLINK-26049
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.14.3, 1.13.5
Reporter: fanrui
Fix For: 1.15.0
Attachments: image-2022-02-09-18-08-17-868.png, image-2022-02-09-18-08-34-992.png, image-2022-02-09-18-08-42-920.png

After triggerCheckpoint, if the checkpoint fails, Flink executes the tolerable-failed-checkpoints logic. But if triggerCheckpoint itself fails, Flink does not execute that logic.

h1. How to reproduce this issue?

In our online environment, an HDFS SRE deleted the Flink base dir by mistake, so the Flink job did not have permission to create the checkpoint dir and triggering checkpoints failed. Several things did not meet expectations:
* The JM just logs _"Failed to trigger checkpoint for job 6f09d4a15dad42b24d52c987f5471f18 since Trigger checkpoint failure"_, without showing the root cause or exception.
* The user set tolerable-failed-checkpoints=0, but when triggerCheckpoint fails, Flink does not execute the tolerable-failed-checkpoints logic.
* When triggerCheckpoint fails, numberOfFailedCheckpoints is always 0.
* When triggerCheckpoint fails, no checkpoint info appears in the checkpoint history page.

!image-2022-02-09-18-08-17-868.png!
!image-2022-02-09-18-08-34-992.png!
!image-2022-02-09-18-08-42-920.png!

h3. *All metrics were normal, so we only found out the next day that checkpoints had been failing for a whole day. That is not acceptable to Flink users.*

I have some ideas:
# Should the tolerable-failed-checkpoints logic be executed when triggerCheckpoint fails?
# When triggerCheckpoint fails, should numberOfFailedCheckpoints be increased?
# When triggerCheckpoint fails, should checkpoint info be shown in the checkpoint history page?
# The JM just shows "Failed to trigger checkpoint"; should we show the detailed exception to make it easier to find the root cause?

Could we make these changes? Please correct me if I'm wrong.
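To make the tolerance semantics concrete: conceptually, Flink keeps a counter of consecutive failed checkpoints and fails the job once it exceeds the configured tolerance; a trigger-time failure that bypasses this counter is exactly the gap described above. A minimal, self-contained sketch of those semantics (this is NOT Flink's actual CheckpointFailureManager):

```java
// Minimal illustration of tolerable-failed-checkpoints counting semantics.
// NOT Flink's actual CheckpointFailureManager, just a sketch: the job should
// fail over once consecutive checkpoint failures exceed the tolerance.
final class CheckpointFailureCounter {
    private final int tolerableFailures;
    private int consecutiveFailures = 0;

    CheckpointFailureCounter(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    // Called on every failed checkpoint; returns true if the job should fail
    // over. The bug report above is that trigger-time failures never reach
    // this path, so the counter stays at 0.
    boolean onCheckpointFailure() {
        consecutiveFailures++;
        return consecutiveFailures > tolerableFailures;
    }

    // A successful checkpoint resets the counter.
    void onCheckpointSuccess() {
        consecutiveFailures = 0;
    }
}
```

With tolerable-failed-checkpoints=0 as in the report, even a single counted failure would fail the job; the observed behavior (metrics stay at 0, job keeps running) means trigger failures never enter this accounting.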
[jira] [Created] (FLINK-26051) row_number =1 and Subsequent SQL has "case when" and "where" statement : The window can only be ordered in ASCENDING mode
chuncheng wu created FLINK-26051:
---

Summary: row_number =1 and Subsequent SQL has "case when" and "where" statement : The window can only be ordered in ASCENDING mode
Key: FLINK-26051
URL: https://issues.apache.org/jira/browse/FLINK-26051
Project: Flink
Issue Type: Bug
Components: Table SQL / Planner
Affects Versions: 1.12.2
Reporter: chuncheng wu

Hello, I have two SQLs. One SQL has rn=1 and the subsequent SQL has "case when" and "where". This results in the exception below, which happens when the logical plan is turned into the physical plan:
{quote}
org.apache.flink.table.api.TableException: The window can only be ordered in ASCENDING mode.
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:98)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregate.translateToPlanInternal(StreamExecOverAggregate.scala:52)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecOverAggregateBase.translateToPlan(StreamExecOverAggregateBase.scala:42)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:54)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39)
at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38)
at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66)
at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:65)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:65)
at org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:103)
at org.apache.flink.table.planner.delegation.StreamPlanner.explain(StreamPlanner.scala:42)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.explainInternal(TableEnvironmentImpl.java:630)
at org.apache.flink.table.api.internal.TableImpl.explain(TableImpl.java:582)
at com.meituan.grocery.data.flink.test.BugTest.testRowNumber(BugTest.java:69)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at org.junit.jupiter.engine.execut
[jira] [Created] (FLINK-26050) Too Many small sst files in rocksdb state backend when using processing time window
shen created FLINK-26050:
---

Summary: Too Many small sst files in rocksdb state backend when using processing time window
Key: FLINK-26050
URL: https://issues.apache.org/jira/browse/FLINK-26050
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Affects Versions: 1.14.3, 1.10.2
Reporter: shen

When using a processing time window, in some workloads there will be a lot of small sst files (several KB) in the rocksdb local directory, which may cause a "Too many files" error.

Using the rocksdb tool ldb to inspect the content of these sst files shows:
* the column family of these small sst files is "processing_window-timers",
* most sst files are in level-1,
* records in the sst files are almost all kTypeDeletion,
* the creation time of the sst files corresponds to the checkpoint interval.

These small sst files seem to be generated when a flink checkpoint is triggered. Although all their content is delete tags, they are not compacted and deleted in rocksdb compaction because they do not intersect with each other (rocksdb [compaction trivial move|https://github.com/facebook/rocksdb/wiki/Compaction-Trivial-Move]), and there seems to be no chance to delete them because of their small size and the lack of overlap with other sst files.

I will attach a simple program to reproduce the problem. Timers in a processing time window are generated in strictly ascending order (both put and delete), so if the workload happens to generate level-0 sst files that do not intersect with each other (for example: the processing window size is much smaller than the checkpoint interval, and no window content crosses the checkpoint interval, or no new data arrives in a window crossing the checkpoint interval), many small sst files accumulate until the job is restored from a savepoint or incremental checkpointing is disabled. A similar problem may exist when users use timers in operators with the same workload.
Code to reproduce the problem:
{code:java}
package org.apache.flink;

import lombok.extern.slf4j.Slf4j;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;
import org.apache.flink.configuration.TaskManagerOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.util.Collections;
import java.util.List;
import java.util.Random;

@Slf4j
public class StreamApp {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.set(RestOptions.ADDRESS, "127.0.0.1");
        config.set(RestOptions.PORT, 10086);
        config.set(TaskManagerOptions.NUM_TASK_SLOTS, 6);
        new StreamApp().configureApp(StreamExecutionEnvironment.createLocalEnvironment(1, config));
    }

    public void configureApp(StreamExecutionEnvironment env) throws Exception {
        env.enableCheckpointing(2); // 20sec
        RocksDBStateBackend rocksDBStateBackend =
            new RocksDBStateBackend("file:///Users/shenjiaqi/Workspace/jira/flink-51/checkpoints/", true); // need to be reconfigured
        rocksDBStateBackend.setDbStoragePath("/Users/shenjiaqi/Workspace/jira/flink-51/flink/rocksdb_local_db"); // need to be reconfigured
        env.setStateBackend(rocksDBStateBackend);
        env.getCheckpointConfig().setCheckpointTimeout(10);
        env.getCheckpointConfig().setTolerableCheckpointFailureNumber(5);
        env.setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.getConfig().setTaskCancellationInterval(1);
        for (int i = 0; i < 1; ++i) {
            createOnePipeline(env);
        }
        env.execute("StreamApp");
    }

    private void createOnePipeline(StreamExecutionEnvironment env) {
        // data source is configured so that little window cross checkpoint interval
        DataStreamSource stream = env.addSource(new Generator(1, 3000, 3600));
        stream.keyBy(x -> x)
            // make sure window size less than checkpoint interval. though 100ms is too small,
            // I think increase this value can still reproduce the problem with longer time.
            .window(TumblingProcessingTimeWindows.of(Time.milliseconds(100)))
            .process(new ProcessWindowFunction() {
                @Override
                public void process(String s, ProcessWindowFunction.Context context, Iter
Re: Azure Pipelines are dealing with an incident, causing pipeline runs to fail
Hi everyone, The issue should now be resolved. Best regards, Martijn On Wed, 9 Feb 2022 at 10:55, Martijn Visser wrote: > Hi everyone, > > Please keep in mind that Azure Pipelines currently is dealing with an > incident [1] which causes all CI pipeline runs on Azure to fail. When the > incident has been resolved, it will be required to retrigger your pipeline > to see if the pipeline then passes. > > Best regards, > > Martijn Visser > https://twitter.com/MartijnVisser82 > > [1] https://status.dev.azure.com/_event/287959626 >
[jira] [Created] (FLINK-26052) Update chinese documentation regarding FLIP-203
Dawid Wysakowicz created FLINK-26052:
---

Summary: Update chinese documentation regarding FLIP-203
Key: FLINK-26052
URL: https://issues.apache.org/jira/browse/FLINK-26052
Project: Flink
Issue Type: Sub-task
Components: Documentation, Runtime / Checkpointing
Reporter: Dawid Wysakowicz

Relevant English commits:
* c1f5c5320150402fc0cb4fbf3a31f9a27b1e4d9a
* cd8ea8d5b207569f68acc5a3c8db95cd2ca47ba6
[DISCUSS] FLINK-25927: Make flink-connector-base dependency usage consistent across all connectors
Hi everyone,

I would like to discuss the best approach to address the issue raised in FLINK-25927 [1]. It can be summarized as follows:

> flink-connector-base is currently inconsistently used in connectors (directly shaded in some and transitively pulled in via flink-connector-files, which is itself shaded in the table uber jar)
> FLINK-24687 [2] moved flink-connector-files out of the flink-table uber jar
> It is necessary to make usage of flink-connector-base consistent across all connectors

One approach is to stop shading flink-connector-files in all connectors and instead package it in flink-dist, making it a part of the Flink-wide provided public API. This approach is investigated in the following PoC PR: 18545 [3]. The issue with this approach is that it breaks any existing CI and IDE setups that do not directly rely on flink-dist and also do not include flink-connector-files as an explicit dependency.

In theory, a nice alternative would be to make it a part of a dependency that is ubiquitously provided, for instance, flink-streaming-java. Doing that for flink-streaming-java would, however, introduce a dependency cycle and is currently not feasible.

It would be great to hear your opinions on what could be the best way forward here.

[1] https://issues.apache.org/jira/browse/FLINK-25927
[2] https://issues.apache.org/jira/browse/FLINK-24687
[3] https://github.com/apache/flink/pull/18545

Thanks,
Alexander Fedulov
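As a stop-gap for setups hit by the inconsistency described above, a project that relies on neither flink-dist nor a connector that shades the base module can declare the dependency explicitly. An illustrative Maven fragment (the version property is a placeholder, not a recommendation from this thread):

```xml
<!-- Illustrative only: make flink-connector-base an explicit dependency so
     IDE/CI setups do not rely on it being pulled in transitively. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-base</artifactId>
  <version>${flink.version}</version>
</dependency>
```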
[DISCUSS] Drop Jepsen tests
For a few years now we have had a set of Jepsen tests that verify the correctness of Flink's coordination layer in the case of process crashes. In the past they have indeed found issues and thus provided value to the project, and in general the core idea behind them (and Jepsen for that matter) is very sound.

However, so far we have neither made attempts to make further use of Jepsen (limiting ourselves to very basic tests) nor to familiarize ourselves with the tests/Jepsen at all. As a result these tests are difficult to maintain. They (and Jepsen) are written in Clojure, which makes debugging, changes and upstreaming contributions very difficult. Additionally, the tests make use of a very complicated (Ververica-internal) terraform+ansible setup to spin up and tear down AWS machines. While it works (and is actually pretty cool), it's difficult to adjust because the people who wrote it have left the company.

The reason I'm raising this now (and not earlier) is that so far keeping the tests running wasn't much of a problem: bump a few dependencies here and there and we're good to go. However, this has changed with the recent upgrade to Zookeeper 3.5, which isn't supported by Jepsen out-of-the-box, completely breaking the tests. We'd now have to write a new Zookeeper 3.5+ integration for Jepsen (again, in Clojure). While I started working on that and could likely finish it, I started to wonder whether it even makes sense to do so, and whether we couldn't invest this time elsewhere.

Let me know what you think.
Re: Azure Pipelines are dealing with an incident, causing pipeline runs to fail
Unfortunately it looks like there are still failures. Will keep you posted On Wed, 9 Feb 2022 at 11:51, Martijn Visser wrote: > Hi everyone, > > The issue should now be resolved. > > Best regards, > > Martijn > > On Wed, 9 Feb 2022 at 10:55, Martijn Visser wrote: > >> Hi everyone, >> >> Please keep in mind that Azure Pipelines currently is dealing with an >> incident [1] which causes all CI pipeline runs on Azure to fail. When the >> incident has been resolved, it will be required to retrigger your pipeline >> to see if the pipeline then passes. >> >> Best regards, >> >> Martijn Visser >> https://twitter.com/MartijnVisser82 >> >> [1] https://status.dev.azure.com/_event/287959626 >> >
Re: Azure Pipelines are dealing with an incident, causing pipeline runs to fail
I filed a support request with Microsoft: https://developercommunity.visualstudio.com/t/Number-of-Microsoft-hosted-agents-droppe/1658827?from=email&space=21&sort=newest On Wed, Feb 9, 2022 at 1:04 PM Martijn Visser wrote: > Unfortunately it looks like there are still failures. Will keep you posted > > On Wed, 9 Feb 2022 at 11:51, Martijn Visser wrote: > > > Hi everyone, > > > > The issue should now be resolved. > > > > Best regards, > > > > Martijn > > > > On Wed, 9 Feb 2022 at 10:55, Martijn Visser > wrote: > > > >> Hi everyone, > >> > >> Please keep in mind that Azure Pipelines currently is dealing with an > >> incident [1] which causes all CI pipeline runs on Azure to fail. When > the > >> incident has been resolved, it will be required to retrigger your > pipeline > >> to see if the pipeline then passes. > >> > >> Best regards, > >> > >> Martijn Visser > >> https://twitter.com/MartijnVisser82 > >> > >> [1] https://status.dev.azure.com/_event/287959626 > >> > > >
[jira] [Created] (FLINK-26053) Fix parser generator warnings
Francesco Guardiani created FLINK-26053:
---
Summary: Fix parser generator warnings
Key: FLINK-26053
URL: https://issues.apache.org/jira/browse/FLINK-26053
Project: Flink
Issue Type: Bug
Components: Table SQL / API
Reporter: Francesco Guardiani

When building flink-sql-parser, javacc logs a couple of warnings:
{code}
Warning: Choice conflict in [...] construct at line 2920, column 5.
Expansion nested within construct and expansion following construct have common prefixes, one of which is: "PLAN"
Consider using a lookahead of 2 or more for nested expansion.
Warning: Choice conflict involving two expansions at line 2930, column 9 and line 2932, column 9 respectively.
A common prefix is: "STATEMENT"
Consider using a lookahead of 2 for earlier expansion.
Warning: Choice conflict involving two expansions at line 2952, column 9 and line 2954, column 9 respectively.
A common prefix is: "STATEMENT"
Consider using a lookahead of 2 for earlier expansion.
{code}
Taken from https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30871&view=logs&j=0c940707-2659-5648-cbe6-a1ad63045f0a&t=075c2716-8010-5565-fe08-3c4bb45824a4&l=3911

We should investigate them, as they're often a symptom of a bug in our parser template, which might result in unexpected parsing errors.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
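For illustration, a conflict of the shape javacc is warning about is typically resolved with an explicit lookahead. The fragment below is a hypothetical sketch in JavaCC grammar style, not the actual lines from the flink-sql-parser template:

```
// Hypothetical JavaCC-style fragment illustrating the class of warning.
// Both branches start with the same token, so the default LOOKAHEAD(1)
// cannot choose between them; an explicit LOOKAHEAD(2) disambiguates.
(
    LOOKAHEAD(2)
    <STATEMENT> <SET> SqlStatementSetBody()
|
    <STATEMENT> SqlSingleStatementBody()
)
```

The nonterminal names here are made up; the point is only that the branch prefixes share a common token and the earlier expansion needs a deeper lookahead, exactly as the warning suggests.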
[jira] [Created] (FLINK-26054) Enable maven-enforcer to disable table-planner and table-runtime as dependencies
Francesco Guardiani created FLINK-26054:
---
Summary: Enable maven-enforcer to disable table-planner and table-runtime as dependencies
Key: FLINK-26054
URL: https://issues.apache.org/jira/browse/FLINK-26054
Project: Flink
Issue Type: Bug
Components: Build System, Table SQL / Planner
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani

From https://github.com/apache/flink/pull/18676#discussion_r802502438, we should enable the enforcer for the table-planner and table-runtime modules, as described in flink-table/README.md
[jira] [Created] (FLINK-26055) Use new FlinkVersion enum for serde plans
Timo Walther created FLINK-26055:
---
Summary: Use new FlinkVersion enum for serde plans
Key: FLINK-26055
URL: https://issues.apache.org/jira/browse/FLINK-26055
Project: Flink
Issue Type: Sub-task
Reporter: Timo Walther

The new FlinkVersion enum should be used to encode a Flink version from and to JSON.
[jira] [Created] (FLINK-26056) CLONE - [FLIP-171] KDS implementation of Async Sink Table API
Ahmed Hamdy created FLINK-26056:
---
Summary: CLONE - [FLIP-171] KDS implementation of Async Sink Table API
Key: FLINK-26056
URL: https://issues.apache.org/jira/browse/FLINK-26056
Project: Flink
Issue Type: New Feature
Components: Connectors / Kinesis
Reporter: Ahmed Hamdy
Assignee: Ahmed Hamdy
Fix For: 1.15.0

h2. Motivation
*User stories:* As a Flink user, I'd like to use the Table API for the new Kinesis Data Streams sink.
*Scope:*
* Introduce {{AsyncDynamicTableSink}} that enables sinking tables into async implementations.
* Implement a new {{KinesisDynamicTableSink}} that uses the {{KinesisDataStreamSink}} async implementation and implements {{AsyncDynamicTableSink}}.
* The implementation introduces async sink configurations as optional options in the table definition, with default values derived from the {{KinesisDataStream}} default values.
* Unit/integration testing: modify Kinesis Table API tests for the new implementation; add unit tests for {{AsyncDynamicTableSink}}, {{KinesisDynamicTableSink}} and {{KinesisDynamicTableSinkFactory}}.
* Java / code-level docs.

h2. References
More details to be found at https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink

*Update:* Status update - list of all work outstanding for the 1.15 release:

[Merged] https://github.com/apache/flink/pull/18165 - KDS DataStream docs
[Merged] https://github.com/apache/flink/pull/18396 - [hotfix] for infinite loop if not flushing during commit
[Merged] https://github.com/apache/flink/pull/18421 - Mark Kinesis Producer as deprecated (Prod: FLINK-24227)
[Merged] https://github.com/apache/flink/pull/18348 - KDS Table API sink & docs
[Merged] https://github.com/apache/flink/pull/18488 - base sink retry entries in order, not in reverse
[Merged] https://github.com/apache/flink/pull/18512 - changing failed requests handler to accept List in AsyncSinkWriter
[Merged] https://github.com/apache/flink/pull/18483 - Do not expose the element converter
[Merged] https://github.com/apache/flink/pull/18468 - Adding Kinesis Data Streams SQL uber-jar

Ready for review:
[SUCCESS] https://github.com/apache/flink/pull/18314 - KDF DataStream sink & docs
[BLOCKED on ^^] https://github.com/apache/flink/pull/18426 - rename flink-connector-aws-kinesis-data-* to flink-connector-aws-kinesis-* (module names) and KinesisData*Sink to Kinesis*Sink (class names)

Pending PR:
* Firehose Table API sink & docs
* KDF Table API SQL jar

TBD:
* FLINK-25846: Not shutting down
* FLINK-25848: Validation during start up
* FLINK-25792: flush() bug
* FLINK-25793: throughput exceeded
* Update the defaults of the KDS sink and update the docs; do the same for KDF
* Add an `AsyncSinkCommonConfig` class (to wrap the 6 params) to `flink-connector-base` and propagate it to the two connectors - feature freeze
* KDS performance testing
* KDF performance testing
* Clone the new docs to .../contents.zh/... and add the location to the corresponding Chinese translation jira - KDS
* Rename [Amazon AWS Kinesis Streams] to [Amazon Kinesis Data Streams] in docs (legacy issue) - Flink 1.15 release
* KDS end-to-end sanity test - hits AWS APIs rather than local docker images
* KDS Python wrappers
* FLINK-25733 - Create a migration guide for the Kinesis Table API connector - can happen after 1.15
* If `endpoint` is provided, `region` should not be required like it currently is
* Test if the Localstack container requires the 1ms timeout
* Adaptive level of logging (in discussion)

FYI:
* FLINK-25661 - Add custom fatal exception handler in AsyncSinkWriter - https://github.com/apache/flink/pull/18449
* https://issues.apache.org/jira/browse/FLINK-24229 DDB sink

Chinese translation:
https://issues.apache.org/jira/browse/FLINK-25735 - KDS DataStream sink
https://issues.apache.org/jira/browse/FLINK-25736 - KDS Table API sink
https://issues.apache.org/jira/browse/FLINK-25737 - KDF DataStream sink
[jira] [Created] (FLINK-26057) CLONE - [FLIP-171] Add SQL Client Uber jar for KinesisDataStreams Sink.
Ahmed Hamdy created FLINK-26057:
---
Summary: CLONE - [FLIP-171] Add SQL Client Uber jar for KinesisDataStreams Sink.
Key: FLINK-26057
URL: https://issues.apache.org/jira/browse/FLINK-26057
Project: Flink
Issue Type: Sub-task
Components: Connectors / Kinesis
Reporter: Ahmed Hamdy
Assignee: Ahmed Hamdy
Fix For: 1.15.0

h2. Motivation
*User stories:* As a Flink user, I'd like to have an uber-jar for the kinesis-data-streams sink SQL connector with all dependencies, without the need for all source dependencies.
*Scope:*
* Add a new module for {{flink-sql-connector-kinesis-data-streams}} which shades {{flink-connector-kinesis-data-streams}} with the needed dependencies.
* Add end-to-end tests for the new connector.

h2. References
More details to be found at https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink
[jira] [Created] (FLINK-26058) KinesisDynamicTableFactory implements DynamicSourceFactory but uses sink options
Ahmed Hamdy created FLINK-26058:
---
Summary: KinesisDynamicTableFactory implements DynamicSourceFactory but uses sink options
Key: FLINK-26058
URL: https://issues.apache.org/jira/browse/FLINK-26058
Project: Flink
Issue Type: Bug
Components: Connectors / Kinesis
Reporter: Ahmed Hamdy

h2. Description
{{KinesisDynamicTableFactory}} implements {{DynamicTableSourceFactory}} while {{KinesisDynamicTableSinkFactory}} implements {{DynamicTableSinkFactory}}; however, {{KinesisDynamicTableFactory}} adds the sink-specific {{SINK_PARTITIONER}} and {{SINK_PARTITION_DELIMITER}} options to its optional table options.
[jira] [Created] (FLINK-26059) ResourceProfile#ANY can cause overflow exceptions
Chesnay Schepler created FLINK-26059:
---
Summary: ResourceProfile#ANY can cause overflow exceptions
Key: FLINK-26059
URL: https://issues.apache.org/jira/browse/FLINK-26059
Project: Flink
Issue Type: Technical Debt
Components: Runtime / Coordination
Affects Versions: 1.15.0
Reporter: Chesnay Schepler

ResourceProfile#ANY uses MemorySize.MAX_VALUE for all memory settings. Some of the getters add up several of the memory components, causing an overflow. This implies that you must never access any getter without first comparing the resource profile against the ANY singleton.
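The failure mode described in this issue can be sketched in a few lines. The types below are hypothetical simplified stand-ins, not Flink's actual {{ResourceProfile}}/{{MemorySize}} classes; they only show why summing "infinite" memory components overflows and why a sentinel check avoids it:

```java
// Sketch of the overflow: an ANY-style profile stores Long.MAX_VALUE for every
// memory component, so a getter that adds two components overflows.
// Hypothetical simplified types, not Flink's actual API.
public class ProfileOverflow {
    static final long MAX_BYTES = Long.MAX_VALUE; // stand-in for MemorySize.MAX_VALUE

    /** Naive getter that adds two memory components; throws for the ANY-style profile. */
    static long totalMemoryUnchecked(long heapBytes, long offHeapBytes) {
        return Math.addExact(heapBytes, offHeapBytes); // ArithmeticException on overflow
    }

    /** Safe variant: compare against the sentinel before doing any arithmetic. */
    static long totalMemoryChecked(long heapBytes, long offHeapBytes) {
        if (heapBytes == MAX_BYTES || offHeapBytes == MAX_BYTES) {
            return MAX_BYTES; // "infinite" profile: skip the addition entirely
        }
        return Math.addExact(heapBytes, offHeapBytes);
    }
}
```

This mirrors the advice in the issue: callers (or the getters themselves) must special-case the ANY singleton before doing arithmetic on its components.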
[jira] [Created] (FLINK-26060) Make Python specific exec nodes unsupported
Francesco Guardiani created FLINK-26060:
---
Summary: Make Python specific exec nodes unsupported
Key: FLINK-26060
URL: https://issues.apache.org/jira/browse/FLINK-26060
Project: Flink
Issue Type: Sub-task
Reporter: Francesco Guardiani
Assignee: Francesco Guardiani

These nodes use the old type system, which is going to be removed soon. We should avoid supporting them in the persisted plan, as we cannot commit to supporting them. Once migrated to the new type system, PyFlink won't need these nodes anymore and will rely solely on the new Table function stack. For more details, also check https://issues.apache.org/jira/browse/FLINK-25231
Re: [DISCUSS] Drop Jepsen tests
Thank you for raising this issue. What risks do you see if we drop it? Do you see any cheaper alternative to (partially) mitigate those risks?

On Wed, Feb 9, 2022 at 12:40 PM Chesnay Schepler wrote:
> For a few years by now we had a set of Jepsen tests that verify the correctness of Flinks coordination layer in the case of process crashes.
> [...]

--
Konstantin Knauf
https://twitter.com/snntrable
https://github.com/knaufk
Re: [DISCUSS] Drop Jepsen tests
The jepsen tests cover 3 cases:
a) JM/TM crashes
b) HDFS namenode crash (aka, can't checkpoint because HDFS is down)
c) network partitions

a) can (and probably is) reasonably covered by existing ITCases and e2e tests
b) We could probably figure this out ourselves if we wanted to.
c) is the difficult part.

Note that the tests also only cover yarn (per-job/session) and standalone (session) deployments.

On 09/02/2022 17:11, Konstantin Knauf wrote:
> Thank you for raising this issue. What risks do you see if we drop it? Do you see any cheaper alternative to (partially) mitigate those risks?
> [...]
Re: [DISCUSS] Drop Jepsen tests
b/c are part of the same test.

* We have a job running,
* trigger a network partition (failing the job),
* then crash HDFS (preventing checkpoints and access to the HA storageDir),
* then the partition is resolved and HDFS is started again.

Conceptually I would think we can replicate this by nuking half the cluster, crashing HDFS/ZK, and restarting everything.

On 09/02/2022 17:39, Chesnay Schepler wrote:
> The jepsen tests cover 3 cases:
> a) JM/TM crashes
> b) HDFS namenode crash (aka, can't checkpoint because HDFS is down)
> c) network partitions
> [...]
Re: [DISCUSS] Drop Jepsen tests
Network partitions are trickier than simply crashing a process. For example, these can be asymmetric -> as a TM you're still able to talk to the JM, but you're not able to talk to other TMs.

In general this could be achieved by manipulating iptables on the host machine (considering we spawn all the processes locally), but I'm not sure that will solve the "make it less complicated for others to contribute" part :/ Also, this kind of test would be executable on *nix systems only.

I assume that jepsen uses the same approach under the hood.

D.

On Wed, Feb 9, 2022 at 5:43 PM Chesnay Schepler wrote:
> b/c are part of the same test.
>
> * We have a job running,
> * trigger a network partition (failing the job),
> * then crash HDFS (preventing checkpoints and access to the HA storageDir),
> * then the partition is resolved and HDFS is started again.
> [...]
Re: Statefun async http request via RequestReplyFunctionBuilder
Thanks! PR created and noted on the ticket.

On Wed, Feb 9, 2022 at 3:42 AM Till Rohrmann wrote:
> Hi Galen,
>
> Great to hear it :-) I've assigned you to the ticket [1]. Next you can open a PR against the repository and we will review it.
>
> [1] https://issues.apache.org/jira/browse/FLINK-25933
>
> Cheers,
> Till
>
> On Tue, Feb 8, 2022 at 11:58 PM Galen Warren wrote:
>> I'm ready to pick this one up, I have some code that's working locally. Shall I create a PR?
>>
>> On Wed, Feb 2, 2022 at 3:17 PM Igal Shilman wrote:
>>> Great, ping me when you would like to pick this up.
>>>
>>> For the related issue, I think that can be a good addition indeed!
>>>
>>> On Wed, Feb 2, 2022 at 8:55 PM Galen Warren wrote:
>>>> Gotcha, thanks. I may be able to work on that one in a couple weeks if you're looking for help.
>>>>
>>>> Unrelated question -- another thing that would be useful for me would be the ability to set a maximum backoff interval in BoundedExponentialBackoff or the async equivalent. My situation is this. I'd like to set a long maxRequestDuration, so it takes a good while for Statefun to "give up" on a function call, i.e. perhaps several hours or even days. During that time, if the backoff interval doubles on each failure, those backoff intervals get pretty long.
>>>>
>>>> Sometimes, in exponential backoff implementations, I've seen the concept of a max backoff interval, i.e. once the backoff interval reaches that point, it won't go any higher. So I could set it to, say, 60 seconds, and no matter how long it would retry the function, the interval between retries wouldn't be more than that.
>>>>
>>>> Do you think that would be a useful addition? I could post something to the dev list if you want.
>>>>
>>>> On Wed, Feb 2, 2022 at 2:39 PM Igal Shilman wrote:
>>>>> Hi Galen,
>>>>> You are right, it is not possible, but there is no real reason for that. We should fix this, and I've created the following JIRA issue [1]
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-25933
>>>>>
>>>>> On Wed, Feb 2, 2022 at 6:30 PM Galen Warren <ga...@cvillewarrens.com> wrote:
>>>>>> Is it possible to choose the async HTTP transport using RequestReplyFunctionBuilder? It looks to me that it is not, but I wanted to double check. Thanks.
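The capped exponential backoff proposed in this thread can be sketched as follows. This is a minimal illustration of the idea (double the interval per attempt, but never exceed a cap), with hypothetical names; it is not StateFun's actual BoundedExponentialBackoff API:

```java
// Minimal sketch of exponential backoff with a maximum interval cap.
// Hypothetical class/method names, not the StateFun API.
public class CappedBackoff {
    private final long initialMillis;
    private final long maxMillis;

    public CappedBackoff(long initialMillis, long maxMillis) {
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
    }

    /** Backoff before the given retry attempt (0-based): doubles each time, capped at maxMillis. */
    public long backoffFor(int attempt) {
        long backoff = initialMillis;
        for (int i = 0; i < attempt; i++) {
            backoff *= 2;
            if (backoff >= maxMillis || backoff < 0) { // cap reached (or long overflow)
                return maxMillis;
            }
        }
        return Math.min(backoff, maxMillis);
    }
}
```

With initialMillis = 1000 and maxMillis = 60000, the intervals grow 1s, 2s, 4s, ... and then stay at 60s for the rest of a long maxRequestDuration, which is exactly the behavior Galen describes wanting.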
[jira] [Created] (FLINK-26061) FileSystem.getFileStatus fails for directories
Matthias Pohl created FLINK-26061:
---
Summary: FileSystem.getFileStatus fails for directories
Key: FLINK-26061
URL: https://issues.apache.org/jira/browse/FLINK-26061
Project: Flink
Issue Type: Technical Debt
Components: Connectors / FileSystem
Affects Versions: 1.15.0
Reporter: Matthias Pohl

{{FileSystem.getFileStatus}} is not supported for directories in object stores (like S3). We might want to adjust the interface here. The contract does not make clear that this can fail for certain {{FileSystem}} implementations. This is especially unintuitive because there is an {{isDir()}} method provided by the resulting {{FileStatus}} instance.
[jira] [Created] (FLINK-26062) [Changelog] Non-deterministic recovery of PriorityQueue states
Roman Khachatryan created FLINK-26062:
---
Summary: [Changelog] Non-deterministic recovery of PriorityQueue states
Key: FLINK-26062
URL: https://issues.apache.org/jira/browse/FLINK-26062
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Affects Versions: 1.15.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.15.0

Currently, InternalPriorityQueue.poll() is logged as a separate operation, without specifying the element that has been polled. On recovery, this recorded poll() is replayed. However, this is not deterministic because the order of PQ elements with equal priority is not specified. For example, TimerHeapInternalTimer only compares timestamps, which are often equal. This results in polling timers from the queue in the wrong order => dropping timers => not firing timers.

ProcessingTimeWindowCheckpointingITCase.testAggregatingSlidingProcessingTimeWindow fails with materialization enabled and using the heap state backend (both in-memory and fs-based implementations).

The proposed solution is to replace poll with a remove operation (which is based on equality).

cc: [~masteryhx], [~ym], [~yunta]
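The non-determinism at the heart of this issue can be demonstrated with a plain {{java.util.PriorityQueue}}. The sketch below uses a simplified stand-in for timers (only the timestamp is compared, like TimerHeapInternalTimer); it is not Flink's InternalPriorityQueue. With equal priorities, poll() order is unspecified, so replaying a recorded "poll" may drop the wrong element, whereas remove(element) identifies the exact element by equality:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Simplified stand-in for timers: only the timestamp participates in ordering,
// so two timers with equal timestamps have equal priority and poll() may
// return either one. remove(element) is equality-based and deterministic.
public class PollVsRemove {
    static class Timer {
        final long timestamp;
        final String key;
        Timer(long timestamp, String key) {
            this.timestamp = timestamp;
            this.key = key;
        }
    }

    static PriorityQueue<Timer> newQueue() {
        return new PriorityQueue<>(Comparator.comparingLong(t -> t.timestamp));
    }

    /** Replaying a recorded poll(): we only know "something was polled", not which element. */
    static Timer replayPoll(PriorityQueue<Timer> queue) {
        return queue.poll(); // either of two equal-timestamp timers may come out
    }

    /** Replaying a recorded remove(): the exact element is identified by equality. */
    static boolean replayRemove(PriorityQueue<Timer> queue, Timer timer) {
        return queue.remove(timer);
    }
}
```

This is why the issue proposes logging remove(element) instead of a bare poll(): the replayed operation then names the element it affects, independent of the heap's internal tie-breaking.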
[jira] [Created] (FLINK-26063) [Changelog] Incorrect key group logged for PQ.poll and remove
Roman Khachatryan created FLINK-26063:
---
Summary: [Changelog] Incorrect key group logged for PQ.poll and remove
Key: FLINK-26063
URL: https://issues.apache.org/jira/browse/FLINK-26063
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Affects Versions: 1.15.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.15.0

The key group is logged so that state changes can be re-distributed or shuffled. It is currently obtained from keyContext during poll() and remove() operations. However, keyContext is not updated when dequeuing processing time timers.

The impact is relatively small for remove(): in the worst case, the operation will be ignored. poll() should probably be replaced with remove() anyway - see FLINK-26062.

One way to solve this problem is to extract the key group from the polled element - if it is a timer.

cc: [~masteryhx], [~ym], [~yunta]
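The fix direction sketched in the issue - derive the key group from the polled element itself rather than from a possibly stale keyContext - could look roughly like this. This is a hypothetical simplified sketch: the hash-modulo assignment below is a stand-in, not Flink's actual murmur-based key-group assignment, and the Timer type is made up:

```java
// Hypothetical sketch: when logging a PQ poll/remove, take the key group from
// the element (if it carries its key, like a timer) instead of trusting the
// keyContext, which is not updated when dequeuing processing-time timers.
// Simplified types and hashing; not Flink's KeyGroupRangeAssignment.
public class KeyGroupFromElement {
    static class Timer {
        final long timestamp;
        final String key; // the key the timer was registered for
        Timer(long timestamp, String key) {
            this.timestamp = timestamp;
            this.key = key;
        }
    }

    /** Stand-in key-group assignment: a real implementation would murmur-hash the key. */
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Key group to log for a dequeued element, falling back to the context value. */
    static int keyGroupForLogging(Object polled, int contextKeyGroup, int maxParallelism) {
        if (polled instanceof Timer) {
            return keyGroupFor(((Timer) polled).key, maxParallelism);
        }
        return contextKeyGroup; // may be stale, as described in the issue
    }
}
```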
[jira] [Created] (FLINK-26064) KinesisFirehoseSinkITCase IllegalStateException: Trying to access closed classloader
Piotr Nowojski created FLINK-26064:
---
Summary: KinesisFirehoseSinkITCase IllegalStateException: Trying to access closed classloader
Key: FLINK-26064
URL: https://issues.apache.org/jira/browse/FLINK-26064
Project: Flink
Issue Type: Bug
Components: Connectors / Kinesis
Affects Versions: 1.15.0
Reporter: Piotr Nowojski

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31044&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d

(shortened stack trace, as the full one is too large)
{noformat}
Feb 09 20:05:04 java.util.concurrent.ExecutionException: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
Feb 09 20:05:04 	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
Feb 09 20:05:04 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
(...)
Feb 09 20:05:04 Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
Feb 09 20:05:04 	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98)
Feb 09 20:05:04 	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43)
Feb 09 20:05:04 	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:204)
Feb 09 20:05:04 	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:200)
Feb 09 20:05:04 	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:179)
Feb 09 20:05:04 	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:159)
(...)
Feb 09 20:05:04 Caused by: java.lang.IllegalStateException: Trying to access closed classloader. Please check if you store classloaders directly or indirectly in static fields. If the stacktrace suggests that the leak occurs in a third party library and cannot be fixed immediately, you can disable this check with the configuration 'classloader.check-leaked-classloader'.
Feb 09 20:05:04 	at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
Feb 09 20:05:04 	at org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResources(FlinkUserCodeClassLoaders.java:188)
Feb 09 20:05:04 	at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:348)
Feb 09 20:05:04 	at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
Feb 09 20:05:04 	at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
Feb 09 20:05:04 	at javax.xml.stream.FactoryFinder$1.run(FactoryFinder.java:352)
Feb 09 20:05:04 	at java.security.AccessController.doPrivileged(Native Method)
Feb 09 20:05:04 	at javax.xml.stream.FactoryFinder.findServiceProvider(FactoryFinder.java:341)
Feb 09 20:05:04 	at javax.xml.stream.FactoryFinder.find(FactoryFinder.java:313)
Feb 09 20:05:04 	at javax.xml.stream.FactoryFinder.find(FactoryFinder.java:227)
Feb 09 20:05:04 	at javax.xml.stream.XMLInputFactory.newInstance(XMLInputFactory.java:154)
Feb 09 20:05:04 	at software.amazon.awssdk.protocols.query.unmarshall.XmlDomParser.createXmlInputFactory(XmlDomParser.java:124)
Feb 09 20:05:04 	at java.lang.ThreadLocal$SuppliedThreadLocal.initialValue(ThreadLocal.java:284)
Feb 09 20:05:04 	at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
Feb 09 20:05:04 	at java.lang.ThreadLocal.get(ThreadLocal.java:170)
(...)
{noformat}
Re: [DISCUSS] Drop Jepsen tests
Are there e2e tests that run on kubernetes? Perhaps k8s network policies[1] would be an option to simulate asymmetric network partitions without modifying iptables in a more approachable way?

Austin

[1]: https://kubernetes.io/docs/concepts/services-networking/network-policies/

On Wed, Feb 9, 2022 at 12:40 PM David Morávek wrote:
> Network partitions are trickier than simply crashing a process. For example these can be asymmetric -> as a TM you're still able to talk to the JM, but you're not able to talk to other TMs.
>
> In general this could be achieved by manipulating iptables on the host machine (considering we spawn all the processes locally), but I'm not sure if that will solve the "make it less complicated for others to contribute" part :/ Also this kind of test would be executable on *nix systems only.
>
> I assume that jepsen uses the same approach under the hood.
>
> D.

On Wed, Feb 9, 2022 at 5:43 PM Chesnay Schepler wrote:
> b/c are part of the same test.
>
> * We have a job running,
> * trigger a network partition (failing the job),
> * then crash HDFS (preventing checkpoints and access to the HA storageDir),
> * then the partition is resolved and HDFS is started again.
>
> Conceptually I would think we can replicate this by nuking half the cluster, crashing HDFS/ZK, and restarting everything.

On 09/02/2022 17:39, Chesnay Schepler wrote:
> The jepsen tests cover 3 cases:
> a) JM/TM crashes
> b) HDFS namenode crash (aka, can't checkpoint because HDFS is down)
> c) network partitions
>
> a) can (and probably is) reasonably covered by existing ITCases and e2e tests
> b) We could probably figure this out ourselves if we wanted to.
> c) is the difficult part.
>
> Note that the tests also only cover yarn (per-job/session) and standalone (session) deployments.

On 09/02/2022 17:11, Konstantin Knauf wrote:
> Thank you for raising this issue. What risks do you see if we drop it? Do you see any cheaper alternative to (partially) mitigate those risks?

On Wed, Feb 9, 2022 at 12:40 PM Chesnay Schepler wrote:
> For a few years now we have had a set of Jepsen tests that verify the correctness of Flink's coordination layer in the case of process crashes. In the past they have indeed found issues and thus provided value to the project, and in general the core idea behind them (and Jepsen for that matter) is very sound.
>
> However, so far we have neither made attempts to make further use of Jepsen (we limited ourselves to very basic tests) nor to familiarize ourselves with the tests/Jepsen at all. As a result these tests are difficult to maintain. They (and Jepsen) are written in Clojure, which makes debugging, changes and upstreaming contributions very difficult. Additionally, the tests make use of a very complicated (Ververica-internal) terraform+ansible setup to spin up and tear down AWS machines. While it works (and is actually pretty cool), it's difficult to adjust because the people who wrote it have left the company.
>
> Why I'm raising this now (and not earlier) is because so far keeping the tests running wasn't much of a problem; bump a few dependencies here and there and we're good to go.
>
> However, this has changed with the recent upgrade to Zookeeper 3.5, which isn't supported by Jepsen out-of-the-box, completely breaking the tests. We'd now have to write a new Zookeeper 3.5+ integration for Jepsen (again, in Clojure). While I started working on that and could likely finish it, I started to wonder whether it even makes sense to do so, and whether we couldn't invest this time elsewhere.
>
> Let me know what you think.
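[Editor's note] As an illustration of the NetworkPolicy idea raised in this thread: an asymmetric partition could, in principle, be expressed declaratively instead of via iptables. The manifest below is a hedged sketch only, not something from the Flink repository; the namespace and the pod labels (`component: taskmanager` / `component: jobmanager`) are assumptions about how a test deployment might be labelled, and enforcement requires a CNI plugin that supports network policies (e.g. Calico or Cilium).

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: partition-taskmanagers
  namespace: flink-tests          # hypothetical test namespace
spec:
  # Applies only to TaskManager pods; the label is an assumption.
  podSelector:
    matchLabels:
      component: taskmanager
  policyTypes:
    - Ingress
  ingress:
    # Allow ingress only from JobManager pods. Traffic from other
    # TaskManagers is dropped, so a TM can still heartbeat to the JM
    # but cannot reach other TMs: an asymmetric partition.
    - from:
        - podSelector:
            matchLabels:
              component: jobmanager
```

Deleting the policy again (`kubectl delete networkpolicy partition-taskmanagers`) would heal the partition, which maps naturally onto the "partition is resolved" step of the test scenario described above.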
Re: [DISCUSS] Drop Jepsen tests
@Austin We already have some e2e tests[1] that guard the k8s deployments (both session and application, with or without HA). And I agree with you that a network partition could be simulated by a K8s network policy.

[1]. https://github.com/apache/flink/blob/master/flink-end-to-end-tests/test-scripts/test_kubernetes_application_ha.sh

Best,
Yang
[jira] [Created] (FLINK-26065) org.apache.flink.table.api.PlanReference$ContentPlanReference $FilePlanReference $ResourcePlanReference violation the api rules
Yun Gao created FLINK-26065: --- Summary: org.apache.flink.table.api.PlanReference$ContentPlanReference $FilePlanReference $ResourcePlanReference violation the api rules Key: FLINK-26065 URL: https://issues.apache.org/jira/browse/FLINK-26065 Project: Flink Issue Type: Bug Components: Table SQL / API Affects Versions: 1.15.0 Reporter: Yun Gao {code:java} Feb 09 21:10:32 [ERROR] Failures: Feb 09 21:10:32 [ERROR] Architecture Violation [Priority: MEDIUM] - Rule 'Classes in API packages should have at least one API visibility annotation.' was violated (3 times): Feb 09 21:10:32 org.apache.flink.table.api.PlanReference$ContentPlanReference does not satisfy: annotated with @Internal or annotated with @Experimental or annotated with @PublicEvolving or annotated with @Public or annotated with @Deprecated Feb 09 21:10:32 org.apache.flink.table.api.PlanReference$FilePlanReference does not satisfy: annotated with @Internal or annotated with @Experimental or annotated with @PublicEvolving or annotated with @Public or annotated with @Deprecated Feb 09 21:10:32 org.apache.flink.table.api.PlanReference$ResourcePlanReference does not satisfy: annotated with @Internal or annotated with @Experimental or annotated with @PublicEvolving or annotated with @Public or annotated with @Deprecated Feb 09 21:10:32 [INFO] Feb 09 21:10:32 [ERROR] Tests run: 7, Failures: 1, Errors: 0, Skipped: 0 {code} https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=31051&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=26427 -- This message was sent by Atlassian Jira (v8.20.1#820001)
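[Editor's note] The fix this kind of ArchUnit violation asks for is adding one of the listed visibility annotations to each nested class. The sketch below is illustrative only: it declares a local stand-in for Flink's annotation so it compiles on its own (the real fix would import `org.apache.flink.annotation.PublicEvolving`), and the choice of `@PublicEvolving` for these particular classes is an assumption, not the actual resolution of the ticket.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for org.apache.flink.annotation.PublicEvolving so this
// sketch is self-contained; the real fix would use Flink's own annotation.
@Documented
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@interface PublicEvolving {}

// Each nested PlanReference variant named in the violation gets an explicit
// API-visibility annotation, which is all the ArchUnit rule checks for.
@PublicEvolving
abstract class PlanReference {
    @PublicEvolving
    static class ContentPlanReference extends PlanReference {}

    @PublicEvolving
    static class FilePlanReference extends PlanReference {}

    @PublicEvolving
    static class ResourcePlanReference extends PlanReference {}
}
```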
Re: [VOTE] FLIP-212: Introduce Flink Kubernetes Operator
+1 (binding)

On Mon, Feb 7, 2022 at 1:10 AM Yangze Guo wrote:
> +1 (binding)
>
> Best,
> Yangze Guo

On Mon, Feb 7, 2022 at 5:04 PM K Fred wrote:
> +1 (non-binding)
>
> Best Regards
> Peng Yuan

On Mon, Feb 7, 2022 at 4:49 PM Danny Cranmer wrote:
> +1 (binding)
>
> Thanks for this effort!
> Danny Cranmer

On Mon, Feb 7, 2022 at 7:59 AM Biao Geng wrote:
> +1 (non-binding)
>
> Best,
> Biao Geng

On Mon, Feb 7, 2022 at 14:31, Peter Huang wrote:
> +1 (non-binding)
>
> Best Regards
> Peter Huang

On Sun, Feb 6, 2022 at 7:35 PM Yang Wang wrote:
> +1 (binding)
>
> Best,
> Yang

On Mon, Feb 7, 2022 at 10:25, Xintong Song wrote:
> +1 (binding)
>
> Thank you~
>
> Xintong Song

On Mon, Feb 7, 2022 at 12:52 AM Márton Balassi wrote:
> +1 (binding)

On Sat, Feb 5, 2022 at 5:35 PM Israel Ekpo wrote:
> I am very excited to see this.
>
> Thanks for driving the effort
>
> +1 (non-binding)
>
> --
> Israel Ekpo
> Lead Instructor, IzzyAcademy.com
> https://www.youtube.com/c/izzyacademy
> https://izzyacademy.com/

On Sat, Feb 5, 2022 at 10:53 AM Shqiprim Bunjaku wrote:
> +1 (non-binding)

On Sat, Feb 5, 2022 at 4:39 PM Chenya Zhang wrote:
> +1 (non-binding)
>
> Thanks folks for leading this effort and making it happen so fast!
>
> Best,
> Chenya

On Sat, Feb 5, 2022 at 12:02 AM Gyula Fóra wrote:
> Hi Thomas!
>
> +1 (binding) from my side
>
> Happy to see this effort getting some traction!
>
> Cheers,
> Gyula

On Sat, Feb 5, 2022 at 3:00 AM Thomas Weise wrote:
> Hi everyone,
>
> I'd like to start a vote on FLIP-212: Introduce Flink Kubernetes Operator [1] which has been discussed in [2].
>
> The vote will be open for at least 72 hours unless there is an objection or not enough votes.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
> [2] https://lists.apache.org/thread/1z78t6rf70h45v7fbd2m93rm2y1bvh0z
>
> Thanks!
> Thomas
[RESULT] [VOTE] FLIP-212: Introduce Flink Kubernetes Operator
Hi everyone,

FLIP-212 [1] has been accepted. There were 7 binding +1 votes and 6 non-binding +1 votes. No other votes.

binding:
Gyula Fóra
Márton Balassi
Xintong Song
Yang Wang
Danny Cranmer
Yangze Guo
Thomas Weise

non-binding:
Chenya Zhang
Shqiprim Bunjaku
Israel Ekpo
Peter Huang
Biao Geng
K Fred

Thank you all for voting, we are excited to take this forward. The next step will be the creation of the repository.

Cheers,
Thomas

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator
[jira] [Created] (FLINK-26066) Introduce FileStoreRead
Caizhi Weng created FLINK-26066: --- Summary: Introduce FileStoreRead Key: FLINK-26066 URL: https://issues.apache.org/jira/browse/FLINK-26066 Project: Flink Issue Type: Sub-task Components: Table Store Reporter: Caizhi Weng Fix For: 1.15.0 Apart from {{FileStoreWrite}}, we also need a {{FileStoreRead}} operation to read actual key-values for a specific partition and bucket. -- This message was sent by Atlassian Jira (v8.20.1#820001)
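[Editor's note] To make the requested operation concrete, here is a hedged sketch of what a FileStoreRead abstraction next to FileStoreWrite could look like, with a tiny in-memory stand-in so the interface can be exercised. Every name and signature below is an assumption for illustration, not the actual Flink Table Store API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical counterpart to FileStoreWrite: an operation that reads the
// actual key-values stored for a specific partition and bucket.
interface FileStoreRead<K, V> {
    Iterator<Map.Entry<K, V>> read(String partition, int bucket);
}

// Minimal in-memory stand-in, keyed by (partition, bucket), used only to
// demonstrate the shape of the read path.
class InMemoryFileStore implements FileStoreRead<String, String> {
    private final Map<String, Map<String, String>> data = new HashMap<>();

    private static String slot(String partition, int bucket) {
        return partition + "#" + bucket;
    }

    void put(String partition, int bucket, String key, String value) {
        data.computeIfAbsent(slot(partition, bucket), s -> new HashMap<>())
            .put(key, value);
    }

    @Override
    public Iterator<Map.Entry<String, String>> read(String partition, int bucket) {
        return data.getOrDefault(slot(partition, bucket), new HashMap<>())
                   .entrySet().iterator();
    }
}
```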
[jira] [Created] (FLINK-26067) ZooKeeperLeaderElectionConnectionHandlingTest. testLoseLeadershipOnLostConnectionIfTolerateSuspendedConnectionsIsEnabled failed due to timeout
Yun Gao created FLINK-26067: --- Summary: ZooKeeperLeaderElectionConnectionHandlingTest. testLoseLeadershipOnLostConnectionIfTolerateSuspendedConnectionsIsEnabled failed due to timeout Key: FLINK-26067 URL: https://issues.apache.org/jira/browse/FLINK-26067 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.15.0 Reporter: Yun Gao {code:java} Feb 09 08:58:56 [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 18.67 s <<< FAILURE! - in org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionConnectionHandlingTest Feb 09 08:58:56 [ERROR] org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionConnectionHandlingTest.testLoseLeadershipOnLostConnectionIfTolerateSuspendedConnectionsIsEnabled Time elapsed: 8.096 s <<< ERROR! Feb 09 08:58:56 java.util.concurrent.TimeoutException Feb 09 08:58:56 at org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:106) Feb 09 08:58:56 at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionConnectionHandlingTest$TestingContender.awaitRevokeLeadership(ZooKeeperLeaderElectionConnectionHandlingTest.java:211) Feb 09 08:58:56 at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionConnectionHandlingTest.lambda$testLoseLeadershipOnLostConnectionIfTolerateSuspendedConnectionsIsEnabled$2(ZooKeeperLeaderElectionConnectionHandlingTest.java:100) Feb 09 08:58:56 at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionConnectionHandlingTest.runTestWithZooKeeperConnectionProblem(ZooKeeperLeaderElectionConnectionHandlingTest.java:164) Feb 09 08:58:56 at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionConnectionHandlingTest.runTestWithLostZooKeeperConnection(ZooKeeperLeaderElectionConnectionHandlingTest.java:109) Feb 09 08:58:56 at 
org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionConnectionHandlingTest.testLoseLeadershipOnLostConnectionIfTolerateSuspendedConnectionsIsEnabled(ZooKeeperLeaderElectionConnectionHandlingTest.java:96) Feb 09 08:58:56 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Feb 09 08:58:56 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) Feb 09 08:58:56 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Feb 09 08:58:56 at java.lang.reflect.Method.invoke(Method.java:498) Feb 09 08:58:56 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) Feb 09 08:58:56 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) Feb 09 08:58:56 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) Feb 09 08:58:56 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) Feb 09 08:58:56 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) Feb 09 08:58:56 at org.apache.flink.runtime.util.TestingFatalErrorHandlerResource$CloseableStatement.evaluate(TestingFatalErrorHandlerResource.java:94) Feb 09 08:58:56 at org.apache.flink.runtime.util.TestingFatalErrorHandlerResource$CloseableStatement.access$200(TestingFatalErrorHandlerResource.java:86) Feb 09 08:58:56 at org.apache.flink.runtime.util.TestingFatalErrorHandlerResource$1.evaluate(TestingFatalErrorHandlerResource.java:58) Feb 09 08:58:56 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) Feb 09 08:58:56 at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) Feb 09 08:58:56 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) Feb 09 08:58:56 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) Feb 09 08:58:56 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) Feb 09 08:58:56 at 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) Feb 09 08:58:56 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) Feb 09 08:58:56 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) Feb 09 08:58:56 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) Feb 09 08:58:56 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) Feb 09 08:58:56 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) Feb 09 08:58:56 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) Feb 09 08:58:56 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) {code} https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=30979&view=logs&j=a57e0635-3fad-5b08-57c7-a4
[jira] [Created] (FLINK-26068) ZooKeeperJobGraphsStoreITCase.testPutAndRemoveJobGraph failed on azure
Yun Gao created FLINK-26068: --- Summary: ZooKeeperJobGraphsStoreITCase.testPutAndRemoveJobGraph failed on azure Key: FLINK-26068 URL: https://issues.apache.org/jira/browse/FLINK-26068 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.15.0 Reporter: Yun Gao {code:java} Feb 09 13:41:24 org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /flink/default/testPutAndRemoveJobGraph/372cd3c2dc2c8b3071d3f8fec2285fb9 Feb 09 13:41:24 at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:122) Feb 09 13:41:24 at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException.create(KeeperException.java:54) Feb 09 13:41:24 at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2384) Feb 09 13:41:24 at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) Feb 09 13:41:24 at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) Feb 09 13:41:24 at org.apache.flink.shaded.curator5.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) Feb 09 13:41:24 at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) Feb 09 13:41:24 at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) Feb 09 13:41:24 at org.apache.flink.shaded.curator5.org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) Feb 09 13:41:24 at org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.setStateHandle(ZooKeeperStateHandleStore.java:268) Feb 09 13:41:24 at org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.replace(ZooKeeperStateHandleStore.java:232) Feb 09 13:41:24 at 
org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.replace(ZooKeeperStateHandleStore.java:86) Feb 09 13:41:24 at org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.putJobGraph(DefaultJobGraphStore.java:226) Feb 09 13:41:24 at org.apache.flink.runtime.jobmanager.ZooKeeperJobGraphsStoreITCase.testPutAndRemoveJobGraph(ZooKeeperJobGraphsStoreITCase.java:123) Feb 09 13:41:24 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) Feb 09 13:41:24 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) Feb 09 13:41:24 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) Feb 09 13:41:24 at java.lang.reflect.Method.invoke(Method.java:498) Feb 09 13:41:24 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) Feb 09 13:41:24 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) Feb 09 13:41:24 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) Feb 09 13:41:24 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) Feb 09 13:41:24 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) Feb 09 13:41:24 at org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45) Feb 09 13:41:24 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) Feb 09 13:41:24 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) Feb 09 13:41:24 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) Feb 09 13:41:24 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) Feb 09 13:41:24 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) Feb 09 13:41:24 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) Feb 09 13:41:24 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) Feb 09 13:41:24 at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) Feb 09 13:41:24 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) Feb 09 13:41:24 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) Feb 09 13:41:24 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) Feb 09 13:41:24 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) Feb 09 13:41:24 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) Feb 09 13:41:24 at org.junit.runners.ParentRunner.run(ParentRunner.java
[jira] [Created] (FLINK-26069) KinesisFirehoseSinkITCase failed due to org.testcontainers.containers.ContainerLaunchException: Container startup failed
Yun Gao created FLINK-26069: --- Summary: KinesisFirehoseSinkITCase failed due to org.testcontainers.containers.ContainerLaunchException: Container startup failed Key: FLINK-26069 URL: https://issues.apache.org/jira/browse/FLINK-26069 Project: Flink Issue Type: Bug Components: Connectors / Kinesis Affects Versions: 1.15.0 Reporter: Yun Gao {code:java} 2022-02-09T20:52:36.6208358Z Feb 09 20:52:36 [ERROR] Picked up JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError 2022-02-09T20:52:37.8270432Z Feb 09 20:52:37 [INFO] Running org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase 2022-02-09T20:54:08.9842331Z Feb 09 20:54:08 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 91.02 s <<< FAILURE! - in org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase 2022-02-09T20:54:08.9845140Z Feb 09 20:54:08 [ERROR] org.apache.flink.connector.firehose.sink.KinesisFirehoseSinkITCase Time elapsed: 91.02 s <<< ERROR! 2022-02-09T20:54:08.9847119Z Feb 09 20:54:08 org.testcontainers.containers.ContainerLaunchException: Container startup failed 2022-02-09T20:54:08.9848834Z Feb 09 20:54:08at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:336) 2022-02-09T20:54:08.9850502Z Feb 09 20:54:08at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:317) 2022-02-09T20:54:08.9852012Z Feb 09 20:54:08at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1066) 2022-02-09T20:54:08.9853695Z Feb 09 20:54:08at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:29) 2022-02-09T20:54:08.9855316Z Feb 09 20:54:08at org.junit.rules.RunRules.evaluate(RunRules.java:20) 2022-02-09T20:54:08.9856955Z Feb 09 20:54:08at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) 2022-02-09T20:54:08.9858330Z Feb 09 20:54:08at org.junit.runners.ParentRunner.run(ParentRunner.java:413) 2022-02-09T20:54:08.9859838Z Feb 09 20:54:08at 
org.junit.runner.JUnitCore.run(JUnitCore.java:137) 2022-02-09T20:54:08.9861123Z Feb 09 20:54:08at org.junit.runner.JUnitCore.run(JUnitCore.java:115) 2022-02-09T20:54:08.9862747Z Feb 09 20:54:08at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42) 2022-02-09T20:54:08.9864691Z Feb 09 20:54:08at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80) 2022-02-09T20:54:08.9866384Z Feb 09 20:54:08at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72) 2022-02-09T20:54:08.9868138Z Feb 09 20:54:08at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107) 2022-02-09T20:54:08.9869980Z Feb 09 20:54:08at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88) 2022-02-09T20:54:08.9871255Z Feb 09 20:54:08at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54) 2022-02-09T20:54:08.9872602Z Feb 09 20:54:08at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67) 2022-02-09T20:54:08.9874126Z Feb 09 20:54:08at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52) 2022-02-09T20:54:08.9875899Z Feb 09 20:54:08at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114) 2022-02-09T20:54:08.9877109Z Feb 09 20:54:08at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:86) 2022-02-09T20:54:08.9878367Z Feb 09 20:54:08at org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:86) 2022-02-09T20:54:08.9879761Z Feb 09 20:54:08at org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:53) 2022-02-09T20:54:08.9881148Z Feb 09 20:54:08at 
org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.execute(JUnitPlatformProvider.java:188) 2022-02-09T20:54:08.9882768Z Feb 09 20:54:08at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:154) 2022-02-09T20:54:08.9884214Z Feb 09 20:54:08at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:124) 2022-02-09T20:54:08.9885475Z Feb 09 20:54:08at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428) 2022-02-09T20:54:08.9886856Z Feb 09 20:54:08at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162) 2022-02-09T20:54:08.9888037Z Feb 09 20:54:08at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562
Re: Azure Pipelines are dealing with an incident, causing pipeline runs to fail
It looks like the issues haven't been resolved yet, unfortunately.

On Wed, 9 Feb 2022 at 13:15, Robert Metzger wrote:
> I filed a support request with Microsoft:
> https://developercommunity.visualstudio.com/t/Number-of-Microsoft-hosted-agents-droppe/1658827?from=email&space=21&sort=newest

On Wed, Feb 9, 2022 at 1:04 PM Martijn Visser wrote:
> Unfortunately it looks like there are still failures. Will keep you posted.

On Wed, 9 Feb 2022 at 11:51, Martijn Visser wrote:
> Hi everyone,
>
> The issue should now be resolved.
>
> Best regards,
>
> Martijn

On Wed, 9 Feb 2022 at 10:55, Martijn Visser wrote:
> Hi everyone,
>
> Please keep in mind that Azure Pipelines is currently dealing with an incident [1], which causes all CI pipeline runs on Azure to fail. Once the incident has been resolved, you will need to retrigger your pipeline to see whether it passes.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
>
> [1] https://status.dev.azure.com/_event/287959626