Daebeom Lee created FLINK-14012:
-----------------------------------
Summary: Job consuming secure Kafka fails to start after job cancel
Key: FLINK-14012
URL: https://issues.apache.org/jira/browse/FLINK-14012
Project: Flink
Issue Type: Bug
Components: Connectors / Kafka
Affects Versions: 1.9.0
Environment: * Kubernetes 1.13.2
* Flink 1.9.0
* Kafka client library 2.2.0
Reporter: Daebeom Lee
Hello, this is Daebeom Lee.
h2. Background
I installed Flink 1.9.0 on our Kubernetes cluster.
We use a Flink session cluster: we build a fat JAR, upload it via the UI, and run several jobs.
At first, the jobs start fine.
But after we cancel some jobs, restarting them fails.
This is the stack trace:
{code:java}
java.lang.NoClassDefFoundError: org/apache/kafka/common/security/scram/internals/ScramSaslClient
	at org.apache.kafka.common.security.scram.internals.ScramSaslClient$ScramSaslClientFactory.createSaslClient(ScramSaslClient.java:235)
	at javax.security.sasl.Sasl.createSaslClient(Sasl.java:384)
	at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.lambda$createSaslClient$0(SaslClientAuthenticator.java:180)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.createSaslClient(SaslClientAuthenticator.java:176)
	at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.<init>(SaslClientAuthenticator.java:168)
	at org.apache.kafka.common.network.SaslChannelBuilder.buildClientAuthenticator(SaslChannelBuilder.java:254)
	at org.apache.kafka.common.network.SaslChannelBuilder.lambda$buildChannel$1(SaslChannelBuilder.java:202)
	at org.apache.kafka.common.network.KafkaChannel.<init>(KafkaChannel.java:140)
	at org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:210)
	at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:334)
	at org.apache.kafka.common.network.Selector.registerChannel(Selector.java:325)
	at org.apache.kafka.common.network.Selector.connect(Selector.java:257)
	at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:920)
	at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:474)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:255)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
	at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:292)
	at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1803)
	at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1771)
	at org.apache.flink.streaming.connectors.kafka.internal.KafkaPartitionDiscoverer.getAllPartitionsForTopics(KafkaPartitionDiscoverer.java:77)
	at org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:131)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.open(FlinkKafkaConsumerBase.java:508)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:529)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:393)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
	at java.lang.Thread.run(Thread.java:748)
{code}
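For context, the job builds its consumer configuration with SASL/SCRAM settings along these lines. This is a minimal sketch, not our exact code: the broker address, group id, and credentials are placeholders, and the {{FlinkKafkaConsumer}} wiring around these properties is omitted.
{code:java}
import java.util.Properties;

public class SecureKafkaProps {

    // Build SASL/SCRAM client properties similar to what the job passes to
    // the Kafka consumer. All concrete values below are placeholders.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");   // placeholder
        props.setProperty("group.id", "example-group");         // placeholder
        props.setProperty("security.protocol", "SASL_PLAINTEXT");
        props.setProperty("sasl.mechanism", "SCRAM-SHA-256");
        // ScramLoginModule lives in kafka-clients, the jar that fails to
        // resolve in the stack trace above.
        props.setProperty("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                + "username=\"user\" password=\"secret\";");    // placeholders
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println(props.getProperty("sasl.mechanism"));
    }
}
{code}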
h2. Our workaround
* I think this is a Flink JVM classloader issue.
* The user-code classloader is unloaded when a job is canceled, and the Kafka client library is bundled in the fat JAR.
* So I placed the Kafka client library in /opt/flink/lib:
** /opt/flink/lib/kafka-clients-2.2.0.jar
* That resolved the issue.
* But there are some odd points:
** Two weeks ago, Flink 1.8.1 had no such problem.
** One week ago I rolled back from 1.9.0 to 1.8.1, and the same errors occurred.
** Maybe the Docker image changed in the Docker repository: [https://github.com/docker-flink/docker-flink]
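The workaround can be sketched as below. The paths assume the official Flink Docker image, and the {{classloader.parent-first-patterns.additional}} line is an assumption on my side: an alternative that keeps the jar in the fat JAR but forces parent-first loading of Kafka classes, which I have not verified in our cluster.
{code}
# Copy the Kafka client jar into Flink's lib directory so it is loaded by
# the parent classloader instead of the per-job user-code classloader
# (path assumes the official Flink Docker image):
cp kafka-clients-2.2.0.jar /opt/flink/lib/

# Alternative (untested assumption): force parent-first classloading for
# Kafka classes via flink-conf.yaml instead of relocating the jar:
echo "classloader.parent-first-patterns.additional: org.apache.kafka" \
  >> /opt/flink/conf/flink-conf.yaml
{code}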
h2. Suggestion
* I'd like to know the exact reason this error occurs after upgrading to 1.9.0.
* Does anybody know a better solution for this case?
Thank you in advance.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)