[ https://issues.apache.org/jira/browse/FLINK-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Yao updated FLINK-8005:
----------------------------
    Description:
*Problem Description*

Classes in the user code jar cannot be loaded by the snapshot thread’s context class loader ({{AppClassLoader}}).

For example, when creating instances of {{KafkaProducer}}, Strings are resolved to class objects by Kafka.
Find below an extract from {{ConfigDef.java}}:
{code}
case CLASS:
    if (value instanceof Class)
        return value;
    else if (value instanceof String)
        return Class.forName(trimmed, true, Utils.getContextOrKafkaClassLoader());
    else
        throw new ConfigException(name, value, "Expected a Class instance or class name.");
{code}

*Exception/Stacktrace*
{noformat}
Caused by: java.lang.Exception: Could not complete snapshot 1 for operator Source: Collection Source -> Sink: kafka-sink-1510048188383 (1/1).
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:379)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1077)
	at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1026)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:659)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:595)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:526)
	... 7 more
Caused by: org.apache.kafka.common.config.ConfigException: Invalid value org.apache.kafka.common.serialization.ByteArraySerializer for configuration key.serializer: Class org.apache.kafka.common.serialization.ByteArraySerializer could not be found.
	at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:715)
	at org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:460)
	at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:453)
	at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:62)
	at org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:75)
	at org.apache.kafka.clients.producer.ProducerConfig.<init>(ProducerConfig.java:360)
	at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:288)
	at org.apache.flink.streaming.connectors.kafka.internal.FlinkKafkaProducer.<init>(FlinkKafkaProducer.java:114)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initProducer(FlinkKafkaProducer011.java:913)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.initTransactionalProducer(FlinkKafkaProducer011.java:904)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.createOrGetProducerFromPool(FlinkKafkaProducer011.java:637)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.beginTransaction(FlinkKafkaProducer011.java:613)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.beginTransaction(FlinkKafkaProducer011.java:94)
	at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.beginTransactionInternal(TwoPhaseCommitSinkFunction.java:359)
	at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.snapshotState(TwoPhaseCommitSinkFunction.java:294)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011.snapshotState(FlinkKafkaProducer011.java:756)
	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:357)
	... 12 more
{noformat}

*How to reproduce*

Note that the problem only appears when a job is deployed on a cluster.
# Build Flink 1.4
# Build test job https://github.com/GJL/flink-kafka011-producer-test with {{mvn -o clean install -Pbuild-jar}}
# Start job:
{noformat}
bin/flink run -c com.garyyao.StreamingJob /pathto/flink-kafka011-producer/target/flink-kafka011-producer-1.0-SNAPSHOT.jar
{noformat}

> Snapshotting FlinkKafkaProducer011 fails due to ClassLoader issues
> ------------------------------------------------------------------
>
>                 Key: FLINK-8005
>                 URL: https://issues.apache.org/jira/browse/FLINK-8005
>             Project: Flink
>          Issue Type: Bug
>          Components: Core, Kafka Connector, State Backends, Checkpointing
>    Affects Versions: 1.4.0
>            Reporter: Gary Yao
>            Priority: Blocker
>             Fix For: 1.4.0
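
The lookup behaviour from the Problem Description can be reproduced with the JDK alone. The sketch below is only an illustration under stated assumptions, not Flink or Kafka code and not necessarily how this issue was resolved: the class {{ContextClassLoaderSketch}} and its two helper methods are hypothetical names, a parentless {{URLClassLoader}} stands in for a context class loader that cannot see the user code jar, and the sketch's own class loader stands in for the user code class loader. It resolves a class name through the thread's context class loader, in the same style as the {{ConfigDef}} extract, and shows that temporarily swapping the context class loader changes whether the name can be resolved.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

public class ContextClassLoaderSketch {

    /**
     * Resolves a class name through the current thread's context class loader,
     * similar in style to the Class.forName call in the ConfigDef extract.
     */
    public static boolean canResolveViaContextLoader(String className) {
        try {
            Class.forName(className, true, Thread.currentThread().getContextClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /**
     * Hypothetical helper: runs the action with the given context class loader
     * and always restores the previous one afterwards.
     */
    public static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        String name = ContextClassLoaderSketch.class.getName();

        // Assumption: a loader without a parent only sees bootstrap classes,
        // so it plays the role of a loader that cannot see the user code jar.
        ClassLoader restricted = new URLClassLoader(new URL[0], null);

        // Assumption: the loader of this class plays the role of the user code class loader.
        ClassLoader userCodeLoader = ContextClassLoaderSketch.class.getClassLoader();

        boolean withRestricted =
                withContextClassLoader(restricted, () -> canResolveViaContextLoader(name));
        boolean withUserCode =
                withContextClassLoader(userCodeLoader, () -> canResolveViaContextLoader(name));

        System.out.println("resolved with restricted context loader: " + withRestricted);
        System.out.println("resolved with user code context loader: " + withUserCode);
    }
}
```

The try/finally in {{withContextClassLoader}} matters on pooled threads: without restoring the previous loader, a later task on the same thread would inherit a loader that does not belong to it.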