Siyao Meng created HADOOP-18699:
-----------------------------------

             Summary: InvalidProtocolBufferException caused by JDK 11 < 11.0.18 AES-CTR cipher state corruption with AVX-512 bug
                 Key: HADOOP-18699
                 URL: https://issues.apache.org/jira/browse/HADOOP-18699
             Project: Hadoop Common
          Issue Type: Bug
          Components: hdfs-client
            Reporter: Siyao Meng

This serves as a PSA for a JDK bug; it is not really a bug in Hadoop / HDFS itself.

[~relek] identified that [JDK-8292158|https://bugs.openjdk.org/browse/JDK-8292158] (backported to JDK 11 in [JDK-8295297|https://bugs.openjdk.org/browse/JDK-8295297]) causes HDFS clients to fail with InvalidProtocolBufferException, due to a corrupted protobuf message in the Hadoop RPC request, when all of the conditions below are met:

1. The host supports the AVX-512 instruction set.
2. AVX-512 is enabled in the JVM. It should be enabled by default on AVX-512 capable hosts, which is equivalent to specifying the JVM arg {{-XX:UseAVX=3}}.

As a result, the client could print messages like these:

{code:title=Symptoms on the HDFS client}
2023-02-21 15:21:44,380 WARN org.apache.hadoop.hdfs.DFSClient: Connection failure: Failed to connect to <HOST/IP:PORT> for file /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2023_02_21-15_21_25.b6788e89894a61b5 for block BP-1836197545-10.125.248.11-1672668423261:blk_1073935111_194857:com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.

2023-02-21 15:21:44,378 WARN org.apache.hadoop.hdfs.DFSClient: Connection failure: Failed to connect to <HOST/IP:PORT> for file /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2023_02_21-15_21_25.b6788e89894a61b5 for block BP-1836197545-<IP>-1672668423261:blk_1073935111_194857:com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.

2023-02-21 15:06:55,530 WARN org.apache.hadoop.hdfs.DFSClient: Connection failure: Failed to connect to <HOST/IP:PORT> for file /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2023_02_21-15_06_55.b4a633a8bde014aa for block BP-1836197545-<IP>-1672668423261:blk_1073935025_194771:com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
{code}

Solutions:
1. As a workaround, append {{-XX:UseAVX=2}} to the client JVM args; or
2. Upgrade to OpenJDK >= 11.0.18.

I might post a repro test case for this, or find a way in the code to tell the user, when the error occurs, that this JDK bug could be the cause and that JDK 11 needs to be upgraded.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
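
As a rough illustration of the "prompt the user" idea mentioned in the issue, the two conditions (JDK 11 older than 11.0.18, and AVX-512 enabled in the JVM) could be checked at runtime along the following lines. This is only a sketch under stated assumptions, not code from Hadoop: the class name {{AvxCipherBugCheck}}, its method names, and the warning text are hypothetical. It relies on {{Runtime.Version}} (JDK 10+) and on the HotSpot-specific {{HotSpotDiagnosticMXBean}}, which may not be available on other JVMs or may not expose {{UseAVX}} on non-x86 platforms; in those cases the sketch simply reports "not affected".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class AvxCipherBugCheck {

    /** True for JDK 11 releases older than 11.0.18, the range named in the issue summary. */
    static boolean isAffectedJdk(Runtime.Version v) {
        return v.feature() == 11 && v.update() < 18;
    }

    /** True when a UseAVX flag value selects AVX-512, i.e. level 3 as in -XX:UseAVX=3. */
    static boolean isAvx512Level(String useAvx) {
        if (useAvx == null) {
            return false;
        }
        try {
            return Integer.parseInt(useAvx.trim()) >= 3;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Reads the effective UseAVX value, or returns null when it cannot be determined. */
    static String currentUseAvx() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? null : bean.getVMOption("UseAVX").getValue();
        } catch (RuntimeException | LinkageError e) {
            // Non-HotSpot JVM, or a platform where the UseAVX flag does not exist.
            return null;
        }
    }

    /** Combines both conditions for the JVM this code is running in. */
    static boolean likelyAffected() {
        return isAffectedJdk(Runtime.version()) && isAvx512Level(currentUseAvx());
    }

    public static void main(String[] args) {
        if (likelyAffected()) {
            // Hypothetical wording; the remedies are the two listed in the issue.
            System.err.println("This JVM may be affected by JDK-8292158: "
                + "upgrade to OpenJDK >= 11.0.18 or add -XX:UseAVX=2 to the JVM args.");
        } else {
            System.out.println("No JDK 11 < 11.0.18 with AVX-512 detected.");
        }
    }
}
```

A client could call {{likelyAffected()}} when it catches an InvalidProtocolBufferException and append the hint to the log message; whether that is the right place for such a check is left open here.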