Siyao Meng created HADOOP-18699:
-----------------------------------
Summary: InvalidProtocolBufferException caused by JDK 11 < 11.0.18
AES-CTR cipher state corruption with AVX-512 bug
Key: HADOOP-18699
URL: https://issues.apache.org/jira/browse/HADOOP-18699
Project: Hadoop Common
Issue Type: Bug
Components: hdfs-client
Reporter: Siyao Meng
This serves as a PSA for a JDK bug. Not really a bug in Hadoop / HDFS itself.
[~relek] identified [JDK-8292158|https://bugs.openjdk.org/browse/JDK-8292158]
(backported to JDK 11 in
[JDK-8295297|https://bugs.openjdk.org/browse/JDK-8295297]) causes HDFS clients
to fail with InvalidProtocolBufferException due to corrupted protobuf message
in Hadoop RPC request when all of the below conditions are met:
1. The host is capable of AVX-512 instruction sets
2. AVX-512 is enabled in JVM. This should be enabled by default on AVX-512
capable hosts, equivalent to specifying JVM arg {{-XX:UseAVX=3}}
As a result, the client could print messages like these:
{code:title=Symptoms on the HDFS client}
2023-02-21 15:21:44,380 WARN org.apache.hadoop.hdfs.DFSClient: Connection
failure: Failed to connect to <HOST/IP:PORT> for file
/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2023_02_21-15_21_25.b6788e89894a61b5
for block
BP-1836197545-10.125.248.11-1672668423261:blk_1073935111_194857:com.google.protobuf.InvalidProtocolBufferException:
Protocol message tag had invalid wire type.
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had
invalid wire type.
2023-02-21 15:21:44,378 WARN org.apache.hadoop.hdfs.DFSClient: Connection
failure: Failed to connect to <HOST/IP:PORT> for file
/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2023_02_21-15_21_25.b6788e89894a61b5
for block
BP-1836197545-<IP>-1672668423261:blk_1073935111_194857:com.google.protobuf.InvalidProtocolBufferException:
Protocol message end-group tag did not match expected tag.
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group
tag did not match expected tag.
2023-02-21 15:06:55,530 WARN org.apache.hadoop.hdfs.DFSClient: Connection
failure: Failed to connect to <HOST/IP:PORT> for file
/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2023_02_21-15_06_55.b4a633a8bde014aa
for block
BP-1836197545-<IP>-1672668423261:blk_1073935025_194771:com.google.protobuf.InvalidProtocolBufferException:
While parsing a protocol message, the input ended unexpectedly in the middle
of a field. This could mean either than the input has been truncated or that an
embedded message misreported its own length.
com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol
message, the input ended unexpectedly in the middle of a field. This could mean
either than the input has been truncated or that an embedded message
misreported its own length.
{code}
Solutions:
1. As a workaround, append {{-XX:UseAVX=2}} to client JVM args; or
2. Upgrade to OpenJDK >= 11.0.18.
I might post a repro test case for this, or find a way in the code to prompt
the user that this could be the potential issue (need to upgrade JDK 11) when
it occurs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]