Siyao Meng created HADOOP-18699:
-----------------------------------

             Summary: InvalidProtocolBufferException caused by JDK 11 < 11.0.18 
AES-CTR cipher state corruption with AVX-512 bug
                 Key: HADOOP-18699
                 URL: https://issues.apache.org/jira/browse/HADOOP-18699
             Project: Hadoop Common
          Issue Type: Bug
          Components: hdfs-client
            Reporter: Siyao Meng


This serves as a PSA for a JDK bug. Not really a bug in Hadoop / HDFS itself.

[~relek] identified [JDK-8292158|https://bugs.openjdk.org/browse/JDK-8292158] 
(backported to JDK 11 in 
[JDK-8295297|https://bugs.openjdk.org/browse/JDK-8295297]) causes HDFS clients 
to fail with InvalidProtocolBufferException due to corrupted protobuf message 
in Hadoop RPC request when all of the below conditions are met:

1. The host is capable of AVX-512 instruction sets
2. AVX-512 is enabled in JVM. This should be enabled by default on AVX-512 
capable hosts, equivalent to specifying JVM arg {{-XX:UseAVX=3}}

As a result, the client could print messages like these:

{code:title=Symptoms on the HDFS client}
2023-02-21 15:21:44,380 WARN org.apache.hadoop.hdfs.DFSClient: Connection 
failure: Failed to connect to <HOST/IP:PORT> for file 
/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2023_02_21-15_21_25.b6788e89894a61b5
 for block 
BP-1836197545-10.125.248.11-1672668423261:blk_1073935111_194857:com.google.protobuf.InvalidProtocolBufferException:
 Protocol message tag had invalid wire type.
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had 
invalid wire type.

2023-02-21 15:21:44,378 WARN org.apache.hadoop.hdfs.DFSClient: Connection 
failure: Failed to connect to <HOST/IP:PORT> for file 
/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2023_02_21-15_21_25.b6788e89894a61b5
 for block 
BP-1836197545-<IP>-1672668423261:blk_1073935111_194857:com.google.protobuf.InvalidProtocolBufferException:
 Protocol message end-group tag did not match expected tag.
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group 
tag did not match expected tag.

2023-02-21 15:06:55,530 WARN org.apache.hadoop.hdfs.DFSClient: Connection 
failure: Failed to connect to <HOST/IP:PORT> for file 
/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2023_02_21-15_06_55.b4a633a8bde014aa
 for block 
BP-1836197545-<IP>-1672668423261:blk_1073935025_194771:com.google.protobuf.InvalidProtocolBufferException:
 While parsing a protocol message, the input ended unexpectedly in the middle 
of a field. This could mean either than the input has been truncated or that an 
embedded message misreported its own length.
com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol 
message, the input ended unexpectedly in the middle of a field. This could mean 
either than the input has been truncated or that an embedded message 
misreported its own length.
{code}

Solutions:
1. As a workaround, append {{-XX:UseAVX=2}} to client JVM args; or
2. Upgrade to OpenJDK >= 11.0.18.


I might post a repro test case for this, or find a way in the code to prompt 
the user that this could be the potential issue (need to upgrade JDK 11) when 
it occurs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org

Reply via email to