steveloughran commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1338307275


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -88,7 +91,8 @@
 public class AbfsClient implements Closeable {
   public static final Logger LOG = LoggerFactory.getLogger(AbfsClient.class);
   public static final String HUNDRED_CONTINUE_USER_AGENT = SINGLE_WHITE_SPACE 
+ HUNDRED_CONTINUE + SEMICOLON;
-
+  public static final String MD5_ERROR = "The MD5 value specified in the 
request "

Review Comment:
   should be placed in AbfsConstants, so it's more visible, javadocs to explain 
purpose. and that it comes with a 400



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -1412,6 +1444,102 @@ private void appendIfNotEmpty(StringBuilder sb, String 
regEx,
     }
   }
 
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders
+   * @param reqParams
+   * @param buffer
+   */
+  private void addCheckSumHeaderForWrite(List<AbfsHttpHeader> requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+
+      if (reqParams.getoffset() == 0 && reqParams.getLength() == 
buffer.length) {
+        dataToBeWritten = buffer;
+      } else {
+        System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0,
+            reqParams.getLength());
+      }
+
+      byte[] md5Bytes = md5Digest.digest(dataToBeWritten);
+      String md5Hash = Base64.getEncoder().encodeToString(md5Bytes);
+      requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash));
+    } catch (NoSuchAlgorithmException ex) {
+      throw new AbfsRuntimeException(ex);
+    }
+  }
+
+  /**
+   * To verify the checksum information received from server for the data read
+   * @param buffer stores the data received from server
+   * @param result HTTP Operation Result
+   * @param bufferOffset Position where data returned by server is saved in 
buffer
+   * @throws AbfsRestOperationException
+   */
+  private void verifyCheckSumForRead(final byte[] buffer,
+      final AbfsHttpOperation result, final int bufferOffset)
+      throws AbfsRestOperationException {
+    // Number of bytes returned by server could be less than or equal to what
+    // caller requests. In case it is less, extra bytes will be initialized to 0
+    // Server returned MD5 Hash will be computed on what server returned.
+    // We need to get exact data that server returned and compute its md5 hash
+    // Computed hash should be equal to what server returned
+    int numberOfBytesRead = (int)result.getBytesReceived();
+    if (numberOfBytesRead == 0) {
+      return;
+    }
+    byte[] dataRead = new byte[numberOfBytesRead];
+
+    if (bufferOffset == 0 && numberOfBytesRead == buffer.length) {
+      dataRead = buffer;
+    } else {
+      System.arraycopy(buffer, bufferOffset, dataRead, 0, numberOfBytesRead);
+    }
+
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);

Review Comment:
    this creation/exception handling is repeated a lot. pull it out. maybe even 
create one per input stream



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -861,6 +875,12 @@ private boolean checkUserError(int responseStatusCode) {
         && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR);
   }
 
+  private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {

Review Comment:
   if you make this static and package private you can add a unit test to 
verify that it is handled
   
   +add javadocs



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -1412,6 +1444,102 @@ private void appendIfNotEmpty(StringBuilder sb, String 
regEx,
     }
   }
 
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders
+   * @param reqParams
+   * @param buffer
+   */
+  private void addCheckSumHeaderForWrite(List<AbfsHttpHeader> requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+
+      if (reqParams.getoffset() == 0 && reqParams.getLength() == 
buffer.length) {
+        dataToBeWritten = buffer;
+      } else {
+        System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0,
+            reqParams.getLength());
+      }
+
+      byte[] md5Bytes = md5Digest.digest(dataToBeWritten);
+      String md5Hash = Base64.getEncoder().encodeToString(md5Bytes);
+      requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash));
+    } catch (NoSuchAlgorithmException ex) {
+      throw new AbfsRuntimeException(ex);
+    }
+  }
+
+  /**
+   * To verify the checksum information received from server for the data read
+   * @param buffer stores the data received from server
+   * @param result HTTP Operation Result
+   * @param bufferOffset Position where data returned by server is saved in 
buffer
+   * @throws AbfsRestOperationException
+   */
+  private void verifyCheckSumForRead(final byte[] buffer,
+      final AbfsHttpOperation result, final int bufferOffset)
+      throws AbfsRestOperationException {
+    // Number of bytes returned by server could be less than or equal to what
+    // caller requests. In case it is less, extra bytes will be initialized to 0
+    // Server returned MD5 Hash will be computed on what server returned.
+    // We need to get exact data that server returned and compute its md5 hash
+    // Computed hash should be equal to what server returned
+    int numberOfBytesRead = (int)result.getBytesReceived();
+    if (numberOfBytesRead == 0) {
+      return;
+    }
+    byte[] dataRead = new byte[numberOfBytesRead];
+
+    if (bufferOffset == 0 && numberOfBytesRead == buffer.length) {
+      dataRead = buffer;
+    } else {
+      System.arraycopy(buffer, bufferOffset, dataRead, 0, numberOfBytesRead);
+    }
+
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);

Review Comment:
   can this be isolated/made static for unit tests?



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -761,6 +764,11 @@ public AbfsRestOperation append(final String path, final 
byte[] buffer,
       requestHeaders.add(new AbfsHttpHeader(USER_AGENT, userAgentRetry));
     }
 
+    // Add MD5 Hash of request content as request header if feature is enabled
+    if (isChecksumValidationEnabled()) {

Review Comment:
   ok, so Md5Mismatch is the stable error code to look for?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to