[
https://issues.apache.org/jira/browse/HDDS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-10587:
-------------------------------
Description:
Currently, the MessageDigest instance is a thread-local variable (one per S3G
Jetty thread). A MessageDigest accumulates state across update calls and is
only reset by a call to either MessageDigest#digest or MessageDigest#reset.
In the normal ObjectEndpoint#put flow, MessageDigest#digest is called after the
data has been written to the datanodes and before the key is committed.
However, if an IOException occurs (e.g. an EOFException caused by the client
cancelling during the write), the digest is not reset and is left in an
inconsistent state. This affects the next request served by the same thread:
the ETag generated for it differs from the MD5 hash of the object, which causes
the AWS S3 SDK to report a hash mismatch when downloading the object.
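The effect of the stale state can be illustrated outside of Ozone with a small
standalone sketch. It is not the S3 Gateway code: the class StaleDigestDemo and
the methods abortedPut and completedPut are hypothetical stand-ins, and the
only real API used is java.security.MessageDigest. A put that is abandoned
after a partial update leaves bytes in the thread-local digest, so the next
put on the same thread hashes those leftover bytes plus its own payload.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StaleDigestDemo {
  // One digest per thread, mirroring the thread-local described above.
  private static final ThreadLocal<MessageDigest> DIGEST =
      ThreadLocal.withInitial(() -> {
        try {
          return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
          throw new IllegalStateException(e);
        }
      });

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  // Hypothetical aborted put: part of the payload is fed to the digest,
  // then the request fails before digest() or reset() is called.
  static void abortedPut(byte[] payload) {
    DIGEST.get().update(payload, 0, payload.length / 2);
  }

  // Hypothetical completed put: digest() returns the hash and resets.
  static String completedPut(byte[] payload) {
    MessageDigest md = DIGEST.get();
    md.update(payload);
    return hex(md.digest());
  }

  // Reference MD5 computed with a fresh, non-shared digest.
  static String expectedMd5(byte[] payload) throws NoSuchAlgorithmException {
    return hex(MessageDigest.getInstance("MD5").digest(payload));
  }

  public static void main(String[] args) throws Exception {
    byte[] payload = "hello ozone".getBytes(StandardCharsets.UTF_8);
    String expected = expectedMd5(payload);

    abortedPut(payload);                   // 1st put: cancelled mid-write
    String second = completedPut(payload); // 2nd put: polluted digest
    String third = completedPut(payload);  // 3rd put: digest was reset

    System.out.println("2nd matches: " + second.equals(expected)); // false
    System.out.println("3rd matches: " + third.equals(expected));  // true
  }
}
```

The second call hashes the five leftover bytes followed by the full payload,
which is why its result does not match, while the third call starts from a
clean digest because MessageDigest#digest resets the instance.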
The issue can be replicated using an S3G with a single thread and doing three
put-object operations for the same key and same payload.
1st put-object: cancel the operation before the put-object operation can
finish, and ensure that an EOFException is thrown in the S3Gateway logs
2nd put-object: let the put-object finish. The resulting ETag will not be the
same as the md5 digest of the payload.
3rd put-object: also let the put-object finish. Since the previous put-object
reset the digest, the resulting ETag will be correct.
This patch adds a call to MessageDigest#reset in ObjectEndpoint#put so that
the digest is reset when an exception occurs.
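As a rough sketch of the idea only, and not the actual ObjectEndpoint#put
change, resetting on failure could look like the following. The class
ResetOnFailureSketch, the simplified put method, and the failingAfter helper
are all hypothetical; the real method writes to the datanodes and commits the
key, which is omitted here. Only the java.security and java.io calls are real
APIs.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ResetOnFailureSketch {
  // One digest per thread, as in the description above.
  private static final ThreadLocal<MessageDigest> DIGEST =
      ThreadLocal.withInitial(() -> {
        try {
          return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
          throw new IllegalStateException(e);
        }
      });

  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  // Hypothetical, simplified put: consumes the body and returns the ETag.
  static String put(InputStream body) throws IOException {
    MessageDigest md = DIGEST.get();
    try {
      DigestInputStream in = new DigestInputStream(body, md);
      byte[] buf = new byte[4096];
      while (in.read(buf) != -1) {
        // The real flow would write buf to the datanodes here.
      }
      return hex(md.digest()); // digest() also resets the instance
    } catch (IOException e) {
      md.reset(); // discard the partial state before the thread is reused
      throw e;
    }
  }

  // Hypothetical body that fails after n bytes, like a cancelled upload.
  // Assumes n is smaller than data.length.
  static InputStream failingAfter(byte[] data, int n) {
    return new InputStream() {
      private int pos = 0;

      @Override
      public int read() throws IOException {
        if (pos >= n) {
          throw new EOFException("simulated client cancel");
        }
        return data[pos++] & 0xff;
      }
    };
  }
}
```

With this shape, a put that fails part-way through leaves the thread-local
digest clean, so the next put served by the same thread produces an ETag equal
to the MD5 of its own payload.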
was:
Currently, the MessageDigest instance is a thread local variable (one per S3G
Jetty thread). MessageDigest requires the call to either MessageDigest#digest
or MessageDigest#reset to reset the digest.
In normal ObjectEndpoint#put flow, MessageDigest#digest is called after the
data has been written to the datanodes, before the key is committed. However,
if an IOException happens (e.g. EOFException due to client cancelling during
the write), the digest will not be reset and remains in the inconsistent state.
This will affect the subsequent request that uses the same thread and therefore
the ETag generated will be completely different from the md5 hash of the object
causing AWS S3 SDK to detect inconsistent hash when downloading the object.
The issue can be replicated using an S3G with a single thread and doing three
put-object operations for the same key and same payload.
1st put-object: cancel the operation before the put-object operation can finish
2nd put-object: let the put-object finish. The resulting ETag will not be the
same as the md5 sum.
3rd put-object: also let the put-object finish. Since the previous put-object
reset the digest, the resulting ETag will be correct.
This patch adds a call to MessageDigest#reset in ObjectEndpoint#put to reset
the digest in case of exception.
> Reset the thread-local MessageDigest instance during exception
> --------------------------------------------------------------
>
> Key: HDDS-10587
> URL: https://issues.apache.org/jira/browse/HDDS-10587
> Project: Apache Ozone
> Issue Type: Improvement
> Components: S3, s3gateway
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
>
> Currently, the MessageDigest instance is a thread-local variable (one per S3G
> Jetty thread). A MessageDigest accumulates state across update calls and is
> only reset by a call to either MessageDigest#digest or MessageDigest#reset.
> In the normal ObjectEndpoint#put flow, MessageDigest#digest is called after
> the data has been written to the datanodes and before the key is committed.
> However, if an IOException occurs (e.g. an EOFException caused by the client
> cancelling during the write), the digest is not reset and is left in an
> inconsistent state. This affects the next request served by the same thread:
> the ETag generated for it differs from the MD5 hash of the object, which
> causes the AWS S3 SDK to report a hash mismatch when downloading the object.
> The issue can be replicated using an S3G with a single thread and doing three
> put-object operations for the same key and same payload.
> 1st put-object: cancel the operation before the put-object operation can
> finish, and ensure that an EOFException is thrown in the S3Gateway logs
> 2nd put-object: let the put-object finish. The resulting ETag will not be the
> same as the md5 digest of the payload.
> 3rd put-object: also let the put-object finish. Since the previous put-object
> reset the digest, the resulting ETag will be correct.
> This patch adds a call to MessageDigest#reset in ObjectEndpoint#put so that
> the digest is reset when an exception occurs.