On 22/06/12 07:34 AM, Leif Hedstrom wrote:
> On 6/22/12 4:12 AM, Jack Bates wrote:
>> What's the best way to compute SHA-256 digests for content in the
>> cache? I am thinking of using libgcrypt [1], can anyone comment on
>> whether this is a good choice, or offer advice?
>
> hmmm, maybe consider something from OpenSSL? We already link / use it
> heavily, and I believe there's a SHA-256 in it as well?

Thanks Leif, done. I switched from libgcrypt to OpenSSL [1].

This plugin now computes SHA-256 digests for content in the cache. Given
a response with a "Location: ..." header and a "Digest: SHA-256=..."
header, if the "Location: ..." URL isn't already cached but content with
a matching digest does exist in the cache, the plugin rewrites the
"Location: ..." header with the cached URL. This should redirect clients
that are not Metalink aware to mirrors that are already cached.
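
For illustration only, here is a minimal Python sketch (not the plugin's C code) of pulling the SHA-256 value out of a "Digest: ..." header value. It assumes the value is base64-encoded as in RFC 5843 and that several algorithms may be listed, separated by commas. The function name parse_sha256_digest is hypothetical.

```python
import base64
import binascii
import hashlib


def parse_sha256_digest(header_value):
    """Return the raw 32-byte SHA-256 digest from a Digest header value, or None.

    Assumption: the value looks like "SHA-256=<base64>", optionally alongside
    other comma-separated "algorithm=value" items.
    """
    for item in header_value.split(","):
        # Split on the first "=" only, so base64 padding stays in the value.
        algorithm, sep, encoded = item.strip().partition("=")
        if not sep or algorithm.strip().lower() != "sha-256":
            continue
        try:
            raw = base64.b64decode(encoded.strip(), validate=True)
        except (binascii.Error, ValueError):
            return None
        # A SHA-256 digest is always 32 bytes; reject anything else.
        return raw if len(raw) == hashlib.sha256().digest_size else None
    return None


# Usage: build a header value for a known body and parse it back.
expected = hashlib.sha256(b"hello").digest()
value = "SHA-256=" + base64.b64encode(expected).decode("ascii")
parsed = parse_sha256_digest(value)
```

A response with no SHA-256 item yields None, in which case the "Location: ..." header would be left alone.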

The code is up on GitHub [2]. I would love any feedback.

To check for content with a matching digest, this plugin uses
TSCacheWrite() and TSCacheRead() to map the SHA-256 digest of the
content to the request URL.

It listens for responses from origin servers with
TS_EVENT_HTTP_READ_RESPONSE_HDR and sets up a transform. The transform
doesn't alter the content, it just feeds it to OpenSSL's SHA256_Update().
When the content is complete, it calls TSCacheKeyDigestSet() with the
SHA-256 digest, and TSCacheWrite() to store the request URL under that key.
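
The pass-through idea can be sketched in a few lines of Python. This is only a model of the behaviour described above, not the plugin's code: a generator stands in for the transform, a dict stands in for the cache entry written with TSCacheWrite(), and the names hash_while_passing_through, remember and request_url are hypothetical.

```python
import hashlib


def hash_while_passing_through(chunks, on_complete):
    """Yield each chunk unchanged while feeding it to a running SHA-256."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)  # plays the role of SHA256_Update()
        yield chunk           # the content itself is not altered
    # All content seen: hand the finished digest to the caller.
    on_complete(hasher.digest())


digest_to_url = {}  # stand-in for the digest -> URL cache entry
request_url = "http://example.com/file.iso"  # hypothetical request URL


def remember(digest):
    """Store the request URL under the digest key."""
    digest_to_url[digest] = request_url


# Usage: the body arrives in two chunks and comes out unchanged.
body = b"".join(hash_while_passing_through([b"hel", b"lo"], remember))
```

Hashing incrementally means the whole body never has to be buffered just to compute the digest.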

It also listens for responses to clients (from cache or from the origin
server) with TS_EVENT_HTTP_SEND_RESPONSE_HDR. If the response has both a
"Location: ..." header and a "Digest: SHA-256=..." header, it calls
TSCacheKeyDigestFromUrlSet() and TSCacheRead() to check whether the
"Location: ..." URL is already cached. If not, it calls
TSCacheKeyDigestSet() and TSCacheRead() to check whether the "Digest:
SHA-256=..." digest already exists in the cache. If so, it calls
TSVConnRead() to read the URL associated with the digest.

Finally, it calls TSCacheKeyDigestFromUrlSet() and TSCacheRead() again,
on the URL associated with the digest, to check that the content is
still fresh. If it is, the plugin rewrites the "Location: ..."
header with the cached URL.
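
The sequence of lookups above amounts to the following decision logic, shown here as a minimal Python sketch rather than the plugin's actual code. A set models the URLs in the cache, a dict models the digest to URL entries, and is_fresh is a caller-supplied check. All of these names, and the example URLs, are hypothetical.

```python
def choose_location(location, digest, url_cache, digest_to_url, is_fresh):
    """Return the URL that should end up in the Location header."""
    if location in url_cache:
        return location  # the Location URL is already cached, leave it alone
    candidate = digest_to_url.get(digest)
    if candidate is None:
        return location  # no cached content with a matching digest
    if candidate in url_cache and is_fresh(candidate):
        return candidate  # rewrite to the cached URL that is still fresh
    return location      # the cached copy is gone or stale, leave it alone


# Usage: mirror-a is cached with a matching digest, mirror-b is not cached.
digest = b"\x01" * 32
url_cache = {"http://mirror-a.example/file.iso"}
digest_to_url = {digest: "http://mirror-a.example/file.iso"}
new_location = choose_location(
    "http://mirror-b.example/file.iso", digest,
    url_cache, digest_to_url, lambda url: True)
```

In every case where a check fails, the original "Location: ..." value is returned unchanged, so the worst case is an ordinary redirect.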

What are your thoughts on this approach?

This satisfies the requirement from RFC 6249:

    If Instance Digests are not provided by the Metalink servers, the
    Link header fields pertaining to this specification MUST be ignored.

RFC 6249 also requires the SHA-256 digest:

    Metalinks contain whole file hashes as described in
    Section 6, and MUST include SHA-256, as specified in [FIPS-180-3].

The plugin could also support additional digest algorithms, if that would
be useful?

[1] https://github.com/jablko/dedup/commit/3d1e6c1980df5b75aa44ace24f6a4886d6ba4215
[2] https://github.com/jablko/dedup