Hello internals! Thanks for PHP!

I'm writing to gauge interest in adding two new functions to the PHP `hash`
extension, `hash_serialize` and `hash_unserialize`. These functions would
serialize and unserialize the internals of a HashContext object, allowing a
partially-computed hash to be saved, then restored and completed in a later
run.
EXAMPLE: Multi-part upload.
Say that a very large file is uploaded in pieces, `big.001` through
`big.999`, and it is necessary to compute the SHA256 of the final
concatenated file.
In current PHP, the hash must be computed within a single run:

    $ctx = hash_init("sha256");
    for ($i = 1; $i <= 999; ++$i) {
        // "%03d" zero-pads to three digits: big.001 ... big.999
        hash_update_file($ctx, sprintf("big.%03d", $i));
    }
    $hash = hash_final($ctx);

This in turn requires that all pieces be on the filesystem simultaneously.
With hash_serialize and hash_unserialize, the hash can be computed
gradually, allowing pieces to be deleted as they are uploaded elsewhere.

    $ctx = hash_init("sha256");
    hash_update_file($ctx, "big.001");
    SAVE_TO_DATABASE(hash_serialize($ctx));
    ...
    $ctx = hash_unserialize(LOAD_FROM_DATABASE());
    hash_update_file($ctx, "big.002");
    SAVE_TO_DATABASE(hash_serialize($ctx));
    ...
    etc.
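
To illustrate why only the context's internal state needs to survive between runs, here is a minimal sketch using functions that already exist. The piece contents are hypothetical stand-ins for `big.001` through `big.003`, and `hash_copy` stands in for the save/restore step: it snapshots a partially computed hash, but only within a single process, which is the limitation the proposed functions are meant to lift.

```php
<?php
// Hypothetical stand-ins for the contents of big.001, big.002, big.003.
$pieces = ["first piece ", "second piece ", "third piece"];

// Feed the first piece, as if it had just been uploaded.
$ctx = hash_init("sha256");
hash_update($ctx, $pieces[0]);

// hash_copy() snapshots the partial state, but only in this process;
// it cannot be stored and picked up again by a later run.
$snapshot = hash_copy($ctx);

// Continue from the snapshot with the remaining pieces.
hash_update($snapshot, $pieces[1]);
hash_update($snapshot, $pieces[2]);
$incremental = hash_final($snapshot);

// One-shot hash of the concatenated data, for comparison.
$whole = hash("sha256", implode("", $pieces));

var_dump($incremental === $whole);
```

The two digests match, so resuming from a saved context should be equivalent to hashing the concatenated file in one pass.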
***
I am happy to write up an RFC for these functions. An initial
implementation with tests is visible here:
https://github.com/kohler/php-src/commit/5a3a828f90b88cd7f660babec7db531cfc04b0a1

New functions `hash_serialize` and `hash_unserialize` appear to fit the
existing API well and keep the implementation simple, but it's possible that
`__serialize`/`__unserialize` or the internal `serialize`/`unserialize`
functions would be preferred.
I'd be grateful for any feedback.
Thanks!
Eddie Kohler