Hello internals! Thanks for PHP!

I'm writing to gauge interest in adding two new functions to the PHP `hash`
extension, `hash_serialize` and `hash_unserialize`. These functions would
serialize and unserialize the internals of a HashContext object, allowing a
partially-computed hash to be saved, then restored and completed in a later
run.

EXAMPLE: Multi-part upload.

Say that a very large file is uploaded in pieces, `big.001` through
`big.999`, and it is necessary to compute the SHA256 of the final
concatenated file.
In current PHP, the hash must be computed within a single run:

$ctx = hash_init("sha256");
for ($i = 1; $i <= 999; ++$i) {
    hash_update_file($ctx, sprintf("big.%03d", $i));
}
$hash = hash_final($ctx);

This in turn requires that all pieces be on the filesystem simultaneously.
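
To illustrate why saving the context would be enough: the incremental API
already yields the same digest as hashing the concatenated data in one call,
so nothing but the context needs to survive between pieces. A minimal
sketch, using short in-memory strings as hypothetical stand-ins for the
`big.NNN` files (the piece contents are made up for illustration):

```php
<?php
// Hypothetical stand-ins for the contents of big.001, big.002, big.003.
$pieces = ["first piece ", "second piece ", "third piece"];

// Incremental hashing: feed each piece to the same context, in order.
$ctx = hash_init("sha256");
foreach ($pieces as $piece) {
    hash_update($ctx, $piece);
}
$incremental = hash_final($ctx);

// One-shot hashing of the concatenated data.
$oneShot = hash("sha256", implode("", $pieces));

// Both approaches produce the same hex digest.
var_dump($incremental === $oneShot);
```

The limitation is only that `$ctx` cannot currently outlive the script that
created it.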

With `hash_serialize` and `hash_unserialize`, the hash can be computed
gradually, allowing pieces to be deleted as they are uploaded elsewhere.

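// SAVE_TO_DATABASE and LOAD_FROM_DATABASE stand for any persistent storage.
// First run: hash the first piece, then save the context.
$ctx = hash_init("sha256");
hash_update_file($ctx, "big.001");
SAVE_TO_DATABASE(hash_serialize($ctx));
...
// A later run: restore the context and continue with the next piece.
$ctx = hash_unserialize(LOAD_FROM_DATABASE());
hash_update_file($ctx, "big.002");
SAVE_TO_DATABASE(hash_serialize($ctx));
...
etc.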

***

I am happy to write up an RFC for these functions. An initial
implementation with tests is visible here:
https://github.com/kohler/php-src/commit/5a3a828f90b88cd7f660babec7db531cfc04b0a1

Dedicated `hash_serialize` and `hash_unserialize` functions appear to fit
the existing API well and simplify the implementation, but it's possible
that supporting `__serialize`/`__unserialize` or the internal
`serialize`/`unserialize` functions would be preferred.

I'd be grateful for any feedback.
Thanks!
Eddie Kohler
