Hi Ludo,
> Block size is fixed; did you consider content-defined block boundaries
> and such? Perhaps it doesn’t bring much though.
I looked into block boundaries with a "sliding hash" (re-compute a short
hash for every byte read and choose a boundary whenever the hash is zero).
This would allow a higher degree of de-duplication, but I found it to be a
bit "finicky" (and myself too impatient to tune and tweak it :).
I settled on fixed block sizes, which makes the encoding faster and
prevents information leaks based on block size.
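To illustrate the sliding-hash idea, here is a minimal Guile sketch. It is not the code that was tried for ERIS: the procedure name `chunk-boundaries`, the window size, the mask, and the additive hash are all assumptions made for illustration (a real implementation would use a stronger rolling hash such as a Rabin fingerprint).

```scheme
(use-modules (rnrs bytevectors))

;; Return the offsets at which a chunk ends, chosen from content alone.
;; WINDOW is the number of bytes covered by the sliding hash; a boundary
;; is placed wherever (hash AND MASK) is zero. The hash is a plain sum of
;; the bytes in the window, which is only a stand-in for a real rolling hash.
(define (chunk-boundaries bv window mask)
  (let ((n (bytevector-length bv)))
    (let loop ((i 0) (hash 0) (acc '()))
      (if (= i n)
          (reverse acc)
          (let* ((in (bytevector-u8-ref bv i))
                 ;; Byte that slides out of the window (none at the start).
                 (out (if (>= i window)
                          (bytevector-u8-ref bv (- i window))
                          0))
                 (hash (- (+ hash in) out)))
            (loop (+ i 1)
                  hash
                  (if (zero? (logand hash mask))
                      (cons (+ i 1) acc)
                      acc)))))))
```

Because a boundary depends only on the bytes inside the window, inserting a byte at the front of the input shifts the later boundaries along with the content instead of changing every block. The "finicky" part is choosing the window, the mask and minimum/maximum chunk sizes.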
Another idea to increase de-duplication: when encoding a directory,
align files to the ERIS block size. This would allow de-duplication of
files across encoded images/directories.
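The alignment arithmetic can be sketched as follows. The procedure names `padding-to-align` and `aligned-offsets` are hypothetical, and the block size is passed in as a parameter rather than assumed.

```scheme
;; Number of padding bytes needed after SIZE bytes so that the next
;; file starts on a multiple of BLOCK-SIZE.
(define (padding-to-align size block-size)
  (modulo (- block-size (modulo size block-size)) block-size))

;; Given a list of file sizes, return the block-aligned start offset
;; of each file inside the image.
(define (aligned-offsets sizes block-size)
  (let loop ((sizes sizes) (offset 0) (acc '()))
    (if (null? sizes)
        (reverse acc)
        (let ((size (car sizes)))
          (loop (cdr sizes)
                (+ offset size (padding-to-align size block-size))
                (cons offset acc))))))
```

With every file starting on a block boundary, a file produces the same sequence of blocks no matter which image it is packed into, at the cost of up to one block of padding per file.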
Maybe something like SquashFS already does such an alignment? That would
be cool...
> The IPFS example is nice! There are bindings to the IPFS HTTP interface
> floating around for Guix; would be nice to converge on these bits.
Spelunking into wip-ipfs-substitutes is on my list! I'll report back on
the adventure. :)
ERIS is still "experimental". This release is intended to initiate
discussion and collect feedback from a wider circle. In particular I'd
be interested in your thoughts on applications and the Guile API.
> Do I get it right that the encoder currently keeps blocks in memory?
By default when using `(eris-encode content)`, yes. The blocks are
stored into an alist.
But the encoder is implemented as an SRFI-171 transducer that eagerly
emits (reduces) encoded blocks. So one could do this:
  (eris-encode content #:block-reducer my-backend)
Here `my-backend` is an SRFI-171 reducer that takes care of the blocks
as soon as they are ready. The IPFS example implements a reducer that
stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
`(srfi srfi-171)`.
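For anyone who hasn't written one: an SRFI-171 reducer is just a procedure with three arities. Below is a minimal sketch of a reducer that stores blocks in a hash table. The name `make-table-reducer` is hypothetical, and it assumes each emitted block arrives as a `(reference . block)` pair, which is my reading of "stored into an alist" and not something checked against the encoder.

```scheme
(use-modules (srfi srfi-171))

;; Build a reducer that stores every incoming (reference . block) pair
;; in TABLE.
;;   ()             -> initial state (the table itself)
;;   (state)        -> final result, called once at the end
;;   (state block)  -> consume one block
(define (make-table-reducer table)
  (case-lambda
    (() table)
    ((state) state)
    ((state block)
     (hash-set! state (car block) (cdr block))
     state)))

;; Exercise the reducer without ERIS: feed it pairs through an identity
;; transducer, the same way any SRFI-171 transducer would drive it.
(define (store-pairs! table pairs)
  (list-transduce (tmap (lambda (x) x))
                  (make-table-reducer table)
                  pairs))
```

Under those assumptions it would be passed as `#:block-reducer (make-table-reducer (make-hash-table))`. A backend that writes to disk or to the network has the same shape, with the `hash-set!` call replaced by the write.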
The encoding transducer is stateful, but it only keeps references to
blocks in memory, and at most log(n) of them at any moment, where n is
the number of blocks to encode.
The decoding interface currently looks like this:
  (eris-decode->bytevector eris-urn
    (lambda (ref) (get-block-from-my-backend ref)))
Much room for improvement...
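As a rough sketch of what the lookup procedure could be for the in-memory case: the helper below turns an alist of blocks into a procedure of one argument. The name `alist->block-getter` is hypothetical, the alist is assumed to hold `(reference . block)` pairs, and raising an error on a missing block is my choice here, not necessarily what the decoder expects.

```scheme
;; Turn an alist of (reference . block) pairs into a lookup procedure.
;; `assoc-ref` compares keys with `equal?`, so bytevector references work.
(define (alist->block-getter blocks)
  (lambda (ref)
    (or (assoc-ref blocks ref)
        (error "block not found:" ref))))

;; Intended use, under the assumptions above (not run here):
;;   (eris-decode->bytevector eris-urn (alist->block-getter blocks))
```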
> Do you have plans to provide an interface to the storage backend so one
> can easily switch between in-memory, Datashards, IPFS, etc.?
Currently the interface is a bit "low-level": you provide an SRFI-171
reducer. This can definitely be improved and I'd be happy for ideas on
how to make this more ergonomic.
Thank you for your comments!
-pukkamustard