Re: Encoding for Robust Immutable Storage (ERIS) and Guile

Ludovic Courtès Fri, 11 Dec 2020 00:11:07 -0800

Hello pukkamustard!

pukkamustard <pukkamust...@posteo.net> skribis:


> I looked into block boundaries with a "sliding hash" (re-compute a
> short
> hash for every byte read and choose boundaries when hash is
> zero). This
> would allow a higher degree of de-duplication, but I found this to be
> a
> bit "finicky" (and myself too impatient to tune and tweak this :).
>
> I settled on fixed block sizes, making the encoding faster and
> preventing
> information leaks based on block size.

Yeah, sounds reasonable.  (I evaluated the benefits of this and other
approaches years ago, FWIW: <https://hal.inria.fr/hal-00187069/en>.)

> An other idea to increase de-duplication: When encoding a directory,
> align files to the ERIS block size. This would allows de-duplication
> of
> files across encoded images/directories.

I guess that’d work, indeed.

>> Do I get it right that the encoder currently keeps blocks in memory?
>
> By default when using `(eris-encode content)`, yes. The blocks are
> stored into an alist.
>
> But the encoder is implemented as an SRFI-171 transducer that eagerly
> emits (reduces) encoded blocks. So one could do this:
>
> (eris-encode content #:block-reducer my-backend)
>
> Where `my-backend` is a SRFI-171 reducer that takes care of the blocks
> as soon as they are ready. The IPFS example implements a reducer that
> stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
> `(srfi srfi-171)`.

Ah, I see, that’s great!  I’m not familiar with the transducer API so I
always have to think twice (or more) about what’s going on; the
flexibility it gives here is really nice.

> The encoding transducer is state-full. But it only keeps references to
> blocks in memory and at most log(n) at any moment, where n is the
> number of blocks to encode.
>
> The decoding interface currently looks likes this:
>
> (eris-decode->bytevector eris-urn
>  (lambda (ref) (get-block-from-my-backend ref)))

OK.

>> Do you have plans to provide an interface to the storage backend so
>> one
>> can easily switch between in-memory, Datashards, IPFS, etc.?
>
> Currently the interface is a bit "low-level" - provide a SRFI-171
> reducer. This can definitely be improved and I'd be happy for ideas on
> how to make this more ergonomic.

Maybe that’s all we need after all.  Maybe what would be nice is a
couple of examples, like a high-level procedure or CLI that can insert
or fetch from either (say) a local GDBM database or IPFS.  That would
illustrate integration with backends as well as the high-level API.

Thanks!

Ludo’.

Re: Encoding for Robust Immutable Storage (ERIS) and Guile

Reply via email to