Re: Encoding for Robust Immutable Storage (ERIS) and Guile

2020-12-10 Thread pukkamustard



Hi Ludo,

Block size is fixed; did you consider content-defined block boundaries
and such?  Perhaps it doesn’t bring much though.


I looked into block boundaries with a "sliding hash" (re-compute a short
hash for every byte read and choose boundaries when hash is zero). This
would allow a higher degree of de-duplication, but I found this to be a
bit "finicky" (and myself too impatient to tune and tweak this :).

I settled on fixed block sizes, making the encoding faster and preventing
information leaks based on block size.
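
For illustration, here is a minimal sketch of the "sliding hash" idea. It is not ERIS and not the code I experimented with: the procedure name, the `#:window` and `#:mask` parameters and the toy rolling sum are all assumptions made for the example. Real chunkers use stronger rolling hashes (Rabin fingerprints, buzhash).

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: return the offsets (exclusive chunk ends) at which
;; the rolling sum of the last WINDOW bytes has all MASK bits equal to zero.
(define* (content-defined-boundaries bv #:key (window 4) (mask 15))
  (let loop ((i 0) (sum 0) (cuts '()))
    (if (= i (bytevector-length bv))
        (reverse cuts)
        (let* ((sum (+ sum
                       (bytevector-u8-ref bv i)          ; byte entering the window
                       (if (>= i window)                 ; byte leaving the window
                           (- (bytevector-u8-ref bv (- i window)))
                           0)))
               (cut? (and (>= (+ i 1) window)            ; only once the window is full
                          (zero? (logand sum mask)))))
          (loop (+ i 1) sum (if cut? (cons (+ i 1) cuts) cuts))))))

;; Example input, and the same input with one byte inserted at the front.
(define data
  (u8-list->bytevector
   '(3 1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2 6 4 3 3 8 3 2 7 9 5)))
(define shifted
  (u8-list->bytevector (cons 42 (bytevector->u8-list data))))

(content-defined-boundaries data)
(content-defined-boundaries shifted)
```

Because a boundary depends only on the bytes inside the window, inserting one byte at the front moves every later boundary by one position and leaves the chunk contents after the insertion point unchanged. Fixed-size blocks would instead shift every block. The "finicky" part is choosing the window, the mask and the minimum/maximum chunk sizes.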

Another idea to increase de-duplication: when encoding a directory,
align files to the ERIS block size. This would allow de-duplication of
files across encoded images/directories.

Maybe something like SquashFS already does such an alignment? That would
be cool...
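
The alignment idea above can be sketched as a small padding computation. This is only an illustration: both procedure names are hypothetical, and the block size is a parameter rather than the value any ERIS implementation uses.

```scheme
;; Number of padding bytes needed so that OFFSET lands on a multiple of
;; BLOCK-SIZE (0 if it is already aligned).
(define (padding-to-block-boundary offset block-size)
  (modulo (- offset) block-size))

;; Given a list of file sizes, return the offset at which each file would
;; start in the image if every file begins on a block boundary.
(define (aligned-offsets sizes block-size)
  (let loop ((sizes sizes) (offset 0) (acc '()))
    (if (null? sizes)
        (reverse acc)
        (let ((end (+ offset (car sizes))))
          (loop (cdr sizes)
                (+ end (padding-to-block-boundary end block-size))
                (cons offset acc))))))

(aligned-offsets '(5 8 1) 4)   ; => (0 8 16)
```

With every file starting on a block boundary, an identical file yields identical blocks regardless of what precedes it in the image, at the cost of up to one block of padding per file.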

The IPFS example is nice!  There are bindings to the IPFS HTTP interface
floating around for Guix; would be nice to converge on these bits.


Spelunking into wip-ipfs-substitutes is on my list! Will report back on
the adventure. :)

ERIS is still "experimental". This release is intended to initiate
discussion and collect feedback from a wider circle. In particular I'd
be interested in your thoughts on applications and the Guile API.


Do I get it right that the encoder currently keeps blocks in memory?


By default when using `(eris-encode content)`, yes. The blocks are
stored into an alist.

But the encoder is implemented as an SRFI-171 transducer that eagerly
emits (reduces) encoded blocks. So one could do this:

(eris-encode content #:block-reducer my-backend)

Where `my-backend` is an SRFI-171 reducer that takes care of the blocks
as soon as they are ready. The IPFS example implements a reducer that
stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
`(srfi srfi-171)`.
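
To make "SRFI-171 reducer" concrete, here is a minimal sketch of one that writes into a hash table. It does not use guile-eris at all: `make-hash-table-reducer` and `store` are hypothetical names, the string keys and values are placeholders, and the assumption that each emitted item is a `(reference . block)` pair is inferred from the alist default rather than checked against the code. It is driven here with `list-transduce` so it runs on its own.

```scheme
(use-modules (srfi srfi-171))

;; A reducer is a procedure with three arities:
;;   ()              -> initial accumulator
;;   (result)        -> final result
;;   (result input)  -> accumulator after consuming one input
(define (make-hash-table-reducer table)
  (case-lambda
    (() table)
    ((result) result)
    ((result ref+block)
     ;; Assumed item shape: (reference . block)
     (hash-set! result (car ref+block) (cdr ref+block))
     result)))

(define store (make-hash-table))

;; Stand-in for the encoder: feed two fake blocks through the reducer.
(list-transduce (tmap identity)
                (make-hash-table-reducer store)
                '(("ref-a" . "block-a") ("ref-b" . "block-b")))

(hash-ref store "ref-a")   ; => "block-a"
```

A reducer that talks to disk or to the network would have the same three-arity shape, with the side effect in the two-argument case.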

The encoding transducer is stateful. But it only keeps references to
blocks in memory, and at most log(n) of them at any moment, where n is
the number of blocks to encode.
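
One way to see where the logarithm comes from, as a rough sketch: if references are grouped into tree nodes holding at most `arity` references each, the number of levels above the data blocks grows like log(n), and a streaming encoder can get by with one partially filled node per level. The `arity` parameter and the procedure name are assumptions for the example, not values taken from the implementation.

```scheme
;; Number of levels of reference nodes needed above N data blocks when
;; each node holds at most ARITY references.
(define (tree-levels n arity)
  (let loop ((nodes n) (levels 0))
    (if (<= nodes 1)
        levels
        (loop (ceiling-quotient nodes arity) (+ levels 1)))))

(tree-levels 1000 16)   ; => 3   (1000 -> 63 -> 4 -> 1)
```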

The decoding interface currently looks like this:

(eris-decode->bytevector eris-urn
 (lambda (ref) (get-block-from-my-backend ref)))

Much room for improvement...
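
As an example of the second argument in the decoding call above, here is a sketch of a block getter backed by an in-memory hash table. `make-block-getter`, `store` and the string keys are hypothetical, and returning `#f` for an unknown reference is an assumption about what the decoder would tolerate.

```scheme
;; Return a procedure that looks up a block by its reference in TABLE,
;; or #f when the reference is unknown.
(define (make-block-getter table)
  (lambda (ref)
    (hash-ref table ref #f)))

(define store (make-hash-table))
(hash-set! store "ref-a" "block-a")     ; placeholder reference and block

(define get-block (make-block-getter store))

(get-block "ref-a")     ; => "block-a"
(get-block "missing")   ; => #f

;; With guile-eris loaded, the getter would be passed like this:
;; (eris-decode->bytevector eris-urn get-block)
```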

Do you have plans to provide an interface to the storage backend so one
can easily switch between in-memory, Datashards, IPFS, etc.?


Currently the interface is a bit "low-level" - provide a SRFI-171
reducer. This can definitely be improved and I'd be happy for ideas on
how to make this more ergonomic.

Thank you for your comments!
-pukkamustard



Re: Encoding for Robust Immutable Storage (ERIS) and Guile

2020-12-10 Thread pukkamustard



Congratulations pukkamustard, I really am excited by and respect the
work you're doing on this.  I think it's probably the proper replacement
slot for the storage work I'm doing in Spritely.


Thank you!

Once Spritely Goblins gets ported to Guile it'll be really fun to
combine these two things. :)


I agree! Looking forward to hacking on Spritely in Guile...

-pukkamustard