> I have a question / suggestion about the distributed substitutes
> project: would downloads be split into uniformly sized chunks or could
> the sizes vary?
> Specifically, in an extreme case where an update introduced a single
> extra byte at the beginning of a file, would that result in completely
> new chunks?
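
to make the extreme case concrete, here is a minimal python sketch of fixed-size chunking with content-addressed chunks. it is only an illustration, not how any particular system implements it: the names chunk_ids and shared_chunks are made up for this example, SHA-256 stands in for whatever hash a real store uses, and the 4096 byte chunk size is an assumption.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # assumed chunk size, for illustration only


def chunk_ids(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks and return each chunk's SHA-256 hex digest."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def shared_chunks(old, new):
    """Count the chunks of `new` that a content-addressed store already holds from `old`."""
    known = set(chunk_ids(old))
    return sum(1 for cid in chunk_ids(new) if cid in known)


# Deterministic pseudo-random "file" of exactly 16 chunks.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(16 * CHUNK_SIZE))

appended = original + b"x"   # the file just grows at the end
prepended = b"x" + original  # a single extra byte at the beginning

print(shared_chunks(original, appended))   # 16: only the new tail chunk is missing
print(shared_chunks(original, prepended))  # 0: every chunk boundary has shifted
```

with fixed-size chunks, growing the file at the end reuses all the old chunks, while one byte inserted at the front shifts every boundary and nothing is reused.
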
most (all?) distributed storage solutions have a chunker (including
ERIS with its 32k chunks, or Swarm with 4k chunks), and the chunks are
content addressed, i.e. the chunker also serves as deduplication at
chunk granularity.

if the file doesn't just grow at the end, but its content shifts by a
couple of bytes somewhere in the middle, then this chunk-level
deduplication stops working from that point on.

IIRC rar was the first archiver to introduce a very fast deduplication
algorithm that detected even non-aligned duplicated blocks of varying
sizes. i don't think any distributed storage system has anything like
that.

> An alternative I've been thinking about is this:
> find the store references in a file and split it along these references,
> optionally apply further chunking to the non-reference blobs.

chunking storage systems store only whole chunks, so splitting files
into too many pieces can increase the wasted storage. more so with
large chunks, less so with smaller ones.

> It's probably best to do this at the NAR level??
>
> Storing reference offsets is already something that we should be doing to
> speed other operations up, so this could tie in nicely with that.

if optimizing grafting is worth this amount of trouble, then maybe the
best option is to extend the NAR format to store mutable references in
a separate table at the end of the file.

that would speed up guix operations like grafting, and help any
storage system that has deduplication, which includes some
copy-on-write filesystems.

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“If you shut up truth and bury it under the ground, it will but grow,
and gather to itself such explosive power that the day it bursts
through it will knock down everything that stands in its way.”
	— Émile Zola (1840–1902)