Hi Martin,

On 01/02/2021 08:36, Martin Verges wrote:
> Hello,
>
> Source code should be compressible, so maybe just create something like
> a tar.gz per repo? That way you would get much bigger objects, which
> could improve speed and make them easier to store on any storage
> system.
I should have been more specific about what "artifacts" are in the context of
Software Heritage, sorry about that. You can read more in the architecture
document if you're interested[0]. From my point of view, the problem is to
store the artifacts as opaque blobs of data and to take advantage of their
properties (immutable, never deleted) to keep things simple and save space.
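To make the idea concrete, here is a minimal Python sketch of the scheme from
my first message, under a few assumptions of my own: a seekable file object
stands in for the RBD volume, an in-memory dict stands in for the index that
would live "somewhere else", and the key is a hypothetical SHA-1 of the
content. Blobs are only ever appended, and a lookup is a single read at a
known offset and size.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Tuple


class AppendOnlyBlobStore:
    """Pack immutable blobs back to back into one ever-growing volume.

    Assumptions (illustrative only): `volume` is any seekable binary file
    object standing in for the RBD volume, and `index` is an in-memory dict
    standing in for an external index. Persisting the index is left out.
    """

    def __init__(self, volume: BinaryIO) -> None:
        self.volume = volume
        # artifact key -> (offset, size) within the volume
        self.index: Dict[str, Tuple[int, int]] = {}
        # Next free byte; seek() returns the new absolute position.
        # The write pointer only ever moves forward.
        self.end = self.volume.seek(0, io.SEEK_END)

    def add(self, data: bytes) -> str:
        # Hypothetical key: SHA-1 of the content, so identical blobs share a key.
        key = hashlib.sha1(data).hexdigest()
        if key in self.index:
            return key  # already stored; immutable data never needs rewriting
        self.volume.seek(self.end)
        self.volume.write(data)
        self.index[key] = (self.end, len(data))
        self.end += len(data)
        return key

    def get(self, key: str) -> bytes:
        # One positioned read of exactly `size` bytes.
        offset, size = self.index[key]
        self.volume.seek(offset)
        return self.volume.read(size)


# Usage: an in-memory buffer plays the role of the volume.
store = AppendOnlyBlobStore(io.BytesIO())
k1 = store.add(b"int main(void) { return 0; }\n")
k2 = store.add(b"print('hello')\n")
print(store.index[k2], store.get(k1))
```

Because the artifacts are immutable and never deleted, there is no free-space
management or compaction to worry about: the write pointer only moves forward,
and adding an artifact that is already stored costs nothing. The hard part in
practice is the index itself, which has to hold billions of entries.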

That being said, it is often a good idea to rethink the data structure itself
to find a better and more efficient strategy, and keeping a tarball of each
repository could be a good choice. But that would be an entirely different
project.

Cheers

[0] https://docs.softwareheritage.org/devel/architecture.html
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
> On Sat, 30 Jan 2021 at 16:01, Loïc Dachary <l...@dachary.org> wrote:
>> Hello,
>>
>> In the context of Software Heritage (a noble mission to preserve all source
>> code)[0], artifacts have an average size of ~3KB and there are billions of
>> them. They never change and are never deleted. To save space, it would make
>> sense to write them one after the other in an ever-growing RBD volume (more
>> than 100TB). An index, located somewhere else, would record the offset and
>> size of each artifact in the volume.
>>
>> I wonder if someone has already implemented this idea successfully. And if
>> not, does anyone see a reason why it would be a bad idea?
>>
>> Cheers
>>
>> [0] https://docs.softwareheritage.org/
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Loïc Dachary, Artisan Logiciel Libre

