[ceph-users] Re: Using RBD to pack billions of small files

2021-02-04 Thread Matthew Vernon
Hi, On 04/02/2021 07:41, Loïc Dachary wrote: On 04/02/2021 05:51, Federico Lucifredi wrote: Hi Loïc, I am intrigued, but am missing something: why not use RGW and store the source code files as objects? RGW has native compression and can take care of that behind the scenes. Excellent

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-04 Thread Loïc Dachary
On 04/02/2021 12:08, Lionel Bouton wrote: > Hi, > > On 04/02/2021 at 08:41, Loïc Dachary wrote: >> Hi Federico, >> >> On 04/02/2021 05:51, Federico Lucifredi wrote: >>> Hi Loïc, >>> I am intrigued, but am missing something: why not use RGW and store >>> the source code files as objects?

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-04 Thread Lionel Bouton
Hi, On 04/02/2021 at 08:41, Loïc Dachary wrote: > Hi Federico, > > On 04/02/2021 05:51, Federico Lucifredi wrote: >> Hi Loïc, >> I am intrigued, but am missing something: why not use RGW and store >> the source code files as objects? RGW has native compression and can take >> care of th

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-03 Thread Loïc Dachary
Hi Federico, On 04/02/2021 05:51, Federico Lucifredi wrote: > Hi Loïc, > I am intrigued, but am missing something: why not use RGW and store the > source code files as objects? RGW has native compression and can take care of > that behind the scenes. Excellent question! > > Is the desi

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-03 Thread Matt Wilder
If it were me, I would do something along the lines of: - Bundle larger blocks of code into pixz (essentially indexed tar files, allowing random access) and store them in RadosGW. - Build a small frontend that fetches (with caching) them and provides the file content

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-03 Thread Loïc Dachary
Hi Matt, I did not know about pixz, thanks for the pointer. The idea it implements is also new to me and it looks like it can usefully be applied to this use case. I'm not going to say "awesome" because I can't grasp how useful it really is right now. But I'll definitely think about it :-) Chee

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-03 Thread Burkhard Linke
Hi, On 2/3/21 9:41 AM, Loïc Dachary wrote: Just my 2 cents: You could use the first byte of the SHA sum to identify the image, e.g. using a fixed number of 256 images. Or some flexible approach similar to the way filestore used to store rados objects. A friend suggested the same to save spac
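
The first-byte sharding suggested above could look roughly like the sketch below. It assumes the SHA sum is the SHA256 of the file content (as mentioned elsewhere in the thread); the `artifacts-` image name prefix and the `image_for` helper are hypothetical names chosen for illustration, not anything from the thread or from Ceph.

```python
import hashlib

# Assumption: one RBD image per possible value of the first digest byte.
NUM_IMAGES = 256


def image_for(content: bytes) -> str:
    """Return the (hypothetical) image name that should hold this content."""
    digest = hashlib.sha256(content).digest()
    # digest[0] is an int in the range 0..255, so it selects one of 256 images.
    return f"artifacts-{digest[0]:02x}"


if __name__ == "__main__":
    print(image_for(b"int main(void) { return 0; }\n"))
```

Because SHA256 output is close to uniformly distributed, each of the 256 images should end up holding a similar share of the data.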

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-03 Thread Loïc Dachary
> > Just my 2 cents: > > You could use the first byte of the SHA sum to identify the image, e.g. using > a fixed number of 256 images. Or some flexible approach similar to the way > filestore used to store rados objects. A friend suggested the same to save space. Good idea.

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-03 Thread Burkhard Linke
Hi, On 2/2/21 9:32 PM, Loïc Dachary wrote: Hi Greg, On 02/02/2021 20:34, Gregory Farnum wrote: *snipsnap* Right. Dan's comment gave me pause: it does not seem to be a good idea to assume a RBD image of an infinite size. A friend who read this thread suggested a sensible approach (which als

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-02 Thread Loïc Dachary
Hi Greg, On 02/02/2021 20:34, Gregory Farnum wrote: > Packing's obviously a good idea for storing these kinds of artifacts > in Ceph, and hacking through the existing librbd might indeed be > easier than building something up from raw RADOS, especially if you > want to use stuff like rbd-mirror. >

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-02 Thread Anthony D'Atri
I’d be nervous about a plan to utilize a single volume, growing indefinitely. I would think that from a blast radius perspective that you’d want to strike a balance between a single monolithic blockchain-style volume vs a zillion tiny files. Perhaps a strategy to shard into, say, 10 TB volumes

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-02 Thread Gregory Farnum
Packing's obviously a good idea for storing these kinds of artifacts in Ceph, and hacking through the existing librbd might indeed be easier than building something up from raw RADOS, especially if you want to use stuff like rbd-mirror. My main concern would just be as Dan points out, that we don'

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-01 Thread Loïc Dachary
Hi Dan, On 01/02/2021 21:13, Dan van der Ster wrote: > Hi Loïc, > > We've never managed 100TB+ in a single RBD volume. I can't think of > anything, but perhaps there are some unknown limitations when they get so > big. > It should be easy enough to use rbd bench to create and fill a massive test >

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-01 Thread Dan van der Ster
Hi Loïc, We've never managed 100TB+ in a single RBD volume. I can't think of anything, but perhaps there are some unknown limitations when they get so big. It should be easy enough to use rbd bench to create and fill a massive test image to validate everything works well at that size. Also, I ass

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-01 Thread Loïc Dachary
On 01/02/2021 20:18, Alex Gorbachev wrote: > Hi Loïc, > > Does not borg need a file system to write its files to?  That's also my understanding. > We do replicate the chunks incrementally with rsync, and that is a very nice > and, importantly, idempotent way, to sync up data to a second site.  

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-01 Thread Alex Gorbachev
Hi Loïc, Does not borg need a file system to write its files to? We do replicate the chunks incrementally with rsync, and that is a very nice and, importantly, idempotent way, to sync up data to a second site. -- Alex Gorbachev ISS/Storcium On Mon, Feb 1, 2021 at 2:43 AM Loïc Dachary wrote:

[ceph-users] Re: Using RBD to pack billions of small files

2021-01-31 Thread Loïc Dachary
Hi Martin, On 01/02/2021 08:36, Martin Verges wrote: > Hello, > > source code should be compressible, maybe just creating something like > a tar.gz per repo or so? That way you would get much bigger objects > that could improve speed and make it easier to store on any storage > system. I should ha

[ceph-users] Re: Using RBD to pack billions of small files

2021-01-31 Thread Loïc Dachary
Hi Alex, Using borg would indeed make sense to replicate the RBD content in case rbd-mirror is not an option, nice idea :-) Interestingly there is no need for a proper file system: the files are immutable and never deleted. They are indexed by the SHA256 of their content and a map where

[ceph-users] Re: Using RBD to pack billions of small files

2021-01-31 Thread Martin Verges
Hello, source code should be compressible, maybe just creating something like a tar.gz per repo or so? That way you would get much bigger objects that could improve speed and make it easier to store on any storage system. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.
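
A minimal sketch of the "tar.gz per repo" idea, using only the Python standard library. The `pack_repo` and `unpack_repo` helpers are hypothetical names, and holding a whole repository in memory as a dict is an assumption made to keep the example short; how the resulting blob is stored (RGW, RBD, or anything else) is left open.

```python
import io
import tarfile
from typing import Dict


def pack_repo(files: Dict[str, bytes]) -> bytes:
    """Bundle many small files into one gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_repo(blob: bytes) -> Dict[str, bytes]:
    """Recover the original files from an archive made by pack_repo."""
    out: Dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                out[member.name] = tar.extractfile(member).read()
    return out
```

The trade-off is that reading one file means decompressing the archive up to that file, which is what indexed formats such as pixz (mentioned elsewhere in this thread) aim to avoid.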

[ceph-users] Re: Using RBD to pack billions of small files

2021-01-31 Thread Alex Gorbachev
Dear Loïc , I do not have direct experience with this many files, but it resonates for me with deduplication, such as borg (https://www.borgbackup.org/) or a similar implementation in the latest Proxmox Backup Server ( https://pbs.proxmox.com/wiki/index.php/Main_Page). I think you would need a fi