> On 09.11.2021 at 00:01, Igor Fedotov <igor.fedo...@croit.io> wrote:
>
> Hi folks,
>
> having an LTS release cycle could be a great topic for the upcoming "Ceph User + Dev Monthly meeting".
>
> The first one is scheduled on November 18, 2021, 14:00-15:00 UTC:
>
> https://pad.ceph.com/p/ceph-user-dev-monthly-minutes
>
> Any volunteers to extend the agenda and advocate the idea?
Hi Igor, do you still think we can add the LTS topic to the agenda? I will attend tomorrow and can try to advocate it.

Best,
Peter

> Thanks,
> Igor
>
>> On 11/8/2021 3:21 PM, Frank Schilder wrote:
>> Hi all,
>>
>> I followed this thread with great interest and would like to add my opinions/experiences/wishes as well.
>>
>> I believe the question of packages versus containers needs a bit more context to be really meaningful. This was already mentioned several times with regard to documentation. I see the following three topics as tightly connected (my opinions/answers included):
>>
>> 1. Distribution: Packages are compulsory, containers are optional.
>> 2. Deployment: cephadm (yet another deployment framework) and Ceph (the actual storage system) should be strictly separate projects.
>> 3. Release cycles: The release cadence is way too fast; I very much miss a Ceph LTS branch with at least 10 years of back-port support.
>>
>> These are my short answers/wishes/expectations in this context. Below I add some more reasoning as optional reading (warning: wall of text ahead).
>>
>>
>> 1. Distribution
>> ---------
>>
>> I don't think the question is about packages versus containers, because even if one distribution should decide not to package Ceph any more, other distributors certainly will, and the user community will simply move away from distributions without Ceph packages. In addition, unless Red Hat plans to move to a source-only container where I run the good old configure, make, make install, it will be package-based anyway, so packages are here to stay.
>>
>> Therefore, the way I understand it, this question is about cephadm versus other deployment methods. Here, I think the push towards a container-based, cephadm-only deployment is unlikely to become the number one choice for everyone, for good reasons already mentioned in earlier messages. In addition, I also believe that development of a generic deployment tool is currently not sustainable, as was mentioned by another user. My reasons for this are given in the next section.
>>
>>
>> 2. Deployment
>> ---------
>>
>> In my opinion, it is really important to distinguish three components of any open-source project: development (release cycles), distribution, and deployment. Following the good old philosophy that every tool does exactly one job and does it well, each of these components is a separate project, because each corresponds to a different tool.
>>
>> This immediately implies that the Ceph documentation should not contain documentation about packaging and deployment tools. Each of these ought to be strictly separate. If I have a low-level problem with Ceph and go to the Ceph documentation, I do not want to see cephadm commands. Ceph documentation should be about Ceph (the storage system) only. Such a mix-up leads to problems, and there have already been ceph-users cases where people could not use the documentation for troubleshooting because it showed cephadm commands but their cluster was not deployed with cephadm.
>>
>> In this context, I would prefer a separate cephadm-users list so that ceph-users can focus on actual Ceph problems again.
>>
>> Now to the point that cephadm might be an unsustainable project. Although at first glance the idea of a generic deployment tool that solves all problems with a single command might look appealing, it is likely doomed to fail for a simple reason that was already indicated in an earlier message: Ceph deployment is subject to a complexity paradox. Ceph has a very large configuration space, and implementing and using a generic tool that covers and understands this configuration space is more complex than deploying any specific Ceph cluster, each of which uses only a tiny subset of the entire configuration space.
>>
>> In other words: deploying a specific Ceph cluster is actually not that difficult.
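>>
>> To give a concrete sense of scale: the subset of the configuration space that one small, specific cluster actually exercises can fit on a single screen. A minimal ceph.conf might look roughly like the following (a sketch with made-up values, not a template for any real cluster):
>>
>>   [global]
>>   fsid = <cluster uuid>
>>   mon_host = 10.0.0.1, 10.0.0.2, 10.0.0.3     # example addresses
>>   public_network = 10.0.0.0/24
>>   cluster_network = 10.1.0.0/24               # optional separate replication network
>>   auth_cluster_required = cephx
>>   auth_service_required = cephx
>>   auth_client_required = cephx
>>
>>   [osd]
>>   osd_memory_target = 4294967296              # 4 GiB per OSD, tune to the hardware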
>>
>> Designing a cluster and dimensioning all of its components is what is difficult, and none of the current deployment tools help here. There is not even a check for suitable hardware. In addition, technology is moving fast, and adapting a generic tool to new developments in time seems a hopeless task. For example, when will cephadm natively support collocated LVM OSDs with dm-cache devices? Is it even worth trying to incorporate this?
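>>
>> To illustrate what such a special case involves when done by hand, an OSD of this kind can be prepared with plain LVM and ceph-volume, roughly like this (a sketch only; device names and sizes are made up, and cache sizing, cache mode and failure handling need real thought):
>>
>>   # one volume group spanning the slow data disk and the fast cache partition (example devices)
>>   pvcreate /dev/sdb /dev/nvme0n1p1
>>   vgcreate ceph-osd0 /dev/sdb /dev/nvme0n1p1
>>
>>   # data LV on the HDD, cache pool on the NVMe partition
>>   lvcreate -n data -l 100%PVS ceph-osd0 /dev/sdb
>>   lvcreate --type cache-pool -n cache -L 100G ceph-osd0 /dev/nvme0n1p1
>>
>>   # attach the cache pool to the data LV as a writeback dm-cache
>>   lvconvert --type cache --cachepool ceph-osd0/cache --cachemode writeback ceph-osd0/data
>>
>>   # hand the cached LV to ceph-volume like any other LV-backed OSD
>>   ceph-volume lvm create --bluestore --data ceph-osd0/data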
>>
>> My wish would be to keep the Ceph project clean of any deployment tasks. In my opinion, the basic Ceph tooling is already doing tasks that are the responsibility of a configuration management system rather than a storage system (e.g. deploying unit files by default instead of as an option that is disabled by default).
>>
>>
>> 3. Release cycles
>> ---------
>>
>> Ceph is a complex system and the code is getting more complex every day. It is very difficult to beat the curse of complexity: development and maintenance effort grows non-linearly (exponentially?) with the number of lines of code. As a consequence, (A) if one wants to maintain quality while adding substantial new features, the release intervals become longer and longer. (B) If one wants to maintain constant release intervals while adding substantial new features, the quality will have to go down. The last option is that (C) new releases with constant release intervals contain ever smaller increments in functionality to maintain quality. I ignore the option of throwing more and more qualified developers at the project, as this seems unlikely and also comes with its own complexity cost.
>>
>> I'm afraid we are in scenario B. Ceph is losing its reputation as a rock-solid system.
>>
>> Just recently, there were some ceph-users emails about how dangerous it is (or not) to upgrade to the latest stable Octopus version. The upgrade itself apparently goes well, but what happens then? I personally have too many reports that the latest Ceph versions are quite touchy and collapse in situations that had never been a problem up to Mimic (most prominently, that a simple rebalance operation after adding disks gets OSDs to flap and can take a whole cluster down - plenty of cases since Nautilus). Stability at scale seems to become a real issue with increasing version numbers. I am myself very hesitant to upgrade, in particular because there is no way back and the cycles of potential doom are so short.
>>
>> Therefore, I would very much appreciate the establishment of a Ceph LTS branch with at least 10 years of back-port support, if not longer. In addition, upgrade procedures between LTS versions should allow a downgrade by one version as well (move legacy data along until explicitly allowed to cut all bridges).
>>
>> For any large storage system, robustness, predictability and low maintenance effort are invaluable. For example, our cluster is very demanding compared with our other storage systems: the OSDs have a nasty memory leak, operations get stuck in MONs and MDSes at least once or twice a week due to race conditions, and so on. It is currently not possible to let the cluster run unattended for months or even years, something that is possible, if not the rule, with other (also open-source) storage systems.
>>
>> Fixing bugs that show up rarely and are very difficult to catch is really important for a storage system with theoretically infinite uptime. Rolling versions over all the time and then throwing "xyz is not supported, try with a newer version" at users when they discover a rare problem after running for a few years does not help to get Ceph to a level of stability that will be convincing in the long run.
>>
>> I understand that implementing new features is more fun than bug fixing. However, bug fixing is what makes users trust a platform. I see too many people around me losing faith in Ceph at the moment and starting to treat it as a second- or third-class storage system. This is largely due to the short support interval given the actual complexity of the software. Establishing an LTS branch could win back sceptical admins who have started looking for alternatives.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io