> On 09.11.2021 at 00:01, Igor Fedotov <igor.fedo...@croit.io> wrote:
>
> Hi folks,
>
> having an LTS release cycle could be a great topic for the upcoming "Ceph User + Dev Monthly meeting".
>
> The first one is scheduled on November 18, 2021, 14:00-15:00 UTC:
>
> https://pad.ceph.com/p/ceph-user-dev-monthly-minutes
>
> Any volunteers to extend the agenda and advocate the idea?
Hi Igor, do you still think we can add the LTS topic to the agenda? I will attend tomorrow and can try to advocate it.

Best,
Peter

> Thanks,
> Igor
>
>> On 11/8/2021 3:21 PM, Frank Schilder wrote:
>> Hi all,
>>
>> I followed this thread with great interest and would like to add my opinions/experiences/wishes as well.
>>
>> I believe the question of packages versus containers needs a bit more context to be really meaningful. This was already mentioned several times with regard to documentation. I see the following three topics as tightly connected (my opinions/answers included):
>>
>> 1. Distribution: Packages are compulsory, containers are optional.
>> 2. Deployment: cephadm (yet another deployment framework) and Ceph (the actual storage system) should be strictly separate projects.
>> 3. Release cycles: The release cadence is way too fast; I very much miss a Ceph LTS branch with at least 10 years of back-port support.
>>
>> These are my short answers/wishes/expectations in this context. Below I add some more reasoning as optional reading (warning: wall of text ahead).
>>
>>
>> 1. Distribution
>> ---------
>>
>> I don't think the question is about packages versus containers, because even if one distribution should decide not to package Ceph any more, other distributors certainly will, and the user community will simply move away from distributions without Ceph packages. In addition, unless Red Hat plans to move to a source-only container where I run the good old configure, make, make install, it will be package-based anyway, so packages are here to stay.
>>
>> Therefore, the way I understand it, this question is about cephadm versus other deployment methods. Here, I think the push towards a container-based, cephadm-only deployment is unlikely to become the number one choice for everyone, for good reasons already mentioned in earlier messages. In addition, I also believe that development of a generic deployment tool is currently not sustainable, as was mentioned by another user. My reasons for this are given in the next section.
>>
>>
>> 2. Deployment
>> ---------
>>
>> In my opinion, it is really important to distinguish three components of any open-source project: development (release cycles), distribution, and deployment. Following the good old philosophy that every tool does exactly one job and does it well, each of these components is a separate project, because each corresponds to a different tool.
>>
>> This immediately implies that the Ceph documentation should not contain documentation about packaging and deployment tools. Each of these ought to be strictly separate. If I have a low-level problem with Ceph and go to the Ceph documentation, I do not want to see cephadm commands. Ceph documentation should be about Ceph (the storage system) only. Such a mix-up leads to problems, and there have already been ceph-users cases where people could not use the documentation for troubleshooting because it showed cephadm commands but their cluster was not deployed with cephadm.
>>
>> In this context, I would prefer a separate cephadm-users list so that ceph-users can focus on actual Ceph problems again.
>>
>> Now to the point that cephadm might be an unsustainable project. Although at first glance the idea of a generic deployment tool that solves all problems with a single command might look appealing, it is likely doomed to fail for a simple reason that was already indicated in an earlier message: Ceph deployment is subject to a complexity paradox. Ceph has a very large configuration space, and implementing and using a generic tool that covers and understands this configuration space is more complex than deploying any specific Ceph cluster, each of which uses only a tiny subset of the entire configuration space.
>>
>> In other words: deploying a specific Ceph cluster is actually not that difficult.
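>>
>> To give a concrete sense of scale: the subset of the configuration space that one small, specific cluster actually exercises can fit on a single screen. A minimal ceph.conf might look roughly like the following (a sketch with made-up values, not a template for any real cluster):
>>
>>   [global]
>>   fsid = <cluster uuid>
>>   mon_host = 10.0.0.1, 10.0.0.2, 10.0.0.3     # example addresses
>>   public_network = 10.0.0.0/24
>>   cluster_network = 10.1.0.0/24               # optional separate replication network
>>   auth_cluster_required = cephx
>>   auth_service_required = cephx
>>   auth_client_required = cephx
>>
>>   [osd]
>>   osd_memory_target = 4294967296              # 4 GiB per OSD, tune to the hardware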
>>
>> Designing a cluster and dimensioning all of its components is what is difficult, and none of the current deployment tools help here. There is not even a check for suitable hardware. In addition, technology is moving fast, and adapting a generic tool to new developments in time seems a hopeless task. For example, when will cephadm natively support collocated LVM OSDs with dm-cache devices? Is it even worth trying to incorporate this?
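>>
>> To illustrate what such a special case involves when done by hand, an OSD of this kind can be prepared with plain LVM and ceph-volume, roughly like this (a sketch only; device names and sizes are made up, and cache sizing, cache mode and failure handling need real thought):
>>
>>   # one volume group spanning the slow data disk and the fast cache partition (example devices)
>>   pvcreate /dev/sdb /dev/nvme0n1p1
>>   vgcreate ceph-osd0 /dev/sdb /dev/nvme0n1p1
>>
>>   # data LV on the HDD, cache pool on the NVMe partition
>>   lvcreate -n data -l 100%PVS ceph-osd0 /dev/sdb
>>   lvcreate --type cache-pool -n cache -L 100G ceph-osd0 /dev/nvme0n1p1
>>
>>   # attach the cache pool to the data LV as a writeback dm-cache
>>   lvconvert --type cache --cachepool ceph-osd0/cache --cachemode writeback ceph-osd0/data
>>
>>   # hand the cached LV to ceph-volume like any other LV-backed OSD
>>   ceph-volume lvm create --bluestore --data ceph-osd0/data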
>>
>> My wish would be to keep the Ceph project clean of any deployment tasks. In my opinion, the basic Ceph tooling is already doing tasks that are the responsibility of a configuration management system rather than a storage system (e.g. deploying unit files by default instead of as an option that is disabled by default).
>>
>>
>> 3. Release cycles
>> ---------
>>
>> Ceph is a complex system and the code is getting more complex every day. It is very difficult to beat the curse of complexity: development and maintenance effort grows non-linearly (exponentially?) with the number of lines of code. As a consequence, (A) if one wants to maintain quality while adding substantial new features, the release intervals become longer and longer. (B) If one wants to maintain constant release intervals while adding substantial new features, the quality will have to go down. The last option is that (C) new releases with constant release intervals contain ever smaller increments in functionality to maintain quality. I ignore the option of throwing more and more qualified developers at the project, as this seems unlikely and also comes with its own complexity cost.
>>
>> I'm afraid we are in scenario B. Ceph is losing its reputation as a rock-solid system.
>>
>> Just recently, there were some ceph-users emails about how dangerous it is (or not) to upgrade to the latest stable Octopus version. The upgrade itself apparently goes well, but what happens then? I personally have too many reports that the latest Ceph versions are quite touchy and collapse in situations that had never been a problem up to Mimic (most prominently, that a simple rebalance operation after adding disks gets OSDs to flap and can take a whole cluster down - plenty of cases since Nautilus). Stability at scale seems to become a real issue with increasing version numbers. I am myself very hesitant to upgrade, in particular because there is no way back and the cycles of potential doom are so short.
>>
>> Therefore, I would very much appreciate the establishment of a Ceph LTS branch with at least 10 years of back-port support, if not longer. In addition, upgrade procedures between LTS versions should allow a downgrade by one version as well (move legacy data along until explicitly allowed to cut all bridges).
>>
>> For any large storage system, robustness, predictability and low maintenance effort are invaluable. For example, our cluster is very demanding compared with our other storage systems: the OSDs have a nasty memory leak, operations get stuck in MONs and MDSes at least once or twice a week due to race conditions, and so on. It is currently not possible to let the cluster run unattended for months or even years, something that is possible, if not the rule, with other (also open-source) storage systems.
>>
>> Fixing bugs that show up rarely and are very difficult to catch is really important for a storage system with theoretically infinite uptime. Rolling versions over all the time and then throwing "xyz is not supported, try with a newer version" at users when they discover a rare problem after running for a few years does not help to get Ceph to a level of stability that will be convincing in the long run.
>>
>> I understand that implementing new features is more fun than bug fixing. However, bug fixing is what makes users trust a platform. I see too many people around me losing faith in Ceph at the moment and starting to treat it as a second- or third-class storage system. This is largely due to the short support interval given the actual complexity of the software. Establishing an LTS branch could win back sceptical admins who have started looking for alternatives.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io