Hi, I would like to second Nico's comment. What happened to the idea that a 
deployment tool should be idempotent? The most natural option would be:

1) start install -> something fails
2) fix problem
3) repeat the exact same deploy command -> deployment picks up at the current 
state (including cleaning up failed-state markers) and tries to continue until 
the next issue (go to 2)
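
As a rough illustration of steps 1-3, here is a minimal Python sketch of a resumable, idempotent deploy loop. Everything in it is hypothetical: `run_deploy`, the step list, and the JSON state file are made up for the example and are not part of cephadm.

```python
import json
from pathlib import Path


def run_deploy(steps, state_file):
    """Run steps in order, skipping those an earlier run already completed.

    `steps` is a list of (name, callable) pairs. `state_file` is a
    hypothetical JSON file that records progress between runs.
    Returns True when every step is done, False when a step failed.
    """
    path = Path(state_file)
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"done": [], "failed": None}

    # Clean up the failure marker left behind by a previous run.
    state["failed"] = None

    for name, action in steps:
        if name in state["done"]:
            continue  # completed earlier, do not repeat it
        try:
            action()
        except Exception as exc:
            # Record where we stopped so the next run can resume here.
            state["failed"] = f"{name}: {exc}"
            path.write_text(json.dumps(state))
            return False
        state["done"].append(name)
        path.write_text(json.dumps(state))
    return True
```

Calling `run_deploy` again with the same arguments after fixing the problem resumes at the failed step. A further call after success does nothing, which is the idempotent behaviour described above.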

I'm not sure (meaning: it's a terrible idea) whether it's a good idea to provide 
a single command to wipe a cluster, if only because of fat-finger syndrome. This 
seems safe only if it were possible to mark a cluster as production somehow (the 
mark must be sticky, that is, it cannot be unset), which would prevent a cluster 
destroy command (or any similarly dangerous command) from executing. I 
understand the test case in the tracker, but having such test-case utilities 
that can run on a production cluster and destroy everything seems a bit 
dangerous.
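
To make the "sticky production mark" concrete, here is a minimal Python sketch. The `Cluster` class, `mark_production`, and `destroy` are hypothetical names for illustration only. No such flag is claimed to exist in cephadm, and a real implementation would have to store the mark persistently in the cluster itself.

```python
class ProductionClusterError(RuntimeError):
    """Raised when a destructive command targets a production cluster."""


class Cluster:
    def __init__(self, name):
        self.name = name
        self._production = False

    @property
    def production(self):
        # Read-only: the property has no setter, so `c.production = False`
        # raises AttributeError.
        return self._production

    def mark_production(self):
        # One-way switch. There is deliberately no method to unset it.
        self._production = True

    def destroy(self):
        # Guard every dangerous operation behind the sticky mark.
        if self._production:
            raise ProductionClusterError(
                f"refusing to destroy production cluster {self.name!r}"
            )
        return f"cluster {self.name} destroyed"
```

With this guard, a test harness can still call `destroy()` on throwaway clusters, while the same call on a marked cluster fails before anything is touched.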

I think destroying a cluster should be a manual and tedious process and 
figuring out how to do it should be part of the learning experience. So my 
answer to "how do I start over" would be "go figure it out, it's an important 
lesson".

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Nico Schottelius <nico.schottel...@ungleich.ch>
Sent: Friday, May 26, 2023 10:40 PM
To: Redouane Kachach
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Seeking feedback on Improving cephadm bootstrap process


Hello Redouane,

much appreciated kick-off for improving cephadm. I was wondering why
cephadm does not use a similar approach to rook in the sense of "repeat
until it is fixed?"

For the background, rook uses a controller that checks the state of the
cluster, the state of monitors, whether there are disks to be added,
etc. It periodically restarts the checks and when needed shifts
monitors, creates OSDs, etc.

My question is, why not have a daemon or checker subcommand of cephadm
that a) checks what the current cluster status is (e.g. cephadm
verify-cluster) and b) fixes the situation (e.g. cephadm
verify-and-fix-cluster)?
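
The check-then-fix idea can be sketched as a small reconcile loop in Python. The function names mirror the subcommand names proposed above, which are suggestions and not existing cephadm commands. The `observe` and `fix` callables and the dictionary state format are assumptions made for the example.

```python
def verify_cluster(desired, observe):
    """Return a list of (component, wanted, found) differences.

    `desired` maps a component name to the wanted count, and `observe()`
    returns the same kind of mapping for the current cluster state.
    """
    actual = observe()
    return [
        (component, wanted, actual.get(component, 0))
        for component, wanted in sorted(desired.items())
        if actual.get(component, 0) != wanted
    ]


def verify_and_fix_cluster(desired, observe, fix, max_rounds=5):
    """Repeat check and fix until the cluster matches `desired`.

    `fix(component, wanted, found)` is expected to move one component
    closer to the wanted state. Returns True once nothing differs,
    False if differences remain after `max_rounds` rounds.
    """
    for _ in range(max_rounds):
        diffs = verify_cluster(desired, observe)
        if not diffs:
            return True
        for component, wanted, found in diffs:
            fix(component, wanted, found)
    return not verify_cluster(desired, observe)
```

Because each round starts from a fresh observation, running it again after a partial failure simply continues from whatever state the cluster is in.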

I think that option would be much more beneficial than the other two
suggested ones.

Best regards,

Nico


--
Sustainable and modern Infrastructures by ungleich.ch
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io