[ceph-users] Re: Diskprediction_local mgr module removal - Call for feedback

Anthony D'Atri Thu, 10 Apr 2025 10:59:57 -0700


>> anthonydatri@Mac models % pwd
>> /Users/anthonydatri/git/ceph/src/pybind/mgr/diskprediction_local/models
>> anthonydatri@Mac models % file redhat/*
>> redhat/config.json:           JSON data
>> redhat/hgst_predictor.pkl:    data
>> redhat/hgst_scaler.pkl:       data
>> redhat/seagate_predictor.pkl: data
>> redhat/seagate_scaler.pkl:    data
>> anthonydatri@Mac models %
> 
> These are Python pickle files from 2019 containing ML models made with a 
> version of sklearn from 2019.


Blank stare

IMHO binaries don’t belong in git repositories and the approach kinda sounds 
like trying to be clever and trendy for the sake of being clever and trendy.  
Cf. the KISS principle.  By which I mean keeping it simple, not lip-syncing 
when you should have retired in the 1990s.

I’ve had good luck in the past with an (admittedly ugly) SMART collector that 
dumped harmonized metrics into the textfile_collector directory for 
node_exporter to pick up, then using conventional Alertmanager rules, which are 
easy to write, improve, and tweak for local conditions.
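
For illustration only, a minimal sketch of that pattern (not the collector I actually ran).  It assumes the drive data comes from `smartctl --json -A <device>` and that ATA attributes appear under `ata_smart_attributes.table` with `id`, `name` and `raw.value` keys.  The metric name `smart_attr_raw` and the helpers `render_prom` and `write_prom` are made up for this example.

```python
import os
import tempfile


def render_prom(device, smart_json):
    """Render ATA SMART attributes as Prometheus text-format lines.

    `smart_json` is the parsed JSON from smartctl (layout assumed, see above).
    `smart_attr_raw` is a hypothetical metric name chosen for this sketch.
    """
    lines = [
        "# HELP smart_attr_raw Raw SMART attribute value.",
        "# TYPE smart_attr_raw gauge",
    ]
    table = smart_json.get("ata_smart_attributes", {}).get("table", [])
    for attr in table:
        name = attr["name"].lower()
        raw = attr["raw"]["value"]
        lines.append(
            f'smart_attr_raw{{device="{device}",id="{attr["id"]}",name="{name}"}} {raw}'
        )
    return "\n".join(lines) + "\n"


def write_prom(directory, filename, text):
    """Write via a temp file and rename, so a scrape never sees a partial file."""
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; node_exporter must be able to read it
    os.replace(tmp, os.path.join(directory, filename))
```

Run from cron or a systemd timer, this would write something like `smart.prom` into whatever directory node_exporter was given via `--collector.textfile.directory`.  The harmonizing part (mapping vendor-specific attributes onto common names) is deliberately left out here.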

If kept as a Manager module I could see this being yet another thing hampering 
scalability.

Were we to implement a framework for normalizing metrics for given drive models 
— and honestly that’s what it takes to be useful — the community could PR the 
individual SKU entries over time.  I would draw a line in the sand up front:
no client SKUs will be accepted, no USB/Thunderbolt drives, no HBA/SAN mirages.
Only physical, enterprise drive SKUs.  Client drive failures are trivially
predicted as simply SOON.
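
To make the idea concrete, here is a rough sketch of what a per-SKU entry and the acceptance rule could look like.  Nothing like this exists in Ceph today; `SkuEntry`, `accept`, `normalize`, the field names and the example model string are all hypothetical, and the attribute ID mapping is illustrative rather than taken from any vendor datasheet.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkuEntry:
    """One community-contributed drive model entry (hypothetical schema)."""
    model: str
    interface: str              # e.g. "sata", "sas", "nvme", "usb"
    drive_class: str            # "enterprise" or "client"
    virtual: bool = False       # True for HBA/SAN-presented volumes
    # Vendor attribute ID -> normalized metric name for this SKU.
    attr_map: dict = field(default_factory=dict)


REJECTED_INTERFACES = {"usb", "thunderbolt"}


def accept(entry):
    """Apply the up-front rule: physical enterprise drives only."""
    return (
        entry.drive_class == "enterprise"
        and not entry.virtual
        and entry.interface not in REJECTED_INTERFACES
    )


def normalize(entry, raw_attrs):
    """Translate raw {attr_id: value} into {normalized_name: value}.

    Attributes the SKU entry does not map are dropped.
    """
    return {
        entry.attr_map[attr_id]: value
        for attr_id, value in raw_attrs.items()
        if attr_id in entry.attr_map
    }
```

A PR adding a drive would then amount to one new entry, and alert rules could be written once against the normalized names instead of per vendor.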

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
