Well Herbert,

as Paul mentioned, you should reconfigure the backfill threshold of your OSDs first and reweight second. Paul has already sent you some hints.

Jewel Documentation:

http://docs.ceph.com/docs/jewel/rados/

osd backfill full ratio

Description: Refuse to accept backfill requests when the Ceph OSD Daemon’s full ratio is above this value.
Type:    Float
Default: 0.85
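
For illustration, a minimal sketch of how that override could look in ceph.conf on each OSD host (the 0.9 value is my suggestion below, not the shipped default):

  [osd]
  # raise the backfill cutoff from the 0.85 default; keep it below
  # "mon osd full ratio" (0.95 by default) so backfill stops before
  # the cluster hits the hard full limit
  osd backfill full ratio = 0.9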


You could put this into your config with a value of 0.9 on all OSD servers and restart the OSD daemons. Don't forget "ceph osd set noout" before the restarts and "ceph osd unset noout" afterwards; the resync should then start instantly. Now reweight osds 0, 1 and 2 to a value like 0.9:
"ceph osd reweight 1 0.9" and so on (see the sketch below).

Herbert, you really should extend your cluster, and/or evacuate your data and rebuild it from scratch.

Cheers,

Vadim

On 12.06.2018 16:42, Steininger, Herbert wrote:
Hi,

Thanks, guys, for your answers.

'ceph osd df' gives me:
[root@pcl241 ceph]# ceph osd df
ID WEIGHT   REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
  1 18.18999  1.00000 18625G 15705G  2919G 84.32 1.04 152
  0 18.18999  1.00000 18625G 15945G  2680G 85.61 1.06 165
  3 18.18999  1.00000 18625G 14755G  3870G 79.22 0.98 162
  4 18.18999  1.00000 18625G 14503G  4122G 77.87 0.96 158
  2 18.18999  1.00000 18625G 15965G  2660G 85.72 1.06 165
  5 18.18999  1.00000 21940G 16054G  5886G 73.17 0.91 159
                TOTAL   112T 92929G 22139G 80.76
MIN/MAX VAR: 0.91/1.06  STDDEV: 4.64


And

[root@pcl241 ceph]# ceph osd df tree
ID  WEIGHT    REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS TYPE NAME
  -1 109.13992        -      0      0      0     0    0   0 root default
  -2         0        -      0      0      0     0    0   0     host A1214-2950-01
  -3         0        -      0      0      0     0    0   0     host A1214-2950-02
  -4         0        -      0      0      0     0    0   0     host A1214-2950-04
  -5         0        -      0      0      0     0    0   0     host A1214-2950-05
  -6         0        -      0      0      0     0    0   0     host A1214-2950-03
  -7  18.18999        - 18625G 15705G  2919G 84.32 1.04   0     host cuda002
   1  18.18999  1.00000 18625G 15705G  2919G 84.32 1.04 152         osd.1
  -8  18.18999        - 18625G 15945G  2680G 85.61 1.06   0     host cuda001
   0  18.18999  1.00000 18625G 15945G  2680G 85.61 1.06 165         osd.0
  -9  18.18999        - 18625G 14755G  3870G 79.22 0.98   0     host cuda005
   3  18.18999  1.00000 18625G 14755G  3870G 79.22 0.98 162         osd.3
-10  18.18999        - 18625G 14503G  4122G 77.87 0.96   0     host cuda003
   4  18.18999  1.00000 18625G 14503G  4122G 77.87 0.96 158         osd.4
-11  18.18999        - 18625G 15965G  2660G 85.72 1.06   0     host cuda004
   2  18.18999  1.00000 18625G 15965G  2660G 85.72 1.06 165         osd.2
-12  18.18999        - 21940G 16054G  5886G 73.17 0.91   0     host A1214-2950-06
   5  18.18999  1.00000 21940G 16054G  5886G 73.17 0.91 159         osd.5
-13         0        -      0      0      0     0    0   0     host pe9
                  TOTAL   112T 92929G 22139G 80.76
MIN/MAX VAR: 0.91/1.06  STDDEV: 4.64
[root@pcl241 ceph]#


Is it wise to reduce the weight?
Thanks,
Best,
Herbert



-----Original Message-----
From: ceph-users [mailto:[email protected]] On Behalf Of Vadim Bulst
Sent: Tuesday, June 12, 2018 11:16
To: [email protected]
Subject: Re: [ceph-users] Problems with CephFS

Hi Herbert,

could you please run "ceph osd df"?

Cheers,

Vadim


On 12.06.2018 11:06, Steininger, Herbert wrote:
Hi Guys,

I've inherited a CephFS cluster, and I'm fairly new to CephFS.
The cluster was down and I somehow managed to bring it up again.
But now there are some problems that I can't fix that easily.
This is what 'ceph -s' gives me:
[root@pcl241 ceph]# ceph -s
      cluster cde1487e-f930-417a-9403-28e9ebf406b8
       health HEALTH_WARN
              2 pgs backfill_toofull
              1 pgs degraded
              1 pgs stuck degraded
              2 pgs stuck unclean
              1 pgs stuck undersized
              1 pgs undersized
              recovery 260/29731463 objects degraded (0.001%)
              recovery 798/29731463 objects misplaced (0.003%)
              2 near full osd(s)
              crush map has legacy tunables (require bobtail, min is firefly)
              crush map has straw_calc_version=0
       monmap e8: 3 mons at {cephcontrol=172.22.12.241:6789/0,slurmbackup=172.22.20.4:6789/0,slurmmaster=172.22.20.3:6789/0}
              election epoch 48, quorum 0,1,2 cephcontrol,slurmmaster,slurmbackup
        fsmap e2288: 1/1/1 up {0=pcl241=up:active}
       osdmap e10865: 6 osds: 6 up, 6 in; 2 remapped pgs
              flags nearfull
        pgmap v14103169: 320 pgs, 3 pools, 30899 GB data, 9678 kobjects
              92929 GB used, 22139 GB / 112 TB avail
              260/29731463 objects degraded (0.001%)
              798/29731463 objects misplaced (0.003%)
                   316 active+clean
                     2 active+clean+scrubbing+deep
                     1 active+undersized+degraded+remapped+backfill_toofull
                     1 active+remapped+backfill_toofull
[root@pcl241 ceph]#


[root@pcl241 ceph]# ceph osd tree
ID  WEIGHT    TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
   -1 109.13992 root default
   -2         0     host A1214-2950-01
   -3         0     host A1214-2950-02
   -4         0     host A1214-2950-04
   -5         0     host A1214-2950-05
   -6         0     host A1214-2950-03
   -7  18.18999     host cuda002
    1  18.18999         osd.1               up  1.00000          1.00000
   -8  18.18999     host cuda001
    0  18.18999         osd.0               up  1.00000          1.00000
   -9  18.18999     host cuda005
    3  18.18999         osd.3               up  1.00000          1.00000
-10  18.18999     host cuda003
    4  18.18999         osd.4               up  1.00000          1.00000
-11  18.18999     host cuda004
    2  18.18999         osd.2               up  1.00000          1.00000
-12  18.18999     host A1214-2950-06
    5  18.18999         osd.5               up  1.00000          1.00000
-13         0     host pe9




Could someone please point me in the right direction on how to fix these problems?
It seems that two OSDs are nearly full, but how can I solve that if I don't have additional hardware available?
It also seems that the cluster is running mixed Ceph versions (Hammer and Jewel); how do I solve that?
Ceph (mds/mon/osd) is running on Scientific Linux.
If more info is needed, just let me know.

Thanks in Advance,
Steininger Herbert

---
Herbert Steininger
Head of IT
Administrator
Max-Planck-Institut für Psychiatrie - EDV
Kraepelinstr. 2-10
80804 München
Tel    +49 (0)89 / 30622-368
Mail   [email protected]
Web    http://www.psych.mpg.de



--
Vadim Bulst

Universität Leipzig / URZ
04109  Leipzig, Augustusplatz 10

phone: +49-341-97-33380
mail:    [email protected]

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com