Re: [ceph-users] recovery process stops

2014-10-25 Thread Harald Rößler
Does anyone have an idea how to solve this situation? Thanks for any advice. Kind regards, Harald Rößler > On 23.10.2014 at 18:56, Harald Rößler wrote: > > @Wido: sorry, I don’t fully understand what you mean; I generated some output > which may help. > > > OK, the pool: > > pool 3 'bcf' rep size 3 min_size 1 c

Re: [ceph-users] recovery process stops

2014-10-23 Thread Harald Rößler
@Wido: sorry, I don’t fully understand what you mean; I generated some output which may help. OK, the pool: pool 3 'bcf' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 832 pgp_num 832 last_change 8000 owner 0. All remapped PGs have a temp entry: pg_temp 3.1 [14,20,0] pg_temp

Re: [ceph-users] recovery process stops

2014-10-23 Thread Wido den Hollander
On 10/23/2014 05:33 PM, Harald Rößler wrote: > Hi all, > > the procedure does not work for me; I still have 47 active+remapped PGs. Does anyone > have an idea how to fix this issue? If you look at those PGs using "ceph osd pg dump", what is their prefix? They should start with a number and that number c
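A minimal sketch of the prefix check, assuming PG IDs of the form `<pool>.<pg>` (as in the `pg_temp 3.1` entry quoted elsewhere in this thread, which belongs to pool 3 'bcf'). The function name and the sample IDs other than `3.1` are hypothetical.

```python
from collections import defaultdict


def group_pgs_by_pool(pg_ids):
    """Group PG IDs such as '3.1' by the numeric pool prefix before the dot."""
    groups = defaultdict(list)
    for pg_id in pg_ids:
        pool, _, _ = pg_id.partition(".")
        groups[int(pool)].append(pg_id)
    return dict(groups)


# Only "3.1" appears in the thread; the other IDs are made up for illustration.
sample = ["3.1", "3.2f", "5.a"]
print(group_pgs_by_pool(sample))
```

If every stuck PG lands under the same key, the problem is confined to one pool.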

Re: [ceph-users] recovery process stops

2014-10-23 Thread Harald Rößler
Hi all, the procedure does not work for me; I still have 47 active+remapped PGs. Does anyone have an idea how to fix this issue? @Wido: my cluster now has a usage of less than 80% - thanks for your advice. Harry On 21.10.2014 at 22:38, Craig Lewis <cle...@centraldesktop.com> wrote: In that case

Re: [ceph-users] recovery process stops

2014-10-21 Thread Craig Lewis
In that case, take a look at ceph pg dump | grep remapped. In the up or acting column, there should be one or two common OSDs between the stuck PGs. Try restarting those OSD daemons. I've had a few OSDs get stuck scheduling recovery, particularly around toofull situations. I've also had Robert'
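A rough sketch of the "find the common OSDs" step, assuming the OSD sets of the stuck PGs have already been copied out of the dump by hand as lists of integers. It does not parse `ceph pg dump` output itself, since the column layout varies between releases. The function name is hypothetical, and only `[14,20,0]` comes from this thread; the other sets are invented.

```python
from collections import Counter


def common_osds(osd_sets, top=2):
    """Return the OSDs that appear in the most stuck PGs, most frequent first."""
    counts = Counter(osd for osds in osd_sets for osd in set(osds))
    return counts.most_common(top)


# [14, 20, 0] is the pg_temp 3.1 entry from the thread; the rest is hypothetical.
stuck = [[14, 20, 0], [14, 7, 3], [20, 14, 9]]
print(common_osds(stuck))
```

The OSDs at the top of the list are the candidates for a restart.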

Re: [ceph-users] recovery process stops

2014-10-21 Thread Robert LeBlanc
I've had issues magically fix themselves overnight after waiting/trying things for hours. On Tue, Oct 21, 2014 at 1:02 PM, Harald Rößler wrote: > After more than 10 hours the situation is the same; I don’t think it will fix > itself over time. How can I find out what the problem is? > > > On 21.10.2014

Re: [ceph-users] recovery process stops

2014-10-21 Thread Harald Rößler
After more than 10 hours the situation is the same; I don’t think it will fix itself over time. How can I find out what the problem is? On 21.10.2014 at 17:28, Craig Lewis <cle...@centraldesktop.com> wrote: That will fix itself over time. remapped just means that Ceph is moving the data aroun

Re: [ceph-users] recovery process stops

2014-10-21 Thread Craig Lewis
That will fix itself over time. remapped just means that Ceph is moving the data around. It's normal to see PGs in the remapped and/or backfilling state after OSD restarts. They should go down steadily over time. How long depends on how much data is in the PGs, how fast your hardware is, how ma

Re: [ceph-users] recovery process stops

2014-10-21 Thread Harald Rößler
Hi all, thank you for your support; the file system is no longer degraded. Now I have a negative degraded count :-) 2014-10-21 10:15:22.303139 mon.0 [INF] pgmap v43376478: 3328 pgs: 3281 active+clean, 47 active+remapped; 1609 GB data, 5022 GB used, 1155 GB / 6178 GB avail; 8034B/s rd, 3548KB/s
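A small sketch for reading the numbers out of a pgmap line like the one above, assuming the field layout shown in that line (sizes reported in GB). The function name and regular expressions are illustrative, not part of any Ceph tooling.

```python
import re

# The visible part of the pgmap line quoted in the message above.
LINE = ("pgmap v43376478: 3328 pgs: 3281 active+clean, 47 active+remapped; "
        "1609 GB data, 5022 GB used, 1155 GB / 6178 GB avail")


def parse_pgmap(line):
    """Return (total PGs, {state: count}, used fraction of raw capacity)."""
    total = int(re.search(r"(\d+) pgs:", line).group(1))
    # PG states are lowercase words joined by '+', followed by ',' or ';'.
    states = {name: int(n)
              for n, name in re.findall(r"(\d+) ([a-z_+]+)[,;]", line)}
    used = int(re.search(r"(\d+) GB used", line).group(1))
    size = int(re.search(r"/ (\d+) GB avail", line).group(1))
    return total, states, used / size


total, states, ratio = parse_pgmap(LINE)
print(total, states, f"{ratio:.1%}")
```

For this line the per-state counts add up to the 3328 total, and used capacity is 5022 GB out of 6178 GB.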

Re: [ceph-users] recovery process stops

2014-10-20 Thread Craig Lewis
I've been in a state where reweight-by-utilization was deadlocked (not the daemons, but the remap scheduling). After successive osd reweight commands, two OSDs wanted to swap PGs, but they were both toofull. I ended up temporarily increasing mon_osd_nearfull_ratio to 0.87. That removed the imped
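To illustrate why a small change in the ratio can matter, here is a plain threshold comparison, assuming per-OSD utilization is known as a fraction. This is only arithmetic, not Ceph's actual scheduling logic; the function name and the utilization values are hypothetical, while 0.85 and 0.87 are the ratios mentioned in this thread.

```python
def osds_over_ratio(utilization, ratio):
    """Return the IDs of OSDs whose used fraction is at or above the ratio."""
    return sorted(osd for osd, used in utilization.items() if used >= ratio)


# Hypothetical utilization per OSD ID (fraction of capacity used).
utilization = {0: 0.86, 14: 0.855, 20: 0.79}

print(osds_over_ratio(utilization, 0.85))  # OSDs flagged at the 0.85 ratio
print(osds_over_ratio(utilization, 0.87))  # OSDs flagged at the 0.87 ratio
```

With these sample values, two OSDs sit just above 0.85 and none are above 0.87, which is the kind of gap that lets two nearly full OSDs exchange PGs.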

Re: [ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
Yes, tomorrow I will get the replacement for the failed disk; getting a new node with many disks will take a few days. Any other ideas? Harald Rößler > On 20.10.2014 at 16:45, Wido den Hollander wrote: > > On 10/20/2014 04:43 PM, Harald Rößler wrote: >> Yes, I had some OSDs which were near full

Re: [ceph-users] recovery process stops

2014-10-20 Thread Leszek Master
You can set a lower weight on the full OSDs, or try changing the osd_near_full_ratio parameter in your cluster from 85 to, for example, 89. But I don't know what can go wrong when you do that. 2014-10-20 17:12 GMT+02:00 Wido den Hollander : > On 10/20/2014 05:10 PM, Harald Rößler wrote: > > Yes, tomorrow

Re: [ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
Yes, I agree 100%, but currently every disk has a maximum of 86% usage, so there should be a way to recover the cluster. Setting the near full ratio higher than 85% should only be a short-term solution. New disks for higher capacity are already ordered; I just don’t like a degraded situation for a week o

Re: [ceph-users] recovery process stops

2014-10-20 Thread Wido den Hollander
On 10/20/2014 05:10 PM, Harald Rößler wrote: > Yes, tomorrow I will get the replacement for the failed disk; getting a new > node with many disks will take a few days. > Any other ideas? > If the disks are all full, then, no. Sorry to say this, but it came down to poor capacity management. Never le

Re: [ceph-users] recovery process stops

2014-10-20 Thread Wido den Hollander
On 10/20/2014 04:43 PM, Harald Rößler wrote: > Yes, I had some OSDs which were near full; after that I tried to fix the > problem with "ceph osd reweight-by-utilization", but this did not help. > After that I set the near full ratio to 88% with the idea that the remapping > would fix the issue. A

Re: [ceph-users] recovery process stops

2014-10-20 Thread Harald Rößler
Yes, I had some OSDs which were near full; after that I tried to fix the problem with "ceph osd reweight-by-utilization", but this did not help. After that I set the near full ratio to 88% with the idea that the remapping would fix the issue. Restarting the OSD didn’t help either. At the same ti

Re: [ceph-users] recovery process stops

2014-10-20 Thread Leszek Master
I think it's because you have too-full OSDs, as in the warning message. I had a similar problem recently and I ran: ceph osd reweight-by-utilization. But first read what this command does. It solved the problem for me. 2014-10-20 14:45 GMT+02:00 Harald Rößler : > Dear All > > At the moment I have an iss

Re: [ceph-users] recovery process stops

2014-10-20 Thread Wido den Hollander
On 10/20/2014 02:45 PM, Harald Rößler wrote: > Dear All > > At the moment I have an issue with my cluster. The recovery process stops. > See this: 2 active+degraded+remapped+backfill_toofull 156 pgs backfill_toofull You have one or more OSDs which are too full and that causes recovery to stop.