Re: [ceph-users] Bug in OSD Maps

2017-07-30 Thread Stuart Harland
I know this thread has been silent for a while; however, due to various reasons, I have been forced to work specifically on this issue this weekend. As it turns out, you were partly right: the fix for this state is to use ceph-objectstore, although it was not to remove the PG in question, but rather to
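[Editor's note: the preview is cut off before naming the actual operation, so the commands below are only a hedged sketch of how ceph-objectstore-tool is typically pointed at an OSD's store before any repair; the OSD id, paths, and pgid are placeholders, not taken from the thread.]

    # Stop the OSD first; ceph-objectstore-tool needs exclusive access to the store.
    systemctl stop ceph-osd@12                      # osd.12 is a placeholder

    # List the PGs held by this OSD (FileStore layout, as used on Jewel)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal --op list-pgs

    # Export a suspect PG to a file before doing anything destructive
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 1.2f --op export --file /root/pg.1.2f.export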

Re: [ceph-users] Bug in OSD Maps

2017-05-29 Thread Vincent Godin
We had a similar problem a few months ago when migrating from Hammer to Jewel. We ran into some old bugs (which had been declared closed on Hammer!). We had some OSDs refusing to start because of a missing pg map, like yours, and some others which were completely busy and started declaring valid OSDs lost =>
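[Editor's note: the message does not say how this was resolved; the flags below are only a hedged sketch of the guards commonly set during an upgrade so that slow or busy OSDs are not marked down/out by their peers.]

    ceph osd set noout     # do not mark stopped OSDs "out" (avoids rebalancing)
    ceph osd set nodown    # do not let peers mark busy OSDs "down"
    # ... upgrade / restart the OSDs ...
    ceph osd unset nodown
    ceph osd unset noout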

Re: [ceph-users] Bug in OSD Maps

2017-05-26 Thread Gregory Farnum
On Fri, May 26, 2017 at 3:05 AM Stuart Harland < s.harl...@livelinktechnology.net> wrote: > Could you elaborate about what constitutes deleting the PG in this > instance, is a simple `rm` of the directories with the PG number in current > sufficient? or does it need some poking of anything else? >

Re: [ceph-users] Bug in OSD Maps

2017-05-26 Thread Stuart Harland
Unfortunately that is the side effect of 2400 disks, plus a patch-level increment mid-upgrade, plus a massive injection of disks (with the associated rebalancing act) owing to an I/O performance drop-off from the filesystems filling up. − Stuart Harland: Infrastructure Engineer Email: s.harl...@livelinktechnology.net

Re: [ceph-users] Bug in OSD Maps

2017-05-26 Thread David Turner
Are those all currently running versions? You should always run your cluster on the exact same version. On Fri, May 26, 2017 at 6:05 AM Stuart Harland < s.harl...@livelinktechnology.net> wrote: > Could you elaborate about what constitutes deleting the PG in this > instance, is a simple `rm` of t
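[Editor's note: a minimal sketch of how to check which version each daemon is actually running on a Jewel-era cluster; the aggregate `ceph versions` summary only arrived in later releases.]

    ceph tell osd.* version    # reports the running version of every OSD daemon
    ceph -v                    # on each host, the installed package version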

Re: [ceph-users] Bug in OSD Maps

2017-05-26 Thread Stuart Harland
Could you elaborate on what constitutes deleting the PG in this instance? Is a simple `rm` of the directories with the PG number in current sufficient, or does it need some poking of anything else? It is conceivable that there is a fault with the disks; they are known to be ‘faulty’ in the g
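[Editor's note: not the answer given in the thread, just a hedged sketch of how a PG is usually removed with ceph-objectstore-tool rather than a raw `rm`, so the omap/leveldb metadata is cleaned up along with the on-disk directories. The OSD id and pgid are placeholders; newer releases additionally require --force on the remove.]

    systemctl stop ceph-osd@7
    # keep a copy of the PG before removing it
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --journal-path /var/lib/ceph/osd/ceph-7/journal \
        --pgid 1.2f --op export --file /root/pg.1.2f.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --journal-path /var/lib/ceph/osd/ceph-7/journal \
        --pgid 1.2f --op remove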

Re: [ceph-users] Bug in OSD Maps

2017-05-25 Thread Gregory Farnum
On Thu, May 25, 2017 at 8:39 AM Stuart Harland < s.harl...@livelinktechnology.net> wrote: > Has no-one any idea about this? If needed I can produce more information > or diagnostics on request. I find it hard to believe that we are the only > people experiencing this, and thus far we have lost abo

Re: [ceph-users] Bug in OSD Maps

2017-05-25 Thread Stuart Harland
Has no-one any idea about this? If needed I can produce more information or diagnostics on request. I find it hard to believe that we are the only people experiencing this, and thus far we have lost about 40 OSDs to corruption due to this. Regards Stuart Harland > On 24 May 2017, at 10:32,

[ceph-users] Bug in OSD Maps

2017-05-24 Thread Stuart Harland
Hello, I think I’m running into a bug described at http://tracker.ceph.com/issues/14213 for Hammer. However, I’m running the latest version of Jewel, 10.2.7, although I’m in the middle of upgrading the cluster (from 10.2.5). At first it was on a couple of nodes, but now it seems to be mor
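[Editor's note: a minimal sketch of how one might compare the cluster's current OSD map epoch with the range of maps a suspect OSD is holding; osd.3 is a placeholder, not an OSD mentioned in the thread.]

    ceph osd dump | head -1       # prints "epoch <N>" for the cluster's osdmap
    ceph daemon osd.3 status      # shows "oldest_map" / "newest_map" held by that OSD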