[ceph-users] ceph upgrade from hammer to jewel

2017-02-23 Thread gjprabu
Hi Team, We upgraded our ceph version from 0.94.9 hammer to 10.2.5 jewel. Some clients are still showing the older version while mounting with debug mode; does this cause any issue with the OSDs and MONs? How can we find a solution? New version and properly working client: root@172.20.25.162

Re: [ceph-users] ceph upgrade from hammer to jewel

2017-02-23 Thread jiajia zhong
are you sure you have ceph-fuse upgraded? #ceph-fuse --version 2017-02-23 16:07 GMT+08:00 gjprabu : > Hi Team, > > We upgraded ceph version from 0.94.9 hammer to 10.2.5 jewel . > Still some clients are showing older version while mounting with debug > mode, is this caused any issue w
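
For reference, a quick way to compare versions on the clients and on the daemons (a rough sketch; the osd.* wildcard form may depend on your release):

  # on each client
  ceph-fuse --version
  ceph --version
  # against the running daemons
  ceph tell 'osd.*' version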

[ceph-users] Authentication error CEPH installation

2017-02-23 Thread Chaitanya Ravuri
Hi Team, I have recently deployed a new CEPH cluster on OEL6 boxes for my testing. I am getting the below error on the admin host and am not sure how I can fix it. 2017-02-23 02:13:04.166366 7f9c85efb700 0 librados: client.admin authentication error (1) Operation not permitted Error connecting to cluster

Re: [ceph-users] Authentication error CEPH installation

2017-02-23 Thread Brad Hubbard
You need ceph.client.admin.keyring in /etc/ceph/. On Thu, Feb 23, 2017 at 8:13 PM, Chaitanya Ravuri wrote: > Hi Team, > > I have recently deployed a new CEPH cluster for OEL6 boxes for my testing. I > am getting below error on the admin host. not sure how can i fix it. > > 2017-02-23 02:13:04.1663
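
For reference, a minimal sketch of getting the admin keyring onto the admin host, assuming you run the first command from a node that already has working admin credentials (a mon or the deploy node) and then copy the file over:

  ceph auth get client.admin -o /etc/ceph/ceph.client.admin.keyring
  chmod 600 /etc/ceph/ceph.client.admin.keyring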

[ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg
I have a RADOS pool with nearly a million objects in it--but I don't exactly know how many, and that's the point. I ran a long list_objects() overnight and, at first glance this morning, the output looks good, but it is thousands of objects fewer than get_stats() said are there. I am just doin
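
For reference, a rough way to compare the two numbers from the command line (the pool name is a placeholder):

  rados -p mypool ls | wc -l    # count from an actual listing
  rados df                      # per-pool object count from the pool stats
  ceph df detail                # same stats, different view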

Re: [ceph-users] radosgw-admin bucket check kills SSD disks

2017-02-23 Thread Marius Vaitiekunas
On Wed, Feb 22, 2017 at 4:06 PM, Marius Vaitiekunas < mariusvaitieku...@gmail.com> wrote: > Hi Cephers, > > We are running latest jewel (10.2.5). Bucket index sharding is set to 8. > rgw pools except data are placed on SSD. > Today I've done some testing and run bucket index check on a bucket with

Re: [ceph-users] Bug maybe: osdmap failed undecoded

2017-02-23 Thread huang jun
you can copy that osdmap file from osd.1 (where it is intact) over the corrupt one and then restart the osd; we ran into this before, and that worked for us. 2017-02-23 22:33 GMT+08:00 tao chang : > HI, > > I have a ceph cluster (ceph 10.2.5) with 3 nodes, each has two osds. > > It was a power outage last night and all the server are resta
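
For reference, one way to do this with ceph-objectstore-tool instead of copying files by hand (a sketch only: the data paths, OSD ids and epoch are placeholders, both OSDs must be stopped first, and the exact options may differ slightly between releases):

  # pull the map for the affected epoch from a healthy OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
      --op get-osdmap --epoch 1234 --file /tmp/osdmap.1234
  # inject it into the OSD with the corrupt copy, then restart that OSD
  # (some releases may want --force to overwrite an existing map)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
      --op set-osdmap --epoch 1234 --file /tmp/osdmap.1234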

Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg
On 02/23/2017 07:43 AM, Kent Borg wrote: I ran a long list_objects() overnight and, at first glance this morning, the output looks good, but it is thousands of objects fewer than get_stats() said are there. Update: I scripted up a quick check and every object name I would expect to be in my p

Re: [ceph-users] ceph upgrade from hammer to jewel

2017-02-23 Thread gjprabu
Hi zhong, Yes, one of the clients had not been upgraded to the new ceph-fuse version; now it's working, thank you. Regards, Prabu GJ On Thu, 23 Feb 2017 15:08:42 +0530 zhong2p...@gmail.com wrote: are you sure you have ceph-fuse upgraded? #ceph-fuse --version 2017-02-23 16:07 GMT+08:00 gjprabu : Hi T

Re: [ceph-users] PG stuck peering after host reboot

2017-02-23 Thread george.vasilakakos
Since we need this pool to work again, we decided to take the data loss and try to move on. So far, no luck. We tried a force create but, as expected, with a PG that is not peering this did absolutely nothing. We also tried rm-past-intervals and remove from ceph-objectstore-tool and manually de
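
For reference, if you do remove a PG copy with ceph-objectstore-tool, taking an export first is cheap insurance (a sketch; the data path and pgid are placeholders, the OSD must be stopped, and newer releases may also require --force on remove):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --pgid 1.2a --op export --file /tmp/1.2a.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --pgid 1.2a --op remove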

Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Gregory Farnum
On Thu, Feb 23, 2017 at 6:55 AM, Kent Borg wrote: > On 02/23/2017 07:43 AM, Kent Borg wrote: >> >> I ran a long list_objects() overnight and, at first glance this morning, >> the output looks good, but it is thousands of objects fewer than get_stats() >> said are there. > > > Update: I scripted up

Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg
On 02/23/2017 02:13 PM, Gregory Farnum wrote: Did you run a pg split or something? That's the only off-hand way I can think of the number of objects going over, though I don't recall how snapshots impact those numbers and obviously it's very wonky if you were to use a cache tier. We did increa

Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Gregory Farnum
Yeah, that's why. It'll fix itself once all the newly-split PGs have scrubbed, but in order to keep the splitting operation constant-time it has to estimate how many objects ended up in each of the new ones. -Greg On Thu, Feb 23, 2017 at 11:26 AM Kent Borg wrote: > On 02/23/2017 02:13 PM, Gregor

Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg
On 02/23/2017 02:51 PM, Gregory Farnum wrote: Yeah, that's why. It'll fix itself once all the newly-split PGS have scrubbed, but in order to keep the splitting operation constant-time it has to estimate how many objects ended up in each of the new ones. That makes some sense. Thanks! While I

Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Gregory Farnum
On Thu, Feb 23, 2017 at 12:11 PM Kent Borg wrote: > On 02/23/2017 02:51 PM, Gregory Farnum wrote: > > Yeah, that's why. It'll fix itself once all the newly-split PGS have > > scrubbed, but in order to keep the splitting operation constant-time > > it has to estimate how many objects ended up in e

Re: [ceph-users] get_stats() on pool gives wrong number?

2017-02-23 Thread Kent Borg
On 02/23/2017 03:13 PM, Gregory Farnum wrote: If your PG count isn't a power of two, some of them will have double the number of objects of the others. It mostly doesn't matter, though at low counts it can improve balance. There's no breakage that Ceph cares about. -Greg Good to know. This s
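
For reference, a tiny illustration of why a non-power-of-two PG count behaves this way, based on the stable-mod placement Ceph uses (a simplified sketch, not the actual Ceph code): with pg_num = 12 the hash mask is 15, so hash residues 12-15 fold back onto PGs 4-7 and those four PGs cover twice the hash space:

  pg_num=12; mask=15
  for x in $(seq 0 $mask); do
      m=$(( x & mask ))
      if [ $m -lt $pg_num ]; then echo $m; else echo $(( x & (mask >> 1) )); fi
  done | sort -n | uniq -c    # PGs 4-7 show up twice, the rest once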

[ceph-users] Random Health_warn

2017-02-23 Thread Scottix
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367) We are seeing some weird behavior and are not sure how to diagnose what could be going on. We started monitoring the overall_status from the json query, and every once in a while we would get a HEALTH_WARN for a minute or two. Monitoring logs.

Re: [ceph-users] Random Health_warn

2017-02-23 Thread Robin H. Johnson
On Thu, Feb 23, 2017 at 09:49:21PM +, Scottix wrote: > ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367) > > We are seeing a weird behavior or not sure how to diagnose what could be > going on. We started monitoring the overall_status from the json query and > every once in a whil

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Gregory Farnum
On Thu, Feb 16, 2017 at 9:19 AM, Benjeman Meekhof wrote: > I tried starting up just a couple OSD with debug_osd = 20 and > debug_filestore = 20. > > I pasted a sample of the ongoing log here. To my eyes it doesn't look > unusual but maybe someone else sees something in here that is a > problem:
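
For reference, the usual ways to turn those debug levels on (a sketch; the osd id is a placeholder):

  # at runtime, per daemon
  ceph tell osd.0 injectargs '--debug_osd 20 --debug_filestore 20'
  # or persistently in ceph.conf under [osd], then restart the daemon:
  #   debug osd = 20
  #   debug filestore = 20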

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Benjeman Meekhof
Hi Greg, Appreciate you looking into it. I'm concerned about CPU power per daemon as well...though we never had this issue when restarting our dense nodes under Jewel. Is the rapid rate of OSDmap generation a one-time condition particular to post-update processing or to Kraken in general? We di

Re: [ceph-users] Jewel to Kraken OSD upgrade issues

2017-02-23 Thread Gregory Farnum
On Thu, Feb 23, 2017 at 2:34 PM, Benjeman Meekhof wrote: > Hi Greg, > > Appreciate you looking into it. I'm concerned about CPU power per > daemon as well...though we never had this issue when restarting our > dense nodes under Jewel. Is the rapid rate of OSDmap generation a > one-time condition

Re: [ceph-users] Random Health_warn

2017-02-23 Thread Scottix
Ya, the ceph-mon.$ID.log. I was running ceph -w when one of them occurred too and it never output anything. Here is a snippet for the 5:11AM occurrence. On Thu, Feb 23, 2017 at 1:56 PM Robin H. Johnson wrote: > On Thu, Feb 23, 2017 at 09:49:21PM +, Scottix wrote: > > ceph version 10.2.5

Re: [ceph-users] Random Health_warn

2017-02-23 Thread John Spray
On Thu, Feb 23, 2017 at 9:49 PM, Scottix wrote: > ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367) > > We are seeing a weird behavior or not sure how to diagnose what could be > going on. We started monitoring the overall_status from the json query and > every once in a while we woul

Re: [ceph-users] Random Health_warn

2017-02-23 Thread Robin H. Johnson
On Thu, Feb 23, 2017 at 10:40:31PM +, Scottix wrote: > Ya the ceph-mon.$ID.log > > I was running ceph -w when one of them occurred too and it never output > anything. > > Here is a snippet for the the 5:11AM occurrence. Yep, I don't see anything in there that should have triggered HEALTH_WARN

Re: [ceph-users] Random Health_warn

2017-02-23 Thread David Turner
There are multiple approaches to give you more information about the health state. The CLI has these two options: ceph health detail and ceph status. I also like using ceph-dash ( https://github.com/Crapworks/ceph-dash ). It has an associated nagios check to scrape the ceph-dash page. I personally do `

Re: [ceph-users] Random Health_warn

2017-02-23 Thread Scottix
That sounds about right, I do see blocked requests sometimes when it is under really heavy load. Looking at some examples, I think "summary" should list the issues: "summary": [], "overall_status": "HEALTH_OK". I'll try logging that too. Scott On Thu, Feb 23, 2017 at 3:00 PM David Turner wrote:
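
For reference, a rough sketch of logging just those two fields from the status JSON on jewel (assuming jq is available; the exact JSON paths can differ between releases):

  ceph status --format json | jq -c '{status: .health.overall_status, summary: .health.summary}'
  # and when a warning does show up:
  ceph health detail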

Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
On Thu, Feb 23, 2017 at 5:18 PM, Schlacta, Christ wrote: > So I updated suse leap, and now I'm getting the following error from > ceph. I know I need to disable some features, but I'm not sure what > they are.. Looks like 14, 57, and 59, but I can't figure out what > they correspond to, nor ther

Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
aarcane@densetsu:~$ ceph --cluster rk osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 1,
    "straw_calc_version": 1,
    "allowed_bucket

Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
On Fri, Feb 24, 2017 at 11:00 AM, Schlacta, Christ wrote: > aarcane@densetsu:~$ ceph --cluster rk osd crush show-tunables > { > "choose_local_tries": 0, > "choose_local_fallback_tries": 0, > "choose_total_tries": 50, > "chooseleaf_descend_once": 1, > "chooseleaf_vary_r": 1, >

[ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
-- Forwarded message -- From: Schlacta, Christ Date: Thu, Feb 23, 2017 at 5:56 PM Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph. To: Brad Hubbard They're from the suse leap ceph team. They maintain ceph, and build up to date versions for suse leap. What I d

[ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
-- Forwarded message -- From: Schlacta, Christ Date: Thu, Feb 23, 2017 at 6:06 PM Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph. To: Brad Hubbard So setting the above to 0 by sheer brute force didn't work, so it's not a crush or osd problem. Also, the errors

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Is your change reflected in the current crushmap? On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ wrote: > -- Forwarded message -- > From: Schlacta, Christ > Date: Thu, Feb 23, 2017 at 6:06 PM > Subject: Re: [ceph-users] Upgrade Woes on suse leap with OBS ceph. > To: Brad Hubb

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
insofar as I can tell, yes. Everything indicates that they are in effect. On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard wrote: > Is your change reflected in the current crushmap? > > On Fri, Feb 24, 2017 at 12:07 PM, Schlacta, Christ > wrote: >> -- Forwarded message -- >> From:

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Did you dump out the crushmap and look? On Fri, Feb 24, 2017 at 1:36 PM, Schlacta, Christ wrote: > insofar as I can tell, yes. Everything indicates that they are in effect. > > On Thu, Feb 23, 2017 at 7:14 PM, Brad Hubbard wrote: >> Is your change reflected in the current crushmap? >> >> On Fri
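
For reference, a sketch of dumping and decompiling the crushmap to check the tunables (using the non-default cluster name from this thread):

  ceph --cluster rk osd getcrushmap -o /tmp/crush.bin
  crushtool -d /tmp/crush.bin -o /tmp/crush.txt
  grep '^tunable' /tmp/crush.txt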

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
# typ

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Hmm, What's interesting is the feature set reported by the servers has only changed from e0106b84a846a42 Bit 1 set Bit 6 set Bit 9 set Bit 11 set Bit 13 set Bit 14 set Bit 18 set Bit 23 set Bit 25 set Bit 27 set Bit 30 set Bit 35 set Bit 36 set Bit 37 set Bit 39 set Bit 41 set Bit 42 set Bit 48
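
For reference, a quick sketch for decoding which bits are set in a feature mask like the one above (bash handles the 64-bit hex value directly):

  mask=0xe0106b84a846a42
  for i in $(seq 0 63); do
      if (( (mask >> i) & 1 )); then printf 'Bit %d set\n' "$i"; fi
  done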

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
Kefu has just pointed out that this has the hallmarks of https://github.com/ceph/ceph/pull/13275 On Fri, Feb 24, 2017 at 3:00 PM, Brad Hubbard wrote: > Hmm, > > What's interesting is the feature set reported by the servers has only > changed from > > e0106b84a846a42 > > Bit 1 set Bit 6 set Bit 9

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Schlacta, Christ
So hopefully when the suse ceph team get 11.2 released it should fix this, yes? On Feb 23, 2017 21:06, "Brad Hubbard" wrote: > Kefu has just pointed out that this has the hallmarks of > https://github.com/ceph/ceph/pull/13275 > > On Fri, Feb 24, 2017 at 3:00 PM, Brad Hubbard wrote: > > Hmm, > >

Re: [ceph-users] Fwd: Upgrade Woes on suse leap with OBS ceph.

2017-02-23 Thread Brad Hubbard
On Fri, Feb 24, 2017 at 3:07 PM, Schlacta, Christ wrote: > So hopefully when the suse ceph team get 11.2 released it should fix this, > yes? Definitely not a question I can answer. What I can tell you is the fix is only in master atm, not yet backported to kraken http://tracker.ceph.com/issues/1