[ceph-users] pg has invalid (post-split) stats; must scrub before tier agent can activate

2016-05-24 Thread Stillwell, Bryan J
On one of my test clusters that I've upgraded from Infernalis to Jewel (10.2.1), I'm having a problem where reads are resulting in unfound objects. I'm using CephFS on top of an erasure-coded pool with cache tiering, which I believe is related. From what I can piece together, here is what the

[ceph-users] Rebuilding/recreating CephFS journal?

2016-05-27 Thread Stillwell, Bryan J
I have a Ceph cluster at home that I've been running CephFS on for the last few years. Recently my MDS server became damaged, and while attempting to fix it I believe I've destroyed my CephFS journal, based on this: 2016-05-25 16:48:23.882095 7f8d2fac2700 -1 log_channel(cluster) log [ERR] : Error

Re: [ceph-users] Rebuilding/recreating CephFS journal?

2016-05-27 Thread Stillwell, Bryan J
On 5/27/16, 11:27 AM, "Gregory Farnum" wrote: >On Fri, May 27, 2016 at 9:44 AM, Stillwell, Bryan J > wrote: >> I have a Ceph cluster at home that I've been running CephFS on for the >> last few years. Recently my MDS server became damaged and while >> a

Re: [ceph-users] Rebuilding/recreating CephFS journal?

2016-05-27 Thread Stillwell, Bryan J
On 5/27/16, 3:01 PM, "Gregory Farnum" wrote: >> >> So would the next steps be to run the following commands?: >> >> cephfs-table-tool 0 reset session >> cephfs-table-tool 0 reset snap >> cephfs-table-tool 0 reset inode >> cephfs-journal-tool --rank=0 journal reset >> cephfs-data-scan init >> >> c
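For readers following along, the disaster-recovery sequence quoted in this thread looks roughly like the following. This is a sketch reconstructed from the commands above; exact tool flags vary between Ceph releases, and these commands are destructive, so check the CephFS disaster-recovery documentation for your version before running any of them:

```shell
# DESTRUCTIVE: only run as part of CephFS disaster recovery, with all MDS daemons stopped.

# Reset the session, snap, and inode tables for rank 0
cephfs-table-tool 0 reset session
cephfs-table-tool 0 reset snap
cephfs-table-tool 0 reset inode

# Wipe and re-initialize the metadata journal for rank 0
cephfs-journal-tool --rank=0 journal reset

# Prepare for a metadata rebuild scan of the data pool
cephfs-data-scan init
```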

Re: [ceph-users] Rebuilding/recreating CephFS journal?

2016-05-27 Thread Stillwell, Bryan J
mark it as repaired. That's a monitor command. > >On Fri, May 27, 2016 at 2:09 PM, Stillwell, Bryan J > wrote: >> On 5/27/16, 3:01 PM, "Gregory Farnum" wrote: >> >>>> >>>> So would the next steps be to run the following commands?: >

Re: [ceph-users] Rebuilding/recreating CephFS journal?

2016-05-27 Thread Stillwell, Bryan J
On 5/27/16, 3:23 PM, "Gregory Farnum" wrote: >On Fri, May 27, 2016 at 2:22 PM, Stillwell, Bryan J > wrote: >> Here's the full 'ceph -s' output: >> >> # ceph -s >> cluster c7ba6111-e0d6-40e8-b0af-8428e8702df9 >> health HEALT

Re: [ceph-users] pg has invalid (post-split) stats; must scrub before tier agent can activate

2016-06-16 Thread Stillwell, Bryan J
6, 4:27 PM, "Stillwell, Bryan J" wrote: >On one of my test clusters that I've upgraded from Infernalis to Jewel >(10.2.1), and I'm having a problem where reads are resulting in unfound >objects. > >I'm using CephFS on top of an erasure-coded pool with cache tiering which I >believ

[ceph-users] Multi-device BlueStore testing

2016-07-19 Thread Stillwell, Bryan J
I would like to do some BlueStore testing using multiple devices like mentioned here: https://www.sebastien-han.fr/blog/2016/05/04/Ceph-Jewel-configure-BlueStore-with-multiple-devices/ However, si

[ceph-users] Multi-device BlueStore OSDs multiple fsck failures

2016-08-03 Thread Stillwell, Bryan J
I've been doing some benchmarking of BlueStore in 10.2.2 the last few days and have come across a failure that keeps happening after stressing the cluster fairly heavily. Some of the OSDs started failing and attempts to restart them fail to log anything in /var/log/ceph/, so I tried starting them

Re: [ceph-users] Multi-device BlueStore OSDs multiple fsck failures

2016-08-03 Thread Stillwell, Bryan J
>This is a good test case and I doubt any of us testing by enabling fsck() >on mount/unmount. > >Thanks & Regards >Somnath > >-Original Message- >From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >Stillwell, Bryan J >Sent: Wednesday,
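The fsck-on-mount behavior being discussed can be enabled through the OSD configuration. A sketch of the relevant ceph.conf fragment is below; the option names and their defaults have changed across BlueStore's development, so verify them against the configuration reference for your release:

```
[osd]
# Run a BlueStore consistency check when each OSD mounts/unmounts its store.
# Slows OSD startup/shutdown; intended for testing, not production.
bluestore fsck on mount = true
bluestore fsck on umount = true
```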

[ceph-users] Upgrading 0.94.6 -> 0.94.9 saturating mon node networking

2016-09-21 Thread Stillwell, Bryan J
While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9 I've run into serious performance issues every time I restart an OSD. At first I thought the problem I was running into was caused by the osdmap encoding bug that Dan and Wido ran into when upgrading to 0.94.7, because I was see

Re: [ceph-users] [EXTERNAL] Upgrading 0.94.6 -> 0.94.9 saturating mon node networking

2016-09-23 Thread Stillwell, Bryan J
causing some kind of spinlock condition. > >> On Sep 21, 2016, at 4:21 PM, Stillwell, Bryan J >> wrote: >> >> While attempting to upgrade a 1200+ OSD cluster from 0.94.6 to 0.94.9 >>I've >> run into serious performance issues every time I restart an OSD.

Re: [ceph-users] upgrade from v0.94.6 or lower and 'failed to encode map X with expected crc'

2016-10-06 Thread Stillwell, Bryan J
Thanks Kefu! Downgrading the mons to 0.94.6 got us out of this situation. I appreciate you tracking this down! Bryan On 10/4/16, 1:18 AM, "ceph-users on behalf of kefu chai" wrote: >hi ceph users, > >If user upgrades the cluster from a prior release to v0.94.7 or up by >following the steps: >

[ceph-users] Missing arm64 Ubuntu packages for 10.2.3

2016-10-13 Thread Stillwell, Bryan J
I have a basement cluster that is partially built with Odroid-C2 boards, and when I attempted to upgrade to the 10.2.3 release I noticed that this release doesn't have an arm64 build. Are there any plans to continue making arm64 builds? Thanks, Bryan

Re: [ceph-users] Missing arm64 Ubuntu packages for 10.2.3

2016-10-13 Thread Stillwell, Bryan J
On 10/13/16, 2:32 PM, "Alfredo Deza" wrote: >On Thu, Oct 13, 2016 at 11:33 AM, Stillwell, Bryan J > wrote: >> I have a basement cluster that is partially built with Odroid-C2 boards >>and >> when I attempted to upgrade to the 10.2.3 release I noticed that thi

Re: [ceph-users] Missing arm64 Ubuntu packages for 10.2.3

2016-10-14 Thread Stillwell, Bryan J
On 10/14/16, 2:29 PM, "Alfredo Deza" wrote: >On Thu, Oct 13, 2016 at 5:19 PM, Stillwell, Bryan J > wrote: >> On 10/13/16, 2:32 PM, "Alfredo Deza" wrote: >> >>>On Thu, Oct 13, 2016 at 11:33 AM, Stillwell, Bryan J >>> wrote: >>>&

[ceph-users] Announcing the ceph-large mailing list

2016-10-20 Thread Stillwell, Bryan J
Do you run a large Ceph cluster? Do you find that you run into issues that you didn't have when your cluster was smaller? If so we have a new mailing list for you! Announcing the new ceph-large mailing list. This list is targeted at experienced Ceph operators with cluster(s) over 500 OSDs to di

[ceph-users] Total free space in addition to MAX AVAIL

2016-11-01 Thread Stillwell, Bryan J
I recently learned that 'MAX AVAIL' in the 'ceph df' output doesn't represent what I thought it did. It actually represents the amount of data that can be used before the first OSD becomes full, and not the sum of all free space across a set of OSDs. This means that balancing the data with 'ceph
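The distinction can be illustrated with a toy model (made-up numbers, assuming equally weighted OSDs and perfectly uniform placement; real CRUSH placement is weighted and less even, which is exactly why rebalancing changes MAX AVAIL):

```python
# Toy model of why 'MAX AVAIL' != the sum of free space across OSDs.
# Free bytes per OSD (hypothetical numbers, equal CRUSH weights assumed).
free = {"osd.0": 400, "osd.1": 100, "osd.2": 300}

# Naive total free space: simply summing every OSD's free bytes.
total_free = sum(free.values())

# With uniform placement, new writes spread evenly across all OSDs, so
# the pool stops accepting data once the *fullest* OSD (the one with the
# least free space) fills up. That bounds usable capacity at:
max_avail = min(free.values()) * len(free)

print(f"sum of free: {total_free}")
print(f"MAX AVAIL:   {max_avail}")
```

In this model rebalancing data off osd.1 raises `max_avail` without changing `total_free`, which matches the behavior described in the thread.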

Re: [ceph-users] Total free space in addition to MAX AVAIL

2016-11-01 Thread Stillwell, Bryan J
On 11/1/16, 1:45 PM, "Sage Weil" wrote: >On Tue, 1 Nov 2016, Stillwell, Bryan J wrote: >> I recently learned that 'MAX AVAIL' in the 'ceph df' output doesn't >> represent what I thought it did. It actually represents the amount of >> data

[ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-09 Thread Stillwell, Bryan J
Last week I decided to play around with Kraken (11.1.1-1xenial) on a single node, two OSD cluster, and after a while I noticed that the new ceph-mgr daemon is frequently using a lot of the CPU: 17519 ceph 20 0 850044 168104208 S 102.7 4.3 1278:27 ceph-mgr Restarting it with 'system

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Stillwell, Bryan J
On 1/10/17, 5:35 AM, "John Spray" wrote: >On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J > wrote: >> Last week I decided to play around with Kraken (11.1.1-1xenial) on a >> single node, two OSD cluster, and after a while I noticed that the new >> ceph-mgr d

Re: [ceph-users] Crushmap (tunables) flapping on cluster

2017-01-10 Thread Stillwell, Bryan J
On 1/10/17, 2:56 AM, "ceph-users on behalf of Breunig, Steve (KASRL)" wrote: >Hi list, > > >I'm running a cluster which is currently in migration from hammer to >jewel. > > >Actually i have the problem, that the tunables are flapping and a map of >an rbd image is not working. > > >It is flapping

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Stillwell, Bryan J
, 2017 at 9:00 AM, Stillwell, Bryan J > wrote: >> On 1/10/17, 5:35 AM, "John Spray" wrote: >> >>>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J >>> wrote: >>>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a >>>>

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-10 Thread Stillwell, Bryan J
>On Tue, Jan 10, 2017 at 9:41 AM, Stillwell, Bryan J > wrote: >> This is from: >> >> ceph version 11.1.1 (87597971b371d7f497d7eabad3545d72d18dd755) >> >> On 1/10/17, 10:23 AM, "Samuel Just" wrote: >> >>>What ceph sha1 is that? Does it incl

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-01-11 Thread Stillwell, Bryan J
n behalf of Stillwell, Bryan J" wrote: >On 1/10/17, 5:35 AM, "John Spray" wrote: > >>On Mon, Jan 9, 2017 at 11:46 PM, Stillwell, Bryan J >> wrote: >>> Last week I decided to play around with Kraken (11.1.1-1xenial) on a >>> single node, two OSD cluster

Re: [ceph-users] OSD create with SSD journal

2017-01-11 Thread Stillwell, Bryan J
On 1/11/17, 10:31 AM, "ceph-users on behalf of Reed Dier" wrote: >>2017-01-03 12:10:23.514577 7f1d821f2800 0 ceph version 10.2.5 >>(c461ee19ecbc0c5c330aca20f7392c9a00730367), process ceph-osd, pid 19754 >> 2017-01-03 12:10:23.517465 7f1d821f2800 1 >>filestore(/var/lib/ceph/tmp/mnt.WaQmjK) mkfs

Re: [ceph-users] Experience with 5k RPM/archive HDDs

2017-02-03 Thread Stillwell, Bryan J
On 2/3/17, 3:23 AM, "ceph-users on behalf of Wido den Hollander" wrote: > >> Op 3 februari 2017 om 11:03 schreef Maxime Guyot >>: >> >> >> Hi, >> >> Interesting feedback! >> >> > In my opinion the SMR can be used exclusively for the RGW. >> > Unless it's something like a backup/archive clus