One of the main limitations of using CephFS is the requirement to reduce the
number of active MDS daemons to one during upgrades. As far as I can tell, this
has been a known problem since Luminous (~2017). This issue essentially
requires downtime during upgrades for any CephFS cluster that needs more than
one active MDS.
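For anyone following along, the reduction boils down to something like this (the filesystem name and the restored max_mds value are placeholders):
ceph fs set cephfs max_mds 1    # drop to a single active MDS before upgrading the MDS daemons
ceph fs set cephfs max_mds 2    # restore the original value once the upgrade is done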
Thanks David! This looks good now. :)
> On Jul 8, 2021, at 6:28 PM, David Galloway wrote:
>
> Done!
>
> On 7/8/21 3:51 PM, Bryan Stillwell wrote:
>> There appear to be arm64 packages built for Ubuntu Bionic, but not for
>> Focal. Any chance Focal packages can be built as well?
I upgraded one of my clusters to v16.2.5 today and now I'm seeing these
messages from 'ceph -W cephadm':
2021-07-08T22:01:55.356953+0000 mgr.excalibur.kuumco [ERR] Failed to apply
alertmanager spec AlertManagerSpec({'placement': PlacementSpec(count=1),
'service_type': 'alertmanager', 'service_i
There appear to be arm64 packages built for Ubuntu Bionic, but not for Focal.
Any chance Focal packages can be built as well?
Thanks,
Bryan
> On Jul 8, 2021, at 12:20 PM, David Galloway wrote:
>
ly complete any upgrades after that, which means the global container
image name was never changed.
Bryan
On Jun 1, 2021, at 9:38 AM, Bryan Stillwell <bstillw...@godaddy.com> wrote:
This morning I tried adding a mon node to my home Ceph cluster with the
following command:
ceph orch daemon add mon ether
This morning I tried adding a mon node to my home Ceph cluster with the
following command:
ceph orch daemon add mon ether
This seemed to work at first, but then it decided to remove it fairly quickly,
which broke the cluster because the mon. keyring was also removed:
2021-06-01T14:16:11.523210
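For what it's worth, one way to keep cephadm from reconciling a hand-added mon away is to declare the host in the mon service spec first (hostnames other than 'ether' are made up):
ceph orch apply mon --placement="host1 host2 ether"   # cephadm will then deploy the mon on 'ether' itself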
,1,14,0,19,8]p8[8,17,4,1,14,0,19,8]p8
2021-05-11T22:41:11.332885+ 2021-05-11T22:41:11.332885+
I'm now considering using device classes and assigning the OSDs to either hdd1
or hdd2... Unless someone has another idea?
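A rough sketch of the device-class approach (OSD ids and the profile name are made up):
ceph osd crush rm-device-class osd.0 osd.1     # the existing class has to be cleared first
ceph osd crush set-device-class hdd1 osd.0 osd.1
ceph osd crush set-device-class hdd2 osd.2 osd.3
ceph osd erasure-code-profile set ec-hdd1 k=4 m=2 crush-device-class=hdd1 crush-failure-domain=host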
Thanks,
Bryan
> On May 14, 2021, at 12:35 PM, Bryan Stillwell wrote:
> step choose indep 0 type host
> step chooseleaf indep 1 type osd
> step emit
>
> J.
>
> ‐‐‐ Original Message ‐‐‐
>
> On Wednesday, May 12th, 2021 at 17:58, Bryan Stillwell
> wrote:
>
>> I'm trying to figure out a CRUSH rule that will spread data out across my
>> cluster as much as possible, but not more than 2 chunks per host.
I'm looking for help in figuring out why cephadm isn't making any progress
after I told it to redeploy an mds daemon with:
ceph orch daemon redeploy mds.cephfs.aladdin.kgokhr ceph/ceph:v15.2.12
The output from 'ceph -W cephadm' just says:
2021-05-14T16:24:46.628084+0000 mgr.paris.glbvov [INF]
1 harrahs
1 mirage
2 mandalaybay
2 paris
...
Hopefully someone else will find this useful.
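For reference, a sketch of the kind of loop that produces per-host counts like the ones above (the pg id is made up):
for osd in $(ceph pg map 13.6f --format=json | jq -r '.up[]'); do
  ceph osd find $osd | jq -r '.crush_location.host'
done | sort | uniq -c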
Bryan
> On May 12, 2021, at 9:58 AM, Bryan Stillwell wrote:
>
> I'm trying to figure out a CRUSH rule that will spread data out across my
> cluster as much as possible, but not more than 2 chunks per host.
I'm trying to figure out a CRUSH rule that will spread data out across my
cluster as much as possible, but not more than 2 chunks per host.
If I use the default rule with an osd failure domain like this:
step take default
step choose indep 0 type osd
step emit
I get clustering of 3-4 chunks on some hosts.
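For an 8-chunk profile (e.g. 6+2) spread over at least 4 hosts, one rule shape that caps placement at 2 chunks per host is (untested sketch):
step take default
step choose indep 4 type host
step chooseleaf indep 2 type osd
step emit
The first step picks 4 distinct hosts and the second picks 2 OSDs under each, so no host can end up with more than 2 chunks.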
I tried upgrading my home cluster to 15.2.7 (from 15.2.5) today and it appears
to be entering a loop when trying to match docker images for ceph:v15.2.7:
2020-12-01T16:47:26.761950-0700 mgr.aladdin.liknom [INF] Upgrade: Checking mgr
daemons...
2020-12-01T16:47:26.769581-0700 mgr.aladdin.liknom [
I have a cluster running Nautilus where the bucket instance (backups.190) has
gone missing:
# radosgw-admin metadata list bucket | grep 'backups.19[0-1]' | sort
"backups.190",
"backups.191",
# radosgw-admin metadata list bucket.instance | grep 'backups.19[0-1]' | sort
"backups.191:00
The last two days we've experienced a couple short outages shortly after
setting both 'noscrub' and 'nodeep-scrub' on one of our largest Ceph clusters
(~2,200 OSDs). This cluster is running Nautilus (14.2.6) and setting/unsetting
these flags has been done many times in the past without a problem.
On Mar 24, 2020, at 5:38 AM, Abhishek Lekshmanan wrote:
> #. Upgrade monitors by installing the new packages and restarting the
> monitor daemons. For example, on each monitor host,::
>
> # systemctl restart ceph-mon.target
>
> Once all monitors are up, verify that the monitor upgrade is complete.
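The check for that boils down to looking for the octopus release in the mon map; a sketch (output wording approximate):
ceph mon dump | grep min_mon_release
# should report: min_mon_release 15 (octopus)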
Great work! Thanks to everyone involved!
One minor thing I've noticed so far with the Ubuntu Bionic build is that it's
reporting the release as an RC instead of 'stable':
$ ceph versions | grep octopus
"ceph version 15.2.0 (dc6a0b5c3cbf6a5e1d6d4f20b5ad466d76b96247) octopus
(rc)": 1
Bryan
I just noticed that arm64 packages only exist for xenial. Is there a reason
why bionic packages aren't being built?
Thanks,
Bryan
> On Dec 20, 2019, at 4:22 PM, Bryan Stillwell wrote:
>
> I was going to try adding an OSD to my home cluster using one of the 4GB
> Raspberry Pis today, but it appears that the Ubuntu Bionic arm64 repo is
> missing a bunch of packages:
I was going to try adding an OSD to my home cluster using one of the 4GB
Raspberry Pis today, but it appears that the Ubuntu Bionic arm64 repo is
missing a bunch of packages:
$ sudo grep ^Package:
/var/lib/apt/lists/download.ceph.com_debian-nautilus_dists_bionic_main_binary-arm64_Packages
Packa
From: Bryan Stillwell <bstillw...@godaddy.com>
Sent: Wednesday, December 18, 2019 4:44:45 PM
To: Sage Weil <s...@newdream.net>
On Dec 18, 2019, at 11:58 AM, Sage Weil <s...@newdream.net> wrote:
On Wed, 18 Dec 2019, Bryan Stillwell wrote:
After upgrading one of our clusters from Nautilus 14.2.2 to Nautilus 14.2.5 I'm
seeing 100% CPU usage by a single ceph-mgr thread (found using 'top -H').
On Dec 18, 2019, at 1:48 PM, e...@lapsus.org wrote:
>
> That sounds very similar to what I described there:
> https://tracker.ceph.com/issues/43364
I would agree that they're quite similar if not the same thing! Now that you
mention it I see the thread is named mgr-fin in 'top -H' as well. I
After upgrading one of our clusters from Nautilus 14.2.2 to Nautilus 14.2.5 I'm
seeing 100% CPU usage by a single ceph-mgr thread (found using 'top -H').
Attaching to the thread with strace shows a lot of mmap and munmap calls.
Here's the distribution after watching it for a few minutes:
48.7
roblem.
Bryan
On Dec 14, 2019, at 10:27 AM, Sasha Litvak <alexander.v.lit...@gmail.com> wrote:
Bryan,
Were you able to resolve this? If yes, can you please share with the list?
On Fri, Dec 13, 2019 at 10:08 AM Bryan Stillwell
alFrameEx
0.55% [kernel] [k] _raw_spin_unlock_irqrestore
I increased mon debugging to 20 and nothing stuck out to me.
Bryan
> On Dec 12, 2019, at 4:46 PM, Bryan Stillwell wrote:
>
> On our test cluster after upgrading to 14.2.5 I'm having problems with the
> mons pegging a CPU core while moving data around.
On our test cluster after upgrading to 14.2.5 I'm having problems with the mons
pegging a CPU core while moving data around. I'm currently converting the OSDs
from FileStore to BlueStore by marking the OSDs out in multiple nodes,
destroying the OSDs, and then recreating them with ceph-volume lvm
Rich,
What's your failure domain (osd? host? chassis? rack?) and how big is each of
them?
For example I have a failure domain of type rack in one of my clusters with
mostly even rack sizes:
# ceph osd crush rule dump | jq -r '.[].steps'
[
{
"op": "take",
"item": -1,
"item_name":
On Nov 18, 2019, at 8:12 AM, Dan van der Ster wrote:
>
> On Fri, Nov 15, 2019 at 4:45 PM Joao Eduardo Luis wrote:
>>
>> On 19/11/14 11:04AM, Gregory Farnum wrote:
>>> On Thu, Nov 14, 2019 at 8:14 AM Dan van der Ster
>>> wrote:
Hi Joao,
I might have found the reason why s
e of a solution yet so I'll stick with disabled balancer
> for now since the current pg placement is fine.
>
> Regards,
> Eugen
>
>
> [1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg56994.html
> [2] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg5
On multiple clusters we are seeing the mgr hang frequently when the balancer is
enabled. It seems that the balancer is getting caught in some kind of infinite
loop, which chews up all the CPU for the mgr and causes problems with other
modules like prometheus (we don't have the devicehealth module enabled).
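The workaround mentioned elsewhere in the thread amounts to turning the module off until the underlying bug is fixed:
ceph balancer off
ceph balancer status   # confirm it reports active: false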
Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Tue, Nov 19, 2019 at 8:42 PM Bryan Stillwell
> wrote:
>>
>> Closing the loop here. I
as to track down, maybe a check should be added before
enabling msgr2 to make sure the require-osd-release is set to nautilus?
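A quick pre-flight along those lines would be:
ceph osd dump | grep require_osd_release   # should say nautilus before msgr2 is enabled
ceph osd require-osd-release nautilus      # set it if it is still on an older release
ceph mon enable-msgr2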
Bryan
> On Nov 18, 2019, at 5:41 PM, Bryan Stillwell wrote:
>
> I cranked up debug_ms to 20 on two of these clusters today and I'm still not
> underst
18 16:46:05.979 7f917becf700 1 -- 10.0.13.2:0/3084510 learned_addr
learned my addr 10.0.13.2:0/3084510 (peer_addr_for_me v1:10.0.13.2:0/0)
The learned address is v1:10.0.13.2:0/0. What else can I do to figure out why
it's deciding to use the legacy protocol only?
Thanks,
Bryan
> On Nov 15
I've upgraded 7 of our clusters to Nautilus (14.2.4) and noticed that on some
of the clusters (3 out of 7) the OSDs aren't using msgr2 at all. Here's the
output for osd.0 on 2 clusters of each type:
### Cluster 1 (v1 only):
# ceph osd find 0 | jq -r '.addrs'
{
"addrvec": [
{
"type":
There are some bad links to the mailing list subscribe/unsubscribe/archives on
this page that should get updated:
https://ceph.io/resources/
The subscribe/unsubscribe/archives links point to the old lists vger and
lists.ceph.com, and not the new lists on lists.ceph.io:
ceph-devel
subscribe
With FileStore you can get the number of OSD maps for an OSD by using a simple
find command:
# rpm -q ceph
ceph-12.2.12-0.el7.x86_64
# find /var/lib/ceph/osd/ceph-420/current/meta/ -name 'osdmap*' | wc -l
42486
Does anyone know of an equivalent command that can be used with BlueStore?
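One backend-independent way to get the same number is the OSD's admin socket, which reports the oldest and newest map epochs it holds (run on the host carrying osd.420):
ceph daemon osd.420 status
# newest_map minus oldest_map gives the number of OSD maps the OSD is storing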
Thanks,
Bryan
Thanks Casey!
Adding the following to my swiftclient put_object call caused it to start
compressing the data:
headers={'x-object-storage-class': 'STANDARD'}
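The command-line swift client can do the same thing with an explicit header (container and object names are made up):
swift upload --header 'x-object-storage-class: STANDARD' mybucket image.qcow2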
I appreciate the help!
Bryan
> On Nov 7, 2019, at 9:26 AM, Casey Bodley wrote:
>
> On 11/7/19 10
port to nautilus
> in https://tracker.ceph.com/issues/41981.
>
> On 11/6/19 5:54 PM, Bryan Stillwell wrote:
>> Today I tried enabling RGW compression on a Nautilus 14.2.4 test cluster and
>> found it wasn't doing any compression at all. I figure I must have missed
>> something in the docs, but I haven't been able to find out what that is and
>> could use some help.
Today I tried enabling RGW compression on a Nautilus 14.2.4 test cluster and
found it wasn't doing any compression at all. I figure I must have missed
something in the docs, but I haven't been able to find out what that is and
could use some help.
This is the command I used to enable zlib-based compression.
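For reference, the documented way to enable zlib compression on a placement target looks like this (zone and placement names assume the defaults, and this isn't necessarily the exact command from the thread):
radosgw-admin zone placement modify --rgw-zone=default --placement-id=default-placement --compression=zlib
# restart the radosgw daemons afterwards for the change to take effect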
Responding to myself to follow up with what I found.
While going over the release notes for 14.2.3/14.2.4 I found this was a known
problem that has already been fixed. Upgrading the cluster to 14.2.4 fixed the
issue.
Bryan
> On Oct 30, 2019, at 10:33 AM, Bryan Stillwell wrote:
>
This morning I noticed that on a new cluster the number of PGs for the
default.rgw.buckets.data pool was way too small (just 8 PGs), but when I try to
split the PGs the cluster doesn't do anything:
# ceph osd pool set default.rgw.buckets.data pg_num 16
set pool 13 pg_num to 16
It seems to set t
lass to be used for new object uploads -
> just note that some 'helpful' s3 clients will insert a
> 'x-amz-storage-class: STANDARD' header to requests that don't specify
> one, and the presence of this header will override the user's default
> storage class.
3 7f0e16363700 0 mgr[dashboard]
> [29/Oct/2019:17:37:56] ENGINE Error in HTTPServer.tick
> Traceback (most recent call last):
> File
> "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line
> 2021, in start
>self.tick()
> File
> "/usr/li
On Oct 29, 2019, at 9:44 AM, Thomas Schneider <74cmo...@gmail.com> wrote:
> in my unhealthy cluster I cannot run several ceph osd command because
> they hang, e.g.
> ceph osd df
> ceph osd pg dump
>
> Also, ceph balancer status hangs.
>
> How can I fix this issue?
Check the status of your ceph-mgr daemons. Commands like 'ceph osd df', 'ceph
pg dump', and 'ceph balancer status' are served by the mgr, so they hang when
no mgr is active.
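A couple of commands for that check (the unit name assumes a standard systemd deployment):
ceph -s                              # the mgr: line under 'services' shows whether a mgr is active
systemctl restart ceph-mgr.target    # on the mgr host, if it is stuck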
I'm wondering if it's possible to enable compression on existing RGW buckets?
The cluster is running Luminous 12.2.12 with FileStore as the backend (no
BlueStore compression then).
We have a cluster that recently started to rapidly fill up with compressible
content (qcow2 images) and I would like to enable compression on those buckets.
* Setting the NOUP flag
* Taking the fragile OSD out
* Restarting the "fragile" OSDs
* Checking their logs to make sure everything is OK
* Taking off the NOUP flag
* Taking a coffee and waiting until all the data has drained (the corresponding commands are sketched below)
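In command form, that procedure is roughly (the OSD id is made up):
ceph osd set noup               # keep restarted OSDs from rejoining until their logs look clean
ceph osd out 12
systemctl restart ceph-osd@12
ceph osd unset noup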
[]'s
Arthur (aKa Guilherme Geronimo)
On 04/09/2019 15:32, Bryan Stillwell wrote:
We are
Sep 4, 2019, at 11:55 AM, Guilherme Geronimo <guilherme.geron...@gmail.com> wrote:
Hey Bryan,
I suppose all nodes are using jumboframes (mtu 9000), right?
I would suggest checking the OSD->MON communication.
Can you send the ou
Our test cluster is seeing a problem where peering is going incredibly slow
shortly after upgrading it to Nautilus (14.2.2) from Luminous (12.2.12).
From what I can tell it seems to be caused by "wait for new map" taking a long
time. When looking at dump_historic_slow_ops on pretty much any OSD
We've run into a problem this afternoon on our test cluster, which is running
Nautilus (14.2.2). It seems that any time PGs move on the cluster (from
marking an OSD down, setting the primary-affinity to 0, or by using the
balancer), a large number of the OSDs in the cluster peg the CPU cores they're
running on.