[ceph-users] Failing OSD RocksDB Corrupt

2020-12-22 Thread Ashley Merrick
Hello, I had some faulty power cables on some OSDs in one server, which caused lots of I/O issues and disks appearing/disappearing. This has been corrected now; 2 of the 10 OSDs are working, however 8 are failing to start due to what looks to be a corrupt DB. When running a ceph-bluestore-tool fsck I g
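For context, checking a stopped BlueStore OSD with ceph-bluestore-tool typically looks like the sketch below (the OSD id and data path are illustrative, not taken from this thread):

    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-8
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-8   # only if fsck reports repairable errors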

[ceph-users] Failed cephadm Upgrade - ValueError

2021-04-30 Thread Ashley Merrick
Hello All, I was running 15.2.8 via cephadm on Docker on Ubuntu 20.04. I just attempted to upgrade to 16.2.1 via the automated method; it successfully upgraded the mon/mgr/mds and some OSDs, however it then failed on an OSD and hasn't been able to get past it even after stopping and restarting the upgrade. I
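For reference, the automated upgrade can be inspected, paused and resumed with the orchestrator (the target version shown is the one from this thread):

    ceph orch upgrade status
    ceph orch upgrade stop
    ceph orch upgrade start --ceph-version 16.2.1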

[ceph-users] Failed cephadm Upgrade - ValueError

2021-05-03 Thread Ashley Merrick
et me know. Thanks > On Fri Apr 30 2021 21:54:30 GMT+0800 (Singapore Standard Time), Ashley Merrick wrote: > Hello All, I was running 15.2.8 via cephadm on Docker on Ubuntu 20.04. I just attempted to upgrade to 16.2.1 via the automated method; it successfully upgraded the mon/mgr/mds and

[ceph-users] Failed cephadm Upgrade - ValueError

2021-05-03 Thread Ashley Merrick
() File "/var/lib/ceph/30449cba-44e4-11eb-ba64-dda10beff041/cephadm.17068a0b484bdc911a9c50d6408adfca696c2faaa65c018d660a3b697d119482", line 6415, in _fetch_apparmor
    item, mode = line.split(' ')
ValueError: not enough values to unpack (expected 2, got 1)
being repeated over and
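The failing frame is cephadm's _fetch_apparmor, which splits each line of the AppArmor profile list into a name and a mode. Assuming the input is /sys/kernel/security/apparmor/profiles (the file that list normally comes from on Ubuntu, which is an assumption here), entries without a trailing "(mode)" field are what break the unpack and can be spotted with:

    grep -v ')$' /sys/kernel/security/apparmor/profiles    # lines lacking the expected "name (mode)" shape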

[ceph-users] Failed cephadm Upgrade - ValueError

2021-05-03 Thread Ashley Merrick
Created bug ticket: https://tracker.ceph.com/issues/50616 > On Mon May 03 2021 21:49:41 GMT+0800 (Singapore Standard Time), Ashley Merrick wrote: > Just checked cluster logs and they are full of: cephadm exited with an error code: 1, stderr: Reconfig daemon osd.16 ... Traceback

[ceph-users] mon vanished after cephadm upgrade

2021-05-14 Thread Ashley Merrick
I had a 3-mon Ceph cluster; after updating from 15.2.x to 16.2.x one of my mons is showing in a stopped state in the Ceph Dashboard. And checking the cephadm logs on the server in question I can see "/usr/bin/docker: Error: No such object: ceph-30449cba-44e4-11eb-ba64-dda10beff041-mon.sn-m01". The
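A couple of hedged checks for this situation, comparing what the orchestrator thinks is deployed with what actually exists on the host (daemon and host names taken from the error above):

    ceph orch ps --daemon-type mon
    cephadm ls | grep mon          # run on sn-m01
    docker ps -a | grep mon.sn-m01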

[ceph-users] Re: mon vanished after cephadm upgrade

2021-05-14 Thread Ashley Merrick
rch daemon rm mon.sn-m01` to remove the mon > * `ceph orch daemon start mon.sn-m01` to start it again > > On 14.05.21 at 14:14, Ashley Merrick wrote: >> I had a 3-mon Ceph cluster; after updating from 15.2.x to 16.2.x one of my mons is showing in a stopped state

[ceph-users] Ceph Upgrade 16.2.5 stuck completing

2021-08-10 Thread Ashley Merrick
Just upgraded to 16.2.5; all OSD, MDS, MON, MGR, CRASH services updated. However ceph -s shows repeated lines like: Updating alertmanager deployment (+1 -1 -> 1) (0s) [] Updating node-exporter deployment (+7 -7 -> 7) (0s) [] Updating prometheus
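When the daemon upgrades are done but the monitoring-stack progress items never finish, two hedged things to try (general orchestrator troubleshooting, not something confirmed in this thread):

    ceph orch upgrade status
    ceph mgr fail    # restarts the active mgr, which re-evaluates the stuck deployment items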

[ceph-users] Re: Scrubs Randomly Starting/Stopping

2024-02-24 Thread Ashley Merrick
So I have done some further digging. Seems similar to this: Bug #54172: ceph version 16.2.7 PG scrubs not progressing - RADOS - Ceph. Apart from: 1/ I have restarted all OSDs / forced a re-peer and the issue is still there. 2/ Setting noscrub stops the scrubs "appearing". Checking a PG, it seems it's jus
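For reference, the flags and per-PG scrub state mentioned above can be inspected like this (the PG id is illustrative):

    ceph osd set noscrub; ceph osd set nodeep-scrub      # pause scrubbing while investigating
    ceph pg 2.1a query | grep -i scrub                   # scrub timestamps and scrubber state
    ceph osd unset noscrub; ceph osd unset nodeep-scrub  # re-enable afterwards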

[ceph-users] Re: Device /dev/rbd0 excluded by a filter.

2020-04-27 Thread Ashley Merrick
This normally means you have some form of partition data on the RBD disk. If you use -vvv on the pv command it should show you the reason, but yes, Red Hat solutions require an active support subscription. On Mon, 27 Apr 2020 17:43:02 +0800 Marc Roos wrote: Why is this info not a

[ceph-users] Re: Device /dev/rbd0 excluded by a filter.

2020-04-27 Thread Ashley Merrick
A quick Google shows this: you need to change your /etc/lvm/lvm.conf device name filter to include it. It may be that your LVM filter is not allowing rbdX-type disks to be used. On Mon, 27 Apr 2020 17:49:53 +0800 Marc Roos wrote: It is a new image. -vvv says "Unrecognised LVM devic
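A hedged example of what that lvm.conf change can look like (the filter entries are illustrative; adjust them to the devices actually in use):

    # /etc/lvm/lvm.conf, devices section
    types  = [ "rbd", 1024 ]                                 # teach LVM that rbd is a valid block device type
    filter = [ "a|^/dev/rbd.*|", "a|^/dev/sd.*|", "r|.*|" ]
    # then retry with verbose output to see which rule excluded the device
    pvcreate -vvv /dev/rbd0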

[ceph-users] Re: Existing Cluster to cephadm - mds start failing

2020-04-27 Thread Ashley Merrick
wrote: I have logged the following bug ticket for it: https://tracker.ceph.com/issues/45091 I have also noticed another bug with cephadm which I have logged under: https://tracker.ceph.com/issues/45092 Thanks. On Mon, 13 Apr 2020 12:36:01 +0800 Ashley Merrick wrote:

[ceph-users] Re: 14.2.9 MDS Failing

2020-05-01 Thread Ashley Merrick
Quickly checking the code that calls that assert:
  if (version > omap_version) {
    omap_version = version;
    omap_num_objs = num_objs;
    omap_num_items.resize(omap_num_objs);
    journal_state = jstate;
  } else if (version == omap_version) {
    ceph_assert(omap_num_objs == num_objs);
    if (jstate > journa

[ceph-users] OSD Imbalance - upmap mode

2020-05-04 Thread Ashley Merrick
I have a cluster running 15.2.1 (it was originally running 14.x); the cluster is running the balancer module in upmap mode (I have tried crush-compat in the past). Most OSDs are around the same % used, give or take 0.x, however there is one OSD that is down a good few % and a few that are above averag
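Hedged commands for inspecting the balancer alongside the per-OSD numbers; the deviation setting shown is an optional tightening, not something from this thread:

    ceph balancer status
    ceph osd df tree                                        # per-OSD utilisation and PG counts
    ceph config set mgr mgr/balancer/upmap_max_deviation 1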

[ceph-users] Re: Fwd: Octopus on CentOS 7: lacking some packages

2020-05-06 Thread Ashley Merrick
As per the release notes: https://docs.ceph.com/docs/master/releases/octopus/ The dashboard and a few other modules aren't supported on CentOS 7.x due to Python version / dependencies. On Wed, 06 May 2020 17:18:06 +0800 Sam Huracan wrote: Hi Cephers, I am trying to install C

[ceph-users] Re: OSD Imbalance - upmap mode

2020-05-10 Thread Ashley Merrick
(ceph osd df output, truncated)
...  1.0  10 GiB  1.2 GiB  162 MiB  28 MiB  996 MiB  8.8 GiB  11.58  0.16  32  up
TOTAL  273 TiB  200 TiB  199 TiB  336 MiB  588 GiB  73 TiB  73.21
Thanks. On Tue, 05 May 2020 14:23:54 +0800 Ashley Merrick wrote: I have a cluster running 15.2.1, was

[ceph-users] Re: v15.2.2 Octopus released

2020-05-18 Thread Ashley Merrick
I am getting the following error when trying to upgrade via cephadm. ceph orch upgrade status:
{
    "target_image": "docker.io/ceph/ceph:v15.2.2",
    "in_progress": true,
    "services_complete": [],
    "message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image"
}
Are
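With UPGRADE_FAILED_PULL, a hedged first check is whether the hosts can pull the target image at all, then restart the upgrade:

    docker pull docker.io/ceph/ceph:v15.2.2    # run on the host the orchestrator reports as failing
    ceph orch upgrade stop
    ceph orch upgrade start --image docker.io/ceph/ceph:v15.2.2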

[ceph-users] 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-20 Thread Ashley Merrick
I just upgraded a cephadm cluster from 15.2.1 to 15.2.2. Everything went fine on the upgrade, however after restarting one node that has 3 OSDs for ecmeta, two of the 3 OSDs now won't boot with the following error: May 20 08:29:42 sn-m01 bash[6833]: debug 2020-05-20T08:29:42.598+ 7fbcc4

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-20 Thread Ashley Merrick
ed, 20 May 2020 17:02:31 +0800 Ashley Merrick wrote: I just upgraded a cephadm cluster from 15.2.1 to 15.2.2. Everything went fine on the upgrade, however after restarting one node that has 3 OSDs for ecmeta, two of the 3 OSDs now won't boot with the following error:

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-20 Thread Ashley Merrick
> Let me know if you need any more logs. > Thanks > On Wed, 20 May 2020 17:02:31 +0800 Ashley Merrick wrote: > I just upgraded a cephadm cluster from 15.2.1 to 15.2.2.

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-20 Thread Ashley Merrick
AL and/or DB devices, or just a single shared main device. Also, could you please set debug-bluefs/debug-bluestore to 20 and collect a startup log for the broken OSD. Kind regards, Igor. On 5/20/2020 3:27 PM, Ashley Merrick wrote: Thanks, FYI the OSDs that wen
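Igor's request maps to something like the following (the OSD id is illustrative; for an OSD that cannot start, the same debug_bluefs/debug_bluestore settings can instead go under [osd] in that host's ceph.conf):

    ceph config set osd.2 debug_bluefs 20
    ceph config set osd.2 debug_bluestore 20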

[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-05-20 Thread Ashley Merrick
What does ceph orch upgrade status show? On Wed, 20 May 2020 20:52:39 +0800 Gencer W. Genç wrote: Hi, I have 15.2.1 installed on all machines. On the primary machine I executed the ceph upgrade command: $ ceph orch upgrade start --ceph-version 15.2.2 When I check ceph -s I see

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-20 Thread Ashley Merrick
3:32 PM, Dan van der Ster wrote: > lz4? It's not obviously related, but I've seen it involved in really non-obvious ways: https://tracker.ceph.com/issues/39525 > -- dan > On Wed, May 20, 2020 at 2:27 PM Ashley Merrick

[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-05-20 Thread Ashley Merrick
true, "services_complete": [], "message": "" } Thanks, Gencer. On 20.05.2020 15:58:34, Ashley Merrick wrote: What does ceph orch upgrade status show? On Wed, 20 May 2020 20:52:39 +0800 Gencer W. Genç

[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-05-20 Thread Ashley Merrick
: 28, "ceph version 15.2.2 (0c857e985a29d90501a285f242ea9c008df49eb8) octopus (stable)": 2 } } How can I fix this? Gencer. On 20.05.2020 16:04:33, Ashley Merrick wrote: Does ceph versions show any services yet running on 15.

[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-05-20 Thread Ashley Merrick
3-130 Does this mean anything to you? I've also attached the full log. See especially after line #49. I stopped and restarted the upgrade there. Thanks, Gencer. On 20.05.2020 16:13:00, Ashley Merrick wrote: ceph config set mgr mgr/cephadm/log_to_clu
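The command being quoted is presumably the cephadm cluster-log level; the usual troubleshooting sequence looks like this:

    ceph config set mgr mgr/cephadm/log_to_cluster_level debug
    ceph -W cephadm --watch-debug                                 # follow cephadm's debug output live
    ceph config set mgr mgr/cephadm/log_to_cluster_level info     # restore the default afterwards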

[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-05-20 Thread Ashley Merrick
: Upgrade to docker.io/ceph/ceph:v15.2.2 (33s) [=...] (remaining: 9m) Aren't both mons already up? I have no way to add a third mon, btw. Thanks, Gencer. On 20.05.2020 16:21:03, Ashley Merrick wrote: Yes, I think it

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-20 Thread Ashley Merrick
this has any meaning or is just a coincidence though. Thanks, Igor. On 5/20/2020 4:01 PM, Ashley Merrick wrote: I attached the log but it was too big and got moderated. Here it is in a pastebin: https://pastebin.pl/view/69b2beb9 I have cut the log to start from the point of th

[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-05-21 Thread Ashley Merrick
16:32:15, Ashley Merrick wrote: Correct, however it will need to stop one to do the upgrade, leaving you with only one working MON (this is what I would suggest the error means, seeing I had the same thing when I only had a single MGR); normally it is suggested to have 3 MONs due to quorum. Do you not h

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-22 Thread Ashley Merrick
Thanks Igor, do you have any idea on an ETA or plan for people that are running 15.2.2 to be able to patch / fix the issue? I had a read of the ticket and it seems the corruption is happening but the WAL is not read until OSD restart, so I imagine we will need some form of fix / patch we can

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Ashley Merrick
hough, so please don't take the above as any form of recommendation. It is important not to try to restart OSDs in the meantime, though. I'm sure Igor will publish some more expert recommendations in due course... Regards, Chris. On 23/05/2020 06:54, Ashley Merrick

[ceph-users] Re: 15.2.2 Upgrade - Corruption: error in middle of record

2020-05-23 Thread Ashley Merrick
's clear. But once again, please don't take this as advice on what you should do. That should come from the experts! > Regards, Chris > On 23/05/2020 10:03, Ashley Merrick wrote: >> Hello Chris, >> Great to hear, a few questions. >>

[ceph-users] EC Compression

2019-09-02 Thread Ashley Merrick
I have an EC RBD pool I want to add compression on. I have the meta pool and the data pool; do I need to enable compression on both for it to function correctly, or only on one pool? Thanks
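For reference, a hedged sketch of per-pool BlueStore compression (the pool name is illustrative); compression applies where the object data is written, which for an EC-backed RBD image is primarily the data pool:

    ceph osd pool set ec-data compression_mode aggressive
    ceph osd pool set ec-data compression_algorithm snappy
    ceph osd pool get ec-data compression_mode              # verify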

[ceph-users] Re: ceph mons stuck in electing state

2019-09-03 Thread Ashley Merrick
What change did you make in your ceph.conf? I'd say it would be a good idea to check and make sure that hasn't caused the issue. Ashley. On Tue, 27 Aug 2019 04:37:15 +0800 nkern...@gmail.com wrote: Hello, I have an old Ceph 0.94.10 cluster that had 10 storage nodes with one extra manage

[ceph-users] Re: ceph mons stuck in electing state

2019-09-03 Thread Ashley Merrick
What change did you make in ceph.conf? I'd check that hasn't caused an issue first. On Tue, 27 Aug 2019 04:37:15 +0800 nkern...@gmail.com wrote: Hello, I have an old Ceph 0.94.10 cluster that had 10 storage nodes with one extra management node used for running commands on the cluste

[ceph-users] Re: disk failure

2019-09-05 Thread Ashley Merrick
Is your HD actually failing and vanishing from the OS and then coming back shortly after? Or do you just mean your OSD is crashing and then restarting itself shortly later? On Fri, 06 Sep 2019 01:55:25 +0800 solarflo...@gmail.com wrote: One of the things I've come to notice is when HDD

[ceph-users] Re: disk failure

2019-09-05 Thread Ashley Merrick
arflo...@gmail.com wrote: No, I mean Ceph sees it as a failure and marks it out for a while. On Thu, Sep 5, 2019 at 11:00 AM Ashley Merrick wrote: Is your HD actually failing and vanishing from the OS and then coming back shortly? Or do you just mean your OSD is crashing and then restarting it

[ceph-users] Re: Host failure trigger " Cannot allocate memory"

2019-09-10 Thread Ashley Merrick
What specs are the machines? Recovery work will use more memory than general clean operation, and it looks like you're maxing out the available memory on the machines while Ceph is trying to recover. On Tue, 10 Sep 2019 18:10:50 +0800 amudha...@gmail.com wrote: I have also found the below e

[ceph-users] Re: Activate Cache Tier on Running Pools

2019-09-16 Thread Ashley Merrick
Have you checked that the user/keys your VMs are connecting with have access rights to the cache pool? On Mon, 16 Sep 2019 17:36:38 +0800 Eikermann, Robert wrote: Hi, I’m using Ceph in combination with OpenStack. For the “VMs” pool I’d like to enable writeback caching
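A hedged way to check that (client and pool names are illustrative): the client's OSD caps must cover the cache pool as well as the base pool, otherwise writes through the tier hit permission errors.

    ceph auth get client.openstack
    # expected to include something like: caps osd = "allow rwx pool=vms, allow rwx pool=vms-cache"
    ceph auth caps client.openstack mon 'allow r' osd 'allow rwx pool=vms, allow rwx pool=vms-cache'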

[ceph-users] Re: Activate Cache Tier on Running Pools

2019-09-16 Thread Ashley Merrick
I hope the data you're running on the Ceph server isn't important if you're looking to run a cache tier with just 2 SSDs / replication of 2. If your cache tier fails, you basically corrupt most data on the pool below. Also, as Wido said, as much as you may get it to work, I don't think it will give you

[ceph-users] 14.2.4 Packages Available

2019-09-16 Thread Ashley Merrick
I have just noticed there are packages available for 14.2.4. I know with the whole 14.2.3 release the notes did not go out until a good day or so later, but this is not long after the 14.2.3 release? Was this release even meant to have come out? It makes it difficult for people installing a new n

[ceph-users] Re: Cannot start virtual machines KVM / LXC

2019-09-19 Thread Ashley Merrick
You need to fix this first: pgs: 0.056% pgs unknown, 0.553% pgs not active. The backfilling will cause slow I/O, but having PGs unknown and not active will cause I/O blocking, which is what you're seeing with the VM booting. It seems you have 4 OSDs down; if you get them back on

[ceph-users] Re: How to reduce or control memory usage during recovery?

2019-09-21 Thread Ashley Merrick
I'm not aware of any memory settings that control rebuild memory usage. You are running very low on RAM; have you tried adding more swap or adjusting /proc/sys/vm/swappiness? On Fri, 20 Sep 2019 20:41:09 +0800 Amudhan P wrote: Hi, I am using ceph mim
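Not a rebuild-specific knob, but one hedged setting worth knowing on Mimic or later: osd_memory_target bounds each OSD's cache memory (it does not cap recovery buffers):

    ceph config set osd osd_memory_target 2147483648    # roughly 2 GiB per OSD daemon; the default is 4 GiB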

[ceph-users] Re: Cannot start virtual machines KVM / LXC

2019-09-22 Thread Ashley Merrick
three disks that you wiped. >>> Do you still have the disks? Use ceph-objectstore-tool to export the affected PGs manually and inject them into another OSD. >>> Paul >>>> Then I added the 3 OSDs to
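Paul's suggestion, sketched with illustrative data paths and PG id (the source OSD must be stopped while exporting):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --pgid 2.1a --op export --file /tmp/pg2.1a.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op import --file /tmp/pg2.1a.export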

[ceph-users] Re: Constant write load on 4 node ceph cluster

2019-10-14 Thread Ashley Merrick
Is the storage being used for the whole VM disk? If so, have you checked none of your software is writing constant logs? Or something that could continuously write to disk. If you're running a new version you can use: https://docs.ceph.com/docs/mimic/mgr/iostat/ to locate the exact RBD image
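For reference, a sketch of the linked iostat module, plus the per-image variant available on Nautilus or later (pool name illustrative):

    ceph mgr module enable iostat
    ceph iostat                    # cluster-wide throughput/IOPS, refreshed until interrupted
    rbd perf image iostat rbd      # per-image rates; needs the rbd_support mgr module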

[ceph-users] Re: Recovering from a Failed Disk (replication 1)

2019-10-16 Thread Ashley Merrick
I think you're better off doing the dd method; you can export and import a PG at a time (ceph-objectstore-tool), but if the disk is failing, a dd is probably your best method. On Thu, 17 Oct 2019 11:44:20 +0800 vladimir franciz blando wrote: Sorry fo
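The dd approach, as a hedged sketch with illustrative device names; GNU ddrescue is the usual choice when the source disk is failing:

    ddrescue -f -n /dev/sdX /dev/sdY /root/sdX.map     # first pass, skip slow scraping of bad areas
    ddrescue -f -r3 /dev/sdX /dev/sdY /root/sdX.map    # retry the remaining bad sectors up to 3 times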

[ceph-users] Near Perfect PG distribution apart from two OSD

2020-02-01 Thread Ashley Merrick
(ceph osd df output, truncated)
...B  14.53  0.22  33  up
28  hdd  0.00999  1.0  10 GiB  1.5 GiB  462 MiB  12 KiB  1024 MiB  8.5 GiB  14.52  0.22  34  up
TOTAL  273 TiB  183 TiB  182 TiB  6.1 MiB  574 GiB  90 TiB  67.02
MIN/MAX VAR: 0.22/1.15  STDDEV: 30.40
On Fri, 10 Jan 2020 14:57:05 +0800 Ashley Merrick wrote:

[ceph-users] Re: New 3 node Ceph cluster

2020-03-14 Thread Ashley Merrick
I would say you definitely need more RAM with that many disks. On Sat, 14 Mar 2020 15:17:14 +0800 amudha...@gmail.com wrote: Hi, I am planning to create a new 3-node Ceph storage cluster. I will be using CephFS with Samba for a max of 10 clients for upload and download. Storage Node

[ceph-users] Re: HEALTH_WARN 1 pools have too few placement groups

2020-03-16 Thread Ashley Merrick
This was a bug in 14.2.7 in the calculation for EC pools. It has been fixed in 14.2.8. On Mon, 16 Mar 2020 16:21:41 +0800 Dietmar Rieder wrote: Hi, I was planning to activate the pg_autoscaler on an EC (6+3) pool which I created two years ago. Back then I calculated the total #

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Ashley Merrick
Are you sure you're not being hit by: ceph config set osd bluestore_fsck_quick_fix_on_mount false @ https://docs.ceph.com/docs/master/releases/octopus/ Have all your OSDs successfully completed the fsck? The reason I say that is I can see "20 OSD(s) reporting legacy (not per-pool) BlueStore om

[ceph-users] Existing Cluster to cephadm - mds start failing

2020-04-12 Thread Ashley Merrick
Completed the migration of an existing Ceph cluster on Octopus to cephadm. All OSD/MON/MGR moved fine; however, upon running the command to set up some new MDS for CephFS, they both failed to start. After looking into the cephadm logs I found the following error: Apr 13 06:26:15 sn-s01 syst

[ceph-users] Re: Existing Cluster to cephadm - mds start failing

2020-04-14 Thread Ashley Merrick
I have logged the following bug ticket for it: https://tracker.ceph.com/issues/45091 I have also noticed another bug with cephadm which I have logged under: https://tracker.ceph.com/issues/45092 Thanks. On Mon, 13 Apr 2020 12:36:01 +0800 Ashley Merrick wrote: Completed the