Just to follow this through, 18.2.6 fixed my issues and I was able to complete
the upgrade. Is it advisable to go to 19 or should I stay on reef?
-jeremy
> On Monday, Apr 14, 2025 at 12:14 AM, Jeremy Hansen <jer...@skidrow.la> wrote:
> Thanks. I’ll wait. I need this to go sm
RGW memory leak). If you really need to
> upgrade, I guess I would go with .4, otherwise stay on Pacific until
> this issue has been addressed. It's not an easy decision. ;-)
>
> Quoting Jeremy Hansen:
>
> > I haven’t attempted the remaining upgrade just yet. I wanted t
<ebl...@nde.ag> wrote:
> Are you using Rook? Usually, I see this warning when a host is not
> reachable, for example during a reboot. But it also clears when the
> host comes back. Do you see this permanently or from time to time? It
> might have to do with the different Ceph versions
This looks relevant.
https://github.com/rook/rook/issues/13600#issuecomment-1905860331
> On Sunday, Apr 13, 2025 at 10:08 AM, Jeremy Hansen <jer...@skidrow.la> wrote:
> I’m now seeing this:
>
> cluster:
>   id:     95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1
>   health: HEALTH_WARN
-jeremy
> On Sunday, Apr 13, 2025 at 12:59 AM, Jeremy Hansen <jer...@skidrow.la> wrote:
> Updating mgr’s to 18.2.5 seemed to work just fine. I will go for the
> remaining services after the weekend. Thanks.
>
> -jeremy
>
>
>
> > On Thursday, Apr
> cluster from Pacific after getting rid of our cache tier. :-D
>
> Quoting Jeremy Hansen:
>
> > This seems to have worked to get the orch back up and put me back to
> > 16.2.15. Thank you. Debating on waiting for 18.2.5 to move forward.
> >
> > -jeremy
often you need the orchestrator to maintain the
> cluster. If you have the time, you could wait a bit longer for other
> responses. If you need the orchestrator in the meantime, you can roll
> back the MGRs.
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/32APKOXKRAIZ7IDCNI25KVYFCCCF6RJG/
do staggered upgrades, e.g. I limit
> the number of OSD daemons first just to see if they come up healthy,
> then I let it upgrade all other OSDs automatically.
>
> https://docs.ceph.com/en/latest/cephadm/upgrade/#staggered-upgrade
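For reference, a staggered run along those lines might look roughly like this (a sketch only; the image tag, host name, and limit are placeholders to adapt):

# mgr and mon daemons first (illustrative image tag)
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.6 --daemon-types mgr,mon
# then a small batch of OSDs on one host to confirm they come up healthy
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.6 --daemon-types osd --hosts cn01 --limit 2
# finally let the upgrade cover everything still on the old version
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.6

Each staggered run is started only after the previous one has finished.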
>
> Quoting Jeremy Hansen:
>
>
t;: null,
"remaining_count": null}
What should I do next?
Thank you!
-jeremy
> On Sunday, Apr 06, 2025 at 1:38 AM, Eugen Block <ebl...@nde.ag> wrote:
> Can you check if you have this config-key?
>
> ceph config-key get mgr/cephadm/upgrade_state
>
> If you r
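If that key exists and points at a stale upgrade, a commonly suggested recovery sequence looks roughly like this (a sketch; inspect the key first and only remove it if you are sure the recorded upgrade is dead):

ceph config-key get mgr/cephadm/upgrade_state   # inspect the recorded state
ceph config-key rm mgr/cephadm/upgrade_state    # drop the stale state
ceph mgr fail                                   # fail over the active mgr so cephadm starts clean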
cases, basically it's the
> first thing I've suggested for the past two or three years.
>
> Quoting Jeremy Hansen:
>
> > Thank you so much for the detailed instructions. Here’s logs from
> > the failover to a new node.
> >
> > Apr 05 20:06:08 cn02.ceph.xyz.corp
Thank you so much for the detailed instructions. Here’s logs from the failover
to a new node.
Apr 05 20:06:08 cn02.ceph.xyz.corp
ceph-95f49c1c-b1e8-11ee-b5d0-0cc47a8f35c1-mgr-cn02-ceph-xyz-corp-ggixgj[2357414]:
:::192.168.47.72 - - [05/Apr/2025:20:06:08] "GET /metrics HTTP/1.1" 200 -
"" "P
I ran in to the “Error ENOENT: Module not found” issue with the orchestrator. I
see the note in the cephadm upgrade docs but I don’t quite know what action to
take to fix this:
ceph versions
{
    "mon": {
        "ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable)": 3
    },
    "mgr
there are
> limitations with KVM and disk snapshots but good to give it a try.
>
> Thanks
>
>
> From: Jeremy Hansen
> Sent: Saturday, February 3, 2024 11:39:19 PM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Sn
ommand to
> the vm to freeze the fs if the vm supports it.
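With libvirt/KVM and the qemu-guest-agent running inside the guest, that freeze/thaw step could look like this (a sketch; "myvm" and the pool/image names are placeholders, not from this thread):

virsh domfsfreeze myvm                          # quiesce filesystems inside the guest
rbd snap create cloudstack/myvm-root@backup-1   # take the RBD snapshot while the fs is frozen
virsh domfsthaw myvm                            # resume guest I/O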
>
>
> >
> > Am I just off base here or missing something obvious?
> >
> > Thanks
> >
> >
> >
> >
> > On Thursday, Feb 01, 2024 at 2:13 AM, Jeremy Hansen <jer...
Am I just off base here or missing something obvious?
Thanks
> On Thursday, Feb 01, 2024 at 2:13 AM, Jeremy Hansen <jer...@skidrow.la> wrote:
> Can rbd image snapshotting be scheduled like CephFS snapshots? Maybe I missed
> it in the documentation but it looked lik
Can rbd image snapshotting be scheduled like CephFS snapshots? Maybe I missed
it in the documentation but it looked like scheduling snapshots wasn’t a
feature for block images. I’m still running Pacific. We’re trying to devise a
sufficient backup plan for Cloudstack and other things residing in
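Until a built-in scheduler exists for plain RBD images, a cron-driven snapshot is one workaround (a sketch; pool, image, and retention are placeholders):

# /etc/cron.d/rbd-snapshots (hypothetical): nightly snapshot of a Cloudstack volume
0 2 * * * root rbd snap create cloudstack/vm-disk-001@nightly-$(date +\%F)

Old snapshots would still need to be pruned separately, e.g. with rbd snap rm.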
I’d like to upgrade from 16.2.11 to the latest version. Is it possible to do
this in one jump or do I need to go from 16.2.11 -> 16.2.14 -> 17.1.0 -> 17.2.7
-> 18.1.0 -> 18.2.1? I’m using cephadm.
Thanks
-jeremy
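With cephadm the jump is done release by release through the orchestrator rather than through every point release; a sketch of the sequence (versions are illustrative, check the release notes for the supported path from your starting point):

ceph orch upgrade start --ceph-version 16.2.15   # latest Pacific first
ceph orch upgrade status                         # wait for it to finish
ceph orch upgrade start --ceph-version 17.2.7    # then Quincy
ceph orch upgrade start --ceph-version 18.2.1    # then Reef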
Is it possible to use Ceph as a root filesystem for a pxe booted host?
Thanks
Found my previous post regarding this issue.
Fixed by restarting mgr daemons.
-jeremy
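For anyone hitting the same stray-host warning, the mgr restart that cleared it here can be done through the orchestrator (a sketch; assumes cephadm-managed mgr daemons):

ceph orch restart mgr   # restart all mgr daemons
ceph mgr fail           # or just fail over the active one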
> On Friday, Dec 01, 2023 at 3:04 AM, Me <jer...@skidrow.la> wrote:
> I think I ran into this before but I forget the fix:
>
> HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
> [WR
I think I ran into this before but I forget the fix:
HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_HOST: 1 stray host(s) with 1 daemon(s) not managed by
cephadm
stray host cn06.ceph.fu.intra has 1 stray daemons: ['mon.cn03']
Pacific 16.2.11
How do I cl
Got around this issue by restarting the mgr daemons.
-jeremy
> On Saturday, Jun 10, 2023 at 11:26 PM, Me <jer...@skidrow.la> wrote:
> I see this in the web interface in Hosts and under cn03’s devices tab
>
> SAMSUNG_HD502HI_S1VFJ9ASB08190
> Unknown
> n/a
> sdg
> mon.cn04
>
>
> 1 total
>
I see this in the web interface in Hosts and under cn03’s devices tab
SAMSUNG_HD502HI_S1VFJ9ASB08190
Unknown
n/a
sdg
mon.cn04
1 total
Which doesn’t make sense. There are no daemons running on this host, and I
noticed the daemon list looks like it’s one that should be on another node.
There is al
I also see this error in the logs:
6/10/23 11:09:01 PM [ERR] host cn03.ceph does not exist
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper
    return OrchResult(f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1625,
I’m going through the process of transitioning to new hardware. Pacific 16.2.11.
I drained the host and all daemons were removed. Then I did the ceph orch host rm:
[ceph: root@cn01 /]# ceph orch host rm cn03.ceph
Error EINVAL: host cn03.ceph does not exist
Yet I see it here:
ceph osd crush tree |grep cn
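If the host is gone from the cephadm inventory but its bucket is still in the CRUSH map, the leftover bucket can be removed by hand once it is empty (a sketch; confirm no OSDs remain under it first):

ceph osd crush tree | grep -A2 cn03   # check whether the cn03 bucket still holds anything
ceph osd crush rm cn03                # remove the empty host bucket from the CRUSH map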
Figured out how to cleanly relocate daemons via the interface. All is good.
-jeremy
> On Friday, Jun 09, 2023 at 2:04 PM, Me <jer...@skidrow.la> wrote:
> I’m doing a drain on a host using cephadm, Pacific, 16.2.11.
>
> ceph orch host drain
>
> removed all the OSDs, but these daemons rema
I’m doing a drain on a host using cephadm, Pacific, 16.2.11.
ceph orch host drain
removed all the OSDs, but these daemons remain:
grafana.cn06 cn06.ceph.la1 *:3000 stopped 5m ago 18M - -
mds.btc.cn06.euxhdu cn06.ceph.la1 running (2d) 5m ago 17M 29.4M - 16.2.11 de4b0b384ad4 017f7ef441ff
mgr.
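Leftover non-OSD daemons like these can be moved by adjusting the service placement, or removed explicitly (a sketch; the daemon names are taken from the listing above):

ceph orch daemon rm grafana.cn06 --force          # drop the stopped grafana daemon
ceph orch daemon rm mds.btc.cn06.euxhdu --force   # or re-apply the mds service spec with a new placement instead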
3/3/23 2:13:53 AM [WRN] unable to calc client keyring client.admin placement
PlacementSpec(label='_admin'): Cannot place : No matching hosts for label _admin
I keep seeing this warning in the logs. I’m not really sure what action to take
to resolve this issue.
Thanks
-jeremy
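That warning usually means no host carries the _admin label any more, so cephadm has nowhere to place the client.admin keyring; labeling a host clears it (a sketch; cn01 is a placeholder):

ceph orch host label add cn01 _admin   # give one host the _admin label
ceph orch host ls                      # confirm the label shows up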
; > (http://quay.io/ceph/ceph:v16.2.11)",
> > > "in_progress": true,
> > > "services_complete": [],
> > > "progress": "",
> > > "message": ""
> > > }
> > >
> > > Hasn’t ch
W cephadm, does
> that return anything or just hang? Also, what about ceph health detail? You
> can always try ceph orch upgrade pause and then ceph orch upgrade resume; that might
> kick something loose, so to speak.
> On Tue, Feb 28, 2023, 10:39 Jeremy Hansen <jer...@skidrow.la> wrote:
PM, Curt <light...@gmail.com> wrote:
> What does ceph orch upgrade status return?
> On Tue, Feb 28, 2023, 10:16 Jeremy Hansen <jer...@skidrow.la> wrote:
> > I’m trying to upgrade from 16.2.7 to 16.2.11. Reading the documentation, I
> > cut and paste the orchest
I’m trying to upgrade from 16.2.7 to 16.2.11. Reading the documentation, I cut
and pasted the orchestrator command to begin the upgrade, but I mistakenly
pasted directly from the docs and it initiated an “upgrade” to 16.2.6. I
stopped the upgrade per the docs and reissued the command specifying 1
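The commands for checking and correcting a mis-started upgrade are roughly (a sketch; the target version below is the one intended in this thread):

ceph orch upgrade status                         # see what the orchestrator thinks it is upgrading to
ceph orch upgrade stop                           # abort the wrong upgrade
ceph orch upgrade start --ceph-version 16.2.11   # restart with the intended version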
How do I track down what is the stray daemon?
Thanks
-jeremy
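Two places to look when hunting a stray daemon (a sketch; the hostname is a placeholder):

ceph health detail        # names the stray host and daemon
ceph orch ps <hostname>   # what cephadm thinks is running on that host
cephadm ls                # run on the host itself to list the containers cephadm can see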
I have a situation (not sure how it happened), but Ceph believes I have two
OSD's assigned to a single device.
I tried to delete osd.2 and osd.3, but it just hangs. I'm also trying to
zap sdc, which claims it does not have an osd, but I'm unable to zap it.
Any suggestions?
/dev/sdb
HDD
TOSHIBA
M
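When OSD removal hangs and a device refuses to zap, working through the orchestrator is usually the cleaner path (a sketch; the OSD IDs and device path are from the description above, the host name is a placeholder):

ceph orch osd rm status                     # check whether the removals are actually queued
ceph orch osd rm 2 3 --force                # retry removal of the stuck OSDs
ceph orch device zap cn01 /dev/sdc --force  # zap the device once no OSD claims it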
your
> switching.
>
> On Mon, 25 Jul 2022, 23:05 Jeremy Hansen,
> wrote:
>
>> That results in packet loss:
>>
>> [root@cn01 ~]# ping -M do -s 8972 192.168.30.14
>> PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
>> ^C
>> --- 192.168.3
ng to figure out. Hmmm.
Thank you.
On Mon, Jul 25, 2022 at 3:01 PM Sean Redmond
wrote:
> Looks good, just confirm it with a large ping with don't fragment flag set
> between each host.
>
> ping -M do -s 8972 [destination IP]
>
>
> On Mon, 25 Jul 2022, 22:56 Jeremy Ha
Does Ceph do any kind of I/O fencing if it notices an anomaly? Do I need to
do something to re-enable these hosts if they get marked as bad?
On Mon, Jul 25, 2022 at 2:56 PM Jeremy Hansen
wrote:
> MTU is the same across all hosts:
>
> - cn01.ceph.la1.clx.corp-
> e
errors 0 dropped 0 overruns 0 carrier 0 collisions 0
10G.
On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond
wrote:
> Is the MTU in the new rack set correctly?
>
> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen,
> wrote:
>
>> I transitioned some servers to a new rack and now I&
nfo". However, I did notice that it contains instructions, starting at
> "Please make sure that the host is reachable ...". How about starting to
> follow those?
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
rsized+degraded, last acting [26,4]
pg 12.7f is stuck undersized for 35m, current state active+undersized+degraded, last acting [9,14]
On Mon, Jul 25, 2022 at 12:43 PM Jeremy Hansen <
farnsworth.mcfad...@gmail.com> wrote:
> Pretty desperate here. Can someone suggest what I might be able to do
5T10:21:11.662+0000 7fcdfd12d700 1 osd.34 30689 start_boot
> >
> > At this point it just keeps printing start_boot, but the dashboard has it
> > marked as "in" but "down".
> >
> > On these three hosts that moved, there were a bunch marked as "
I transitioned some servers to a new rack and now I'm having major issues
with Ceph upon bringing things back up.
I believe the issue may be related to the ceph nodes coming back up with
different IPs before VLANs were set. That's just a guess because I can't
think of any other reason this would
I’m going to also post this to the Cloudstack list as well.
Attempting to rsync a large file to the Ceph volume, the instance becomes
unresponsive at the network level. It eventually returns but it will
continually drop offline as the file copies. Dmesg shows this on the Cloudstack
host machine
-jeremy
> On Jun 7, 2021, at 7:53 PM, Jeremy Hansen wrote:
>
>
> In an attempt to troubleshoot why only 2/5 mon services were running, I
> believe I’ve broken something:
>
> [ceph: root@cn01 /]# ceph orch ls
> NAME PORTS RUNNING
In an attempt to troubleshoot why only 2/5 mon services were running, I believe
I’ve broken something:
[ceph: root@cn01 /]# ceph orch ls
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
alertmanager 1/1 81s ago 9d count:1
crash
This seems to have recovered on its own.
Thank you
-jeremy
> On Jun 7, 2021, at 5:44 PM, Neha Ojha wrote:
>
> On Mon, Jun 7, 2021 at 5:24 PM Jeremy Hansen <jer...@skidrow.la> wrote:
>>
>>
>> I’m seeing this in my health status:
>>
>
I’m seeing this in my health status:
progress:
Global Recovery Event (13h)
[] (remaining: 5w)
I’m not sure how this was initiated but this is a cluster with almost zero
objects. Is there a way to halt this process? Why would it estimate 5 weeks
to reco
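If a stale progress event like this ever refuses to go away, the mgr progress module can be cleared (a sketch; available on recent releases):

ceph progress clear   # drop all events tracked by the progress module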
cephadm rm-daemon --name osd.29
on the node with the stale daemon did the trick.
-jeremy
> On Jun 7, 2021, at 2:24 AM, Jeremy Hansen wrote:
>
> So I found the failed daemon:
>
> [root@cn05 ~]# systemctl | grep 29
>
> ● ceph-bfa2ad58-c049-11eb-
this osd, so this is perhaps left over from a
previous osd.29 on this host. How would I go about removing this cleanly and
more importantly, in a way that Ceph is aware of the change, thereby clearing
the warning.
Thanks
-jeremy
> On Jun 7, 2021, at 1:54 AM, Jeremy Hansen wrote:
>
…
ceph osd ls
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
16
17
18
20
22
23
24
26
27
31
33
34
So how would I approach fixing this?
> On Jun 7, 2021, at 1:10 AM, 赵贺东 wrote:
>
> Hello Jeremy Hansen,
>
> try:
> ceph log last cephadm
>
> or see files below
> /var/log/ceph/cepha
What’s the proper way to track down where this error is coming from? Thanks.
6/7/21 12:40:00 AM [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
6/7/21 12:40:00 AM [WRN] Health detail: HEALTH_WARN 1 failed cephadm daemon(s)
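Besides the cephadm log, the failing daemon can usually be spotted directly (a sketch):

ceph health detail              # names the failed daemon
ceph orch ps | grep -v running  # anything not in 'running' state stands out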
I’m trying to understand this situation:
ceph health detail
HEALTH_WARN Reduced data availability: 33 pgs inactive
[WRN] PG_AVAILABILITY: Reduced data availability: 33 pgs inactive
pg 1.0 is stuck inactive for 20h, current state unknown, last acting []
pg 2.0 is stuck inactive for 20h, cur
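For PGs stuck inactive with an empty acting set, the usual starting points are (a sketch; the PG id is taken from the output above):

ceph pg dump_stuck inactive   # full list of stuck PGs
ceph osd tree                 # check that the OSDs expected to host them are up and in
ceph pg 1.0 query             # ask one PG why it is inactive (may hang if no OSD serves it)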
ause
I haven’t previously specified this information:
ceph osd crush set osd.24 3.63869 root=default datacenter=la1 rack=rack1
host=cn06 room=room1 row=6
set item id 24 name 'osd.24' weight 3.63869 at location
{datacenter=la1,host=cn06,rack=rack1,room=room1,root=default,row=6}: no change
My e
I’m continuing to read and it’s becoming more clear.
The CRUSH map seems pretty amazing!
-jeremy
> On May 28, 2021, at 1:10 AM, Jeremy Hansen wrote:
>
> Thank you both for your response. So this leads me to the next question:
>
> ceph osd crush rule create-replicated
Create a crush rule that only chooses non-ssd drives, then
> ceph osd pool set <pool> crush_rule YourNewRuleName
> and it will move over to the non-ssd OSDs.
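Concretely, with device classes that might look like this (a sketch; the rule, root, and pool names are placeholders):

ceph osd crush rule create-replicated replicated-hdd default host hdd   # rule restricted to hdd-class OSDs
ceph osd pool set mypool crush_rule replicated-hdd                      # move the pool onto that rule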
>
> On Fri, May 28, 2021 at 02:18, Jeremy Hansen wrote:
>>
>>
>> I’m very new to Ceph so if this question makes no
I’m very new to Ceph so if this question makes no sense, I apologize.
Continuing to study but I thought an answer to this question would help me
understand Ceph a bit more.
Using cephadm, I set up a cluster. Cephadm automatically creates a pool for
Ceph metrics. It looks like one of my ssd