Any chance for this one, or for one that fixes the 'all OSDs unreachable'
error when IPv6 is in use?
https://github.com/ceph/ceph/pull/60881
On 12/18/24 11:35, Ilya Dryomov wrote:
On Mon, Dec 16, 2024 at 6:27 PM Yuri Weinstein wrote:
Details of this release are summarized here:
https://tracker.ceph.com/issu
On 10/14/24 21:04, Anthony D'Atri wrote:
Try failing over to a standby mgr
On Oct 14, 2024, at 9:33 PM, Harry G Coin wrote:
I need help to remove a useless "HEALTH_ERR" in 19.2.0 on a fully dual stack
docker setup with ceph using IPv6, public and private nets separated, with a
few servers.
wrong, the dashboard reporting critical problems -- except there are
none. It makes me really wonder whether any actual testing on IPv6 is
ever done before releases are marked 'stable'.
HC
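For reference, the failover itself is a one-liner (a minimal sketch, assuming
at least one standby mgr exists; the daemon name is only an example of the
form shown by 'ceph -s'):

ceph mgr stat              # shows which mgr is currently active
ceph mgr fail              # on recent releases, fails the active mgr so a standby takes over
ceph mgr fail noc4.tvhgac  # or name a specific mgr, as listed by 'ceph -s'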
On 10/14/24 21:04, Anthony D'Atri wrote:
Try failing over to a standby mgr
On Oct 14, 2024,
I need help to remove a useless "HEALTH_ERR" in 19.2.0 on a fully dual
stack docker setup with ceph using IPv6, public and private nets
separated, with a few servers. After upgrading from an error-free v18
rev, I can't get rid of the 'HEALTH_ERR' owing to the report that all
OSDs are unreach
Same errors as below on latest ceph / latest Ubuntu LTS (noble) when
updating from Reef to Squid. The same 'ceph -s' that reports all OSDs
are 'up' and 'in' also reports all of them are 'unreachable'. I hate it
when that happens. All OSD/mon/mgr hosts are dual stack, but ceph uses
just ip
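Not a fix for the health check itself, but a sketch of how to compare what
the OSDs actually registered with what the monitors think the public network
is (it assumes the cluster is meant to run IPv6-only on the dual-stack hosts;
the subnet below is a placeholder):

ceph config get mon public_network       # should contain the IPv6 subnet the OSDs use
ceph osd dump | grep '^osd\.' | head     # the v2/v1 addresses each OSD actually registered
# make IPv6-only binding explicit if that is the intent:
ceph config set global ms_bind_ipv6 true
ceph config set global ms_bind_ipv4 false
ceph config set global public_network fd00:10::/64

If only the new health check is confused, 'ceph health detail' names the
exact code, which can be muted until the fix lands.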
On 7/26/24 11:45, Rouven Seifert wrote:
Hello,
On 2024-07-25 16:39, Harry G Coin wrote:
Upgraded to 18.2.4 yesterday. Healthy cluster reported a few minutes
after the upgrade completed. Next morning, this:
# ceph health detail
HEALTH_ERR Module 'diskprediction_local' has failed:
Upgraded to 18.2.4 yesterday. Healthy cluster reported a few minutes
after the upgrade completed. Next morning, this:
# ceph health detail
HEALTH_ERR Module 'diskprediction_local' has failed: No module named
'sklearn'
[ERR] MGR_MODULE_ERROR: Module 'diskprediction_local' has failed: No
modul
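If the module isn't actually needed, the quickest way out (short of
installing scikit-learn inside the mgr container) is to switch it off; a
sketch:

ceph mgr module disable diskprediction_local
ceph health detail                 # the MGR_MODULE_ERROR entry should clear
# or keep it enabled and just silence the error for now:
ceph health mute MGR_MODULE_ERROR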
To read more about the new page you can check here
<https://docs.ceph.com/en/latest/mgr/dashboard/#overview-of-the-dashboard-landing-page>.
Regards,
Nizam
On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin wrote:
Looking at ceph -s, all is well. Looking at the dashboard, 85% of my
ca
Looking at ceph -s, all is well. Looking at the dashboard, 85% of my
capacity is 'warned', and 95% is 'in danger'. There is no hint given
as to the nature of the danger or reason for the warning. Though
apparently with merely 5% of my ceph world 'normal', the cluster reports
'ok'. Which, y
Is there a 'Howto' or 'workflow' for implementing a one-line patch in a
running cluster, with the full understanding that it will be gone on the
next upgrade? Hopefully without having to set up an entire
packaging/development environment?
Thanks!
To implement:
* /Subject/: Re: Permanent KeyError: 'T
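One low-tech approach for a containerized cluster, assuming the one-liner
targets a Python mgr module: bind-mount the patched file over the copy
shipped in the image via the service spec. This survives daemon restarts;
note the spec change itself persists, so it needs to be removed again when
the next upgrade ships the real fix. The module path and daemon name below
are placeholders, and 'extra_container_args' needs a reasonably recent
cephadm:

# place the patched module file (matching your exact ceph version) on each
# mgr host, e.g. /root/module.py, then export and extend the mgr spec:
ceph orch ls mgr --export > mgr.yaml
cat >> mgr.yaml <<'EOF'
extra_container_args:
  - "-v"
  - "/root/module.py:/usr/share/ceph/mgr/somemodule/module.py:ro"
EOF
ceph orch apply -i mgr.yaml
ceph orch daemon redeploy mgr.noc1.abcdef   # repeat per mgr daemon if needed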
These repeat for every host, and only after upgrading from the previous
Quincy point release to 17.2.7. As a result, the cluster always shows a
warning and never reports healthy.
root@noc1:~# ceph health detail
HEALTH_WARN failed to probe daemons or devices
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or de
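A diagnostic sketch for this one (the host name is a placeholder); the
cephadm module's own log usually contains the per-host traceback behind the
generic 'failed to probe' text:

ceph health detail             # names the hosts/daemons that failed to refresh
ceph log last cephadm          # the cephadm module log, usually has the real error
ceph orch host ls              # confirm every host is still listed and reachable
ceph cephadm check-host noc1   # re-run the host prerequisites check on one host
ceph mgr fail                  # restart the active mgr to force a fresh probe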
Libcephfs's 'init' call hangs when passed arguments that once worked
normally but now refer to a cluster that's broken, on its way out of
service, has too few mons, etc. At least the python libcephfs wrapper
hangs on init.
Of course mount and session timeouts work, but is there a
Can anyone help me understand seemingly contradictory cephfs error messages?
I have a RHEL ceph client that mounts a cephfs file system via autofs.
Very typical. After boot, when a user first uses the mount, for example
'ls /mountpoint' , all appears normal to the user. But on the system
co
Hi! No matter what I try, using the latest cephfs on an all
ceph-pacific setup, I've not been able to avoid this error message,
always similar to this on RHEL family clients:
SELinux: inode=1099954719159 on dev=ceph was found to have an invalid
context=system_u:object_r:unlabeled_t:s0. This
I have two autofs entries that mount the same cephfs file system to two
different mountpoints. Accessing the first of the two fails with 'stale
file handle'. The second works normally. Other than the name of the
mount point, the lines in autofs are identical. No amount of 'umount
-f' or res
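For context, the two entries look roughly like this (a sketch with
placeholder names, kernel cephfs client, old-style mon-list device syntax);
as described, they differ only in the mount point:

# /etc/auto.master
/-    /etc/auto.ceph

# /etc/auto.ceph -- two mount points onto the same cephfs
/srv/a  -fstype=ceph,name=hgc,secretfile=/etc/ceph/hgc.secret  mon1:6789,mon2:6789,mon3:6789:/
/srv/b  -fstype=ceph,name=hgc,secretfile=/etc/ceph/hgc.secret  mon1:6789,mon2:6789,mon3:6789:/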
s in the NFS, not cephfs docs:
https://docs.ceph.com/en/latest/cephfs/nfs/
and the much appreciated, genius-level pointer from Curt here:
On 5/2/23 14:21, Curt wrote:
This thread might be of use, it's an older version of ceph 14, but
might still apply,
https://lists.ceph.io/hyperkitty/l
In 17.2.6, is there a security requirement that the pool names supporting a
ceph fs filesystem match the filesystem name: name.data for the data pool and
name.meta for the associated metadata pool? (Multiple file systems are
enabled.)
I have filesystems from older versions with the data pool name matching
the
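As far as I understand it, what matters to the filesystem is the pools'
application metadata rather than the literal pool names; a quick way to see
which pools back each filesystem and how they are tagged (the pool name
below is a placeholder):

ceph fs ls                                  # each fs with its metadata and data pools
ceph fs status                              # per-fs pool usage summary
ceph osd pool application get cephfs_data   # shows the 'cephfs' application tags on a pool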
I've a 'healthy' cluster with a dashboard where Grafana correctly
reports the number of OSDs on a host and the correct raw capacity -- and
'no data' for any time period, for any of the OSDs (dockerized
Quincy). Meanwhile the top-level dashboard cluster reports reasonable
client throughput rea
On 5/12/22 02:05, Janne Johansson wrote:
Den tors 12 maj 2022 kl 00:03 skrev Harry G. Coin :
Might someone explain why the count of degraded items can drop
thousands, sometimes tens of thousands in the same number of hours it
takes to go from 10 to 0? For example, when an OSD or a host with a
ct is not huge and is not RGW index
omap, that slow of a single-object recovery would have me checking
whether I have a bad disk that's presenting itself as significantly
underperforming.
Josh
On Wed, May 11, 2022 at 4:03 PM Harry G. Coin wrote:
Might someone explain why the count of degraded
Might someone explain why the count of degraded items can drop
thousands, sometimes tens of thousands in the same number of hours it
takes to go from 10 to 0? For example, when an OSD or a host with a few
OSD's goes offline for a while, reboots.
Sitting at one complete and entire degraded obj
bbk, It did help! Thank you.
Here's a slightly more 'with the osd-fsid details filled in' procedure
for moving a 'dockerized' / container-run OSD set of drives to a
replacement server/motherboard (or the same server with blank/new/fresh
reinstalled OS). For occasions when the 'new setup' wil
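On a Pacific-or-later cephadm cluster the core of that procedure reduces to
a couple of commands once the drives are in the new chassis (a sketch; the
host name and address are placeholders, and it assumes the OSD LVM volumes
on the moved drives are intact):

ceph orch host add newbox 10.1.2.12    # or its IPv6 address
ceph cephadm osd activate newbox       # scans the host and re-adopts the existing OSDs
ceph orch ps newbox                    # the osd daemons should reappear here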
I tried searching for the meaning of a ceph Quincy all-caps WARNING
message, and failed. So I need help. Ceph tells me my cluster is
'healthy', yet emits a bunch of 'progress WARNING root] complete ev' ...
messages. Which I score right up there with the helpful dmesg "yama,
becoming mindful"
Using Quincy I'm getting much worse lag owing to ceph syslog message
volume, though without obvious system errors.
In the usual case of no current/active hardware errors and no software
crashes: what config settings can I pick so that what appears in syslog
is as close to what would appear
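The knobs involved, for what it's worth (a sketch; whether they help depends
on whether the container runtime already forwards daemon stderr into
journald/syslog, which is where most of the duplication tends to come from):

ceph config get osd log_to_syslog              # daemon logs copied to syslog (default false)
ceph config get mon mon_cluster_log_to_syslog  # the cluster log ('ceph -w') copied to syslog
# example: stop duplicating into syslog and rely on the journald path alone
ceph config set global log_to_syslog false
ceph config set mon mon_cluster_log_to_syslog false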
Great news! Any notion when the many pending bug fixes will show up in
Pacific? It's been a while.
On 4/19/22 20:36, David Galloway wrote:
We're very happy to announce the first stable release of the Quincy
series.
We encourage you to read the full release notes at
https://ceph.io/en/news/
I would really appreciate advice because I bet many of you have 'seen
this before' but I can't find a recipe.
There must be a 'better way' to respond to this situation: it starts
with a well-working small ceph cluster with 5 servers which, with no
apparent change to the workflow, suddenly starts repo
There's got to be some obvious way I haven't found for this common ceph
use case, which happens at least once every couple of weeks. I hope
someone on this list knows and can give a link. The scenario goes like
this: on a server with one drive providing boot capability and the rest OSDs:
1. First, s
I sense the concern about ceph distributions via containers generally
has to do with what you might call a feeling of 'opaqueness'. The
feeling is amplified as most folks who choose open source solutions
prize being able promptly to address the particular concerns affecting
them without havin
Worked very well! Thank you.
Harry Coin
On 10/2/21 11:23 PM, 胡 玮文 wrote:
Hi Harry,
Please try these commands in CLI:
ceph health mute MGR_MODULE_ERROR
ceph health mute CEPHADM_CHECK_NETWORK_MISSING
Weiwen Hu
On Oct 3, 2021, at 05:37, Harry G. Coin wrote:
I need help getting two 'non e
I need help getting two 'non-errors' off the ceph dashboard so it stops
falsely scaring people with the dramatic "HEALTH_ERR" readout -- and
masking what could be actual errors of immediate importance.
The first is a bug where the devs try to do date arithmetic between
incompatible variables. T
I asked as well, it seems nobody on the list knows so far.
On 9/30/21 10:34 AM, Andrew Gunnerson wrote:
Hello,
I'm trying to figure out what overlapping roots entails with the default
scale-down autoscaling profile in Ceph Pacific. My test setup involves a CRUSH
map that looks like:
ID=-
Hi all,
I know Ceph offers a way to 'automatically' spin up any blank drives it
detects into OSDs, but I think that's an 'all or nothing' situation if I
read the docs properly.
Is there a way to specify which slots, or even better, a way to exclude
specific slots? It sure would
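There is a middle ground: turn the catch-all service off and describe
exactly which devices may be consumed in an OSD service spec. A sketch, with
placeholder host and device names:

# stop cephadm from grabbing every blank drive it sees (if that service was applied)
ceph orch apply osd --all-available-devices --unmanaged=true

# osd-picked.yaml -- only these paths on these hosts become OSDs
service_type: osd
service_id: picked_slots
placement:
  hosts:
    - noc1
    - noc2
spec:
  data_devices:
    paths:
      - /dev/sdb
      - /dev/sdc

ceph orch apply -i osd-picked.yaml

Filters like size:, rotational: or model: can stand in for explicit paths if
excluding specific slots is easier to express that way.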
Is there anything to be done about groups of log messages like
"pg_autoscaler ERROR root] pool has overlapping roots"
The cluster reports it is healthy, and yet this is reported as an error,
so-- is it an error that ought to have been reported, or is it not an error?
Thanks
Harry Coin
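Roughly, the autoscaler complains when different pools' CRUSH rules resolve
to roots that contain one another (typically the bare default root plus one
of its per-device-class shadow roots), so it cannot attribute capacity
cleanly. A sketch of how to look, and of the commonly suggested fix, with
placeholder rule and pool names:

ceph osd crush tree --show-shadow   # the per-device-class shadow roots
ceph osd pool ls detail             # which crush_rule each pool uses
ceph osd crush rule dump            # what root/class each rule starts from

# pin every pool to one device class so the roots no longer overlap
ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd pool set mypool crush_rule replicated_hdd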
Is there a way to re-calibrate the various 'global recovery event' and
related 'remaining time' estimators?
For the last three days I've been assured that a 19h event will be over
in under 3 hours...
Previously I think Microsoft held the record for the most incorrect
'please wait' progress i
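For what it's worth, those estimates come from the mgr 'progress' module,
which can be reset or switched off without affecting anything else (a
sketch):

ceph progress         # list the current events and their guesses
ceph progress clear   # drop stuck or ancient events along with their ETAs
ceph progress off     # or silence the estimator entirely
ceph progress on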
A cluster reporting no errors running 16.2.5, immediately after upgrade
to 16.2.6, features what seems to be an entirely bug-related dramatic
'Health Err' on the dashboard:
Module 'devicehealth' has failed: can't subtract offset-naive and
offset-aware datetimes
Looking at the bug tracking logs,
This topic comes up often enough that maybe it's time for one of those 'web
calculators'. One that lets a user who knows their goals but not
ceph-fu enter the importance of various factors (my suggested
factors: read freq/stored TB, write freq/stored TB, unreplicated TB
needed, min target d
Does ceph remove container subvolumes holding previous revisions of
daemon images after upgrades?
I have a couple servers using btrfs to hold the containers. The number
of docker related sub-volumes just keeps growing, way beyond the number
of daemons running. If I ignore this, I'll get disk-fu
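As far as I know cephadm does not prune superseded images on its own, so a
periodic cleanup per host does it; images backing still-running containers
are never touched by prune (Docker shown; podman has the same subcommands):

docker system df        # how much space images and volumes occupy
docker image prune -a   # remove images not used by any container, i.e. old ceph releases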
/en/latest/mgr/dashboard/#disable-the-redirection>
> (e.g.: HAProxy)?
No redirection, nothing. Just a timeout on every manager other than the
active one. Adding an HAProxy would be easily done, but seems redundant
to ceph's internal capability -- which at one time worked, anyhow.
>
> Kind R
Somewhere between Nautilus and Pacific the hosts running standby
managers, which previously would redirect browsers to the currently
active mgr/dashboard, seem to have stopped doing that. Is there a
switch somewhere? Or was I just happily using an undocumented feature?
Thanks
Harry Coin
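That behaviour appears to be governed by a dashboard setting (the
'disable-the-redirection' doc linked above describes it); worth checking
what it is currently set to. A sketch:

ceph config get mgr mgr/dashboard/standby_behaviour    # 'redirect' (the old behaviour) or 'error'
ceph config set mgr mgr/dashboard/standby_behaviour redirect
ceph mgr fail                                          # bouncing the mgrs may be needed for it to take effect
ceph mgr services                                      # shows the URL of the active dashboard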
On 7/8/21 5:06 PM, Bryan Stillwell wrote:
> I upgraded one of my clusters to v16.2.5 today and now I'm seeing these
> messages from 'ceph -W cephadm':
>
> 2021-07-08T22:01:55.356953+ mgr.excalibur.kuumco [ERR] Failed to apply
> alertmanager spec AlertManagerSpec({'placement': PlacementSpec(co
Same problem here. Hundreds of lines like
' Updating node-exporter deployment (+4 -4 -> 5) (0s)
[]
'
And, similar to yours:
...
2021-07-10T16:26:30.432487-0500 mgr.noc4.tvhgac [ERR] Failed to apply
node-exporter spec MonitoringSpec({'placement':
PlacementSp
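A hedged sketch for inspecting the stored monitoring specs and re-applying
clean ones while waiting for the fix:

ceph orch ls alertmanager --export    # dump the spec cephadm has stored
ceph orch ls node-exporter --export
ceph orch apply node-exporter '*'     # overwrite it with a plain spec on all hosts
ceph orch apply alertmanager          # default placement
ceph mgr fail                         # restart the cephadm module and let it retry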
Hi
In a Pacific/container/cephadm setup, when a server's boot/OS drive fails
(unrelated to any actual OSD storage): can the boot/OS drive be
replaced with a fresh OS install and then simply set up with the
same network addressing/ssh keys (assuming the necessary
docker/non-ceph pkgs are ins
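Roughly, the sequence for the freshly installed host would look like this
(a sketch; it assumes cephadm's own SSH key is in use, the OSD drives were
untouched, and the hostname/address are placeholders):

ceph cephadm get-pub-key > ceph.pub
ssh-copy-id -f -i ceph.pub root@noc3    # re-establish cephadm's SSH access
ceph orch host add noc3 10.1.2.13       # only if the host entry was removed
ceph cephadm osd activate noc3          # re-adopt the intact OSDs on its drives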
Is this happening to anyone else? After this command:
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 2w
The 'dashboard' shows 'Health OK', then after a few hours (perhaps a
mon leadership change), it's back to 'degraded' and
'AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing
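Two angles on that one: make the mute survive the flapping, or clear the
condition itself once every client and daemon is new enough for secure
global_id reclaim (the latter is the documented mitigation):

# sticky mute: stays in place even if the alert clears and then returns
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 2w --sticky

# the real fix, once all clients are patched:
ceph config set mon auth_allow_insecure_global_id_reclaim false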
On 6/11/21 3:52 AM, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> Can you suggest me what is a good cephfs design? I've never used it, only rgw
> and rbd we have, but want to give a try. However in the mail list I saw a
> huge amount of issues with cephfs so would like to go with some let's say
> bulle
got to take more processing and network bandwidth
than the 'known interesting only' parts of files.
>
> On Fri, Jun 11, 2021 at 12:31 PM Harry G. Coin wrote:
>> On any given a properly sized ceph setup, for other than database end
>> use) theoretically shouldn
On any given properly sized ceph setup (for other than database end
use), theoretically shouldn't a cephfs root out-perform any fs atop a
rados block device root?
Seems to me like it ought to: moving only the 'interesting' bits of
files over the so-called 'public' network should take fewer, smal
Has anyone added the 'conf.d' modules (and, in the centos/rhel/fedora
world, done the selinux work) so that initramfs/dracut can 'direct kernel
boot' cephfs as a guest image root file system? It took some work for
the nfs folks to manage being the root filesystem.
Harry
On 6/2/21 2:28 PM, Phil Regnauld wrote:
> Dave Hall (kdhall) writes:
>> But the developers aren't out in the field with their deployments
>> when something weird impacts a cluster and the standard approaches don't
>> resolve it. And let's face it: Ceph is a marvelously robust solution for
>> lar
FYI, I'm getting monitors assigned via '... apply label:mon' (with
current and valid 'mon' tags) 'committing suicide' after surprise
reboots in the 'Pacific' 16.2.4 release. The tag indicating a monitor
should be assigned to that host is present and never changed.
Deleting the mon tag, waiting a
On 5/21/21 9:49 AM, Eugen Block wrote:
> You can define the public_network [1]:
>
> ceph config set mon public_network **
>
> For example:
>
> ceph config set mon public_network 10.1.2.0/24
>
> Or is that already defined and it happens anyway?
The public network is defined, and it happens anyway
Is there a way to force '.. orch apply *' to limit IP address selection
to addresses matching the hostname in DNS or /etc/hosts, or to a
specific address given at 'host add' time? I've hit a bothersome problem:
on v15, 'ceph orch apply mon ...' appears not to use the DNS IP or
/etc/hosts when i
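cephadm keeps its own notion of each host's address, separate from DNS; it
can be given explicitly when the host is added, or corrected afterwards (a
sketch with placeholder names and addresses):

ceph orch host add noc5 192.168.10.15        # pin the address at add time
ceph orch host set-addr noc5 192.168.10.15   # or fix it on an already-added host
ceph orch host ls                            # the address cephadm will actually use
ceph config set mon public_network 192.168.10.0/24   # keeps newly placed mons on the intended subnet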
Any idea whether 'diskprediction_local' will ever work in containers?
I'm running 15.2.7, which has a dependency on scikit-learn v0.19.2
that isn't in the container. It's been throwing that error for a year
now on all the octopus container versions I tried. It used to be on the
baremetal v
This has got to be ceph/docker "101" but I can't find the answer in the
docs and need help.
The latest docker octopus images support using the ntpsec time daemon.
The default stable octopus image doesn't as yet.
I want to add a mon to a cluster that needs to use ntpsec (just go with
it..), so I
[Resent to correct title]
Marc:
Here's a template that works here. You'll need to do some steps to
create the 'secret' and make the block devs and so on:
Glad I could contribute something. Sure would appreciate leads for the
suggested sysctls
Marc:
Here's a template that works here. You'll need to do some steps to
create the 'secret' and make the block devs and so on:
Glad I could contribute something. Sure would appreciate leads for the
suggested sysctls/etc either apart or as tu
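For anyone landing on this in the archive, the 'secret' and block-device
pieces referred to above are roughly a libvirt secret of type 'ceph'
carrying the cephx key, plus a network-type disk in the domain XML. A sketch
with placeholder pool, image, user and UUID values:

# secret.xml
<secret ephemeral='no' private='no'>
  <uuid>2a5b08e4-3dca-4ff9-9a1d-40389758d081</uuid>
  <usage type='ceph'>
    <name>client.libvirt secret</name>
  </usage>
</secret>

virsh secret-define --file secret.xml
virsh secret-set-value --secret 2a5b08e4-3dca-4ff9-9a1d-40389758d081 \
  --base64 "$(ceph auth get-key client.libvirt)"

# disk element inside the guest's domain XML
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <auth username='libvirt'>
    <secret type='ceph' uuid='2a5b08e4-3dca-4ff9-9a1d-40389758d081'/>
  </auth>
  <source protocol='rbd' name='libvirt-pool/guest01-disk0'>
    <host name='mon1.example.net' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>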
Hi
Are there any 'official' or even 'works for us' pointers to 'tuned
profiles' for such common uses as
'ceph baremetal osd host'
'ceph osd + libvirt host'
'ceph mon/mgr'
'guest vm based on a kernel-mounted rbd'
'guest vm based on a direct virtio->rados link'
I suppose there are a few other
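For the record, a home-grown profile is just a directory under /etc/tuned
that chains off a stock profile; a sketch of the shape for an OSD host (the
sysctl values are placeholders, not recommendations):

# /etc/tuned/ceph-osd-host/tuned.conf
[main]
summary=Ceph OSD host (sketch)
include=throughput-performance

[sysctl]
vm.swappiness=10
net.core.somaxconn=1024

[vm]
transparent_hugepages=never

# activate and verify
tuned-adm profile ceph-osd-host
tuned-adm active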
Anthony asked about the 'use case'. Well, I haven't gone into details
because I worried it wouldn't help much. From a 'ceph' perspective, the
sandbox layout goes like this: 4 pretty much identical old servers,
each with 6 drives, and a smaller server just running a mon to break
ties. Usual fron
ote:
> What does “traffic” mean? Reads? Writes will have to hit the net
> regardless of any machinations.
>
>> On Jun 29, 2020, at 7:31 PM, Harry G. Coin wrote:
>>
>> I need exactly what ceph is for a whole lot of work, that work just
>> doesn't represent
I need exactly what ceph is for a whole lot of work, that work just
doesn't represent a large fraction of the total local traffic. Ceph is
the right choice. Plainly ceph has tremendous support for replication
within a chassis, among chassis and among racks. I just need
intra-chassis traffic to n
Hi
I have a few servers each with 6 or more disks, with a storage workload
that's around 80% done entirely within each server. From a
work-to-be-done perspective there's no need for 80% of the load to
traverse network interfaces; the rest needs what ceph is all about. So
I cooked up a set of c
Hello,
A couple of days ago I increased the rbd cache size from the default to
256MB/osd on a small 4-node, 6-osd/node setup in a test/lab setting.
The rbd volumes are all vm images with writeback cache parameters and
steady, if only a few MB/sec, writes going on. Logging mostly. I
noticed the
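For reference, the librbd cache knobs in play look like this; note that the
cache is per librbd client/opened image (e.g. per disk in a QEMU process)
rather than per OSD, so large values multiply quickly (the numbers are
illustrative):

ceph config set client rbd_cache true
ceph config set client rbd_cache_size 268435456                  # 256 MiB
ceph config set client rbd_cache_max_dirty 201326592             # must stay below rbd_cache_size
ceph config set client rbd_cache_writethrough_until_flush true   # writethrough until the guest issues a flush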
Does anyone know if the change to disable spdk by default (so as to
remove the corei7 dependency when running on intel platforms) made it
into 14.2.3? The spdk version only required core2 in 14.2.1; the change
to require corei7 in 14.2.2 killed all the OSDs on older systems flat.
On 9/4/1