We deployed a new Cephadm Squid cluster (400 OSDs) and are happy with
everything, except that this time we tried Ganesha again with the ingress
option (HAProxy) and soon ran into a failed daemon event.
There is a GitHub issue for it:
https://github.com/nfs-ganesha/nfs-ganesha/issues/1158
Our qu
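For anyone heading down the same path, the deployment was along these lines; the cluster name, placement and virtual IP below are placeholders rather than our real values:
# create the NFS cluster with the cephadm ingress (haproxy/keepalived) front-end
ceph nfs cluster create mynfs "2 host-a host-b" --ingress --virtual-ip 10.0.0.100/24
# check the daemons cephadm deploys for it
ceph orch ps --daemon-type nfs
ceph orch ps --daemon-type haproxy
ceph orch ps --daemon-type keepalived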
Hi Xiubo,
Is the issue we provided logs for the same one as Erich's, or is that a
third, different locking issue?
thanks,
nigel.
On Thu, 18 Apr 2024 at 12:29, Xiubo Li wrote:
>
> On 4/18/24 08:57, Erich Weiler wrote:
> >> Have you already shared information about this issue? Please do if not.
> >
> > I
On Wed, 10 Apr 2024 at 14:01, Xiubo Li wrote:
> > I assume if this fix is approved and backported it will then appear in
> > like 18.2.3 or something?
> >
> Yeah, it will be backported after being well tested.
>
We believe we are being bitten by this bug too; looking forward to the fix.
thanks.
On Wed, 7 Feb 2024 at 20:00, Nigel Williams
wrote:
>
> and just MDS left to do but upgrade has been sitting for hours on this
>
> resolved by rebooting a single host...still not sure why this fixed it
> other than it had a standby MDS that would
Kicked off:
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.1
and just MDS left to do but upgrade has been sitting for hours on this
root@rdx-00:~# ceph orch upgrade status
{
    "target_image": "quay.io/ceph/ceph@sha256:a4e86c750cc11a8c93453ef5682acfa543e3ca08410efefa30f520b54f41831f",
On Thu, 7 Sept 2023 at 18:05, Nicola Mori wrote:
> Is it just me, or are my impressions shared by someone else? Is
> there anything that can be done to improve the situation?
>
I wonder about the implementation choice for this dashboard. I find with
our Reef cluster it seems to get stuck du
Thanks Eugen for following up. Sorry my second response was incomplete. I
can confirm that it works as expected too. My confusion was that the
section from the online documentation seemed to be missing/moved, and when
it initially failed I wrongly thought that the OSD-add process had changed
in the
On Tue, 29 Aug 2023 at 10:09, Nigel Williams
wrote:
> and giving it a try it fails when it bumps into the root drive (which has
> an active LVM). I expect I can add a filter to avoid it.
>
I found the cause of this initial failure when applying the spec from the
web-gui. Even though I
We upgraded to Reef from Quincy; all went smoothly (thanks, Ceph developers!).
When adding OSDs the process seems to have changed: the docs no longer
mention the OSD spec, and giving it a try, it fails when it bumps into the root
drive (which has an active LVM). I expect I can add a filter to avoid it.
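Something like the spec below should do it, I think; the service_id and the size threshold are only examples:
# osd-spec.yaml (hypothetical)
service_type: osd
service_id: hdd_only
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
    size: '1TB:'   # skip small devices such as the root/boot drive
# preview what cephadm would do, then apply
ceph orch apply -i osd-spec.yaml --dry-run
ceph orch apply -i osd-spec.yaml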
We're on 17.2.5 and had the default value (5.2), but changing it didn't
seem to impact recovery speed:
root@rdx-00:/# ceph config get osd osd_mclock_cost_per_byte_usec_hdd
5.20
root@rdx-00:/# ceph config show osd.0 osd_op_queue
mclock_scheduler
root@rdx-00:/# ceph config set osd osd_mclock_cos
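If anyone else lands here: my understanding is that on Quincy the supported knob for this is the mclock profile rather than the per-byte cost, something along the lines of:
ceph config get osd osd_mclock_profile
# bias the scheduler towards recovery while backfilling, then switch back
ceph config set osd osd_mclock_profile high_recovery_ops
ceph config set osd osd_mclock_profile high_client_ops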
Hi Wesley, thank you for the follow up.
Anthony D'Atri kindly helped me out with some guidance and advice and we
believe the problem is resolved now.
This was a brand-new install of a Quincy cluster, and I made the mistake of
presuming that autoscale would adjust the PGs as required; however, it ne
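For the archives, fixing it came down to checking and then raising the PG counts by hand, roughly as below; the pool name and pg_num are placeholders:
ceph osd pool autoscale-status
ceph osd pool get cephfs_data pg_num
# either let the autoscaler act ...
ceph osd pool set cephfs_data pg_autoscale_mode on
# ... or set pg_num explicitly
ceph osd pool set cephfs_data pg_num 1024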
With current 17.2.1 (cephadm) I am seeing an unusual HEALTH_ERR.
Adding files to a new, empty cluster (replica 3, CRUSH is by host), the OSDs
became 95% full, and reweighting them to any value does not cause backfill
to start.
If I reweight the three too-full OSDs to 0.0 I get a large number of
misplaced
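Commands that are useful when digging into this sort of thing, for anyone searching the archives later; the thresholds shown are examples, not recommendations:
ceph osd df tree                 # per-OSD utilisation and current reweight values
ceph osd dump | grep ratio       # nearfull / backfillfull / full ratios
# temporarily raising the backfillfull ratio can let backfill start on very full OSDs
ceph osd set-backfillfull-ratio 0.92
# or let Ceph pick reweight values based on utilisation
ceph osd reweight-by-utilization 110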
Excellent work, everyone!
Regarding this: "Quincy does not support LevelDB. Please migrate your OSDs
and monitors to RocksDB before upgrading to Quincy."
Is there a convenient way to determine this for cephadm and non-cephadm
setups?
What happens if LevelDB is still active? Does it cause an immedi
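My best guess at how to check, for both flavours of install; the cephadm path includes the cluster fsid, shown here as a placeholder:
# OSDs: FileStore OSDs are the ones that may still carry a LevelDB omap
ceph osd count-metadata osd_objectstore
# MONs: the kv backend is recorded in the mon data directory
cat /var/lib/ceph/mon/ceph-$(hostname -s)/kv_backend          # package install
cat /var/lib/ceph/<fsid>/mon.$(hostname -s)/kv_backend        # cephadm, on the host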
Thank you York, that suggestion worked well.
'ceph-deploy mon destroy' on the old server, followed by changing the new
server's identity, then 'ceph-deploy mon create' on the replacement, worked.
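For the archives, the working sequence boiled down to something like this (hostnames are placeholders):
ceph-deploy mon destroy old-mon-host        # run from the admin/ceph-deploy node
# re-identify the replacement host (IP / hostname), then:
ceph-deploy mon create new-mon-host
ceph -s                                     # confirm the new MON joins quorum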
On Wed, 30 Mar 2022 at 19:06, York Huang wrote:
> the shrink-mon.yml and add-mon.yml playbooks may give yo
This is a ceph-deploy setup.
I would welcome suggestions as to how to replace a server hosting a MON;
none of the following have worked for me:
This fails:
1. Set up a new server and copy the MONMAP/keys to it.
2. Shut down the old server (cluster out of quorum).
3. Change the new server's identity (IP, host
On Sat, 20 Nov 2021 at 02:26, Yan, Zheng wrote:
> we have an FS containing more than 40 billion small files.
>
That is an impressive statistic! Are you able to share the output of ceph
-s, ceph df, etc., to give an idea of your cluster deployment?
thanks.
Could we see the content of the bug report please? That RH Bugzilla entry
seems to have restricted access:
"You are not authorized to access bug #1996680."
On Wed, 22 Sept 2021 at 03:32, Patrick Donnelly wrote:
> You're probably hitting this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=199
Thanks for the tip.
All OSD logs on all hosts are zero length for me, though; I suspect a
permission problem, but most hosts don't have a ceph user defined.
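In case it helps anyone else: with cephadm the daemons log to journald by default, so the options, as I understand them, are to read the journal or to turn file logging back on (daemon name and fsid below are placeholders):
cephadm logs --name osd.12                      # wraps journalctl for the named daemon
journalctl -u ceph-<fsid>@osd.12.service
# or re-enable traditional file logging under /var/log/ceph/<fsid>/
ceph config set global log_to_file true
ceph config set global mon_cluster_log_to_file true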
I managed to upgrade to 15.2.14 by doing:
ceph orch upgrade start --image quay.io/ceph/ceph:v15.2.14
(anything else I tried would fail)
When I look at the ceph orch ps output, though, I see quay.io for most
image sources, but alertmanager, grafana, and node-exporter are coming
from docker.io.
Before doing
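For anyone wanting to move the monitoring stack off docker.io, the image sources appear to be controlled by these mgr/cephadm options; the tags below are placeholders, so match them to your release before copying:
ceph config set mgr mgr/cephadm/container_image_prometheus    quay.io/prometheus/prometheus:v2.33.4
ceph config set mgr mgr/cephadm/container_image_alertmanager  quay.io/prometheus/alertmanager:v0.23.0
ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v1.3.1
ceph config set mgr mgr/cephadm/container_image_grafana       quay.io/ceph/ceph-grafana:8.3.5
# then redeploy the affected services
ceph orch redeploy prometheus
ceph orch redeploy alertmanager
ceph orch redeploy node-exporter
ceph orch redeploy grafana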
To answer my own question: the logs are meant to be in
/var/log/ceph/<fsid>/...; however, on this host they were all zero length.
On Tue, 31 Aug 2021 at 20:51, Nigel Williams
wrote:
>
> Where to find more detailed logs? or do I need to adjust a log-level
> firs
Ubuntu 20.04.3, Octopus 15.2.13, cephadm + podman
After a routine reboot, all OSDs on one host did not come up. After a few
iterations of cephadm deploy, and fixing the missing config file, the
daemons remain in the error state, but neither journalctl nor systemctl
shows any log errors other than exit s
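The checks that apply here, as far as I can tell; the fsid and OSD id are placeholders:
cephadm ls                                      # what state cephadm thinks each daemon is in
systemctl status ceph-<fsid>@osd.5.service
journalctl -u ceph-<fsid>@osd.5.service --since "-1h"
cephadm logs --name osd.5
# once the underlying cause is fixed, have the orchestrator redeploy the daemon
ceph orch daemon redeploy osd.5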
On Sun, 15 Aug 2021 at 00:10, Jadkins21 wrote:
> Am I just being too impatient? Or did I miss something around docker
> being discontinued for cephadm? (hope not, it's great)
>
not showing via podman either:
root@rnk-00:~# podman pull docker.io/ceph/ceph:v15.2.14
Trying to pull docker.io/ce
One of my colleagues attempted to set quotas on a large number (some
dozens) of users with the session below, but it caused the MDS to hang and
reject client requests.
Offending command was:
cat recent-users | xargs -P16 -I% setfattr -n ceph.quota.max_bytes -v 8796093022208 /scratch/%
Result was
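A serial version of the same loop, as a sketch with the same quota value and path layout, which should at least avoid hammering the MDS with 16 parallel setattr streams:
# one user at a time instead of xargs -P16
while read -r user; do
    setfattr -n ceph.quota.max_bytes -v 8796093022208 "/scratch/${user}"
done < recent-users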
On Wed, 1 Jul 2020 at 01:47, Anthony D'Atri wrote:
> > However when I've looked at the IO metrics for the nvme it seems to be only
> > lightly loaded, so does not appear to be the issue (at 1st sight anyway).
>
> How are you determining “lightly loaded”? Not iostat %util, I hope.
For reference,
On Thu, 27 Feb 2020 at 13:08, Nigel Williams wrote:
> On Thu, 27 Feb 2020 at 06:27, Anthony D'Atri wrote:
> > If the heap stats reported by telling the OSD `heap stats` is large,
> > telling each `heap release` may be useful. I suspect a TCMALLOC
> > shortcoming.
On Thu, 27 Feb 2020 at 06:27, Anthony D'Atri wrote:
> If the heap stats reported by telling the OSD `heap stats` is large, telling
> each `heap release` may be useful. I suspect a TCMALLOC shortcoming.
osd.158 tcmalloc heap stats:
MALLOC: 572
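For reference, the invocations were along these lines (run from an admin node):
ceph tell osd.158 heap stats
ceph tell osd.158 heap release
# or release on every OSD at once
ceph tell osd.\* heap release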
On Wed, 26 Feb 2020 at 23:56, Mark Nelson wrote:
> Have you tried dumping the mempools? ...
> One reason this can happen for example is if you
> have a huge number of PGs (like many thousands per OSD).
We are relying on the pg autoscaler to set the PGs, and so far it
seems to do the right thing.
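For anyone following along, the suggested checks look roughly like this; osd.158 is just the example daemon from earlier in the thread:
# via the admin socket on the OSD host
ceph daemon osd.158 dump_mempools
# and sanity-check PG counts while at it
ceph osd pool autoscale-status
ceph osd df        # the PGS column shows placement groups per OSD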
more examples of rampant OSD memory consumption:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
1326773 ceph      20   0 11.585g 0.011t  34728 S 110.3  8.6  14:26.87 ceph-osd
 204622 ceph      20   0 16.414g 0.015t  34808 S 100.3 12.5  17:53.36 ceph-osd
   5706 ceph
The OOM-killer is on the rampage, striking down hapless OSDs when
the cluster is under heavy client IO.
The memory target does not seem to be much of a limit; is this intentional?
root@cnx-11:~# ceph-conf --show-config|fgrep osd_memory_target
osd_memory_target = 4294967296
osd_memory_target_cg
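My understanding is that osd_memory_target is a best-effort target for the OSD caches rather than a hard limit, so one option is simply lowering it; the 3 GiB value below is arbitrary, not a recommendation:
ceph config get osd osd_memory_target
# drop the target from 4 GiB to 3 GiB for all OSDs
ceph config set osd osd_memory_target 3221225472
# or for a single OSD
ceph config set osd.158 osd_memory_target 3221225472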
Did you end up having all new IPs for your MONs? I've wondered how a
large KVM deployment should be handled when the instance metadata has a
hard-coded list of MON IPs for the cluster. How are they changed en masse
with running VMs? Or do these moves always result in at least one MON
with an origin
On Sun, 8 Dec 2019 at 00:53, Martin Verges wrote:
> Swap is not something you want to have in a server, as it is very slow and
> can cause long downtimes.
Given the commentary on this page advocating at least some swap to
enable Linux to manage memory when under pressure:
https://utcc.utoronto.ca/~cks
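If one did want a modest amount of swap on an OSD host, a minimal sketch would be something like the following (sizes and swappiness are arbitrary):
fallocate -l 4G /swapfile && chmod 600 /swapfile
mkswap /swapfile && swapon /swapfile
sysctl vm.swappiness=10        # prefer reclaiming cache over swapping daemons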