> result in OSD crash in some cases.
>
> So it's better to catch that sooner rather than later.
>
>
> Thanks,
>
> Igor
> On 29.04.2025 14:27, Marco Pizzolo wrote:
>
> Hi Igor,
>
> Thank you so very much for responding so quickly. Interestingly, I don't
> rem
sum less than or equal
> to 1.0.
>
> Default settings are:
>
> bluestore_cache_meta_ratio = 0.45
>
> bluestore_cache_kv_ratio = 0.45
>
> bluestore_cache_kv_onode_ratio = 0.04
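For reference, the check is easy to script against a live OSD (a rough sketch; osd.0 stands in for whichever OSD you want to inspect):

for opt in bluestore_cache_meta_ratio bluestore_cache_kv_ratio bluestore_cache_kv_onode_ratio; do
    ceph config get osd.0 "$opt"
done | awk '{ sum += $1 } END { print "sum =", sum, (sum <= 1.0 ? "(ok)" : "(exceeds 1.0)") }'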
>
>
> Thanks,
>
> Igor
>
>
>
> On 29.04.2025 13:36, Marco Pizzolo wrote:
Hello Everyone,
I'm upgrading from 18.2.4 to 18.2.6, and I have a 4-node cluster with 8
NVMes per node. Each NVMe is split into 2 OSDs. The upgrade went through
the mgr, mon, and crash daemons and then began upgrading the OSDs.
The OSDs it was upgrading were not coming back online.
I tried rebooting, but no luck.
Is this by chance corrected in 17.2.5 already? If so, how can we pivot
mid-upgrade to 17.2.5?
Thanks,
On Mon, Mar 20, 2023 at 6:14 PM Marco Pizzolo
wrote:
Hello Everyone,
We made the mistake of trying to patch to 16.2.11 from 16.2.10 which has
been stable as we felt that 16.2.11 had been out for a while already.
As luck would have it, we are having failure after failure with OSDs not
upgrading successfully, and have 355 more OSDs to go.
I'm pretty
domain member samba servers if you're in need of that.
>
> Overall I would say it works out quite well together, and is probably the
> best method to get CephFS connected to any MacOS or Windows clients.
>
> -----Original Message-----
> From: Marco Pizzolo
> Sent: September 6, 2022 2:
Hello Everyone,
We are looking at clustering Samba with CTDB to have highly available
access to CephFS for clients.
I wanted to see how others have implemented this, and what their experiences have been so far.
Would welcome all feedback, and of course if you happen to have any
documentation on what you did so that
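In case it helps the discussion, a minimal sketch of exporting CephFS through Samba with the vfs_ceph module (the cephx user name "samba" and the share name are placeholders; CTDB itself still needs its nodes file and a recovery lock on shared storage):

# smb.conf also needs "clustering = yes" in [global]; the share itself:
cat >> /etc/samba/smb.conf <<'EOF'
[cephfs]
    path = /
    vfs objects = ceph
    ceph:config_file = /etc/ceph/ceph.conf
    ceph:user_id = samba
    read only = no
EOF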
0+ OSD cluster, and also just recently
> in a test Luminous cluster (size=3 to size=2). In order for the purge to
> actually happen, I had to restart every OSD (one at a time for safety, or
> just run ceph-ansible site.yml with the osd handler health check = true).
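For anyone doing the same by hand rather than through ceph-ansible, the one-at-a-time restart is roughly this (a sketch; it assumes non-containerized OSDs managed as ceph-osd@<id> systemd units and a cluster that otherwise reports HEALTH_OK):

for id in $(ceph osd ls); do
    systemctl restart ceph-osd@"$id"
    # wait for the cluster to settle before touching the next OSD
    until ceph health | grep -q HEALTH_OK; do sleep 30; done
done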
>
> On We
ta.
>
> Perhaps the risks are not immediately visible in normal operation, but
> in the event of a failure, the potential loss of data must be accepted.
>
> Regards, Joachim
>
>
> ___
>
> Clyso GmbH - ceph foundation member
>
> Am 10.
seniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>
>
> On Fri, 10 Dec 2021 at 18:05, Marco Pizzolo
> wrote:
>
Hello,
As part of a migration process where we will be swinging Ceph hosts from
one cluster to another we need to reduce the size from 3 to 2 in order to
shrink the footprint sufficiently to allow safe removal of an OSD/Mon node.
The cluster has about 500M objects as per dashboard, and is about 1
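For completeness, the size change itself is one setting per pool (a sketch; <pool> is a placeholder, and as warned above size=2 narrows the safety margin, so think carefully before also dropping min_size):

ceph osd pool set <pool> size 2
# optional, and it widens the window for data loss:
ceph osd pool set <pool> min_size 1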
Forgot to confirm, was this process non-destructive in terms of data in
OSDs?
Thanks again,
On Fri, Dec 10, 2021 at 9:23 AM Marco Pizzolo
wrote:
> Robert, Roman and Weiwen Hu,
>
> Thank you very much for your responses. I presume one host at a time, and
> the redeploy will take
deploy for every daemon with "ceph orch daemon redeploy
> > osd.{id}"
> >
> > ~ Roman
>
> We have switched to podman with similar process. Stop systemd units,
> install
> podman, redeploy, and done.
>
> Weiwen Hu
>
> > On Thu, 9 Dec 2021 at 16:27,
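Putting the steps above together, the per-host flow looks roughly like this (a sketch, not a tested runbook; <fsid> and <id> are placeholders, and cephadm should pick podman up automatically once it is installed):

systemctl stop ceph-<fsid>@osd.<id>.service   # stop the Docker-backed unit
apt install -y podman                         # install podman (remove docker afterwards if desired)
ceph orch daemon redeploy osd.<id>            # cephadm recreates the daemon under podman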
Hello Everyone,
In an attempt to futureproof, I am beginning to look for information on how
one would go about moving from Docker to Podman on a cephadm 16.2.6
installation on Ubuntu 20.04.3.
I would be interested to know if anyone else has contemplated or performed
something similar, and what th
Hello,
Not sure whether this is perhaps related:
https://bugs.launchpad.net/ubuntu/+source/linux-meta-gcp-5.11/+bug/1948471
Any insight would be appreciated
Thanks,
Marco
On Wed, Nov 17, 2021 at 9:18 AM Marco Pizzolo
wrote:
Good day everyone,
This is a bit of a recurring theme for us on a new deployment performed at
16.2.6 on Ubuntu 20.04.3 with HWE stack.
We have had good stability over the past 3 weeks or so copying data, and we
now have about 230M objects (470TB of 1PB used) and we have had 1 OSD drop
from each o
see if there is something wrong with this disk?
> Maybe also do a self-test.
>
>
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows
>
>
>
> *From:* Marco Pizzolo
> *Sent:* October 28, 2021, 1:17
> *To:* 胡 玮文
> *Cc:* ceph-users
> *Subject:* Re:
Is there any command or log I can provide a sample from that would help to
pinpoint the issue? 119 of the 120 OSDs are working correctly by all
accounts, but I am just unable to bring the last one fully online.
Thank you,
On Tue, Oct 26, 2021 at 3:59 PM Marco Pizzolo
wrote:
> Tha
ock) read stalled read 0x40408c~2000
Oct 26 19:47:35 conmon[780974]: debug 2021-10-26T19:47:35.914+ 7f08e65a4080 -1 bdev(0x56377e188800 /var/lib/ceph/osd/ceph-13/block) read stalled read 0x40409b~1000
Oct 26 19:48:39 conmon[780974]: debug 2021-10-26T19:48:39.110
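Following the earlier suggestion to check the disk itself, the usual first steps would be something like this (a sketch; /dev/sdX is whichever device backs osd.13):

smartctl -a /dev/sdX                    # SMART health, error log, pending/reallocated sectors
smartctl -t short /dev/sdX              # start a short self-test, then re-check with -a
dmesg | grep -i -e sdX -e "I/O error"   # look for kernel-level resets or I/O errors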
Hello Everyone,
I'm seeing an issue where the podman container is running, but the OSD is
being reported as down and out. Restarting the service doesn't help, and
neither does rebooting the host.
What am I missing?
Thanks,
Marco
Hi Everyone,
For a new build we tested the 5.4 kernel which wasn't working well for us
and ultimately changed to Ubuntu 20.04.3 HWE and 5.11 kernel.
We can now get all OSDs more or less up, but on a clean OS reinstall we are
seeing this type of behavior that is causing slow ops even before any poo
21 at 2:33 PM Zakhar Kirpichenko wrote:
> Indeed, this is the PVE forum post I saw earlier.
>
> /Z
>
> On Tue, Oct 12, 2021 at 9:27 PM Marco Pizzolo
> wrote:
>
>> Igor,
>>
>> Thanks for the response. One that I found was:
>> https://forum.proxmox.com/thre
rum. Specifically, 5.11.x with Ceph seems to be hitting kernel NULL
> > pointer dereference. Perhaps a newer kernel would help. If not, I'm
> running
> > 16.2.6 with kernel 5.4.x without any issues.
> >
> > Best regards,
> > Z
> >
> > On Tue, Oct 12, 2021 a
Hello everyone,
We are seeing instability in 20.04.3 using HWE kernel and Ceph 16.2.6
w/Podman.
We have OSDs that fail after <24 hours and I'm not sure why.
Seeing this:
ceph crash info
2021-10-12T14:32:49.169552Z_d1ee94f7-1aaa-4221-abeb-68bd56d3c763
{
"backtrace": [
"/lib64/libpthr
a fix for this, but I'm not sure how to deal with it
> using the current 16.2.6 image. Maybe others will have some ideas.
>
> On Mon, Sep 27, 2021 at 10:10 AM Marco Pizzolo
> wrote:
>
Good morning Adam, Ceph users,
Is there something we can do to extend the acceptable response size?
Trying to understand if there is some viable workaround that we can
implement.
Thanks,
Marco
On Fri, Sep 24, 2021 at 2:59 PM Marco Pizzolo
wrote:
> Hi Adam,
>
> I really appreciate
Hello Everyone,
If you have any suggestions on the cause, or on what we can do, I'd certainly
appreciate it.
I'm seeing the following on a newly stood up cluster using Podman on Ubuntu
20.04.3 HWE:
Thank you very much
Marco
Sep 24, 2021, 1:24:30 PM [ERR] cephadm exited with an error code: 1,
std
e OSDs begin to
activate but it does not bring up more than 32 OSDs.
Tomorrow we will try adding the OSD daemons individually, but there seems
to be another underlying issue as well.
Marco
On Tue, Jun 1, 2021 at 1:42 PM Marco Pizzolo wrote:
> Peter and fellow Ceph users,
>
> I just wanted
the image tomorrow based on that version. I do agree, I feel this breaks
> new deploys as well as existing ones, and I hope a point release will come
> soon that includes the fix.
>
> On May 31, 2021, at 15:33, Marco Pizzolo wrote:
>
>
> David,
>
> What I can confirm is that if
et beyond this issue:
>
> $ cat Dockerfile
> FROM docker.io/ceph/ceph:v16.2.3
> COPY process.py /lib/python3.6/site-packages/remoto/process.py
>
> The process.py is the patched version we submitted here:
>
> https://github.com/alfredodeza/remoto/pull/63/commits/6f98078a1479de1f2
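For anyone needing to reproduce that workaround, the rough flow is to build the patched image, push it somewhere every host can pull from, and point cephadm at it (a sketch; the registry and tag are placeholders):

docker build -t registry.example.com/ceph:v16.2.3-remoto-fix .
docker push registry.example.com/ceph:v16.2.3-remoto-fix
ceph config set global container_image registry.example.com/ceph:v16.2.3-remoto-fix
ceph orch daemon redeploy osd.<id>   # then redeploy the affected daemons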
t the disks disappear; they don't stop working, but they are no longer
> detected by Linux, which makes me think I'm hitting some kernel limit.
>
> At this point I'm going to cut my losses and give up and use the small,
> slightly more powerful 30x drive systems I have (with 256g memory), maybe
> tran
see
> if you see the lock/the behavior matches, if so - then it may help you
> out. The only change in that image is that patch to remoto being
> overlaid on the default 16.2.3 image.
>
> On Fri, May 28, 2021 at 1:15 PM Marco Pizzolo
> wrote:
> >
> > Peter,
> >
Peter,
We're seeing the same issues as you are. We have 2 new hosts (Intel(R)
Xeon(R) Gold 6248R CPU @ 3.00GHz with 48 cores, 384GB RAM, and 60x 10TB SED
drives), and we have tried both 15.2.13 and 16.2.4.
Cephadm does NOT properly deploy and activate OSDs on Ubuntu 20.04.2 with
Docker.
Seems to be a
it's not recommended to get an OSD really
> full, so be careful with that. Do you have the option to add more disks?
>
>
> Quoting Marco Pizzolo:
>
Hello Everyone,
We seem to be having a problem on one of our ceph clusters post the OS
patch and reboot of one of the nodes. The three other nodes are showing
OSD fill rates of 77%-81%, but the 60 OSDs contained in the host that was
just rebooted are varying between 64% and 90% since the reboot o
"type_id": 0,
"crush_weight": 5.8218994140625,
"depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1,
"pr
Hello Everyone,
We're working on a new cluster and seeing some oddities. The crush map
viewer is not showing all hosts or OSDs. The cluster is NVMe with 4 hosts,
each having 8 NVMe drives. We are using 2 OSDs per NVMe and encryption, with
a max size of 3 and min size of 2:
All OSDs appear to exist
Hello Everyone,
I'm still having issues getting the OSDs to properly create on a brand new
Ceph 15.2.2 cluster. I don't seem to be able to get OSDs created based on a
service definition of 2 OSDs per disk and encryption. It seems to hang,
and/or I see "No Deployments..."
Has anyone had luck with t
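For reference, the drive-group style spec for that layout would look roughly like this (a sketch; the service id and the rotational filter are placeholders, and the exact spec format varies a little between releases):

cat > osd_spec.yml <<'EOF'
service_type: osd
service_id: two_osds_per_nvme_encrypted
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 0
  osds_per_device: 2
  encrypted: true
EOF
ceph orch apply -i osd_spec.yml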
Rebooting addressed
On Thu, May 28, 2020 at 4:52 PM Marco Pizzolo
wrote:
Hello,
Hitting an issue with a new 15.2.2 deployment using cephadm. I am having a
problem creating encrypted, 2 osds per device OSDs (they are NVMe).
After removing and bootstrapping the cluster again, I am unable to create
OSDs as they're locked. sgdisk, wipefs, and zap all fail to leave the drive
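When zap, wipefs, and sgdisk all report the device as busy, the usual culprit is leftover LVM or dm-crypt mappings still holding it open; a rough cleanup sequence (a sketch; /dev/nvme0n1 and the mapping name are examples):

dmsetup ls                                  # look for stale ceph--*/crypt mappings
dmsetup remove <mapping_name>               # remove any that still reference the device
ceph-volume lvm zap --destroy /dev/nvme0n1  # tear down LVs/VGs/PVs and wipe the device
wipefs -a /dev/nvme0n1
sgdisk --zap-all /dev/nvme0n1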
y did, so I don't
yet know how to avoid a recurrence.
I do very much like Ceph though
Best wishes,
Marco
On Fri, May 1, 2020 at 3:49 PM Marco Pizzolo wrote:
> Understood Paul, thanks.
>
> In case this helps to shed any further light...Digging through logs I'm
> also
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
>>
>>
>>
>> Paul
>>
>> --
>> Paul Emmerich
>>
>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>
>
>
> On Fri, May 1, 2020 at 9:04 PM Marco Pizzolo
> wrote:
>
>> Also seeing errors such as this:
>>
>>
>> [2020-05-01 13:15:20,970][systemd][WARNING] command retu
command returned non-zero exit
status: 1
[2020-05-01 13:15:31,765][systemd][WARNING] failed activating OSD, retries
left: 9
On Fri, May 1, 2020 at 2:23 PM Marco Pizzolo wrote:
Hi Ashley,
Thanks for your response. Nothing that I can think of would have
happened. We are using max_mds = 1. We have 4 MDS daemons, so we used to
have 3 on standby. Within minutes they all crash.
On Fri, May 1, 2020 at 2:21 PM Ashley Merrick
wrote:
> Quickly checking the code that calls that assert
>
>
>
Hello,
Hoping you can help me.
Ceph had been largely problem-free for us for the better part of a year.
We have a high file count in a single CephFS filesystem, and are seeing
this error in the logs:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/
I think I have it working now... I'm sure I missed something early on that
would have made this simpler.
Thanks Nathan all the same.
Thanks Nathan,
Having worked on this a bit since, I did make some progress:
[prdceph04][DEBUG ] connected to host: prdceph04
[prdceph04][DEBUG ] detect platform information from remote host
[prdceph04][DEBUG ] detect machine type
[ceph_deploy.mds][INFO ] Distro info: CentOS Linux 7.7.1908 Core
[c