[ceph-users] Re: 18.2.6 upgrade OSDs fail to mount

2025-04-29 Thread Marco Pizzolo
> result in OSD crash in some cases. > > So better to catch that sooner than later. > > > Thanks, > > Igor > On 29.04.2025 14:27, Marco Pizzolo wrote: > > Hi Igor, > > Thank you so very much for responding so quickly. Interestingly, I don't > rem

[ceph-users] Re: 18.2.6 upgrade OSDs fail to mount

2025-04-29 Thread Marco Pizzolo
sum less or equal > to 1.0. > > Default settings are: > > bluestore_cache_meta_ratio = 0.45 > > bluestore_cache_kv_ratio = 0.45 > > bluestore_cache_kv_onode_ratio = 0.04 > > > Thanks, > > Igor > > > > On 29.04.2025 13:36, Marco Pizzolo wrote:
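
A minimal sketch, assuming a ceph-config-managed cluster, of how one might verify and restore the three cache ratios quoted above (the values shown are the defaults from the message, and the three must sum to no more than 1.0):

# Inspect the effective values for the osd section
ceph config get osd bluestore_cache_meta_ratio
ceph config get osd bluestore_cache_kv_ratio
ceph config get osd bluestore_cache_kv_onode_ratio

# Reset to the defaults quoted above; keep the sum <= 1.0
ceph config set osd bluestore_cache_meta_ratio 0.45
ceph config set osd bluestore_cache_kv_ratio 0.45
ceph config set osd bluestore_cache_kv_onode_ratio 0.04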

[ceph-users] 18.2.6 upgrade OSDs fail to mount

2025-04-29 Thread Marco Pizzolo
Hello Everyone, I'm upgrading from 18.2.4 to 18.2.6, and I have a 4-node cluster with 8 NVMes per node. Each NVMe is split into 2 OSDs. The upgrade went through the mgr, mon, and crash daemons and began upgrading OSDs. The OSDs it was upgrading were not coming back online. I tried rebooting, and no luck.
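
When a cephadm-driven upgrade starts taking OSDs down without bringing them back, a hedged first step (not a fix for the mount failure itself) is to pause the upgrade and inspect the affected daemons; <id> is a placeholder:

ceph orch upgrade status           # confirm what the upgrade is currently touching
ceph orch upgrade pause            # stop it from rolling on to further OSDs
ceph orch ps | grep osd            # spot daemons stuck in error/stopped state
cephadm logs --name osd.<id>       # on the affected host, read that daemon's log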

[ceph-users] Re: Upgrade 16.2.10 --> 16.2.11 OSD "UPGRADE_REDEPLOY_DAEMON" failed

2023-03-20 Thread Marco Pizzolo
Is this by chance corrected in 17.2.5 already? If so, how can we pivot mid-upgrade to 17.2.5? Thanks, On Mon, Mar 20, 2023 at 6:14 PM Marco Pizzolo wrote: > Hello Everyone, > > We made the mistake of trying to patch to 16.2.11 from 16.2.10 which has > been stable as we felt that
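
On the "pivot mid-upgrade" question: cephadm can stop the running upgrade and start a new one against a different release. A hedged sketch using the version asked about in the message (not a statement that skipping ahead is safe here):

ceph orch upgrade stop
ceph orch upgrade start --ceph-version 17.2.5
ceph orch upgrade status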

[ceph-users] Upgrade 16.2.10 --> 16.2.11 OSD "UPGRADE_REDEPLOY_DAEMON" failed

2023-03-20 Thread Marco Pizzolo
Hello Everyone, We made the mistake of trying to patch to 16.2.11 from 16.2.10 which has been stable as we felt that 16.2.11 had been out for a while already. As luck would have it, we are having failure after failure with OSDs not upgrading successfully, and have 355 more OSDs to go. I'm pretty

[ceph-users] Re: 16.2.10 Cephfs with CTDB, Samba running on Ubuntu

2022-09-13 Thread Marco Pizzolo
mba servers if you're in need of that. > > > > Overall I would say it works out quite well together, and is probably > > the best method to get CephFS connected to any MacOS or Windows clients. > > > > -Original Message- > > From: Marco Pizzolo >

[ceph-users] Re: 16.2.10 Cephfs with CTDB, Samba running on Ubuntu

2022-09-07 Thread Marco Pizzolo
domain member samba servers if you're in need of that. > > Overall I would say it works out quite well together, and is probably the > best method to get CephFS connected to any MacOS or Windows clients. > > -Original Message- > From: Marco Pizzolo > Sent: September 6, 2022 2:

[ceph-users] 16.2.10 Cephfs with CTDB, Samba running on Ubuntu

2022-09-06 Thread Marco Pizzolo
Hello Everyone, We are looking at clustering Samba with CTDB to have highly available access to CephFS for clients. I wanted to see how others have implemented this, and their experiences so far. Would welcome all feedback, and of course if you happen to have any documentation on what you did so that

[ceph-users] Re: Experience reducing size 3 to 2 on production cluster?

2021-12-15 Thread Marco Pizzolo
0+ OSD cluster, and also just recently > in a test Luminous cluster (size=3 to size=2). In order for the purge to > actually happen, I had to restart every OSD (one at a time for safety, or > just run ceph-ansible site.yml with the osd handler health check = true). > > On We
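
A minimal sketch of the sequence described above; the pool name, the wait loop, and the assumption that each host's CRUSH bucket matches its hostname are illustrative (this cluster appears to be ceph-ansible/systemd managed rather than cephadm):

# Reduce replication on the pool in question (pool name is hypothetical)
ceph osd pool set mypool size 2
ceph osd pool get mypool min_size    # check min_size still makes sense for size 2

# On each OSD host in turn, restart that host's OSDs one at a time so the
# now-superfluous replicas are actually purged, letting the cluster settle in between
for id in $(ceph osd ls-tree "$(hostname)"); do
    systemctl restart ceph-osd@"$id"
    while ! ceph health | grep -q HEALTH_OK; do sleep 30; done
done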

[ceph-users] Re: Experience reducing size 3 to 2 on production cluster?

2021-12-14 Thread Marco Pizzolo
ta. > > Perhaps the risks are not immediately visible in normal operation, but > in the event of a failure, the potential loss of data must be accepted. > > Regards, Joachim > > > ___ > > Clyso GmbH - ceph foundation member > > Am 10.

[ceph-users] Re: Experience reducing size 3 to 2 on production cluster?

2021-12-14 Thread Marco Pizzolo
seniusstr. 31h, 81247 Munich > CEO: Martin Verges - VAT-ID: DE310638492 > Com. register: Amtsgericht Munich HRB 231263 > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx > > > On Fri, 10 Dec 2021 at 18:05, Marco Pizzolo > wrote: > >> Hello, >> >> As par

[ceph-users] Experience reducing size 3 to 2 on production cluster?

2021-12-10 Thread Marco Pizzolo
Hello, As part of a migration process where we will be swinging Ceph hosts from one cluster to another we need to reduce the size from 3 to 2 in order to shrink the footprint sufficiently to allow safe removal of an OSD/Mon node. The cluster has about 500M objects as per dashboard, and is about 1

[ceph-users] Re: 16.2.6 Convert Docker to Podman?

2021-12-10 Thread Marco Pizzolo
Forgot to confirm, was this process non-destructive in terms of data in OSDs? Thanks again, On Fri, Dec 10, 2021 at 9:23 AM Marco Pizzolo wrote: > Robert, Roman and Weiwen Hu, > > Thank you very much for your responses. I presume one host at a time, and > the redeploy will take

[ceph-users] Re: 16.2.6 Convert Docker to Podman?

2021-12-10 Thread Marco Pizzolo
deploy for every daemon with "ceph orch daemon redeploy > > osd.{id}" > > > > ~ Roman > > We have switched to podman with similar process. Stop systemd units, > install > podman, redeploy, and done. > > Weiwen Hu > > > On Thu, 9 Dec 2021 at 16:27,
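
A hedged sketch of the per-host conversion described in the quoted replies (package commands assume Ubuntu; the daemon names are illustrative):

# On the host being converted, one host at a time:
systemctl stop ceph.target          # stop the containerized Ceph units on this host
apt-get install -y podman           # make podman available

# From a node with the admin keyring, redeploy each daemon that ran on that host, e.g.:
ceph orch daemon redeploy osd.12
ceph orch daemon redeploy osd.13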

[ceph-users] 16.2.6 Convert Docker to Podman?

2021-12-09 Thread Marco Pizzolo
Hello Everyone, In an attempt to futureproof, I am beginning to look for information on how one would go about moving to podman from docker on a cephadm 16.2.6 installation on ubuntu 20.04.3. I would be interested to know if anyone else has contemplated or performed something similar, and what th

[ceph-users] Re: 16.2.6 SMP NOPTI - OSD down - Node Exporter Tainted

2021-11-17 Thread Marco Pizzolo
Hello, Not sure whether this is perhaps related: https://bugs.launchpad.net/ubuntu/+source/linux-meta-gcp-5.11/+bug/1948471 Any insight would be appreciated Thanks, Marco On Wed, Nov 17, 2021 at 9:18 AM Marco Pizzolo wrote: > Good day everyone, > > This is a bit of a recurring the

[ceph-users] 16.2.6 SMP NOPTI - OSD down - Node Exporter Tainted

2021-11-17 Thread Marco Pizzolo
Good day everyone, This is a bit of a recurring theme for us on a new deployment performed at 16.2.6 on Ubuntu 20.04.3 with HWE stack. We have had good stability over the past 3 weeks or so copying data, and we now have about 230M objects (470TB of 1PB used) and we have had 1 OSD drop from each o

[ceph-users] Re: 16.2.6 OSD down, out but container running....

2021-10-27 Thread Marco Pizzolo
see if there is something wrong with this disk? > Maybe also do a self-test. > > > > Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows > > > > *From: *Marco Pizzolo > *Sent: *October 28, 2021 1:17 > *To: *胡 玮文 > *Cc: *ceph-users > *Subject: *Re:

[ceph-users] Re: 16.2.6 OSD down, out but container running....

2021-10-27 Thread Marco Pizzolo
Is there any command or log I can provide a sample from that would help to pinpoint the issue? 119 of the 120 OSDs are working correctly by all accounts, but I am just unable to bring the last one fully online. Thank you, On Tue, Oct 26, 2021 at 3:59 PM Marco Pizzolo wrote: > Tha

[ceph-users] Re: 16.2.6 OSD down, out but container running....

2021-10-26 Thread Marco Pizzolo
ock) read stalled read 0x40408c~2000> Oct 26 19:47:35 conmon[780974]: debug 2021-10-26T19:47:35.914+ 7f08e65a4080 -1 bdev(0x56377e188800 /var/lib/ceph/osd/ceph-13/block) read stalled read 0x40409b~1000> Oct 26 19:48:39 conmon[780974]: debug 2021-10-26T19:48:39.110

[ceph-users] 16.2.6 OSD down, out but container running....

2021-10-25 Thread Marco Pizzolo
Hello Everyone, I'm seeing an issue where the podman container is running, but the OSD is being reported as down and out. Restarting the service doesn't help, and neither does rebooting the host. What am I missing? Thanks, Marco
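
A hedged triage sketch for the "container up, OSD down/out" situation; the fsid and OSD id are placeholders:

ceph osd tree down                                  # which OSDs the cluster thinks are down
podman ps | grep osd                                # confirm the container really is running
cephadm logs --name osd.<id>                        # daemon log via cephadm, on the host
journalctl -u ceph-<fsid>@osd.<id>.service -n 200   # or read the systemd unit directly
ceph osd in <id>                                    # mark it back in once it reports up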

[ceph-users] 16.2.6 OSD Heartbeat Issues

2021-10-19 Thread Marco Pizzolo
Hi Everyone, For a new build we tested the 5.4 kernel, which wasn't working well for us, and we ultimately changed to Ubuntu 20.04.3 HWE and the 5.11 kernel. We can now get all OSDs more or less up, but on a clean OS reinstall we are seeing this type of behavior that is causing slow ops even before any poo

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-18 Thread Marco Pizzolo
21 at 2:33 PM Zakhar Kirpichenko wrote: > Indeed, this is the PVE forum post I saw earlier. > > /Z > > On Tue, Oct 12, 2021 at 9:27 PM Marco Pizzolo > wrote: > >> Igor, >> >> Thanks for the response. One that I found was: >> https://forum.proxmox.com/thre

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Marco Pizzolo
rum. Specifically, 5.11.x with Ceph seems to be hitting kernel NULL > > pointer dereference. Perhaps a newer kernel would help. If not, I'm > running > > 16.2.6 with kernel 5.4.x without any issues. > > > > Best regards, > > Z > > > > On Tue, Oct 12, 2021 a

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Marco Pizzolo
ould help. If not, I'm running > 16.2.6 with kernel 5.4.x without any issues. > > Best regards, > Z > > On Tue, Oct 12, 2021 at 8:31 PM Marco Pizzolo > wrote: > >> Hello everyone, >> >> We are seeing instability in 20.04.3 using HWE kernel and Ceph

[ceph-users] OSD Crashes in 16.2.6

2021-10-12 Thread Marco Pizzolo
Hello everyone, We are seeing instability in 20.04.3 using HWE kernel and Ceph 16.2.6 w/Podman. We have OSDs that fail after <24 hours and I'm not sure why. Seeing this: ceph crash info 2021-10-12T14:32:49.169552Z_d1ee94f7-1aaa-4221-abeb-68bd56d3c763 { "backtrace": [ "/lib64/libpthr
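
For crashes like the one pasted above, the crash module gives a quick way to enumerate and triage them; a short, hedged sketch using the crash id from the message:

ceph crash ls-new        # crashes not yet acknowledged
ceph crash info 2021-10-12T14:32:49.169552Z_d1ee94f7-1aaa-4221-abeb-68bd56d3c763
ceph crash archive <crash-id>        # acknowledge once the backtrace has been captured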

[ceph-users] Re: 16.2.6 CEPHADM_REFRESH_FAILED New Cluster

2021-09-28 Thread Marco Pizzolo
a fix for this, but I'm not sure how to deal with it > using the current 16.2.6 image. Maybe others will have some ideas. > > On Mon, Sep 27, 2021 at 10:10 AM Marco Pizzolo > wrote: > >> Good morning Adam, Ceph users, >> >> Is there something we can do to e

[ceph-users] Re: 16.2.6 CEPHADM_REFRESH_FAILED New Cluster

2021-09-27 Thread Marco Pizzolo
Good morning Adam, Ceph users, Is there something we can do to extend the acceptable response size? Trying to understand if there is some viable workaround that we can implement. Thanks, Marco On Fri, Sep 24, 2021 at 2:59 PM Marco Pizzolo wrote: > Hi Adam, > > I really appreciate

[ceph-users] 16.2.6 CEPHADM_REFRESH_FAILED New Cluster

2021-09-24 Thread Marco Pizzolo
Hello Everyone, If you have any suggestions on the cause, or what we can do I'd certainly appreciate it. I'm seeing the following on a newly stood up cluster using Podman on Ubuntu 20.04.3 HWE: Thank you very much Marco Sep 24, 2021, 1:24:30 PM [ERR] cephadm exited with an error code: 1, std

[ceph-users] Re: Fwd: Re: Ceph osd will not start.

2021-06-01 Thread Marco Pizzolo
e OSDs begin to activate, but it does not bring up more than 32 OSDs. Tomorrow we will try adding the OSD daemons individually, but there seems to be another underlying issue as well. Marco On Tue, Jun 1, 2021 at 1:42 PM Marco Pizzolo wrote: > Peter and fellow Ceph users, > > I just wanted

[ceph-users] Re: Fwd: Re: Ceph osd will not start.

2021-06-01 Thread Marco Pizzolo
the image tomorrow based on that version. I do agree, I feel this breaks > new deploys as well as existing, and hope a point release will come soon > that includes the fix. > > On May 31, 2021, at 15:33, Marco Pizzolo wrote: > >  > David, > > What I can confirm is that if

[ceph-users] Re: Fwd: Re: Ceph osd will not start.

2021-05-31 Thread Marco Pizzolo
et beyond this issue: > > $ cat Dockerfile > FROM docker.io/ceph/ceph:v16.2.3 > COPY process.py /lib/python3.6/site-packages/remoto/process.py > > The process.py is the patched version we submitted here: > > https://github.com/alfredodeza/remoto/pull/63/commits/6f98078a1479de1f2
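
Assuming that patched image is built, it still needs to be pushed somewhere the hosts can pull from and cephadm pointed at it; a hedged sketch with a hypothetical registry name:

docker build -t registry.example.com/ceph-patched:v16.2.3 .
docker push registry.example.com/ceph-patched:v16.2.3

# Point cephadm at the patched image for future (re)deploys
ceph config set global container_image registry.example.com/ceph-patched:v16.2.3
# or roll it out explicitly
ceph orch upgrade start --image registry.example.com/ceph-patched:v16.2.3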

[ceph-users] Re: Fwd: Re: Ceph osd will not start.

2021-05-31 Thread Marco Pizzolo
t the disks disappear, not stop working but not > detected by Linux, which makes me think I'm hitting some kernel limit. > > At this point I'm going to cut my loses and give up and use the small > slightly more powerful 30x drive systems I have (with 256g memory), maybe > tran

[ceph-users] Re: Fwd: Re: Ceph osd will not start.

2021-05-29 Thread Marco Pizzolo
see > if you see the lock/the behavior matches, if so - then it may help you > out. The only change in that image is that patch to remoto being > overlaid on the default 16.2.3 image. > > On Fri, May 28, 2021 at 1:15 PM Marco Pizzolo > wrote: > > > > Peter, > >

[ceph-users] Fwd: Re: Ceph osd will not start.

2021-05-28 Thread Marco Pizzolo
Peter, We're seeing the same issues as you are. We have 2 new hosts with Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz w/ 48 cores, 384GB RAM, and 60x 10TB SED drives, and we have tried both 15.2.13 and 16.2.4. Cephadm does NOT properly deploy and activate OSDs on Ubuntu 20.04.2 with Docker. Seems to be a

[ceph-users] Re: 14.2.16 Low space hindering backfill after reboot

2021-01-28 Thread Marco Pizzolo
's not recommended to get an OSD really > full, so be careful with that. Do you have the option to add more disks? > > > Zitat von Marco Pizzolo : > > > Hello Everyone, > > > > We seem to be having a problem on one of our ceph clusters post the OS > > patch and

[ceph-users] 14.2.16 Low space hindering backfill after reboot

2021-01-25 Thread Marco Pizzolo
Hello Everyone, We seem to be having a problem on one of our ceph clusters post the OS patch and reboot of one of the nodes. The three other nodes are showing OSD fill rates of 77%-81%, but the 60 OSDs contained in the host that was just rebooted are varying between 64% and 90% since the reboot o
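
A hedged sketch of commands typically used to inspect and even out the 64%-90% utilization spread described above (the threshold is illustrative, and as the reply warns, letting OSDs approach full is risky):

ceph osd df tree                              # per-OSD utilization, grouped by host
ceph osd test-reweight-by-utilization 115     # dry run first
ceph osd reweight-by-utilization 115          # nudge data off OSDs above 115% of the mean
ceph balancer mode upmap                      # or let the balancer module even things out
ceph balancer on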

[ceph-users] Re: 15.2.3 Crush Map Viewer problem.

2020-06-02 Thread Marco Pizzolo
", "type_id": 0, "crush_weight": 5.8218994140625, "depth": 2, "pool_weights": {}, "exists": 1, "status": "up", "reweight": 1, "pr

[ceph-users] 15.2.3 Crush Map Viewer problem.

2020-06-01 Thread Marco Pizzolo
Hello Everyone, We're working on a new cluster and seeing some oddities. The crush map viewer is not showing all hosts or OSDs. Cluster is NVMe w/4 hosts, each having 8 NVMe. Using 2 OSDs per NVMe and Encryption. Using Max size of 3, Min size of 2: [image: image.png] All OSDs appear to exist
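
When the dashboard's crush map viewer disagrees with reality, comparing it against the CLI view of the CRUSH hierarchy usually narrows things down; a short, hedged sketch:

ceph osd tree                        # every OSD and the host bucket it hangs under
ceph osd crush tree --show-shadow    # hierarchy including device-class shadow trees
ceph osd crush dump                  # raw buckets/items, in case a host bucket is missing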

[ceph-users] Is 2 osds per disk, encryption possible with cephadm on 15.2.2?

2020-05-29 Thread Marco Pizzolo
Hello Everyone, I'm still having issues getting the OSDs to create properly on a brand new Ceph 15.2.2 cluster. I don't seem to be able to have OSDs created based on a service definition of 2 OSDs per disk and encryption. It seems to hang and/or I see "No Deployments..." Has anyone had luck with t
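
A hedged sketch of an OSD service spec requesting two OSDs per device with encryption; the service_id and the device filter are assumptions, and whether 15.2.2 honors this combination is exactly what the thread is asking:

cat > osd_spec.yml <<'EOF'
service_type: osd
service_id: two_per_nvme_encrypted
placement:
  host_pattern: '*'
data_devices:
  rotational: 0
osds_per_device: 2
encrypted: true
EOF
ceph orch apply osd -i osd_spec.yml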

[ceph-users] Re: Octopus 15.2.2 unable to make drives available (reject reason locked)...

2020-05-28 Thread Marco Pizzolo
Rebooting addressed the issue. On Thu, May 28, 2020 at 4:52 PM Marco Pizzolo wrote: > Hello, > > Hitting an issue with a new 15.2.2 deployment using cephadm. I am having > a problem creating encrypted, 2 osds per device OSDs (they are NVMe). > > After removing and bootstrapping th

[ceph-users] Octopus 15.2.2 unable to make drives available (reject reason locked)...

2020-05-28 Thread Marco Pizzolo
Hello, Hitting an issue with a new 15.2.2 deployment using cephadm. I am having a problem creating encrypted OSDs with 2 OSDs per device (they are NVMe). After removing and bootstrapping the cluster again, I am unable to create OSDs as they're locked. sgdisk, wipefs, and zap all fail to leave the drive
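
A hedged sketch of clearing leftover LVM/dm-crypt state that keeps devices reported as locked after a failed deployment (device paths are placeholders; as the follow-up notes, a reboot achieved the same result):

ceph orch device ls --refresh                      # see which devices are rejected and why
ceph orch device zap <host> /dev/nvme0n1 --force   # zap through the orchestrator
# Or, on the host itself, remove stale device-mapper/LVM mappings by hand:
dmsetup ls                                         # look for leftover ceph-*/crypt mappings
dmsetup remove <mapping-name>
ceph-volume lvm zap --destroy /dev/nvme0n1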

[ceph-users] Re: 14.2.9 MDS Failing

2020-05-01 Thread Marco Pizzolo
y did, so I don't yet know how to avoid a recurrence. I do very much like Ceph, though. Best wishes, Marco On Fri, May 1, 2020 at 3:49 PM Marco Pizzolo wrote: > Understood Paul, thanks. > > In case this helps to shed any further light...Digging through logs I'm > also

[ceph-users] Re: 14.2.9 MDS Failing

2020-05-01 Thread Marco Pizzolo
> croit GmbH > Freseniusstr. 31h > 81247 München > www.croit.io > Tel: +49 89 1896585 90 > > >> >> >> >> Paul >> >> -- >> Paul Emmerich >> >> Looking for help with your Ceph cluster? Contact us at https://croit.io >>

[ceph-users] Re: 14.2.9 MDS Failing

2020-05-01 Thread Marco Pizzolo
t; croit GmbH > Freseniusstr. 31h > 81247 München > www.croit.io > Tel: +49 89 1896585 90 > > > On Fri, May 1, 2020 at 9:04 PM Marco Pizzolo > wrote: > >> Also seeing errors such as this: >> >> >> [2020-05-01 13:15:20,970][systemd][WARNING] command retu

[ceph-users] Re: 14.2.9 MDS Failing

2020-05-01 Thread Marco Pizzolo
command returned non-zero exit status: 1 [2020-05-01 13:15:31,765][systemd][WARNING] failed activating OSD, retries left: 9 On Fri, May 1, 2020 at 2:23 PM Marco Pizzolo wrote: > Hi Ashley, > > Thanks for your response. Nothing that I can think of would have > happened. We are using m

[ceph-users] Re: 14.2.9 MDS Failing

2020-05-01 Thread Marco Pizzolo
Hi Ashley, Thanks for your response. Nothing that I can think of would have happened. We are using max_mds = 1. We have 4 MDS daemons, so 3 used to be standby. Within minutes they all crash. On Fri, May 1, 2020 at 2:21 PM Ashley Merrick wrote: > Quickly checking the code that calls that assert > > >

[ceph-users] 14.2.9 MDS Failing

2020-05-01 Thread Marco Pizzolo
Hello, Hoping you can help me. Ceph had been largely problem free for us for the better part of a year. We have a high file count in a single CephFS filesystem, and are seeing this error in the logs: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/
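
A hedged sketch of first-pass checks for a crashing MDS rank, with no recovery actions implied; the filesystem name and debug level are placeholders:

ceph fs status                      # which rank is failing and which daemons remain standby
ceph health detail                  # the assert usually surfaces here as MDS_* warnings
ceph crash ls                       # collect the backtrace of each MDS crash
ceph config set mds debug_mds 10    # temporarily raise MDS logging before the next respawn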

[ceph-users] Re: As mon should be deployed in odd numbers, and I have a fourth node, can I deploy a fourth mds only? - 14.2.7

2020-02-07 Thread Marco Pizzolo
I think I have it working now... I'm sure I missed something early on that would have made this simpler. Thanks Nathan all the same.

[ceph-users] Re: As mon should be deployed in odd numbers, and I have a fourth node, can I deploy a fourth mds only? - 14.2.7

2020-02-07 Thread Marco Pizzolo
Thanks Nathan, Having worked on this a bit since, I did make some progress: [prdceph04][DEBUG ] connected to host: prdceph04 [prdceph04][DEBUG ] detect platform information from remote host [prdceph04][DEBUG ] detect machine type [ceph_deploy.mds][INFO ] Distro info: CentOS Linux 7.7.1908 Core [c