[ceph-users] Re: ceph octopus centos7, containers, cephadm

2020-10-23 Thread Marc Roos
No clarity on this? -Original Message- To: ceph-users Subject: [ceph-users] ceph octopus centos7, containers, cephadm I am running Nautilus on centos7. Does octopus run similar as nautilus thus: - runs on el7/centos7 - runs without containers by default - runs without cephadm by defa

[ceph-users] Re: ceph octopus centos7, containers, cephadm

2020-10-23 Thread Dan van der Ster
I'm not sure I understood the question. If you're asking if you can run octopus via RPMs on el7 without the cephadm and containers orchestration, then the answer is yes. -- dan On Fri, Oct 23, 2020 at 9:47 AM Marc Roos wrote: > > > No clarity on this? > > -Original Message- > To: ceph-u

[ceph-users] Re: ceph octopus centos7, containers, cephadm

2020-10-23 Thread David Majchrzak, ODERLAND Webbhotell AB
Hi! Runs on el7: https://download.ceph.com/rpm-octopus/el7/x86_64/ Runs as usual without containers by default - if you use cephadm for deployments then it will use containers. cephadm is one way to do deployments, you can however deploy whichever way you want (manually etc). -- David Maj
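A minimal sketch of a non-containerized install on el7, assuming the download.ceph.com repo above and the usual RPM package split (package names from memory, adjust as needed):

# /etc/yum.repos.d/ceph.repo
[ceph]
name=Ceph packages for x86_64
baseurl=https://download.ceph.com/rpm-octopus/el7/x86_64/
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc

# yum install ceph ceph-mon ceph-osd ceph-mds ceph-radosgw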

[ceph-users] Re: 14.2.12 breaks mon_host pointing to Round Robin DNS entry

2020-10-23 Thread Burkhard Linke
Hi, non round robin entries with multiple mon host FQDNs are also broken. Regards, Burkhard

[ceph-users] Re: Hardware needs for MDS for HPC/OpenStack workloads?

2020-10-23 Thread Stefan Kooman
On 2020-10-22 14:34, Matthew Vernon wrote: > Hi, > > We're considering the merits of enabling CephFS for our main Ceph > cluster (which provides object storage for OpenStack), and one of the > obvious questions is what sort of hardware we would need for the MDSs > (and how many!). Is it a many pa

[ceph-users] Re: Strange USED size

2020-10-23 Thread Eugen Block
Hi, did you delete lots of objects recently? That operation is slow and ceph takes some time to catch up. If the value is not decreasing, post again with 'ceph osd df' output. Regards, Eugen Quoting Marcelo: Hello. I've searched a lot but couldn't find why the size of USED column in t

[ceph-users] Re: Rados Crashing

2020-10-23 Thread Eugen Block
Hi, I read that civetweb and radosgw have a locking issue in combination with ssl [1], just a thought based on failed to acquire lock on obj_delete_at_hint.79 Since Nautilus the default rgw frontend is beast, have you thought about switching? Regards, Eugen [1] https://tracke
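If switching, a hedged sketch of what the frontend setting might look like in ceph.conf (the section name and certificate path are placeholders):

[client.rgw.gateway1]
rgw_frontends = beast port=7480
# or, with SSL:
# rgw_frontends = beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.pem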

[ceph-users] Re: [EXTERNAL] Re: 14.2.12 breaks mon_host pointing to Round Robin DNS entry

2020-10-23 Thread Van Alstyne, Kenneth
Jason/Wido, et al: I was hitting this exact problem when attempting to update from 14.2.11 to 14.2.12. I reverted the two commits associated with that pull request and was able to successfully upgrade to 14.2.12. Everything seems normal, now. Thanks, -- Kenneth Van Alstyne Systems Archi

[ceph-users] Re: OSD Failures after pg_num increase on one of the pools

2020-10-23 Thread Eugen Block
Hi, do you see any peaks on the OSD nodes like OOM killer etc.? Instead of the norecover flag I would try the nodown and noout flags to prevent flapping OSDs. What was the previous pg_num before you increased to 512? Regards, Eugen Quoting Артём Григорьев: Hello everyone, I created a ne
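For reference, a quick sketch of setting and later clearing those flags during the recovery window:

# ceph osd set nodown
# ceph osd set noout
# ... investigate, let peering settle ...
# ceph osd unset nodown
# ceph osd unset noout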

[ceph-users] Re: Ceph Octopus and Snapshot Schedules

2020-10-23 Thread Adam Boyhan
Care to provide any more detail?

[ceph-users] Re: desaster recovery Ceph Storage , urgent help needed

2020-10-23 Thread Burkhard Linke
Hi, your mail is formatted in a way that makes it impossible to get all information, so a number of questions first: - are the mons up, or are the mon up and in a quorum? you cannot change mon IP addresses without also adjusting them in the mon map. use the daemon socket on the systems to
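As a sketch, the mon admin socket can be queried like this (the socket path assumes a default non-container layout, and 'pve01' is a placeholder for the mon id):

# ceph --admin-daemon /var/run/ceph/ceph-mon.pve01.asok mon_status
# ceph --admin-daemon /var/run/ceph/ceph-mon.pve01.asok quorum_status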

[ceph-users] Re: multiple OSD crash, unfound objects

2020-10-23 Thread Frank Schilder
Hi Michael. > I still don't see any traffic to the pool, though I'm also unsure how much > traffic is to be expected. Probably not much. If ceph df shows that the pool contains some objects, I guess that's sorted. That osdmaptool crashes indicates that your cluster runs with corrupted interna

[ceph-users] Re: ceph octopus centos7, containers, cephadm

2020-10-23 Thread Marc Roos
Yes, that was it. I see so many messages here about these, I was wondering if it was a default. -Original Message- Cc: ceph-users Subject: Re: [ceph-users] Re: ceph octopus centos7, containers, cephadm I'm not sure I understood the question. If you're asking if you can run octopus vi

[ceph-users] Re: desaster recovery Ceph Storage , urgent help needed

2020-10-23 Thread Burkhard Linke
Hi, On 10/23/20 2:22 PM, Gerhard W. Recher wrote: This is a proxmox cluster ... sorry for formatting problems of my post :( short plot, we messed with ip addr. change of public network, so monitors went down. *snipsnap* so how to recover from this disaster ? # ceph -s   cluster:     id:  

[ceph-users] Re: Ceph Octopus

2020-10-23 Thread Amudhan P
Hi Eugen, I did the same steps as specified but the OSD has not updated its cluster address. On Tue, Oct 20, 2020 at 2:52 PM Eugen Block wrote: > > I wonder if this would be impactful, even if `nodown` were set. > > When a given OSD latches onto > > the new replication network, I would expect it to want t

[ceph-users] Re: Large map object found

2020-10-23 Thread Peter Eisch
Perfect -- many thanks Dominic! I haven't found a doc which notes the --num-shards needs to be a power of two. It isn't that I don't believe you -- just haven't seen that anywhere. Peter Eisch

[ceph-users] Re: Ceph Octopus

2020-10-23 Thread Eugen Block
Did you restart the OSD containers? Does ceph config show your changes? ceph config get mon cluster_network ceph config get mon public_network Quoting Amudhan P: Hi Eugen, I did the same steps as specified but the OSD has not updated its cluster address. On Tue, Oct 20, 2020 at 2:52 PM Eugen Block

[ceph-users] Re: Hardware needs for MDS for HPC/OpenStack workloads?

2020-10-23 Thread Nathan Fish
Regarding MDS pinning, we have our home directories split into u{0..9} for legacy reasons, and while adding more MDS' helped a little, pinning certain u? to certain MDS' helped greatly. The automatic migration between MDS' killed performance. This is an unusually perfect workload for pinning, as we
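For anyone wanting to try the same, a minimal sketch of pinning a directory to an MDS rank (the mount path is hypothetical):

# pin /home/u3 to MDS rank 1; -v -1 removes the pin again
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/home/u3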

[ceph-users] TOO_FEW_PGS warning and pg_autoscale

2020-10-23 Thread Peter Eisch
Hi, # ceph health detail HEALTH_WARN too few PGs per OSD (24 < min 30) TOO_FEW_PGS too few PGs per OSD (24 < min 30) ceph version 14.2.9 This warning popped up when autoscale shrunk a pool from pg_num and pgp_num from 512 to 256 on its own. The hdd35 storage is only used by this pool. I have
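One hedged way to keep the autoscaler from shrinking that pool again and to restore the old pg_num (the pool name here is a placeholder):

# ceph osd pool set hdd35-pool pg_autoscale_mode off
# ceph osd pool set hdd35-pool pg_num 512
# ceph osd pool set hdd35-pool pgp_num 512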

[ceph-users] Re: Ceph Octopus

2020-10-23 Thread Amudhan P
Hi Eugen, ceph config output shows the set network address. I have not restarted the containers directly; I was trying the command `ceph orch restart osd.46`. I think that was the problem. Now, after running `ceph orch daemon restart osd.46`, it's showing the changes in the dashboard. Thanks. On Fri, Oct 23, 2020 at

[ceph-users] Re: Hardware for new OSD nodes.

2020-10-23 Thread Eneko Lacunza
Hi Anthony, On 22/10/20 at 18:34, Anthony D'Atri wrote: Yeah, didn't think about a RAID10 really, although there wouldn't be enough space for 8x300GB = 2400GB WAL/DBs. 300 is overkill for many applications anyway. Yes, but he has spillover with 1600GB/12 WAL/DB. Seems he can make use

[ceph-users] Re: Hardware for new OSD nodes.

2020-10-23 Thread Eneko Lacunza
Hi Brian, On 22/10/20 at 18:41, Brian Topping wrote: On Oct 22, 2020, at 10:34 AM, Anthony D'Atri wrote: - You must really be sure your RAID card is dependable. (Sorry, but I have seen so many management problems with top-tier RAID cards that I avoid them like the plague.) This. I’d

[ceph-users] Re: Hardware for new OSD nodes.

2020-10-23 Thread Eneko Lacunza
Hi Dave, On 22/10/20 at 19:43, Dave Hall wrote: On 22/10/20 at 16:48, Dave Hall wrote: (BTW, Nautilus 14.2.7 on Debian non-container.) We're about to purchase more OSD nodes for our cluster, but I have a couple of questions about hardware choices.  Our original nodes were 8 x 12T

[ceph-users] OSD down, how to reconstruct it from its main and block.db parts ?

2020-10-23 Thread Wladimir Mutel
Dear all, after breaking my experimental 1-host Ceph cluster and making one of its PGs 'incomplete', I left it in an abandoned state for some time. Now I decided to bring it back to life and found that it can not start one of its OSDs (osd.1 to name it). "ceph osd df" shows : ID CLASS WEIGHT REW

[ceph-users] Re: Strange USED size

2020-10-23 Thread Anthony D'Atri
10B as in ten bytes? By chance have you run `rados bench` ? Sometimes a run is interrupted or one forgets to clean up and there are a bunch of orphaned RADOS objects taking up space, though I’d think `ceph df` would reflect that. Is your buckets.data pool replicated or EC? > On Oct 22, 2020
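A sketch of checking both points, assuming the usual default.rgw.buckets.data pool name:

# ceph osd pool ls detail | grep buckets.data      # shows 'replicated' vs 'erasure'
# rados -p default.rgw.buckets.data cleanup        # removes leftover 'rados bench' objects, if any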

[ceph-users] desaster recovery Ceph Storage , urgent help needed

2020-10-23 Thread Gerhard W. Recher
Hi, I have a worst case: OSDs in a 3-node cluster (4 NVMes each) won't start. We had an IP config change in the public network and the mons died, so we managed to bring the mons back with new IPs. corosync on 2 rings is fine, all 3 mons are up, but the OSDs won't start. How do we get back to the pool? Already 3 VMs ar

[ceph-users] Re: Large map object found

2020-10-23 Thread DHilsbos
Peter; As with many things in Ceph, I don’t believe it’s a hard and fast rule (i.e. a non-power of 2 will work). I believe the issues are performance, and balance. I can't confirm that. Perhaps someone else on the list will add their thoughts. Has your warning gone away? Thank you, Domini
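For completeness, a hedged example of manual resharding (bucket name and shard count are placeholders; whether the count must be a power of two is, as discussed, unconfirmed):

# radosgw-admin bucket reshard --bucket=mybucket --num-shards=128
# radosgw-admin reshard status --bucket=mybucket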

[ceph-users] Re: Hardware for new OSD nodes.

2020-10-23 Thread Brian Topping
Yes the UEFI problem with mirrored mdraid boot is well-documented. I’ve generally been working with BIOS partition maps which do not have the single point of failure UEFI has (/boot can be mounted as mirrored, any of them can be used as non-RAID by GRUB). But BIOS maps have problems as well with

[ceph-users] Re: Urgent help needed please - MDS offline

2020-10-23 Thread David C
Success! I remembered I had a server I'd taken out of the cluster to investigate some issues, that had some good quality 800GB Intel DC SSDs, dedicated an entire drive to swap, tuned up min_free_kbytes, added an MDS to that server and let it run. Took 3 - 4 hours but eventually came back online. I
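Roughly what that preparation looks like, as a sketch (the device name and the min_free_kbytes value are assumptions):

# mkswap /dev/sdX && swapon /dev/sdX
# sysctl -w vm.min_free_kbytes=4194304
# systemctl start ceph-mds@$(hostname -s)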

[ceph-users] Re: Large map object found

2020-10-23 Thread Peter Eisch
Yes, the OMAP warning has cleared after running the deep-scrub, with all the swiftness. Thanks again! Peter Eisch

[ceph-users] Re: desaster recovery Ceph Storage , urgent help needed

2020-10-23 Thread Gerhard W. Recher
This is a proxmox cluster ... sorry for formatting problems of my post :( short plot, we messed with ip addr. change of public network, so monitors went down. we changed monitor information in ceph.conf and with ceph-mon -i pve01 --extract-monmap /tmp/monmap monmaptool --rm pve01 --rm pve02 --rm
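For readers following along, the usual shape of that monmap surgery, sketched with placeholder IPs (the mon daemons must be stopped while extracting and injecting):

# ceph-mon -i pve01 --extract-monmap /tmp/monmap
# monmaptool --print /tmp/monmap
# monmaptool --rm pve01 --rm pve02 --rm pve03 /tmp/monmap
# monmaptool --add pve01 192.168.1.1:6789 --add pve02 192.168.1.2:6789 --add pve03 192.168.1.3:6789 /tmp/monmap
# ceph-mon -i pve01 --inject-monmap /tmp/monmap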

[ceph-users] Re: desaster recovery Ceph Storage , urgent help needed

2020-10-23 Thread Eneko Lacunza
Have you tried to recover the old IPs? On 23/10/20 at 14:22, Gerhard W. Recher wrote: This is a proxmox cluster ... sorry for formatting problems of my post :( short plot, we messed with ip addr. change of public network, so monitors went down. we changed monitor information in ceph.conf a

[ceph-users] Ceph and ram limits

2020-10-23 Thread Ing . Luis Felipe Domínguez Vega
For some days now I have been recovering my Ceph cluster. It all started with OSDs being killed by OOM, so I created a script to delete the corrupted PGs from the OSDs (I write "corrupted" because those PGs are the cause of the 100% RAM usage by the OSDs). Great, almost done with all OSDs of my cluster, then the
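The script itself isn't shown; presumably it wraps something like ceph-objectstore-tool on the stopped OSD (the OSD id and PG id below are placeholders, and exporting a copy first is just a precaution):

# systemctl stop ceph-osd@12
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op export --pgid 7.1a --file /root/pg7.1a.export
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op remove --pgid 7.1a --force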

[ceph-users] Re: OSD Failures after pg_num increase on one of the pools

2020-10-23 Thread Григорьев Артём Дмитриевич
Everything was OK in monitoring and logs; the OSD nodes have plenty of available CPU and RAM. The previous pg_num was 256. From: Eugen Block Sent: Friday, October 23, 2020 2:06:27 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: OSD Failures after pg_num increase on one of t

[ceph-users] Re: desaster recovery Ceph Storage , urgent help needed

2020-10-23 Thread Gerhard W. Recher
Yep, I have now reverted the IP changes. The OSDs still do not come up and I see no error in ceph.log; the OSD logs are empty ... Gerhard W. Recher net4sec UG (haftungsbeschränkt) Leitenweg 6 86929 Penzing +49 8191 4283888 +49 171 4802507 On 23.10.2020 at 14:28, Eneko Lacunza wrote: > Have you tried to rec

[ceph-users] Re: [External Email] Re: Hardware for new OSD nodes.

2020-10-23 Thread Dave Hall
Brian, Eneko, BTW, the Tyan LFF chassis we've been using has 12 x 3.5" bays in front and 2 x 2.5" SATA bays in back.  We've been using 240GB SSDs in the rear bays for mirrored boot drives, so any NVMe we add is exclusively for OSD support.  -Dave Dave Hall Binghamton University kdh...@bingh

[ceph-users] Re: [External Email] Re: Hardware for new OSD nodes.

2020-10-23 Thread Dave Hall
Eneko, # ceph health detail HEALTH_WARN BlueFS spillover detected on 7 OSD(s) BLUEFS_SPILLOVER BlueFS spillover detected on 7 OSD(s) osd.1 spilled over 648 MiB metadata from 'db' device (28 GiB used of 124 GiB) to slow device osd.3 spilled over 613 MiB metadata from 'db' device (28 GiB u
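A hedged sketch of the usual follow-up, assuming osd.1 and a default OSD path; compaction may only clear the spillover temporarily, and bluefs-bdev-expand only helps after the DB LV has actually been grown:

# ceph tell osd.1 compact
# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-1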

[ceph-users] Re: Urgent help needed please - MDS offline

2020-10-23 Thread Patrick Donnelly
On Fri, Oct 23, 2020 at 9:02 AM David C wrote: > > Success! > > I remembered I had a server I'd taken out of the cluster to > investigate some issues, that had some good quality 800GB Intel DC > SSDs, dedicated an entire drive to swap, tuned up min_free_kbytes, > added an MDS to that server and le