[ceph-users] NFS version 4.0

2021-02-04 Thread Jens Hyllegaard (Soft Design A/S)
Hi. We are trying to set up an NFS server using Ceph which needs to be accessed by an IBM System i. As far as I can tell, the IBM System i only supports NFS v4.0. Looking at the nfs-ganesha deployments, it seems that these only support 4.1 or 4.2. I have tried editing the configuration file to sup
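For readers hitting the same limitation: nfs-ganesha itself can be told to accept NFSv4.0 through its NFSv4 configuration block. A minimal sketch, assuming a hand-edited ganesha.conf; the block and parameter exist in ganesha, but the configuration generated by the Ceph orchestrator may need to be overridden rather than edited in place:

```
# ganesha.conf fragment: allow NFSv4.0 in addition to the 4.1/4.2 defaults
NFSv4 {
    minor_versions = 0, 1, 2;
}
```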

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Mario Giammarco
Hi Federico, here I am not mixing RAID1 with Ceph. I am doing a comparison: is it safer to have a server with RAID1 disks, or two servers with Ceph and size=2 min_size=1? We are talking about real-world examples where a customer is buying a new server and wants to choose. On Thu, Feb 4, 2021 a
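For reference, the replication settings being debated here are per-pool and can be inspected or changed with the standard CLI; `mypool` is a placeholder pool name:

```
# Inspect the current replication settings
ceph osd pool get mypool size
ceph osd pool get mypool min_size

# The values generally recommended for replicated pools
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2
```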

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Mario Giammarco
On Thu, Feb 4, 2021 at 00:33, Simon Ironside <sirons...@caffetine.org> wrote: > > > On 03/02/2021 19:48, Mario Giammarco wrote: > > To labour Dan's point a bit further, maybe a RAID5/6 analogy is better > than RAID1. Yes, I know we're not talking erasure coding pools here but > thi

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Mario Giammarco
On Wed, Feb 3, 2021 at 21:22, Dan van der Ster wrote: > > Lastly, if you can't afford 3x replicas, then use 2+2 erasure coding if > possible. > > I will investigate; I heard that erasure coding is slow. Anyway, I will write here the reason for this thread: among my customers I have usua
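For anyone who wants to try the 2+2 suggestion, a sketch of how such a pool is created; the profile name, pool name and PG count are placeholders, and crush-failure-domain=host assumes enough independent hosts are available:

```
# Define a k=2, m=2 erasure-code profile with host as the failure domain
ceph osd erasure-code-profile set ec22 k=2 m=2 crush-failure-domain=host

# Create a pool that uses the profile (PG count is illustrative only)
ceph osd pool create ecpool 64 64 erasure ec22
```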

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Alexander E. Patrakov
There is a big difference between traditional RAID1 and Ceph. Namely, with Ceph, there are nodes where OSDs are running, and these nodes need maintenance. You want to be able to perform maintenance even if you have one broken OSD; that's why the recommendation is to have three copies with Ceph. The

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Dan van der Ster
On Thu, Feb 4, 2021 at 11:30 AM Mario Giammarco wrote: > > > > On Wed, Feb 3, 2021 at 21:22, Dan van der Ster > wrote: >> >> >> Lastly, if you can't afford 3x replicas, then use 2+2 erasure coding if >> possible. >> > > I will investigate; I heard that erasure coding is slow. > >

[ceph-users] Re: mon db high iops

2021-02-04 Thread Seena Fallah
This is my osdmap commit diff: report 4231583130 "osdmap_first_committed": 300814, "osdmap_last_committed": 304062. My disk latency is 25 ms because of the large block size that RocksDB is using. Should I use a higher-performance disk than the one I currently use for my monitor nodes? On Thu, Feb 4, 20
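For context, the quoted epochs mean the monitors are currently retaining 304062 − 300814 = 3248 osdmaps; these normally get trimmed once PGs are clean, so a gap this large keeps the mon's RocksDB busy. The two values can be re-checked from a cluster report (the jq filter assumes the fields sit at the top level of the report, as the quote above suggests):

```
ceph report | jq '.osdmap_first_committed, .osdmap_last_committed'
```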

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-04 Thread Lionel Bouton
Hi, on 04/02/2021 at 08:41, Loïc Dachary wrote: > Hi Federico, > > On 04/02/2021 05:51, Federico Lucifredi wrote: >> Hi Loïc, >>    I am intrigued, but am missing something: why not using RGW, and store >> the source code files as objects? RGW has native compression and can take >> care of th

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Robert Sander
Hi, on 04.02.21 at 12:10, Frank Schilder wrote: > Going to 2+2 EC will not really help. On such a small cluster you cannot even use EC, because there are not enough independent hosts. As a rule of thumb there should be k+m+1 hosts in a cluster, AFAIK. Regards -- Robert Sander Heinlein Support Gm
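Applying that rule of thumb to the 2+2 profile under discussion: k+m+1 = 2+2+1 = 5 hosts, so a three-server cluster is two hosts short of it.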

[ceph-users] NVMe and 2x Replica

2021-02-04 Thread Adam Boyhan
I know there are already a few threads about 2x replication, but I wanted to start one dedicated to discussion on NVMe. There are some older threads, but nothing recent that addresses how the vendors are now pushing the idea of 2x. We are in the process of considering Ceph to replace our Nimble s

[ceph-users] Re: replace OSD without PG remapping

2021-02-04 Thread Frank Schilder
Hi Tony, OK, I understand better now as well. I was really wondering why you wanted to avoid the self-healing. It's the main reason for using Ceph :) Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Tony Liu Sent:

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-04 Thread Loïc Dachary
On 04/02/2021 12:08, Lionel Bouton wrote: > Hi, > > On 04/02/2021 at 08:41, Loïc Dachary wrote: >> Hi Federico, >> >> On 04/02/2021 05:51, Federico Lucifredi wrote: >>> Hi Loïc, >>>    I am intrigued, but am missing something: why not using RGW, and store >>> the source code files as objects?

[ceph-users] Re: NFS version 4.0

2021-02-04 Thread Daniel Gryniewicz
The preference for 4.1 and later is because 4.0 has a much less useful graceful restart (which is used for HA/failover as well). Ganesha itself supports 4.0 perfectly fine, and it should work fine with Ceph, but HA setups will be much more difficult, and will be limited in functionality. Dan

[ceph-users] Re: db_devices doesn't show up in exported osd service spec

2021-02-04 Thread Jens Hyllegaard (Soft Design A/S)
Hi. I have the same situation. Running 15.2.8, I created a specification that looked just like it, with rotational in the data devices and non-rotational in the db devices. The first apply worked fine; afterwards it only uses the HDD, and not the SSD. Also, is there a way to remove an unused OSD service? I managed to
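For readers following the thread, a drive-group style OSD specification of the kind described looks roughly like this on Octopus (service_id and host_pattern are placeholders), applied with `ceph orch apply osd -i <file>`:

```yaml
service_type: osd
service_id: osd_hdd_with_ssd_db      # placeholder name
placement:
  host_pattern: '*'
data_devices:
  rotational: 1                      # spinning disks carry the data
db_devices:
  rotational: 0                      # solid-state devices carry the RocksDB/WAL
```

Removing an unused service definition is normally done with `ceph orch rm <service_name>` (for an OSD spec, the service name is `osd.<service_id>`); whether that works cleanly for OSD specs on 15.2.8 is part of what is being asked here.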

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Frank Schilder
> - three servers > - three monitors > - 6 osd (two per server) > - size=3 and min_size=2 This is a set-up that I would not run at all. The first reason is that Ceph lives on the law of large numbers, and 6 is a small number; hence your OSDs fill up unevenly due to uneven distribution. What comes to my min

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Frank Schilder
> Because you have 3 hosts, 2 osds each, and 3 replicas: ... > So unless your cluster was under 40-50% used, that osd is going to > become overfull. Yes, I overlooked this. With 2 disks per host, statistics is not yet at play here; it's the deterministic case. To run it safely, you need to have at leas

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-04 Thread Matthew Vernon
Hi, On 04/02/2021 07:41, Loïc Dachary wrote: On 04/02/2021 05:51, Federico Lucifredi wrote: Hi Loïc,    I am intrigued, but am missing something: why not using RGW, and store the source code files as objects? RGW has native compression and can take care of that behind the scenes. Excellent

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Eneko Lacunza
Hi all, on 4/2/21 at 11:56, Frank Schilder wrote: - three servers - three monitors - 6 osd (two per server) - size=3 and min_size=2 This is a set-up that I would not run at all. The first one is, that ceph lives on the law of large numbers and 6 is a small number. Hence, your OSD fill-up

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread DHilsbos
Adam; Earlier this week, another thread presented 3 white papers in support of running 2x on NVMe for Ceph. I searched each to find the section where 2x was discussed. What I found was interesting. First, there are really only 2 positions here: Micron's and Red Hat's. Supermicro copies Micr

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread Anthony D'Atri
> > Maybe the weakest thing in that configuration is having 2 OSDs per node; osd > nearfull must be tuned accordingly so that no OSD goes beyond about 0.45, so > that in case of failure of one disk, the other OSD in the node has enough > space for healing replication. > A careful setting of
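For reference, the nearfull threshold mentioned here is a cluster-wide ratio; lowering it to the value suggested above is a one-liner (0.45 is this thread's suggestion for nodes with only two OSDs, not a general default):

```
# Warn early enough that the surviving OSD in a node can absorb its
# partner's replicas after a disk failure
ceph osd set-nearfull-ratio 0.45
```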

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Anthony D'Atri
> I searched each to find the section where 2x was discussed. What I found was > interesting. First, there are really only 2 positions here: Micron's and Red > Hat's. Supermicro copies Micron's position paragraph word for word. Not > surprising considering that they are advertising a Superm

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Adam Boyhan
All great input and points, guys. Helps me lean towards 3 copies a bit more. I mean, honestly, NVMe cost per TB isn't that much more than SATA SSD now. Somewhat surprised the salesmen aren't pitching 3x replication as it makes them more money. From: "Anthony D'Atri" To: "ceph-users" Sent:

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread DHilsbos
My impression is that cost/TB for a drive may be approaching parity, but the TB/drive is still well below (or, at least, at densities approaching parity, cost/TB is still quite high). I can get a Micron 15TB SSD for $2600, but why would I when I can get an 18TB Seagate IronWolf for <$600, a 1

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Jack
On 2/4/21 7:17 PM, dhils...@performair.com wrote: why would I when I can get an 18TB Seagate IronWolf for <$600, an 18TB Seagate Exos for <$500, or an 18TB WD Gold for <$600? IOPS

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Mark Lehrer
> It seems all the big vendors feel 2x is safe with NVMe but > I get the feeling this community feels otherwise Definitely! As someone who works for a big vendor (and I have since I worked at Fusion-IO way back in the old days), IMO the correct way to phrase this would probably be that "someone i

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Anthony D'Atri
>> Why would I when I can get an 18TB Seagate IronWolf for <$600, an 18TB Seagate >> Exos for <$500, or an 18TB WD Gold for <$600? > > IOPS Some installations don’t care so much about IOPS. Less-tangible factors include: * Time to repair and thus to restore redundancy. When an EC pool of spi

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Steven Pine
Taking a month to weight up a drive suggests the cluster doesn't have enough spare IO capacity. And for everyone suggesting EC, I don't understand how anyone really thinks that's a valid alternative with the min allocation / space amplification bug; no one in this community, not even the top devel

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-04 Thread huxia...@horebdata.cn
>IMO with a cluster this size, you should not ever mark out any OSDs -- >rather, you should leave the PGs degraded, replace the disk (keep the >same OSD ID), then recover those objects to the new disk. >Or, keep it <40% used (which sounds like a waste). Dear Dan, I particularly like your idea of
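A rough sketch of the keep-the-same-OSD-ID replacement flow Dan describes, assuming direct access to ceph-volume on the host; OSD id 5 and /dev/sdX are placeholders, and cephadm-managed clusters would wrap these steps differently:

```
# Stop the cluster from marking the failed OSD out and rebalancing
ceph osd set noout

# Mark the dead OSD destroyed while keeping its ID and CRUSH position
ceph osd destroy 5 --yes-i-really-mean-it

# Recreate the OSD on the replacement disk, reusing the old ID
ceph-volume lvm create --osd-id 5 --data /dev/sdX

# Re-enable normal out-marking once recovery onto the new disk is done
ceph osd unset noout
```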

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Anthony D'Atri
Weighting up slowly so as not to DoS users. Huge omaps and EC. So yes you’re actually agreeing with me. > > Taking a month to weight up a drive suggests the cluster doesn't have > enough spare IO capacity.

[ceph-users] replace OSD failed

2021-02-04 Thread Tony Liu
Hi, With 15.2.8, run "ceph orch rm osd 12 --replace --force", PGs on osd.12 are remapped, osd.12 is removed from "ceph osd tree", the daemon is removed from "ceph orch ps", the device is "available" in "ceph orch device ls". Everything seems good at this point. Then dry-run service spec. ``` # ca

[ceph-users] Re: replace OSD failed

2021-02-04 Thread Tony Liu
Here is the log from ceph-volume. ``` [2021-02-05 04:03:17,000][ceph_volume.process][INFO ] Running command: /usr/sbin/vgcreate --force --yes ceph-a3886f74-3de9-4e6e-a983-8330eda0bd64 /dev/sdd [2021-02-05 04:03:17,134][ceph_volume.process][INFO ] stdout Physical volume "/dev/sdd" successfully

[ceph-users] Re: replace OSD failed

2021-02-04 Thread Tony Liu
Here is the issue. https://tracker.ceph.com/issues/47758 Thanks! Tony > -Original Message- > From: Tony Liu > Sent: Thursday, February 4, 2021 8:46 PM > To: ceph-users@ceph.io > Subject: [ceph-users] Re: replace OSD failed > > Here is the log from ceph-volume. > ``` > [2021-02-05 04:03:
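Independent of what the tracker concludes: if a replacement leaves stale LVM state on the device, the orchestrator can wipe it back to a clean state before the spec is re-applied. This is a general cleanup step, not necessarily the fix for the issue above; the hostname is a placeholder:

```
ceph orch device zap ceph-node1 /dev/sdd --force
```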

[ceph-users] log_meta log_data was turned off in multisite and deleted

2021-02-04 Thread Szabo, Istvan (Agoda)
Hi, Is there a way to reinitialize the stored data and make it sync from the logs? Thank you
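One approach that is commonly suggested once the sync logs are gone is to re-initialise the sync state and let a full sync run instead; treat the following as a sketch to verify against the multisite documentation before running on the secondary zone (the source zone name and service unit are placeholders that vary by deployment):

```
# Reset metadata sync state, forcing a full metadata sync
radosgw-admin metadata sync init

# Reset data sync state against the master zone
radosgw-admin data sync init --source-zone=master-zone

# Restart the gateways so the sync threads pick up the reset state
systemctl restart ceph-radosgw.target
```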

[ceph-users] Multisite reshard stale instances

2021-02-04 Thread Szabo, Istvan (Agoda)
Hi, I found 6-700 stale instances with the reshard stale-instances list command. Is there a way to clean them up (or, actually, should I clean them up)? The stale-instances rm doesn't work in multisite. Thank you
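For readers wanting to reproduce the listing, these are the subcommands being referred to; as the post notes, the rm variant does not work on multisite deployments:

```
# List bucket-index instances left behind by resharding
radosgw-admin reshard stale-instances list

# Clean-up variant (single-site only, per the note above)
radosgw-admin reshard stale-instances rm
```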

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Pascal Ehlert
Sorry to jump in here, but would you care to explain why the total disk usage should stay under 60%? This is not something I have heard before and a quick Google search didn't return anything useful. Steven Pine wrote on 04.02.21 20:41: There are a lot of hidden costs in using ceph which can v

[ceph-users] Re: NVMe and 2x Replica

2021-02-04 Thread Brian :
Certainly with a small number of nodes/OSDs this makes sense, as losing a node could very quickly fill the cluster's storage capacity. On Friday, February 5, 2021, Pascal Ehlert wrote: > Sorry to jump in here, but would you care to explain why the total disk usage should stay under