[ceph-users] Re: how many monitor should to deploy in a 1000+ osd cluster

2019-09-25 Thread Wido den Hollander
On 9/26/19 5:05 AM, zhanrzh...@teamsun.com.cn wrote: > Thanks for your reply. > We don't maintain it frequently. > My confusion is whether more monitors are more advantageous for > clients (osd, rbd client...) to get the cluster map. > Do all clients communicate with one monitor of the cluster at the m

[ceph-users] Re: Slow Write Issues

2019-09-25 Thread Konstantin Shalygin
On 9/25/19 12:39 AM, João Victor Rodrigues Soares wrote: My question is whether what is happening may have to do with the amount of disk dedicated to DB/WAL. The Ceph documentation recommends that the block.db size be no smaller than 4% of block. In this case for each disk in
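
For reference, a quick worked example of that 4% guideline, using the 12 x 10TB OSDs described in the original post of this thread:

    0.04 * 10 TB = 400 GB of block.db per OSD
    12 OSDs * 400 GB = 4.8 TB of fast block.db storage per node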

[ceph-users] Re: Cephfs + docker

2019-09-25 Thread Patrick Hein
Hi, I am also using CephFS with Docker for the same reason you said, also on Ubuntu 18.04. I used the kernel client before Nautilus, but now FUSE, because the kernel client is too old (it might work now with the newest HWE kernel). I don't have any problems at all, neither in Portainer nor any other con
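
For anyone following along, a minimal sketch of the setup being described (mount CephFS on the host, then bind-mount a subdirectory into the container); the monitor address, client name, paths and image are placeholders:

    # kernel client (needs a recent enough kernel, as noted above)
    mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=docker,secretfile=/etc/ceph/docker.secret
    # or the FUSE client
    ceph-fuse -n client.docker /mnt/cephfs
    # bind-mount a CephFS subdirectory into the container
    docker run -d -v /mnt/cephfs/pihole:/etc/pihole pihole/pihole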

[ceph-users] Re: how many monitor should to deploy in a 1000+ osd cluster

2019-09-25 Thread zhanrzh...@teamsun.com.cn
Thanks for your reply. We don't maintain it frequently. My confusion is whether more monitors are more advantageous for clients (osd, rbd client...) to get the cluster map. Do all clients communicate with one monitor of the cluster at the same time? If not, how does a client decide which one to communicate with
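
For context: a client keeps a session with only one monitor at a time. It picks one address from the mon_host list in ceph.conf (or from DNS SRV records), effectively at random, pulls the current monitor map, and then subscribes to cluster map updates from that single mon, failing over to another if the session times out. A minimal sketch with placeholder addresses:

    [global]
    mon_host = 10.0.0.1, 10.0.0.2, 10.0.0.3

    # on any node, show the current quorum
    ceph quorum_status --format json-pretty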

[ceph-users] Re: CephFS deleted files' space not reclaimed

2019-09-25 Thread Gregory Farnum
On Mon, Sep 23, 2019 at 6:50 AM Josh Haft wrote: > > Hi, > > I've been migrating data from one EC pool to another EC pool: two > directories are mounted with ceph.dir.layout.pool file attribute set > appropriately, then rsync from old to new and finally, delete the old > files. I'm using the kerne
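
For readers who want the mechanics of that migration, a minimal sketch (pool name and paths are placeholders; the new pool must already be attached to the filesystem with ceph fs add_data_pool):

    # point the new directory at the new EC pool
    setfattr -n ceph.dir.layout.pool -v new_ec_pool /mnt/cephfs/new
    getfattr -n ceph.dir.layout /mnt/cephfs/new
    # copy, then delete the originals
    rsync -a /mnt/cephfs/old/ /mnt/cephfs/new/
    rm -rf /mnt/cephfs/old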

[ceph-users] Re: download.ceph.com repository changes

2019-09-25 Thread Ken Dreyer
On Wed, Sep 25, 2019 at 1:56 PM Sasha Litvak wrote: > > I guess for me the more crucial questions should be answered: > > 1. How can a busted release be taken out of repos (some metadata > update I hope)? It's hard to define the word "busted" in a way that satisfies everyone. For example, in

[ceph-users] Re: Luminous 12.2.12 "clients failing to respond to capability release" & "MDSs report slow requests" error

2019-09-25 Thread Marc Roos
Worked around it by failing the MDS, because I read somewhere about restarting it. However, it would be nice to know what causes this and how to prevent it. ceph mds fail c -Original Message- Subject: [ceph-users] Re: Luminous 12.2.12 "clients failing to respond to capability re
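
For the record, a less drastic option than failing the whole MDS is usually to find and evict just the session holding the caps (the daemon name c is taken from the message above; the session id is a placeholder):

    ceph daemon mds.c session ls                  # find the session with the high num_caps
    ceph tell mds.c client evict id=<session id>  # evict only that client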

[ceph-users] Re: Luminous 12.2.12 "clients failing to respond to capability release" & "MDSs report slow requests" error

2019-09-25 Thread Marc Roos
These are not excessive values, are they? How to resolve this? [@~]# ceph daemon mds.c cache status { "pool": { "items": 266303962, "bytes": 7599982391 } } [@~]# ceph daemon mds.c objecter_requests { "ops": [], "linger_ops": [], "pool_ops": [], "pool_st
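
Whether ~7.6 GB of cache items is excessive depends on the configured limit rather than on the absolute numbers; a way to check (and, if the host has RAM to spare, raise) it on the same daemon, with 8 GiB used purely as an example value:

    ceph daemon mds.c config get mds_cache_memory_limit
    ceph daemon mds.c config set mds_cache_memory_limit 8589934592   # 8 GiB, example only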

[ceph-users] Re: download.ceph.com repository changes

2019-09-25 Thread Sasha Litvak
I guess for me the more crucial questions should be answered: 1. How can a busted release be taken out of the repos (some metadata update, I hope)? 2. Can some fix(es) be added into a test release so they can be accessed by the community and tested/used before the next general release is available? I was

[ceph-users] Cephfs + docker

2019-09-25 Thread Alex L
Hi, I am trying to figure out why my Portainer and Pi-hole databases in Docker keep getting broken. All other Docker applications work flawlessly, but not these. I am running Ubuntu 18.04 + a kernel ceph mount for the data directory. I have looked at how others do it, and they all seem to u

[ceph-users] Re: Ceph NIC partitioning (NPAR)

2019-09-25 Thread solarflow99
I kind of doubt this will provide much of an advantage. I think recovery is the only time you might have some chance of a speedup, but I'm not sure network throughput is always the bottleneck. There was some discussion a while back about this; client IO is still going to be impacted by recovery. O

[ceph-users] Re: how many monitor should to deploy in a 1000+ osd cluster

2019-09-25 Thread Wido den Hollander
On 9/25/19 6:52 PM, Nathan Fish wrote: > You don't need more mons to scale; but going to 5 mons would make the > cluster more robust, if it is cheap for you to do so. > If you assume that 1 mon rebooting for updates or maintenance is > routine, then 2/3 is vulnerable to one failure. 4/5 can survi

[ceph-users] Re: how many monitor should to deploy in a 1000+ osd cluster

2019-09-25 Thread Nathan Fish
You don't need more mons to scale; but going to 5 mons would make the cluster more robust, if it is cheap for you to do so. If you assume that 1 mon rebooting for updates or maintenance is routine, then 2/3 is vulnerable to one failure. 4/5 can survive an unexpected additional failure while one is
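
The arithmetic behind that, for reference: a monitor quorum needs a strict majority, i.e. floor(N/2) + 1.

    3 mons: quorum = 2 -> tolerates 1 failure (none while 1 mon is down for maintenance)
    5 mons: quorum = 3 -> tolerates 2 failures (1 while 1 mon is down for maintenance)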

[ceph-users] Re: how many monitor should to deploy in a 1000+ osd cluster

2019-09-25 Thread Olivier AUDRY
Hello, as far as I know, and according to the documentation, the mon just shares the cluster map with the client, not the data: "Storage cluster clients retrieve a copy of the cluster map from the Ceph Monitor." https://docs.ceph.com/docs/master/architecture/ On Wednesday, September 25, 2019 at 22:25 +0800,

[ceph-users] Re: how many monitor should to deploy in a 1000+ osd cluster

2019-09-25 Thread Alex Gorbachev
On Wed, Sep 25, 2019 at 10:26 AM 展荣臻(信泰) wrote: > > hi all: > I have a production cluster, and it formerly had 24 hosts (528 osds, 3 mons). > Now we want to add 36 hosts so the OSD count increases to 1320. > We use 3 mons in a cluster of this size with no issues. > Does the monitor count need to inc

[ceph-users] Re: Luminous 12.2.12 "clients failing to respond to capability release"

2019-09-25 Thread Marc Roos
On the client I have this [@~]# cat /proc/sys/fs/file-nr 10368 0 381649 -Original Message- Subject: [ceph-users] Luminous 12.2.12 "clients failing to respond to capability release" I am getting this error I have in two sessions[0] num_caps high ( I assume the error is ab

[ceph-users] Announcing Ceph Buenos Aires 2019 on Oct 16th at Museo de Informatica

2019-09-25 Thread Victoria Martinez de la Cruz
Hi all, I'm happy to announce that next Oct 16th we will have the Ceph Day Argentina in Buenos Aires. The event will be held in the Museo de Informatica de Argentina, so apart from hearing the latest features from core developers, real use cases from our users and usage experiences from customers

[ceph-users] how many monitor should to deploy in a 1000+ osd cluster

2019-09-25 Thread 展荣臻(信泰)
Hi all: I have a production cluster, and it formerly had 24 hosts (528 osds, 3 mons). Now we want to add 36 hosts so the OSD count increases to 1320. Does the monitor count need to increase? How many monitor nodes are recommended? Another question is which monitor does the monclient c

[ceph-users] Luminous 12.2.12 "clients failing to respond to capability release"

2019-09-25 Thread Marc Roos
I am getting this error; I have two sessions [0] with num_caps high (I assume the error is about num_caps). I am using a default Luminous and a default CentOS 7 with the default 3.10 kernel. Do I really still need to change to a non-stock kernel to resolve this? I read this in posts from 2016 and 2

[ceph-users] Re: Wrong %USED and MAX AVAIL stats for pool

2019-09-25 Thread Wido den Hollander
On 9/25/19 3:22 PM, nalexand...@innologica.com wrote: > Hi everyone, > > We are running Nautilus 14.2.2 with 6 nodes and a total of 44 OSDs, all are > 2TB spinning disks. > # ceph osd count-metadata osd_objectstore > "bluestore": 44 > # ceph osd pool get one size > size: 3 > # ceph df > R

[ceph-users] Slow Write Issues

2019-09-25 Thread João Victor Rodrigues Soares
Hello, In my company, we currently have the following infrastructure: - Ceph Luminous - OpenStack Pike. We have a cluster of 3 osd nodes with the following configuration: - 1 x Xeon (R) D-2146NT CPU @ 2.30GHz - 128GB RAM - 128GB ROOT DISK - 12 x 10TB SATA ST1NM0146 (OSD) - 1 x Intel Optane

[ceph-users] Ceph NIC partitioning (NPAR)

2019-09-25 Thread Adrien Georget
Hi, I need your advice about the following setup. Currently, we have a Ceph Nautilus cluster used by OpenStack Cinder with a single 10Gbps NIC on the OSD hosts. We will upgrade the cluster by adding 7 new hosts dedicated to Nova/Glance and we would like to add a cluster network to isolate replica
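
For reference, a cluster network is just two settings in ceph.conf (the subnets below are placeholders); only OSD-to-OSD replication and recovery traffic moves onto it, while clients and monitors stay on the public network:

    [global]
    public_network  = 10.0.0.0/24
    cluster_network = 10.1.0.0/24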

[ceph-users] Wrong %USED and MAX AVAIL stats for pool

2019-09-25 Thread nalexandrov
Hi everyone, We are running Nautilus 14.2.2 with 6 nodes and a total of 44 OSDs, all are 2TB spinning disks. # ceph osd count-metadata osd_objectstore "bluestore": 44 # ceph osd pool get one size size: 3 # ceph df RAW STORAGE: CLASS SIZE AVAIL USED RAW USED %RAW
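
For context on the numbers below: MAX AVAIL is not simply remaining raw space divided by the replica count; it is projected from the fullest OSD that the pool's CRUSH rule can use, so a single imbalanced OSD drags the value down for the whole pool. A rough illustration with made-up numbers (the real calculation also accounts for CRUSH weights and the full ratio):

    fullest OSD in the rule: 2 TB at 70% used -> ~30% headroom left
    projected raw available: 44 OSDs * 2 TB * 0.30 = 26.4 TB
    MAX AVAIL with size=3  : 26.4 TB / 3 ≈ 8.8 TB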

[ceph-users] Re: OSD rebalancing issue - should drives be distributed equally over all nodes

2019-09-25 Thread Thomas
Hi Reed, I'm not sure what is meant by the grouping / chassis and "set your failure domain to chassis" respectively. This is my current crush map: # begin crush map tunable choose_local_tries 0 tunable choose_local_fallback_tries 0 tunable choose_total_tries 50 tunable chooseleaf_descend_onc
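
For what it's worth, a chassis failure domain is normally expressed by adding chassis buckets to the CRUSH tree, moving hosts under them, and pointing the rule at type chassis; a minimal sketch with placeholder names:

    ceph osd crush add-bucket chassis1 chassis
    ceph osd crush move chassis1 root=default
    ceph osd crush move node-a chassis=chassis1
    ceph osd crush move node-b chassis=chassis1
    # and in the replicated rule: step chooseleaf firstn 0 type chassis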

[ceph-users] Re: verify_upmap number of buckets 5 exceeds desired 4

2019-09-25 Thread Eric Dold
After updating the CRUSH rule from

    rule cephfs_ec {
        id 1
        type erasure
        min_size 8
        max_size 8
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 4 type host
        step choose indep 2 type osd
        step emit
    }

to rule cephfs_ec {