[ceph-users] Deep scrub distribution

2017-07-05 Thread Adrian Saul
During a recent snafu with a production cluster I disabled scrubbing and deep scrubbing in order to reduce load on the cluster while things backfilled and settled down. The PTSD caused by the incident meant I was not keen to re-enable it until I was confident we had fixed the root cause of the
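
For reference, the flags being toggled here are the standard ones; a minimal sketch (stock ceph CLI, not commands quoted from the thread):

  # pause scrubbing while the cluster backfills and settles
  ceph osd set noscrub
  ceph osd set nodeep-scrub
  # re-enable once the root cause is fixed and the cluster is healthy
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub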

[ceph-users] CDM APAC

2017-07-05 Thread Patrick McGarry
Hey cephers, While tonight was my last CDM, I am considering a recommendation to my replacement that we stop the alternating time zone and just standardize on a NA/EMEA time slot. Attendance typically is much better (even amongst APAC developers) during the NA/EMEA times. So, if you want us to co

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread Maged Mokhtar
On 2017-07-05 23:22, David Clarke wrote: > On 07/05/2017 08:54 PM, Massimiliano Cuttini wrote: > >> Dear all, >> >> Luminous is coming and soon we should be able to avoid double writes. >> This means using 100% of the speed of SSD and NVMe. >> Clusters made entirely of SSD and NVMe will not be pe

Re: [ceph-users] How to force "rbd unmap"

2017-07-05 Thread Maged Mokhtar
On 2017-07-05 20:42, Ilya Dryomov wrote: > On Wed, Jul 5, 2017 at 8:32 PM, David Turner wrote: > >> I had this problem occasionally in a cluster where we were regularly mapping >> RBDs with KRBD. Something else we saw was that after this happened for >> unmapping RBDs, it would start

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread David Clarke
On 07/05/2017 08:54 PM, Massimiliano Cuttini wrote: > Dear all, > > Luminous is coming and soon we should be able to avoid double writes. > This means using 100% of the speed of SSD and NVMe. > Clusters made entirely of SSD and NVMe will not be penalized and will start to make > sense. > > Looking forwa

Re: [ceph-users] How to force "rbd unmap"

2017-07-05 Thread Ilya Dryomov
On Wed, Jul 5, 2017 at 8:32 PM, David Turner wrote: > I had this problem occasionally in a cluster where we were regularly mapping > RBDs with KRBD. Something else we saw was that after this happened for > unmapping RBDs, it would start preventing mapping of some RBDs as > well. We were a

Re: [ceph-users] How to force "rbd unmap"

2017-07-05 Thread David Turner
I had this problem occasionally in a cluster where we were regularly mapping RBDs with KRBD. Something else we saw was that after this happened for unmapping RBDs, it would start preventing mapping of some RBDs as well. We were able to use strace and kill the sub-thread that was stuck to a

Re: [ceph-users] How to force "rbd unmap"

2017-07-05 Thread Ilya Dryomov
On Wed, Jul 5, 2017 at 7:55 PM, Stanislav Kopp wrote: > Hello, > > I have a problem: sometimes I can't unmap an rbd device, I get "sysfs > write failed rbd: unmap failed: (16) Device or resource busy", there > are no open files and the "holders" directory is empty. I saw on the > mailing list that you

[ceph-users] How to force "rbd unmap"

2017-07-05 Thread Stanislav Kopp
Hello, I have a problem: sometimes I can't unmap an rbd device, I get "sysfs write failed rbd: unmap failed: (16) Device or resource busy", there are no open files and the "holders" directory is empty. I saw on the mailing list that you can "force" unmapping the device, but I can't find how it works
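
For anyone landing here from a search: krbd does support a force option on unmap (on reasonably recent kernels); a minimal sketch, with /dev/rbd0 as a placeholder device:

  # check whether anything still has the device open
  lsof /dev/rbd0
  # force the unmap if nothing legitimate is holding it
  rbd unmap -o force /dev/rbd0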

Re: [ceph-users] Mon stuck in synchronizing after upgrading from Hammer to Jewel

2017-07-05 Thread David Turner
Did you make sure that your upgraded mon was chown'd to ceph:ceph? On Wed, Jul 5, 2017, 1:54 AM jiajia zhong wrote: > refer to http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ > > I recall we encountered the same issue after upgrading to Jewel :(. > > 2017-07-05 11:21 GMT+08:00
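
A minimal sketch of the ownership fix being asked about (default paths and systemd unit names assumed; adjust the mon id to your host):

  # stop the mon, hand its data directory to the ceph user introduced with Infernalis/Jewel, restart
  systemctl stop ceph-mon@$(hostname -s)
  chown -R ceph:ceph /var/lib/ceph/mon /var/log/ceph
  systemctl start ceph-mon@$(hostname -s)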

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread Wido den Hollander
> On 5 July 2017 at 12:39, c...@jack.fr.eu.org wrote: > > > Beware, a single 10G NIC is easily saturated by a single NVMe device > Yes, it is. But that was what I'm pointing at. Bandwidth is usually not a problem, latency is. Take a look at a Ceph cluster running out there, it is proba

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread ceph
Beware, a single 10G NIC is easily saturated by a single NVMe device On 05/07/2017 11:54, Wido den Hollander wrote: > >> On 5 July 2017 at 11:41, "Van Leeuwen, Robert" wrote: >> >> >> Hi Max, >> >> You might also want to look at the PCIe lanes. >> I am not an expert on the matter but my gue
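
Rough numbers behind that warning (my own back-of-the-envelope figures, not from the thread): a 10GbE link can carry at most

  $10\ \mathrm{Gbit/s} \div 8 \approx 1.25\ \mathrm{GB/s}$

of payload, while a single datacenter NVMe drive commonly sustains 2-3 GB/s of sequential reads, so one drive really can outrun the NIC.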

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread Blair Bethwaite
On 5 July 2017 at 19:54, Wido den Hollander wrote: > I'd probably stick with 2x10Gbit for now and use the money I saved on more > memory and faster CPUs. > On the latency point - you will get an improvement going from 10Gb to 25Gb, but stepping up to 100Gb won't significantly change things as 1
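
To put rough numbers on that point (my own arithmetic, assuming a 4 KiB write and ignoring protocol overhead), the serialization delay alone is

  $t = \frac{4096 \times 8\ \mathrm{bit}}{10\ \mathrm{Gbit/s}} \approx 3.3\ \mu\mathrm{s}$ at 10Gb, $\approx 1.3\ \mu\mathrm{s}$ at 25Gb and $\approx 0.3\ \mu\mathrm{s}$ at 100Gb,

so the step beyond 25Gb saves about a microsecond, which is lost in the hundreds of microseconds a Ceph write currently spends in software.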

Re: [ceph-users] ceph-mon leader election problem, should it be improved ?

2017-07-05 Thread Joao Eduardo Luis
On 07/05/2017 08:01 AM, Z Will wrote: Hi Joao: I think this is all because we choose the monitor with the smallest rank number to be the leader. For this kind of network error, whichever mon has lost its connection to the mon with the smallest rank number will be constantly calling an elec
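
For readers following the thread: monitor ranks and the current leader can be inspected with the standard commands; a quick sketch:

  # list monitors together with their rank numbers
  ceph mon dump
  # quorum_leader_name in the output shows which mon is currently leading
  ceph quorum_status --format json-pretty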

Re: [ceph-users] bluestore behavior on disks sector read errors

2017-07-05 Thread Wido den Hollander
> On 27 June 2017 at 11:17, SCHAER Frederic wrote: > > > Hi, > > Every now and then, sectors die on disks. > When this happens on my bluestore (kraken) OSDs, I get 1 PG that becomes > degraded. > The exact status is: > > > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors > > pg 12.127 is a
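
A minimal sketch of the usual triage for a scrub error like the one quoted (pg 12.127 taken from the report above; list-inconsistent-obj needs Jewel or newer):

  # show which object and shard failed verification
  rados list-inconsistent-obj 12.127 --format=json-pretty
  # ask the primary OSD to repair the PG from the good copies
  ceph pg repair 12.127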

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread Wido den Hollander
> On 5 July 2017 at 11:41, "Van Leeuwen, Robert" wrote: > > > Hi Max, > > You might also want to look at the PCIe lanes. > I am not an expert on the matter but my guess would be that the 8 NVMe drives + > 2x100Gbit would be too much for > the current Xeon generation (40 PCIe lanes) to fully utili

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread ceph
Interesting point: a 100Gbps NIC is x16 and an NVMe drive is x4, so that's 64 PCIe lanes required. It should work at full rate on a dual-socket server. On 05/07/2017 11:41, Van Leeuwen, Robert wrote: > Hi Max, > > You might also want to look at the PCIe lanes. > I am not an expert on the matter but my guess would be t
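
Spelling that lane count out (my arithmetic, assuming x16 per 100Gbit NIC and x4 per NVMe drive as stated above):

  $2 \times 16 + 8 \times 4 = 32 + 32 = 64$ lanes needed, versus $2 \times 40 = 80$ lanes on a dual-socket E5 board,

which is why a single 40-lane socket cannot feed that configuration at full rate but a dual-socket server can, at least on paper.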

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread Van Leeuwen, Robert
Hi Max, You might also want to look at the PCIe lanes. I am not an expert on the matter but my guess would be that the 8 NVMe drives + 2x100Gbit would be too much for the current Xeon generation (40 PCIe lanes) to fully utilize. I think the upcoming AMD/Intel offerings will improve that quite a bit s

Re: [ceph-users] Bucket resharding: "radosgw-admin bi list" ERROR

2017-07-05 Thread Andreas Calminder
Sure thing! I noted the new and old bucket instance IDs. Back up the bucket metadata: # radosgw-admin --cluster ceph-prod metadata get bucket:1001/large_bucket > large_bucket.metadata.bak.json # cp large_bucket.metadata.bak.json large_bucket.metadata.patched.json Set bucket_id in large_bucket.metad
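
The preview is cut off, but the usual shape of such a manual relink looks roughly like this (a hedged sketch continuing the commands quoted above, not necessarily the author's exact steps; the JSON edit can be done with any editor):

  # back up the bucket entrypoint metadata, then work on a copy
  radosgw-admin --cluster ceph-prod metadata get bucket:1001/large_bucket > large_bucket.metadata.bak.json
  cp large_bucket.metadata.bak.json large_bucket.metadata.patched.json
  # edit bucket_id in the patched copy to the new bucket instance id, then write it back
  radosgw-admin --cluster ceph-prod metadata put bucket:1001/large_bucket < large_bucket.metadata.patched.json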

Re: [ceph-users] dropping filestore+btrfs testing for luminous

2017-07-05 Thread Lars Marowsky-Bree
On 2017-06-30T16:48:04, Sage Weil wrote: > > Simply disabling the tests while keeping the code in the distribution is > > setting up users who happen to be using Btrfs for failure. > > I don't think we can wait *another* cycle (year) to stop testing this. > > We can, however, > > - prominentl

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread Maxime Guyot
Hi Massimiliano, I am a little surprised to see 6x NVMe, 64GB of RAM, 2x100 NICs and an E5-2603 v4; that's one of the cheapest Intel E5 CPUs mixed with some pretty high-end gear, and it does not make sense. Wido's right, go with a much higher frequency: E5-2637 v4, E5-2643 v4, E5-1660 v4, E5-1650 v4. If you

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread ceph
You will need CPUs as well if you want to push/fetch 200Gbps; the 2603 is really too weak (not really an issue, but NVMe for the OS seems useless to me). On 05/07/2017 11:02, Wido den Hollander wrote: > >> On 5 July 2017 at 10:54, Massimiliano Cuttini wrote: >> >> >> Dear all, >> >> Luminous is coming

[ceph-users] Massive slow requests cause OSD daemon to eat all RAM

2017-07-05 Thread pwoszuk
Hello, We have a cluster of 10 Ceph servers. On that cluster there is an EC pool with a replicated SSD cache tier, used by OpenStack Cinder for volume storage in a production environment. For the past 2 days we have been observing messages like this in the logs: 2017-07-05 10:50:13.451987 osd.114 [WRN] slow request 1165
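
A minimal sketch of the usual first look at a misbehaving OSD (osd.114 taken from the log line above; the ceph daemon commands must be run on the node hosting that OSD):

  # which requests are currently blocked, and for how long
  ceph health detail
  # what osd.114 is working on right now, and its slowest recent operations
  ceph daemon osd.114 dump_ops_in_flight
  ceph daemon osd.114 dump_historic_ops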

Re: [ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread Wido den Hollander
> On 5 July 2017 at 10:54, Massimiliano Cuttini wrote: > > > Dear all, > > Luminous is coming and soon we should be able to avoid double writes. > This means using 100% of the speed of SSD and NVMe. > Clusters made entirely of SSD and NVMe will not be penalized and will start to make > sense. > >

Re: [ceph-users] Bucket resharding: "radosgw-admin bi list" ERROR

2017-07-05 Thread Maarten De Quick
Hi Orit, We're running on Jewel, version 10.2.7. I've run bi list with the debugging commands and this is the end of it: 2017-07-05 08:50:19.705673 7ff3bfefe700 1 -- 10.21.4.1:0/3313807338 <== osd.3 10.21.4.
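
For reference, the kind of invocation that produces output like that is roughly (a sketch; the bucket name is a placeholder and the debug levels are the ones usually requested on this list):

  radosgw-admin bi list --bucket=large_bucket --debug-rgw=20 --debug-ms=1 2>&1 | tee bi-list.log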

[ceph-users] New cluster - configuration tips and recommendation - NVMe

2017-07-05 Thread Massimiliano Cuttini
Dear all, Luminous is coming and soon we should be able to avoid double writes. This means using 100% of the speed of SSD and NVMe. Clusters made entirely of SSD and NVMe will not be penalized and will start to make sense. Looking forward, I'm building the next pool of storage, which we'll set up on ne

Re: [ceph-users] Bucket resharding: "radosgw-admin bi list" ERROR

2017-07-05 Thread Maarten De Quick
Hi Andreas, Interesting as we are also on Jewel 10.2.7. We do care about the data in the bucket so we really need the reshard process to run properly :). Could you maybe share how you linked the bucket to the new index by hand? That would already give me some extra insight. Thanks! Regards, Maart

Re: [ceph-users] Bucket resharding: "radosgw-admin bi list" ERROR

2017-07-05 Thread Andreas Calminder
Hi, I had a similar problem while resharding an oversized non-sharded bucket in Jewel (10.2.7); bi list exited with "ERROR: bi_list(): (4) Interrupted system call" at what seemed like the very end of the operation. I went ahead and resharded the bucket anyway and the reshard process ended the sa
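
For completeness, the offline reshard referred to here is driven by the bucket reshard subcommand; a minimal sketch (bucket name and shard count are placeholders, pick the shard count to suit your object count):

  radosgw-admin bucket reshard --bucket=large_bucket --num-shards=64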

Re: [ceph-users] ceph-mon leader election problem, should it be improved ?

2017-07-05 Thread Z Will
Hi Joao: I think this is all because we choose the monitor with the smallest rank number to be the leader. For this kind of network error, whichever mon has lost its connection to the mon with the smallest rank number will be constantly calling an election, which will constantly affect t