Check if there are any hung requests in 'ceph daemon mds.xxx objecter_requests'
On Tue, Jul 16, 2019 at 11:51 PM Dietmar Rieder
wrote:
>
> On 7/16/19 4:11 PM, Dietmar Rieder wrote:
> > Hi,
> >
> > We are running ceph version 14.1.2 with cephfs only.
> >
> > I just noticed that one of our pgs had s
Hi,
Thanks for the hint!! This did it.
I indeed found stuck requests using "ceph daemon mds.xxx
objecter_requests".
I then restarted the OSDs involved in those requests one by one, and now
the problems are gone and the status is back to HEALTH_OK.
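For the archives, this is roughly what the procedure boils down to (the MDS name and OSD id are placeholders, not taken from this thread):

  # list in-flight objecter requests on the active MDS
  ceph daemon mds.<name> objecter_requests
  # the output names the OSDs the stuck requests are waiting on;
  # restart those one at a time and wait for things to settle in between
  systemctl restart ceph-osd@<id>
  ceph -s   # confirm the cluster is healthy before the next OSD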
Thanks again
Dietmar
On 7/17/19 9:08 AM, Yan,
Hey,
I have a PG that shows the following output after a deep-scrub:
3.0 deep-scrub : stat mismatch, got 23/24 objects, 0/0 clones, 23/24 dirty,
23/24 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 0/0 bytes, 0/0
manifest objects, 0/0 hit_set_archive bytes.
This is on a metadat
Hi, ok ok, test it first, I can't remember if it is finished. Also check
if it is useful to create a snapshot, by checking the size of the
directory.
[@ cron.daily]# cat backup-archive-mail.sh
#!/bin/bash
# back up each mail archive account in turn
cd /home/
for account in `ls -c1 /home/mail-archive/ | sort`
do
/usr/local/sbin/ba
Hello,
Just a quick update about this in case somebody else gets the same issue:
The problem was with the firewall. Port ranges and established connections
are allowed, but for some reason it seems the tracking of connections
is lost, leading to a strange state where one machine refuses data (RST
ar
Thanks for taking a look at this, Daniel. Below is the only interesting bit
from the Ceph MDS log at the time of the crash but I suspect the slow
requests are a result of the Ganesha crash rather than the cause of it.
Copying the Ceph list in case anyone has any ideas.
2019-07-15 15:06:54.624007 7
Hi,
Is there any mechanism inside the rgw that can detect faulty endpoints
for a configuration with multiple endpoints?
Is there any advantage related to the number of replication
endpoints? Can I expect improved replication performance (the more
synchronization RGWs = the faster replication)?
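For reference, multiple replication endpoints are typically listed on the zone itself; a minimal sketch, with the zone name and URLs as placeholders:

  radosgw-admin zone modify --rgw-zone=us-east \
      --endpoints=http://rgw1:8080,http://rgw2:8080,http://rgw3:8080
  radosgw-admin period update --commit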
Dell has a whitepaper that compares Ceph performance using JBOD and RAID-0
per disk, and it recommends RAID-0 for HDDs:
en.community.dell.com/techcenter/cloud/m/dell_cloud_resources/20442913/download
After switching from JBOD to RAID-0 we saw a huge reduction in latency; the
difference was much more
Hi,
Experienced similar issues. Our cluster internal network (completely
separated) now has NOTRACK (no connection state tracking) iptables rules.
In full:
> # iptables-save
> # Generated by xtables-save v1.8.2 on Wed Jul 17 14:57:38 2019
> *filter
> :FORWARD DROP [0:0]
> :OUTPUT ACCEPT [0:0]
>
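For reference, NOTRACK rules of that kind live in the raw table; a minimal sketch, assuming a dedicated cluster network of 192.168.100.0/24 (placeholder subnet):

  # skip connection tracking for cluster-network traffic
  iptables -t raw -A PREROUTING -s 192.168.100.0/24 -j NOTRACK
  iptables -t raw -A OUTPUT -d 192.168.100.0/24 -j NOTRACK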
Some of the first performance studies we did back at Inktank were
looking at RAID-0 vs JBOD setups! :) You are absolutely right that the
controller cache (especially write-back with a battery or supercap) can
help with HDD-only configurations. Where we typically saw problems was
when you load
On 2019-07-17T08:27:46, John Petrini wrote:
The main problem we've observed is that not all HBAs can just
efficiently and easily pass through disks 1:1. Some of those from a
more traditional server background insist on having some form of
mapping via RAID.
In that case it depends on whether 1 di
Hello,
When I execute the following command on one of my three ceph-mons, all
ceph-mons crash.
ceph mgr module ls -f plain
ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus
(stable)
1: (()+0x12890) [0x7fcc5e5e3890]
2: (gsignal()+0xc7) [0x7fcc5d6dbe97]
3: (abort()+0x141)
In most cases a write-back cache does help a lot with HDD write latency,
either via RAID-0 or via cards (some Areca ones) that support write-back in JBOD mode. Our
observation is that it can help by a 3-5x factor in BlueStore, whereas
db/wal on flash will be about 2x. It does depend on hardware, but in
general we see bene
On 7/17/19 8:04 AM, P. O. wrote:
Hi,
Is there any mechanism inside the rgw that can detect faulty endpoints
for a configuration with multiple endpoints?
No, replication requests that fail just get retried using round robin
until they succeed. If an endpoint isn't available, we assume it will
Thanks, opened bug https://tracker.ceph.com/issues/40804. Fix should be
trivial.
sage
On Wed, 17 Jul 2019, Oskar Malnowicz wrote:
> Hello,
> when i execute the following command on one of my three ceph-mon, all
> ceph-mon crashes.
>
> ceph mgr module ls -f plain
>
> ceph version 14.2.1 (d55
thx!
Am 17.07.19 um 16:28 schrieb Sage Weil:
> Thanks, opened bug https://tracker.ceph.com/issues/40804. Fix should be
> trivial.
>
> sage
>
> On Wed, 17 Jul 2019, Oskar Malnowicz wrote:
>
>> Hello,
>> when i execute the following command on one of my three ceph-mon, all
>> ceph-mon crashes.
>>
Sometime after our upgrade to Nautilus our disk usage statistics went off the
rails. I can't tell you exactly when it broke, but I know that after the
initial upgrade it worked at least for a bit.
Correct numbers should be something similar to: (These are copy/pasted from the
autoscale-
Hi Paul,
there was a post from Sage named "Pool stats issue with upgrades to
nautilus" recently.
Perhaps that's the case if you add a new OSD or repair an existing one...
Thanks,
Igor
On 7/17/2019 6:29 PM, Paul Mezzanini wrote:
Sometime after our upgrade to Nautilus our disk usage statistics
Forgot to provide a workaround...
If that's the case, then you need to repair each OSD with the corresponding
command in ceph-objectstore-tool...
Thanks,
Igor.
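A hedged sketch of stepping through the OSDs on one node; the exact ceph-objectstore-tool invocation is not spelled out in this thread, so it is left as a placeholder comment:

  ceph osd set noout
  for id in $(ls /var/lib/ceph/osd/ | sed 's/ceph-//'); do
      systemctl stop ceph-osd@$id
      # run the repair referenced above against /var/lib/ceph/osd/ceph-$id
      systemctl start ceph-osd@$id
  done
  ceph osd unset noout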
On 7/17/2019 6:29 PM, Paul Mezzanini wrote:
Sometime after our upgrade to Nautilus our disk usage statistics went off the
rails wrong.
Hello,
Is it possible to set key/values for mgr modules from a file (e.g.
ceph.conf) instead of e.g. ceph config set mgr mgr/influx/ ?
Thx, Oskar
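Not an answer from the thread, but one possibly relevant mechanism is feeding a conf-style file into the monitors' central config with assimilate-conf; a sketch, under the assumption that the influx module options can be expressed this way:

  cat > mgr-influx.conf <<EOF
  [mgr]
  mgr/influx/hostname = influxdb.example.com
  mgr/influx/interval = 30
  EOF
  ceph config assimilate-conf -i mgr-influx.conf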
Hi,
I have also been looking for solutions to improve sync. I have two clusters, 25
ms RTT, with RGW multi-site configured and all nodes running 12.2.12. I
have three rgw nodes at each site, behind haproxy. There
is a 1G circuit between the sites and bandwidth usage
Oh my. That's going to hurt with 788 OSDs. Time for some creative shell
scripts and stepping through the nodes. I'll report back.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technol
Thanks everyone. Appreciate the inputs.
Any feedback on the support quality of these vendors? Croit, Mirantis, Red Hat,
Ubuntu? Anyone already using them (other than Robert)?
Thanks,
Shridhar
On Mon, 15 Jul 2019 at 13:30, Robert LeBlanc wrote:
> We recently used Croit (https://croit.io/) and they
Fix is on its way too...
See https://github.com/ceph/ceph/pull/28978
On 7/17/2019 8:55 PM, Paul Mezzanini wrote:
Oh my. That's going to hurt with 788 OSDs. Time for some creative shell
scripts and stepping through the nodes. I'll report back.
--
Paul Mezzanini
Sr Systems Administrator / E
So, I see the recommendation of 4% of OSD space for block.db/WAL and the
corresponding discussion regarding the 3/30/300GB vs 6/60/600GB allocation.
How does this change when the WAL is separate from block.db?
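As a rough worked example of where the 3/30/300GB figures come from (my arithmetic, not from the referenced thread): with RocksDB's default level sizing of roughly 256MB for L1 and a 10x multiplier per level,

  L1 ~ 0.25 GB
  L2 ~ 2.5 GB
  L3 ~ 25 GB
  L4 ~ 250 GB

so a block.db partition only gets fully used around the ~3, ~30 or ~300GB marks, and space between two marks mostly sits idle.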
Reading [0] it seems that 6/60/600 is not correct. It seems that to compact
a 300GB DB,
All;
I'm trying to firm up my understanding of how Ceph works, and of its
ease-of-management tools and capabilities.
I stumbled upon this:
http://docs.ceph.com/docs/nautilus/rados/configuration/mon-lookup-dns/
It got me wondering: how do you convey protocol version 2 capabilities in this
format?
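For reference, the format on that page is plain DNS SRV records; a minimal sketch (names and TTLs are placeholders, port 6789 is the v1 port, and this doesn't by itself answer the v2 question):

  ; ceph.conf: mon_dns_srv_name = ceph-mon
  _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon1.example.com.
  _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon2.example.com.
  _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon3.example.com.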
Th
This is almost certainly the same bug that is fixed here:
https://github.com/ceph/ceph/pull/28324
It should get backported soon-ish but I'm not sure which luminous
release it'll show up in.
Cheers,
Jeff
On Wed, 2019-07-17 at 10:36 +0100, David C wrote:
> Thanks for taking a look at this, Daniel
Ahh, I just noticed you were running nautilus on the client side. This
patch went into v14.2.2, so once you update to that you should be good
to go.
-- Jeff
On Wed, 2019-07-17 at 17:10 -0400, Jeff Layton wrote:
> This is almost certainly the same bug that is fixed here:
>
> https://github.com/ce
Hello,
We have deployed a Ceph cluster and are trying to debug a massive drop in
performance between the RADOS layer and the RGW layer.
## Cluster config
4 OSD nodes (12 Drives each, NVME Journals, 1 SSD drive) 40GbE NIC
2 RGW nodes ( DNS RR load balancing) 40GbE NIC
3 MON nodes 1 GbE NIC
## Pool
I'm pretty new to RGW, but I need to get max performance as well. Have
you tried moving your RGW metadata pools to NVMe? Carve out a bit of NVMe
space and then pin the pool to the SSD class in CRUSH, so that the small
metadata ops aren't on slow media.
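A minimal sketch of what that pinning can look like; the rule name is made up and the pool names are the RGW defaults:

  # replicated CRUSH rule restricted to the ssd device class
  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  # point the RGW index/metadata pools at it
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.meta crush_rule rgw-meta-ssd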
Robert LeBlanc
PGP Fing
We’ve been debugging this for a while. The data pool was originally EC-backed with
the bucket indexes on HDD pools. Moving the metadata to SSD-backed pools
improved usability and consistency, and the change from EC to replicated
improved the RADOS layer IOPS by 4x, but didn't seem to affect RGW IOPS