Re: [ceph-users] I have PGs that I can't deep-scrub

2014-07-10 Thread Chris Dunlop
Hi Craig, On Thu, Jul 10, 2014 at 03:09:51PM -0700, Craig Lewis wrote: > I fixed this issue by reformatting all of the OSDs. I changed the mkfs > options from > > [osd] > osd mkfs type = xfs > osd mkfs options xfs = -l size=1024m -n size=64k -i size=2048 -s size=4096 > > to > [osd] > osd
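
Unflattened, the quoted pre-change [osd] stanza reads as below. The replacement values are truncated in this archive snippet, so only the original settings are reproduced here:

    [osd]
    osd mkfs type = xfs
    osd mkfs options xfs = -l size=1024m -n size=64k -i size=2048 -s size=4096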

Re: [ceph-users] Rbd cp empty block

2013-09-16 Thread Chris Dunlop
On Mon, Sep 16, 2013 at 09:20:29AM +0800, 王根意 wrote: > Hi all: > > I have a 30G rbd block device as virtual machine disk, already installed > ubuntu 12.04. About 1G space used. > > When I want to deploy vm, I made a "rbd cp". Then the problem came: it copied 30G > data instead of 1G. And this action tak
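
A hedged aside: one common way to see how much of an rbd image is actually allocated, as opposed to its provisioned size, is to sum the extents reported by rbd diff. The pool and image names below are examples only:

    # sum the allocated extents of an image
    rbd diff rbd/vm-disk-1 | awk '{ sum += $2 } END { print sum/1024/1024 " MB allocated" }'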

Re: [ceph-users] HDD bad sector, pg inconsistent, no object remapping

2013-11-18 Thread Chris Dunlop
Hi David, On Fri, Nov 15, 2013 at 10:00:37AM -0800, David Zafman wrote: > > Replication does not occur until the OSD is “out.” This creates a new > mapping in the cluster of where the PGs should be and thus data begins to > move and/or create sufficient copies. This scheme lets you control ho
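
For reference, a minimal sketch of the commands involved in that mapping change (the osd id is hypothetical); an osd can be marked out by hand rather than waiting for the monitor's down-out timer:

    ceph osd out 12      # start remapping/backfilling osd.12's PGs immediately
    ceph osd in 12       # undo, if the osd returns before much data has moved
    # the automatic timer is the 'mon osd down out interval' setting (seconds)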

Re: [ceph-users] HDD bad sector, pg inconsistent, no object remapping

2013-11-18 Thread Chris Dunlop
al with that if the object is on the primary is to remove the > file manually from the OSD’s filesystem and perform a repair of the PG that > holds that object. This will copy the object back from one of the replicas. > > David > > On Nov 17, 2013, at 10:46 PM, Chris Dunlop wrote
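
A sketch of that manual procedure, with hypothetical pg id, osd id and path; the object filename comes from the scrub error in the osd's log, and some prefer to stop the osd while touching its filestore:

    rm /var/lib/ceph/osd/ceph-3/current/2.37_head/<bad object file>
    ceph pg repair 2.37        # copies the object back from a replica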

Re: [ceph-users] HDD bad sector, pg inconsistent, no object remapping

2013-11-18 Thread Chris Dunlop
http://www.inktank.com > > > > > On Nov 18, 2013, at 1:11 PM, Chris Dunlop wrote: > >> OK, that's good (as far is it goes, being a manual process). >> >> So then, back to what I think was Mihály's original issue: >> >>> pg repair o

[ceph-users] Replace OSD with larger

2014-03-04 Thread Chris Dunlop
Hi, What is the recommended procedure for replacing an osd with a larger osd in a safe and efficient manner, i.e. whilst maintaining redundancy and causing the least data movement? Would this be a matter of adding the new osd into the crush map whilst reducing the weight of the old osd to zero, t
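
A rough sketch of the crush-reweight approach being asked about (ids, weights and the host bucket are hypothetical):

    ceph osd crush add osd.7 3.64 host=storage1   # bring the new, larger osd in at full weight
    ceph osd crush reweight osd.3 0               # drain the old osd; step the weight down gradually to throttle movement
    # once osd.3 is empty and the cluster is healthy:
    ceph osd out 3
    ceph osd crush remove osd.3
    ceph auth del osd.3
    ceph osd rm 3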

[ceph-users] Increasing pg_num

2016-05-15 Thread Chris Dunlop
Hi, I'm trying to understand the potential impact on an active cluster of increasing pg_num/pgp_num. The conventional wisdom, as gleaned from the mailing lists and general google fu, seems to be to increase pg_num followed by pgp_num, both in small increments, to the target size, using "osd max b
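
For concreteness, the incremental approach described above looks roughly like this; pool name, step size and target are hypothetical:

    ceph tell osd.* injectargs '--osd-max-backfills 1'   # throttle backfill traffic
    ceph osd pool set rbd pg_num 1088                    # small step up from, say, 1024
    ceph osd pool set rbd pgp_num 1088                   # then let data actually move
    # repeat in small steps until the target pg_num/pgp_num is reached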

Re: [ceph-users] v0.94.7 Hammer released

2016-05-15 Thread Chris Dunlop
On Fri, May 13, 2016 at 10:21:51AM -0400, Sage Weil wrote: > This Hammer point release fixes several minor bugs. It also includes a > backport of an improved ‘ceph osd reweight-by-utilization’ command for > handling OSDs with higher-than-average utilizations. > > We recommend that all hammer v0.
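
The command referred to, in sketch form (the threshold argument is a percentage of average utilization; the dry-run variant is used here on the assumption it is present in this build):

    ceph osd test-reweight-by-utilization 120   # report what would be reweighted, without changing anything
    ceph osd reweight-by-utilization 120        # reweight osds more than 20% above average utilization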

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Chris Dunlop
On Tue, May 17, 2016 at 08:21:48AM +0900, Christian Balzer wrote: > On Mon, 16 May 2016 22:40:47 +0200 (CEST) Wido den Hollander wrote: > > > > pg_num is the actual amount of PGs. This you can increase without any > > actual data moving. > > Yes and no. > > Increasing the pg_num will split PGs, w

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Chris Dunlop
On Mon, May 16, 2016 at 10:40:47PM +0200, Wido den Hollander wrote: > > Op 16 mei 2016 om 7:56 schreef Chris Dunlop : > > Why do we have both pg_num and pgp_num? Given the docs say "The pgp_num > > should be equal to the pg_num": under what circumstances might you wan
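
For anyone following along, the two values can be inspected independently (pool name is an example):

    ceph osd pool get rbd pg_num
    ceph osd pool get rbd pgp_num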

Re: [ceph-users] Increasing pg_num

2016-05-16 Thread Chris Dunlop
Hi Christian, On Tue, May 17, 2016 at 10:41:52AM +0900, Christian Balzer wrote: > On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote: > Most of your questions would be easily answered if you did spend a few > minutes with even the crappiest test cluster and observing things (with >

Re: [ceph-users] 0.56.3 OSDs wrongly marked down and cluster unresponsiveness

2013-02-28 Thread Chris Dunlop
On Thu, Feb 28, 2013 at 01:44:28PM -0800, Nick Bartos wrote: > When a single high I/O event (in this case a cp of a 10G file on a > filesystem mounted on an rbd) occurs, I'm having the 2 OSDs that > reside on the same system where the rbd is mounted being marked down > when it appears that they sho
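
If the osds are only being marked down because heartbeat replies are delayed under heavy I/O, the relevant knob is the heartbeat grace period. A hedged ceph.conf sketch; the value shown is an example, the default being 20 seconds:

    [osd]
    osd heartbeat grace = 30    ; allow slower heartbeat replies before a peer is reported down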

Re: [ceph-users] journal on ramdisk for testing

2013-04-25 Thread Chris Dunlop
G'day James, On Thu, Apr 25, 2013 at 07:39:27AM +, James Harper wrote: > I'm doing some testing and wanted to see the effect of increasing journal > speed, and the fastest way to do this seemed to be to put it on a ramdisk > where latency should drop to near zero and I can see what other >
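
A minimal sketch of pointing a FileStore journal at a ramdisk for this kind of test (path and size are hypothetical, and the journal is lost on power loss, so this is for testing only):

    [osd.0]
    osd journal = /mnt/ramdisk/osd.0.journal
    osd journal size = 1024    ; MB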

[ceph-users] Journal flushed on osd clean shutdown?

2018-06-13 Thread Chris Dunlop
Hi, Is the osd journal flushed completely on a clean shutdown? In this case, with Jewel, and FileStore osds, and a "clean shutdown" being: systemctl stop ceph-osd@${osd} I understand it's documented practice to issue a --flush-journal after shutting down an osd if you're intending to d
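
For reference, the documented manual flush step being referred to (osd id hypothetical):

    systemctl stop ceph-osd@3
    ceph-osd -i 3 --flush-journal    # flush and mark the FileStore journal clean before moving or zapping it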

Re: [ceph-users] Journal flushed on osd clean shutdown?

2018-06-13 Thread Chris Dunlop
Excellent news - tks! On Wed, Jun 13, 2018 at 11:50:15AM +0200, Wido den Hollander wrote: On 06/13/2018 11:39 AM, Chris Dunlop wrote: Hi, Is the osd journal flushed completely on a clean shutdown? In this case, with Jewel, and FileStore osds, and a "clean shutdown" being: It i

[ceph-users] All pgs stuck peering

2015-12-13 Thread Chris Dunlop
Hi, ceph 0.94.5 After restarting one of our three osd hosts to increase the RAM and change from linux 3.18.21 to 4.1., the cluster is stuck with all pgs peering: # ceph -s cluster c6618970-0ce0-4cb2-bc9a-dd5f29b62e24 health HEALTH_WARN 3072 pgs peering 3072 pgs s

Re: [ceph-users] All pgs stuck peering

2015-12-13 Thread Chris Dunlop
Hi Varada, On Mon, Dec 14, 2015 at 03:23:20AM +, Varada Kari wrote: > Can you get the details of > > 1. ceph health detail > 2. ceph pg query > > of any one PG stuck peering > > > Varada The full health detail is over 9000 lines, but here's a summary: # ceph health detail | head HEALTH_WA
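
The exact forms of the requested commands, for anyone reproducing this (the pg id is an example):

    ceph health detail | head
    ceph pg 2.f query    # query any one of the PGs stuck peering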

Re: [ceph-users] All pgs stuck peering

2015-12-13 Thread Chris Dunlop
On Sun, Dec 13, 2015 at 09:10:34PM -0700, Robert LeBlanc wrote: > I've had something similar to this when there was an MTU mismatch, the > smaller I/O would get through, but the larger I/O would be blocked and > prevent peering. > > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4 4
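
A quick way to test for exactly this kind of MTU mismatch between hosts (host name is an example; 8972 = 9000-byte MTU minus 28 bytes of IP and ICMP headers):

    ping -M do -s 8972 -c 3 osd-host-2   # don't-fragment: fails if jumbo frames don't survive end to end
    ping -M do -s 1472 -c 3 osd-host-2   # the equivalent test for a standard 1500 MTU path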

Re: [ceph-users] All pgs stuck peering

2015-12-14 Thread Chris Dunlop
On Mon, Dec 14, 2015 at 09:29:20PM +0800, Jaze Lee wrote: > Should we add a big packet test to the heartbeat? Right now the heartbeat > only tests small packets. If the MTU is mismatched, the heartbeat > cannot detect that. It would certainly have saved me a great deal of stress! I imagine you wouldn

Re: [ceph-users] Cephfs: large files hang

2015-12-17 Thread Chris Dunlop
Hi Bryan, Have you checked your MTUs? I was recently bitten by large packets not getting through where small packets would. (This list, Dec 14, "All pgs stuck peering".) Small files working but big files not working smells like it could be a similar problem. Cheers, Chris On Thu, Dec 17, 2015

Re: [ceph-users] pg stuck in peering state

2015-12-18 Thread Chris Dunlop
Hi Reno, "Peering", as far as I understand it, is the osds trying to talk to each other. You have approximately one OSD's worth of pgs stuck (i.e. 264 / 8), and osd.0 appears in each of the stuck pgs, alongside either osd.2 or osd.3. I'd start by checking the comms between osd.0 and osds 2 and 3 (in
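
To check those paths, the osds' addresses can be pulled straight from the cluster map and probed; a sketch, with the large-packet ping mirroring the MTU issue from the earlier thread:

    ceph osd find 0    # reports ip, host and crush location for osd.0
    ceph osd find 2
    ping -M do -s 8972 -c 3 <ip of osd.2>   # large-packet probe from osd.0's host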

Re: [ceph-users] v0.94.6 Hammer released

2016-03-01 Thread Chris Dunlop
Hi, The "old list of supported platforms" includes debian wheezy. Will v0.94.6 be built for this? Chris On Mon, Feb 29, 2016 at 10:57:53AM -0500, Sage Weil wrote: > The intention was to continue building stable releases (0.94.x) on the old > list of supported platforms (which inclues 12.04 and

Re: [ceph-users] v0.94.6 Hammer released

2016-03-09 Thread Chris Dunlop
Hi Loic, On Wed, Mar 02, 2016 at 06:32:18PM +0700, Loic Dachary wrote: > I think you misread what Sage wrote: "The intention was to > continue building stable releases (0.94.x) on the old list of > supported platforms (which includes 12.04 and el6)". In other > words, the old OS'es are still suppo

Re: [ceph-users] v0.94.6 Hammer released

2016-03-19 Thread Chris Dunlop
Hi Chen, On Thu, Mar 17, 2016 at 12:40:28AM +, Chen, Xiaoxi wrote: > It’s already there, in > http://download.ceph.com/debian-hammer/pool/main/c/ceph/. I can only see ceph*_0.94.6-1~bpo80+1_amd64.deb there. Debian wheezy would be bpo70. Cheers, Chris > On 3/17/16, 7:20 AM,
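
A throwaway way to check whether wheezy (bpo70) builds have appeared in that pool, using nothing but curl and grep against the URL above:

    curl -s http://download.ceph.com/debian-hammer/pool/main/c/ceph/ \
      | grep -oE 'ceph[^"]*bpo70[^"]*\.deb' | sort -u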

Re: [ceph-users] v0.94.6 Hammer released

2016-03-19 Thread Chris Dunlop
Hi Stable Release Team for v0.94, On Thu, Mar 10, 2016 at 11:00:06AM +1100, Chris Dunlop wrote: > On Wed, Mar 02, 2016 at 06:32:18PM +0700, Loic Dachary wrote: >> I think you misread what Sage wrote: "The intention was to >> continue building stable releases (0.94.x

Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
Hi Stable Release Team for v0.94, Let's try again... Any news on a release of v0.94.6 for debian wheezy (bpo70)? Cheers, Chris On Thu, Mar 17, 2016 at 12:43:15PM +1100, Chris Dunlop wrote: > Hi Chen, > > On Thu, Mar 17, 2016 at 12:40:28AM +, Chen, Xiaoxi wrote: >> It

Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
Hi Loïc, On Wed, Mar 23, 2016 at 12:14:27AM +0100, Loic Dachary wrote: > On 22/03/2016 23:49, Chris Dunlop wrote: >> Hi Stable Release Team for v0.94, >> >> Let's try again... Any news on a release of v0.94.6 for debian wheezy >> (bpo70)? > > I don'

Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
Hi Loïc, On Wed, Mar 23, 2016 at 01:03:06AM +0100, Loic Dachary wrote: > On 23/03/2016 00:39, Chris Dunlop wrote: >> "The old OS'es" that were being supported up to v0.94.5 includes debian >> wheezy. It would be quite surprising and unexpected to drop support f

Re: [ceph-users] v0.94.6 Hammer released

2016-03-22 Thread Chris Dunlop
On Wed, Mar 23, 2016 at 01:22:45AM +0100, Loic Dachary wrote: > On 23/03/2016 01:12, Chris Dunlop wrote: >> On Wed, Mar 23, 2016 at 01:03:06AM +0100, Loic Dachary wrote: >>> On 23/03/2016 00:39, Chris Dunlop wrote: >>>> "The old OS'es" that were