Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-17 Thread Steve Dainard
which had been updated a couple times) and noticed the crush map had 'tunable straw_calc_version 1' so I added it to the current cluster. After the data moved around for about 8 hours or so I'm left with this
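
For reference, a hedged sketch of one way a CRUSH tunable such as straw_calc_version can be applied, by exporting, editing, and re-injecting the CRUSH map (file names are illustrative; injecting a changed map is expected to trigger data movement):

# ceph osd getcrushmap -o crushmap.bin        (export the binary CRUSH map)
# crushtool -d crushmap.bin -o crushmap.txt   (decompile to editable text)
  ...edit crushmap.txt and add: tunable straw_calc_version 1...
# crushtool -c crushmap.txt -o crushmap.new   (recompile)
# ceph osd setcrushmap -i crushmap.new        (inject the new map)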

Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-13 Thread Steve Dainard
ted a pg repair on both of the PGs listed above, but it doesn't look like anything is happening. The docs reference an inconsistent state as a use case for the repair command, so that's likely why. These 2 PGs have been the iss
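
For reference, a minimal hedged sketch of scrubbing and repairing a single PG (the PG id is taken from this thread; repair is only expected to act on problems that scrub has flagged, e.g. an inconsistent state):

# ceph pg deep-scrub 2.e7f   (re-run a deep scrub on the suspect PG)
# ceph pg repair 2.e7f       (ask the primary OSD to repair it once scrub reports inconsistencies)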

Re: [ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-13 Thread Steve Dainard
jS # ceph pg 2.e7f query: http://pastebin.com/0ntBfFK5 On Wed, Aug 12, 2015 at 6:52 PM, yangyongp...@bwstor.com.cn wrote: You can try "ceph pg repair pg_id" to repair the unhealthy pg. The "ceph health detail" command is very useful to detect unhealthy pgs.
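
A hedged example of the inspection commands referenced above (the PG id is taken from the post):

# ceph health detail | grep 2.e7f       (summarize the reported state of the PG)
# ceph pg 2.e7f query > pg-2.e7f.json   (dump the full peering/recovery state for later inspection)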

[ceph-users] Cluster health_warn 1 active+undersized+degraded/1 active+remapped

2015-08-12 Thread Steve Dainard
I ran a ceph osd reweight-by-utilization yesterday and partway through had a network interruption. After the network was restored the cluster continued to rebalance, but this morning the cluster has stopped rebalancing and the status will not change from: # ceph status cluster af859ff1-c394-4c9a-95e2
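
A hedged sketch of commands that are commonly used to see why recovery has stalled after a reweight (the threshold value is illustrative):

# ceph osd reweight-by-utilization 120   (reweight OSDs more than 20% above mean utilization)
# ceph pg dump_stuck unclean             (list PGs stuck in a non-clean state)
# ceph osd tree                          (show current weights and reweight values per OSD)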

Re: [ceph-users] ceph tell not persistent through reboots?

2015-08-06 Thread Steve Dainard
ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steve Dainard Sent: Thursday, August 06, 2015 9:16 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] ceph tell not persistent through reboots? Hello, Version 0.94.1 I'm passing

[ceph-users] Direct IO tests on RBD device vary significantly

2015-08-06 Thread Steve Dainard
Trying to get an understanding of why direct IO would be so slow on my cluster. Ceph 0.94.1, 1 Gig public network, 10 Gig public network, 10 Gig cluster network, 100 OSDs, 4TB disk sizes, 5GB SSD journal. As of this morning I had no SSD journal and was finding direct IO was sub 10MB/s, so I decided to add
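
As a hedged illustration, a direct IO test of the kind described here is often run with dd against the mounted RBD, bypassing the page cache (paths and sizes are illustrative):

# dd if=/dev/zero of=/mnt/rbd/testfile bs=1M count=1024 oflag=direct   (direct writes)
# dd if=/mnt/rbd/testfile of=/dev/null bs=1M iflag=direct              (direct reads back)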

[ceph-users] ceph tell not persistent through reboots?

2015-08-06 Thread Steve Dainard
Hello, Version 0.94.1. I'm passing settings to the admin socket, i.e.: ceph tell osd.* injectargs '--osd_deep_scrub_begin_hour 20' ceph tell osd.* injectargs '--osd_deep_scrub_end_hour 4' ceph tell osd.* injectargs '--osd_deep_scrub_interval 1209600' Then I check to see if they're in the configs now
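
Settings pushed with injectargs only live in the running daemons; to survive a reboot they normally also have to be written to ceph.conf on the OSD hosts. A minimal sketch, reusing the option names exactly as given in the post (whether each name is actually recognized can be confirmed with 'ceph daemon osd.N config show | grep scrub'):

[osd]
    osd_deep_scrub_begin_hour = 20
    osd_deep_scrub_end_hour = 4
    osd_deep_scrub_interval = 1209600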

Re: [ceph-users] Meanning of ceph perf dump

2015-07-24 Thread Steve Dainard
Hi Somnath, Do you have a link with the definitions of all the perf counters? Thanks, Steve On Sun, Jul 5, 2015 at 11:23 AM, Somnath Roy wrote: Hi Ray, Here is the description of the different latencies under filestore perf counters. Journal_latency:
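
For reference, a hedged example of pulling these counters straight from an OSD's admin socket (the OSD id is illustrative):

# ceph daemon osd.0 perf dump     (all perf counters for this OSD, as JSON)
# ceph daemon osd.0 perf schema   (counter descriptions/types, where supported by this version)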

Re: [ceph-users] Workaround for RHEL/CentOS 7.1 rbdmap service start warnings?

2015-07-17 Thread Steve Dainard
Other than those errors, do you find RBDs will not be unmapped on system restart/shutdown on a machine using systemd, leaving the system hanging without network connections while trying to unmap RBDs? That's been my experience thus far, so I wrote an (overly simple) systemd file to handle this on a pe
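
A minimal sketch of that kind of unit, under the assumption that ordering it after the network target makes systemd stop it (and unmap the RBDs) before the network is torn down; the unit name, paths, and device glob are illustrative:

[Unit]
Description=Unmap RBD devices before network shutdown
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/bin/sh -c 'for dev in /dev/rbd[0-9]*; do [ -b "$dev" ] && /usr/bin/rbd unmap "$dev"; done; true'

[Install]
WantedBy=multi-user.target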

Re: [ceph-users] Deadly slow Ceph cluster revisited

2015-07-17 Thread Steve Dainard
Disclaimer: I'm relatively new to ceph, and haven't moved into production with it. Did you run your bench for 30 seconds? For reference, my bench from a VM bridged to a 10 Gig card with 90x4TB at 30 seconds is: Total time run: 30.766596, Total writes made: 1979, Write size:
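
For comparison, a hedged example of the benchmark being quoted (the pool name is illustrative):

# rados bench -p rbd 30 write --no-cleanup   (30-second write benchmark, keep the objects)
# rados bench -p rbd 30 seq                  (sequential read benchmark against those objects)
# rados -p rbd cleanup                       (remove the benchmark objects afterwards)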

[ceph-users] Unsetting osd_crush_chooseleaf_type = 0

2015-07-16 Thread Steve Dainard
I originally built a single node cluster, and added 'osd_crush_chooseleaf_type = 0 #0 is for one node cluster' to ceph.conf (which is now commented out). I've now added a 2nd node; where can I set this value to 1? I see in the crush map that the OSDs are under 'host' buckets and don't see any ref
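
On an existing cluster, osd_crush_chooseleaf_type is generally only consulted when the initial CRUSH map and default rule are generated; afterwards the failure domain is defined in the CRUSH rule itself. A hedged sketch of editing it with crushtool (file names are illustrative):

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
  ...in the relevant rule, change "step chooseleaf firstn 0 type osd"
     to "step chooseleaf firstn 0 type host"...
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new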

Re: [ceph-users] Health WARN, ceph errors looping

2015-07-07 Thread Steve Dainard
ff 2015-07-07 10:51:57.611297) That's just a small section, but multiple OSDs are listed. Eventually the logs are rate limited because they're coming in so fast. On Tue, Jul 7, 2015 at 10:13 AM, Abhishek L wrote: Steve Dainard writes: Hello, Ce

[ceph-users] Health WARN, ceph errors looping

2015-07-07 Thread Steve Dainard
Hello, Ceph 0.94.1, 2 hosts, CentOS 7. I have two hosts, one of which ran out of / disk space, which crashed all the OSD daemons. After cleaning up the OS disk storage and restarting ceph on that node, I'm seeing multiple errors, then health OK, then back into the errors: # ceph -w http://pastebin.com

[ceph-users] Can't mount btrfs volume on rbd

2015-06-11 Thread Steve Dainard
Hello, I'm getting an error when attempting to mount a volume on a host that was forcibly powered off: # mount /dev/rbd4 climate-downscale-CMIP5/ mount: mount /dev/rbd4 on /mnt/climate-downscale-CMIP5 failed: Stale file handle /var/log/messages: Jun 10 15:31:07 node1 kernel: rbd4: unknown parti
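
A hedged sketch of recovery steps that are commonly tried in this situation (device, pool, and image names are illustrative; run btrfs check read-only before considering any repair option):

# umount /mnt/climate-downscale-CMIP5        (clear any stale mount, if present)
# rbd unmap /dev/rbd4                        (drop the stale mapping)
# rbd map rbd/climate-downscale-CMIP5        (re-map the image)
# btrfs check /dev/rbd4                      (read-only consistency check)
# mount -o recovery /dev/rbd4 /mnt/climate-downscale-CMIP5   (btrfs log-replay/recovery mount on older kernels)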