Re: [ceph-users] ceph-mgr fails to restart after upgrade to mimic

2019-01-04 Thread Steve Taylor
rors it might be something to check out. Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation <https://storagecraft.com> 380 Data Drive Suite 300 | Draper | Utah | 84020 Office: 801.871.2799

Re: [ceph-users] Balancer module not balancing perfectly

2018-11-06 Thread Steve Taylor
this upstream. I haven't reviewed the balancer module code to see how it's doing things, but assuming it uses osdmaptool or the same upmap code as osdmaptool this should also improve the balancer module.

Re: [ceph-users] Balancer module not balancing perfectly

2018-10-31 Thread Steve Taylor
full but will not be made overfull by the move to take new PGs? That seems like it should be the expected behavior in this scenario.

Re: [ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread Steve Taylor
nd appear in the config-key dump. It seems to be picking up the other config options correctly. Am I doing something wrong? I feel like I must have a typo or something, but I'm not seeing it.

Re: [ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread Steve Taylor
I had played with those settings some already, but I just tried again with max_deviation set to 0.0001 and max_iterations set to 1000. Same result. Thanks for the suggestion though.
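The thread doesn't quote the exact commands used; a sketch of what that tuning session likely looked like in the Luminous/Mimic era (the `mgr/balancer/*` config-key names and the plan name `myplan` are assumptions, not taken from the post):

```shell
# Tighten the balancer's stopping criteria (values from the post;
# key names are an assumption based on the balancer mgr module of that era).
ceph config-key set mgr/balancer/upmap_max_deviation 0.0001
ceph config-key set mgr/balancer/upmap_max_iterations 1000

# Build an optimization plan, inspect and score it before executing.
ceph balancer optimize myplan
ceph balancer show myplan
ceph balancer eval myplan
```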

[ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread Steve Taylor
case that was preventing further optimization, but I still get the same error. I'm sure I'm missing some config option or something that will allow it to do better, but thus far I haven't been able to find anything in the docs, mailing list archives, or balancer source code that he

Re: [ceph-users] Strange Ceph host behaviour

2018-10-02 Thread Steve Taylor
Unless this is related to load and OSDs really are unresponsive, it is almost certainly some sort of network issue. Duplicate IP address maybe?

Re: [ceph-users] move rbd image (with snapshots) to different pool

2018-06-15 Thread Steve Taylor
I have done this with Luminous by deep-flattening a clone in a different pool. It seemed to do what I wanted, but the RBD appeared to lose its sparseness in the process. Can anyone verify that and/or comment on whether Mimic's "rbd deep copy" does the same?
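For context, the clone-and-flatten approach described above looks roughly like the following; the pool and image names are illustrative, not from the thread:

```shell
# Snapshot the source image, clone it into the target pool with the
# deep-flatten feature enabled, then flatten to detach it from its parent.
rbd snap create rbd/myimage@migrate
rbd snap protect rbd/myimage@migrate
rbd clone --image-feature layering,deep-flatten rbd/myimage@migrate rbd-new/myimage
rbd flatten rbd-new/myimage

# The Mimic alternative being asked about: a one-step deep copy.
rbd deep cp rbd/myimage rbd-new/myimage
```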

Re: [ceph-users] osds with different disk sizes may killing performance

2018-04-12 Thread Steve Taylor
That's the trade-off.

Re: [ceph-users] Reweight 0 - best way to backfill slowly?

2018-01-29 Thread Steve Taylor
kfills set to 1, then you need to add some new OSDs to your cluster before doing any of this. They will have to backfill too, but at least you'll have more spindles to handle it.

Re: [ceph-users] upgrade Hammer>Jewel>Luminous OSD fail to start

2017-09-12 Thread Steve Taylor

Re: [ceph-users] Power outages!!! help!

2017-08-30 Thread Steve Taylor
disk" than dd? I'm always interested in learning about new recovery tools.

Re: [ceph-users] Power outages!!! help!

2017-08-30 Thread Steve Taylor
RBD in the second cluster, and used the kernel client for the dd, xfs_repair, and mount. Worked like a charm.
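A sketch of that recovery sequence, assuming a scratch RBD in a second cluster; device names, pool names, the cluster name, and the image size are all illustrative:

```shell
# Create a scratch RBD in the second cluster sized to hold the failing disk,
# map it with the kernel client, copy the raw data in, repair, and mount.
rbd create --cluster recovery --size 2T rescue/disk1
rbd map --cluster recovery rescue/disk1        # maps to e.g. /dev/rbd0
dd if=/dev/sdb of=/dev/rbd0 bs=4M conv=noerror,sync
xfs_repair /dev/rbd0
mount /dev/rbd0 /mnt/recovered
```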

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Steve Taylor
ies afforded by Ceph inception are endless. ☺

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Steve Taylor
nd your partition is gone, then it probably isn't worth wasting additional time on it.

Re: [ceph-users] how to fix X is an unexpected clone

2017-08-08 Thread Steve Taylor
would also be interesting, although troubleshooting further is probably less valid and less valuable now that you've resolved the problem. It's just a matter of curiosity at this point.

Re: [ceph-users] Read errors on OSD

2017-06-01 Thread Steve Taylor
eck to make sure the kernel version you're running has the fix.

Re: [ceph-users] Question about unfound objects

2017-03-30 Thread Steve Taylor
immed, but I imagine there is some possibility that this issue is related to snap trimming or deleting snapshots. Just more information... On Thu, 2017-03-30 at 17:13 +0000, Steve Taylor wrote: Good suggestion, Nick. I actually did that at the time. The "ceph osd map" wasn't

Re: [ceph-users] Question about unfound objects

2017-03-30 Thread Steve Taylor
y map to. And/or maybe pg query might show something? Nick

[ceph-users] Question about unfound objects

2017-03-30 Thread Steve Taylor
resulted in unfound objects in an otherwise healthy cluster, correct?

Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Steve Taylor
-> 5 OSDs. Right? > > So better add all new OSDs together on a specific server? > > Or not? :-) > > MJ

Re: [ceph-users] KVM/QEMU rbd read latency

2017-02-16 Thread Steve Taylor
guest OS to normalize that some due to its own page cache in the librbd test, but that might at least give you some more clues about where to look further.

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-07 Thread Steve Taylor

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-07 Thread Steve Taylor
as much as the subops are. Thoughts?

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-07 Thread Steve Taylor

Re: [ceph-users] ***Suspected Spam*** dm-crypt journal replacement

2017-01-25 Thread Steve Taylor
al with ceph-osd, and you should be good to go.

Re: [ceph-users] 10.2.4 Jewel released

2016-12-07 Thread Steve Taylor
Not a single scrub in my case.

Re: [ceph-users] 10.2.4 Jewel released

2016-12-07 Thread Steve Taylor
I'm seeing the same behavior with very similar perf top output. One server with 32 OSDs has a load average approaching 800. No excessive memory usage and no iowait at all.

Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-11-30 Thread Steve Taylor
to each OSD’s filestore mount point.

Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-11-30 Thread Steve Taylor
On Tue, Nov 29, 2016 at 11:53 PM, Steve Taylor <steve.tay...@storagecraft.com> wrote: We configured XFS on our OSDs to use 1M blocks (our use case is RBDs with 1M blocks) due to massive fragmentation in our filestores a while back. We were having to defrag all the time and cluster
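The post doesn't say how the "1M blocks" were configured. XFS cannot use a 1M filesystem block size (the block size is capped at the page size), so this presumably means 1M allocation units; one common way to get that is the `allocsize` mount option — an assumption, with illustrative device and mount-point paths:

```shell
# XFS data block size is limited to the page size, so "1M blocks" is
# presumably 1M preallocation; allocsize sets the preferred I/O allocation size.
mount -o allocsize=1M /dev/sdb1 /var/lib/ceph/osd/ceph-0
```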

Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-11-29 Thread Steve Taylor
the object of the hard drives, to see if we can overcome the overall slow read rate. Cheers, Tom

Re: [ceph-users] Out-of-date RBD client libraries

2016-10-25 Thread Steve Taylor
, October 25, 2016 1:27 PM To: Steve Taylor Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Out-of-date RBD client libraries On Tue, Oct 25, 2016 at 3:10 PM, Steve Taylor <steve.tay...@storagecraft.com> wrote: Recently we tested an upgrade from 0.94.7 to 10.2.3 and found exact

Re: [ceph-users] Out-of-date RBD client libraries

2016-10-25 Thread Steve Taylor
leases involved. The key is to test first and make sure you have a sane upgrade path before doing anything in production.

Re: [ceph-users] Ceph consultants?

2016-10-05 Thread Steve Taylor
's fine for an initial dev setup.

Re: [ceph-users] Cleanup old osdmaps after #13990 fix applied

2016-09-14 Thread Steve Taylor

Re: [ceph-users] Cleanup old osdmaps after #13990 fix applied

2016-09-14 Thread Steve Taylor
he osdmap frequently.

Re: [ceph-users] Turn snapshot of a flattened snapshot into regular image

2016-09-02 Thread Steve Taylor
nder the covers.

Re: [ceph-users] Turn snapshot of a flattened snapshot into regular image

2016-09-01 Thread Steve Taylor
particular test. It's possible that OpenStack does the flattening for you in this scenario. This issue will likely require some investigation at the RBD level throughout your testing process to understand exactly what's happening.

Re: [ceph-users] Turn snapshot of a flattened snapshot into regular image

2016-09-01 Thread Steve Taylor

Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't uptosnuff)

2016-02-12 Thread Steve Taylor
It would be interesting to see what this cluster looks like as far as OSD count, journal configuration, network, CPU, RAM, etc. Something is obviously amiss. Even in a semi-decent configuration one should be able to restart a single OSD with noout under little load without causing blocked o

Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't uptosnuff)

2016-02-12 Thread Steve Taylor
you to restart one without blocking writes. If this isn't the case, something deeper is going on with your cluster. You shouldn't get slow requests due to restarting a single OSD with only noout set and idle disks on the remaining OSDs. I've done this many, many times.
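The restart procedure being described is roughly the following; the OSD id is illustrative:

```shell
# Typical single-OSD restart: prevent out-marking, restart, unset.
ceph osd set noout
systemctl restart ceph-osd@12      # OSD id is illustrative
ceph osd unset noout

# Watch the cluster settle before touching the next OSD.
ceph -s
```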

Re: [ceph-users] OSDs are down, don't know why

2016-01-18 Thread Steve Taylor
t of error reported in this scenario. It seems likely that it would be network-related in this case, but the logs will confirm or debunk that theory.

Re: [ceph-users] OSDs are down, don't know why

2016-01-18 Thread Steve Taylor
Do you have a ceph private network defined in your config file? I've seen this before in that situation where the private network isn't functional. The osds can talk to the mon(s) but not to each other, so they report each other as down when they're all running just fine.
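The "private network" in question is the cluster network option in ceph.conf, which carries OSD-to-OSD replication and heartbeat traffic; if that subnet is unreachable between OSD hosts, OSDs can reach the mons but report each other down. A minimal fragment with illustrative subnets:

```ini
[global]
public_network  = 192.168.1.0/24   ; clients and mons
cluster_network = 192.168.2.0/24   ; OSD replication and heartbeats
```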

Re: [ceph-users] double rebalance when removing osd

2016-01-11 Thread Steve Taylor
you or if you're removing an osd that's still functional to some degree, then reweighting to 0, waiting for the single rebalance, then following the removal steps is probably your best bet.
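The single-rebalance removal sequence described above looks roughly like this; the OSD id is illustrative:

```shell
# Drain the OSD with one rebalance, then remove it once empty.
ceph osd crush reweight osd.7 0    # triggers the single rebalance
# wait for backfill to finish (ceph -s shows HEALTH_OK), then:
ceph osd out osd.7
systemctl stop ceph-osd@7
ceph osd crush remove osd.7
ceph auth del osd.7
ceph osd rm osd.7
```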

Re: [ceph-users] double rebalance when removing osd

2016-01-07 Thread Steve Taylor

Re: [ceph-users] Recovery question

2015-07-29 Thread Steve Taylor
neously as long as I didn't mix OSDs from different, old failure domains in a new failure domain without recovering in between. I understand mixing failure domains like this is risky, but I sort of expected it to work anyway. Maybe it was better in the end that Ceph forced me to do it more