rors it might be something
to check out.
this upstream. I haven't
reviewed the balancer module code to see how it's doing things, but assuming it
uses osdmaptool or the same upmap code as osdmaptool this should also improve
the balancer module.
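For anyone who wants to exercise the same upmap code outside the balancer, this is roughly the osdmaptool workflow I have in mind (pool name and limits are placeholders, and the semantics of flags like --upmap-deviation have shifted a bit between releases):

    # Grab the current osdmap and ask osdmaptool for upmap suggestions
    ceph osd getmap -o /tmp/osdmap
    osdmaptool /tmp/osdmap --upmap /tmp/upmap.sh \
        --upmap-pool rbd --upmap-max 100 --upmap-deviation 1
    # Review the generated 'ceph osd pg-upmap-items' commands before applying
    bash /tmp/upmap.sh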
full but will not be made overfull by the move to take new PGs? That seems
like it should be the expected behavior in this scenario.
nd appear in the config-key dump. It seems to be
picking up the other config options correctly. Am I doing something
wrong? I feel like I must have a typo or something, but I'm not seeing
it.
I had played with those settings some already, but I just tried again
with max_deviation set to 0.0001 and max_iterations set to 1000. Same
result. Thanks for the suggestion though.
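For completeness, this is roughly how I'm setting those options (these are the Luminous-era config-key names as I understand them; newer releases move these under 'ceph config set mgr' instead):

    ceph config-key set mgr/balancer/upmap_max_deviation 0.0001
    ceph config-key set mgr/balancer/upmap_max_iterations 1000
    # fail over / restart the active mgr so the balancer re-reads its options,
    # then re-evaluate and build a plan
    ceph balancer eval
    ceph balancer optimize myplan
    ceph balancer show myplan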
case that
was preventing further optimization, but I still get the same error.
I'm sure I'm missing some config option or something that will allow it
to do better, but thus far I haven't been able to find anything in the
docs, mailing list archives, or balancer source code that he
Unless this is related to load and the OSDs really are unresponsive, it is
almost certainly some sort of network issue. A duplicate IP address,
maybe?
I have done this with Luminous by deep-flattening a clone in a different pool.
It seemed to do what I wanted, but the RBD appeared to lose its sparseness in
the process. Can anyone verify that and/or comment on whether Mimic's "rbd deep
copy" does the same?
That's the trade-off.
kfills set to 1, then you need
to add some new OSDs to your cluster before doing any of this. They will have
to backfill too, but at least you'll have more spindles to handle it.
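Something like this is what I mean by keeping backfill throttled while the new OSDs fill up (the value of 1 is just the conservative setting mentioned above; adjust to taste):

    # Verify/limit concurrent backfills per OSD during the expansion
    ceph tell osd.* injectargs '--osd-max-backfills 1'
    # Watch backfill progress and the new utilization spread
    ceph -s
    ceph osd df tree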
disk" than dd? I'm always interested in learning about
new recovery tools.
RBD in the second cluster, and used
the kernel client for the dd, xfs_repair, and mount. Worked like a charm.
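For the archives, the general shape of that was something like the following (device and image names are illustrative, not the exact ones I used; the destination RBD lived in the second cluster, so it was mapped with that cluster's conf and keyring):

    # Map the destination RBD with the kernel client
    rbd map rbd/recovery-image          # shows up as e.g. /dev/rbd0
    # Block-level copy from the problem device, then repair and mount the copy
    dd if=/dev/sdX of=/dev/rbd0 bs=4M conv=noerror,sync
    xfs_repair /dev/rbd0
    mount /dev/rbd0 /mnt/recovered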
ies afforded by Ceph inception are endless. ☺
nd your partition
is gone, then it probably isn't worth wasting additional time on it.
would also be
interesting, although troubleshooting further is probably less valid and less
valuable now that you've resolved the problem. It's just a matter of curiosity
at this point.
eck to make sure the kernel version you're running has the fix.
immed, but I imagine there is some
possibility that this issue is related to snap trimming or deleting snapshots.
Just more information...
On Thu, 2017-03-30 at 17:13 +0000, Steve Taylor wrote:
Good suggestion, Nick. I actually did that at the time. The "ceph osd map"
wasn't
y map
to. And/or maybe pg query might show something?
Nick
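For anyone who finds this thread later, these are the sorts of queries being discussed (pool, object, and PG id are placeholders):

    # Which PG an object maps to, and which OSDs serve that PG
    ceph osd map rbd rbd_data.1234abcd.0000000000000000
    # Detailed PG state, including any missing/unfound objects
    ceph pg 2.1f7 query
    ceph pg dump_stuck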
resulted
in unfound objects in an otherwise healthy cluster, correct?
-> 5 OSDs. Right?
>
> So better add all new OSDs together on a specific server?
>
> Or not? :-)
>
> MJ
>
guest OS to
normalize that some due to its own page cache in the librbd test, but that
might at least give you some more clues about where to look further.
as much as the subops
are. Thoughts?
al with ceph-osd, and
you should be good to go.
Not a single scrub in my case.
I'm seeing the same behavior with very similar perf top output. One server with
32 OSDs has a load average approaching 800. No excessive memory usage and no
iowait at all.
to each OSD’s
filestore mount point.
On Tue, Nov 29, 2016 at 11:53 PM, Steve Taylor <steve.tay...@storagecraft.com> wrote:
We configured XFS on our OSDs to use 1M blocks (our use case is RBDs with 1M
blocks) due to massive fragmentation in our filestores a while back. We were
having to defrag all the time and cluster
the object of the hard drives, to see if we can
overcome the overall slow read rate.
Cheers,
Tom
, October 25, 2016 1:27 PM
To: Steve Taylor
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Out-of-date RBD client libraries
On Tue, Oct 25, 2016 at 3:10 PM, Steve Taylor <steve.tay...@storagecraft.com> wrote:
Recently we tested an upgrade from 0.94.7 to 10.2.3 and found exact
leases involved. The key is to test
first and make sure you have a sane upgrade path before doing anything in
production.
's fine for an initial dev setup.
he osdmap frequently.
nder the covers.
particular test. It's
possible that OpenStack does the flattening for you in this scenario.
This issue will likely require some investigation at the RBD level throughout
your testing process to understand exactly what's happening.
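A couple of quick checks at the RBD level that might help with that investigation (pool and image names are placeholders):

    # Does the image still have a parent, or has it been flattened?
    rbd info images/volume-1234 | grep -i parent
    # Which clones are still attached to a given parent snapshot?
    rbd children images/base-image@snap
    # Actual space consumed vs. provisioned size
    rbd du images/volume-1234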
It would be interesting to see what this cluster looks like as far as OSD
count, journal configuration, network, CPU, RAM, etc. Something is obviously
amiss. Even in a semi-decent configuration one should be able to restart a
single OSD with noout under little load without causing blocked o
you to restart one without blocking writes. If this isn't the case, something
deeper is going on with your cluster. You shouldn't get slow requests due to
restarting a single OSD with only noout set and idle disks on the remaining
OSDs. I've done this many, many times.
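For reference, the procedure I'm describing is nothing more exotic than this (the OSD id is a placeholder, and the unit name assumes a systemd-based deployment):

    # Keep CRUSH from rebalancing while the OSD is briefly down
    ceph osd set noout
    systemctl restart ceph-osd@12
    # Wait for the OSD to rejoin and PGs to return to active+clean
    ceph -s
    ceph osd unset noout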
t of error reported in this scenario. It seems likely that it would be
network-related in this case, but the logs will confirm or debunk that theory.
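If it helps, temporarily bumping the messenger and OSD debug levels usually makes a network problem obvious in the logs (the levels and OSD id here are just a reasonable starting point, not gospel):

    ceph tell osd.12 injectargs '--debug_ms 1 --debug_osd 10'
    # ...reproduce the issue, check /var/log/ceph/ceph-osd.12.log, then turn
    # the debug levels back down to your normal defaults
    ceph tell osd.12 injectargs '--debug_ms 0 --debug_osd 1'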
Do you have a Ceph private (cluster) network defined in your config file? I've
seen this before when the private network isn't functional: the OSDs can talk
to the mon(s) but not to each other, so they report each other as down even
though they're all running just fine.
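To be clear, I mean something like this in ceph.conf (subnets are just examples). If cluster_network is set but that subnet isn't actually reachable between the OSD hosts, you'll see exactly this kind of flapping:

    [global]
        public_network  = 192.168.10.0/24
        cluster_network = 192.168.20.0/24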
you or if you're removing an osd that's still functional to
some degree, then reweighting to 0, waiting for the single rebalance, then
following the removal steps is probably your best bet.
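Roughly, that sequence looks like this (the OSD id is a placeholder, and the exact removal commands vary a bit by release; newer versions also have 'ceph osd purge'):

    # Drain the OSD first so the data moves off in a single rebalance
    ceph osd crush reweight osd.12 0
    # ...wait for all PGs to be active+clean again...
    ceph osd out 12
    systemctl stop ceph-osd@12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12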
neously as long as I
didn't mix OSDs from different, old failure domains in a new failure domain
without recovering in between. I understand mixing failure domains like
this is risky, but I sort of expected it to work anyway. Maybe it was
better in the end that Ceph forced me to do it more