Re: [ceph-users] Erasure pool performance expectations

2016-05-15 Thread Peter Kerdisle
Hey Nick,

I've been playing around with the osd_tier_promote_max_bytes_sec setting
but I'm not really seeing any changes.

What should I expect when setting a max-bytes value? I would expect my
OSDs to throttle themselves to this rate when doing promotions, but this
doesn't seem to be the case. When I set it to 2MB I would expect a
node with 10 OSDs to do a max of 20MB/s during promotions. Is this math
correct?
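(Sanity-checking that arithmetic as a quick sketch, assuming — which this thread doesn't confirm — that the throttle is applied per OSD rather than per node:)

```shell
# Back-of-the-envelope: if the limit is per OSD, a node's ceiling is
# the per-OSD limit times its OSD count.
per_osd_mb=2
num_osds=10
echo $((per_osd_mb * num_osds))   # MB/s for the node
```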

Thanks,

Peter

On Tue, May 10, 2016 at 3:48 PM, Nick Fisk  wrote:

>
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Peter Kerdisle
> > Sent: 10 May 2016 14:37
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Erasure pool performance expectations
> >
> > To answer my own question it seems that you can change settings on the
> fly
> > using
> >
> > ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 5242880'
> > osd.0: osd_tier_promote_max_bytes_sec = '5242880' (unchangeable)
> >
> > However the response seems to imply I can't change this setting. Is there
> > another way to change these settings?
>
> Sorry Peter, I missed your last email. You can also specify that setting
> in ceph.conf, e.g. I have in mine:
>
> osd_tier_promote_max_bytes_sec = 400
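(A sketch of making that persistent; the paths and the 5 MB/s value are illustrative. The "(unchangeable)" reply above means injectargs alone can't apply this option to a running OSD, so a restart is needed:)

```shell
# Persist the throttle in ceph.conf so it survives restarts.
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
osd_tier_promote_max_bytes_sec = 5242880   # 5 MB/s per OSD
EOF

# Restart OSDs one at a time, then confirm the running value:
ceph daemon osd.0 config get osd_tier_promote_max_bytes_sec
```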
>
>
>
> >
> >
> > On Sun, May 8, 2016 at 2:37 PM, Peter Kerdisle  >
> > wrote:
> > Hey guys,
> >
> > I noticed the merge request that fixes the switch around here
> > https://github.com/ceph/ceph/pull/8912
> >
> > I had two questions:
> >
> > • Does this affect my performance in any way? Could it explain the slow
> > requests I keep having?
> > • Can I modify these settings manually myself on my cluster?
> > Thanks,
> >
> > Peter
> >
> >
> > On Fri, May 6, 2016 at 9:58 AM, Peter Kerdisle  >
> > wrote:
> > Hey Mark,
> >
> > Sorry I missed your message as I'm only subscribed to daily digests.
> >
> > Date: Tue, 3 May 2016 09:05:02 -0500
> > From: Mark Nelson 
> > To: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Erasure pool performance expectations
> > Message-ID: 
> > Content-Type: text/plain; charset=windows-1252; format=flowed
> > In addition to what nick said, it's really valuable to watch your cache
> > tier write behavior during heavy IO.  One thing I noticed is you said
> > you have 2 SSDs for journals and 7 SSDs for data.
> >
> > I thought the hardware recommendations were 1 journal disk per 3 or 4
> data
> > disks but I think I might have misunderstood it. Looking at my journal
> > read/writes they seem to be ok
> > though: https://www.dropbox.com/s/er7bei4idd56g4d/Screenshot%202016-
> > 05-06%2009.55.30.png?dl=0
> >
> > However I started running into a lot of slow requests (made a separate
> > thread for those: Diagnosing slow requests) and now I'm hoping these
> could
> > be related to my journaling setup.
> >
> > If they are all of
> > the same type, you're likely bottlenecked by the journal SSDs for
> > writes, which compounded with the heavy promotions is going to really
> > hold you back.
> > What you really want:
> > 1) (assuming filestore) equal large write throughput between the
> > journals and data disks.
> > How would one achieve that?
> >
> > 2) promotions to be limited by some reasonable fraction of the cache
> > tier and/or network throughput (say 70%).  This is why the
> > user-configurable promotion throttles were added in jewel.
> > Are these already in the docs somewhere?
> >
> > 3) The cache tier to fill up quickly when empty but change slowly once
> > it's full (ie limiting promotions and evictions).  No real way to do
> > this yet.
> > Mark
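(As a back-of-the-envelope for point 1 above, with hypothetical device throughput numbers — the point is the ratio, not the values:)

```shell
# Assumed sequential-write speeds; substitute your own device specs.
journal_ssd_mb=450
data_ssd_mb=350
journals=2
data_disks=7

journal_bw=$((journals * journal_ssd_mb))    # aggregate journal bandwidth
data_bw=$((data_disks * data_ssd_mb))        # aggregate data-disk bandwidth

# With filestore every write hits a journal first, so large-write
# throughput is capped by the smaller of the two sides:
if [ "$journal_bw" -lt "$data_bw" ]; then echo "$journal_bw"; else echo "$data_bw"; fi
```

With these assumed numbers the 2 journal SSDs (900 MB/s) cap the 7 data SSDs (2450 MB/s), which is the imbalance Mark is describing.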
> >
> > Thanks for your thoughts.
> >
> > Peter
> >
> >
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Configure Civetweb Infernalis

2016-05-15 Thread giannis androulidakis

Hey,

I've been using the Infernalis Ceph version in a VM cluster, with 2
OSDs, 1 monitor and 1 gateway node. Is there a standard way to change
the options of the web server the gateway is using (Civetweb, according
to the docs)?
For example, it is quite simple to change the default port (from 7480 to
80, as stated in the docs), but I cannot configure the web server to
add a specific header to the HTTP responses in order to enable CORS:
*access-control-allow-origin*.
Unfortunately I haven't found any .conf file for Civetweb in the
gateway VM, so my guess is that the only possible way is through the
cluster's ceph.conf?
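(For what it's worth, a sketch assuming an s3cmd configured against the gateway: RGW stores CORS rules per bucket, set through the S3 API, so there is no Civetweb .conf file to edit; Civetweb's own options go on the "rgw frontends" line in ceph.conf.)

```shell
# Per-bucket CORS rule allowing any origin to GET; bucket name is a placeholder.
cat > cors.xml <<'EOF'
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
EOF
s3cmd setcors cors.xml s3://mybucket
```

Once set, the gateway should answer with the access-control-allow-origin header on requests against that bucket.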


Thanks,
Giannis


[ceph-users] Help ... some OSDs don't want to start after a dumpling->firefly->hammer upgrade

2016-05-15 Thread Emmanuel Lacour
Dear ceph users,


I have clusters running Debian wheezy with dumpling.

I upgraded one cluster from dumpling to firefly, then to hammer, without
problems.

Then I upgraded a second cluster from dumpling to firefly without
problems, though I forgot to restart 2 of the 10 OSDs, so those stayed on
dumpling.

I then did the firefly->hammer upgrade. 8 of the 10 OSDs restarted without
problems (including those that were still running dumpling).

2 OSDs won't start and, despite a rep_size of 3, I have lots
of pgs down+peering :(


Here is the log (debug 20) of one of those OSDs:


http://owncloud.home-dn.net/index.php/s/jROrqr2nuuVL8kZ

Any help will be really appreciated!


[ceph-users] reweight-by-utilization warning

2016-05-15 Thread Blair Bethwaite
Hi all,

IMHO reweight-by-utilization should come with some sort of warning: it
just suddenly reweights everything - no dry run, no confirmation, and
apparently no option to see what it's going to do. It also doesn't
appear to consider pools, and hence crush rulesets, which I imagine
could lead it to make some poor reweighting decisions.

We ran it on a cluster this evening and promptly had over 70% of
objects misplaced - even at 5-7 GB/s that's quite a lot of data
movement when there are half a billion objects in the cluster! I think
we'll stick with Dan's scripts
(https://github.com/cernceph/ceph-scripts/blob/master/tools/crush-reweight-by-utilization.py)
for the moment (thanks Dan!).

-- 
Cheers,
~Blairo


Re: [ceph-users] reweight-by-utilization warning

2016-05-15 Thread Dan van der Ster
Hi Blair! (re-copying to list)

The good news is that the functionality of that python script is now
available natively in jewel and has been backported to hammer in 0.94.7.

Now you can use

  ceph osd test-reweight-by-(pg|utilization)

in order to see how the weights would change if you were to run
reweight-by-(pg|utilization). There are also some new options on the
(test-)reweight-by-* commands which let you adjust the maximum weight
change per run and the number of OSDs changed per run.
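(A sketch of that workflow; the argument order shown — overload percentage, max weight change, max OSDs per run — matches jewel, but verify against your release's help output:)

```shell
# Dry run: prints the proposed weight changes without applying anything.
ceph osd test-reweight-by-utilization 120 0.05 4

# Apply the same adjustment once the proposal looks sane.
ceph osd reweight-by-utilization 120 0.05 4
```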

Hope that helps!

Dan


On Sun, May 15, 2016 at 4:09 PM, Blair Bethwaite
 wrote:
> Hi all,
>
> IMHO reweight-by-utilization should come with some sort of warning, it
> just suddenly reweights everything - no dry run, no confirmation,
> apparently no option to see what it's going to do. It also doesn't
> appear to consider pools and hence crush rulesets, which I imagine
> could result in it making some poor reweighting decisions.
>
> We ran it on a cluster this evening and promptly had over 70% of
> objects misplaced - even at 5-7 GB/s that's quite a lot of data
> movement when there are half a billion objects in the cluster! I think
> we'll stick with Dan's scripts
> (https://github.com/cernceph/ceph-scripts/blob/master/tools/crush-reweight-by-utilization.py)
> for the moment (thanks Dan!).
>
> --
> Cheers,
> ~Blairo


Re: [ceph-users] How to remove a placement group?

2016-05-15 Thread Michael Kuriger
I would try:
ceph pg repair 15.3b3





Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Romero 
Junior
Sent: Saturday, May 14, 2016 11:46 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] How to remove a placement group?

Hi all,

I’m currently having trouble with an incomplete pg.
Our cluster has a replication factor of 3, however somehow I found this pg to 
be present in 9 different OSDs (being active only in 3 of them, of course).
Since I don’t really care about data loss, I was wondering if it’s possible to 
get rid of this by simply removing the pg. Is that even possible?

I’m running ceph version 0.94.3.

Here are some quick details:

1 pgs incomplete
1 pgs stuck inactive
100 requests are blocked > 32 sec

6 ops are blocked > 67108.9 sec on osd.130
94 ops are blocked > 33554.4 sec on osd.130

pg 15.3b3 is stuck inactive since forever, current state incomplete, last 
acting [130,210,148]
pg 15.3b3 is stuck unclean since forever, current state incomplete, last acting 
[130,210,148]
pg 15.3b3 is incomplete, acting [130,210,148]

Running “ceph pg 15.3b3 query” hangs without a response.

I’ve tried setting OSD 130 as down, but then OSD 210 becomes the one keeping 
things stuck (query hangs), same for OSD 148.

Any ideas?

Kind regards,
Romero Junior
DevOps Infra Engineer
LeaseWeb Global Services B.V.

T: +31 20 316 0230
M: +31 6 2115 9310
E: r.jun...@global.leaseweb.com
W: www.leaseweb.com



Luttenbergweg 8,

1101 EC Amsterdam,

Netherlands




LeaseWeb is the brand name under which the various independent LeaseWeb 
companies operate. Each company is a separate and distinct entity that provides 
services in a particular geographic area. LeaseWeb Global Services B.V. does 
not provide third-party services. Please see 
www.leaseweb.com/en/legal for more 
information.





Re: [ceph-users] How to remove a placement group?

2016-05-15 Thread Kostis Fardelas
There is the "ceph pg {pgid} mark_unfound_lost revert|delete" command, but
you may also find it useful to use ceph-objectstore-tool to do the job.
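(A hedged sketch of the objectstore-tool route, using the PG id and OSD from this thread; the data/journal paths are the usual defaults but verify yours. Stop the OSD first, and keep the export as a backup:)

```shell
# Stop the OSD that holds the stuck PG (init syntax depends on the distro).
service ceph stop osd.130

# Export the PG as a backup, then remove it from the OSD's store.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
    --journal-path /var/lib/ceph/osd/ceph-130/journal \
    --pgid 15.3b3 --op export --file /root/pg.15.3b3.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
    --journal-path /var/lib/ceph/osd/ceph-130/journal \
    --pgid 15.3b3 --op remove

service ceph start osd.130
```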

On 15 May 2016 at 20:22, Michael Kuriger  wrote:

> I would try:
>
> ceph pg repair 15.3b3
>
> [...]


Re: [ceph-users] Starting a cluster with one OSD node

2016-05-15 Thread Alex Gorbachev
> On Friday, May 13, 2016, Mike Jacobacci  wrote:
> Hello,
>
> I have a quick and probably dumb question… We would like to use Ceph
> for our storage, I was thinking of a cluster with 3 Monitor and OSD
> nodes.  I was wondering if it was a bad idea to start a Ceph cluster
> with just one OSD node (10 OSDs, 2 SSDs), then add more nodes as our
> budget allows?  We want to spread out the purchases of the OSD nodes
> over a month or two but I would like to start moving data over ASAP.

 Hi Mike,

 Production or test?  I would strongly recommend against one OSD node
 in production.  Not only is there a risk of hangs and data loss due to,
 e.g., a filesystem or kernel issue, but as you add nodes the data
 movement will introduce a good deal of overhead.
>> On May 14, 2016, at 9:56 AM, Christian Balzer  wrote:
>>
>> On Sat, 14 May 2016 09:46:23 -0700 Mike Jacobacci wrote:
>>
>>
>> Hello,
>>
>>> Hi Alex,
>>>
>>> Thank you for your response! Yes, this is for a production
>>> environment... Do you think the risk of data loss due to the single node
>>> be different than if it was an appliance or a Linux box with raid/zfs?
>> Depends.
>>
>> Ceph by default distributes 3 replicas amongst the storage nodes, giving
>> you fault tolerances along the lines of RAID6.
>> So (again by default), the smallest cluster you want to start with is 3
>> nodes.
>>
>> Of course you could modify the CRUSH rules to place 3 replicas based on
>> OSDs, not nodes.
>>
>> However that only leaves you with 3 disks worth of capacity in your case
>> and still the data movement Alex mentioned when adding more nodes AND
>> modifying the CRUSH rules.
>>
>> Lastly I personally wouldn't deploy anything that's a SPoF in production.
>>
>> Christian
On Sat, May 14, 2016 at 1:08 PM, Mike Jacobacci  wrote:
> Hi Christian,
>
> Thank you, I know what I am asking isn't a good idea... I am just trying to 
> avoid waiting for all three nodes before I began virtualizing our 
> infrastructure.
>
> Again thanks for the responses!

Hi Mike,

I generally do not build production environments on one node for
storage, although my group has built really good test/training
environments with a single box.  What we do there is forgo ceph
altogether and install a hardware RAID with your favorite RAID HBA
vendor - LSI/Avago, Areca, Adaptec, etc. and export it using SCST as
iSCSI.  This setup has worked really well so far for its intended use.

We tested a two box setup with LSI/Avago SyncroCS, which works well
too and there are some good howtos on the web for this - but it seems
SyncroCS has been put on ice by Avago, unfortunately.

Regarding building a one node setup in ceph and then expanding it, I
would not do this.  It is easier to do things right up front than to
redo later.  What you may want to do is use this one node to become
familiar with the ceph architecture and do a dry run - however, I
would wipe clean and recreate the environment rather than promote it
to production.  Silly operator errors have come up in the past, like
leaving OSD level redundancy instead of setting node redundancy.
Also, big data migrations are hard on clients (you can see IO
timeouts), as discussed often on this list.  So YMMV, but I personally
would not rush.

Best regards,
Alex
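(One concrete check for the operator error Alex mentions, i.e. the failure domain left at OSD level after a single-node trial, as a sketch:)

```shell
# The replicated ruleset should choose leaves of type "host", not "osd",
# before the cluster is promoted to production.
ceph osd crush rule dump
# look for:  "op": "chooseleaf_firstn", ... "type": "host"
```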


Re: [ceph-users] Help ... some OSDs don't want to start after a dumpling->firefly->hammer upgrade

2016-05-15 Thread Emmanuel Lacour
On 15/05/2016 15:35, Emmanuel Lacour wrote:
> [...]
> 2 osd does not wan't to start and despite a rep_size of 3, I have lots
> of pg down+peering :(


I finally found the solution: the osdmap was corrupted on those OSDs
(I don't know why). I copied all current/meta/DIR* from another OSD and
health is OK now. I'm still checking that there is no data loss...
I found this solution after reading http://tracker.ceph.com/issues/14406,
which has the same symptoms.


[ceph-users] Help...my cluster has multiple rgw related pools after upgrading from H to J

2016-05-15 Thread 易明
Dear Cephers,

This is my first time writing to the list; I hope my problem is described
clearly.

I have a cluster with 4 physical servers: 3 mons, 4 OSDs per server, and
one server as the RGW client. I just upgraded 3 of the servers from Hammer
to Jewel, but not the RGW server.

After creating an RGW service on one mon node, I found that the new RGW
service cannot access my old RGW pools with the old user name and password.
I created a new user and password on the new RGW server, but it can only
access the new default RGW pools.


If I want to access the old RGW pools from my new RGW service, is there any
way to do that?
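(One approach worth trying, sketched with jewel's radosgw-admin; the pool names come from the listing in this message, but treat the exact zone fields as something to verify against your own zone dump:)

```shell
# The jewel gateway created a fresh "default" zone pointing at the
# default.rgw.* pools. Repointing the zone at the old .rgw.* / .users.*
# pools should let the new service see the old users and buckets.
radosgw-admin zone get --rgw-zone=default > zone.json
# edit zone.json: domain_root, user_uid_pool, user_keys_pool, etc.
# -> the old .rgw.* / .users.* pools
radosgw-admin zone set --rgw-zone=default --infile zone.json
radosgw-admin period update --commit
```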

My old ceph version:
[root@rgw0 ~]# ceph --version
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)

And the new version:
[root@ceph2 ~]# ceph --version
ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)

Here are cluster info:
[root@ceph2 home]# ceph -s
cluster 3fcc77ef-9fda-4f83-8b9f-efc9c769c857
 health HEALTH_OK
 monmap e5: 3 mons at {ceph0=
172.17.0.170:6789/0,ceph1=172.17.0.171:6789/0,ceph2=172.17.0.172:6789/0}
election epoch 346, quorum 0,1,2 ceph0,ceph1,ceph2
  fsmap e353: 5/5/5 up
{4:0=ceph2-mds1=up:active,4:1=ceph1-mds0=up:active,4:2=ceph2-mds0=up:active,4:3=ceph1-mds1=up:active,4:4=ceph0-mds0=up:active}
 osdmap e7540: 12 osds: 12 up, 12 in
  pgmap v5839842: 1960 pgs, 31 pools, 185 GB data, 96103 objects
557 GB used, 44132 GB / 44690 GB avail
1960 active+clean
  client io 18291 B/s rd, 0 B/s wr, 17 op/s rd, 11 op/s wr

[root@ceph2 home]# rados lspools
rbd
.rgw.root
.rgw.control
.rgw
.rgw.gc
glance
.users.uid
.users
.users.swift
.rgw.buckets.index
.rgw.buckets
.users.email
LUNs
.rgw.buckets.hospitalA
.rgw.buckets.hospitalB
.rgw.buckets.hospitalC
.rgw.buckets.extra
ceph2_mds0_data
ceph2_mds0_metadata
nfs_disk
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
default.rgw.users.uid
default.rgw.users.keys
default.rgw.users.swift
default.rgw.meta
default.rgw.buckets.index
default.rgw.buckets.data
tmp_pool


Any help will be greatly appreciated !

Many thanks,

Sambar


[ceph-users] Pacemaker Resource Agents for Ceph by Andreas Kurz

2016-05-15 Thread Alex Gorbachev
Following a conversation with Sage in NYC, I would like to share links
to the excellent resource agents for Pacemaker, developed by Andreas
Kurz to present Ceph images to iSCSI and FC fabrics.  We are using
these as part of the Storcium solution, and these RAs have withstood
quite a few beatings by clients' IO load.

https://github.com/akurz/resource-agents/blob/SCST/heartbeat/SCSTLogicalUnit
https://github.com/akurz/resource-agents/blob/SCST/heartbeat/SCSTTarget
https://github.com/akurz/resource-agents/blob/SCST/heartbeat/iscsi-scstd

--
Alex Gorbachev
http://www.iss-integration.com
Storcium


[ceph-users] failing to respond to cache pressure

2016-05-15 Thread Andrus, Brian Contractor
So this 'production ready' CephFS for jewel seems a little not quite there yet.

Currently I have a single system mounting CephFS and merely scp-ing data to it.
The CephFS mount has 168 TB used, 345 TB / 514 TB avail.

Every so often, I get a HEALTH_WARN message of "mds0: Client failing to respond
to cache pressure".
Even if I stop the scp, it will not go away until I umount/remount the
filesystem.

For testing, I had CephFS mounted on about 50 systems, and when updatedb
started on them I got all kinds of issues.
I figured having updatedb run on a few systems would be a good 'see what
happens' test, since it generates a fair amount of access.

So, should I not be considering CephFS as a large storage mount for
a compute cluster? Is there a sweet spot for what CephFS is good for?
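(One knob that often matters for this warning, as a sketch; the option name is the jewel-era one, the value depends on MDS RAM, and "mds.$(hostname)" assumes the MDS is named after its host:)

```shell
# The warning fires when a client holds more inode capabilities than the
# MDS cache wants to track. Raising the cache from its small default is a
# common first step: check the current value, then bump it.
ceph daemon mds.$(hostname) config get mds_cache_size
ceph daemon mds.$(hostname) config set mds_cache_size 1000000
```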


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238




[ceph-users] Increasing pg_num

2016-05-15 Thread Chris Dunlop
Hi,

I'm trying to understand the potential impact on an active cluster of
increasing pg_num/pgp_num.

The conventional wisdom, as gleaned from the mailing lists and general
google fu, seems to be to increase pg_num followed by pgp_num, both in
small increments, to the target size, using "osd max backfills" (and
perhaps "osd recovery max active"?) to control the rate and thus
performance impact of data movement.

I'd really like to understand what's going on rather than "cargo culting"
it.

I'm currently on Hammer, but I'm hoping the answers are broadly applicable
across all versions for others following the trail.

Why do we have both pg_num and pgp_num? Given the docs say "The pgp_num
should be equal to the pg_num": under what circumstances might you want
these different, apart from when actively increasing pg_num first then
increasing pgp_num to match? (If they're supposed to be always the same, why
not have a single parameter and do the "increase pg_num, then pgp_num"
within ceph's internals?)

What do "osd backfill scan min" and "osd backfill scan max" actually
control? The docs say "The minimum/maximum number of objects per backfill
scan" but what does this actually mean and how does it affect the impact (if
at all)?

Is "osd recovery max active" actually relevant to this situation? It's
mentioned in various places related to increasing pg_num/pgp_num but my
understanding is it's related to recovery (e.g. osd falls out and comes
back again and needs to catch up) rather than back filling (migrating
pgs misplaced due to increasing pg_num, crush map changes etc.)

Previously (back in Dumpling days):


http://article.gmane.org/gmane.comp.file-systems.ceph.user/11490

From: Gregory Farnum
Subject: Re: Throttle pool pg_num/pgp_num increase impact
Newsgroups: gmane.comp.file-systems.ceph.user
Date: 2014-07-08 17:01:30 GMT

On Tuesday, July 8, 2014, Kostis Fardelas wrote:
> Should we be worried that the pg/pgp num increase on the bigger pool will
> have a 300X larger impact?

The impact won't be 300 times bigger, but it will be bigger. There are two
things impacting your cluster here

1) the initial "split" of the affected PGs into multiple child PGs. You can
mitigate this by stepping through pg_num at small multiples.
2) the movement of data to its new location (when you adjust pgp_num). This
can be adjusted by setting the "OSD max backfills" and related parameters;
check the docs.
-Greg


Am I correct thinking "small multiples" in this context is along the lines
of "1.1" rather than "2" or "4"?.

Is there really much impact when increasing pg_num in a single large step
e.g. 1024 to 4096? If so, what causes this impact? An initial trial of
increasing pg_num by 10% (1024 to 1126) on one of my pools showed it
completed in a matter of tens of seconds, too short to really measure any
performance impact. But I'm concerned this could be exponential to the size
of the step such that increasing by a large step (e.g. the rest of the way
from 1126 to 4096) could cause problems.

Given the use of "osd max backfills" to limit the impact of the data
movement associated with increasing pgp_num, is there any advantage or
disadvantage to increasing pgp_num in small increments (e.g. 10% at a time)
vs "all at once", apart from small increments likely moving some data
multiple times? E.g. with a large step is there a higher potential for
problems if something else happens to the cluster the same time (e.g. an OSD
dies) because the current state of the system is further from the expected
state, or something like that?

If small increments of pgp_num are advisable, should the process be
"increase pg_num by a small increment, increase pgp_num to match, repeat
until target reached", or is that no advantage to increasing pg_num (in
multiple small increments or single large step) to the target, then
increasing pgp_num in small increments to the target - and why?

Given that increasing pg_num/pgp_num seem almost inevitable for a growing
cluster, and that increasing these can be one of the most
performance-impacting operations you can perform on a cluster, perhaps a
document going into these details would be appropriate?

Cheers,

Chris
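(For what it's worth, the cautious procedure described above can be sketched like this; the pool name and step sizes are placeholders, not recommendations:)

```shell
# Throttle backfill so the resulting data movement stays gentle.
ceph tell osd.* injectargs '--osd-max-backfills 1'

# Step pg_num/pgp_num up in small multiples, waiting for the cluster to
# return to active+clean (watch "ceph -s") between steps.
for pgs in 1280 1536 2048 3072 4096; do
    ceph osd pool set rbd pg_num  "$pgs"
    ceph osd pool set rbd pgp_num "$pgs"
    # wait for active+clean here before continuing
done
```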


Re: [ceph-users] v0.94.7 Hammer released

2016-05-15 Thread Chris Dunlop
On Fri, May 13, 2016 at 10:21:51AM -0400, Sage Weil wrote:
> This Hammer point release fixes several minor bugs. It also includes a 
> backport of an improved ‘ceph osd reweight-by-utilization’ command for 
> handling OSDs with higher-than-average utilizations.
> 
> We recommend that all hammer v0.94.x users upgrade.

Per http://download.ceph.com/debian-hammer/pool/main/c/ceph/

ceph-common_0.94.7-1trusty_amd64.deb11-May-2016 16:08  5959876
ceph-common_0.94.7-1xenial_amd64.deb11-May-2016 15:54  6037236
ceph-common_0.94.7-1xenial_arm64.deb11-May-2016 16:06  5843722
ceph-common_0.94.7-1~bpo80+1_amd64.deb  11-May-2016 16:08  6028036

Once again, no debian wheezy (~bpo70) version?

Ubuntu Precise missed out this time too.

Oddly, the date on the previously released wheezy version changed at the
same time as the 0.94.7 releases above, it was previously 15-Dec-2015 15:32:

ceph-common_0.94.5-1~bpo70+1_amd64.deb  11-May-2016 15:57  9868188


Cheers,

Chris