Re: [ceph-users] node and its OSDs down...

2016-12-03 Thread M Ranga Swami Reddy
Sure, will try with "ceph osd crush reweight 0.0" and update the status.

Thanks
Swami
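
For reference, a minimal sketch of the sequence David describes below (<id> is
a placeholder for the OSD id, and the exact cleanup steps can vary a little by
release):

    # 1) drop the failed OSD's crush weight to 0 so its data is backfilled away once
    ceph osd crush reweight osd.<id> 0.0

    # 2) wait for backfill to finish; note that a host bucket's weight is the sum
    #    of its OSDs' crush weights, which is why later weight changes move data
    ceph -s
    ceph osd tree

    # 3) remove the now zero-weight OSD; this no longer changes the host weight,
    #    so it should not trigger another round of backfilling
    ceph osd out <id>
    ceph osd crush remove osd.<id>
    ceph auth del osd.<id>
    ceph osd rm <id>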

On Fri, Dec 2, 2016 at 8:15 PM, David Turner 
wrote:

> If you want to reweight only once when you have a failed disk that is
> being balanced off of, set the crush weight for that osd to 0.0.  Then when
> you fully remove the disk from the cluster it will not do any additional
> backfilling.  Any change to the crush map will likely move data around,
> even if you're removing an already "removed" osd.
>
> --
>
> David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 | Mobile: 385.224.2943
>
> --
> From: M Ranga Swami Reddy [swamire...@gmail.com]
> Sent: Thursday, December 01, 2016 11:45 PM
> To: David Turner
> Cc: ceph-users
> Subject: Re: [ceph-users] node and its OSDs down...
>
> Hi David - Yep, I did the "ceph osd crush remove osd.", which started
> the recovery.
> My worry is: why is Ceph doing recovery if the OSD is already down
> and no longer in the cluster? That means Ceph has already copied the down
> OSD's objects to other OSDs. Here is the ceph osd tree output:
> ===
>
> 227   0.91   osd.227   down   0
>
>
>
> 250   0.91   osd.250   down   0
>
> ===
>
>
> So, to avoid the recovery/rebalance, can I set the weight of the OSD (which
> is in the down state)? And will that weight change itself also lead to
> rebalance activity?
>
>
> Thanks
>
> Swami
>
>
>
> On Thu, Dec 1, 2016 at 8:07 PM, David Turner <
> david.tur...@storagecraft.com> wrote:
>
>> I assume you also did ceph osd crush remove osd..  When you removed
>> the osd that was down/out and already balanced off of, you changed the
>> weight of the host it was on, which triggers additional backfilling to
>> balance the crush map.
>>
>> --
>>
>> David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
>> 380 Data Drive Suite 300 | Draper | Utah | 84020
>> Office: 801.871.2760 | Mobile: 385.224.2943
>>
>> --
>> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of M
>> Ranga Swami Reddy [swamire...@gmail.com]
>> Sent: Thursday, December 01, 2016 3:03 AM
>> To: ceph-users
>> Subject: [ceph-users] node and its OSDs down...
>>
>> Hello,
>> One of my ceph nodes, with 20 OSDs, went down... After a couple of hours,
>> ceph health was back in the OK state.
>>
>> Then I tried to remove those OSDs, which were in the down state, from the
>> ceph cluster...
>> using "ceph osd remove osd."
>> After that the ceph cluster started rebalancing... which is strange, because
>> those OSDs had been down for a long time and health was also OK.
>> My question: why did recovery or rebalance start when I removed the OSD
>> (which was already down)?
>>
>> Thanks
>> Swami
>>
>>
>


Re: [ceph-users] Ceph QoS user stories

2016-12-03 Thread Ning Yao
Hi Sage,

I think we can refactor the io priority strategy at the same time,
based on our considerations below.
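
For context, a minimal sketch of how the existing knobs are typically tuned
today (the values shown are, as far as I remember, the usual defaults; they
are only for illustration, not a recommendation):

    [osd]
    osd client op priority = 63     # client ops at the top of the weighted op queue
    osd recovery op priority = 3    # keep background recovery work below client I/O

    # or adjusted at runtime:
    ceph tell osd.* injectargs '--osd-client-op-priority 63 --osd-recovery-op-priority 3'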

2016-12-03 17:21 GMT+08:00 Ning Yao :
> Hi, all
>
> Currently, we can modify osd_client_op_priority to assign different
> clients' ops different priorities; for example, we can assign a high
> priority to OLTP and a low priority to OLAP. However, there are some
> considerations, as below:
>
> 1) It seems an OLTP client op can still be blocked by OLAP's sub_ops,
> since sub_ops use CEPH_MSG_PRIO_DEFAULT.  So should sub_ops inherit the
> message priority from the client op (and use CEPH_MSG_PRIO_DEFAULT by
> default if the client op does not give a priority explicitly)? Does this
> make sense?
>
> 2) Secondly, reply messages are assigned priority CEPH_MSG_PRIO_HIGH, but
> there is no restriction on a client op's priority (a user can set, e.g.,
> 210; see the note on the priority constants after this list), which can
> leave reply messages blocked. So should we consider changing that kind of
> message to the highest priority (CEPH_MSG_PRIO_HIGHEST)? Currently, it
> seems no ops use CEPH_MSG_PRIO_HIGHEST.
>
> 3) I think kicked recovery ops should inherit the client op's priority.
>
> 4) Is it possible to add test cases to ceph-qa-suite to verify this works
> as expected, as Sam mentioned before? Any guidelines?
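
(If memory serves, the message priority constants in src/include/msgr.h are
CEPH_MSG_PRIO_LOW = 64, CEPH_MSG_PRIO_DEFAULT = 127, CEPH_MSG_PRIO_HIGH = 196
and CEPH_MSG_PRIO_HIGHEST = 255, which is why a client op priority above 196,
such as 210, would get ahead of reply messages.)
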
Regards
Ning Yao


2016-12-03 3:01 GMT+08:00 Sage Weil :
> Hi all,
>
> We're working on getting infrastructure into RADOS to allow for proper
> distributed quality-of-service guarantees.  The work is based on the
> mclock paper published in OSDI'10
>
> https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Gulati.pdf
>
> There are a few ways this can be applied:
>
>  - We can use mclock simply as a better way to prioritize background
> activity (scrub, snap trimming, recovery, rebalancing) against client IO.
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS or
> proportional priority/weight) on RADOS pools
>  - We can use d-mclock to set QoS parameters (e.g., min IOPS) for
> individual clients.
>
> Once the rados capabilities are in place, there will be a significant
> amount of effort needed to get all of the APIs in place to configure and
> set policy.  In order to make sure we build something that makes sense,
> I'd like to collect a set of user stories that we'd like to support so
> that we can make sure we capture everything (or at least the important
> things).
>
> Please add any use-cases that are important to you to this pad:
>
> http://pad.ceph.com/p/qos-user-stories
>
> or as a follow-up to this email.
>
> mClock works in terms of a minimum allocation (of IOPS or bandwidth; they
> are sort of reduced into a single unit of work), a maximum (i.e. simple
> cap), and a proportional weighting (to allocate any additional capacity
> after the minimum allocations are satisfied).  It's somewhat flexible in
> terms of how we apply it to specific clients, classes of clients, or types
> of work (e.g., recovery).  How we put it all together really depends on
> what kinds of things we need to accomplish (e.g., do we need to support a
> guaranteed level of service shared across a specific set of N different
> clients, or only individual clients?).
>
> Thanks!
> sage
>
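
As a rough illustration of how those three knobs interact (with purely
hypothetical numbers): suppose an OSD can sustain 1000 IOPS and serves client
A with a minimum of 300 IOPS and weight 1, and client B with a minimum of 100
IOPS and weight 3, neither capped by a limit. The minimums consume 400 IOPS;
the remaining 600 IOPS are divided 1:3 by weight, so A ends up around
300 + 150 = 450 IOPS and B around 100 + 450 = 550 IOPS. If B also had a limit
of 400 IOPS, it would be clamped there and the surplus would flow back to A.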


Re: [ceph-users] How to create two isolated rgw services in one ceph cluster?

2016-12-03 Thread piglei
Thank you Abhishek, I will take a look at realms soon. BTW, what's your
take on the multi-tenancy-combined-with-nginx-rules solution?

AFAIK, Ceph's multi-tenancy feature seems like a replacement for manually
adding a prefix to user/bucket names. It only avoids name conflicts across
different tenants, but lacks real isolation of user data.
What do you think?
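
For reference, a minimal sketch of the tenant-per-rgw idea discussed here
(tenant and user names are placeholders):

    # since jewel, radosgw-admin can create users under a tenant
    radosgw-admin user create --tenant x --uid user1 --display-name "Tenant X user"
    radosgw-admin user create --tenant y --uid user1 --display-name "Tenant Y user"

    # buckets are then addressed as "<tenant>:<bucket>", e.g. x:mybucket, so a
    # front-end rule on radosgw-x.site.com can reject anything outside tenant x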

On Fri, Dec 2, 2016 at 10:07 PM, Abhishek L  wrote:

>
> piglei writes:
>
> > Hi, I am a ceph newbie. I want to create two isolated rgw services in a
> > single ceph cluster, the requirements being:
> >
> > * The two radosgw will have different hosts, such as radosgw-x.site.com and
> > radosgw-y.site.com. Files uploaded to rgw-x cannot be accessed via rgw-y.
> > * Isolated bucket and user namespaces are not necessary, because I could
> > prepend a prefix to bucket and user names, like "x-bucket" or "y-bucket".
> >
> > At first I thought region and zone might be the solution, but after a
> > little more research, I found that regions and zones are for different geo
> > locations; they share the same metadata (buckets and users) and objects
> > rather than keeping isolated copies.
> >
> > After that I noticed ceph's multi-tenancy feature, available since the
> > jewel release, which is probably what I'm looking for. Here is my solution
> > using multi-tenancy:
> >
> > * Use two tenants called x and y; each rgw service matches one tenant.
> > * Limit incoming requests to the rgw's own tenant, which means you can
> > only retrieve resources belonging to buckets "x:bucket" when calling
> > radosgw-x.site.com. This can be achieved by some custom nginx rules.
> >
> > Is this the right approach, or should I just use two different clusters
> > instead? Looking forward to your advice.
> >
>
> Since jewel, you can also consider looking into realms which sort of
> provide for isolated namespaces within a zone or zonegroup.
>
> --
> Abhishek
>
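
For what it's worth, a rough sketch of the realm-per-service idea Abhishek
mentions (realm/zonegroup/zone names are placeholders, and the exact flags
may differ a bit by release):

    # an isolated realm/zonegroup/zone for the "x" service
    radosgw-admin realm create --rgw-realm=x-realm --default
    radosgw-admin zonegroup create --rgw-zonegroup=x-zg --rgw-realm=x-realm --master --default
    radosgw-admin zone create --rgw-zonegroup=x-zg --rgw-zone=x-zone --master --default
    radosgw-admin period update --commit

    # then point one rgw instance at that zone in ceph.conf
    [client.rgw.x]
    rgw zone = x-zone

Each zone gets its own set of pools, so the two services' data also ends up
physically separated; the same steps would be repeated with a "y" realm for
the second service.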


[ceph-users] Ceph Fuse Strange Behavior Very Strange

2016-12-03 Thread Winger Cheng
Hi, all:
I have two small tests on our cephfs cluster:

time for i in {1..1}; do echo hello > file${i}; done && time rm * && 
time for i in {1..1}; do echo hello > file${i}; done && time rm *

Client A : use kernel client

Client B : use fuse client
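
(For reference, roughly how the two mounts would typically be set up; the
monitor address and mount points are placeholders:)

    # Client A: kernel client
    mount -t ceph MON:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

    # Client B: fuse client
    ceph-fuse -m MON:6789 /mnt/cephfs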

First I create the folder "create_by_A" on Client A and run the test in
"create_by_A" with Client A, and create the folder "create_by_B" on Client B
and run the test in "create_by_B" with Client B.

result on A:
real    0m3.768s
user    0m0.236s
sys     0m0.244s

real    0m4.542s
user    0m0.068s
sys     0m0.228s

real    0m3.545s
user    0m0.200s
sys     0m0.256s

real    0m4.941s
user    0m0.024s
sys     0m0.264s

result on B:
real    0m16.768s
user    0m0.368s
sys     0m0.716s

real    0m27.542s
user    0m0.120s
sys     0m0.888s

real    0m15.990s
user    0m0.288s
sys     0m0.792s

real    0m20.904s
user    0m0.243s
sys     0m0.577s

   This seems normal, but then I
   ran the test in folder "create_by_A" with Client B, and
   ran the test in folder "create_by_B" with Client A.

   result on A:
real    0m3.832s
user    0m0.200s
sys     0m0.264s

real    0m8.326s
user    0m0.100s
sys     0m0.192s

real    0m5.934s
user    0m0.264s
sys     0m0.368s

real    0m4.117s
user    0m0.104s
sys     0m0.200s
 
   result on B:
real    2m25.713s
user    0m0.592s
sys     0m1.120s

real    2m16.726s
user    0m0.084s
sys     0m1.228s

real    2m9.301s
user    0m0.440s
sys     0m1.104s

real    2m19.365s
user    0m0.200s
sys     0m1.184s

   It seems very slow and strange

   System version: Ubuntu 14.04
   Kernel version: 4.4.0-28-generic
   Ceph version: 10.2.2

ceph -s:
     health HEALTH_WARN
            noscrub,nodeep-scrub,sortbitwise flag(s) set
     monmap e1: 3 mons at {rndcl26=10.0.0.26:6789/0,rndcl38=10.0.0.38:6789/0,rndcl62=10.0.0.62:6789/0}
            election epoch 40, quorum 0,1,2 rndcl26,rndcl38,rndcl62
      fsmap e24091: 1/1/1 up {0=rndcl67=up:active}, 1 up:standby
     osdmap e9202: 119 osds: 119 up, 119 in
            flags noscrub,nodeep-scrub,sortbitwise
      pgmap v11577714: 8256 pgs, 3 pools, 62234 GB data, 165 Mobjects
            211 TB used, 221 TB / 432 TB avail
            8256 active+clean

