[ceph-users] Adding multiple osd's to an active cluster

2017-02-17 Thread nigel davies
Hey all, what is the best way to add multiple OSDs to an active cluster? The last time I did this I almost killed the VMs we had running on the cluster. Thanks

Re: [ceph-users] PG stuck peering after host reboot

2017-02-17 Thread george.vasilakakos
Hi Wido, In an effort to get the cluster to complete peering on that PG (we need to be able to use our pool), we have removed osd.595 from the CRUSH map to allow a new mapping to occur. When I left the office yesterday osd.307 had replaced osd.595 in the up set but the acting set had CRUSH_ITEM
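(The exact commands are not shown in the truncated message; a CRUSH removal like this is commonly done along the following lines, with the PG id below purely hypothetical:)

    # Remove the OSD from the CRUSH map so CRUSH computes a new mapping:
    ceph osd crush remove osd.595
    # Then watch whether the PG manages to peer with its new up/acting set:
    ceph pg dump_stuck inactive
    ceph pg 1.323 query    # hypothetical PG id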

Re: [ceph-users] Question regarding CRUSH algorithm

2017-02-17 Thread Richard Hesketh
On 16/02/17 20:44, girish kenkere wrote: > Thanks David, > > It's not quite what I was looking for. Let me explain my question in more > detail - > > This is an excerpt from the CRUSH paper; it explains how the CRUSH algo running on > each client/osd maps a pg to an osd during the write operation [lets assu

Re: [ceph-users] PG stuck peering after host reboot

2017-02-17 Thread Wido den Hollander
> Op 17 februari 2017 om 11:09 schreef george.vasilaka...@stfc.ac.uk: > > > Hi Wido, > > In an effort to get the cluster to complete peering that PG (as we need to be > able to use our pool) we have removed osd.595 from the CRUSH map to allow a > new mapping to occur. > > When I left the off

Re: [ceph-users] PG stuck peering after host reboot

2017-02-17 Thread george.vasilakakos
On 17/02/2017, 12:00, "Wido den Hollander" wrote: > >> Op 17 februari 2017 om 11:09 schreef george.vasilaka...@stfc.ac.uk: >> >> >> Hi Wido, >> >> In an effort to get the cluster to complete peering that PG (as we need to >> be able to use our pool) we have removed osd.595 from the CRUSH ma

[ceph-users] moving rgw pools to ssd cache

2017-02-17 Thread Малков Петр Викторович
Hello! I'm looking for a method to make an rgw SSD cache tier in front of HDDs https://blog-fromsomedude.rhcloud.com/2015/11/06/Ceph-RadosGW-Placement-Targets/ I successfully created rgw pools for SSD as described above, and placement targets are written to the users' profiles, so data can be written to any hdd or
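For reference, the generic cache-tier wiring (separate from the rgw placement targets described in the linked post) is roughly the sketch below; the pool names and the size cap are assumptions, not taken from the post:

    # Sketch only; "default.rgw.buckets.data" (HDD) and "rgw-cache" (SSD) are assumed pool names.
    ceph osd tier add default.rgw.buckets.data rgw-cache
    ceph osd tier cache-mode rgw-cache writeback
    ceph osd tier set-overlay default.rgw.buckets.data rgw-cache
    ceph osd pool set rgw-cache hit_set_type bloom
    ceph osd pool set rgw-cache target_max_bytes 1099511627776   # e.g. cap the cache at 1 TiB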

Re: [ceph-users] High CPU usage by ceph-mgr on idle Ceph cluster

2017-02-17 Thread John Spray
On Fri, Feb 17, 2017 at 6:27 AM, Muthusamy Muthiah wrote: > On one of our platforms mgr uses 3 CPU cores. Is there a ticket available for > this issue? Not that I'm aware of; you could go ahead and open one. Cheers, John > Thanks, > Muthu > > On 14 February 2017 at 03:13, Brad Hubbard wrote: >>

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
I'm not sure what variable I should be looking at exactly, but after reading through all of them I don't see anything suspicious; all values are 0. I'm attaching it anyway, in case I missed something: https://atw.hu/~koszik/ceph/osd26-perf I tried debugging the ceph pg query a bit more, and it s

Re: [ceph-users] KVM/QEMU rbd read latency

2017-02-17 Thread Jason Dillaman
On Fri, Feb 17, 2017 at 2:14 AM, Alexandre DERUMIER wrote: > and I have good hope than this new feature > "RBD: Add support readv,writev for rbd" > http://marc.info/?l=ceph-devel&m=148726026914033&w=2 Definitely will eliminate 1 unnecessary data copy -- but sadly it still will make a single copy

Re: [ceph-users] [Tendrl-devel] Calamari-server for CentOS

2017-02-17 Thread Ken Dreyer
I think the most up-to-date source of Calamari CentOS packages would be https://shaman.ceph.com/repos/calamari/1.5/ On Fri, Feb 17, 2017 at 7:38 AM, Martin Kudlej wrote: > Hello all, > > I would like to ask again about calamari-server package for CentOS 7. Is > there any plan to have calamari-ser

Re: [ceph-users] KVM/QEMU rbd read latency

2017-02-17 Thread Alexandre DERUMIER
>>We also need to support >1 librbd/librados-internal IO >>thread for outbound/inbound paths. Could be wonderful! Multiple iothreads per disk are coming for qemu too. (I have seen Paolo Bonzini sending a lot of patches this month) - Original Mail - From: "Jason Dillaman" To: "aderumier"

[ceph-users] Disable debug logging: best practice or not?

2017-02-17 Thread Kostis Fardelas
Hi, I keep reading recommendations about disabling debug logging in Ceph in order to improve performance. There are two things that are unclear to me though: a. what do we lose if we decrease the default debug logging, and where is the sweet spot so that we do not lose critical messages? I would say fo

Re: [ceph-users] Adding multiple osd's to an active cluster

2017-02-17 Thread Brian Andrus
As described recently in several other threads, we like to add OSDs into their proper CRUSH location, but with the following parameter set: osd crush initial weight = 0 We then bring the OSDs into the cluster (zero impact in our environment) and then gradually increase the CRUSH weight to bring the
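A minimal sketch of that approach (the OSD id and weights below are illustrative, not taken from the post):

    # ceph.conf on the OSD hosts, so newly created OSDs join with zero CRUSH weight:
    [osd]
    osd crush initial weight = 0

    # Then ramp each new OSD up in small steps, letting recovery settle in between
    # (osd.42 and the weights are hypothetical):
    ceph osd crush reweight osd.42 0.2
    ceph osd crush reweight osd.42 0.5
    ceph osd crush reweight osd.42 1.82   # final weight, roughly the drive size in TiB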

[ceph-users] S3 Radosgw : how to grant a user within a tenant

2017-02-17 Thread Vincent Godin
I created 2 users, jack & bob, inside tenant_A. jack created a bucket named BUCKET_A and wants to give read access to the user bob. With s3cmd, I can grant a user without a tenant easily: s3cmd setacl --acl-grant=read:user s3://BUCKET_A but with an explicit tenant, I tried: --acl-grant=read:bob --
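Per the radosgw multi-tenancy docs, a user inside a tenant is addressed as tenant$user, so one hedged guess (untested here; the single quotes keep the shell from expanding the $) would be:

    # Assumption: radosgw's "tenant$user" form
    s3cmd setacl --acl-grant='read:tenant_A$bob' s3://BUCKET_A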

Re: [ceph-users] Disable debug logging: best practice or not?

2017-02-17 Thread Wido den Hollander
> Op 17 februari 2017 om 17:44 schreef Kostis Fardelas : > > > Hi, > I keep reading recommendations about disabling debug logging in Ceph > in order to improve performance. There are two things that are unclear > to me though: > > a. what do we lose if we decrease default debug logging and wher
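For context (Wido's actual reply is truncated above), the commonly circulated ceph.conf overrides for silencing debug logging look roughly like this illustrative subset:

    # Illustrative subset of debug levels people typically zero out; not an exhaustive or authoritative list.
    [global]
    debug lockdep = 0/0
    debug context = 0/0
    debug crush = 0/0
    debug osd = 0/0
    debug filestore = 0/0
    debug journal = 0/0
    debug ms = 0/0
    debug auth = 0/0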

Re: [ceph-users] S3 Radosgw : how to grant a user within a tenant

2017-02-17 Thread Bastian Rosner
On 02/17/2017 06:25 PM, Vincent Godin wrote: > I created 2 users : jack & bob inside a tenant_A > jack created a bucket named BUCKET_A and want to give read access to the > user bob > > with s3cmd, i can grant a user without tenant easylly: s3cmd setacl > --acl-grant=read:user s3://BUCKET_A > > b

Re: [ceph-users] crushtool mappings wrong

2017-02-17 Thread Gregory Farnum
On Thu, Feb 16, 2017 at 3:51 PM, Blair Bethwaite wrote: > Hi Brian, > > After another hour of staring at the decompiled crushmap and playing around > with crushtool command lines I finally looked harder at your command line > and noticed I was also specifying "--simulate", removing that gives me >

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Gregory Farnum
Situations that are stable with lots of undersized PGs like this generally mean that the CRUSH map is failing to allocate enough OSDs for certain PGs. The log you have says the OSD is trying to NOTIFY the new primary that the PG exists here on this replica. I'd guess you only have 3 hosts and are tryin

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Shinobu Kinjo
Can you do the following? * ceph osd getcrushmap -o ./crushmap.o; crushtool -d ./crushmap.o -o ./crushmap.txt On Sat, Feb 18, 2017 at 3:52 AM, Gregory Farnum wrote: > Situations that are stable lots of undersized PGs like this generally > mean that the CRUSH map is failing to allocate enough OSDs for certain
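Once decompiled, the mappings can also be sanity-checked offline; a sketch, with the rule id and replica count assumed rather than taken from the thread:

    # List any PGs the rule cannot map to enough OSDs (rule 0 and --num-rep 2 are assumptions):
    crushtool -i ./crushmap.o --test --rule 0 --num-rep 2 --show-bad-mappings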

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
It's at https://atw.hu/~koszik/ceph/crushmap.txt On Sat, 18 Feb 2017, Shinobu Kinjo wrote: > Can you do? > > * ceph osd getcrushmap -o ./crushmap.o; crushtool -d ./crushmap.o -o > ./crushmap.txt > > On Sat, Feb 18, 2017 at 3:52 AM, Gregory Farnum wrote: > > Situations that are stable lots of

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
I have size=2 and 3 independent nodes. I'm happy to try firefly tunables, but a bit scared that it would make things even worse. On Fri, 17 Feb 2017, Gregory Farnum wrote: > Situations that are stable lots of undersized PGs like this generally > mean that the CRUSH map is failing to allocate en

Re: [ceph-users] KVM/QEMU rbd read latency

2017-02-17 Thread Phil Lacroute
Thanks everyone for the suggestions. Disabling the RBD cache, disabling the debug logging and building qemu with jemalloc each had a significant impact. Performance is up from 25K IOPS to 63K IOPS. Hopefully the ongoing work to reduce the number of buffer copies will yield further improvement
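A sketch of the client-side settings for that kind of test (Phil's exact configuration is not in the message):

    # Assumed [client] section; disables the librbd cache and client debug logging.
    [client]
    rbd cache = false
    debug rbd = 0/0
    debug rados = 0/0
    debug ms = 0/0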

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Shinobu Kinjo
You may need to increase ``choose_total_tries`` above the default of 50, up to 100. - http://docs.ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map - https://github.com/ceph/ceph/blob/master/doc/man/8/crushtool.rst On Sat, Feb 18, 2017 at 5:25 AM, Matyas Koszik wrote: > >
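The tunable lives in the decompiled CRUSH map, so the round trip is roughly:

    ceph osd getcrushmap -o crushmap.o
    crushtool -d crushmap.o -o crushmap.txt
    # edit crushmap.txt: change "tunable choose_total_tries 50" to "tunable choose_total_tries 100"
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new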

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
I set it to 100, then restarted osd26, but after recovery everything is as it was before. On Sat, 18 Feb 2017, Shinobu Kinjo wrote: > You may need to increase ``choose_total_tries`` to more than 50 > (default) up to 100. > > - > http://docs.ceph.com/docs/master/rados/operations/crush-map/#ed

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Matyas Koszik
Looks like you've provided me with the solution, thanks! I've set the tunables to firefly, and now I only see the normal states associated with a recovering cluster, there're no more stale pgs. I hope it'll stay like this when it's done, but that'll take quite a while. Matyas On Fri, 17 Feb 20
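(The command itself isn't quoted in the thread; switching the tunables profile is normally the one-liner below, and it triggers substantial data movement:)

    # Assumed command; expect a large rebalance afterwards.
    ceph osd crush tunables firefly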

Re: [ceph-users] pgs stuck unclean

2017-02-17 Thread Shinobu Kinjo
On Sat, Feb 18, 2017 at 9:03 AM, Matyas Koszik wrote: > > > Looks like you've provided me with the solution, thanks! :) > I've set the tunables to firefly, and now I only see the normal states > associated with a recovering cluster, there're no more stale pgs. > I hope it'll stay like this when

[ceph-users] How safe is ceph pg repair these days?

2017-02-17 Thread Tracy Reed
I have a 3-replica cluster. A couple of times I have run into inconsistent PGs. I googled it, and the Ceph docs and various blogs say to run a repair first. But a couple of people on IRC and a mailing list thread from 2015 say that ceph blindly copies the primary over the secondaries and calls it good. http://

Re: [ceph-users] Experience with 5k RPM/archive HDDs

2017-02-17 Thread Mike Miller
Hi, don't go there. We tried this with SMR drives, which slow down to somewhere around 2-3 IOPS during backfilling/recovery, and that renders the cluster useless for client IO. Things might change in the future, but for now I would strongly recommend against SMR. Go for normal SATA driv

Re: [ceph-users] KVM/QEMU rbd read latency

2017-02-17 Thread Jason Dillaman
On Fri, Feb 17, 2017 at 3:35 PM, Phil Lacroute wrote: > I have a followup question about the debug logging. Is there any way to > dump the in-memory logs from the QEMU RBD client? If not (and I couldn’t > find a way to do this), then nothing is lost by disabling the logging on > client machines
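For what it's worth, in-memory client logs can generally be flushed via the admin socket, assuming one is configured for the librbd client; the socket filename below is hypothetical:

    # In [client] on the hypervisor, enable an admin socket for librbd, e.g.:
    #   admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
    # Then dump the in-memory log to the client's log file:
    ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.67890.asok log dump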

Re: [ceph-users] How safe is ceph pg repair these days?

2017-02-17 Thread Shinobu Kinjo
If ``ceph pg deep-scrub <pgid>`` does not work, then do ``ceph pg repair On Sat, Feb 18, 2017 at 10:02 AM, Tracy Reed wrote: > I have a 3 replica cluster. A couple times I have run into inconsistent > PGs. I googled it and ceph docs and various blogs say run a repair > first. But a couple people
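Spelled out with a hypothetical PG id (the actual id is not in the message):

    # 2.1f5 is a placeholder PG id.
    ceph health detail | grep inconsistent
    rados list-inconsistent-obj 2.1f5 --format=json-pretty
    ceph pg deep-scrub 2.1f5
    ceph pg repair 2.1f5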

Re: [ceph-users] How safe is ceph pg repair these days?

2017-02-17 Thread Tracy Reed
Well, that's the question... is that safe? Because the link to the mailing list post (possibly outdated) says that what you just suggested is definitely NOT safe. Is the mailing list post wrong? Has the situation changed? Exactly what does ceph pg repair do now? I suppose I could go dig into the code b