Re: [ceph-users] High disk utilisation

2015-11-30 Thread Christian Balzer
Hello, On Mon, 30 Nov 2015 07:55:24 + MATHIAS, Bryn (Bryn) wrote: > Hi Christian, > > I’ll give you a much better dump of detail :) > > Running RHEL 7.1, > ceph version 0.94.5 > > all ceph disks are xfs, with journals on a partition on the disk > Disks: 6TB spinners. > OK, I was guessing

[ceph-users] RE: network failover with public/cluster network - is that possible

2015-11-30 Thread Межов Игорь Александрович
Hi! Götz Reinicke wrote: >>What if one of the networks fails? e.g. just on one host or the whole >>network for all nodes? >>Is there some sort of auto failover to use the other network for all traffic >>then? >>How does that work in real life? :) Or do I have to intervene by hand Alex Gorbachev
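
For reference, the two networks are declared in ceph.conf roughly like the sketch below (example subnets only); Ceph itself does not fail over from one network to the other, so redundancy is normally built underneath, e.g. with bonded NICs.

    [global]
    # example values, adjust to your environment
    public network  = 192.168.10.0/24
    cluster network = 192.168.20.0/24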

Re: [ceph-users] python3 librados

2015-11-30 Thread Wido den Hollander
On 29-11-15 20:20, misa-c...@hudrydum.cz wrote: > Hi everyone, > > for my pet project I needed a python3 rados library, so I took the > existing python2 rados code and cleaned it up a little to fit my needs. The > lib contains a basic interface, asynchronous operations and also an asyncio >

Re: [ceph-users] Undersized pgs problem

2015-11-30 Thread Vasiliy Angapov
Btw, in my configuration "mon osd downout subtree limit" is set to "host". Does it influence things? 2015-11-29 14:38 GMT+08:00 Vasiliy Angapov : > Bob, > Thanks for the explanation, sounds reasonable! But how could it happen that > a host is down and its OSDs are still IN the cluster? > I mean the NOOUT flag is
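
For reference, the option is spelled mon_osd_down_out_subtree_limit internally (default "rack"); a sketch of the setting being discussed, assuming it lives in ceph.conf:

    [mon]
    # do not automatically mark OSDs out when an entire subtree of this
    # type (e.g. a whole host) goes down at once
    mon osd down out subtree limit = host

With it set to host, OSDs on a host that fails as a unit are left IN, which could explain PGs staying undersized until someone marks those OSDs out by hand.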

Re: [ceph-users] High disk utilisation

2015-11-30 Thread MATHIAS, Bryn (Bryn)
Hi, > On 30 Nov 2015, at 13:44, Christian Balzer wrote: > > > Hello, > > On Mon, 30 Nov 2015 07:55:24 + MATHIAS, Bryn (Bryn) wrote: > >> Hi Christian, >> >> I’ll give you a much better dump of detail :) >> >> Running RHEL 7.1, >> ceph version 0.94.5 >> >> all ceph disks are xfs, with jo

[ceph-users] Removing OSD - double rebalance?

2015-11-30 Thread Carsten Schmitt
Hi all, I'm running ceph version 0.94.5 and I need to downsize my servers because of insufficient RAM. So I want to remove OSDs from the cluster and according to the manual it's a pretty straightforward process: I'm beginning with "ceph osd out {osd-num}" and the cluster starts rebalancing i

Re: [ceph-users] Removing OSD - double rebalance?

2015-11-30 Thread Wido den Hollander
On 30-11-15 10:08, Carsten Schmitt wrote: > Hi all, > > I'm running ceph version 0.94.5 and I need to downsize my servers > because of insufficient RAM. > > So I want to remove OSDs from the cluster and according to the manual > it's a pretty straightforward process: > I'm beginning with "ceph

Re: [ceph-users] Removing OSD - double rebalance?

2015-11-30 Thread Burkhard Linke
Hi Carsten, On 11/30/2015 10:08 AM, Carsten Schmitt wrote: Hi all, I'm running ceph version 0.94.5 and I need to downsize my servers because of insufficient RAM. So I want to remove OSDs from the cluster and according to the manual it's a pretty straightforward process: I'm beginning with "
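
For reference, the commonly recommended sequence that avoids moving data twice is to drain the OSD via its CRUSH weight before marking it out; a sketch, assuming osd.12 and sysvinit-style service control:

    # drain the OSD so data only migrates once
    ceph osd crush reweight osd.12 0
    # wait for the rebalance to finish (ceph -s), then:
    ceph osd out 12
    sudo service ceph stop osd.12    # or: systemctl stop ceph-osd@12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12

Going straight to "ceph osd out" and only later removing the OSD from the CRUSH map is what triggers the second rebalance the subject refers to.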

Re: [ceph-users] python3 librados

2015-11-30 Thread John Spray
On Sun, Nov 29, 2015 at 7:20 PM, wrote: > Hi everyone, > > for my pet project I needed a python3 rados library, so I took the > existing python2 rados code and cleaned it up a little to fit my needs. The > lib contains a basic interface, asynchronous operations and also an asyncio wrapper > for
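
For context, the python2 binding in question is the thin wrapper around librados shipped with Ceph; minimal usage looks roughly like this (a sketch, assuming a readable /etc/ceph/ceph.conf and an existing pool named 'rbd'):

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')        # pool name is an assumption
        ioctx.write_full('hello-object', b'hello ceph')
        print(ioctx.read('hello-object'))
        ioctx.close()
    finally:
        cluster.shutdown()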

[ceph-users] ceph-mon high cpu usage, and response slow

2015-11-30 Thread Yujian Peng
The mons in my production cluster (0.80.7) have very high CPU usage (100%). I added leveldb_compression = false to ceph.conf to disable leveldb compression and restarted all the mons with --compact, but the mons still have high CPU usage and respond to ceph commands very slowly. Here is the pe

Re: [ceph-users] ceph-mon high cpu usage, and response slow

2015-11-30 Thread Joao Eduardo Luis
On 11/30/2015 09:51 AM, Yujian Peng wrote: > The mons in my production cluster (0.80.7) have very high CPU usage (100%). > I added leveldb_compression = false to ceph.conf to disable leveldb > compression and restarted all the mons with --compact, but the mons still > have high CPU usage and
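
For reference, the knobs being discussed would sit in ceph.conf roughly as below; leveldb_compression is quoted from the poster's own config, and mon_compact_on_start is an assumption on my part as an alternative to restarting with --compact:

    [mon]
    leveldb compression = false
    # compact the mon store every time the daemon starts
    mon compact on start = true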

[ceph-users] Does anyone know how to enable clog debug?

2015-11-30 Thread Wukongming
Hi all, does anyone know how to enable clog debug? - wukongming ID: 12019 Tel: 0571-86760239 Dept: 2014 UIS2 OneStor ---

Re: [ceph-users] rbd_inst.create

2015-11-30 Thread Gregory Farnum
On Nov 27, 2015 3:34 AM, "NEVEU Stephane" wrote: > > Ok, I think I got it. It seems to come from here : > > tracker.ceph.com/issues/6047 > > > > I’m trying to snapshot an image in a pool where I previously made a pool-level snapshot, whereas it just works fine when using a brand new pool. I’m using ceph v0.

[ceph-users] RBD fiemap already safe?

2015-11-30 Thread Timofey Titovets
Hi list, AFAIK fiemap is disabled by default because it can cause rbd corruption. Has anyone already tested it with recent kernels? Thanks
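
For reference, the switch usually meant here is the OSD-side filestore option, which ships off by default; a sketch of where it would be toggled, assuming ceph.conf:

    [osd]
    # fiemap-based sparse reads; historically disabled due to corruption bugs
    filestore fiemap = false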

Re: [ceph-users] Ceph performances

2015-11-30 Thread Rémi BUISSON
Hello Robert, OK. I already tried this but, as you said, performance decreases. I just built the 10.0.0 version and it seems there are some regressions in there: I now get 3.5 kIOPS instead of 21 kIOPS with 9.2.0 :-/ Thanks. Rémi On 2015-11-25 18:54, Robert LeBlanc wrote: -BEGIN PGP

Re: [ceph-users] rbd_inst.create

2015-11-30 Thread Jason Dillaman
... and once you create a pool-level snapshot on a pool, there is no way to convert that pool back to being compatible with RBD self-managed snapshots. As for the RBD image feature bits, they are defined within rbd.py. On master, they currently are as follows: RBD_FEATURE_LAYERING = 1 RBD_FEAT
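
To illustrate how those bits get used from Python, creating a format 2 image with layering enabled looks roughly like this (a sketch against the hammer-era rbd.py; pool and image names are made up):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')              # pool name is an assumption

    # RBD_FEATURE_LAYERING == 1, passed as the 'features' bitmask
    rbd.RBD().create(ioctx, 'test-image', 1024 * 1024 * 1024,
                     old_format=False, features=rbd.RBD_FEATURE_LAYERING)

    ioctx.close()
    cluster.shutdown()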

Re: [ceph-users] Undersized pgs problem

2015-11-30 Thread Bob R
Vasiliy, I don't think that's the cause. Can you paste other tuning options from your ceph.conf? Also, have you fixed the problems with cephx auth? Bob On Mon, Nov 30, 2015 at 12:56 AM, Vasiliy Angapov wrote: > Btw, in my configuration "mon osd downout subtree limit" is set to "host". > Does

[ceph-users] RBD: Max queue size

2015-11-30 Thread Timofey Titovets
Hi list, Short: I just want to ask why I can't do: echo 129 > /sys/class/block/rbdX/queue/nr_requests i.e. why can't I set a value greater than 128? Why such a restriction? Long: Usage example: I have slow Ceph HDD-based storage and I want to export it through an iSCSI proxy machine for an ESXi cluster. If I have

Re: [ceph-users] RBD: Max queue size

2015-11-30 Thread Ilya Dryomov
On Mon, Nov 30, 2015 at 7:17 PM, Timofey Titovets wrote: > Hi list, > Short: > I just want to ask why I can't do: > echo 129 > /sys/class/block/rbdX/queue/nr_requests > > i.e. why can't I set a value greater than 128? > Why such a restriction? > > Long: > Usage example: > I have slow Ceph HDD-based st

Re: [ceph-users] python3 librados

2015-11-30 Thread misa-ceph
Hi John, thanks for the info. It seems that patch adds python3 compatibility support but leaves the ugly thread spawning intact. No idea if it makes sense to try to merge some of my changes back to the ceph source. Cheers On Monday 30 of November 2015 09:46:18 John Spray wrote: > On Sun, No

Re: [ceph-users] RBD: Max queue size

2015-11-30 Thread Timofey Titovets
On 30 Nov 2015 21:19, "Ilya Dryomov" wrote: > > On Mon, Nov 30, 2015 at 7:17 PM, Timofey Titovets wrote: > > Hi list, > > Short: > > I just want to ask why I can't do: > > echo 129 > /sys/class/block/rbdX/queue/nr_requests > > > > i.e. why can't I set a value greater than 128? > > Why such a restrict

Re: [ceph-users] RBD: Max queue size

2015-11-30 Thread Ilya Dryomov
On Mon, Nov 30, 2015 at 7:47 PM, Timofey Titovets wrote: > > On 30 Nov 2015 21:19, "Ilya Dryomov" wrote: >> >> On Mon, Nov 30, 2015 at 7:17 PM, Timofey Titovets >> wrote: >> > Hi list, >> > Short: >> > I just want to ask why I can't do: >> > echo 129 > /sys/class/block/rbdX/queue/nr_requests >> >
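
For anyone trying this at home, the behaviour is easy to reproduce (a sketch; rbd0 is whatever device rbd map gave you):

    cat /sys/class/block/rbd0/queue/nr_requests        # 128 on these kernels
    echo 129 > /sys/class/block/rbd0/queue/nr_requests # write is rejected

Recent kernels drive krbd through blk-mq, where the queue depth is fixed when the device is mapped, so the sysfs knob cannot be raised above it; if I recall correctly, newer kernels expose a queue_depth option to rbd map for exactly this reason.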

[ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Tom Christensen
We recently upgraded to 0.94.3 from firefly and now for the last week have had intermittent slow requests and flapping OSDs. We have been unable to nail down the cause, but it's feeling like it may be related to our osdmaps not getting deleted properly. Most of our osds are now storing over 100GB

[ceph-users] Namespaces and authentication

2015-11-30 Thread Daniel Schneller
Hi! On http://docs.ceph.com/docs/master/rados/operations/user-management/#namespace I read about auth namespaces. According to the most recent documentation it is still not supported by any of the client libraries, especially rbd. I have a client asking to get access to rbd volumes for Kuber
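
For what it's worth, the cap syntax described on that documentation page looks roughly like the sketch below; whether librbd actually scopes its objects into a namespace is exactly the open question, so this is illustration only (client, pool, and namespace names are made up):

    ceph auth get-or-create client.kube \
        mon 'allow r' \
        osd 'allow rw pool=rbd namespace=kube-ns'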

Re: [ceph-users] RBD: Max queue size

2015-11-30 Thread Timofey Titovets
Big thanks, Ilya, for the explanation. 2015-11-30 22:15 GMT+03:00 Ilya Dryomov : > On Mon, Nov 30, 2015 at 7:47 PM, Timofey Titovets > wrote: >> >> On 30 Nov 2015 21:19, "Ilya Dryomov" wrote: >>> >>> On Mon, Nov 30, 2015 at 7:17 PM, Timofey Titovets >>> wrote: >>> > Hi list, >>> > Short: >>> > I jus

[ceph-users] CRUSH Algorithm

2015-11-30 Thread James Gallagher
Hi, I was wondering what hash function the CRUSH algorithm uses; is there any way I can access the code for it? Or is it a commonly used one such as MD5 or SHA-1? Essentially, I'm just looking to read more information about it, as I'm interested in how this is used in order to look up objects ind

Re: [ceph-users] CRUSH Algorithm

2015-11-30 Thread Gregory Farnum
The code is in ceph/src/crush of the git repo, but it's pretty opaque. If you go to the Ceph site and look through the pages, there's one about "publications" (or maybe just documentation? I think publications) that hosts a paper on how CRUSH works. IIRC it's using the Jenkins hash on the object na
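
As a rough picture of the lookup chain the paper describes (a toy sketch, not Ceph's real code; Ceph uses an rjenkins hash, not the crc32 stand-in used here):

    import zlib

    def object_to_pg(object_name, pg_num):
        # hash the object name, then map the hash onto a placement group;
        # CRUSH then deterministically picks the OSDs that serve that PG
        return zlib.crc32(object_name.encode('utf-8')) % pg_num

    print(object_to_pg('rbd_data.1234.0000000000000000', 64))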

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Wido den Hollander
On 11/30/2015 08:56 PM, Tom Christensen wrote: > We recently upgraded to 0.94.3 from firefly and now for the last week > have had intermittent slow requests and flapping OSDs. We have been > unable to nail down the cause, but its feeling like it may be related to > our osdmaps not getting deleted
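
For anyone hitting the same thing, a quick way to see whether maps are being trimmed (a sketch; osd.0 and the FileStore path are examples, and the report field names may vary by release):

    # oldest_map vs newest_map on one OSD; a huge gap means maps are piling up
    ceph daemon osd.0 status
    # what the monitors have committed
    ceph report 2>/dev/null | grep -E 'osdmap_first_committed|osdmap_last_committed'
    # on-disk size of the cached maps
    du -sh /var/lib/ceph/osd/ceph-0/current/meta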

[ceph-users] Number of OSD map versions

2015-11-30 Thread George Mihaiescu
Hi, I've read the recommendation from CERN about the number of OSD maps ( https://cds.cern.ch/record/2015206/files/CephScaleTestMarch2015.pdf, page 3) and I would like to know if there is any negative impact from these changes: [global] osd map message max = 10 [osd] osd map cache size = 20 osd

Re: [ceph-users] Removing OSD - double rebalance?

2015-11-30 Thread Steve Anthony
It's probably worth noting that if you're planning on removing multiple OSDs in this manner, you should make sure they are not in the same failure domain, per your CRUSH rules. For example, if you keep one replica per node and three copies (as in the default) and remove OSDs from multiple nodes wit

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Tom Christensen
No, CPU and memory look normal. We haven't been fast/lucky enough with iostat to see if we're just slamming the disk itself; I keep trying to catch one, log into the node, find the disk and get iostat running before the OSD comes back up. We haven't flapped that many OSDs, and most

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Dan van der Ster
The trick with debugging heartbeat errors is to grep back through the log to find the last thing the affected thread was doing, e.g. is 0x7f5affe72700 stuck in messaging, writing to the disk, reading through the omap, etc.. I agree this doesn't look to be network related, but if you want to rule i

Re: [ceph-users] Number of OSD map versions

2015-11-30 Thread Dan van der Ster
I wouldn't run with those settings in production. That was a test to squeeze too many OSDs into too little RAM. Check the values from infernalis/master. Those should be safe. -- Dan On 30 Nov 2015 21:45, "George Mihaiescu" wrote: > Hi, > > I've read the recommendation from CERN about the number
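
A quick way to see what a given build actually defaults to, rather than guessing (a sketch; osd.0 is an example):

    # compiled-in defaults for this binary
    ceph-osd --show-config | grep -E 'osd_map_(cache_size|message_max|max_advance)'
    # what a running OSD is using right now
    ceph daemon osd.0 config show | grep osd_map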

Re: [ceph-users] python3 librados

2015-11-30 Thread Josh Durgin
On 11/30/2015 10:26 AM, misa-c...@hudrydum.cz wrote: Hi John, thanks for the info. It seems that patch adds python3 compatibility support but leaves the ugly thread spawning intact. No idea if it makes sense to try to merge some of my changes back to the ceph source. Yeah, like Wido mentione

Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

2015-11-30 Thread Tom Christensen
What counts as ancient? Concurrent with our hammer upgrade we went from 3.16 to 3.19 on Ubuntu 14.04. We are looking to revert to the 3.16 kernel we'd been running, because we're also seeing an intermittent (it's happened twice in two weeks) massive load spike that completely hangs the OSD node (we're ta

[ceph-users] High 0.94.5 OSD memory use at 8GB RAM/TB raw disk during recovery

2015-11-30 Thread Laurent GUERBY
Hi, We lost a disk today in our ceph cluster, so we added a new machine with 4 disks to replace the capacity and we activated the straw1 tunable too (we also tried straw2 but quickly backed out this change). During recovery, OSDs started crashing on all of our machines, the issue being OSD RAM usage th

Re: [ceph-users] High 0.94.5 OSD memory use at 8GB RAM/TB raw disk during recovery

2015-11-30 Thread Mark Nelson
Hi Laurent, Wow, that's excessive! I'd see if anyone else has any tricks first, but if nothing else helps, running an OSD through valgrind with massif will probably help pinpoint what's going on. Have you tweaked the recovery tunables at all? Mark On 11/30/2015 06:52 PM, Laurent GUERBY wr
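
For the record, the massif run Mark suggests looks roughly like this (a sketch; osd id 12 is an example, and expect the OSD to run far slower under valgrind):

    # run one OSD in the foreground under massif on a test box
    sudo valgrind --tool=massif /usr/bin/ceph-osd -f -i 12
    # reproduce the recovery, stop the OSD, then inspect the snapshots
    ms_print massif.out.<pid> | less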

Re: [ceph-users] High 0.94.5 OSD memory use at 8GB RAM/TB raw disk during recovery

2015-11-30 Thread Mark Nelson
Oh, forgot to ask, any core dumps? Mark On 11/30/2015 06:58 PM, Mark Nelson wrote: Hi Laurent, Wow, that's excessive! I'd see if anyone else has any tricks first, but if nothing else helps, running an OSD through valgrind with massif will probably help pinpoint what's going on. Have you twea

Re: [ceph-users] multi radosgw-agent

2015-11-30 Thread fangchen sun
Thanks for your response! I have two questions now; please help me. 1) There is one node in each cluster, with Ceph and radosgw deployed on it. One is the master zone, the other is the slave zone. I write data to the master zone with a bash script and run the radosgw agent on the slave zone

[ceph-users] ceph + openrc Long term

2015-11-30 Thread James
Hello, So I run systems using Gentoo's OpenRC. Ceph is interesting, but in the long term will it be mandatory to use systemd to keep using Ceph? Will there continue to be a supported branch that works with OpenRC? Long-range guidance is keenly appreciated. James