[ceph-users] Strange configuration with many SAN and few servers

2014-11-07 Thread Mario Giammarco
Hello, I need to build a Ceph test lab, and I have to do it with existing hardware. I have several iSCSI and Fibre Channel SANs but few servers. Imagine I have: - 4 SANs with 1 LUN on each SAN - 2 diskless (apart from a boot disk) servers. I mount two LUNs on the first server and two LUNs on the second server. Then (I

[ceph-users] How to detect degraded objects

2014-11-07 Thread Ta Ba Tuan
Hi everyone,

    111/57706299 objects degraded (0.001%)
    14918 active+clean
    1 active+clean+scrubbing+deep
    52 active+recovery_wait+degraded
    2 active+recovering+degraded

Ceph's state: 111/57706299 objects degraded. Some missing object

Re: [ceph-users] How to detect degraded objects

2014-11-07 Thread Sahana Lokeshappa
Hi Tuan, 14918 active+clean 1 active+clean+scrubbing+deep 52 active+recovery_wait+degraded 2 active+recovering+degraded This says that 2 + 52 PGs are degraded. You can run the command: ceph pg dump | grep degraded. You will get the list of PGs w
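
A minimal sketch of the inspection described above, assuming a 2014-era Ceph CLI (the PG id is illustrative):

    # List only the degraded placement groups
    ceph pg dump | grep degraded

    # Query one PG in detail to see its state and acting OSDs
    ceph pg 2.3f query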

Re: [ceph-users] buckets and users

2014-11-07 Thread Marco Garcês
So do I really need to create the region also? I thought it was using the default region, so I didn't have to create extra regions. Let me try to figure this out; the docs are a little bit confusing. Marco Garcês On Thu, Nov 6, 2014 at 6:39 PM, Craig Lewis wrote: > You need to tell each radosgw d

Re: [ceph-users] PG inconsistency

2014-11-07 Thread Sage Weil
On Thu, 6 Nov 2014, GuangYang wrote: > Hello Cephers, > Recently we observed a couple of inconsistencies in our Ceph cluster; > there were two major patterns leading to inconsistency, as I observed: 1) > EIO when reading the file, 2) the digest is inconsistent (for EC) even when > there is no read error.
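
For reference, a sketch of the usual way to locate and repair inconsistencies of this kind (the PG id is illustrative):

    # Find PGs flagged inconsistent after scrubbing
    ceph health detail | grep inconsistent

    # Ask the primary OSD to repair a specific PG
    ceph pg repair 3.1a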

Re: [ceph-users] Typical 10GbE latency

2014-11-07 Thread Stefan Priebe - Profihost AG
Hi, this is with a bonded Intel 10GbE (2x10Gbit/s) network. rtt min/avg/max/mdev = 0.053/0.107/0.184/0.034 ms I thought that the Mellanox stuff had lower latencies. Stefan On 06.11.2014 at 18:09, Robert LeBlanc wrote: > rtt min/avg/max/mdev = 0.130/0.157/0.190/0.016 ms > > IPoIB Mellanox Connec

Re: [ceph-users] How to detect degraded objects

2014-11-07 Thread Ta Ba Tuan
Hi Sahana, Thanks for your reply. But how do I list the objects of those PGs? :D Thanks! Tuan -- HaNoi-VietNam On 11/07/2014 04:22 PM, Sahana Lokeshappa wrote: Hi Tuan, 14918 active+clean 1 active+clean+scrubbing+deep 52 active+recovery_wait+degraded

Re: [ceph-users] How to detect degraded objects

2014-11-07 Thread Sahana Lokeshappa
Hi Tuan, As far as I know, there is no CLI for this as such. As an indirect way, when you do a pg dump, you will get the primary OSD assigned to every PG (check the primary header). Then browse the directory /var/lib/ceph/osd/ceph-<osd-id>/current/<pg-id>_head; all the objects of that PG reside there. Thanks Sahana Lokeshap
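
A sketch of that indirect approach, assuming a FileStore OSD layout (the OSD id and PG id are illustrative):

    # Find the primary OSD of each PG (check the primary column)
    ceph pg dump | less

    # On the node hosting that OSD, list the objects of the PG
    ls /var/lib/ceph/osd/ceph-12/current/2.3f_head/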

[ceph-users] RBD command crash & can't delete volume!

2014-11-07 Thread Chu Duc Minh
Hi folks, some volumes in my Ceph cluster have a problem and I can NOT delete them with the rbd command. When I show info or try to delete them, the rbd command crashes. Commands I used:

    # rbd -p volumes info volume-e110b0a5-5116-46f2-99c7-84bb546f15c2
    # rbd -p volumes rm volume-e110b0a5-5116-46f2-99c7-84bb546f15c2

[ceph-users] Cache pressure fail

2014-11-07 Thread Daniel Takatori Ohara
Hi, In my cluster, when I execute the command ceph health detail, it shows me this message: mds0: Many clients (17) failing to respond to cache pressure (client_count: ) This message appeared when I upgraded Ceph from 0.80.7 to 0.87. Can anyone help me? Thanks, Att. --- Daniel Takatori Ohara. Syste

[ceph-users] Ceph Monitoring with check_MK

2014-11-07 Thread Robert Sander
Hi, I just created a simple check_MK agent plugin and accompanying checks to monitor the overall health status and pool usage with the check_MK / OMD monitoring system: https://github.com/HeinleinSupport/check_mk/tree/master/ceph One question remains: what is the real unit of the ceph df output?

Re: [ceph-users] Installing CephFs via puppet

2014-11-07 Thread Loic Dachary
Hi, On 07/11/2014 05:07, JIten Shah wrote: > Thanks Loic. > > What is the recommended puppet module for installing cephFS ? Unless you're obliged to use puppet I would probably recommend using another tool such as ceph-deploy. But I don't know much about CephFS and someone else may have an au

[ceph-users] look into erasure coding

2014-11-07 Thread eric mourgaya
Hi, " In erasure coding pool, how do we know which OSDs keeping the data chunk and which one the keep the encoding chunk?" There was this question yesterday on ceph irc channel on erasure code. http://ceph.com/docs/giant/dev/erasure-coded-pool/, we really have a difference between k and m c

Re: [ceph-users] look into erasure coding

2014-11-07 Thread Loic Dachary
On 07/11/2014 15:37, eric mourgaya wrote: > Hi, > > "In an erasure-coded pool, how do we know which OSDs keep the data chunks > and which ones keep the coding chunks?" > > This question came up yesterday on the Ceph IRC channel about erasure code. > http://ceph.com/docs/giant/dev/erasure-cod
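
A sketch of how to observe this placement (the profile and names are illustrative). In an erasure-coded pool the acting set is listed in chunk order: the first k OSDs hold data chunks and the remaining m hold coding chunks:

    # Create an illustrative k=2, m=1 profile and an erasure-coded pool
    ceph osd erasure-code-profile set myprofile k=2 m=1
    ceph osd pool create ecpool 128 128 erasure myprofile

    # Map an object to its OSDs; the OSDs are listed in chunk order
    ceph osd map ecpool myobject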

Re: [ceph-users] Typical 10GbE latency

2014-11-07 Thread Robert LeBlanc
Infiniband has much lower latencies when performing RDMA and native IB traffic. Doing IPoIB adds all the Ethernet stuff that has to be done in software. Still it is comparable to Ethernet even with this disadvantage. Once Ceph has the ability to do native RDMA, Infiniband should have an edge. Robe

[ceph-users] Testing limitation of each component in Swift + radosgw

2014-11-07 Thread Narendra Trivedi (natrived)
Hi All, I have two haproxies in front of two radosgws. I need to test the limits of each component (i.e. haproxy or radosgw), such as the number of Swift API calls or concurrent container-creation operations before the haproxies give up, i.e. the breaking point, and similarly for radosgw. Does anyo
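
For the load-generation side, one common option is swift-bench (a sketch; the endpoint, credentials, and sizes are all illustrative):

    # Drive 10000 4 KB PUTs at concurrency 64 through the haproxy endpoint
    swift-bench -A http://haproxy-vip/auth/v1.0 -U test:tester -K secret \
        -c 64 -n 10000 -s 4096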

Re: [ceph-users] RBD command crash & can't delete volume!

2014-11-07 Thread Jason Dillaman
It appears you have discovered a bug in librbd that occurs when a child's parent image is missing or corrupt. I have opened the following ticket for this issue: http://tracker.ceph.com/issues/10030 . For the OSD failure, can you start a new email thread with the supporting details of that issu

Re: [ceph-users] RBD - possible to query "used space" of images/clones ?

2014-11-07 Thread Jason Dillaman
In the longer term, there is an in-progress RBD feature request to add a new RBD command to see image disk usage: http://tracker.ceph.com/issues/7746 -- Jason Dillaman Red Hat dilla...@redhat.com http://www.redhat.com - Original Message - From: "Sébastien Han" To: "Daniel Schwage
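
Until that feature lands, a commonly used workaround is to sum the extents reported by rbd diff (a sketch; the pool and image names are illustrative, and the result is an approximation of allocated space):

    # Approximate an image's used space by summing its allocated extents
    rbd diff volumes/myimage | awk '{ sum += $2 } END { print sum " bytes" }'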

Re: [ceph-users] Strange configuration with many SAN and few servers

2014-11-07 Thread Gregory Farnum
Yes, you can get the OSDs back if you replace the server. In fact, in your case you might not want to bother including hosts as a distinguishable entity in the crush map; and then to "replace the server" you could just mount the LUNs somewhere else and turn on the OSDs. You would need to set a few
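
If hosts are deliberately left out of the crush hierarchy, replication has to choose leaves across OSDs instead of hosts. A minimal sketch, for a fresh test lab only (this option affects data placement):

    # In ceph.conf before creating the cluster: replicate across OSDs, not hosts
    [global]
    osd crush chooseleaf type = 0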

Re: [ceph-users] Ceph Monitoring with check_MK

2014-11-07 Thread Gregory Farnum
I believe we use base-2 space accounting everywhere. Joao could confirm on that. -Greg On Fri, Nov 7, 2014 at 5:50 AM Robert Sander wrote: > Hi, > > I just create a simple check_MK agent plugin and accompanying checks to > monitor the overall health status and pool usage with the check_MK / OMD >

Re: [ceph-users] Typical 10GbE latency

2014-11-07 Thread Alexandre DERUMIER
Mellanox is also doing Ethernet now; see http://www.mellanox.com/page/products_dyn?product_family=163&mtag=sx1012 for example: - 220 nsec for 40GbE - 280 nsec for 10GbE And I think it's also possible to do RoCE (RDMA over Converged Ethernet) with Mellanox ConnectX-3 adapters. - Original Message - From:

Re: [ceph-users] Installing CephFs via puppet

2014-11-07 Thread Jean-Charles LOPEZ
Hi, with ceph-deploy do the following: 1) Install ceph-deploy 2) mkdir ~/ceph-deploy 3) cd ~/ceph-deploy 4) ceph-deploy --overwrite-conf config pull {monitorhostname} 5) If the version is Giant: a) ceph osd pool create cephfsdata b) ceph osd pool create cephfsmeta xxx c) ceph mds newfs {cephfsmeta_
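
Spelled out, steps 5a-5c look roughly like this (a sketch; the pool names, PG counts, and pool-id placeholders are illustrative):

    # Create the CephFS data and metadata pools
    ceph osd pool create cephfsdata 128
    ceph osd pool create cephfsmeta 128

    # Look up the numeric pool ids, then point the MDS at them (metadata first)
    ceph osd dump | grep pool
    ceph mds newfs <cephfsmeta-pool-id> <cephfsdata-pool-id> --yes-i-really-mean-it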

Re: [ceph-users] Cache pressure fail

2014-11-07 Thread Gregory Farnum
Did you upgrade your clients along with the MDS? This warning indicates the MDS asked the clients to boot some inodes out of cache and they have taken too long to do so. It might also just mean that you're actively using more inodes at any given time than your MDS is configured to keep in memory.
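
If the working set is simply larger than the cache, the MDS cache can be enlarged (a sketch; the value counts inodes and is illustrative):

    # In ceph.conf on the MDS node
    [mds]
    mds cache size = 200000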

Re: [ceph-users] Ceph Cluster with two radosgw

2014-11-07 Thread Yehuda Sadeh
On Wed, Nov 5, 2014 at 2:08 PM, lakshmi k s wrote: > Hello - > > My ceph cluster needs to have two rados gateway nodes eventually interfacing > with Openstack haproxy. I have been successful in bringing up one of them. > What are the steps for additional rados gateway node to be included in > clus

Re: [ceph-users] Ceph Monitoring with check_MK

2014-11-07 Thread Joao Eduardo Luis
On 11/07/2014 03:46 PM, Gregory Farnum wrote: I believe we use base-2 space accounting everywhere. Joao could confirm on that. Although unit formatting is set to SI, these are base-2 values: 2G or 2GB will in fact be (2 << 30) bytes. -Joao -Greg On Fri, Nov 7, 2014 at 5:50 AM Robert Sander
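
Concretely, a value printed as 2G therefore means 2 << 30 bytes rather than 2 * 10^9:

    python -c 'print(2 << 30)'     # prints 2147483648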

Re: [ceph-users] buckets and users

2014-11-07 Thread Craig Lewis
You need separate pools for the different zones, otherwise both zones will have the same data. You could use the defaults for the first zone, but the second zone will need its own. You might as well follow the convention of creating non-default pools for the zone. This is all semantics, but re

Re: [ceph-users] Installing CephFs via puppet

2014-11-07 Thread JIten Shah
Thanks JC and Loic but we HAVE to use puppet. That’s how all of our configuration and deployment stuff works, and I can’t stray away from it. Is https://github.com/enovance/puppet-ceph a good resource for cephFS? Has anyone used it successfully? —Jiten On Nov 7, 2014, at 9:09 AM, Jean-Charles L

Re: [ceph-users] Installing CephFs via puppet

2014-11-07 Thread Loic Dachary
Hi, On 07/11/2014 19:18, JIten Shah wrote: > Thanks JC and Loic but we HAVE to use puppet. That’s how all of our > configuration and deployment stuff works and I can’t sway away from it. > > Is https://github.com/enovance/puppet-ceph a good resource for cephFS? Has > anyone used it successfull

Re: [ceph-users] Is it normal that osd's memory exceed 1GB under stresstest?

2014-11-07 Thread Craig Lewis
It depends on which version of ceph, but it's pretty normal under newer versions. There are a bunch of variables. How many PGs per OSD, how much data is in the PGs, etc. I'm a bit light on the PGs (~60 PGs per OSD), and heavy on the data (~3 TiB of data on each OSD). In the production cluster,

Re: [ceph-users] osd down

2014-11-07 Thread Michael Nishimoto
Most likely, the drive mapping to /dev/sdl1 is going bad or is bad. I suggest power cycling it to see if the error is cleared. If the drive comes up, check out the SMART stats to see if sectors are starting to get remapped. It's possible that a transient error occurred. Mike On 11/6/14 5:06
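
A sketch of the check suggested above (the device name is taken from the thread; SMART attribute names vary by vendor):

    # Dump SMART health and attributes; watch for reallocated/pending sectors
    smartctl -a /dev/sdl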

Re: [ceph-users] osd down

2014-11-07 Thread Craig Lewis
I'd stop that osd daemon, and run xfs_check / xfs_repair on that partition. If you repair anything, you should probably force a deep-scrub on all the PGs on that disk. I think ceph osd deep-scrub will do that, but you might have to manually grep ceph pg dump. Or you could just treat it like a
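
Roughly, that sequence could look like this (a sketch; the OSD id, device, and service syntax are illustrative, and xfs_repair must only be run on an unmounted filesystem):

    # Stop the OSD, repair its (unmounted) filesystem, then restart it
    service ceph stop osd.12
    xfs_repair /dev/sdl1
    service ceph start osd.12

    # If anything was repaired, force a deep scrub of everything on that OSD
    ceph osd deep-scrub 12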

[ceph-users] RBD kernel module for CentOS?

2014-11-07 Thread Bruce McFarland
Can anyone point me to a RBD kmod for CentOS? Thanks.

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-07 Thread Craig Lewis
ceph-disk-prepare will give you the next unused number. So this will work only if the osd you remove is greater than 20. On Thu, Nov 6, 2014 at 12:12 PM, Chad Seys wrote: > Hi Craig, > > > You'll have trouble until osd.20 exists again. > > > > Ceph really does not want to lose data. Even if yo

Re: [ceph-users] RBD kernel module for CentOS?

2014-11-07 Thread Robert LeBlanc
I believe that the kernel-ml and kernel-lt packages from ELrepo have the RBD module already built (except for CentOS7 which will get it on the next kernel release). If you want to stay with the stock kernel, I don't have a good answer. I've had to rebuild the kernel to get RBD. On Fri, Nov 7, 2014
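
A sketch of installing the ELRepo mainline kernel mentioned above (assuming the elrepo-release package is already installed; package names per elrepo.org):

    # Install the mainline (kernel-ml) kernel from the ELRepo kernel repository
    yum --enablerepo=elrepo-kernel install kernel-ml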

Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-07 Thread Gregory Farnum
On Thu, Nov 6, 2014 at 11:49 AM, John Spray wrote: > This is still an issue on master, so a fix will be coming soon. > Follow the ticket for updates: > http://tracker.ceph.com/issues/10025 > > Thanks for finding the bug! John is off for a vacation, but he pushed a branch wip-10025-firefly that if

[ceph-users] MDS slow, logging rdlock failures

2014-11-07 Thread Erik Logtenberg
Hi, My MDS is very slow, and it logs stuff like this:

    2014-11-07 23:38:41.154939 7f8180a31700 0 log_channel(default) log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 187.777061 secs
    2014-11-07 23:38:41.154956 7f8180a31700 0 log_channel(default) log [WRN] : slow request 121.32

Re: [ceph-users] Typical 10GbE latency

2014-11-07 Thread Łukasz Jagiełło
Hi, rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms 04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01) on both hosts, with an Arista 7050S-64 in between. Both hosts were part of an active Ceph cluster. On Thu, Nov 6, 2014 at 5:18 AM, Wido den Holland

Re: [ceph-users] MDS slow, logging rdlock failures

2014-11-07 Thread Gregory Farnum
On Fri, Nov 7, 2014 at 2:40 PM, Erik Logtenberg wrote: > Hi, > > My MDS is very slow, and it logs stuff like this: > > 2014-11-07 23:38:41.154939 7f8180a31700 0 log_channel(default) log > [WRN] : 2 slow requests, 1 included below; oldest blocked for > > 187.777061 secs > 2014-11-07 23:38:41.15495
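
To see what a blocked request is actually waiting on, the MDS admin socket can be queried, if the MDS build supports the op tracker (a sketch; the socket path is the default and illustrative):

    # Dump in-flight operations on the MDS via its admin socket
    ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok dump_ops_in_flight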

Re: [ceph-users] Ceph Cluster with two radosgw

2014-11-07 Thread lakshmi k s
Yehuda - thanks much. I do have unique users for the two rados gateway nodes and have defined them accordingly in the Ceph configuration file. From the OpenStack controller node, I can talk to both nodes. Any thoughts on how to incorporate HA on the controller node and test the failover? On Friday, Novem
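
A minimal sketch of an haproxy section balancing two radosgw nodes with failover (hostnames and ports are illustrative):

    # /etc/haproxy/haproxy.cfg (illustrative excerpt)
    frontend rgw_front
        bind *:80
        default_backend rgw_back

    backend rgw_back
        balance roundrobin
        option httpchk GET /
        server rgw1 gateway1:80 check
        server rgw2 gateway2:80 check

With "check" enabled, haproxy drops a gateway from rotation when its health check fails, so failover can be tested by simply stopping one radosgw.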

Re: [ceph-users] Typical 10GbE latency

2014-11-07 Thread Gary M
Wido, Take the switch out of the path between nodes and remeasure. ICMP echo requests are very low-priority traffic for switches and network stacks. If you really want to know, place a network analyzer between the nodes to measure the request-packet-to-response-packet latency. The ICMP traffic
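
Short of a hardware analyzer, an application-level tool gives a more honest number than ICMP (a sketch; qperf must be installed on both hosts, and the hostname is illustrative):

    # On the server node, start the listener
    qperf

    # On the client node, measure TCP round-trip latency and bandwidth
    qperf server-host tcp_lat tcp_bw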

[ceph-users] questions about pg_log mechanism

2014-11-07 Thread chen jan
Hi all, I'm trying to test the pg_log mechanism under a stress test, using a simple 3-node, 15-OSD Ceph cluster (replica size 2). The test steps were: 1. Set mon_osd_down_out_interval to 2 days 2. Use FIO with at least 10 threads and librbd to send random 4KB r/w IOs continuously. 3.

Re: [ceph-users] questions about pg_log mechanism

2014-11-07 Thread chen jan
Hi all, Sorry, I need to correct something about what I observed. I found the pg_log file on the running OSD's disk, but the file size was 0, so I think the OSD may keep all pg_log data in memory. BTW, the Ceph version is 0.80.6. Thanks, Jan 2014-11-08 10:04 GMT+08:00 chen jan : > Hi all, > > I'm tr
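
The pg_log length an OSD keeps is bounded by configurables that can be inspected over the admin socket (a sketch; the OSD id is illustrative):

    # Show the pg_log limits of a running OSD
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep pg_log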

[ceph-users] Giant repository for Ubuntu Utopic?

2014-11-07 Thread Michael Taylor
It didn't take long at all for Trusty's repositories to show up on ceph.com once Trusty was in beta; is there a reason Utopic doesn't have repositories yet?

Re: [ceph-users] RBD command crash & can't delete volume!

2014-11-07 Thread Chu Duc Minh
Hi, I will start a new email thread, but I think it is related to this rbd bug. Do you have any suggestion for a quick fix for this buggy volume (e.g., a way to safely delete it, ...)? Maybe it is the reason I cannot start the last OSD. Thank you very much! On Fri, Nov 7, 2014 at 10:14 PM, Jason