Re: [ceph-users] PGs issue

2015-03-20 Thread Sahana
Hi Bogdan, Please paste the output of `ceph osd dump` and `ceph osd tree`. Thanks, Sahana On Fri, Mar 20, 2015 at 11:47 AM, Bogdan SOLGA wrote: > Hello, Nick! > > Thank you for your reply! I have tested both with setting the replicas > number to 2 and 3, by setting the 'osd pool default size = (

Re: [ceph-users] PHP Rados failed in read operation if object size is large (say more than 10 MB )

2015-03-20 Thread Gaurang Vyas
If I run it from the command prompt, it gives the error below in $piece = rados_read($ioRados, 'TEMP_object', $pieceSize['psize'], 0); -- Segmentation fault (core dumped) -- I have tried the new version of librados too... -- ph

Re: [ceph-users] OSD remains down

2015-03-20 Thread Sahana
Hi, If the mounted device is not coming up, you can replace it with a new disk and Ceph will handle rebalancing the data. Here are the steps if you would like to replace the failed disk with a new one: 1. ceph osd out osd.110 2. Now remove this failed OSD from the CRUSH map; as soon as it's removed from the cru
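
A minimal sketch of the rest of that sequence, assuming osd.110 is the failed OSD, the commands are run from a node with an admin keyring, and the replacement disk is /dev/sdX on host <host> (all placeholders):

  ceph osd out osd.110                      # stop mapping new data to it
  ceph osd crush remove osd.110             # drop it from the CRUSH map
  ceph auth del osd.110                     # delete its authentication key
  ceph osd rm 110                           # remove it from the osdmap
  ceph-deploy osd create <host>:/dev/sdX    # prepare and activate the new disk

Once the new OSD comes up and in, CRUSH backfills data onto it automatically.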

[ceph-users] OSD Force Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Hi all, can anybody tell me how I can force-delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node? Thanks Jesus Chavez SYSTEMS EN

Re: [ceph-users] PGs issue

2015-03-20 Thread Bogdan SOLGA
Hello, Sahana! The output of the requested commands is listed below: admin@cp-admin:~/safedrive$ ceph osd dump epoch 26 fsid 7db3cf23-ddcb-40d9-874b-d7434bd8463d created 2015-03-20 07:53:37.948969 modified 2015-03-20 08:11:18.813790 flags pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0

Re: [ceph-users] PGs issue

2015-03-20 Thread Nick Fisk
I see the problem: as your OSDs are only 8 GB, they have a zero weight. I think the minimum size you can get away with is 10 GB in Ceph, as the size is measured in TB and only has 2 decimal places. For a workaround, try running: ceph osd crush reweight osd.X 1 for each OSD; this will rewei
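
A hedged one-off loop for that workaround, assuming the cluster has six OSDs numbered 0-5 (adjust the ids and the weight as needed):

  for i in 0 1 2 3 4 5; do
      ceph osd crush reweight osd.$i 1
  done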

Re: [ceph-users] PGs issue

2015-03-20 Thread Bogdan SOLGA
Thank you for your suggestion, Nick! I have re-weighted the OSDs and the status has changed to '256 active+clean'. Is this information clearly stated in the documentation, and have I missed it? If it isn't, I think it would be worth adding, as the issue might be encountered by other

Re: [ceph-users] cciss driver package for RHEL7

2015-03-20 Thread Steffen W Sørensen
On 19/03/2015, at 17.46, O'Reilly, Dan wrote: > The problem with using the hpsa driver is that I need to install RHEL 7.1 on > a Proliant system using the SmartArray 400 controller. Therefore, I need a > driver that supports it to even install RHEL 7.1. RHEL 7.1 doesn’t > generically recogn

Re: [ceph-users] OSD + Flashcache + udev + Partition uuid

2015-03-20 Thread Burkhard Linke
Hi, On 03/19/2015 10:41 PM, Nick Fisk wrote: I'm looking at trialling OSDs with a small flashcache device over them to hopefully reduce the impact of metadata updates when doing small block I/O. Inspiration from here: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/12083 One thin

Re: [ceph-users] 'pgs stuck unclean' problem

2015-03-20 Thread Burkhard Linke
Hi, On 03/20/2015 01:58 AM, houguanghua wrote: Dear all, Ceph 0.72.2 is deployed on three hosts, but the cluster's status is HEALTH_WARN. The status is as follows: # ceph -s cluster e25909ed-25d9-42fd-8c97-0ed31eec6194 health HEALTH_WARN 768 pgs degraded; 768 pgs stuck u

Re: [ceph-users] SSD Hardware recommendation

2015-03-20 Thread Josef Johansson
> On 19 Mar 2015, at 08:17, Christian Balzer wrote: > > On Wed, 18 Mar 2015 08:59:14 +0100 Josef Johansson wrote: > >> Hi, >> >>> On 18 Mar 2015, at 05:29, Christian Balzer wrote: >>> >>> >>> Hello, >>> >>> On Wed, 18 Mar 2015 03:52:22 +0100 Josef Johansson wrote: >> > [snip] We thou

Re: [ceph-users] OSD Force Removal

2015-03-20 Thread Stéphane DUGRAVOT
----- Original Message ----- > Hi all, can anybody tell me how can I force delete osds? the thing is that > one node got corrupted because of outage, so there is no way to get those > osd up and back, is there anyway to force the removal from ceph-deploy node? Hi, Try the manual procedure: * http://ceph.c

Re: [ceph-users] OSD + Flashcache + udev + Partition uuid

2015-03-20 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Burkhard Linke > Sent: 20 March 2015 09:09 > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] OSD + Flashcache + udev + Partition uuid > > Hi, > > On 03/19/2015 10:41 PM, Nick Fi

Re: [ceph-users] PGs issue

2015-03-20 Thread Sahana
Hi Bogdan, Here is the link for hardware recommendations: http://ceph.com/docs/master/start/hardware-recommendations/#hard-disk-drives. As per this link, the minimum size recommended for OSDs is 1 TB. But as Nick said, Ceph OSDs must be at least 10 GB to get a weight of 0.01. Here is the snippet fr

[ceph-users] how to compute Ceph durability?

2015-03-20 Thread ghislain.chevalier
Hi all, I would like to compute the durability of data stored in a Ceph environment according to the cluster topology (failure domains) and the data resiliency (replication/erasure coding). Does such a tool exist? Best regards - - - - - - - - - - - - - - - - - Ghislain Chevalier ORANGE +33299124

Re: [ceph-users] how to compute Ceph durability?

2015-03-20 Thread Loic Dachary
Hi Ghislain, You will find more information about tools and methods at On 20/03/2015 11:47, ghislain.cheval...@orange.com wrote: > Hi all, > > > > I would like to compute the durability of data stored in a ceph environment > according to the cluster topology (failure domains) and the data

Re: [ceph-users] how to compute Ceph durability?

2015-03-20 Thread Loic Dachary
(that's what happens when typing Control-Enter V instead of Control-V enter ;-) On 20/03/2015 11:50, Loic Dachary wrote: > Hi Ghislain, > > You will find more information about tools and methods at https://wiki.ceph.com/Development/Reliability_model/Final_report Enjoy ! > > > On 20/03/2015 1

[ceph-users] Production Ceph :: PG data lost : Cluster PG incomplete, inactive, unclean

2015-03-20 Thread Karan Singh
Hello guys, My Ceph cluster lost data and now it's not recovering. This problem occurred when Ceph performed recovery while one of the nodes was down. Now all the nodes are up, but Ceph is showing PGs as incomplete, unclean, recovering. I have tried several things to recover them, like scrub, d

[ceph-users] centos vs ubuntu for production ceph cluster?

2015-03-20 Thread Alexandre DERUMIER
Hi, I'll build my full-SSD production cluster soon, and I wonder which distro is best tested by Inktank and the Ceph team. The ceph.com doc is quite old and doesn't have references for Giant or Hammer: http://ceph.com/docs/master/start/os-recommendations/ It seems that in the past only Ubuntu and RHEL were well tested, no

[ceph-users] Unable to create rbd snapshot on Centos 7

2015-03-20 Thread gian
Hi guys, I'm trying to test rbd snapshots on CentOS 7. # rbd -p rbd ls test-a test-b test-c test-d # rbd snap create rbd/test-b@snap rbd: failed to create snapshot: (22) Invalid argument 2015-03-20 15:22:56.300731 7f78f7afe880 -1 librbd: failed to create snap id: (22) Invalid argument I tr
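
One possible cause worth checking (an assumption, not something confirmed in this thread): RBD snapshots are "self-managed" snapshots, and a pool that already has pool-level snapshots cannot mix the two, which surfaces as EINVAL. Pool snapshots can be listed with:

  rados -p rbd lssnap    # any entries here mean the pool is in pool-snapshot mode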

Re: [ceph-users] OSD Force Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Thanks Stephane, the thing is that those steps need to be run on the node where the OSD lives. I don't have that node any more since the operating system got corrupted, so I couldn't make it work :( Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone

Re: [ceph-users] OSD Force Removal

2015-03-20 Thread Thomas Foster
Have you tried it from a different node? Like the ceph-mon or another ceph-osd node? On Fri, Mar 20, 2015 at 11:23 AM, Jesus Chavez (jeschave) < jesch...@cisco.com> wrote: > Thanks stephane the thing is that those steps needs to be run in the > node where the osd lives, I dont have that node any

Re: [ceph-users] OSD Force Removal

2015-03-20 Thread John Spray
On 20/03/2015 15:23, Jesus Chavez (jeschave) wrote: Thanks stephane the thing is that those steps needs to be run in the node where the osd lives, I dont have that node any more since the operating Systems got corrupted so I Couldnt make it work :( Assuming the OSD is already down+out, you c

Re: [ceph-users] centos vs ubuntu for production ceph cluster?

2015-03-20 Thread Quentin Hartman
For all intents and purposes, CentOS and RHEL are equivalent, so I'd not be too concerned about that distinction. I can't comment as to which distro is better tested by the Ceph devs, but assuming that the packages are built appropriately with similar dependency versions and whatnot, that also shouldn'

[ceph-users] Fwd: OSD Force Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Any idea how to force remove? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 Begin forwarded message: From: Stéphane DUGRAVOT mailto:stephane.dugra...@univ-lorraine.fr>> Date: March 20,

Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS

2015-03-20 Thread Gregory Farnum
On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid wrote: > Hi, > > I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with > cephFS. I have installed hadoop-1.1.1 in the nodes and changed the > conf/core-site.xml file according to the ceph documentation > http://ceph.com/docs/master/c

Re: [ceph-users] PGs issue

2015-03-20 Thread Bogdan SOLGA
Thank you for the clarifications, Sahana! I haven't got to that part yet, so these details were still unknown to me. Perhaps some information on the OSD weights should be provided in the 'quick deployment' page, as this issue might be encountered in the future by other users, as well. Kind regard

[ceph-users] mds log message

2015-03-20 Thread Daniel Takatori Ohara
Hello, Can anybody help me, please? Some messages appear in the log of my MDS, and afterwards the shell on my clients freezes. 2015-03-20 12:23:54.068005 7f1608d49700 0 log_channel(default) log [WRN] : client.3197487 isn't responding to mclientcaps(revoke), ino 11b1696 pending pAsxLsXsxFcb issued pAsxLsXs

Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS

2015-03-20 Thread Ridwan Rashid
Gregory Farnum writes: > > On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid wrote: > > Hi, > > > > I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with > > cephFS. I have installed hadoop-1.1.1 in the nodes and changed the > > conf/core-site.xml file according to the ceph docum

Re: [ceph-users] Server Specific Pools

2015-03-20 Thread Robert LeBlanc
You can create CRUSH rulesets and then assign pools to different rulesets. http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds On Thu, Mar 19, 2015 at 7:28 PM, Garg, Pankaj wrote: > Hi, > > > > I have a Ceph cluster with both ARM and x86 based server
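
A rough sketch of that approach, assuming two CRUSH roots named "arm" and "x86" already exist in the map (the rule names, pool name, and PG counts are illustrative):

  ceph osd crush rule create-simple arm-rule arm host
  ceph osd crush rule dump                             # note the numeric rule id
  ceph osd pool create arm-pool 256 256
  ceph osd pool set arm-pool crush_ruleset <rule-id>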

Re: [ceph-users] Fwd: OSD Force Removal

2015-03-20 Thread Robert LeBlanc
Removing the OSD from the CRUSH map and deleting the auth key is how you force remove an OSD. The OSD can no longer participate in the cluster, even if it does come back to life. All clients forget about the OSD when the new CRUSH map is distributed. On Fri, Mar 20, 2015 at 11:19 AM, Jesus Chavez

Re: [ceph-users] OSD + Flashcache + udev + Partition uuid

2015-03-20 Thread Robert LeBlanc
We tested bcache and abandoned it for two reasons. 1. Didn't give us any better performance than journals on SSD. 2. We had lots of corruption of the OSDs and were rebuilding them frequently. Since removing them, the OSDs have been much more stable. On Fri, Mar 20, 2015 at 4:03 AM, Nick

Re: [ceph-users] PGs issue

2015-03-20 Thread Robert LeBlanc
The weight can be based on anything: size, speed, capability, some random value, etc. The important thing is that it makes sense to you and that you are consistent. Ceph by default (ceph-disk, and I believe ceph-deploy) takes the approach of using size. So if you use a different weighting scheme, yo

Re: [ceph-users] PGs issue

2015-03-20 Thread Craig Lewis
This seems to be a fairly consistent problem for new users. The create-or-move is adjusting the CRUSH weight, not the OSD weight. Perhaps the init script should set the default weight to 0.01 if it's <= 0? It seems like there's a downside to this, but I don't see it. On Fri, Mar 20, 2015 at 1
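
A related knob, as an aside rather than something proposed in the thread: if the running version supports it, the initial CRUSH weight for newly created OSDs can be pinned in ceph.conf instead of being derived from the disk size, e.g.:

  [osd]
  osd crush initial weight = 0.01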

Re: [ceph-users] Production Ceph :: PG data lost : Cluster PG incomplete, inactive, unclean

2015-03-20 Thread Craig Lewis
> osdmap e261536: 239 osds: 239 up, 238 in Why is that last OSD not IN? The history you need is probably there. Run ceph pg query on some of the stuck PGs. Look for the recovery_state section. That should tell you what Ceph needs to complete the recovery. If you need more help, post the ou
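
A hedged example of those commands, with 1.24 standing in for one of the stuck PG ids:

  ceph pg dump_stuck unclean    # list candidate PGs
  ceph pg 1.24 query            # full JSON state; look for the recovery_state section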

Re: [ceph-users] mds log message

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 12:39 PM, Daniel Takatori Ohara wrote: > Hello, > > Anybody help me, please? Appear any messages in log of my mds. > > And after the shell of my clients freeze. > > 2015-03-20 12:23:54.068005 7f1608d49700 0 log_channel(default) log [WRN] : > client.3197487 isn't responding

Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 1:05 PM, Ridwan Rashid wrote: > Gregory Farnum writes: > >> >> On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid wrote: >> > Hi, >> > >> > I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with >> > cephFS. I have installed hadoop-1.1.1 in the nodes and chan

Re: [ceph-users] PGs issue

2015-03-20 Thread Robert LeBlanc
I like this idea. I was under the impression that udev did not call the init script, but ceph-disk directly. I don't see ceph-disk calling create-or-move, but I know it does because I see it in the ceph -w when I boot up OSDs. /lib/udev/rules.d/95-ceph-osd.rules # activate ceph-tagged partitions A

Re: [ceph-users] OSD Force Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Yes, that's exactly what I did, but "ceph osd tree" still shows the OSDs. Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 20, 2015, at 9:41 AM, John Spray mailto:john.sp...@redhat.com>> wrote:

Re: [ceph-users] Ceiling on number of PGs in an OSD

2015-03-20 Thread Craig Lewis
This isn't a hard limit on the number, but it's recommended that you keep it around 100. Smaller values cause data distribution evenness problems. Larger values cause the OSD processes to use more CPU, RAM, and file descriptors, particularly during recovery. With that many OSDs, you're going to w
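
The usual back-of-the-envelope sizing that follows from that ~100 PGs-per-OSD target (the OSD count and replica count below are only an example):

  total PGs ≈ (number of OSDs × 100) / replica count
  e.g. 240 OSDs × 100 / 3 = 8000  →  round up to a power of two: 8192

That total is then split across the pools roughly in proportion to the data each will hold.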

Re: [ceph-users] Fwd: OSD Force Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Maybe I should edit the CRUSH map and delete the OSDs... Is that a way to force them? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 20, 2015, at 2:21 PM, Robert LeBlanc mailto:rob...@l

Re: [ceph-users] RADOS Gateway Maturity

2015-03-20 Thread Craig Lewis
I have found a few incompatibilities, but so far they're all on the Ceph side. One example I remember was having to change the way we delete objects. The function we originally used fetches a list of object versions and deletes all versions. Ceph is implementing object versions now (I believe

Re: [ceph-users] Fwd: OSD Force Removal

2015-03-20 Thread Robert LeBlanc
Does it show DNE in the entry? That stands for Does Not Exist. It will disappear on its own after a while. I don't know what the timeout is, but they have always gone away within 24 hours. I've edited the CRUSH map before and I don't think it removed it when it was already DNE, I just had to wait

Re: [ceph-users] Uneven CPU usage on OSD nodes

2015-03-20 Thread Craig Lewis
I would say you're a little light on RAM. With 4TB disks 70% full, I've seen some ceph-osd processes using 3.5GB of RAM during recovery. You'll be fine during normal operation, but you might run into issues at the worst possible time. I have 8 OSDs per node, and 32G of RAM. I've had ceph-osd pr

Re: [ceph-users] OSD Force Removal

2015-03-20 Thread Jesus Chavez (jeschave)
This is the output if I try to remove it from the crush map; it says that it is already out… [root@capricornio ~]# ceph osd crush remove osd.29 device 'osd.29' does not appear in the crush map [root@capricornio ~]# [root@capricornio ~]# ceph osd tree | grep down # id weight type name up/d
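
If the CRUSH entry and auth key are already gone but the id still shows up in "ceph osd tree", the leftover is usually the osdmap entry itself; a hedged final step, using osd.29 from the output above:

  ceph osd rm 29    # removes the (down) OSD from the osdmap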

Re: [ceph-users] OSD Force Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Is that what you said? [root@capricornio ~]# ceph auth del osd.9 entity osd.9 does not exist [root@capricornio ~]# ceph auth del osd.19 entity osd.19 does not exist Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +5

Re: [ceph-users] Question Blackout

2015-03-20 Thread Craig Lewis
I'm not a CephFS user, but I have had a few cluster outages. Each OSD has a journal, and Ceph ensures that a write is in all of the journals (primary and replicas) before it acknowledges the write. If an OSD process crashes, it replays the journal on startup, and recovers the write. I've lost po

Re: [ceph-users] OSD Force Removal

2015-03-20 Thread Robert LeBlanc
Yes, at this point I'd export the CRUSH map, edit it, and import it back in. What version are you running? Robert LeBlanc Sent from a mobile device please excuse any typos. On Mar 20, 2015 4:28 PM, "Jesus Chavez (jeschave)" wrote: > thats what you sayd? > > [root@capricornio ~]# ceph auth del osd
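
A minimal sketch of that export-edit-import cycle (file names are arbitrary):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt    # decompile to editable text
  # delete the dead osd/device entries from crushmap.txt, then recompile:
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new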

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-20 Thread Chris Murray
Ah, I was wondering myself if compression could be causing an issue, but I'm reconsidering now. My latest experiment should hopefully help troubleshoot. So, I remembered that ZLIB is slower, but is more 'safe for old kernels'. I try that: find /var/lib/ceph/osd/ceph-1/current -xdev \( -type f -

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 4:03 PM, Chris Murray wrote: > Ah, I was wondering myself if compression could be causing an issue, but I'm > reconsidering now. My latest experiment should hopefully help troubleshoot. > > So, I remembered that ZLIB is slower, but is more 'safe for old kernels'. I > try

Re: [ceph-users] RADOS Gateway Maturity

2015-03-20 Thread Chris Jones
Hi Jerry, We are using RGW and RBD in our OpenStack clusters and as standalone clusters. We have six large clusters and are adding more. Most of the issues we have faced have been self-inflicted, such as not currently supporting bucket names that look like host names. Some S3 tools only work that way, which causes s

Re: [ceph-users] Question Blackout

2015-03-20 Thread Pavel V. Kaygorodov
Hi! We have experienced several blackouts on our small Ceph cluster. The most annoying problem is time desync just after a blackout: the mons don't start working until time is synced, and after a resync and a manual restart of the monitors, some PGs can stay stuck in "inactive" or "peering" state for a significant p
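
A hedged recovery sequence for the clock-skew part (sysvinit-style commands and ntpdate are assumptions; any time-sync method and init system will do):

  # on each monitor host: force a one-shot time sync, then restart the mon
  ntpdate -u pool.ntp.org
  /etc/init.d/ceph restart mon
  ceph status    # clock-skew warnings should clear once the mons re-form quorum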