[ceph-users] Installing a ceph cluster from scratch
Hi Guys, I have been looking to try out a test Ceph cluster in my lab to see if it can replace our traditional storage. I have heard a lot of good things about Ceph but need some guidance on where to begin. I have read some material on ceph.com, but I wanted to get first-hand information from people who have actually installed it and have it running in their environment. Any help is appreciated. Thanks. —Jiten Shah ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Installing a ceph cluster from scratch
Thanks Steve. Appreciate your help. On Aug 25, 2014, at 9:58 AM, Stephen Jahl wrote: > Hi Jiten, > > The Ceph quick-start guide here was pretty helpful to me when I was starting > with my test cluster: http://ceph.com/docs/master/start/ > > ceph-deploy is a very easy way to get a test cluster up quickly, even with > minimal experience with Ceph. > > If you use puppet, the puppet-ceph module has an example for bringing up a > small test environment up on VMs, but there's a small amount of assembly > required: > https://github.com/stackforge/puppet-ceph/blob/master/USECASES.md#i-want-to-try-this-module,-heard-of-ceph,-want-to-see-it-in-action > > Cheers, > -Steve > > > On Fri, Aug 22, 2014 at 5:25 PM, JIten Shah wrote: > Hi Guys, > > I have been looking to try out a test ceph cluster in my lab to see if we can > replace it with our traditional storage. Heard a lot of good things about > Ceph but need some guidance on how to begin with. > > I have read some stuff on ceph.com but wanted to get a first hand info and > knowledge from the guys who have actually installed it and have it running in > their environment. > > Any help is appreciated. > > Thanks. > > —Jiten Shah > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
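For readers following the quick-start route mentioned above, here is a minimal sketch of the ceph-deploy sequence for a small test cluster; the hostnames (node1-node3) and the data disk (sdb) are placeholders for your own lab machines.

    # from an admin box with password-less ssh to all nodes
    ceph-deploy new node1                       # write an initial ceph.conf with node1 as the first monitor
    ceph-deploy install node1 node2 node3       # install the ceph packages on every node
    ceph-deploy mon create-initial              # create the monitor(s) and gather the keys
    ceph-deploy osd create node2:sdb node3:sdb  # prepare and activate one OSD per data disk
    ceph-deploy admin node1 node2 node3         # push the admin keyring so "ceph" commands work
    ceph health                                 # should eventually report HEALTH_OK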
[ceph-users] Getting error trying to activate the first OSD
I am getting the error below when trying to activate the first OSD:

[nk21l01si-d01-ceph001][INFO ] Running command: sudo ceph-disk -v activate --mark-init sysvinit --mount /var/local/osd0
[nk21l01si-d01-ceph001][WARNIN] DEBUG:ceph-disk:Cluster uuid is 08985bbc-5a98-4614-9267-3e0a91e7358b
[nk21l01si-d01-ceph001][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[nk21l01si-d01-ceph001][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
[nk21l01si-d01-ceph001][WARNIN] DEBUG:ceph-disk:OSD uuid is a0f81322-bf81-45f2-8d79-b2d694db44c5
[nk21l01si-d01-ceph001][WARNIN] DEBUG:ceph-disk:OSD id is 0
[nk21l01si-d01-ceph001][WARNIN] DEBUG:ceph-disk:Marking with init system sysvinit
[nk21l01si-d01-ceph001][WARNIN] DEBUG:ceph-disk:ceph osd.0 data dir is ready at /var/local/osd0
[nk21l01si-d01-ceph001][WARNIN] Traceback (most recent call last):
[nk21l01si-d01-ceph001][WARNIN] File "/usr/sbin/ceph-disk", line 2591, in
[nk21l01si-d01-ceph001][WARNIN] main()
[nk21l01si-d01-ceph001][WARNIN] File "/usr/sbin/ceph-disk", line 2569, in main
[nk21l01si-d01-ceph001][WARNIN] args.func(args)
[nk21l01si-d01-ceph001][WARNIN] File "/usr/sbin/ceph-disk", line 1929, in main_activate
[nk21l01si-d01-ceph001][WARNIN] init=args.mark_init,
[nk21l01si-d01-ceph001][WARNIN] File "/usr/sbin/ceph-disk", line 1771, in activate_dir
[nk21l01si-d01-ceph001][WARNIN] old = os.readlink(canonical)
[nk21l01si-d01-ceph001][WARNIN] OSError: [Errno 22] Invalid argument: '/var/lib/ceph/osd/ceph-0'
[nk21l01si-d01-ceph001][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v activate --mark-init sysvinit --mount /var/local/osd0
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
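In this traceback, ceph-disk fails in activate_dir() when os.readlink() raises EINVAL on /var/lib/ceph/osd/ceph-0, which usually means that path already exists but is not a symlink (for example, a plain directory left behind by an earlier attempt). A hedged way to check, assuming that is the cause:

    ls -ld /var/lib/ceph/osd/ceph-0        # a symlink pointing at /var/local/osd0 is expected here
    # if it is an ordinary directory from a previous run, move it out of the way and retry
    mv /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0.old
    ceph-deploy osd activate nk21l01si-d01-ceph001:/var/local/osd0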
[ceph-users] resizing the OSD
Hello Cephers, We created a Ceph cluster with 100 OSDs, 5 MONs and 1 MDS, and most things seem to be working fine, but we are seeing some degradation on the OSDs due to lack of space on them. Is there a way to resize the OSDs without bringing the cluster down? --jiten ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Filesystem - Production?
We ran into the same issue where we could not mount the filesystem on the clients because it had 3.9. Once we upgraded the kernel on the client node, we were able to mount it fine. FWIW, you need kernel 3.14 and above. --jiten On Sep 5, 2014, at 6:55 AM, James Devine wrote: > No messages in dmesg, I've updated the two clients to 3.16, we'll see if that > fixes this issue. > > > On Fri, Sep 5, 2014 at 12:28 AM, Yan, Zheng wrote: > On Fri, Sep 5, 2014 at 8:42 AM, James Devine wrote: > > I'm using 3.13.0-35-generic on Ubuntu 14.04.1 > > > > Was there any kernel message when the hang happened? We have fixed a > few bugs since 3.13 kernel, please use 3.16 kernel if possible. > > Yan, Zheng > > > > > On Thu, Sep 4, 2014 at 6:08 PM, Yan, Zheng wrote: > >> > >> On Fri, Sep 5, 2014 at 3:24 AM, James Devine wrote: > >> > It took a week to happen again, I had hopes that it was fixed but alas > >> > it is > >> > not. Looking at top logs on the active mds server, the load average was > >> > 0.00 the whole time and memory usage never changed much, it is using > >> > close > >> > to 100% and some swap but since I changed memory.swappiness swap usage > >> > hasn't gone up but has been slowly coming back down. Same symptoms, the > >> > mount on the client is unresponsive and a cat on > >> > /sys/kernel/debug/ceph/*/mdsc had a whole list of entries. A umount and > >> > remount seems to fix it. > >> > > >> > >> which version of kernel do you use ? > >> > >> Yan, Zheng > >> > >> > > >> > On Fri, Aug 29, 2014 at 11:26 AM, James Devine > >> > wrote: > >> >> > >> >> I am running active/standby and it didn't swap over to the standby. If > >> >> I > >> >> shutdown the active server it swaps to the standby fine though. When > >> >> there > >> >> were issues, disk access would back up on the webstats servers and a > >> >> cat of > >> >> /sys/kernel/debug/ceph/*/mdsc would have a list of entries whereas > >> >> normally > >> >> it would only list one or two if any. I have 4 cores and 2GB of ram on > >> >> the > >> >> mds machines. Watching it right now it is using most of the ram and > >> >> some of > >> >> swap although most of the active ram is disk cache. I lowered the > >> >> memory.swappiness value to see if that helps. I'm also logging top > >> >> output > >> >> if it happens again. > >> >> > >> >> > >> >> On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng wrote: > >> >>> > >> >>> On Fri, Aug 29, 2014 at 8:36 AM, James Devine > >> >>> wrote: > >> >>> > > >> >>> > On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum > >> >>> > wrote: > >> >>> >> > >> >>> >> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman > >> >>> >> wrote: > >> >>> >> > Is Ceph Filesystem ready for production servers? > >> >>> >> > > >> >>> >> > The documentation says it's not, but I don't see that mentioned > >> >>> >> > anywhere > >> >>> >> > else. > >> >>> >> > http://ceph.com/docs/master/cephfs/ > >> >>> >> > >> >>> >> Everybody has their own standards, but Red Hat isn't supporting it > >> >>> >> for > >> >>> >> general production use at this time. If you're brave you could test > >> >>> >> it > >> >>> >> under your workload for a while and see how it comes out; the known > >> >>> >> issues are very much workload-dependent (or just general concerns > >> >>> >> over > >> >>> >> polish). 
> >> >>> >> -Greg > >> >>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com > >> >>> >> ___ > >> >>> >> ceph-users mailing list > >> >>> >> ceph-users@lists.ceph.com > >> >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> >>> > > >> >>> > > >> >>> > > >> >>> > I've been testing it with our webstats since it gets live hits but > >> >>> > isn't > >> >>> > customer affecting. Seems the MDS server has problems every few > >> >>> > days > >> >>> > requiring me to umount and remount the ceph disk to resolve. Not > >> >>> > sure > >> >>> > if > >> >>> > the issue is resolved in development versions but as of 0.80.5 we > >> >>> > seem > >> >>> > to be > >> >>> > hitting it. I set the log verbosity to 20 so there's tons of logs > >> >>> > but > >> >>> > ends > >> >>> > with > >> >>> > >> >>> The cephfs client is supposed to be able to handle MDS takeover. > >> >>> what's symptom makes you umount and remount the cephfs ? > >> >>> > >> >>> > > >> >>> > 2014-08-24 07:10:19.682015 7f2b575e7700 10 mds.0.14 laggy, > >> >>> > deferring > >> >>> > client_request(client.92141:6795587 getattr pAsLsXsFs #1026dc1) > >> >>> > 2014-08-24 07:10:19.682021 7f2b575e7700 5 mds.0.14 is_laggy > >> >>> > 19.324963 > >> >>> > > 15 > >> >>> > since last acked beacon > >> >>> > 2014-08-24 07:10:20.358011 7f2b554e2700 10 mds.0.14 beacon_send > >> >>> > up:active > >> >>> > seq 127220 (currently up:active) > >> >>> > 2014-08-24 07:10:21.515899 7f2b575e7700 5 mds.0.14 is_laggy > >> >>> > 21.158841 > >> >>> > > 15 > >> >>> > since last acked beacon > >> >>> > 2014-08-24 07:10:21.515912 7f2b575e7700 10 mds.0.1
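For anyone hitting the same hang, a quick way to check the pieces discussed in this thread on a CephFS kernel client (the exact debugfs path depends on your cluster fsid and client id):

    uname -r                               # this thread suggests 3.14+ at minimum, 3.16 if possible
    dmesg | tail -n 50                     # look for ceph/libceph messages around the time of the hang
    cat /sys/kernel/debug/ceph/*/mdsc      # in-flight MDS requests; a long list that never drains indicates a stuck client/MDS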
Re: [ceph-users] resizing the OSD
Thanks Christian. Replies inline. On Sep 6, 2014, at 8:04 AM, Christian Balzer wrote: > > Hello, > > On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote: > >> Hello Cephers, >> >> We created a ceph cluster with 100 OSD, 5 MON and 1 MSD and most of the >> stuff seems to be working fine but we are seeing some degrading on the >> osd's due to lack of space on the osd's. > > Please elaborate on that degradation. The degradation happened on few OSD's because it got quickly filled up. They were not of the same size as the other OSD's. Now I want to remove these OSD's and readd them with correct size to match the others. > >> Is there a way to resize the >> OSD without bringing the cluster down? >> > > Define both "resize" and "cluster down". Basically I want to remove the OSD's with incorrect size and readd them with the size matching the other OSD's. > > As in, resizing how? > Are your current OSDs on disks/LVMs that are not fully used and thus could > be grown? > What is the size of your current OSDs? The size of current OSD's is 20GB and we do have more unused space on the disk that we can make the LVM bigger and increase the size of the OSD's. I agree that we need to have all the disks of same size and I am working towards that.Thanks. > > The normal way of growing a cluster is to add more OSDs. > Preferably of the same size and same performance disks. > This will not only simplify things immensely but also make them a lot more > predictable. > This of course depends on your use case and usage patterns, but often when > running out of space you're also running out of other resources like CPU, > memory or IOPS of the disks involved. So adding more instead of growing > them is most likely the way forward. > > If you were to replace actual disks with larger ones, take them (the OSDs) > out one at a time and re-add it. If you're using ceph-deploy, it will use > the disk size as basic weight, if you're doing things manually make sure > to specify that size/weight accordingly. > Again, you do want to do this for all disks to keep things uniform. > > If your cluster (pools really) are set to a replica size of at least 2 > (risky!) or 3 (as per Firefly default), taking a single OSD out would of > course never bring the cluster down. > However taking an OSD out and/or adding a new one will cause data movement > that might impact your cluster's performance. > We have a current replica size of 2 with 100 OSD's. How many can I loose without affecting the performance? I understand the impact of data movement. --Jiten > Regards, > > Christian > -- > Christian BalzerNetwork/Systems Engineer > ch...@gol.com Global OnLine Japan/Fusion Communications > http://www.gol.com/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] resizing the OSD
On Sep 6, 2014, at 8:22 PM, Christian Balzer wrote: > > Hello, > > On Sat, 06 Sep 2014 10:28:19 -0700 JIten Shah wrote: > >> Thanks Christian. Replies inline. >> On Sep 6, 2014, at 8:04 AM, Christian Balzer wrote: >> >>> >>> Hello, >>> >>> On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote: >>> >>>> Hello Cephers, >>>> >>>> We created a ceph cluster with 100 OSD, 5 MON and 1 MSD and most of >>>> the stuff seems to be working fine but we are seeing some degrading >>>> on the osd's due to lack of space on the osd's. >>> >>> Please elaborate on that degradation. >> >> The degradation happened on few OSD's because it got quickly filled up. >> They were not of the same size as the other OSD's. Now I want to remove >> these OSD's and readd them with correct size to match the others. > > Alright, that's good idea, uniformity helps. ^^ > >>> >>>> Is there a way to resize the >>>> OSD without bringing the cluster down? >>>> >>> >>> Define both "resize" and "cluster down". >> >> Basically I want to remove the OSD's with incorrect size and readd them >> with the size matching the other OSD's. >>> >>> As in, resizing how? >>> Are your current OSDs on disks/LVMs that are not fully used and thus >>> could be grown? >>> What is the size of your current OSDs? >> >> The size of current OSD's is 20GB and we do have more unused space on >> the disk that we can make the LVM bigger and increase the size of the >> OSD's. I agree that we need to have all the disks of same size and I am >> working towards that.Thanks. >>> > OK, so your OSDs are backed by LVM. > A curious choice, any particular reason to do so? We already had lvm’s carved out for some other project and were not using it so we decided to have OSD’s on those LVMs > > Either way, in theory you could grow things in place, obviously first the > LVM and then the underlying filesystem. Both ext4 and xfs support online > growing, so the OSD can keep running the whole time. > If you're unfamiliar with these things, play with them on a test machine > first. > > Now for the next step we will really need to know how you deployed ceph > and the result of "ceph osd tree" (not all 100 OSDs are needed, a sample of > a "small" and "big" OSD is sufficient). Fixed all the sizes so all of them weight as 1 [jshah@pv11p04si-mzk001 ~]$ ceph osd tree # idweight type name up/down reweight -1 99 root default -2 1 host pv11p04si-mslave0005 0 1 osd.0 up 1 -3 1 host pv11p04si-mslave0006 1 1 osd.1 up 1 -4 1 host pv11p04si-mslave0007 2 1 osd.2 up 1 -5 1 host pv11p04si-mslave0008 3 1 osd.3 up 1 -6 1 host pv11p04si-mslave0009 4 1 osd.4 up 1 -7 1 host pv11p04si-mslave0010 5 1 osd.5 up 1 > > Depending on the results (it will probably have varying weights depending > on the size and a reweight value of 1 for all) you will need to adjust the > weight of the grown OSD in question accordingly with "ceph osd crush > reweight". > That step will incur data movement, so do it one OSD at a time. > >>> The normal way of growing a cluster is to add more OSDs. >>> Preferably of the same size and same performance disks. >>> This will not only simplify things immensely but also make them a lot >>> more predictable. >>> This of course depends on your use case and usage patterns, but often >>> when running out of space you're also running out of other resources >>> like CPU, memory or IOPS of the disks involved. So adding more instead >>> of growing them is most likely the way forward. 
>>> >>> If you were to replace actual disks with larger ones, take them (the >>> OSDs) out one at a time and re-add it. If you're using ceph-deploy, it >>> will use the disk size as basic weight, if you're doing things >>> manually make sure to specify that size/weight accordingly. >>> Again, you do want to do this for all disks to keep things uniform. >>> >>> If your cluster (pools really) are set to a replica size of at least 2 >>&g
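Putting Christian's advice together, a rough sketch of growing one LVM-backed OSD in place and then adjusting its weight; the volume group/logical volume names, filesystem and sizes below are hypothetical, and this should be done one OSD at a time:

    lvextend -L +20G /dev/vg_ceph/osd0      # grow the LV backing the OSD
    xfs_growfs /var/lib/ceph/osd/ceph-0     # grow the filesystem online (resize2fs for ext4)
    ceph osd crush reweight osd.0 0.04      # bump the CRUSH weight to reflect the new size (weights are in TB by convention)
    ceph -w                                 # let the resulting data movement finish before touching the next OSD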
[ceph-users] Updating the pg and pgp values
While checking the health of the cluster, I ran into the following warning:

warning: health HEALTH_WARN too few pgs per osd (1 < min 20)

When I checked the pg and pgp numbers, I saw the value was the default of 64:

ceph osd pool get data pg_num
pg_num: 64
ceph osd pool get data pgp_num
pgp_num: 64

Checking the Ceph documents, I updated the numbers to 2000 using the following commands:

ceph osd pool set data pg_num 2000
ceph osd pool set data pgp_num 2000

It started resizing the data and I saw health warnings again:

health HEALTH_WARN 1 requests are blocked > 32 sec; pool data pg_num 2000 > pgp_num 64

and then:

ceph health detail
HEALTH_WARN 6 requests are blocked > 32 sec; 3 osds have slow requests
5 ops are blocked > 65.536 sec
1 ops are blocked > 32.768 sec
1 ops are blocked > 32.768 sec on osd.16
1 ops are blocked > 65.536 sec on osd.77
4 ops are blocked > 65.536 sec on osd.98
3 osds have slow requests

This warning also went away after a day:

ceph health detail
HEALTH_OK

Now, the question I have is: will this pg number remain effective on the cluster even if we restart the MONs or the OSDs on the individual disks? I haven’t changed the values in /etc/ceph/ceph.conf. Do I need to make a change to ceph.conf and push that change to all the MON, MDS and OSD nodes?

Thanks.

—Jiten ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
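For context on where a number like 2000 comes from, the commonly cited guideline in the Ceph docs is roughly (number of OSDs x 100) / replica count, rounded to a power of two; a sketch for a cluster like this one:

    # ~100 OSDs, replica size 2:  (100 * 100) / 2 = 5000  ->  round to 4096 (or 8192) total PGs
    ceph osd pool set data pg_num 4096
    ceph osd pool set data pgp_num 4096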
Re: [ceph-users] Updating the pg and pgp values
Thanks Greg. —Jiten On Sep 8, 2014, at 10:31 AM, Gregory Farnum wrote: > On Mon, Sep 8, 2014 at 10:08 AM, JIten Shah wrote: >> While checking the health of the cluster, I ran to the following error: >> >> warning: health HEALTH_WARN too few pgs per osd (1< min 20) >> >> When I checked the pg and php numbers, I saw the value was the default value >> of 64 >> >> ceph osd pool get data pg_num >> pg_num: 64 >> ceph osd pool get data pgp_num >> pgp_num: 64 >> >> Checking the ceph documents, I updated the numbers to 2000 using the >> following commands: >> >> ceph osd pool set data pg_num 2000 >> ceph osd pool set data pgp_num 2000 >> >> It started resizing the data and saw health warnings again: >> >> health HEALTH_WARN 1 requests are blocked > 32 sec; pool data pg_num 2000 > >> pgp_num 64 >> >> and then: >> >> ceph health detail >> HEALTH_WARN 6 requests are blocked > 32 sec; 3 osds have slow requests >> 5 ops are blocked > 65.536 sec >> 1 ops are blocked > 32.768 sec >> 1 ops are blocked > 32.768 sec on osd.16 >> 1 ops are blocked > 65.536 sec on osd.77 >> 4 ops are blocked > 65.536 sec on osd.98 >> 3 osds have slow requests >> >> This error also went away after a day. >> >> ceph health detail >> HEALTH_OK >> >> >> Now, the question I have is, will this pg number remain effective on the >> cluster, even if we restart MON or OSD’s on the individual disks? I haven’t >> changed the values in /etc/ceph/ceph.conf. Do I need to make a change to the >> ceph.conf and push that change to all the MON, MSD and OSD’s ? > > It's durable once the commands are successful on the monitors. You're all > done. > -Greg > Software Engineer #42 @ http://inktank.com | http://ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Updating the pg and pgp values
So, if it doesn’t refer to the entry in ceph.conf. Where does it actually store the new value? —Jiten On Sep 8, 2014, at 10:31 AM, Gregory Farnum wrote: > On Mon, Sep 8, 2014 at 10:08 AM, JIten Shah wrote: >> While checking the health of the cluster, I ran to the following error: >> >> warning: health HEALTH_WARN too few pgs per osd (1< min 20) >> >> When I checked the pg and php numbers, I saw the value was the default value >> of 64 >> >> ceph osd pool get data pg_num >> pg_num: 64 >> ceph osd pool get data pgp_num >> pgp_num: 64 >> >> Checking the ceph documents, I updated the numbers to 2000 using the >> following commands: >> >> ceph osd pool set data pg_num 2000 >> ceph osd pool set data pgp_num 2000 >> >> It started resizing the data and saw health warnings again: >> >> health HEALTH_WARN 1 requests are blocked > 32 sec; pool data pg_num 2000 > >> pgp_num 64 >> >> and then: >> >> ceph health detail >> HEALTH_WARN 6 requests are blocked > 32 sec; 3 osds have slow requests >> 5 ops are blocked > 65.536 sec >> 1 ops are blocked > 32.768 sec >> 1 ops are blocked > 32.768 sec on osd.16 >> 1 ops are blocked > 65.536 sec on osd.77 >> 4 ops are blocked > 65.536 sec on osd.98 >> 3 osds have slow requests >> >> This error also went away after a day. >> >> ceph health detail >> HEALTH_OK >> >> >> Now, the question I have is, will this pg number remain effective on the >> cluster, even if we restart MON or OSD’s on the individual disks? I haven’t >> changed the values in /etc/ceph/ceph.conf. Do I need to make a change to the >> ceph.conf and push that change to all the MON, MSD and OSD’s ? > > It's durable once the commands are successful on the monitors. You're all > done. > -Greg > Software Engineer #42 @ http://inktank.com | http://ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Updating the pg and pgp values
Thanks. How do I query the OSDMap on monitors? Using "ceph osd pool get data pg” ? or is there a way to get the full list of settings? —jiten On Sep 8, 2014, at 10:52 AM, Gregory Farnum wrote: > It's stored in the OSDMap on the monitors. > Software Engineer #42 @ http://inktank.com | http://ceph.com > > > On Mon, Sep 8, 2014 at 10:50 AM, JIten Shah wrote: >> So, if it doesn’t refer to the entry in ceph.conf. Where does it actually >> store the new value? >> >> —Jiten >> >> On Sep 8, 2014, at 10:31 AM, Gregory Farnum wrote: >> >>> On Mon, Sep 8, 2014 at 10:08 AM, JIten Shah wrote: >>>> While checking the health of the cluster, I ran to the following error: >>>> >>>> warning: health HEALTH_WARN too few pgs per osd (1< min 20) >>>> >>>> When I checked the pg and php numbers, I saw the value was the default >>>> value >>>> of 64 >>>> >>>> ceph osd pool get data pg_num >>>> pg_num: 64 >>>> ceph osd pool get data pgp_num >>>> pgp_num: 64 >>>> >>>> Checking the ceph documents, I updated the numbers to 2000 using the >>>> following commands: >>>> >>>> ceph osd pool set data pg_num 2000 >>>> ceph osd pool set data pgp_num 2000 >>>> >>>> It started resizing the data and saw health warnings again: >>>> >>>> health HEALTH_WARN 1 requests are blocked > 32 sec; pool data pg_num 2000 > >>>> pgp_num 64 >>>> >>>> and then: >>>> >>>> ceph health detail >>>> HEALTH_WARN 6 requests are blocked > 32 sec; 3 osds have slow requests >>>> 5 ops are blocked > 65.536 sec >>>> 1 ops are blocked > 32.768 sec >>>> 1 ops are blocked > 32.768 sec on osd.16 >>>> 1 ops are blocked > 65.536 sec on osd.77 >>>> 4 ops are blocked > 65.536 sec on osd.98 >>>> 3 osds have slow requests >>>> >>>> This error also went away after a day. >>>> >>>> ceph health detail >>>> HEALTH_OK >>>> >>>> >>>> Now, the question I have is, will this pg number remain effective on the >>>> cluster, even if we restart MON or OSD’s on the individual disks? I >>>> haven’t >>>> changed the values in /etc/ceph/ceph.conf. Do I need to make a change to >>>> the >>>> ceph.conf and push that change to all the MON, MSD and OSD’s ? >>> >>> It's durable once the commands are successful on the monitors. You're all >>> done. >>> -Greg >>> Software Engineer #42 @ http://inktank.com | http://ceph.com >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
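Since the values live in the OSDMap rather than in ceph.conf, they can be read back from the monitors at any time; a couple of ways to do that, using the pool name "data" from above:

    ceph osd dump | grep '^pool'        # dumps every pool with its pg_num/pgp_num, size, crush ruleset, etc.
    ceph osd pool get data pg_num
    ceph osd pool get data pgp_num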
[ceph-users] full/near full ratio
Looking at the docs (below), it seems like .95 and .85 are the default values for the full and near-full ratios, and if you reach the full ratio, the cluster stops reading and writing to avoid data corruption.

http://ceph.com/docs/master/rados/configuration/mon-config-ref/#storage-capacity

So, a few questions:

1. If we need to modify those numbers, do we need to update the values in ceph.conf and restart every OSD, or can we run a command on the MON that will override them?
2. What is the best way to get the OSDs working again if we reach the full ratio? You can’t delete the data because read/write is blocked.
3. If we add new OSDs, will the cluster start rebalancing on its own, or do I need to trigger it manually, and how?

Thanks —Jiten ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS mounting error
What does your mount command look like ? Sent from my iPhone 5S > On Sep 12, 2014, at 4:56 PM, Erick Ocrospoma wrote: > > Hi, > > I'm n00b in the ceph world, so here I go. I was following this tutorials > [1][2] (in case you need to know if I missed something), while trying to > mount a block from an isolated machine using cehpfs I got this error > (actually following what's there in [2]). > > mount error 5 = Input/output error > > I've searched on the internet but with no success. No idea what's going on, > and logs don't seem to have any clue of this error. My setup consists on 1 > mds server and 3 OSD servers, I perform all test with the root user, I've > seen on other tutorials (and on this aswell) using an specificar user, don't > if that could have impacted on the whole setup. > > > [1] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph > [2] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph&f=2 > > > -- > > > > ~ Happy install ! > > > > > > Erick. > > --- > > IRC : zerick > About : http://about.me/zerick > Linux User ID : 549567 > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS mounting error
Yes. It has to be the name of the MON server. If there are more than one MON servers, they all need to be listed. --Jiten Sent from my iPhone 5S > On Sep 12, 2014, at 6:44 PM, Jean-Charles LOPEZ wrote: > > Hi Erick, > > the address to use in the mount syntax is the address of your MON node, not > the one of the MDS node. > > Or may be you have deployed both a MON and an MDS on ceph01? > > > JC > > > >> On Sep 12, 2014, at 18:41, Erick Ocrospoma wrote: >> >> >> >> On 12 September 2014 20:32, JIten Shah wrote: >> What does your mount command look like ? >> >> >> mount -t ceph ceph01:/mnt /mnt -o name=admin,secretfile=/root/ceph/admin.key >> >> Where ceph01 is my mds server. >> >> >> Sent from my iPhone 5S >> >> >> >>> On Sep 12, 2014, at 4:56 PM, Erick Ocrospoma wrote: >>> >>> Hi, >>> >>> I'm n00b in the ceph world, so here I go. I was following this tutorials >>> [1][2] (in case you need to know if I missed something), while trying to >>> mount a block from an isolated machine using cehpfs I got this error >>> (actually following what's there in [2]). >>> >>> mount error 5 = Input/output error >>> >>> I've searched on the internet but with no success. No idea what's going >>> on, and logs don't seem to have any clue of this error. My setup consists >>> on 1 mds server and 3 OSD servers, I perform all test with the root user, >>> I've seen on other tutorials (and on this aswell) using an specificar user, >>> don't if that could have impacted on the whole setup. >>> >>> >>> [1] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph >>> [2] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph&f=2 >>> >>> >>> -- >>> >>> >>> >>> ~ Happy install ! >>> >>> >>> >>> >>> >>> Erick. >>> >>> --- >>> >>> IRC : zerick >>> About : http://about.me/zerick >>> Linux User ID : 549567 >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> >> >> -- >> >> >> >> ~ Happy install ! >> >> >> >> >> >> Erick. >> >> --- >> >> IRC : zerick >> About : http://about.me/zerick >> Linux User ID : 549567 >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
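For example, with three monitors the kernel mount line would look roughly like this (hostnames and paths below are placeholders):

    mount -t ceph mon1:6789,mon2:6789,mon3:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret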
Re: [ceph-users] CephFS mounting error
Here's an example: sudo mount -t ceph 192.168.0.1:6789:/ /mnt/mycephfs -o name=admin,secret=AQATSKdNGBnwLhAAnNDKnH65FmVKpXZJVasUeQ== Sent from my iPhone 5S > On Sep 12, 2014, at 7:14 PM, JIten Shah wrote: > > Yes. It has to be the name of the MON server. If there are more than one MON > servers, they all need to be listed. > > --Jiten > > Sent from my iPhone 5S > > > >> On Sep 12, 2014, at 6:44 PM, Jean-Charles LOPEZ wrote: >> >> Hi Erick, >> >> the address to use in the mount syntax is the address of your MON node, not >> the one of the MDS node. >> >> Or may be you have deployed both a MON and an MDS on ceph01? >> >> >> JC >> >> >> >>> On Sep 12, 2014, at 18:41, Erick Ocrospoma wrote: >>> >>> >>> >>> On 12 September 2014 20:32, JIten Shah wrote: >>> What does your mount command look like ? >>> >>> >>> mount -t ceph ceph01:/mnt /mnt -o name=admin,secretfile=/root/ceph/admin.key >>> >>> Where ceph01 is my mds server. >>> >>> >>> Sent from my iPhone 5S >>> >>> >>> >>>> On Sep 12, 2014, at 4:56 PM, Erick Ocrospoma wrote: >>>> >>>> Hi, >>>> >>>> I'm n00b in the ceph world, so here I go. I was following this tutorials >>>> [1][2] (in case you need to know if I missed something), while trying to >>>> mount a block from an isolated machine using cehpfs I got this error >>>> (actually following what's there in [2]). >>>> >>>> mount error 5 = Input/output error >>>> >>>> I've searched on the internet but with no success. No idea what's going >>>> on, and logs don't seem to have any clue of this error. My setup consists >>>> on 1 mds server and 3 OSD servers, I perform all test with the root user, >>>> I've seen on other tutorials (and on this aswell) using an specificar >>>> user, don't if that could have impacted on the whole setup. >>>> >>>> >>>> [1] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph >>>> [2] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph&f=2 >>>> >>>> >>>> -- >>>> >>>> >>>> >>>> ~ Happy install ! >>>> >>>> >>>> >>>> >>>> >>>> Erick. >>>> >>>> --- >>>> >>>> IRC : zerick >>>> About : http://about.me/zerick >>>> Linux User ID : 549567 >>>> ___ >>>> ceph-users mailing list >>>> ceph-users@lists.ceph.com >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >>> >>> >>> -- >>> >>> >>> >>> ~ Happy install ! >>> >>> >>> >>> >>> >>> Erick. >>> >>> --- >>> >>> IRC : zerick >>> About : http://about.me/zerick >>> Linux User ID : 549567 >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] CephFS mounting error
Sent from my iPhone 5S > On Sep 12, 2014, at 8:01 PM, Erick Ocrospoma wrote: > > > >> On 12 September 2014 21:16, JIten Shah wrote: >> Here's an example: >> >> sudo mount -t ceph 192.168.0.1:6789:/ /mnt/mycephfs -o >> name=admin,secret=AQATSKdNGBnwLhAAnNDKnH65FmVKpXZJVasUeQ== >> >> Sent from my iPhone 5S >> >> >> >>> On Sep 12, 2014, at 7:14 PM, JIten Shah wrote: >>> >>> Yes. It has to be the name of the MON server. If there are more than one >>> MON servers, they all need to be listed. >>> >>> --Jiten >>> >>> Sent from my iPhone 5S >>> >>> >>> >>>> On Sep 12, 2014, at 6:44 PM, Jean-Charles LOPEZ >>>> wrote: >>>> >>>> Hi Erick, >>>> >>>> the address to use in the mount syntax is the address of your MON node, >>>> not the one of the MDS node. >>>> >>>> Or may be you have deployed both a MON and an MDS on ceph01? > Omg, this is true, I was a complete silly boy :). It worked. Now I can see a > volume with the sum of the / of each mon server, (30GBx3=90), is that > correct? It should be the sum of all the OSDs not the sum of MON's unless your MON and OSD's are same. > > root@cephClient:~# mount -t ceph ceph02:/ /mnt -o > name=admin,secretfile=/root/ceph/admin.key > root@cephClient:~# df -h > FilesystemSize Used Avail Use% Mounted on > /dev/vda1 20G 2.1G 17G 12% / > none 4.0K 0 4.0K 0% /sys/fs/cgroup > udev 235M 4.0K 235M 1% /dev > tmpfs 50M 332K 49M 1% /run > none 5.0M 0 5.0M 0% /run/lock > none 246M 0 246M 0% /run/shm > none 100M 0 100M 0% /run/user > 104.131.25.161:/ 89G 27G 62G 30% /mnt > > I also created one rbd device on the mds server, is it mountable through > network? I just mounted the device on the mds server itself. > >>>> >>>> JC >>>> >>>> >>>> >>>>> On Sep 12, 2014, at 18:41, Erick Ocrospoma wrote: >>>>> >>>>> >>>>> >>>>> On 12 September 2014 20:32, JIten Shah wrote: >>>>> What does your mount command look like ? >>>>> >>>>> >>>>> mount -t ceph ceph01:/mnt /mnt -o >>>>> name=admin,secretfile=/root/ceph/admin.key >>>>> >>>>> Where ceph01 is my mds server. >>>>> >>>>> >>>>> Sent from my iPhone 5S >>>>> >>>>> >>>>> >>>>>> On Sep 12, 2014, at 4:56 PM, Erick Ocrospoma >>>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I'm n00b in the ceph world, so here I go. I was following this tutorials >>>>>> [1][2] (in case you need to know if I missed something), while trying to >>>>>> mount a block from an isolated machine using cehpfs I got this error >>>>>> (actually following what's there in [2]). >>>>>> >>>>>> mount error 5 = Input/output error >>>>>> >>>>>> I've searched on the internet but with no success. No idea what's going >>>>>> on, and logs don't seem to have any clue of this error. My setup >>>>>> consists on 1 mds server and 3 OSD servers, I perform all test with the >>>>>> root user, I've seen on other tutorials (and on this aswell) using an >>>>>> specificar user, don't if that could have impacted on the whole setup. >>>>>> >>>>>> >>>>>> [1] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph >>>>>> [2] http://www.server-world.info/en/note?os=Ubuntu_14.04&p=ceph&f=2 >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> >>>>>> ~ Happy install ! >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Erick. >>>>>> >>>>>> --- >>>>>> >>>>>> IRC : zerick >>>>>> About : http://about.me/zerick >>>>>> Linux User
Re: [ceph-users] full/near full ratio
Thanks Craig. That’s exactly what I was looking for. —Jiten On Sep 16, 2014, at 2:42 PM, Craig Lewis wrote: > > > On Fri, Sep 12, 2014 at 4:35 PM, JIten Shah wrote: > > 1. If we need to modify those numbers, do we need to update the values in > ceph.conf and restart every OSD or we can run a command on MON, that will > overwrite it? > > That will work. You can also update the values without a restart using: > ceph tell mon.\* injectargs '--mon_osd_nearfull_ratio 0.85' > > > You might also need to look at mon_osd_full_ratio, osd_backfill_full_ratio, > osd_failsafe_full_ratio, and osd_failsafe_nearfull_ratio. > > Variables that start with mon should be sent to all the monitors (ceph tell > mon.\* ...), variables that start with osd should be send to the osds (ceph > tell osd.\* ...). > > > > 2. What is the best way to get the OSD’s to work again, if we reach the full > ration amount? You can’t delete the data because read/write is blocked. > > Add more OSDs. Preferably before they become full, but it'll work if they're > toofull. It may take a while though, Ceph doesn't seem to weight which > backfills should be done first, so it might take a while to get to the OSDs > that are toofull. > > Since not everybody has nodes and disks laying around, you can stop all of > your writes, and bump the nearfull and full ratios. I've bumped them while I > was using ceph osd reweight, and had some toofull disks that wanted to > exchange PGs. Keep in mind that Ceph stops when the percentage is > than > toofull, so don't set full_ratio to 0.99. You really don't want to fill up > your disks. > > If all else fails (or you get a disk down to 0 kB free) you can manually > delete some PGs on disk. This is fairly risky, and prone to human error > causing data loss. You'll have to figure out the best ones to delete, and > you'll want to make sure you don't delete every replica of the PG. You'll > want to disable backfilling (ceph osd set nobackfill), otherwise Ceph will > repair things back to toofull. > > > > 3. If we add new OSD’s, will it start rebalancing the OSD’s or do I need to > trigger it manually and how? > > Adding and starting the OSDs will start rebalancing. The expected location > will change as soon as you add the OSD to the crushmap. Shortly after the > OSD starts, it will begin updating to make reality match expectations. For > most people, that happens in a single step, with ceph-deploy or a Config > Management tool. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
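To summarize the runtime knobs mentioned above, a short sketch; the injected values are examples only, and injectargs changes do not survive a daemon restart, so mirror anything you want to keep into ceph.conf:

    ceph tell mon.\* injectargs '--mon_osd_nearfull_ratio 0.87'
    ceph tell osd.\* injectargs '--osd_backfill_full_ratio 0.90'
    ceph osd set nobackfill        # optional: pause backfill while cleaning up, then "ceph osd unset nobackfill"
    ceph health detail             # confirm which OSDs are still near full / too full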
[ceph-users] Replication factor of 50 on a 1000 OSD node cluster
Hi Guys, We have a cluster with 1000 OSD nodes, 5 MON nodes and 1 MDS node. In order to be able to lose quite a few OSDs and still survive the load, we were thinking of setting the replication factor to 50. Is that too big a number? What are the performance implications, and are there any other issues we should consider before setting it that high? Also, do we need the same number of metadata copies, or can it be fewer? Thanks. —Jiten ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ssh; cannot resolve hostname errors
Please send your /etc/hosts contents here. --Jiten On Oct 15, 2014, at 7:27 AM, Support - Avantek wrote: > I may be completely overlooking something here but I keep getting “ssh; > cannot resolve hostname” when I try to contact my OSD node’s from my monitor > node. I have set the ipaddress’s of the 3 nodes in /etc/hosts as suggested on > the website. > > Thanks in advance > > James > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
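For reference, the kind of entries the quick-start expects in /etc/hosts on every node (the addresses and hostnames below are placeholders), plus a quick check from the monitor:

    192.168.10.11   ceph-admin
    192.168.10.12   ceph-node1
    192.168.10.13   ceph-node2
    192.168.10.14   ceph-node3

    # then, from the monitor/admin node:
    ping -c 1 ceph-node1
    ssh ceph-node1 hostname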
[ceph-users] Puppet module for CephFS
Hi Guys, We are trying to install CephFS using puppet on all the OSD nodes, as well as the MON and MDS nodes. Are there recommended puppet modules that anyone has used in the past, or has anyone created their own? Thanks. —Jiten ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Installing CephFs via puppet
Hi Guys, I am sure many of you guys have installed cephfs using puppet. I am trying to install “firefly” using the puppet module from https://github.com/ceph/puppet-ceph.git and running into the “ceph_config” file issue where it’s unable to find the config file and I am not sure why. Here’s the error I get while running puppet on one of the mon nodes: Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/mon_initial_members]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/fsid]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/auth_supported]: Could not evaluate: No ability to determine if ceph_config exists Error: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]: Could not evaluate: No ability to determine if ceph_config exists —Jiten___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Installing CephFs via puppet
Thanks Loic. What is the recommended puppet module for installing cephFS ? I can send more details about puppet-ceph but basically I haven't changed anything in there except for assigning values to the required params in the yaml file. --Jiten > On Nov 6, 2014, at 7:24 PM, Loic Dachary wrote: > > Hi, > > At the moment puppet-ceph does not support CephFS. The error you're seeing > does not ring a bell, would you have more context to help diagnose it ? > > Cheers > >> On 06/11/2014 23:44, JIten Shah wrote: >> Hi Guys, >> >> I am sure many of you guys have installed cephfs using puppet. I am trying >> to install “firefly” using the puppet module from >> https://github.com/ceph/puppet-ceph.git >> >> and running into the “ceph_config” file issue where it’s unable to find the >> config file and I am not sure why. >> >> Here’s the error I get while running puppet on one of the mon nodes: >> >> Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]: Could >> not evaluate: No ability to determine if ceph_config exists >> Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]: >> Could not evaluate: No ability to determine if ceph_config exists >> Error: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]: Could >> not evaluate: No ability to determine if ceph_config exists >> Error: /Stage[main]/Ceph/Ceph_config[global/mon_initial_members]: Could not >> evaluate: No ability to determine if ceph_config exists >> Error: /Stage[main]/Ceph/Ceph_config[global/fsid]: Could not evaluate: No >> ability to determine if ceph_config exists >> Error: /Stage[main]/Ceph/Ceph_config[global/auth_supported]: Could not >> evaluate: No ability to determine if ceph_config exists >> Error: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]: Could >> not evaluate: No ability to determine if ceph_config exists >> >> —Jiten >> >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > -- > Loïc Dachary, Artisan Logiciel Libre > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Installing CephFs via puppet
Thanks JC and Loic but we HAVE to use puppet. That’s how all of our configuration and deployment stuff works and I can’t sway away from it. Is https://github.com/enovance/puppet-ceph a good resource for cephFS? Has anyone used it successfully? —Jiten On Nov 7, 2014, at 9:09 AM, Jean-Charles LOPEZ wrote: > Hi, > > with ceps-deploy do the following > 1) Install ceph-deploy > 2) mkdir ~/ceph-deploy > 3) cd ~/ceph-deploy > 4) ceph-deploy --overwrite-conf config pull {monitorhostname} > 5) If version is Giant > a) ceph osd pool create cephfsdata > b) ceph odd pool create cephfsmeta xxx > c) ceph mds newfs {cephfsmeta_poolid} {cephfsdata_poolid} > 5) ceph-deploy mds create {mdshostname} > > Make sure you have password-less ssh access into the later host. > > I think this should do the trick > > JC > > > >> On Nov 6, 2014, at 20:07, JIten Shah wrote: >> >> Thanks Loic. >> >> What is the recommended puppet module for installing cephFS ? >> >> I can send more details about puppet-ceph but basically I haven't changed >> anything in there except for assigning values to the required params in the >> yaml file. >> >> --Jiten >> >> >> >>> On Nov 6, 2014, at 7:24 PM, Loic Dachary wrote: >>> >>> Hi, >>> >>> At the moment puppet-ceph does not support CephFS. The error you're seeing >>> does not ring a bell, would you have more context to help diagnose it ? >>> >>> Cheers >>> >>>> On 06/11/2014 23:44, JIten Shah wrote: >>>> Hi Guys, >>>> >>>> I am sure many of you guys have installed cephfs using puppet. I am trying >>>> to install “firefly” using the puppet module from >>>> https://github.com/ceph/puppet-ceph.git >>>> >>>> and running into the “ceph_config” file issue where it’s unable to find >>>> the config file and I am not sure why. >>>> >>>> Here’s the error I get while running puppet on one of the mon nodes: >>>> >>>> Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]: >>>> Could not evaluate: No ability to determine if ceph_config exists >>>> Error: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]: >>>> Could not evaluate: No ability to determine if ceph_config exists >>>> Error: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]: Could >>>> not evaluate: No ability to determine if ceph_config exists >>>> Error: /Stage[main]/Ceph/Ceph_config[global/mon_initial_members]: Could >>>> not evaluate: No ability to determine if ceph_config exists >>>> Error: /Stage[main]/Ceph/Ceph_config[global/fsid]: Could not evaluate: No >>>> ability to determine if ceph_config exists >>>> Error: /Stage[main]/Ceph/Ceph_config[global/auth_supported]: Could not >>>> evaluate: No ability to determine if ceph_config exists >>>> Error: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]: Could >>>> not evaluate: No ability to determine if ceph_config exists >>>> >>>> —Jiten >>>> >>>> >>>> ___ >>>> ceph-users mailing list >>>> ceph-users@lists.ceph.com >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>> >>> -- >>> Loïc Dachary, Artisan Logiciel Libre >>> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
Hi Guys, We ran into this issue after we nearly maxed out the OSDs. Since then, we have cleaned up a lot of data on the OSDs, but the pgs have seemed stuck for the last 4 to 5 days. I have run "ceph osd reweight-by-utilization" and that did not seem to work. Any suggestions?

ceph -s
    cluster 909c7fe9-0012-4c27-8087-01497c661511
     health HEALTH_WARN 224 pgs backfill; 130 pgs backfill_toofull; 86 pgs backfilling; 4 pgs degraded; 14 pgs recovery_wait; 324 pgs stuck unclean; recovery -11922/573322 objects degraded (-2.079%)
     monmap e5: 5 mons at {Lab-mon001=x.x.96.12:6789/0,Lab-mon002=x.x.96.13:6789/0,Lab-mon003=x.x.96.14:6789/0,Lab-mon004=x.x.96.15:6789/0,Lab-mon005=x.x.96.16:6789/0}, election epoch 28, quorum 0,1,2,3,4 Lab-mon001,Lab-mon002,Lab-mon003,Lab-mon004,Lab-mon005
     mdsmap e6: 1/1/1 up {0=Lab-mon001=up:active}
     osdmap e10598: 495 osds: 492 up, 492 in
      pgmap v1827231: 21568 pgs, 3 pools, 221 GB data, 184 kobjects
            4142 GB used, 4982 GB / 9624 GB avail
            -11922/573322 objects degraded (-2.079%)
                   9 active+recovery_wait
               21244 active+clean
                  90 active+remapped+wait_backfill
                   5 active+recovery_wait+remapped
                   4 active+degraded+remapped+wait_backfill
                 130 active+remapped+wait_backfill+backfill_toofull
                  86 active+remapped+backfilling
  client io 0 B/s rd, 0 op/s

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
Thanks Chad. It seems to be working. —Jiten On Nov 11, 2014, at 12:47 PM, Chad Seys wrote: > Find out which OSD it is: > > ceph health detail > > Squeeze blocks off the affected OSD: > > ceph osd reweight OSDNUM 0.8 > > Repeat with any OSD which becomes toofull. > > Your cluster is only about 50% used, so I think this will be enough. > > Then when it finishes, allow data back on OSD: > > ceph osd reweight OSDNUM 1 > > Hopefully ceph will someday be taught to move PGs in a better order! > Chad. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
Actually there were 100’s that were too full. We manually set the OSD weights to 0.5 and it seems to be recovering. Thanks of the tips on crush reweight. I will look into it. —Jiten On Nov 11, 2014, at 1:37 PM, Craig Lewis wrote: > How many OSDs are nearfull? > > I've seen Ceph want two toofull OSDs to swap PGs. In that case, I > dynamically raised mon_osd_nearfull_ratio and osd_backfill_full_ratio a bit, > then put it back to normal once the scheduling deadlock finished. > > Keep in mind that ceph osd reweight is temporary. If you mark an osd OUT > then IN, the weight will be set to 1.0. If you need something that's > persistent, you can use ceph osd crush reweight osd.NUM . Look > at ceph osd tree to get the current weight. > > I also recommend stepping towards your goal. Changing either weight can > cause a lot of unrelated migrations, and the crush weight seems to cause more > than the osd weight. I step osd weight by 0.125, and crush weight by 0.05. > > > On Tue, Nov 11, 2014 at 12:47 PM, Chad Seys wrote: > Find out which OSD it is: > > ceph health detail > > Squeeze blocks off the affected OSD: > > ceph osd reweight OSDNUM 0.8 > > Repeat with any OSD which becomes toofull. > > Your cluster is only about 50% used, so I think this will be enough. > > Then when it finishes, allow data back on OSD: > > ceph osd reweight OSDNUM 1 > > Hopefully ceph will someday be taught to move PGs in a better order! > Chad. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
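Pulling the two suggestions together, a hedged sketch of the stepped approach; OSD 98 is just a hypothetical id, and the step sizes follow the guidance above:

    ceph health detail | grep -i full     # list the nearfull/toofull OSDs
    ceph osd reweight 98 0.875            # temporary override; step by ~0.125 as suggested above
    ceph -w                               # let the resulting backfill settle before the next step
    ceph osd reweight 98 1                # restore once the cluster has drained below the ratios
    # for a persistent change, adjust the CRUSH weight instead, in ~0.05 steps:
    ceph osd crush reweight osd.98 0.95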
Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull
I agree. This was just our brute-force method on our test cluster. We won't do this on production cluster. --Jiten On Nov 11, 2014, at 2:11 PM, cwseys wrote: > 0.5 might be too much. All the PGs squeezed off of one OSD will need to be > stored on another. The fewer you move the less likely a different OSD will > become toofull. > > Better to adjust in small increments as Craig suggested. > > Chad. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Installing CephFs via puppet
Hi Guys, I got 3 MONs and 3 OSD’s configured via puppet but when trying to run MDS on the MON servers, I am running into the error, which I can’t figure out. BTW, I already manually installed MDS on cephmon001 and trying to install the second MDS via puppet to see if I can have multiple MDS’s running at the same time ( I am aware that only one MDS can be active at any given time in Firefly). [jshah@Lab-cephmon002 mds]$ cat /etc/ceph/ceph.conf [global] osd_pool_default_pgp_num = 200 osd_pool_default_min_size = 1 auth_service_required = none mon_initial_members = Lab-cephmon001, Lab-cephmon002, Lab-cephmon003 fsid = 2e738cda-1930-48cd-a4b1-74bc737c5d56 cluster_network = auth_supported = none auth_cluster_required = none mon_host = X.X.16.111:6789, X.X.16.78:6789, X.X.16.115:6789 auth_client_required = none osd_pool_default_size = 3 osd_pool_default_pg_num = 200 public_network = X.X.16.0/23 [mds] mds_data = /var/lib/ceph/mds/mds.Lab-cephmon002 keyring = /var/lib/ceph/mds/mds.Lab-cephmon002/keyring [jshah@Lab-cephmon002 mds]$ sudo service ceph start mds.Lab-cephmon002 /etc/init.d/ceph: mds.Lab-cephmon002 not found (/etc/ceph/ceph.conf defines mon.Lab-cephmon002 mds.cephmon002 , /var/lib/ceph defines mon.Lab-cephmon002 mds.cephmon002) [jshah@Lab-cephmon002 mds]$ ceph -s cluster 2e738cda-1930-48cd-a4b1-74bc737c5d56 health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; recovery 40/60 objects degraded (66.667%) monmap e2: 3 mons at {Lab-cephmon001=X.X.16.111:6789/0,Lab-cephmon002=X.X.16.78:6789/0,Lab-cephmon003=X.X.16.115:6789/0}, election epoch 8, quorum 0,1,2 Lab-cephmon002,Lab-cephmon001,Lab-cephmon003 mdsmap e4: 1/1/1 up {0=Lab-cephmon001=up:active} osdmap e49: 3 osds: 3 up, 2 in pgmap v3926: 192 pgs, 3 pools, 2230 bytes data, 20 objects 16757 MB used, 4898 MB / 22863 MB avail 40/60 objects degraded (66.667%) 28 active+degraded 164 active+degraded+remapped —Jiten On Nov 10, 2014, at 8:58 AM, Francois Charlier wrote: > - Original Message - >> From: "JIten Shah" >> To: "Jean-Charles LOPEZ" >> Cc: "ceph-users" >> Sent: Friday, November 7, 2014 7:18:10 PM >> Subject: Re: [ceph-users] Installing CephFs via puppet >> >> Thanks JC and Loic but we HAVE to use puppet. That’s how all of our >> configuration and deployment stuff works and I can’t sway away from it. >> >> Is https://github.com/enovance/puppet-ceph a good resource for cephFS? Has >> anyone used it successfully? >> > > Hi, > > This module doesn't currently doesn't provide any mean to deploy CephFS. > -- > François Charlier Software Engineer > // eNovance SAS http://www.enovance.com/ > // ✉ francois.charl...@enovance.com ☎ +33 1 49 70 99 81 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
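The init script error above is a naming mismatch: the sysvinit script only starts daemons whose ceph.conf section and /var/lib/ceph directory match the id you pass it, and here it reports the MDS id as "cephmon002" rather than "Lab-cephmon002". A hedged sketch of the two ways to reconcile it:

    # either start the id the init script actually found:
    sudo service ceph start mds.cephmon002

    # or, to keep the id "Lab-cephmon002", give it its own section in /etc/ceph/ceph.conf,
    #   [mds.Lab-cephmon002]
    #       host = Lab-cephmon002
    # put the keyring under the data directory the init script expects
    # (typically /var/lib/ceph/mds/ceph-Lab-cephmon002 for a cluster named "ceph"),
    # and then:  sudo service ceph start mds.Lab-cephmon002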
[ceph-users] Recreating the OSD's with same ID does not seem to work
Hi Guys, I had to rekick some of the hosts where OSD’s were running and after re-kick, when I try to run puppet and install OSD’s again, it gives me a key mismatch error (as below). After the hosts were shutdown for rekick, I removed the OSD’s from the osd tree and the crush map too. Why is it still tied to the old key? Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: + test -b /osd-data Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: + mkdir -p /osd-data Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: + ceph-disk prepare /osd-data Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: + test -b /osd-data Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: + ceph-disk activate /osd-data Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: got monmap epoch 2 Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: 2014-11-15 00:31:25.951783 77fe67a0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: 2014-11-15 00:31:26.023037 77fe67a0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: 2014-11-15 00:31:26.023809 77fe67a0 -1 filestore(/osd-data) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: 2014-11-15 00:31:26.032044 77fe67a0 -1 created object store /osd-data journal /osd-data/journal for osd.2 fsid 2e738cda-1930-48cd-a4b1-74bc737c5d56 Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: 2014-11-15 00:31:26.032097 77fe67a0 -1 auth: error reading file: /osd-data/keyring: can't open /osd-data/keyring: (2) No such file or directory Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: 2014-11-15 00:31:26.032189 77fe67a0 -1 created new key in keyring /osd-data/keyring Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: Error EINVAL: entity osd.2 exists but key does not match Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: Traceback (most recent call last): Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: File "/usr/sbin/ceph-disk", line 2591, in Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: main() Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: File "/usr/sbin/ceph-disk", line 2569, in main Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: args.func(args) Notice: 
/Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: File "/usr/sbin/ceph-disk", line 1929, in main_activate Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: init=args.mark_init, Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: File "/usr/sbin/ceph-disk", line 1761, in activate_dir Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: (osd_id, cluster) = activate(path, activate_key_template, init) Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: File "/usr/sbin/ceph-disk", line 1897, in activate Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: keyring=keyring, Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: File "/usr/sbin/ceph-disk", line 1520, in auth_key Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: 'mon', 'allow profile osd', Notice: /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: File
Re: [ceph-users] Recreating the OSD's with same ID does not seem to work
But I am not using “cephx” for authentication. I have already disabled that. —Jiten On Nov 14, 2014, at 4:44 PM, Gregory Farnum wrote: > You didn't remove them from the auth monitor's keyring. If you're > removing OSDs you need to follow the steps in the documentation. > -Greg > > On Fri, Nov 14, 2014 at 4:42 PM, JIten Shah wrote: >> Hi Guys, >> >> I had to rekick some of the hosts where OSD’s were running and after >> re-kick, when I try to run puppet and install OSD’s again, it gives me a key >> mismatch error (as below). After the hosts were shutdown for rekick, I >> removed the OSD’s from the osd tree and the crush map too. Why is it still >> tied to the old key? >> >> >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> + test -b /osd-data >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> + mkdir -p /osd-data >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> + ceph-disk prepare /osd-data >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> + test -b /osd-data >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> + ceph-disk activate /osd-data >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> got monmap epoch 2 >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> 2014-11-15 00:31:25.951783 77fe67a0 -1 journal FileJournal::_open: >> disabling aio for non-block journal. Use journal_force_aio to force use of >> aio anyway >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> 2014-11-15 00:31:26.023037 77fe67a0 -1 journal FileJournal::_open: >> disabling aio for non-block journal. 
Use journal_force_aio to force use of >> aio anyway >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> 2014-11-15 00:31:26.023809 77fe67a0 -1 filestore(/osd-data) could not >> find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> 2014-11-15 00:31:26.032044 77fe67a0 -1 created object store /osd-data >> journal /osd-data/journal for osd.2 fsid >> 2e738cda-1930-48cd-a4b1-74bc737c5d56 >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> 2014-11-15 00:31:26.032097 77fe67a0 -1 auth: error reading file: >> /osd-data/keyring: can't open /osd-data/keyring: (2) No such file or >> directory >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> 2014-11-15 00:31:26.032189 77fe67a0 -1 created new key in keyring >> /osd-data/keyring >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> Error EINVAL: entity osd.2 exists but key does not match >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> Traceback (most recent call last): >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> File "/usr/sbin/ceph-disk", line 2591, in >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> main() >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> File "/usr/sbin/ceph-disk", line 2569, in main >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> args.func(args) >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> File "/usr/sbin/ceph-disk", line 1929, in main_activate >> Notice: >> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >> init=args.mark_init, >> Notice: >> /Sta
Re: [ceph-users] Recreating the OSD's with same ID does not seem to work
Ok. I will do that. Thanks --Jiten > On Nov 14, 2014, at 4:57 PM, Gregory Farnum wrote: > > It's still creating and storing keys in case you enable it later. > That's exactly what the error is telling you and that's why it's not > working. > >> On Fri, Nov 14, 2014 at 4:45 PM, JIten Shah wrote: >> But I am not using “cephx” for authentication. I have already disabled that. >> >> —Jiten >> >>> On Nov 14, 2014, at 4:44 PM, Gregory Farnum wrote: >>> >>> You didn't remove them from the auth monitor's keyring. If you're >>> removing OSDs you need to follow the steps in the documentation. >>> -Greg >>> >>>> On Fri, Nov 14, 2014 at 4:42 PM, JIten Shah wrote: >>>> Hi Guys, >>>> >>>> I had to rekick some of the hosts where OSD’s were running and after >>>> re-kick, when I try to run puppet and install OSD’s again, it gives me a >>>> key >>>> mismatch error (as below). After the hosts were shutdown for rekick, I >>>> removed the OSD’s from the osd tree and the crush map too. Why is it still >>>> tied to the old key? >>>> >>>> >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> + test -b /osd-data >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> + mkdir -p /osd-data >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> + ceph-disk prepare /osd-data >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> + test -b /osd-data >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> + ceph-disk activate /osd-data >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> got monmap epoch 2 >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> 2014-11-15 00:31:25.951783 77fe67a0 -1 journal FileJournal::_open: >>>> disabling aio for non-block journal. Use journal_force_aio to force use of >>>> aio anyway >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> 2014-11-15 00:31:26.023037 77fe67a0 -1 journal FileJournal::_open: >>>> disabling aio for non-block journal. 
Use journal_force_aio to force use of >>>> aio anyway >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> 2014-11-15 00:31:26.023809 77fe67a0 -1 filestore(/osd-data) could not >>>> find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> 2014-11-15 00:31:26.032044 77fe67a0 -1 created object store /osd-data >>>> journal /osd-data/journal for osd.2 fsid >>>> 2e738cda-1930-48cd-a4b1-74bc737c5d56 >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> 2014-11-15 00:31:26.032097 77fe67a0 -1 auth: error reading file: >>>> /osd-data/keyring: can't open /osd-data/keyring: (2) No such file or >>>> directory >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> 2014-11-15 00:31:26.032189 77fe67a0 -1 created new key in keyring >>>> /osd-data/keyring >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> Error EINVAL: entity osd.2 exists but key does not match >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data]/returns: >>>> Traceback (most recent call last): >>>> Notice: >>>> /Stage[main]/Main/Node[infrastructure_node]/Ceph::Osd[/osd-data]/Exec[ceph-osd-mkfs-/osd-data
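For anyone hitting the same "entity osd.2 exists but key does not match" error: the fix Greg points at is to remove the old OSD entry completely, including its cephx key, before re-creating it. A rough sketch of that sequence for osd.2 (the ID from the log above; adjust the ID to your own OSDs, and note this permanently removes the OSD from the cluster):

  ceph osd out 2                   # stop placing data on it (harmless if the host is already gone)
  sudo service ceph stop osd.2     # only if the daemon is still running somewhere
  ceph osd crush remove osd.2      # drop it from the CRUSH map
  ceph auth del osd.2              # remove the stale key that causes the mismatch
  ceph osd rm 2                    # remove the OSD id itself

After this, re-running puppet / ceph-disk should be able to register osd.2 with a fresh key.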
[ceph-users] mds cluster degraded
After i rebuilt the OSD’s, the MDS went into the degraded mode and will not recover. [jshah@Lab-cephmon001 ~]$ sudo tail -100f /var/log/ceph/ceph-mds.Lab-cephmon001.log 2014-11-17 17:55:27.855861 7fffef5d3700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/838757053 pipe(0x1e18000 sd=22 :6800 s=0 pgs=0 cs=0 l=0 c=0x1e02c00).accept peer addr is really X.X.16.114:0/838757053 (socket is X.X.16.114:34672/0) 2014-11-17 17:57:27.855519 7fffef5d3700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/838757053 pipe(0x1e18000 sd=22 :6800 s=2 pgs=2 cs=1 l=0 c=0x1e02c00).fault with nothing to send, going to standby 2014-11-17 17:58:47.883799 7fffef3d1700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/26738200 pipe(0x1e1be80 sd=23 :6800 s=0 pgs=0 cs=0 l=0 c=0x1e04ba0).accept peer addr is really X.X.16.114:0/26738200 (socket is X.X.16.114:34699/0) 2014-11-17 18:00:47.882484 7fffef3d1700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/26738200 pipe(0x1e1be80 sd=23 :6800 s=2 pgs=2 cs=1 l=0 c=0x1e04ba0).fault with nothing to send, going to standby 2014-11-17 18:01:47.886662 7fffef1cf700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/3673954317 pipe(0x1e1c380 sd=24 :6800 s=0 pgs=0 cs=0 l=0 c=0x1e05540).accept peer addr is really X.X.16.114:0/3673954317 (socket is X.X.16.114:34718/0) 2014-11-17 18:03:47.885488 7fffef1cf700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/3673954317 pipe(0x1e1c380 sd=24 :6800 s=2 pgs=2 cs=1 l=0 c=0x1e05540).fault with nothing to send, going to standby 2014-11-17 18:04:47.888983 7fffeefcd700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/3403131574 pipe(0x1e18a00 sd=25 :6800 s=0 pgs=0 cs=0 l=0 c=0x1e05280).accept peer addr is really X.X.16.114:0/3403131574 (socket is X.X.16.114:34744/0) 2014-11-17 18:06:47.888427 7fffeefcd700 0 -- X.X.16.111:6800/3046050 >> X.X.16.114:0/3403131574 pipe(0x1e18a00 sd=25 :6800 s=2 pgs=2 cs=1 l=0 c=0x1e05280).fault with nothing to send, going to standby 2014-11-17 20:02:03.558250 707de700 -1 mds.0.1 *** got signal Terminated *** 2014-11-17 20:02:03.558297 707de700 1 mds.0.1 suicide. wanted down:dne, now up:active 2014-11-17 20:02:56.053339 77fe77a0 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 3424727 2014-11-17 20:02:56.121367 730e4700 1 mds.-1.0 handle_mds_map standby 2014-11-17 20:02:56.124343 730e4700 1 mds.0.2 handle_mds_map i am now mds.0.2 2014-11-17 20:02:56.124345 730e4700 1 mds.0.2 handle_mds_map state change up:standby --> up:replay 2014-11-17 20:02:56.124348 730e4700 1 mds.0.2 replay_start 2014-11-17 20:02:56.124359 730e4700 1 mds.0.2 recovery set is 2014-11-17 20:02:56.124362 730e4700 1 mds.0.2 need osdmap epoch 93, have 92 2014-11-17 20:02:56.124363 730e4700 1 mds.0.2 waiting for osdmap 93 (which blacklists prior instance) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
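The last log line above shows the MDS stuck in up:replay waiting for osdmap epoch 93. A few read-only checks (nothing here changes cluster state) can confirm whether the OSD map ever reaches that epoch and what the MDS map currently says:

  ceph -s           # overall health plus current osdmap/mdsmap epochs
  ceph osd stat     # e.g. "osdmap e93: ..." -- the epoch the MDS is waiting for
  ceph mds dump     # full MDS map, including the daemon state (up:replay, up:active, ...)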
[ceph-users] pg's degraded
After rebuilding a few OSD’s, I see that the pg’s are stuck in degraded mode. Some are in the unclean state and others are in the stale state. Somehow the MDS is also degraded. How do I recover the OSD’s and the MDS back to healthy? I have read through the documentation and searched the web, but no luck so far.

pg 2.33 is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 0.30 is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 1.31 is stuck unclean since forever, current state stale+active+degraded, last acting [2]
pg 2.32 is stuck unclean for 597129.903922, current state stale+active+degraded, last acting [2]
pg 0.2f is stuck unclean for 597129.903951, current state stale+active+degraded, last acting [2]
pg 1.2e is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 2.2d is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [2]
pg 0.2e is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 1.2f is stuck unclean for 597129.904015, current state stale+active+degraded, last acting [2]
pg 2.2c is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
pg 0.2d is stuck stale for 422844.566858, current state stale+active+degraded, last acting [2]
pg 1.2c is stuck stale for 422598.539483, current state stale+active+degraded+remapped, last acting [3]
pg 2.2f is stuck stale for 422598.539488, current state stale+active+degraded+remapped, last acting [3]
pg 0.2c is stuck stale for 422598.539487, current state stale+active+degraded+remapped, last acting [3]
pg 1.2d is stuck stale for 422598.539492, current state stale+active+degraded+remapped, last acting [3]
pg 2.2e is stuck stale for 422598.539496, current state stale+active+degraded+remapped, last acting [3]
pg 0.2b is stuck stale for 422598.539491, current state stale+active+degraded+remapped, last acting [3]
pg 1.2a is stuck stale for 422598.539496, current state stale+active+degraded+remapped, last acting [3]
pg 2.29 is stuck stale for 422598.539504, current state stale+active+degraded+remapped, last acting [3]
.
.
.
6 ops are blocked > 2097.15 sec
3 ops are blocked > 2097.15 sec on osd.0
2 ops are blocked > 2097.15 sec on osd.2
1 ops are blocked > 2097.15 sec on osd.4
3 osds have slow requests
recovery 40/60 objects degraded (66.667%)
mds cluster is degraded
mds.Lab-cephmon001 at X.X.16.111:6800/3424727 rank 0 is replaying journal

—Jiten ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
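For listings like the one above, these read-only commands are usually the quickest way to see which PGs are stuck and which OSDs they currently map to (a sketch; pg 2.33 is just the first PG from the output above, substitute any stuck PG id):

  ceph health detail          # the full per-PG listing
  ceph pg dump_stuck stale
  ceph pg dump_stuck unclean
  ceph pg map 2.33            # which OSDs the PG maps to right now
  ceph pg 2.33 query          # per-PG detail, if a primary OSD is reachable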
Re: [ceph-users] pg's degraded
Yes, it was a healthy cluster and I had to rebuild because the OSD’s got accidentally created on the root disk. Out of 4 OSD’s I had to rebuild 3 of them. [jshah@Lab-cephmon001 ~]$ ceph osd tree # idweight type name up/down reweight -1 0.5 root default -2 0.0 host Lab-cephosd005 4 0.0 osd.4 up 1 -3 0.0 host Lab-cephosd001 0 0.0 osd.0 up 1 -4 0.0 host Lab-cephosd002 1 0.0 osd.1 up 1 -5 0.0 host Lab-cephosd003 2 0.0 osd.2 up 1 -6 0.0 host Lab-cephosd004 3 0.0 osd.3 up 1 [jshah@Lab-cephmon001 ~]$ ceph pg 2.33 query Error ENOENT: i don't have paid 2.33 —Jiten On Nov 20, 2014, at 11:18 AM, Craig Lewis wrote: > Just to be clear, this is from a cluster that was healthy, had a disk > replaced, and hasn't returned to healthy? It's not a new cluster that has > never been healthy, right? > > Assuming it's an existing cluster, how many OSDs did you replace? It almost > looks like you replaced multiple OSDs at the same time, and lost data because > of it. > > Can you give us the output of `ceph osd tree`, and `ceph pg 2.33 query`? > > > On Wed, Nov 19, 2014 at 2:14 PM, JIten Shah wrote: > After rebuilding a few OSD’s, I see that the pg’s are stuck in degraded mode. > Sone are in the unclean and others are in the stale state. Somehow the MDS is > also degraded. How do I recover the OSD’s and the MDS back to healthy ? Read > through the documentation and on the web but no luck so far. > > pg 2.33 is stuck unclean since forever, current state > stale+active+degraded+remapped, last acting [3] > pg 0.30 is stuck unclean since forever, current state > stale+active+degraded+remapped, last acting [3] > pg 1.31 is stuck unclean since forever, current state stale+active+degraded, > last acting [2] > pg 2.32 is stuck unclean for 597129.903922, current state > stale+active+degraded, last acting [2] > pg 0.2f is stuck unclean for 597129.903951, current state > stale+active+degraded, last acting [2] > pg 1.2e is stuck unclean since forever, current state > stale+active+degraded+remapped, last acting [3] > pg 2.2d is stuck unclean since forever, current state > stale+active+degraded+remapped, last acting [2] > pg 0.2e is stuck unclean since forever, current state > stale+active+degraded+remapped, last acting [3] > pg 1.2f is stuck unclean for 597129.904015, current state > stale+active+degraded, last acting [2] > pg 2.2c is stuck unclean since forever, current state > stale+active+degraded+remapped, last acting [3] > pg 0.2d is stuck stale for 422844.566858, current state > stale+active+degraded, last acting [2] > pg 1.2c is stuck stale for 422598.539483, current state > stale+active+degraded+remapped, last acting [3] > pg 2.2f is stuck stale for 422598.539488, current state > stale+active+degraded+remapped, last acting [3] > pg 0.2c is stuck stale for 422598.539487, current state > stale+active+degraded+remapped, last acting [3] > pg 1.2d is stuck stale for 422598.539492, current state > stale+active+degraded+remapped, last acting [3] > pg 2.2e is stuck stale for 422598.539496, current state > stale+active+degraded+remapped, last acting [3] > pg 0.2b is stuck stale for 422598.539491, current state > stale+active+degraded+remapped, last acting [3] > pg 1.2a is stuck stale for 422598.539496, current state > stale+active+degraded+remapped, last acting [3] > pg 2.29 is stuck stale for 422598.539504, current state > stale+active+degraded+remapped, last acting [3] > . > . > . 
> 6 ops are blocked > 2097.15 sec > 3 ops are blocked > 2097.15 sec on osd.0 > 2 ops are blocked > 2097.15 sec on osd.2 > 1 ops are blocked > 2097.15 sec on osd.4 > 3 osds have slow requests > recovery 40/60 objects degraded (66.667%) > mds cluster is degraded > mds.Lab-cephmon001 at X.X.16.111:6800/3424727 rank 0 is replaying journal > > —Jiten > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pg's degraded
Thanks for your help. I was using puppet to install the OSD’s where it chooses a path over a device name. Hence it created the OSD in the path within the root volume since the path specified was incorrect. And all 3 of the OSD’s were rebuilt at the same time because it was unused and we had not put any data in there. Any way to recover from this or should i rebuild the cluster altogether. —Jiten On Nov 20, 2014, at 1:40 PM, Craig Lewis wrote: > So you have your crushmap set to choose osd instead of choose host? > > Did you wait for the cluster to recover between each OSD rebuild? If you > rebuilt all 3 OSDs at the same time (or without waiting for a complete > recovery between them), that would cause this problem. > > > > On Thu, Nov 20, 2014 at 11:40 AM, JIten Shah wrote: > Yes, it was a healthy cluster and I had to rebuild because the OSD’s got > accidentally created on the root disk. Out of 4 OSD’s I had to rebuild 3 of > them. > > > [jshah@Lab-cephmon001 ~]$ ceph osd tree > # id weight type name up/down reweight > -10.5 root default > -20.0 host Lab-cephosd005 > 4 0.0 osd.4 up 1 > -30.0 host Lab-cephosd001 > 0 0.0 osd.0 up 1 > -40.0 host Lab-cephosd002 > 1 0.0 osd.1 up 1 > -50.0 host Lab-cephosd003 > 2 0.0 osd.2 up 1 > -60.0 host Lab-cephosd004 > 3 0.0 osd.3 up 1 > > > [jshah@Lab-cephmon001 ~]$ ceph pg 2.33 query > Error ENOENT: i don't have paid 2.33 > > —Jiten > > > On Nov 20, 2014, at 11:18 AM, Craig Lewis wrote: > >> Just to be clear, this is from a cluster that was healthy, had a disk >> replaced, and hasn't returned to healthy? It's not a new cluster that has >> never been healthy, right? >> >> Assuming it's an existing cluster, how many OSDs did you replace? It almost >> looks like you replaced multiple OSDs at the same time, and lost data >> because of it. >> >> Can you give us the output of `ceph osd tree`, and `ceph pg 2.33 query`? >> >> >> On Wed, Nov 19, 2014 at 2:14 PM, JIten Shah wrote: >> After rebuilding a few OSD’s, I see that the pg’s are stuck in degraded >> mode. Sone are in the unclean and others are in the stale state. Somehow the >> MDS is also degraded. How do I recover the OSD’s and the MDS back to healthy >> ? Read through the documentation and on the web but no luck so far. 
>> >> pg 2.33 is stuck unclean since forever, current state >> stale+active+degraded+remapped, last acting [3] >> pg 0.30 is stuck unclean since forever, current state >> stale+active+degraded+remapped, last acting [3] >> pg 1.31 is stuck unclean since forever, current state stale+active+degraded, >> last acting [2] >> pg 2.32 is stuck unclean for 597129.903922, current state >> stale+active+degraded, last acting [2] >> pg 0.2f is stuck unclean for 597129.903951, current state >> stale+active+degraded, last acting [2] >> pg 1.2e is stuck unclean since forever, current state >> stale+active+degraded+remapped, last acting [3] >> pg 2.2d is stuck unclean since forever, current state >> stale+active+degraded+remapped, last acting [2] >> pg 0.2e is stuck unclean since forever, current state >> stale+active+degraded+remapped, last acting [3] >> pg 1.2f is stuck unclean for 597129.904015, current state >> stale+active+degraded, last acting [2] >> pg 2.2c is stuck unclean since forever, current state >> stale+active+degraded+remapped, last acting [3] >> pg 0.2d is stuck stale for 422844.566858, current state >> stale+active+degraded, last acting [2] >> pg 1.2c is stuck stale for 422598.539483, current state >> stale+active+degraded+remapped, last acting [3] >> pg 2.2f is stuck stale for 422598.539488, current state >> stale+active+degraded+remapped, last acting [3] >> pg 0.2c is stuck stale for 422598.539487, current state >> stale+active+degraded+remapped, last acting [3] >> pg 1.2d is stuck stale for 422598.539492, current state >> stale+active+degraded+remapped, last acting [3] >> pg 2.2e is stuck stale for 422598.539496, current state >> stale+active+degraded+remapped, last acting [3] >> pg 0.2b is stuck stale for 422598.539491, current state >> stale+active+degraded+remapped, last acting [3] >> pg 1.2a is stuck stale for 422598.539496, current state >> stale+active+degraded+remapped, last acting [3] >> pg 2.29 is s
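On Craig's point about the CRUSH map: with the default rule, replicas are spread across hosts ("type host"), so rebuilding several OSDs on different hosts at the same time can leave some PGs with no surviving copy. The relevant part of a decompiled CRUSH rule looks roughly like the comment below (a sketch of the stock replicated rule, not Jiten's actual map):

  # dump and decompile the current CRUSH map
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt

  # inside crushmap.txt, the replicated rule typically contains:
  #   step take default
  #   step chooseleaf firstn 0 type host   # "type osd" here would allow replicas on the same host
  #   step emit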
Re: [ceph-users] pg's degraded
Ok. Thanks. —Jiten On Nov 20, 2014, at 2:14 PM, Craig Lewis wrote: > If there's no data to lose, tell Ceph to re-create all the missing PGs. > > ceph pg force_create_pg 2.33 > > Repeat for each of the missing PGs. If that doesn't do anything, you might > need to tell Ceph that you lost the OSDs. For each OSD you moved, run ceph > osd lost , then try the force_create_pg command again. > > If that doesn't work, you can keep fighting with it, but it'll be faster to > rebuild the cluster. > > > > On Thu, Nov 20, 2014 at 1:45 PM, JIten Shah wrote: > Thanks for your help. > > I was using puppet to install the OSD’s where it chooses a path over a device > name. Hence it created the OSD in the path within the root volume since the > path specified was incorrect. > > And all 3 of the OSD’s were rebuilt at the same time because it was unused > and we had not put any data in there. > > Any way to recover from this or should i rebuild the cluster altogether. > > —Jiten > > On Nov 20, 2014, at 1:40 PM, Craig Lewis wrote: > >> So you have your crushmap set to choose osd instead of choose host? >> >> Did you wait for the cluster to recover between each OSD rebuild? If you >> rebuilt all 3 OSDs at the same time (or without waiting for a complete >> recovery between them), that would cause this problem. >> >> >> >> On Thu, Nov 20, 2014 at 11:40 AM, JIten Shah wrote: >> Yes, it was a healthy cluster and I had to rebuild because the OSD’s got >> accidentally created on the root disk. Out of 4 OSD’s I had to rebuild 3 of >> them. >> >> >> [jshah@Lab-cephmon001 ~]$ ceph osd tree >> # id weight type name up/down reweight >> -1 0.5 root default >> -2 0.0 host Lab-cephosd005 >> 40.0 osd.4 up 1 >> -3 0.0 host Lab-cephosd001 >> 00.0 osd.0 up 1 >> -4 0.0 host Lab-cephosd002 >> 10.0 osd.1 up 1 >> -5 0.0 host Lab-cephosd003 >> 20.0 osd.2 up 1 >> -6 0.0 host Lab-cephosd004 >> 30.0 osd.3 up 1 >> >> >> [jshah@Lab-cephmon001 ~]$ ceph pg 2.33 query >> Error ENOENT: i don't have paid 2.33 >> >> —Jiten >> >> >> On Nov 20, 2014, at 11:18 AM, Craig Lewis wrote: >> >>> Just to be clear, this is from a cluster that was healthy, had a disk >>> replaced, and hasn't returned to healthy? It's not a new cluster that has >>> never been healthy, right? >>> >>> Assuming it's an existing cluster, how many OSDs did you replace? It >>> almost looks like you replaced multiple OSDs at the same time, and lost >>> data because of it. >>> >>> Can you give us the output of `ceph osd tree`, and `ceph pg 2.33 query`? >>> >>> >>> On Wed, Nov 19, 2014 at 2:14 PM, JIten Shah wrote: >>> After rebuilding a few OSD’s, I see that the pg’s are stuck in degraded >>> mode. Sone are in the unclean and others are in the stale state. Somehow >>> the MDS is also degraded. How do I recover the OSD’s and the MDS back to >>> healthy ? Read through the documentation and on the web but no luck so far. 
>>> >>> pg 2.33 is stuck unclean since forever, current state >>> stale+active+degraded+remapped, last acting [3] >>> pg 0.30 is stuck unclean since forever, current state >>> stale+active+degraded+remapped, last acting [3] >>> pg 1.31 is stuck unclean since forever, current state >>> stale+active+degraded, last acting [2] >>> pg 2.32 is stuck unclean for 597129.903922, current state >>> stale+active+degraded, last acting [2] >>> pg 0.2f is stuck unclean for 597129.903951, current state >>> stale+active+degraded, last acting [2] >>> pg 1.2e is stuck unclean since forever, current state >>> stale+active+degraded+remapped, last acting [3] >>> pg 2.2d is stuck unclean since forever, current state >>> stale+active+degraded+remapped, last acting [2] >>> pg 0.2e is stuck unclean since forever, current state >>> stale+active+degraded+remapped, last acting [3] >>> pg 1.2f is stuck unclean for 597129.904015, current state >>> stale+active+degraded, last acting [2] >>> pg 2.2c is stuck unclean since forever, current state >>> stale+active+degraded+remapped, last acting [3] >>>
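Since the cluster held no data, Craig's suggestion can be scripted rather than typed once per PG. A rough sketch, assuming `ceph health detail` output looks like the listing earlier in this thread; the `ceph osd lost` step is only needed if force_create_pg alone does nothing, and the OSD id 2 is just an example:

  # only if needed, once per OSD that was wiped:
  ceph osd lost 2 --yes-i-really-mean-it

  # force-create every PG reported as stuck:
  for pg in $(ceph health detail | awk '/is stuck/ {print $2}' | sort -u); do
      ceph pg force_create_pg "$pg"
  done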
Re: [ceph-users] pg's degraded
Hi Craig, Recreating the missing PG’s fixed it. Thanks for your help. But when I tried to mount the Filesystem, it gave me the “mount error 5”. I tried to restart the MDS server but it won’t work. It tells me that it’s laggy/unresponsive. BTW, all these machines are VM’s. [jshah@Lab-cephmon001 ~]$ ceph health detail HEALTH_WARN mds cluster is degraded; mds Lab-cephmon001 is laggy mds cluster is degraded mds.Lab-cephmon001 at 17.147.16.111:6800/3745284 rank 0 is replaying journal mds.Lab-cephmon001 at 17.147.16.111:6800/3745284 is laggy/unresponsive —Jiten On Nov 20, 2014, at 4:20 PM, JIten Shah wrote: > Ok. Thanks. > > —Jiten > > On Nov 20, 2014, at 2:14 PM, Craig Lewis wrote: > >> If there's no data to lose, tell Ceph to re-create all the missing PGs. >> >> ceph pg force_create_pg 2.33 >> >> Repeat for each of the missing PGs. If that doesn't do anything, you might >> need to tell Ceph that you lost the OSDs. For each OSD you moved, run ceph >> osd lost , then try the force_create_pg command again. >> >> If that doesn't work, you can keep fighting with it, but it'll be faster to >> rebuild the cluster. >> >> >> >> On Thu, Nov 20, 2014 at 1:45 PM, JIten Shah wrote: >> Thanks for your help. >> >> I was using puppet to install the OSD’s where it chooses a path over a >> device name. Hence it created the OSD in the path within the root volume >> since the path specified was incorrect. >> >> And all 3 of the OSD’s were rebuilt at the same time because it was unused >> and we had not put any data in there. >> >> Any way to recover from this or should i rebuild the cluster altogether. >> >> —Jiten >> >> On Nov 20, 2014, at 1:40 PM, Craig Lewis wrote: >> >>> So you have your crushmap set to choose osd instead of choose host? >>> >>> Did you wait for the cluster to recover between each OSD rebuild? If you >>> rebuilt all 3 OSDs at the same time (or without waiting for a complete >>> recovery between them), that would cause this problem. >>> >>> >>> >>> On Thu, Nov 20, 2014 at 11:40 AM, JIten Shah wrote: >>> Yes, it was a healthy cluster and I had to rebuild because the OSD’s got >>> accidentally created on the root disk. Out of 4 OSD’s I had to rebuild 3 of >>> them. >>> >>> >>> [jshah@Lab-cephmon001 ~]$ ceph osd tree >>> # idweight type name up/down reweight >>> -1 0.5 root default >>> -2 0.0 host Lab-cephosd005 >>> 4 0.0 osd.4 up 1 >>> -3 0.0 host Lab-cephosd001 >>> 0 0.0 osd.0 up 1 >>> -4 0.0 host Lab-cephosd002 >>> 1 0.0 osd.1 up 1 >>> -5 0.0 host Lab-cephosd003 >>> 2 0.0 osd.2 up 1 >>> -6 0.0 host Lab-cephosd004 >>> 3 0.0 osd.3 up 1 >>> >>> >>> [jshah@Lab-cephmon001 ~]$ ceph pg 2.33 query >>> Error ENOENT: i don't have paid 2.33 >>> >>> —Jiten >>> >>> >>> On Nov 20, 2014, at 11:18 AM, Craig Lewis wrote: >>> >>>> Just to be clear, this is from a cluster that was healthy, had a disk >>>> replaced, and hasn't returned to healthy? It's not a new cluster that has >>>> never been healthy, right? >>>> >>>> Assuming it's an existing cluster, how many OSDs did you replace? It >>>> almost looks like you replaced multiple OSDs at the same time, and lost >>>> data because of it. >>>> >>>> Can you give us the output of `ceph osd tree`, and `ceph pg 2.33 query`? >>>> >>>> >>>> On Wed, Nov 19, 2014 at 2:14 PM, JIten Shah wrote: >>>> After rebuilding a few OSD’s, I see that the pg’s are stuck in degraded >>>> mode. Sone are in the unclean and others are in the stale state. Somehow >>>> the MDS is also degraded. How do I recover the OSD’s and the MDS back to >>>> healthy ? 
Read through the documentation and on the web but no luck so far. >>>> >>>> pg 2.33 is stuck unclean since forever, current state >>>> stale+active+degraded+remapped, last acting [3] >>>> pg 0.30 is stuck unclean since forever, current state >>>> stale+active+degraded+remapped, last acting [3] >>>> pg 1.31 is
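When the MDS is reported as laggy/unresponsive like this, "mount error 5" (EIO) from the client usually just means no MDS has reached up:active yet. Before restarting anything, it is worth confirming the daemon is actually running and what state the MDS map shows; a few checks, reusing the hostname and log path from the output above:

  ceph mds dump                                              # current MDS map and daemon state
  ps aux | grep [c]eph-mds                                   # is the ceph-mds process alive on the MDS host?
  sudo tail -50 /var/log/ceph/ceph-mds.Lab-cephmon001.log    # what it was doing last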
Re: [ceph-users] pg's degraded
Thanks Michael. That was a good idea. I did: 1. sudo service ceph stop mds 2. ceph mds newfs 1 0 —yes-i-really-mean-it (where 1 and 0 are pool ID’s for metadata and data) 3. ceph health (It was healthy now!!!) 4. sudo servie ceph start mds.$(hostname -s) And I am back in business. Thanks again. —Jiten On Nov 20, 2014, at 5:47 PM, Michael Kuriger wrote: > Maybe delete the pool and start over? > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > JIten Shah > Sent: Thursday, November 20, 2014 5:46 PM > To: Craig Lewis > Cc: ceph-users > Subject: Re: [ceph-users] pg's degraded > > Hi Craig, > > Recreating the missing PG’s fixed it. Thanks for your help. > > But when I tried to mount the Filesystem, it gave me the “mount error 5”. I > tried to restart the MDS server but it won’t work. It tells me that it’s > laggy/unresponsive. > > BTW, all these machines are VM’s. > > [jshah@Lab-cephmon001 ~]$ ceph health detail > HEALTH_WARN mds cluster is degraded; mds Lab-cephmon001 is laggy > mds cluster is degraded > mds.Lab-cephmon001 at 17.147.16.111:6800/3745284 rank 0 is replaying journal > mds.Lab-cephmon001 at 17.147.16.111:6800/3745284 is laggy/unresponsive > > > —Jiten > > On Nov 20, 2014, at 4:20 PM, JIten Shah wrote: > > > Ok. Thanks. > > —Jiten > > On Nov 20, 2014, at 2:14 PM, Craig Lewis wrote: > > > If there's no data to lose, tell Ceph to re-create all the missing PGs. > > ceph pg force_create_pg 2.33 > > Repeat for each of the missing PGs. If that doesn't do anything, you might > need to tell Ceph that you lost the OSDs. For each OSD you moved, run ceph > osd lost , then try the force_create_pg command again. > > If that doesn't work, you can keep fighting with it, but it'll be faster to > rebuild the cluster. > > > > On Thu, Nov 20, 2014 at 1:45 PM, JIten Shah wrote: > Thanks for your help. > > I was using puppet to install the OSD’s where it chooses a path over a device > name. Hence it created the OSD in the path within the root volume since the > path specified was incorrect. > > And all 3 of the OSD’s were rebuilt at the same time because it was unused > and we had not put any data in there. > > Any way to recover from this or should i rebuild the cluster altogether. > > —Jiten > > On Nov 20, 2014, at 1:40 PM, Craig Lewis wrote: > > > So you have your crushmap set to choose osd instead of choose host? > > Did you wait for the cluster to recover between each OSD rebuild? If you > rebuilt all 3 OSDs at the same time (or without waiting for a complete > recovery between them), that would cause this problem. > > > > On Thu, Nov 20, 2014 at 11:40 AM, JIten Shah wrote: > Yes, it was a healthy cluster and I had to rebuild because the OSD’s got > accidentally created on the root disk. Out of 4 OSD’s I had to rebuild 3 of > them. > > > [jshah@Lab-cephmon001 ~]$ ceph osd tree > # id weight type name up/down reweight > -1 0.5 root default > -2 0.0 host Lab-cephosd005 > 4 0.0 osd.4 up 1 > -3 0.0 host Lab-cephosd001 > 0 0.0 osd.0 up 1 > -4 0.0 host Lab-cephosd002 > 1 0.0 osd.1 up 1 > -5 0.0 host Lab-cephosd003 > 2 0.0 osd.2 up 1 > -6 0.0 host Lab-cephosd004 > 3 0.0 osd.3 up 1 > > > [jshah@Lab-cephmon001 ~]$ ceph pg 2.33 query > Error ENOENT: i don't have paid 2.33 > > —Jiten > > > On Nov 20, 2014, at 11:18 AM, Craig Lewis wrote: > > > Just to be clear, this is from a cluster that was healthy, had a disk > replaced, and hasn't returned to healthy? It's not a new cluster that has > never been healthy, right? 
> > Assuming it's an existing cluster, how many OSDs did you replace? It almost > looks like you replaced multiple OSDs at the same time, and lost data because > of it. > > Can you give us the output of `ceph osd tree`, and `ceph pg 2.33 query`? > > > On Wed, Nov 19, 2014 at 2:14 PM, JIten Shah wrote: > After rebuilding a few OSD’s, I see that the pg’s are stuck in degraded mode. > Sone are in the unclean and others are in the stale state. Somehow the MDS is > also degraded. How do I recover the OSD’s and the MDS back to healthy ? Read > through the documentation and on the web but no luck so far. > > pg 2.33 is stuck unclean since forever, current state > stale+active+degraded+remapped, last acting [3] > pg 0.30 is stuck unclean since forever, current state > stale+active+degraded+remapped, last acting [3] > pg 1.31 is stuck unclean since forever, current state stale+active+degraded, &g
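Spelled out with plain double hyphens (mail formatting tends to turn "--" into an em dash), the recovery sequence above is roughly the following. Note that ceph mds newfs builds a brand-new filesystem on the named pools, so it is only appropriate here because the cluster held no CephFS data worth keeping; the pool IDs 1 and 0 are the ones from Jiten's cluster:

  sudo service ceph stop mds
  ceph mds newfs 1 0 --yes-i-really-mean-it    # 1 = metadata pool id, 0 = data pool id
  ceph health
  sudo service ceph start mds.$(hostname -s)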
[ceph-users] Multiple MDS servers...
I am trying to set up 3 MDS servers (one on each MON), but after I am done setting up the first one, it gives me the error below when I try to start it on the other ones. I understand that only 1 MDS is active at a time, but I thought you can have multiple of them up in case the first one dies? Or is that not true?

[jshah@Lab-cephmon002 mds.Lab-cephmon002]$ sudo service ceph start mds.Lab-cephmon002
/etc/init.d/ceph: mds.Lab-cephmon002 not found (/etc/ceph/ceph.conf defines mon.Lab-cephmon002 mds.cephmon002 , /var/lib/ceph defines mon.Lab-cephmon002 mds.cephmon002)

[jshah@Lab-cephmon002 mds.Lab-cephmon002]$ ls -l /var/lib/ceph/mds/mds.Lab-cephmon002/
total 0
-rwxr-xr-x 1 root root 0 Nov 14 18:42 done
-rwxr-xr-x 1 root root 0 Nov 14 18:42 sysvinit

[jshah@Lab-cephmon002 mds.Lab-cephmon002]$ grep cephmon002 /etc/ceph/ceph.conf
mon_initial_members = Lab-cephmon001, Lab-cephmon002, Lab-cephmon003
mds_data = /var/lib/ceph/mds/mds.Lab-cephmon002
keyring = /var/lib/ceph/mds/mds.Lab-cephmon002/keyring

—Jiten ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
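The init script error above is a naming mismatch: the daemon being started is "mds.Lab-cephmon002", but ceph.conf and /var/lib/ceph apparently define "mds.cephmon002". With the sysvinit script, the ceph.conf section name, the data directory name, and the name passed to `service ceph start` all have to agree. A sketch of a consistent layout for this host (illustrative only; the exact data-directory naming depends on how the cluster was deployed):

  # /etc/ceph/ceph.conf
  [mds.Lab-cephmon002]
      host = Lab-cephmon002

  # matching data directory the init script looks for (with the default cluster name "ceph"):
  #   /var/lib/ceph/mds/ceph-Lab-cephmon002/{keyring,sysvinit,done}

  sudo service ceph start mds.Lab-cephmon002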
Re: [ceph-users] mds cluster degraded
This got taken care of after I deleted the pools for metadata and data and started it again. I did: 1. sudo service ceph stop mds 2. ceph mds newfs 1 0 —yes-i-really-mean-it (where 1 and 0 are pool ID’s for metadata and data) 3. ceph health (It was healthy now!!!) 4. sudo servie ceph start mds.$(hostname -s) And I am back in business. On Nov 18, 2014, at 3:27 PM, Gregory Farnum wrote: > Hmm, last time we saw this it meant that the MDS log had gotten > corrupted somehow and was a little short (in that case due to the OSDs > filling up). What do you mean by "rebuilt the OSDs"? > -Greg > > On Mon, Nov 17, 2014 at 12:52 PM, JIten Shah wrote: >> After i rebuilt the OSD’s, the MDS went into the degraded mode and will not >> recover. >> >> >> [jshah@Lab-cephmon001 ~]$ sudo tail -100f >> /var/log/ceph/ceph-mds.Lab-cephmon001.log >> 2014-11-17 17:55:27.855861 7fffef5d3700 0 -- X.X.16.111:6800/3046050 >> >> X.X.16.114:0/838757053 pipe(0x1e18000 sd=22 :6800 s=0 pgs=0 cs=0 l=0 >> c=0x1e02c00).accept peer addr is really X.X.16.114:0/838757053 (socket is >> X.X.16.114:34672/0) >> 2014-11-17 17:57:27.855519 7fffef5d3700 0 -- X.X.16.111:6800/3046050 >> >> X.X.16.114:0/838757053 pipe(0x1e18000 sd=22 :6800 s=2 pgs=2 cs=1 l=0 >> c=0x1e02c00).fault with nothing to send, going to standby >> 2014-11-17 17:58:47.883799 7fffef3d1700 0 -- X.X.16.111:6800/3046050 >> >> X.X.16.114:0/26738200 pipe(0x1e1be80 sd=23 :6800 s=0 pgs=0 cs=0 l=0 >> c=0x1e04ba0).accept peer addr is really X.X.16.114:0/26738200 (socket is >> X.X.16.114:34699/0) >> 2014-11-17 18:00:47.882484 7fffef3d1700 0 -- X.X.16.111:6800/3046050 >> >> X.X.16.114:0/26738200 pipe(0x1e1be80 sd=23 :6800 s=2 pgs=2 cs=1 l=0 >> c=0x1e04ba0).fault with nothing to send, going to standby >> 2014-11-17 18:01:47.886662 7fffef1cf700 0 -- X.X.16.111:6800/3046050 >> >> X.X.16.114:0/3673954317 pipe(0x1e1c380 sd=24 :6800 s=0 pgs=0 cs=0 l=0 >> c=0x1e05540).accept peer addr is really X.X.16.114:0/3673954317 (socket is >> X.X.16.114:34718/0) >> 2014-11-17 18:03:47.885488 7fffef1cf700 0 -- X.X.16.111:6800/3046050 >> >> X.X.16.114:0/3673954317 pipe(0x1e1c380 sd=24 :6800 s=2 pgs=2 cs=1 l=0 >> c=0x1e05540).fault with nothing to send, going to standby >> 2014-11-17 18:04:47.888983 7fffeefcd700 0 -- X.X.16.111:6800/3046050 >> >> X.X.16.114:0/3403131574 pipe(0x1e18a00 sd=25 :6800 s=0 pgs=0 cs=0 l=0 >> c=0x1e05280).accept peer addr is really X.X.16.114:0/3403131574 (socket is >> X.X.16.114:34744/0) >> 2014-11-17 18:06:47.888427 7fffeefcd700 0 -- X.X.16.111:6800/3046050 >> >> X.X.16.114:0/3403131574 pipe(0x1e18a00 sd=25 :6800 s=2 pgs=2 cs=1 l=0 >> c=0x1e05280).fault with nothing to send, going to standby >> 2014-11-17 20:02:03.558250 707de700 -1 mds.0.1 *** got signal Terminated >> *** >> 2014-11-17 20:02:03.558297 707de700 1 mds.0.1 suicide. 
wanted >> down:dne, now up:active >> 2014-11-17 20:02:56.053339 77fe77a0 0 ceph version 0.80.5 >> (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 3424727 >> 2014-11-17 20:02:56.121367 730e4700 1 mds.-1.0 handle_mds_map standby >> 2014-11-17 20:02:56.124343 730e4700 1 mds.0.2 handle_mds_map i am now >> mds.0.2 >> 2014-11-17 20:02:56.124345 730e4700 1 mds.0.2 handle_mds_map state >> change up:standby --> up:replay >> 2014-11-17 20:02:56.124348 730e4700 1 mds.0.2 replay_start >> 2014-11-17 20:02:56.124359 730e4700 1 mds.0.2 recovery set is >> 2014-11-17 20:02:56.124362 730e4700 1 mds.0.2 need osdmap epoch 93, >> have 92 >> 2014-11-17 20:02:56.124363 730e4700 1 mds.0.2 waiting for osdmap 93 >> (which blacklists prior instance) >> >> >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Multiple MDS servers...
Hi Greg, I haven’t setup anything in ceph.conf as mds.cephmon002 nor in any ceph folders. I have always tried to set it up as mds.lab-cephmon002, so I am wondering where is it getting that value from? —Jiten On Nov 23, 2014, at 9:32 PM, Gregory Farnum wrote: > On Fri, Nov 21, 2014 at 3:21 PM, JIten Shah wrote: >> I am trying to setup 3 MDS servers (one on each MON) but after I am done >> setting up the first one, it give me below error when I try to start it on >> the other ones. I understand that only 1 MDS is functional at a time, but I >> thought you can have multiple of them up, incase the first one dies? Or is >> that not true? >> >> [jshah@Lab-cephmon002 mds.Lab-cephmon002]$ sudo service ceph start >> mds.Lab-cephmon002 >> /etc/init.d/ceph: mds.Lab-cephmon002 not found (/etc/ceph/ceph.conf defines >> mon.Lab-cephmon002 mds.cephmon002 , /var/lib/ceph defines mon.Lab-cephmon002 >> mds.cephmon002) >> >> [jshah@Lab-cephmon002 mds.Lab-cephmon002]$ ls -l >> /var/lib/ceph/mds/mds.Lab-cephmon002/ >> total 0 >> -rwxr-xr-x 1 root root 0 Nov 14 18:42 done >> -rwxr-xr-x 1 root root 0 Nov 14 18:42 sysvinit >> >> [jshah@Lab-cephmon002 mds.Lab-cephmon002]$ grep cephmon002 >> /etc/ceph/ceph.conf >> mon_initial_members = Lab-cephmon001, Lab-cephmon002, Lab-cephmon003 >> mds_data = /var/lib/ceph/mds/mds.Lab-cephmon002 >> keyring = /var/lib/ceph/mds/mds.Lab-cephmon002/keyring > > As the error says, you are trying to start up something called > "mds.Lab-cephmon002", but the ceph.conf and ceph folder hierarchy only > defines a "mds.cephmon002". > -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Multiple MDS servers...
Do I need to update the ceph.conf to support multiple MDS servers? —Jiten On Nov 24, 2014, at 6:56 AM, Gregory Farnum wrote: > On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah wrote: >> Hi Greg, >> >> I haven’t setup anything in ceph.conf as mds.cephmon002 nor in any ceph >> folders. I have always tried to set it up as mds.lab-cephmon002, so I am >> wondering where is it getting that value from? > > No idea, sorry. Probably some odd mismatch between expectations and > how the names are actually being parsed and saved. > -Greg > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Multiple MDS servers...
Hi Greg, Sorry for the confusion. I am not looking for an active/active configuration, which I know is not supported, but what documentation can I refer to for installing active/standby MDSes? I tried looking on Ceph.com but could not find anything that explains how to set up an active/standby MDS cluster. Thanks. —Jiten On Dec 9, 2014, at 12:50 PM, Gregory Farnum wrote: > You'll need to be a little more explicit about your question. In > general there is nothing special that needs to be done. If you're > trying to get multiple active MDSes (instead of active and > standby/standby-replay/etc) you'll need to tell the monitors to > increase the mds num (check the docs; this is not recommended right > now). You obviously need to add an MDS entry to one of your nodes > somewhere, the mechanism for which can differ based on how you're > managing your cluster. > But you don't need to do anything explicit like tell everybody > globally that there are multiple MDSes. > -Greg > > On Mon, Dec 8, 2014 at 10:48 AM, JIten Shah wrote: >> Do I need to update the ceph.conf to support multiple MDS servers? >> >> —Jiten >> >> On Nov 24, 2014, at 6:56 AM, Gregory Farnum wrote: >> >>> On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah wrote: >>>> Hi Greg, >>>> >>>> I haven’t setup anything in ceph.conf as mds.cephmon002 nor in any ceph >>>> folders. I have always tried to set it up as mds.lab-cephmon002, so I am >>>> wondering where is it getting that value from? >>> >>> No idea, sorry. Probably some odd mismatch between expectations and >>> how the names are actually being parsed and saved. >>> -Greg >>> ___ >>> ceph-users mailing list >>> ceph-users@lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Multiple MDS servers...
I have been trying to do that for quite some time now (using puppet) but ti keeps failing. Here’s what the error says. Error: Could not start Service[ceph-mds]: Execution of '/sbin/service ceph start mds.Lab-cephmon003' returned 1: /etc/init.d/ceph: mds.Lab-cephmon003 not found (/etc/ceph/ceph.conf defines mon.Lab-cephmon003 , /var/lib/ceph defines mon.Lab-cephmon003) Wrapped exception: Execution of '/sbin/service ceph start mds.Lab-cephmon003' returned 1: /etc/init.d/ceph: mds.Lab-cephmon003 not found (/etc/ceph/ceph.conf defines mon.Lab-cephmon003 , /var/lib/ceph defines mon.Lab-cephmon003) Error: /Stage[main]/Ceph::Mds/Service[ceph-mds]/ensure: change from stopped to running failed: Could not start Service[ceph-mds]: Execution of '/sbin/service ceph start mds.Lab-cephmon003' returned 1: /etc/init.d/ceph: mds.Lab-cephmon003 not found (/etc/ceph/ceph.conf defines mon.Lab-cephmon003 , /var/lib/ceph defines mon.Lab-cephmon003) On Dec 9, 2014, at 3:12 PM, Christopher Armstrong wrote: > JIten, > > You simply start more metadata servers. You'll notice when you inspect the > cluster health that one will be the active, and the rest will be standbys. > > Chris > > On Tue, Dec 9, 2014 at 3:10 PM, JIten Shah wrote: > Hi Greg, > > Sorry for the confusion. I am not looking for active/active configuration > which I know is not supported but what documentation can I refer to for > installing an active/stndby MDSes ? > > I tried looking on Ceph.com but could not find that explains how to setup an > active/standby MDS cluster. > > Thanks. > > —Jiten > > On Dec 9, 2014, at 12:50 PM, Gregory Farnum wrote: > > > You'll need to be a little more explicit about your question. In > > general there is nothing special that needs to be done. If you're > > trying to get multiple active MDSes (instead of active and > > standby/standby-replay/etc) you'll need to tell the monitors to > > increase the mds num (check the docs; this is not recommended right > > now). You obviously need to add an MDS entry to one of your nodes > > somewhere, the mechanism for which can differ based on how you're > > managing your cluster. > > But you don't need to do anything explicit like tell everybody > > globally that there are multiple MDSes. > > -Greg > > > > On Mon, Dec 8, 2014 at 10:48 AM, JIten Shah wrote: > >> Do I need to update the ceph.conf to support multiple MDS servers? > >> > >> —Jiten > >> > >> On Nov 24, 2014, at 6:56 AM, Gregory Farnum wrote: > >> > >>> On Sun, Nov 23, 2014 at 10:36 PM, JIten Shah wrote: > >>>> Hi Greg, > >>>> > >>>> I haven’t setup anything in ceph.conf as mds.cephmon002 nor in any ceph > >>>> folders. I have always tried to set it up as mds.lab-cephmon002, so I am > >>>> wondering where is it getting that value from? > >>> > >>> No idea, sorry. Probably some odd mismatch between expectations and > >>> how the names are actually being parsed and saved. > >>> -Greg > >>> ___ > >>> ceph-users mailing list > >>> ceph-users@lists.ceph.com > >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
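For reference, bringing up an extra standby MDS by hand (outside puppet) usually amounts to creating a data directory and key whose name matches the ceph.conf section, then starting the daemon. A sketch for Lab-cephmon003, assuming sysvinit is in use and a [mds.Lab-cephmon003] section exists in ceph.conf; the directory path and cephx caps shown are the conventional ones, not taken from this cluster:

  sudo mkdir -p /var/lib/ceph/mds/ceph-Lab-cephmon003
  sudo ceph auth get-or-create mds.Lab-cephmon003 \
      mon 'allow profile mds' osd 'allow rwx' mds 'allow' \
      -o /var/lib/ceph/mds/ceph-Lab-cephmon003/keyring
  sudo touch /var/lib/ceph/mds/ceph-Lab-cephmon003/sysvinit /var/lib/ceph/mds/ceph-Lab-cephmon003/done
  sudo service ceph start mds.Lab-cephmon003
  ceph mds stat    # should report one MDS as up:active and the others as up:standby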
Re: [ceph-users] New Cluster (0.87), Missing Default Pools?
So what happens if we upgrade from Firefly to Giant? Do we loose the pools? —Jiten On Dec 18, 2014, at 5:12 AM, Thomas Lemarchand wrote: > I remember reading somewhere (maybe in changelogs) that default pools > were not created automatically anymore. > > You can create pools you need yourself. > > -- > Thomas Lemarchand > Cloud Solutions SAS - Responsable des systèmes d'information > > > > On jeu., 2014-12-18 at 06:52 -0600, Dyweni - Ceph-Users wrote: >> Hi All, >> >> >> Just setup the monitor for a new cluster based on Giant (0.87) and I >> find that only the 'rbd' pool was created automatically. I don't see >> the 'data' or 'metadata' pools in 'ceph osd lspools' or the log files. >> I haven't setup any OSDs or MDSs yet. I'm following the manual >> deployment guide. >> >> Would you mind looking over the setup details/logs below and letting me >> know my mistake please? >> >> >> >> Here's my /etc/ceph/ceph.conf file: >> --- >> [global] >> fsid = xx >> >> public network = xx.xx.xx.xx/xx >> cluster network = xx.xx.xx.xx/xx >> >> auth cluster required = cephx >> auth service required = cephx >> auth client required = cephx >> >> osd pool default size = 2 >> osd pool default min size = 1 >> >> osd pool default pg num = 100 >> osd pool default pgp num = 100 >> >> [mon] >> mon initial members = a >> >> [mon.a] >> host = xx >> mon addr = xx.xx.xx.xx >> --- >> >> >> Here's the commands used to setup the monitor: >> --- >> ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. >> --cap mon 'allow *' >> ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring >> --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd >> 'allow *' --cap mds 'allow' >> ceph-authtool /tmp/ceph.mon.keyring --import-keyring >> /etc/ceph/ceph.client.admin.keyring >> monmaptool --create --add xx xx.xx.xx.xx --fsid xx /tmp/monmap >> mkdir /var/lib/ceph/mon/ceph-a >> ceph-mon --mkfs -i a --monmap /tmp/monmap --keyring >> /tmp/ceph.mon.keyring >> /etc/init.d/ceph-mon.a start >> --- >> >> >> Here's the ceph-mon.a logfile: >> --- >> 2014-12-18 12:35:45.768752 7fb00df94780 0 ceph version 0.87 >> (c51c8f9d80fa4e0168aa52685b8de40e42758578), process ceph-mon, pid 3225 >> 2014-12-18 12:35:45.856851 7fb00df94780 0 mon.a does not exist in >> monmap, will attempt to join an existing cluster >> 2014-12-18 12:35:45.857069 7fb00df94780 0 using public_addr >> xx.xx.xx.xx:0/0 -> xx.xx.xx.xx:6789/0 >> 2014-12-18 12:35:45.857126 7fb00df94780 0 starting mon.a rank -1 at >> xx.xx.xx.xx:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid xx >> 2014-12-18 12:35:45.857330 7fb00df94780 1 mon.a@-1(probing) e0 preinit >> fsid xx >> 2014-12-18 12:35:45.857402 7fb00df94780 1 mon.a@-1(probing) e0 >> initial_members a, filtering seed monmap >> 2014-12-18 12:35:45.858322 7fb00df94780 0 mon.a@-1(probing) e0 my rank >> is now 0 (was -1) >> 2014-12-18 12:35:45.858360 7fb00df94780 1 mon.a@0(probing) e0 >> win_standalone_election >> 2014-12-18 12:35:45.859803 7fb00df94780 0 log_channel(cluster) log >> [INF] : mon.a@0 won leader election with quorum 0 >> 2014-12-18 12:35:45.863846 7fb008d4b700 1 >> mon.a@0(leader).paxosservice(pgmap 0..0) refresh upgraded, format 1 -> 0 >> 2014-12-18 12:35:45.863867 7fb008d4b700 1 mon.a@0(leader).pg v0 >> on_upgrade discarding in-core PGMap >> 2014-12-18 12:35:45.865662 7fb008d4b700 1 >> mon.a@0(leader).paxosservice(auth 0..0) refresh upgraded, format 1 -> 0 >> 2014-12-18 12:35:45.865719 7fb008d4b700 1 mon.a@0(probing) e1 >> win_standalone_election >> 2014-12-18 12:35:45.867394 7fb008d4b700 0 
log_channel(cluster) log >> [INF] : mon.a@0 won leader election with quorum 0 >> 2014-12-18 12:35:46.003223 7fb008d4b700 0 log_channel(cluster) log >> [INF] : monmap e1: 1 mons at {a=xx.xx.xx.xx:6789/0} >> 2014-12-18 12:35:46.040555 7fb008d4b700 1 >> mon.a@0(leader).paxosservice(auth 0..0) refresh upgraded, format 1 -> 0 >> 2014-12-18 12:35:46.087081 7fb008d4b700 0 log_channel(cluster) log >> [INF] : pgmap v1: 0 pgs: ; 0 bytes data, 0 kB used, 0 kB / 0 kB avail >> 2014-12-18 12:35:46.141415 7fb008d4b700 0 mon.a@0(leader).mds e1 >> print_map >> epoch 1 >> flags 0 >> created 0.00 >> modified2014-12-18 12:35:46.038418 >> tableserver 0 >> root0 >> session_timeout 0 >> session_autoclose 0 >> max_file_size 0 >> last_failure0 >> last_failure_osd_epoch 0 >> compat compat={},rocompat={},incompat={} >> max_mds 0 >> in >> up {} >> failed >> stopped >> data_pools >> metadata_pool 0 >> inline_data disabled >> >> 2014-12-18 12:35:46.151117 7fb008d4b700 0 log_channel(cluster) log >> [INF] : mdsmap e1: 0/0/0 up >> 2014-12-18 12:35:46.152873 7fb008d4b700 1 mon.a@0(leader).osd e1 e1: 0 >> osds: 0 up, 0 in >> 2014-12-18 12:35:46.154551 7fb008d4b700 0 mon.a@
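To tie off the question in this thread: as of Giant, a fresh cluster no longer auto-creates the 'data' and 'metadata' pools, and upgrading an existing Firefly cluster does not remove pools that are already there. If CephFS is wanted on a new Giant cluster, the pools and filesystem are created explicitly, along these lines (pool names and PG counts are just examples to adjust for your cluster):

  ceph osd pool create cephfs_data 100
  ceph osd pool create cephfs_metadata 100
  ceph fs new cephfs cephfs_metadata cephfs_data
  ceph mds stat    # the MDS should go active once the filesystem exists and an MDS is running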