Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-15 Thread Jasper Siero
Hello Greg,

The dump and reset of the journal were successful:

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
/var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
--dump-journal 0 journaldumptgho-mon001
journal is 9483323613~134215459
read 134213311 bytes at offset 9483323613
wrote 134213311 bytes at offset 9483323613 to journaldumptgho-mon001
NOTE: this is a _sparse_ file; you can
$ tar cSzf journaldumptgho-mon001.tgz journaldumptgho-mon001
  to efficiently compress it while preserving sparseness.

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
/var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
--reset-journal 0
old journal was 9483323613~134215459
new journal start will be 9621733376 (4194304 bytes past old end)
writing journal head
writing EResetJournal entry
done


Undumping the journal was not successful; the error 
"client_lock.is_locked()" shows up several times. The MDS is not running when 
I start the undump, so maybe I have forgotten something?

[root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
/var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
--undump-journal 0 journaldumptgho-mon001
undump journaldumptgho-mon001
start 9483323613 len 134213311
writing header 200.
osdc/Objecter.cc: In function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' 
thread 7fec3e5ad7a0 time 2014-10-15 09:09:32.020287
osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: /usr/bin/ceph-mds() [0x80f15e]
 2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
 3: (main()+0x1632) [0x569c62]
 4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
 5: /usr/bin/ceph-mds() [0x567d99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.
2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In function 
'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 
2014-10-15 09:09:32.020287
osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())

 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: /usr/bin/ceph-mds() [0x80f15e]
 2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
 3: (main()+0x1632) [0x569c62]
 4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
 5: /usr/bin/ceph-mds() [0x567d99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

 0> 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In 
function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 
time 2014-10-15 09:09:32.020287
osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())

 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: /usr/bin/ceph-mds() [0x80f15e]
 2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
 3: (main()+0x1632) [0x569c62]
 4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
 5: /usr/bin/ceph-mds() [0x567d99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
 in thread 7fec3e5ad7a0
 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: /usr/bin/ceph-mds() [0x82ef61]
 2: (()+0xf710) [0x7fec3d9a6710]
 3: (gsignal()+0x35) [0x7fec3ca7c635]
 4: (abort()+0x175) [0x7fec3ca7de15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
 6: (()+0xbcbe6) [0x7fec3d334be6]
 7: (()+0xbcc13) [0x7fec3d334c13]
 8: (()+0xbcd0e) [0x7fec3d334d0e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x7f2) [0x94b812]
 10: /usr/bin/ceph-mds() [0x80f15e]
 11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
 12: (main()+0x1632) [0x569c62]
 13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
 14: /usr/bin/ceph-mds() [0x567d99]
2014-10-15 09:09:32.024248 7fec3e5ad7a0 -1 *** Caught signal (Aborted) **
 in thread 7fec3e5ad7a0

 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: /usr/bin/ceph-mds() [0x82ef61]
 2: (()+0xf710) [0x7fec3d9a6710]
 3: (gsignal()+0x35) [0x7fec3ca7c635]
 4: (abort()+0x175) [0x7fec3ca7de15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
 6: (()+0xbcbe6) [0x7fec3d334be6]
 7: (()+0xbcc13) [0x7fec3d334c13]
 8: (()+0xbcd0e) [0x7fec3d334d0e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x7f2) [0x94b812]
 10: /usr/bin/ceph-mds() [0x80f15e]
 11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
 12: (main()+0x1632) [0x569c62]
 13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
 14: /usr/bin/ceph-mds() [0x567d99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.

 0> 2014-10-15 09:09:32.024248 7fec3e5ad7a0 -1 *** Caught signal (Aborted) 
**
 in thread 7fec3e5ad7a0

 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
 1: /usr/bin/ceph-mds() [0x82ef61]
 2: (()+0xf71
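
(For what it's worth, the assert suggests the standalone dumper calls
Objecter::op_submit() without holding the Objecter's client_lock, i.e. a bug in
the undump tooling rather than a problem with the dumped journal itself. Before
retrying on a fixed build, one way to see what actually reached the metadata
pool is to look at the rank-0 journal objects, whose names start with 200. --
a rough sketch, assuming the default 'metadata' pool name:)

    # list the mds.0 journal objects and inspect the journal header object
    rados -p metadata ls | grep '^200\.'
    rados -p metadata stat 200.00000000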

Re: [ceph-users] the state of cephfs in giant

2014-10-15 Thread Stijn De Weirdt



We've been doing a lot of work on CephFS over the past few months. This
is an update on the current state of things as of Giant.

...

* Either the kernel client (kernel 3.17 or later) or userspace (ceph-fuse
   or libcephfs) clients are in good working order.


Thanks for all the work and especially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our clusters to CephFS soon.

For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?


There are important bug fixes missing from 3.14.  IIRC, the EC, cache
tiering, and firefly CRUSH changes aren't there yet either (they landed in
3.15), and that is not appropriate for a stable series.

They can be backported, but no commitment yet on that :)


If the bugfixes are easily identified in one of your Ceph git branches,
I would even try to backport them myself. Still, I would rather see
someone from the Ceph team with deeper knowledge of the code port them.

IMHO, it would be good for Ceph to have stable support in at least the
latest longterm kernel. No need for new features, but bugfixes should be
there.


Does anyone have any data comparing the cephfs kernel client with ceph-fuse on 
various (recent/stable/distro) kernels? Our main interest in ceph comes from 
recent features like EC and tiering. If these don't get backported to 
longterm support kernels, fuse should be a viable alternative IMHO.



stijn
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph installation error

2014-10-15 Thread Sakhi Hadebe

Hi, 


I am deploying a 3 node ceph storage cluster for my company, 
following the webinar: http://www.youtube.com/watch?v=R3gnLrsZSno 


I am stuck at formatting the OSDs and making them ready to mount the 
directories. Below is the error thrown back: 
root@ceph-node1:~# mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring --mkfs 
unrecognized option '--mkfs' 
usage: /sbin/mkcephfs -a -c ceph.conf [-k adminkeyring] [--mkbtrfs] 
   to generate a new ceph cluster on all nodes; for advanced usage see man page 
   ** be careful, this WILL clobber old data; check your ceph.conf carefully ** 


Changing the --mkfs flag to --mkbtrfs results in the error below: 
root@ceph-node1:~# mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring --mkbtrfs 
temp dir is /tmp/mkcephfs.By7pV0aY0W 
preparing monmap in /tmp/mkcephfs.By7pV0aY0W/monmap 
/usr/bin/monmaptool --create --clobber --add a 192.168.56.21:6789 --add b 
192.168.56.22:6789 --add c 192.168.56.23:6789 --print 
/tmp/mkcephfs.By7pV0aY0W/monmap 
/usr/bin/monmaptool: monmap file /tmp/mkcephfs.By7pV0aY0W/monmap 
/usr/bin/monmaptool: generated fsid de3258d8-1a1f-427d-91ca-cbc679f75305 
epoch 0 
fsid de3258d8-1a1f-427d-91ca-cbc679f75305 
last_changed 2014-10-15 09:35:41.950988 
created 2014-10-15 09:35:41.950988 
0: 192.168.56.21:6789/0 mon.a 
1: 192.168.56.22:6789/0 mon.b 
2: 192.168.56.23:6789/0 mon.c 
/usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.By7pV0aY0W/monmap (3 
monitors) 
=== osd.0 ===  
no btrfs devs defined for osd.0 
2014-10-15 09:35:42.183383 7f9a06bca780 must specify '--osd-data=foo' data path 
2014-10-15 09:35:42.192312 7f9a06bca780 usage: ceph-osd -i osdid 
[--osd-data=path] [--osd-journal=path] [--mkfs] [--mkjournal] 
[--convert-filestore] 
2014-10-15 09:35:42.192602 7f9a06bca780
   --debug_osd N    set debug level (e.g. 10) 
   --conf/-c        Read configuration from the given configuration file 
   -d               Run in foreground, log to stderr. 
   -f               Run in foreground, log to usual location. 
   --id/-i          set ID portion of my name 
   --name/-n        set name (TYPE.ID) 
   --version        show version and quit 


   --debug_ms N 
set message debug level (e.g. 1) 
failed: '/sbin/mkcephfs -d /tmp/mkcephfs.By7pV0aY0W --init-daemon osd.0' 
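
(For reference: mkcephfs was deprecated in favour of ceph-deploy a few releases
before this, which is probably why the flags shown in the webinar no longer
match. A rough ceph-deploy sketch of the same provisioning -- the ceph-node1..3
hostnames and the sdb device name are assumptions based on the prompt and
monmap output above:)

    ceph-deploy new ceph-node1 ceph-node2 ceph-node3
    ceph-deploy install ceph-node1 ceph-node2 ceph-node3
    ceph-deploy mon create-initial
    ceph-deploy osd create ceph-node1:sdb ceph-node2:sdb ceph-node3:sdb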



Please help 


Regards,
Sakhi Hadebe
Engineer: South African National Research Network (SANReN)Competency Area, 
Meraka, CSIR

Tel:   +27 12 841 2308 
Fax:   +27 12 841 4223 
Cell:  +27 71 331 9622 
Email: shad...@csir.co.za



-- 
This message is subject to the CSIR's copyright terms and conditions, e-mail 
legal notice, and implemented Open Document Format (ODF) standard. 
The full disclaimer details can be found at 
http://www.csir.co.za/disclaimer.html.

This message has been scanned for viruses and dangerous content by MailScanner, 
and is believed to be clean.

Please consider the environment before printing this email.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Openstack keystone with Radosgw

2014-10-15 Thread Mark Kirkwood
Because this is an interesting problem, I added an additional host to my 
4 node ceph setup that is a purely radosgw host. So I have

- ceph1 (mon + osd)
- ceph2-4 (osd)
- ceph5 (radosgw)

My ceph.conf on ceph5 included below. Obviously I changed my keystone 
endpoints to use this host (ceph5). After that I am unable to reproduce 
your problem - for a moment I thought I had, but it was just that I had 
forgotten to include the keystone config in there at all! So it is now 
working fine. My guess is that there is something subtle broken in your 
config that we have yet to see...


(ceph5) $ cat /etc/ceph/ceph.conf

[global]
fsid = 2ea9a745-d84c-4fc5-95b4-2f6afa98ece1
mon_initial_members = ceph1
mon_host = 192.168.122.21
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pg_bits = 7
osd_pgp_bits = 7
osd_journal_size = 2048

[client.radosgw.gateway]
host = ceph5
keyring = /etc/ceph/ceph.rados.gateway.keyring
rgw_socket_path = /var/run/ceph/$name.sock
log_file = /var/log/ceph/radosgw.log
rgw_data = /var/lib/ceph/radosgw/$cluster-$id
rgw_dns_name = ceph5
rgw print continue = false
debug rgw = 20
rgw keystone url = http://stack1:35357
rgw keystone admin token = tokentoken
rgw keystone accepted roles = admin Member _member_
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss/
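
A quick way to exercise the keystone path end to end, once the endpoint points 
at the gateway host, is a verbose swift stat from a machine with the swift 
client installed -- a sketch only; the auth URL, tenant, user and password 
below are placeholders for whatever your keystone actually uses:

    swift -V 2.0 -A http://stack1:5000/v2.0 -U demo:demo -K secret stat -v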

On 15/10/14 10:25, Mark Kirkwood wrote:

Right,

So you have 3 osds, one of whom is a mon. Your rgw is on another host
(called gateway it seems). I'm wondering if this is the issue. In my
case I'm using one of my osds as a rgw as well. This *should* not
matter... but it might be worth trying out a rgw on one of your osds
instead. I'm thinking that your gateway host is setup in some way that
is confusing the [client.radosgw.gateway] entry in ceph.conf (e.g.
hostname resolution).

Regards

Mark

On 15/10/14 05:40, lakshmi k s wrote:

Hello Mark - with rgw_keystone_url under radosgw section, I do NOT see
keystone handshake. If I move it under global section, I see initial
keystone handshake as explained earlier. Below is the output of osd dump
and osd tree. I have 3 nodes (node1, node2, node3) acting as OSDs. One
of them (node1) is also a monitor node. I also have an admin node and
gateway node in the ceph cluster. The Keystone server (swift client) is of course
an altogether different OpenStack setup. Let me know if you need any
more information.





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] new installation

2014-10-15 Thread Roman

Hi ALL,

I've created 2 mon and 2 osd on Centos 6.5 (x86_64).

I've tried 4 times (clean centos installation) but always have health: 
HEALTH_WARN


Never HEALTH_OK always HEALTH_WARN! :(

# ceph -s
cluster d073ed20-4c0e-445e-bfb0-7b7658954874
 health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
 monmap e1: 2 mons at 
{ceph02=192.168.0.142:6789/0,ceph03=192.168.0.143:6789/0}, election 
epoch 4, quorum 0,1 ceph02,ceph03

 osdmap e10: 4 osds: 2 up, 2 in
  pgmap v15: 192 pgs, 3 pools, 0 bytes data, 0 objects
68908 kB used, 6054 MB / 6121 MB avail
 192 active+degraded

What am I doing wrong???

---

host:  192.168.0.141 - admin
host:  192.168.0.142 - mon.ceph02 + osd.0 (/dev/sdb, 8G)
host:  192.168.0.143 - mon.ceph03 + osd.1 (/dev/sdb, 8G)

ceph-deploy version 1.5.18

[global]
osd pool default size = 2
---

Thanks,
Roman.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] new installation

2014-10-15 Thread Pascal Morillon
Hello,

> osdmap e10: 4 osds: 2 up, 2 in


What about following commands :
# ceph osd tree
# ceph osd dump

You have 2 OSDs on 2 hosts, but 4 OSDs seem to be defined in your crush map.

Regards,

Pascal

On 15 Oct 2014, at 11:11, Roman  wrote:

> Hi ALL,
> 
> I've created 2 mon and 2 osd on Centos 6.5 (x86_64).
> 
> I've tried 4 times (clean centos installation) but always have health: 
> HEALTH_WARN
> 
> Never HEALTH_OK always HEALTH_WARN! :(
> 
> # ceph -s
>cluster d073ed20-4c0e-445e-bfb0-7b7658954874
> health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
> monmap e1: 2 mons at 
> {ceph02=192.168.0.142:6789/0,ceph03=192.168.0.143:6789/0}, election epoch 4, 
> quorum 0,1 ceph02,ceph03
> osdmap e10: 4 osds: 2 up, 2 in
>  pgmap v15: 192 pgs, 3 pools, 0 bytes data, 0 objects
>68908 kB used, 6054 MB / 6121 MB avail
> 192 active+degraded
> 
> What am I doing wrong???
> 
> ---
> 
> host:  192.168.0.141 - admin
> host:  192.168.0.142 - mon.ceph02 + osd.0 (/dev/sdb, 8G)
> host:  192.168.0.143 - mon.ceph03 + osd.1 (/dev/sdb, 8G)
> 
> ceph-deploy version 1.5.18
> 
> [global]
> osd pool default size = 2
> ---
> 
> Thanks,
> Roman.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Pascal Morillon
University of Rennes 1
IRISA, Rennes, France
SED
Offices : E206 (Grid5000), D050 (SED)
Phone : +33 2 99 84 22 10
pascal.moril...@irisa.fr
Twitter ‏@pmorillon
xmpp: pmori...@jabber.grid5000.fr
http://www.grid5000.fr



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] new installation

2014-10-15 Thread Roman

Pascal,

Here is my latest installation:

cluster 204986f6-f43c-4199-b093-8f5c7bc641bb
 health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; 
recovery 20/40 objects degraded (50.000%)
 monmap e1: 2 mons at 
{ceph02=192.168.33.142:6789/0,ceph03=192.168.33.143:6789/0}, election 
epoch 4, quorum 0,1 ceph02,ceph03

 mdsmap e4: 1/1/1 up {0=ceph02=up:active}
 osdmap e8: 2 osds: 2 up, 2 in
  pgmap v14: 192 pgs, 3 pools, 1884 bytes data, 20 objects
68796 kB used, 6054 MB / 6121 MB avail
20/40 objects degraded (50.000%)
 192 active+degraded


host ceph01 - admin
host ceph02 - mon.ceph02 + osd.1 (sdb, 8G) + mds
host ceph03 - mon.ceph03 + osd.0 (sdb, 8G)

$ ceph osd tree
# id    weight  type name       up/down reweight
-1  0   root default
-2  0   host ceph03
0   0   osd.0   up  1
-3  0   host ceph02
1   0   osd.1   up  1


$ ceph osd dump
epoch 8
fsid 204986f6-f43c-4199-b093-8f5c7bc641bb
created 2014-10-15 13:39:05.986977
modified 2014-10-15 13:40:45.644870
flags
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool 
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool 
stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0

max_osd 2
osd.0 up   in  weight 1 up_from 4 up_thru 4 down_at 0 
last_clean_interval [0,0) 192.168.33.143:6800/2284 
192.168.33.143:6801/2284 192.168.33.143:6802/2284 
192.168.33.143:6803/2284 exists,up dccd6b99-1885-4c62-864b-107bd9ba0d84
osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 
last_clean_interval [0,0) 192.168.33.142:6800/2399 
192.168.33.142:6801/2399 192.168.33.142:6802/2399 
192.168.33.142:6803/2399 exists,up 4d4adf4b-ae8e-4e26-8667-c952c7fc4e45


Thanks,
Roman


Hello,


osdmap e10: 4 osds: 2 up, 2 in


What about following commands :
# ceph osd tree
# ceph osd dump

You have 2 OSDs on 2 hosts, but 4 OSDs seem to be defined in your 
crush map.


Regards,

Pascal

On 15 Oct 2014, at 11:11, Roman wrote:



Hi ALL,

I've created 2 mon and 2 osd on Centos 6.5 (x86_64).

I've tried 4 times (clean centos installation) but always have 
health: HEALTH_WARN


Never HEALTH_OK always HEALTH_WARN! :(

# ceph -s
   cluster d073ed20-4c0e-445e-bfb0-7b7658954874
health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
monmap e1: 2 mons at 
{ceph02=192.168.0.142:6789/0,ceph03=192.168.0.143:6789/0}, election 
epoch 4, quorum 0,1 ceph02,ceph03

osdmap e10: 4 osds: 2 up, 2 in
 pgmap v15: 192 pgs, 3 pools, 0 bytes data, 0 objects
   68908 kB used, 6054 MB / 6121 MB avail
192 active+degraded

What am I doing wrong???

---

host:  192.168.0.141 - admin
host:  192.168.0.142 - mon.ceph02 + osd.0 (/dev/sdb, 8G)
host:  192.168.0.143 - mon.ceph03 + osd.1 (/dev/sdb, 8G)

ceph-deploy version 1.5.18

[global]
osd pool default size = 2
---

Thanks,
Roman.
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Pascal Morillon
University of Rennes 1
IRISA, Rennes, France
SED
Offices : E206 (Grid5000), D050 (SED)
Phone : +33 2 99 84 22 10
pascal.moril...@irisa.fr 
Twitter ‏@pmorillon 
xmpp: pmori...@jabber.grid5000.fr 
http://www.grid5000.fr 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] new installation

2014-10-15 Thread Anthony Alba
Firewall? Disable iptables, set SELinux to Permissive.
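
For example, on CentOS 6 -- a sketch only; re-enable the firewall and open the
ceph ports properly once things work:

    service iptables stop && chkconfig iptables off
    setenforce 0    # permissive until reboot; edit /etc/selinux/config to persist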
On 15 Oct, 2014 5:49 pm, "Roman"  wrote:

>  Pascal,
>
> Here is my latest installation:
>
> cluster 204986f6-f43c-4199-b093-8f5c7bc641bb
>  health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; recovery
> 20/40 objects degraded (50.000%)
>  monmap e1: 2 mons at {ceph02=
> 192.168.33.142:6789/0,ceph03=192.168.33.143:6789/0}, election epoch 4,
> quorum 0,1 ceph02,ceph03
>  mdsmap e4: 1/1/1 up {0=ceph02=up:active}
>  osdmap e8: 2 osds: 2 up, 2 in
>   pgmap v14: 192 pgs, 3 pools, 1884 bytes data, 20 objects
> 68796 kB used, 6054 MB / 6121 MB avail
> 20/40 objects degraded (50.000%)
>  192 active+degraded
>
>
> host ceph01 - admin
> host ceph02 - mon.ceph02 + osd.1 (sdb, 8G) + mds
> host ceph03 - mon.ceph03 + osd.0 (sdb, 8G)
>
> $ ceph osd tree
> # idweight  type name   up/down reweight
> -1  0   root default
> -2  0   host ceph03
> 0   0   osd.0   up  1
> -3  0   host ceph02
> 1   0   osd.1   up  1
>
>
> $ ceph osd dump
> epoch 8
> fsid 204986f6-f43c-4199-b093-8f5c7bc641bb
> created 2014-10-15 13:39:05.986977
> modified 2014-10-15 13:40:45.644870
> flags
> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool
> crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> max_osd 2
> osd.0 up   in  weight 1 up_from 4 up_thru 4 down_at 0 last_clean_interval
> [0,0) 192.168.33.143:6800/2284 192.168.33.143:6801/2284
> 192.168.33.143:6802/2284 192.168.33.143:6803/2284 exists,up
> dccd6b99-1885-4c62-864b-107bd9ba0d84
> osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval
> [0,0) 192.168.33.142:6800/2399 192.168.33.142:6801/2399
> 192.168.33.142:6802/2399 192.168.33.142:6803/2399 exists,up
> 4d4adf4b-ae8e-4e26-8667-c952c7fc4e45
>
> Thanks,
> Roman
>
>  Hello,
>
>  osdmap e10: 4 osds: 2 up, 2 in
>
>
>  What about following commands :
> # ceph osd tree
> # ceph osd dump
>
>  You have 2 OSDs on 2 hosts, but 4 OSDs seem to be defined in your crush
> map.
>
>  Regards,
>
>  Pascal
>
>  On 15 Oct 2014, at 11:11, Roman  wrote:
>
> Hi ALL,
>
> I've created 2 mon and 2 osd on Centos 6.5 (x86_64).
>
> I've tried 4 times (clean centos installation) but always have health:
> HEALTH_WARN
>
> Never HEALTH_OK always HEALTH_WARN! :(
>
> # ceph -s
>cluster d073ed20-4c0e-445e-bfb0-7b7658954874
> health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
> monmap e1: 2 mons at {ceph02=
> 192.168.0.142:6789/0,ceph03=192.168.0.143:6789/0}, election epoch 4,
> quorum 0,1 ceph02,ceph03
> osdmap e10: 4 osds: 2 up, 2 in
>  pgmap v15: 192 pgs, 3 pools, 0 bytes data, 0 objects
>68908 kB used, 6054 MB / 6121 MB avail
> 192 active+degraded
>
> What am I doing wrong???
>
> ---
>
> host:  192.168.0.141 - admin
> host:  192.168.0.142 - mon.ceph02 + osd.0 (/dev/sdb, 8G)
> host:  192.168.0.143 - mon.ceph03 + osd.1 (/dev/sdb, 8G)
>
> ceph-deploy version 1.5.18
>
> [global]
> osd pool default size = 2
> ---
>
> Thanks,
> Roman.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>--
> Pascal Morillon
> University of Rennes 1
> IRISA, Rennes, France
> SED
> Offices : E206 (Grid5000), D050 (SED)
> Phone : +33 2 99 84 22 10
> pascal.moril...@irisa.fr
> Twitter ‏@pmorillon 
> xmpp: pmori...@jabber.grid5000.fr
> http://www.grid5000.fr
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] new installation

2014-10-15 Thread Pascal Morillon

On 15 Oct 2014, at 11:11, Roman  wrote:

> Pascal,
> 
> Here is my latest installation:
> 
> cluster 204986f6-f43c-4199-b093-8f5c7bc641bb
>  health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; recovery 
> 20/40 objects degraded (50.000%)
>  monmap e1: 2 mons at 
> {ceph02=192.168.33.142:6789/0,ceph03=192.168.33.143:6789/0}, election epoch 
> 4, quorum 0,1 ceph02,ceph03
>  mdsmap e4: 1/1/1 up {0=ceph02=up:active}
>  osdmap e8: 2 osds: 2 up, 2 in
>   pgmap v14: 192 pgs, 3 pools, 1884 bytes data, 20 objects
> 68796 kB used, 6054 MB / 6121 MB avail
> 20/40 objects degraded (50.000%)
>  192 active+degraded
> 
> 
> host ceph01 - admin
> host ceph02 - mon.ceph02 + osd.1 (sdb, 8G) + mds
> host ceph03 - mon.ceph03 + osd.0 (sdb, 8G)
> 
> $ ceph osd tree
> # idweight  type name   up/down reweight
> -1  0   root default
> -2  0   host ceph03
> 0   0   osd.0   up  1
> -3  0   host ceph02
> 1   0   osd.1   up  1
> 

I'm not sure, but I think something went wrong during the installation: all 
the weights are equal (zero) at all levels of your map.
See http://docs.ceph.com/docs/master/rados/operations/crush-map/

Check the ruleset id #0 (all your pools use this ruleset).
If you have something like this :

rule data {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

And if I understand the mechanism correctly, for good replica placement ceph will 
need two OSDs on two different nodes. Maybe 0 is a wrong value for a weight?
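
If so, a rough sketch of a workaround would be to give both OSDs a small
non-zero CRUSH weight by hand (the 0.01 values are only placeholders; on very
small 8 GB test disks the automatically derived weight can round down to 0):

    ceph osd crush reweight osd.0 0.01
    ceph osd crush reweight osd.1 0.01
    ceph -w    # the 192 PGs should then go active+clean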

After deploying a similar configuration, I have:
# id    weight  type name       up/down reweight
-1  2   root default
-2  1   host econome-7
1   1   osd.1   up  1   
-3  1   host econome-18
0   1   osd.0   up  1

Pascal


> 
> $ ceph osd dump
> epoch 8
> fsid 204986f6-f43c-4199-b093-8f5c7bc641bb
> created 2014-10-15 13:39:05.986977
> modified 2014-10-15 13:40:45.644870
> flags 
> pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool 
> crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
> rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> max_osd 2
> osd.0 up   in  weight 1 up_from 4 up_thru 4 down_at 0 last_clean_interval 
> [0,0) 192.168.33.143:6800/2284 192.168.33.143:6801/2284 
> 192.168.33.143:6802/2284 192.168.33.143:6803/2284 exists,up 
> dccd6b99-1885-4c62-864b-107bd9ba0d84
> osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval 
> [0,0) 192.168.33.142:6800/2399 192.168.33.142:6801/2399 
> 192.168.33.142:6802/2399 192.168.33.142:6803/2399 exists,up 
> 4d4adf4b-ae8e-4e26-8667-c952c7fc4e45
> 
> Thanks,
> Roman
> 
>> Hello,
>> 
>>> osdmap e10: 4 osds: 2 up, 2 in
>> 
>> 
>> What about following commands :
>> # ceph osd tree
>> # ceph osd dump
>> 
>> You have 2 OSDs on 2 hosts, but 4 OSDs seem to be defined in your crush map.
>> 
>> Regards,
>> 
>> Pascal
>> 
>> On 15 Oct 2014, at 11:11, Roman  wrote:
>> 
>>> Hi ALL,
>>> 
>>> I've created 2 mon and 2 osd on Centos 6.5 (x86_64).
>>> 
>>> I've tried 4 times (clean centos installation) but always have health: 
>>> HEALTH_WARN
>>> 
>>> Never HEALTH_OK always HEALTH_WARN! :(
>>> 
>>> # ceph -s
>>>cluster d073ed20-4c0e-445e-bfb0-7b7658954874
>>> health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
>>> monmap e1: 2 mons at 
>>> {ceph02=192.168.0.142:6789/0,ceph03=192.168.0.143:6789/0}, election epoch 
>>> 4, quorum 0,1 ceph02,ceph03
>>> osdmap e10: 4 osds: 2 up, 2 in
>>>  pgmap v15: 192 pgs, 3 pools, 0 bytes data, 0 objects
>>>68908 kB used, 6054 MB / 6121 MB avail
>>> 192 active+degraded
>>> 
>>> What am I doing wrong???
>>> 
>>> ---
>>> 
>>> host:  192.168.0.141 - admin
>>> host:  192.168.0.142 - mon.ceph02 + osd.0 (/dev/sdb, 8G)
>>> host:  192.168.0.143 - mon.ceph03 + osd.1 (/dev/sdb, 8G)
>>> 
>>> ceph-deploy version 1.5.18
>>> 
>>> [global]
>>> osd pool default size = 2
>>> ---
>>> 
>>> Thanks,
>>> Roman.
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>> --
>> Pascal Morillon
>> University of Rennes 1
>> IRISA, Rennes, France
>> SED
>> Offices : E206 (Grid5000), D050 (SED)
>> Phone : +33 2 99 84 22 10
>> pascal.moril...@irisa.fr
>> Twitter ‏@pmorillon
>> xmpp: pmori...@jabber.grid5000.fr
>> http://www.grid5000.fr
>> 
> 

--
Pascal Morillo

Re: [ceph-users] new installation

2014-10-15 Thread Roman

Yes of course...

iptables -F (no rules) = the same as disabled
SELINUX=disabled

As a testing ground, I use VBox. But I think it should not be a problem.


Firewall? Disable iptables, set SELinux to Permissive.

On 15 Oct, 2014 5:49 pm, "Roman" > wrote:


Pascal,

Here is my latest installation:

cluster 204986f6-f43c-4199-b093-8f5c7bc641bb
 health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean;
recovery 20/40 objects degraded (50.000%)
 monmap e1: 2 mons at
{ceph02=192.168.33.142:6789/0,ceph03=192.168.33.143:6789/0
},
election epoch 4, quorum 0,1 ceph02,ceph03
 mdsmap e4: 1/1/1 up {0=ceph02=up:active}
 osdmap e8: 2 osds: 2 up, 2 in
  pgmap v14: 192 pgs, 3 pools, 1884 bytes data, 20 objects
68796 kB used, 6054 MB / 6121 MB avail
20/40 objects degraded (50.000%)
 192 active+degraded


host ceph01 - admin
host ceph02 - mon.ceph02 + osd.1 (sdb, 8G) + mds
host ceph03 - mon.ceph03 + osd.0 (sdb, 8G)

$ ceph osd tree
# id    weight  type name       up/down reweight
-1  0   root default
-2  0   host ceph03
0   0   osd.0   up  1
-3  0   host ceph02
1   0   osd.1   up  1


$ ceph osd dump
epoch 8
fsid 204986f6-f43c-4199-b093-8f5c7bc641bb
created 2014-10-15 13:39:05.986977
modified 2014-10-15 13:40:45.644870
flags
pool 0 'data' replicated size 2 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags
hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags
hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags
hashpspool stripe_width 0
max_osd 2
osd.0 up   in  weight 1 up_from 4 up_thru 4 down_at 0
last_clean_interval [0,0) 192.168.33.143:6800/2284
 192.168.33.143:6801/2284
 192.168.33.143:6802/2284
 192.168.33.143:6803/2284
 exists,up
dccd6b99-1885-4c62-864b-107bd9ba0d84
osd.1 up   in  weight 1 up_from 8 up_thru 0 down_at 0
last_clean_interval [0,0) 192.168.33.142:6800/2399
 192.168.33.142:6801/2399
 192.168.33.142:6802/2399
 192.168.33.142:6803/2399
 exists,up
4d4adf4b-ae8e-4e26-8667-c952c7fc4e45

Thanks,
Roman


Hello,


osdmap e10: 4 osds: 2 up, 2 in


What about following commands :
# ceph osd tree
# ceph osd dump

You have 2 OSDs on 2 hosts, but 4 OSDs seem to be defined in
your crush map.

Regards,

Pascal

On 15 Oct 2014, at 11:11, Roman <intra...@gmail.com> wrote:


Hi ALL,

I've created 2 mon and 2 osd on Centos 6.5 (x86_64).

I've tried 4 times (clean centos installation) but always have
health: HEALTH_WARN

Never HEALTH_OK always HEALTH_WARN! :(

# ceph -s
   cluster d073ed20-4c0e-445e-bfb0-7b7658954874
health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
monmap e1: 2 mons at
{ceph02=192.168.0.142:6789/0,ceph03=192.168.0.143:6789/0
},
election epoch 4, quorum 0,1 ceph02,ceph03
osdmap e10: 4 osds: 2 up, 2 in
 pgmap v15: 192 pgs, 3 pools, 0 bytes data, 0 objects
   68908 kB used, 6054 MB / 6121 MB avail
192 active+degraded

What am I doing wrong???

---

host:  192.168.0.141 - admin
host:  192.168.0.142 - mon.ceph02 + osd.0 (/dev/sdb, 8G)
host:  192.168.0.143 - mon.ceph03 + osd.1 (/dev/sdb, 8G)

ceph-deploy version 1.5.18

[global]
osd pool default size = 2
---

Thanks,
Roman.
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Pascal Morillon
University of Rennes 1
IRISA, Rennes, France
SED
Offices : E206 (Grid5000), D050 (SED)
Phone : +33 2 99 84 22 10 
pascal.moril...@irisa.fr 
Twitter ‏@pmorillon 
xmpp: pmori...@jabber.grid5000.fr

http://www.grid5000.fr 




___
ceph-users mailing li

Re: [ceph-users] the state of cephfs in giant

2014-10-15 Thread Amon Ott
On 15.10.2014 14:11, Ric Wheeler wrote:
> On 10/15/2014 08:43 AM, Amon Ott wrote:
>> On 14.10.2014 16:23, Sage Weil wrote:
>>> On Tue, 14 Oct 2014, Amon Ott wrote:
 On 13.10.2014 20:16, Sage Weil wrote:
> We've been doing a lot of work on CephFS over the past few months.
> This
> is an update on the current state of things as of Giant.
 ...
> * Either the kernel client (kernel 3.17 or later) or userspace
> (ceph-fuse
>or libcephfs) clients are in good working order.
 Thanks for all the work and specially for concentrating on CephFS! We
 have been watching and testing for years by now and really hope to
 change our Clusters to CephFS soon.

 For kernel maintenance reasons, we only want to run longterm stable
 kernels. And for performance reasons and because of severe known
 problems we want to avoid Fuse. How good are our chances of a stable
 system with the kernel client in the latest longterm kernel 3.14? Will
 there be further bugfixes or feature backports?
>>> There are important bug fixes missing from 3.14.  IIRC, the EC, cache
>>> tiering, and firefly CRUSH changes aren't there yet either (they
>>> landed in
>>> 3.15), and that is not appropriate for a stable series.
>>>
>>> They can be backported, but no commitment yet on that :)
>> If the bugfixes are easily identified in one of your Ceph git branches,
>> I would even try to backport them myself. Still, I would rather see
>> someone from the Ceph team with deeper knowledge of the code port them.
>>
>> IMHO, it would be good for Ceph to have stable support in at least the
>> latest longterm kernel. No need for new features, but bugfixes should be
>> there.
>>
>> Amon Ott
> 
> Long term support and aggressive, tedious backports are what you go to
> distro vendors for normally - I don't think that it is generally a good
> practice to continually backport anything to stable series kernels that
> is not a bugfix/security issue (or else, the stable branches rapidly become
> just a stale version of the upstream tip :)).

bugfix/security is exactly what I am looking for.

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH   Tel: +49 30 24342334
Werner-Voß-Damm 62   Fax: +49 30 99296856
12101 Berlin http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Openstack keystone with Radosgw

2014-10-15 Thread lakshmi k s
Thanks Mark for looking into this further. As I mentioned earlier, I have the 
following nodes in my ceph cluster: 

1 admin node
3 OSD (One of them is a monitor too)
1 gateway node

This should have worked technically. But I am not sure where I am going wrong. 
I will continue to look into this and keep you all posted.

Thanks,
Lakshmi.


On Wednesday, October 15, 2014 2:00 AM, Mark Kirkwood 
 wrote:
 


Because this is an interesting problem, I added an additional host to my 
4 node ceph setup that is a purely radosgw host. So I have
- ceph1 (mon + osd)
- ceph2-4 (osd)
- ceph5 (radosgw)

My ceph.conf on ceph5 included below. Obviously I changed my keystone 
endpoints to use this host (ceph5). After that I am unable to reproduce 
your problem - for a moment I thought I had, but it was just that I had 
forgotten to include the keystone config in there at all! So it is now 
working fine. My guess is that there is something subtle broken in your 
config that we have yet to see...

(ceph5) $ cat /etc/ceph/ceph.conf

[global]
fsid = 2ea9a745-d84c-4fc5-95b4-2f6afa98ece1
mon_initial_members = ceph1
mon_host = 192.168.122.21
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pg_bits = 7
osd_pgp_bits = 7
osd_journal_size = 2048

[client.radosgw.gateway]
host = ceph5
keyring = /etc/ceph/ceph.rados.gateway.keyring
rgw_socket_path = /var/run/ceph/$name.sock
log_file = /var/log/ceph/radosgw.log
rgw_data = /var/lib/ceph/radosgw/$cluster-$id
rgw_dns_name = ceph5
rgw print continue = false
debug rgw = 20
rgw keystone url = http://stack1:35357
rgw keystone admin token = tokentoken
rgw keystone accepted roles = admin Member _member_
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss/


On 15/10/14 10:25, Mark Kirkwood wrote:
> Right,
>
> So you have 3 osds, one of whom is a mon. Your rgw is on another host
> (called gateway it seems). I'm wondering if this is the issue. In my
> case I'm using one of my osds as a rgw as well. This *should* not
> matter... but it might be worth trying out a rgw on one of your osds
> instead. I'm thinking that your gateway host is setup in some way that
> is confusing the [client.radosgw.gateway] entry in ceph.conf (e.g.
> hostname resolution).
>
> Regards
>
> Mark
>
> On 15/10/14 05:40, lakshmi k s wrote:
>> Hello Mark - with rgw_keystone_url under radosgw section, I do NOT see
>> keystone handshake. If I move it under global section, I see initial
>> keystone handshake as explained earlier. Below is the output of osd dump
>> and osd tree. I have 3 nodes (node1, node2, node3) acting as OSDs. One
>> of them (node1) is also a monitor node. I also have an admin node and
>> gateway node in ceph cluster. Keystone server (swift client) of course
>> is all together a different Openstack setup. Let me know if you need any
>> more information.
>>
>___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ssh; cannot resolve hostname errors

2014-10-15 Thread Support - Avantek
I may be completely overlooking something here but I keep getting "ssh; cannot 
resolve hostname" when I try to contact my OSD nodes from my monitor node. I 
have set the IP addresses of the 3 nodes in /etc/hosts as suggested on the 
website.

Thanks in advance

James
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ssh; cannot resolve hostname errors

2014-10-15 Thread Wido den Hollander
On 10/15/2014 04:27 PM, Support - Avantek wrote:
> I may be completely overlooking something here but I keep getting "ssh; 
> cannot resolve hostname" when I try to contact my OSD node's from my monitor 
> node. I have set the ipaddress's of the 3 nodes in /etc/hosts as suggested on 
> the website.
> 

That seems like an issue in the hosts file.

Try:

$ ping 

Does that work? But this is really a hosts issue, not so much Ceph or SSH.
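
A minimal /etc/hosts sketch, on every node -- the names and addresses here are
only placeholders for your own:

    192.168.0.141   admin-node
    192.168.0.142   osd-node1
    192.168.0.143   osd-node2

    # then, from the monitor node:
    ping osd-node1
    ssh osd-node1 hostname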

> Thanks in advance
> 
> James
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Bryan Wright
Hi folks,

I recently had an OSD disk die, and I'm wondering what are the
current "best practices" for replacing it.  I think I've thoroughly removed
the old disk, both physically and logically, but I'm having trouble figuring
out how to add the new disk into ceph.

For one thing, taking a look at this:

http://article.gmane.org/gmane.comp.file-systems.ceph.user/5285/match=osd+number

it sounds like I'll need to abandon my beautiful OSD numbering scheme.  Is
that right?

I've been looking around for instructions about replacing disks, and
came across this:

http://karan-mj.blogspot.com/2014/03/admin-guide-replacing-failed-disk-in.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+CephStorageNextBigThing+(Ceph+Storage+%3A%3A+Next+Big+Thing)

and this:

http://dachary.org/?p=2428

which sound very different from each other.

   What procedure do you recommend?

Thanks,
Bryan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] the state of cephfs in giant

2014-10-15 Thread Sage Weil
On Wed, 15 Oct 2014, Amon Ott wrote:
> On 15.10.2014 14:11, Ric Wheeler wrote:
> > On 10/15/2014 08:43 AM, Amon Ott wrote:
> >> On 14.10.2014 16:23, Sage Weil wrote:
> >>> On Tue, 14 Oct 2014, Amon Ott wrote:
>  On 13.10.2014 20:16, Sage Weil wrote:
> > We've been doing a lot of work on CephFS over the past few months.
> > This
> > is an update on the current state of things as of Giant.
>  ...
> > * Either the kernel client (kernel 3.17 or later) or userspace
> > (ceph-fuse
> >or libcephfs) clients are in good working order.
>  Thanks for all the work and specially for concentrating on CephFS! We
>  have been watching and testing for years by now and really hope to
>  change our Clusters to CephFS soon.
> 
>  For kernel maintenance reasons, we only want to run longterm stable
>  kernels. And for performance reasons and because of severe known
>  problems we want to avoid Fuse. How good are our chances of a stable
>  system with the kernel client in the latest longterm kernel 3.14? Will
>  there be further bugfixes or feature backports?
> >>> There are important bug fixes missing from 3.14.  IIRC, the EC, cache
> >>> tiering, and firefly CRUSH changes aren't there yet either (they
> >>> landed in
> >>> 3.15), and that is not appropriate for a stable series.
> >>>
> >>> They can be backported, but no commitment yet on that :)
> >> If the bugfixes are easily identified in one of your Ceph git branches,
> >> I would even try to backport them myself. Still, I would rather see
> >> someone from the Ceph team with deeper knowledge of the code port them.
> >>
> >> IMHO, it would be good for Ceph to have stable support in at least the
> >> latest longterm kernel. No need for new features, but bugfixes should be
> >> there.
> >>
> >> Amon Ott
> > 
> > Long term support and aggressive, tedious backports are what you go to
> > distro vendors for normally - I don't think that it is generally a good
> > practice to continually backport anything to stable series kernels that
> > is not a bugfix/security issue (or else, the stable branches rapidly
> > just a stale version of the upstream tip :)).
> 
> bugfix/security is exactly what I am looking for.

Right; sorry if I was unclear.  We make a point of sending bug fixes to 
sta...@vger.kernel.org but haven't been aggressive with cephfs because 
the code is less stable.  There will be catch-up required to get 3.14 in 
good working order.

Definitely hear you that this is important, just can't promise when we'll 
have the time to do it.  There's probably a half day's effort to pick out 
the right patches and make sure they build properly, and then some time to 
feed it through the test suite.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Daniel Schwager
Hi,

> I recently had an OSD disk die, and I'm wondering what are the
> current "best practices" for replacing it.  I think I've thoroughly removed
> the old disk, both physically and logically, but I'm having trouble figuring
> out how to add the new disk into ceph.

I did this today (one disk - osd.16 - died ;-):

   # @ceph-node3
/etc/init.d/ceph stop osd.16

# delete osd.16
ceph osd crush remove osd.16
ceph auth del osd.16
ceph osd rm osd.16

# remove hdd, plugin new hdd
# /var/log/messages tells me
Oct 15 09:51:09 ceph-node3 kernel: [1489736.671840] sd 0:0:0:0: 
[sdd] Synchronizing SCSI cache
Oct 15 09:51:09 ceph-node3 kernel: [1489736.671873] sd 0:0:0:0: 
[sdd]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Oct 15 09:54:56 ceph-node3 kernel: [1489963.094744] sd 0:0:8:0: 
Attached scsi generic sg4 type 0
Oct 15 09:54:56 ceph-node3 kernel: [1489963.095235] sd 0:0:8:0: 
[sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
Oct 15 09:54:57 ceph-node3 kernel: [1489963.343664] sd 0:0:8:0: 
[sdd] Attached SCSI disk
--> /dev/sdd


# check /dev/sdd
root@ceph-node3:~#  smartctl -a /dev/sdd | less
=== START OF INFORMATION SECTION ===
Device Model: ST4000NM0033-9ZM170
Serial Number:Z1Z5LGBX
LU WWN Device Id: 5 000c50 079577e1a
Firmware Version: SN04
User Capacity:4.000.787.030.016 bytes [4,00 TB]
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           1
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           0
--> ok

# new /dev/sdd uses the absolute path:
/dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX

# create new  OSD  (with old journal partition)
admin@ceph-admin:~/cluster1$ ceph-deploy osd create 
ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/admin/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.17): 
/usr/bin/ceph-deploy osd create 
ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
ceph-node3:/dev/sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
...
[ceph_deploy.osd][DEBUG ] Host ceph-node3 is now ready for osd 
use.

# @ceph-admin modify config
admin@ceph-admin:~/cluster1$ ceph osd tree
...
admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
# osd16 was replaced

[osd.16]
...
devs = 
/dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
...

# deploy config
ceph-deploy  --overwrite-conf config push ceph-mon{1,2,3} 
ceph-node{1,2,3} ceph-admin

# enable cluster sync again
ceph osd unset noout

# check
ceph -w

regards
Danny


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Loic Dachary
Hi Daniel,

On 15/10/2014 08:02, Daniel Schwager wrote:
> Hi,
> 
>> I recently had an OSD disk die, and I'm wondering what are the
>> current "best practices" for replacing it.  I think I've thoroughly removed
>> the old disk, both physically and logically, but I'm having trouble figuring
>> out how to add the new disk into ceph.
> 
> I did this today (one disk - osd.16 - died ;-):
> 
># @ceph-node3
> /etc/init.d/ceph stop osd.16
> 
> # osd.16 loeschen
> ceph osd crush remove osd.16
> ceph auth del osd.16
> ceph osd rm osd.16
> 
> # remove hdd, plugin new hdd
> # /var/log/messages tells me
> Oct 15 09:51:09 ceph-node3 kernel: [1489736.671840] sd 
> 0:0:0:0: [sdd] Synchronizing SCSI cache
> Oct 15 09:51:09 ceph-node3 kernel: [1489736.671873] sd 
> 0:0:0:0: [sdd]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> Oct 15 09:54:56 ceph-node3 kernel: [1489963.094744] sd 
> 0:0:8:0: Attached scsi generic sg4 type 0
> Oct 15 09:54:56 ceph-node3 kernel: [1489963.095235] sd 
> 0:0:8:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
> Oct 15 09:54:57 ceph-node3 kernel: [1489963.343664] sd 
> 0:0:8:0: [sdd] Attached SCSI disk
> --> /dev/sdd
> 
> 
> # check /dev/sdd
> root@ceph-node3:~#  smartctl -a /dev/sdd | less
> === START OF INFORMATION SECTION ===
> Device Model: ST4000NM0033-9ZM170
> Serial Number:Z1Z5LGBX
> LU WWN Device Id: 5 000c50 079577e1a
> Firmware Version: SN04
> User Capacity:4.000.787.030.016 bytes [4,00 TB]
> ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE  
> UPDATED  WHEN_FAILED RAW_VALUE
>   4 Start_Stop_Count0x0032   100   100   020
> Old_age   Always   -   1
>   5 Reallocated_Sector_Ct   0x0033   100   100   010
> Pre-fail  Always   -   0
> --> ok
> 
> # new /dev/sdd uses the absolute path:
> /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX
> 
> # create new  OSD  (with old journal partition)
> admin@ceph-admin:~/cluster1$ ceph-deploy osd create 
> ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
> [ceph_deploy.conf][DEBUG ] found configuration file at: 
> /home/admin/.cephdeploy.conf
> [ceph_deploy.cli][INFO  ] Invoked (1.5.17): 
> /usr/bin/ceph-deploy osd create 
> ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
> [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks 
> ceph-node3:/dev/sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
> ...
> [ceph_deploy.osd][DEBUG ] Host ceph-node3 is now ready for 
> osd use.
> 
> # @ceph-admin modify config
> admin@ceph-admin:~/cluster1$ ceph osd tree
> ...
> admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
> # osd16 was replaced
> 
> [osd.16]
> ...
> devs = 
> /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
>

I'm curious about what this is used for.

Thanks a lot for sharing, very interesting read :-)

Cheers

 ...
> 
> # deploy config
> ceph-deploy  --overwrite-conf config push ceph-mon{1,2,3} 
> ceph-node{1,2,3} ceph-admin
> 
> # cluster-sync enablen
> ceph osd unset noout
> 
> # check
> ceph -w
> 
> regards
> Danny
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Chad Seys
Hi all,
  When I remove all OSDs on a given host, then wait for all objects (PGs?) to 
be active+clean, then remove the host (ceph osd crush remove hostname), 
that causes the objects to shuffle around the cluster again.
  Why does the CRUSH map depend on hosts that no longer have OSDs on them?

A wonderment question,
C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Mariusz Gronczewski
On Wed, 15 Oct 2014 11:06:55 -0500, Chad Seys 
wrote:

> Hi all,
>   When I remove all OSDs on a given host, then wait for all objects (PGs?) to 
> be to be active+clean, then remove the host (ceph osd crush remove hostname), 
> that causes the objects to shuffle around the cluster again.
>   Why does the CRUSH map depend on hosts that no longer have OSDs on them?
> 
> A wonderment question,
> C.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Usually removing an OSD without removing the host happens when you
remove/replace dead drives.

Hosts are in the map so:

* CRUSH won't put 2 copies on the same node
* you can balance around network interface speed

The question should be "why do you remove all OSDs if you are going to
remove the host anyway" :)

-- 
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew...@efigence.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Firefly maintenance release schedule

2014-10-15 Thread Dmitry Borodaenko
On Tue, Sep 30, 2014 at 6:49 PM, Dmitry Borodaenko
 wrote:
> Last stable Firefly release (v0.80.5) was tagged on July 29 (over 2
> months ago). Since then, there were twice as many commits merged into
> the firefly branch than there existed on the branch before v0.80.5:
>
> $ git log --oneline --no-merges v0.80..v0.80.5|wc -l
> 122
> $ git log --oneline --no-merges v0.80.5..firefly|wc -l
> 227
>
> Is this a one time aberration in the process or should we expect the
> gap between maintenance updates for LTS releases of Ceph to keep
> growing?

I didn't get a response to that nag other than the v0.80.6 release
announcement on the day after, so I guess it wasn't completely ignored
:)

Except it turned out v0.80.6 was slightly less than useful as a
maintenance release:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/043701.html

Two weeks later we have v0.80.7 with 3 more commits that hopefully
make it actually usable. There are many ways to look at that from
release management perspective.

Good: 2 weeks is much better than 2 months.
Bad: that's 2.5 months since last *stable* Firefly release.
Ugly: that's 2 weeks for 3 commits, and now we have 54 more waiting
for the next release...

Wait what?! Oh right, 54 more commits were merged from firefly-next as
soon as v0.80.7 was tagged:
$ git log --oneline --no-merges v0.80.7..firefly|wc -l
54

Some of these are fixes for Urgent priority bugs, crashes, and data loss:
http://tracker.ceph.com/issues/9492
http://tracker.ceph.com/issues/9039
http://tracker.ceph.com/issues/9582
http://tracker.ceph.com/issues/9307
etc.
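
(One way to check whether a given fix has already landed on the branch --
assuming, as is usually the case, that the backport commits mention the tracker
number -- is something like:)

    git log --oneline v0.80.7..firefly | grep -iE '9492|9039|9582|9307'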

So what is a Ceph deployer supposed to do with this? Wait another couple
of weeks (hopefully) for v0.80.8? Take v0.80.7 and hope not to
encounter any of these bugs? Or label Firefly as "not production ready
yet" and go back to Dumpling? My personal preference obviously would
be the first option, but waiting for 2.5 more months is not going to
fit my schedule :(

-- 
Dmitry Borodaenko
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] the state of cephfs in giant

2014-10-15 Thread Alphe Salas
For the humble ceph user that I am, it is really hard to follow which version 
of which product will get the changes I require.


Let me explain myself. I use ceph in my company, which is specialised in disk 
recovery; my company needs a flexible, easy to maintain, trustworthy way 
to store the data from the disks of our clients.


We tried the usual way: JBOD boxes connected to a single server with a SAS 
RAID card, and ZFS mirrors to handle replicas and to merge the disks into one 
big disk. The result is really slow. (We used to use ZFS and Solaris 11 on x86 
servers; with OpenZFS and Ubuntu 14.04 the performance is way better, but 
nowhere near comparable with ceph: on a gigabit ethernet LAN you can get data 
transfer between client and ceph cluster of around 80 MB/s, while client to 
OpenZFS/Ubuntu is around 25 MB/s.)


Along my path with ceph I first used CephFS, which worked fine, until I 
noticed that parts of the folder tree suddenly and randomly disappeared, 
forcing a constant, periodic remount of the partitions.


Then I chose to forget about CephFS and use rbd images, which worked fine,
until I noticed that rbd replicas were never freed or overwritten: with 
replicas set to 2 (data and 1 replica) and an image of 13 TB, after some time 
of write/erase cycles on the same rbd image I got an overall data usage of 
34 TB out of the 36 TB available on my cluster, so there was a real problem 
with "space management". The data part of the rbd image was properly managed, 
with overwrites of old deleted data at the OS level, so the only logical 
explanation of the overall data usage growth was that the replicas were 
never freed.


All along that time I was following the bugs/features and advances of 
ceph.
But those issues are not really ceph related; they concern the kernel modules 
used by the "ceph clients", so feature additions and bug fixes are partly 
delivered in the ceph-common package (for the server-related mechanics) and 
the other part then has to be provided at the kernel level.


For convenience I use Ubuntu, which is not really top notch at using the very 
latest brew of the kernel and all the bug-fixed modules.


So when I see this great news about Giant, and the fact that a lot of work 
has been done in solving most of the problems we all faced with
ceph, I notice that it will be around a year or so before those fixes are 
production-available in Ubuntu. There is some inertia there that 
doesn't match the pace of the work on ceph.


Then people can argue with me, "why do you use Ubuntu?",
and the answer is simple: I have a cluster of 10 machines and 1 proxy; 
if I need to compile the latest brew of ceph and the latest brew of the 
kernel from source, then my maintenance time will be way bigger, and I am 
more likely to end up with something that isn't done properly and a machine 
that doesn't reboot.
I know what I am talking about: for several months I used ceph on 
Arch Linux, compiling the kernel and ceph from source, until the gcc installed 
on my test server was too new and a compile option had been removed, and then 
ceph wasn't compiling any more. That way of proceeding was discarded because 
it was not stable enough to give production-level quality.


So, as far as I understand things, I will have the CephFS enhancements and the 
rbd discard ability available at the same time, using the combination of Ceph 
Giant and Linux kernel 3.18 and up?
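
(As a sketch of what that would allow -- assuming an rbd image mapped with the
kernel client on such a kernel; pool, image and mountpoint names below are
placeholders -- freed filesystem blocks could then be handed back to the
cluster with fstrim, so the replica space is reclaimed:)

    rbd map mypool/myimage          # appears as /dev/rbd0
    mount /dev/rbd0 /mnt/image
    fstrim -v /mnt/image            # issue discards for the free space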


Regards, and thank you again for your hard work. I wish I could do more to 
help.



---
Alphe Salas
I.T. engineer

On 10/15/2014 11:58 AM, Sage Weil wrote:

On Wed, 15 Oct 2014, Amon Ott wrote:

On 15.10.2014 14:11, Ric Wheeler wrote:

On 10/15/2014 08:43 AM, Amon Ott wrote:

On 14.10.2014 16:23, Sage Weil wrote:

On Tue, 14 Oct 2014, Amon Ott wrote:

On 13.10.2014 20:16, Sage Weil wrote:

We've been doing a lot of work on CephFS over the past few months.
This
is an update on the current state of things as of Giant.

...

* Either the kernel client (kernel 3.17 or later) or userspace
(ceph-fuse
or libcephfs) clients are in good working order.

Thanks for all the work and specially for concentrating on CephFS! We
have been watching and testing for years by now and really hope to
change our Clusters to CephFS soon.

For kernel maintenance reasons, we only want to run longterm stable
kernels. And for performance reasons and because of severe known
problems we want to avoid Fuse. How good are our chances of a stable
system with the kernel client in the latest longterm kernel 3.14? Will
there be further bugfixes or feature backports?

There are important bug fixes missing from 3.14.  IIRC, the EC, cache
tiering, and firefly CRUSH changes aren't there yet either (they
landed in
3.15), and that is not appropriate for a stable series.

They can be backported, but no commitment yet on that :)

If the bugfixes are easily identified in one of your Ceph git branches,
I would even try to backport them myself. Still, I would rather see
someone from the Ceph team with deeper knowledge of the code port them.

IMHO, it would be good for Ceph to have stable support in at least 

Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Dan van der Ster
Hi Chad,
That sounds bizarre to me, and I can't reproduce it. I added an osd (which was 
previously not in the crush map) to a fake host=test:

   ceph osd crush create-or-move osd.52 1.0 rack=RJ45 host=test

that resulted in some data movement of course. Then I removed that osd from the 
crush map:

   ceph osd crush rm osd.52

which left the test host in the crushmap but now its weight is zero. I waited 
until all the PGs were active and clean, then removed that host:

   ceph osd crush remove test

And there was no data movement.

As far as I've experienced, an entry in the crush map with a _crush_ weight of 
zero is equivalent to that entry not being in the map. (In fact, I use this to 
drain OSDs ... I just ceph osd crush reweight osd.X 0, then sometime later I 
crush rm the osd, without incurring any secondary data movement).
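
To illustrate, a minimal sketch of that drain sequence (osd.X stands for 
whichever OSD is being drained):

   ceph osd crush reweight osd.X 0   # CRUSH weight 0: data drains off the OSD
   ceph -s                           # wait until all PGs are active+clean again
   ceph osd crush rm osd.X           # removing the now-empty OSD should move nothing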

Cheers, Dan


October 15 2014 6:07 PM, "Chad Seys"  wrote: 
> Hi all,
> When I remove all OSDs on a given host, then wait for all objects (PGs?) to
> be active+clean, then remove the host (ceph osd crush remove hostname),
> that causes the objects to shuffle around the cluster again.
> Why does the CRUSH map depend on hosts that no longer have OSDs on them?
> 
> A wonderment question,
> C.
> 
> ___
> 
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Firefly maintenance release schedule

2014-10-15 Thread Gregory Farnum
On Wed, Oct 15, 2014 at 9:39 AM, Dmitry Borodaenko
 wrote:
> On Tue, Sep 30, 2014 at 6:49 PM, Dmitry Borodaenko
>  wrote:
>> Last stable Firefly release (v0.80.5) was tagged on July 29 (over 2
>> months ago). Since then, there were twice as many commits merged into
>> the firefly branch as existed on the branch before v0.80.5:
>>
>> $ git log --oneline --no-merges v0.80..v0.80.5|wc -l
>> 122
>> $ git log --oneline --no-merges v0.80.5..firefly|wc -l
>> 227
>>
>> Is this a one time aberration in the process or should we expect the
>> gap between maintenance updates for LTS releases of Ceph to keep
>> growing?
>
> I didn't get a response to that nag other than the v0.80.6 release
> announcement on the day after, so I guess it wasn't completely ignored
> :)
>
> Except it turned out v0.80.6 was slightly less than useful as a
> maintenance release:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/043701.html
>
> Two weeks later we have v0.80.7 with 3 more commits that hopefully
> make it actually usable. There are many ways to look at that from
> release management perspective.
>
> Good: 2 weeks is much better than 2 months.
> Bad: that's 2.5 months since last *stable* Firefly release.
> Ugly: that's 2 weeks for 3 commits, and now we have 54 more waiting
> for the next release...
>
> Wait what?! Oh right, 54 more commits were merged from firefly-next as
> soon as v0.80.7 was tagged:
> $ git log --oneline --no-merges v0.80.7..firefly|wc -l
> 54
>
> Some of these are fixes for Urgent priority bugs, crashes, and data loss:
> http://tracker.ceph.com/issues/9492
> http://tracker.ceph.com/issues/9039
> http://tracker.ceph.com/issues/9582
> http://tracker.ceph.com/issues/9307
> etc.
>
> So what is a Ceph deployer supposed to do with this? Wait another couple
> of weeks (hopefully) for v0.80.8? Take v0.80.7 and hope not to
> encounter any of these bugs? Or label Firefly as "not production ready
> yet" and go back to Dumpling? My personal preference obviously would
> be the first option, but waiting for 2.5 more months is not going to
> fit my schedule :(

Take .80.7. All of the bugs you've cited, you are supremely unlikely
to run into. The "Urgent" tag is a measure of planning priority, not
of impact to users; here it generally means "we found a bug on a
stable branch that we can reproduce". Taking them in order:
http://tracker.ceph.com/issues/9492: only happens if you try and cheat
with your CRUSH rules, and obviously nobody did that until Sage
suggested it as a solution to the problem somebody had 29 days ago
when this was discovered.
http://tracker.ceph.com/issues/9039: The most serious here, but only
happens if you're using RGW, and storing user data in multiple pools,
and issue a COPY command to copy data between different pools.
http://tracker.ceph.com/issues/9582: Only happens if you're using the
op timeout feature of librados with the C bindings OR the op timeout
feature *and* the user-provided buffers in the C++ interface. (To the
best of my knowledge, the people who discovered this are the only ones
using op timeouts.)
http://tracker.ceph.com/issues/9307: I'm actually not sure what's
going on here; looks like some kind of extremely rare race when
authorizing requests? (ie, fixed by a retry)

We messed up the v0.80.6 release in a very specific way (and if you
were deploying a new cluster it wasn't a problem), but you're
extrapolating too much from the presence of patches about what their
impact is and what the system's stability is. These are largely
cleaning up rough edges around user interfaces, and smoothing out
issues in the new functionality that a standard deployment isn't going
to experience. :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Chad Seys
Hi Mariusz,

> Usually removing OSD without removing host happens when you
> remove/replace dead drives.
> 
> Hosts are in map so
> 
> * CRUSH wont put 2 copies on same node
> * you can balance around network interface speed

That does not answer the original question IMO: "Why does the CRUSH map depend 
on hosts that no longer have OSDs on them?"

But I think it does answer the question "Why does the CRUSH map depend on OSDs 
AND hosts?"

> The question should be "why you remove all OSDs if you are going to
> remove host anyway" :)

This is your question, not mine!  :)
I am decommissioning the entire node.  What is the recommended (fastest yet 
safe) way of doing this?  I am currently following this procedure:

for all osdnum on server:
  ceph osd crush remove osd.osdnum

#wait for health to not be degraded, migration stops

for all osdnum on server:
  stop osdnum on server
  ceph auth del osd.osdnum
  ceph osd rm osdnum

# no new migration

# remove server with no OSD from CRUSH
ceph osd crush remove server
# lots of migration!

Thanks!
C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Chad Seys
Hi Dan,
  I'm using Emperor (0.72).  Though I would think CRUSH maps have not changed 
that much between versions?

> That sounds bizarre to me, and I can't reproduce it. I added an osd (which
> was previously not in the crush map) to a fake host=test:
> 
>ceph osd crush create-or-move osd.52 1.0 rack=RJ45 host=test

I have a flatter failure domain with only servers/drives.  Looks like you would 
have at least rack/server/drive.  Would that make the difference?

> As far as I've experienced, an entry in the crush map with a _crush_ weight
> of zero is equivalent to that entry not being in the map. (In fact, I use
> this to drain OSDs ... I just ceph osd crush reweight osd.X 0, then
> sometime later I crush rm the osd, without incurring any secondary data
> movement).

Is the crush weight the second column of ceph osd tree ?
I'll have to pay attention to that next time I drain a node.

Thanks for investigating!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Dan van der Ster
Hi,

October 15 2014 7:05 PM, "Chad Seys"  wrote: 
> Hi Dan,
> I'm using Emperor (0.72). Though I would think CRUSH maps have not changed
> that much btw versions?

I'm using dumpling, with the hashpspool flag enabled, which I believe could 
have been the only difference.

>> That sounds bizarre to me, and I can't reproduce it. I added an osd (which
>> was previously not in the crush map) to a fake host=test:
>> 
>> ceph osd crush create-or-move osd.52 1.0 rack=RJ45 host=test
> 
> I have flatter failure domain with only servers/drives. Looks like you would
> have at least rack/server/drive. Would that make the difference?

Could be. Now I just tried using testrack, testhost then removing the osd. So I 
have

-30 0   rack testrack
-23 0   host testhost

Then I remove testhost and testrack and there is still no data movement 
afterwards. Our crush rule is doing

rule data {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type rack
step emit
}

in case that makes a difference.

> 
>> As far as I've experienced, an entry in the crush map with a _crush_ weight
>> of zero is equivalent to that entry not being in the map. (In fact, I use
>> this to drain OSDs ... I just ceph osd crush reweight osd.X 0, then
>> sometime later I crush rm the osd, without incurring any secondary data
>> movement).
> 
> Is the crush weight the second column of ceph osd tree ?

Yes, that's the one I'm talking about. The reweight (0-1 value in the rightmost 
column) is another thing altogether.

Cheers, Dan

> I'll have to pay attention to that next time I drain a node.
> 
> Thanks for investigating!
> Chad. 
> ___
> 
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Daniel Schwager
Loic,

> > root@ceph-node3:~#  smartctl -a /dev/sdd | less
> > === START OF INFORMATION SECTION ===
> > Device Model: ST4000NM0033-9ZM170
> > Serial Number:Z1Z5LGBX
> >
.. 
> > admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
> > [osd.16]
> > ...
> > devs = 
> > /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
> >
> 
> I'm curious about what this is used for.

The "normal" device path /dev/sdd1 can change depending on the number/order of 
disks/controllers, so using the scsi path (containing the serial number) is 
always unique:

root@ceph-node3:~# ls -altr /dev/sdd1
brw-rw---T 1 root disk 8, 49 Okt 15 10:06 /dev/sdd1

root@ceph-node3:~# ls -altr 
/dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
lrwxrwxrwx 1 root root 10 Okt 15 10:06 
/dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1 -> ../../sdd1

regards
Danny


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-10-15 Thread John Spray
Sadly undump has been broken for quite some time (it was fixed in
giant as part of creating cephfs-journal-tool).  If there's a one line
fix for this then it's probably worth putting in firefly since it's a
long term supported branch -- I'll do that now.

John

On Wed, Oct 15, 2014 at 8:23 AM, Jasper Siero
 wrote:
> Hello Greg,
>
> The dump and reset of the journal was succesful:
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
> /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
> --dump-journal 0 journaldumptgho-mon001
> journal is 9483323613~134215459
> read 134213311 bytes at offset 9483323613
> wrote 134213311 bytes at offset 9483323613 to journaldumptgho-mon001
> NOTE: this is a _sparse_ file; you can
> $ tar cSzf journaldumptgho-mon001.tgz journaldumptgho-mon001
>   to efficiently compress it while preserving sparseness.
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
> /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
> --reset-journal 0
> old journal was 9483323613~134215459
> new journal start will be 9621733376 (4194304 bytes past old end)
> writing journal head
> writing EResetJournal entry
> done
>
>
> Undumping the journal was not successful and looking into the error 
> "client_lock.is_locked()" is showed several times. The mds is not running 
> when I start the undumping so maybe have forgot something?
>
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --pid-file 
> /var/run/ceph/mds.th1-mon001.pid -c /etc/ceph/ceph.conf --cluster ceph 
> --undump-journal 0 journaldumptgho-mon001
> undump journaldumptgho-mon001
> start 9483323613 len 134213311
> writing header 200.
> osdc/Objecter.cc: In function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' 
> thread 7fec3e5ad7a0 time 2014-10-15 09:09:32.020287
> osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>  1: /usr/bin/ceph-mds() [0x80f15e]
>  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>  3: (main()+0x1632) [0x569c62]
>  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>  5: /usr/bin/ceph-mds() [0x567d99]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
> 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In function 
> 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 time 
> 2014-10-15 09:09:32.020287
> osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
>
>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>  1: /usr/bin/ceph-mds() [0x80f15e]
>  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>  3: (main()+0x1632) [0x569c62]
>  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>  5: /usr/bin/ceph-mds() [0x567d99]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
>
>  0> 2014-10-15 09:09:32.021313 7fec3e5ad7a0 -1 osdc/Objecter.cc: In 
> function 'ceph_tid_t Objecter::op_submit(Objecter::Op*)' thread 7fec3e5ad7a0 
> time 2014-10-15 09:09:32.020287
> osdc/Objecter.cc: 1225: FAILED assert(client_lock.is_locked())
>
>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c
> [root@th1-mon001 ~]# /usr/bin/ceph-mds -i th1-mon001 --p8a65c2c0feba6)
>  1: /usr/bin/ceph-mds() [0x80f15e]
>  2: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>  3: (main()+0x1632) [0x569c62]
>  4: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>  5: /usr/bin/ceph-mds() [0x567d99]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
>
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> *** Caught signal (Aborted) **
>  in thread 7fec3e5ad7a0
>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>  1: /usr/bin/ceph-mds() [0x82ef61]
>  2: (()+0xf710) [0x7fec3d9a6710]
>  3: (gsignal()+0x35) [0x7fec3ca7c635]
>  4: (abort()+0x175) [0x7fec3ca7de15]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
>  6: (()+0xbcbe6) [0x7fec3d334be6]
>  7: (()+0xbcc13) [0x7fec3d334c13]
>  8: (()+0xbcd0e) [0x7fec3d334d0e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x7f2) [0x94b812]
>  10: /usr/bin/ceph-mds() [0x80f15e]
>  11: (Dumper::undump(char const*)+0x65d) [0x56c7ad]
>  12: (main()+0x1632) [0x569c62]
>  13: (__libc_start_main()+0xfd) [0x7fec3ca68d5d]
>  14: /usr/bin/ceph-mds() [0x567d99]
> 2014-10-15 09:09:32.024248 7fec3e5ad7a0 -1 *** Caught signal (Aborted) **
>  in thread 7fec3e5ad7a0
>
>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>  1: /usr/bin/ceph-mds() [0x82ef61]
>  2: (()+0xf710) [0x7fec3d9a6710]
>  3: (gsignal()+0x35) [0x7fec3ca7c635]
>  4: (abort()+0x175) [0x7fec3ca7de15]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fec3d336a5d]
>  6: (()+0xbcbe6) [0x7fec3d334be6]
>  7: (()+0xbcc13) [0x7fec3d334c13]
>  8: (()+0xbcd0e) [0x7fec3d334d0e]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x7f2) 

Re: [ceph-users] Openstack keystone with Radosgw

2014-10-15 Thread lakshmi k s
Hello Mark - 

I set up a new Ceph cluster like before, but this time it is talking to 
Icehouse. Same set of problems as before; that is, keystone flags are not 
being honored if they are under [client.radosgw.gateway]. It seems like the 
issue is with my radosgw setup. Let me create a new thread for this new issue. 

Thanks much for all your help so far.

Regards,
Lakshmi.



On Wednesday, October 15, 2014 6:54 AM, lakshmi k s  wrote:
 


Thanks Mark for looking into this further. As I mentioned earlier, I have the 
following nodes in my ceph cluster - 

1 admin node
3 OSD (One of them is a monitor too)
1 gateway node

This should have worked technically. But I am not sure where I am going wrong. 
I will continue to look into this and keep you all posted.

Thanks,
Lakshmi.


On Wednesday, October 15, 2014 2:00 AM, Mark Kirkwood 
 wrote:
 


Because this is an interesting problem, I added an additional host to my 
4 node ceph setup that is purely a radosgw host. So I have
- ceph1 (mon + osd)
- ceph2-4 (osd)
- ceph5 (radosgw)

My ceph.conf on ceph5 included below. Obviously I changed my keystone 
endpoints to use this host (ceph5). After that I am unable to reproduce 
your problem - for a moment I thought I had, but it was just that I had 
forgotten to include the keystone config in there at all! So it is now 
working fine. My guess is that there is something subtly broken in your 
config that we have yet to see...

(ceph5) $ cat /etc/ceph/ceph.conf

[global]
fsid = 2ea9a745-d84c-4fc5-95b4-2f6afa98ece1
mon_initial_members = ceph1
mon_host = 192.168.122.21
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pg_bits = 7
osd_pgp_bits = 7
osd_journal_size = 2048

[client.radosgw.gateway]
host = ceph5
keyring = /etc/ceph/ceph.rados.gateway.keyring
rgw_socket_path = /var/run/ceph/$name.sock
log_file = /var/log/ceph/radosgw.log
rgw_data = /var/lib/ceph/radosgw/$cluster-$id
rgw_dns_name = ceph5
rgw print continue = false
debug rgw = 20
rgw keystone url = http://stack1:35357
rgw keystone admin token = tokentoken
rgw keystone accepted roles = admin Member _member_
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss/


On 15/10/14 10:25, Mark Kirkwood wrote:
> Right,
>
> So you have 3 osds, one of whom is a mon. Your rgw is on another host
> (called gateway it seems). I'm wondering if is this the issue. In my
> case I'm using one of my osds as a rgw as well. This *should* not
> matter... but it might be worth trying out a rgw on one of your osds
> instead. I'm thinking that your gateway host is setup in some
 way that
> is confusing the [client.radosgw.gatway] entry in ceph.conf (e.g
> hostname resolution).
>
> Regards
>
> Mark
>
> On 15/10/14 05:40, lakshmi k s wrote:
>> Hello Mark - with rgw_keystone_url under radosgw section, I do NOT see
>> keystone handshake. If I move it under global section, I see initial
>> keystone handshake as explained earlier. Below is the output of osd dump
>> and osd tree. I have 3 nodes (node1, node2, node3) acting as OSDs. One
>> of them (node1) is also a monitor node. I also have an admin node and
>> gateway node in ceph cluster. Keystone server (swift client) of course
>> is all together a different Openstack setup. Let me
 know if you need any
>> more information.
>>
>___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] (no subject)

2014-10-15 Thread lakshmi k s
I am trying to integrate Openstack keystone with radosgw. I have followed the 
instructions as per the link - http://ceph.com/docs/master/radosgw/keystone/. 
But for some reason, keystone flags under the [client.radosgw.gateway] section are 
not being honored; that is, the presence of these flags never triggers an attempt to use 
keystone. Hence, any swift v2.0 call results in a 401 Authorization problem. But 
if I move the keystone url out under the global section, I see that there is an 
initial keystone handshake between the keystone and gateway nodes. 

Please note that swift v1 calls (without using keystone) work great. 
Any thoughts on how to resolve this problem?


ceph.conf

[global]
fsid = f216cbe1-fa49-42ed-b28a-322aa3d48fff
mon_initial_members = node1
mon_host = 192.168.122.182
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

[client.admin]

keyring = /etc/ceph/ceph.client.admin.keyring

[client.radosgw.gateway]
host = radosgw
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw dns name = radosgw

rgw keystone url = http://192.168.122.165:5000
rgw keystone admin token = faedf7bc53e3371924e7b3ddb9d13ddd
rgw keystone accepted roles = admin Member _member_
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss

Thanks much.

Lakshmi.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Iban Cabrillo
Hi Cephers,

 I have another question related to this issue: what would be the
procedure to recover from a whole server failure (for example due to
motherboard trouble, with no damage to the disks)?

Regards, I

2014-10-15 20:22 GMT+02:00 Daniel Schwager :

> Loic,
>
> > > root@ceph-node3:~#  smartctl -a /dev/sdd | less
> > > === START OF INFORMATION SECTION ===
> > > Device Model: ST4000NM0033-9ZM170
> > > Serial Number:Z1Z5LGBX
> > >
> ..
> > > admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
> > > [osd.16]
> > > ...
> > > devs =
> /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
> > >
> >
> > I'm curious about what this is used for.
>
> The "normal" device path /dev/sdd1 can change dependent on the
> amount/order of disks/controllers. So, using the scsi-path (containing the
> serial number)  is always unique:
>
> root@ceph-node3:~# ls -altr /dev/sdd1
> brw-rw---T 1 root disk 8, 49 Okt 15 10:06 /dev/sdd1
>
> root@ceph-node3:~# ls -altr
> /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
> lrwxrwxrwx 1 root root 10 Okt 15 10:06
> /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1 -> ../../sdd1
>
> regards
> Danny
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 

Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969
PGP PUBLIC KEY:
http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC

Bertrand Russell:
*"The trouble with the world is that the stupid are cocksure and the
intelligent are full of doubt*"
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] converting legacy puppet-ceph configured OSDs to look like ceph-deployed OSDs

2014-10-15 Thread Dan van der Ster
Hi Ceph users,

(sorry for the novel, but perhaps this might be useful for someone)

During our current project to upgrade our cluster from disks-only to
SSD journals, we've found it useful to convert our legacy puppet-ceph
deployed cluster (using something like the enovance module) to one that
looks like it has had its OSD created with ceph-disk prepare. It's been
educational for me, and I thought it would be good experience to share.

To start, the "old" puppet-ceph configures OSDs explicitly in
ceph.conf, like this:

[osd.211]
   host = p05151113489275
   devs = /dev/disk/by-path/pci-:02:00.0-sas-...-lun-0-part1

and ceph-disk list says this about the disks:

/dev/sdh :
 /dev/sdh1 other, xfs, mounted on /var/lib/ceph/osd/osd.211

In other words, ceph-disk doesn't know anything about the OSD living
on that disk.

Before deploying our SSD journals I was trying to find the best way to
map OSDs to SSD journal partitions (in puppet!), but basically there is
no good way to do this with the legacy puppet-ceph module. (What we'd have
to do is puppetize the partitioning of SSDs, then manually map OSDs to
SSD partitions. This would be tedious, and also error prone after
disk replacements and reboots).

However, I've found that by using ceph-deploy, i.e ceph-disk, to
prepare and activate OSDs, this becomes very simple, trivial even. Using
ceph-disk we keep the OSD/SSD mapping out of puppet; instead the state is
stored in the OSD itself. (1.5 years ago when we deployed this cluster,
ceph-deploy was advertised as a quick tool to spin up small clusters, so we
didn't dare
use it. I realize now that it (or the puppet/chef/... recipes based on
it) is _the_only_way_ to build a cluster if you're starting out today.)

Now our problem was that I couldn't go and re-ceph-deploy the whole
cluster, since we've got some precious user data there. Instead, I needed
to learn how ceph-disk is labeling and preparing disks, and modify our
existing OSDs in place to look like they'd been prepared and activated with
ceph-disk.

In the end, I've worked out all the configuration and sgdisk magic and
put the recipes into a couple of scripts here [1]. Note that I do not
expect these to work for any other cluster unmodified. In fact, that would
be dangerous, so don't blame me if you break something. But they might be
helpful for understanding how the ceph-disk udev magic works and could be a
basis for upgrading other clusters.

The scripts are:

ceph-deployifier/ceph-create-journals.sh:
  - this script partitions SSDs (assuming sda to sdd) with 5 partitions each
  - the only trick is to add the partition name 'ceph journal' and set the
typecode to the magic JOURNAL_UUID along with a random partition guid

ceph-deployifier/ceph-label-disks.sh:
  - this script discovers the next OSD which is not prepared with
ceph-disk, finds an appropriate unused journal partition, and converts the
OSD to a ceph-disk prepared lookalike.
  - aside from the discovery part, the main magic is to:
- create the files active, sysvinit and journal_uuid on the OSD
- rename the partition to 'ceph data', set the typecode to the
magic OSD_UUID, and the partition guid to the OSD's uuid.
- link to the /dev/disk/by-partuuid/ journal symlink, and make the
new journal
  - at the end, udev is triggered and the OSD is started (via the
ceph-disk activation magic)

The complete details are of course in the scripts. (I also have
another version of ceph-label-disks.sh that doesn't expect an SSD journal
but instead prepares the single disk 2 partitions scheme.)
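
To make the relabeling a bit more concrete, a rough sketch of the sgdisk 
calls involved (device names, partition numbers and the OSD data path are 
placeholders; the type-code GUIDs are the usual ceph-disk ones, but 
double-check them against your ceph-disk version before trying anything 
like this):

   # label an SSD journal partition (/dev/sda1 here) the way ceph-disk would
   JOURNAL_TYPE=45b0969e-9b03-4f30-b4c6-b4b80ceff106
   sgdisk --change-name=1:'ceph journal' --typecode=1:$JOURNAL_TYPE \
          --partition-guid=1:$(uuidgen) /dev/sda

   # relabel an existing OSD data partition (/dev/sde1 holding osd.2 here)
   OSD_TYPE=4fbd7e29-9d25-41b8-afd0-062c0ceff05d
   sgdisk --change-name=1:'ceph data' --typecode=1:$OSD_TYPE \
          --partition-guid=1:$(cat /var/lib/ceph/osd/osd.2/fsid) /dev/sde

   # re-read the labels so the ceph-disk udev rules fire
   udevadm trigger --subsystem-match=block --action=add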

After running these scripts you'll get a nice shiny ceph-disk list output:

/dev/sda :
 /dev/sda1 ceph journal, for /dev/sde1
 /dev/sda2 ceph journal, for /dev/sdf1
 /dev/sda3 ceph journal, for /dev/sdg1
...
/dev/sde :
 /dev/sde1 ceph data, active, cluster ceph, osd.2, journal /dev/sda1
/dev/sdf :
 /dev/sdf1 ceph data, active, cluster ceph, osd.8, journal /dev/sda2
/dev/sdg :
 /dev/sdg1 ceph data, active, cluster ceph, osd.12, journal /dev/sda3
...

And all of the udev magic is working perfectly. I've tested all of the
reboot, failed OSD, and failed SSD scenarios and it all works as it should.
And the puppet-ceph manifest for osd's is now just a very simple wrapper
around ceph-disk prepare. (I haven't published ours to github yet, but it
is very similar to the stackforge puppet-ceph manifest).

There you go, sorry that was so long. I hope someone finds this useful :)

Best Regards,
Dan

[1]
https://github.com/cernceph/ceph-scripts/tree/master/tools/ceph-deployifier
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph storage pool definition with KVM/libvirt

2014-10-15 Thread Dan Geist
I'm leveraging Ceph in a vm prototyping environment currently and am having 
issues abstracting my VM definitions from the storage pool (to use a libvirt 
convention). 

I'm able to use the rbd support within the disk configuration of individual VMs 
but am struggling to find a good reference for abstracting it to a storage 
pool. How do I pull the source definition from below to the pool definition?

(the example libvirt <disk> XML was stripped by the mail archive)
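
For context, a rough sketch of what an rbd-backed pool definition might look 
like (the pool name, monitor address, cephx user and secret UUID below are 
placeholders, and the exact schema should be checked against the libvirt 
storage documentation for your version):

   cat > /tmp/ceph-pool.xml <<'EOF'
   <pool type="rbd">
     <name>ceph-vms</name>
     <source>
       <name>libvirt-pool</name>
       <host name="192.168.0.1" port="6789"/>
       <auth type="ceph" username="libvirt">
         <secret uuid="00000000-0000-0000-0000-000000000000"/>
       </auth>
     </source>
   </pool>
   EOF
   virsh pool-define /tmp/ceph-pool.xml
   virsh pool-start ceph-vms
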
Thanks.
Dan

-- 
Dan Geist dan(@)polter.net
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] converting legacy puppet-ceph configured OSDs to look like ceph-deployed OSDs

2014-10-15 Thread Mike Dawson

On 10/15/2014 4:20 PM, Dan van der Ster wrote:

Hi Ceph users,

(sorry for the novel, but perhaps this might be useful for someone)

During our current project to upgrade our cluster from disks-only to
SSD journals, we've found it useful to convert our legacy puppet-ceph
deployed cluster (using something like the enovance module) to one that
looks like it has had its OSD created with ceph-disk prepare. It's been
educational for me, and I thought it would be good experience to share.

To start, the "old" puppet-ceph configures OSDs explicitly in
ceph.conf, like this:

[osd.211]
host = p05151113489275
devs = /dev/disk/by-path/pci-:02:00.0-sas-...-lun-0-part1

and ceph-disk list says this about the disks:

/dev/sdh :
  /dev/sdh1 other, xfs, mounted on /var/lib/ceph/osd/osd.211

In other words, ceph-disk doesn't know anything about the OSD living
on that disk.

Before deploying our SSD journals I was trying to find the best way to
map OSDs to SSD journal partitions (in puppet!), but basically there is
no good way to do this with the legacy puppet-ceph module. (What we'd
have to do is puppetize the partitioning of SSDs, then manually map OSDs
to SSD partitions. This would be tedious, and also error prone after
disk replacements and reboots).

However, I've found that by using ceph-deploy, i.e ceph-disk, to
prepare and activate OSDs, this becomes very simple, trivial even. Using
ceph-disk we keep the OSD/SSD mapping out of puppet; instead the state
is stored in the OSD itself. (1.5 years ago when we deployed this
cluster, ceph-deploy was advertised as quick tool to spin up small
clusters, so we didn't dare
use it. I realize now that it (or the puppet/chef/... recipes based on
it) is _the_only_way_ to build a cluster if you're starting out today.)

Now our problem was that I couldn't go and re-ceph-deploy the whole
cluster, since we've got some precious user data there. Instead, I
needed to learn how ceph-disk is labeling and preparing disks, and
modify our existing OSDs in place to look like they'd been prepared and
activated with ceph-disk.

In the end, I've worked out all the configuration and sgdisk magic and
put the recipes into a couple of scripts here [1]. Note that I do not
expect these to work for any other cluster unmodified. In fact, that
would be dangerous, so don't blame me if you break something. But they
might helpful for understanding how the ceph-disk udev magic works and
could be a basis for upgrading other clusters.

The scripts are:

ceph-deployifier/ceph-create-journals.sh:
   - this script partitions SSDs (assuming sda to sdd) with 5 partitions
each
   - the only trick is to add the partition name 'ceph journal' and set
the typecode to the magic JOURNAL_UUID along with a random partition guid

ceph-deployifier/ceph-label-disks.sh:
   - this script discovers the next OSD which is not prepared with
ceph-disk, finds an appropriate unused journal partition, and converts
the OSD to a ceph-disk prepared lookalike.
   - aside from the discovery part, the main magic is to:
 - create the files active, sysvinit and journal_uuid on the OSD
 - rename the partition to 'ceph data', set the typecode to the
magic OSD_UUID, and the partition guid to the OSD's uuid.
 - link to the /dev/disk/by-partuuid/ journal symlink, and make the
new journal
   - at the end, udev is triggered and the OSD is started (via the
ceph-disk activation magic)

The complete details are of course in the scripts. (I also have
another version of ceph-label-disks.sh that doesn't expect an SSD
journal but instead prepares the single disk 2 partitions scheme.)

After running these scripts you'll get a nice shiny ceph-disk list output:

/dev/sda :
  /dev/sda1 ceph journal, for /dev/sde1
  /dev/sda2 ceph journal, for /dev/sdf1
  /dev/sda3 ceph journal, for /dev/sdg1
...
/dev/sde :
  /dev/sde1 ceph data, active, cluster ceph, osd.2, journal /dev/sda1
/dev/sdf :
  /dev/sdf1 ceph data, active, cluster ceph, osd.8, journal /dev/sda2
/dev/sdg :
  /dev/sdg1 ceph data, active, cluster ceph, osd.12, journal /dev/sda3
...

And all of the udev magic is working perfectly. I've tested all of the
reboot, failed OSD, and failed SSD scenarios and it all works as it
should. And the puppet-ceph manifest for osd's is now just a very simple
wrapper around ceph-disk prepare. (I haven't published ours to github
yet, but it is very similar to the stackforge puppet-ceph manifest).

There you go, sorry that was so long. I hope someone finds this useful :)

Best Regards,
Dan

[1]
https://github.com/cernceph/ceph-scripts/tree/master/tools/ceph-deployifier



Dan,

Thank you for publishing this! I put some time into this very issue 
earlier this year, but got pulled in another direction before completing 
the work. I'd like to bring a production cluster deployed with mkcephfs 
out of the stone ages, so your work will be very useful to me.


Thanks again,
Mike Dawson




___
ceph-users mailing list

Re: [ceph-users] Radosgw refusing to even attempt to use keystone auth

2014-10-15 Thread Mark Kirkwood

On 16/10/14 09:08, lakshmi k s wrote:

I am trying to integrate Openstack keystone with radosgw. I have
followed the instructions as per the link -
http://ceph.com/docs/master/radosgw/keystone/. But for some reason,
keystone flags under [client.radosgw.gateway] section are not being
honored. That means, presence of these flags never attempt to use
keystone. Hence, any swift v2.0 calls results in 401-Authorization
problem. But If I move the keystone url outside under global section, I
see that there is initial keystone handshake between keystone and
gateway nodes.

Please note that swift v1 calls (without using keystone) work great.
Any thoughts on how to resolve this problem?

ceph.conf

[global]
fsid = f216cbe1-fa49-42ed-b28a-322aa3d48fff
mon_initial_members = node1
mon_host = 192.168.122.182
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

[client.admin]
keyring = /etc/ceph/ceph.client.admin.keyring

[client.radosgw.gateway]
host = radosgw
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw dns name = radosgw

rgw keystone url = http://192.168.122.165:5000
rgw keystone admin token = faedf7bc53e3371924e7b3ddb9d13ddd
rgw keystone accepted roles = admin Member _member_
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss




I have managed to reproduce this:

If I copy your [client.radosgw.gateway] section and amend the obvious 
differences (hostnames and ips, and socket paths), then I too see auth 
failed and no sign of any attempt to use keystone auth logged. Making 
the following change:


- rgw keystone url = http://192.168.122.165:5000
+ rgw keystone url = http://192.168.122.165:35357

makes it work again. I'm guessing it is tied up with the fact that we 
needed to add WSGI chunked encoding... and we did that only for the 
35357 keystone virtualhost (I guess I can add it to 5000 too and see if 
that fixes it). It does seem odd that there is no log entry on the rgw... 
but it may be failing before the call gets logged (will look).


Regards

Mark

P.s: Added $SUBJECT header.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Firefly maintenance release schedule

2014-10-15 Thread Dmitry Borodaenko
Gregory,

Thanks for prompt response, we'll go with v0.80.7.

It still would be nice if v0.80.8 doesn't take as long as v0.80.6, I
suspect one of the reasons you messed it up was too many commits
without intermediate releases.

On Wed, Oct 15, 2014 at 9:59 AM, Gregory Farnum  wrote:
> Take .80.7. All of the bugs you've cited, you are supremely unlikely
> to run into. The "Urgent" tag is a measure of planning priority, not
> of impact to users; here it generally means "we found a bug on a
> stable branch that we can reproduce". Taking them in order:
> http://tracker.ceph.com/issues/9492: only happens if you try and cheat
> with your CRUSH rules, and obviously nobody did that until Sage
> suggested it as a solution to the problem somebody had 29 days ago
> when this was discovered.
> http://tracker.ceph.com/issues/9039: The most serious here, but only
> happens if you're using RGW, and storing user data in multiple pools,
> and issue a COPY command to copy data between different pools.
> http://tracker.ceph.com/issues/9582: Only happens if you're using the
> op timeout feature of librados with the C bindings OR the op timeout
> feature *and* the user-provided buffers in the C++ interface. (To the
> best of my knowledge, the people who discovered this are the only ones
> using op timeouts.)
> http://tracker.ceph.com/issues/9307: I'm actually not sure what's
> going on here; looks like some kind of extremely rare race when
> authorizing requests? (ie, fixed by a retry)
>
> We messed up the v0.80.6 release in a very specific way (and if you
> were deploying a new cluster it wasn't a problem), but you're
> extrapolating too much from the presence of patches about what their
> impact is and what the system's stability is. These are largely
> cleaning up rough edges around user interfaces, and smoothing out
> issues in the new functionality that a standard deployment isn't going
> to experience. :)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com



-- 
Dmitry Borodaenko
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw refusing to even attempt to use keystone auth

2014-10-15 Thread lakshmi k s
Hello Mark - 

Changing the rgw keystone url to http://192.168.122.165:35357 did not help. I 
continue to get a 401 error. Also, I am trying to integrate with Icehouse this 
time. I did not see any keystone.conf in /etc/apache2/sites-available for 
adding WSGI chunked encoding. That said, I am having issues with the initial 
keystone handshake itself. 

Thanks,
Lakshmi.


On Wednesday, October 15, 2014 2:37 PM, Mark Kirkwood 
 wrote:
 


On 16/10/14 09:08, lakshmi k s wrote:
> I am trying to integrate Openstack keystone with radosgw. I have
> followed the instructions as per the link -
> http://ceph.com/docs/master/radosgw/keystone/. But for some reason,
> keystone flags under [client.radosgw.gateway] section are not being
> honored. That means, presence of these flags never attempt to use
> keystone. Hence, any swift v2.0 calls results in 401-Authorization
> problem. But If I move the keystone url outside under global section, I
> see that there is initial keystone handshake between keystone and
> gateway nodes.
>
> Please note that swift v1 calls (without using keystone) work great.
> Any thoughts on how to resolve this problem?
>
> ceph.conf
>
> [global]
> fsid = f216cbe1-fa49-42ed-b28a-322aa3d48fff
> mon_initial_members = node1
> mon_host = 192.168.122.182
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
>
> [client.admin]
> keyring = /etc/ceph/ceph.client.admin.keyring
>
> [client.radosgw.gateway]
> host = radosgw
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> log file = /var/log/ceph/client.radosgw.gateway.log
> rgw dns name = radosgw
>
> rgw keystone url = http://192.168.122.165:5000
> rgw keystone admin token = faedf7bc53e3371924e7b3ddb9d13ddd
> rgw keystone accepted roles = admin Member _member_
> rgw keystone token cache size = 500
> rgw keystone revocation interval = 500
> rgw s3 auth use keystone = true
> nss db path = /var/ceph/nss
>
>

I have managed to to reproduce this:

If I copy your [client.radosgw.gateway] section and amend the obvious 
differences (hostnames and ips, and socket paths), then I too see auth 
failed and no sign of any attempt to use keystone auth logged. Making 
the following change:

- rgw keystone url = http://192.168.122.165:5000

+ rgw keystone url = http://192.168.122.165:35357

makes it work again. I'm guessing it is tied up with with the fact we 
needed to add WSGI Chunked encoding... and we did that only for the 
35357 keystone virtualhost (I guess I can add it to 5000 too and see if 
that fixes it). I does seem odd that there is no log entry on the rgw... 
but it may be failing before the call gets logged (will look).

Regards

Mark

P.s: Added $SUBJECT header.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw refusing to even attempt to use keystone auth

2014-10-15 Thread Mark Kirkwood

On 16/10/14 10:37, Mark Kirkwood wrote:

On 16/10/14 09:08, lakshmi k s wrote:

I am trying to integrate Openstack keystone with radosgw. I have
followed the instructions as per the link -
http://ceph.com/docs/master/radosgw/keystone/. But for some reason,
keystone flags under [client.radosgw.gateway] section are not being
honored. That means, presence of these flags never attempt to use
keystone. Hence, any swift v2.0 calls results in 401-Authorization
problem. But If I move the keystone url outside under global section, I
see that there is initial keystone handshake between keystone and
gateway nodes.

Please note that swift v1 calls (without using keystone) work great.
Any thoughts on how to resolve this problem?

ceph.conf

[global]
fsid = f216cbe1-fa49-42ed-b28a-322aa3d48fff
mon_initial_members = node1
mon_host = 192.168.122.182
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true

[client.admin]
keyring = /etc/ceph/ceph.client.admin.keyring

[client.radosgw.gateway]
host = radosgw
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw dns name = radosgw

rgw keystone url = http://192.168.122.165:5000
rgw keystone admin token = faedf7bc53e3371924e7b3ddb9d13ddd
rgw keystone accepted roles = admin Member _member_
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = true
nss db path = /var/ceph/nss




I have managed to to reproduce this:

If I copy your [client.radosgw.gateway] section and amend the obvious
differences (hostnames and ips, and socket paths), then I too see auth
failed and no sign of any attempt to use keystone auth logged. Making
the following change:

- rgw keystone url = http://192.168.122.165:5000
+ rgw keystone url = http://192.168.122.165:35357

makes it work again. I'm guessing it is tied up with with the fact we
needed to add WSGI Chunked encoding... and we did that only for the
35357 keystone virtualhost (I guess I can add it to 5000 too and see if
that fixes it). I does seem odd that there is no log entry on the rgw...
but it may be failing before the call gets logged (will look).





So amending the keystone site config:

<VirtualHost *:5000>
...
WSGIChunkedRequest On
...
</VirtualHost>

makes the original keystone url with port 5000 work too.

The logging business is a bit more tricky - I'd copied your 
[client.radosgw.gateway] section which lacks


debug rgw = 20

line, which explains *my* lack of seeing the keystone auth log lines. 
When I add that line I'm seeing the debug auth info (even if I remove 
the WSGI chunking for 5000 and make it fail again).


So Lakshmi, can you add the 'WSGIChunkedRequest On' as indicated, and 
make sure you have the debug line in there and retest?


Regards

Mark

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ssh; cannot resolve hostname errors

2014-10-15 Thread JIten Shah
Please send your /etc/hosts contents here.
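
For reference, the entries there should look roughly like this, one line per 
node, on every node (the addresses and hostnames below are made-up 
placeholders):

   192.168.0.10   mon-node
   192.168.0.11   osd-node1
   192.168.0.12   osd-node2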

--Jiten

On Oct 15, 2014, at 7:27 AM, Support - Avantek  wrote:

> I may be completely overlooking something here but I keep getting “ssh; 
> cannot resolve hostname” when I try to contact my OSD nodes from my monitor 
> node. I have set the IP addresses of the 3 nodes in /etc/hosts as suggested on 
> the website.
>  
> Thanks in advance
>  
> James
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw refusing to even attempt to use keystone auth

2014-10-15 Thread lakshmi k s
I still think that there is a problem with the way radosgw is set up. Two things I 
want to point out - 

1. rgw keystone url - If this flag is under radosgw section of ceph.conf file, 
I do not see the packets being exchanged between keystone and gateway node when 
radosgw is restarted. I tried to run tcpdump on both the nodes. 

2. rgw keystone url - If this is in the global section (which is wrong), I do see 
the packets being exchanged between the nodes when radosgw is restarted. 

I have tried my best to follow the instructions as per 
http://ceph.com/docs/master/radosgw/config/ to setup radosgw. Also with this 
setup, I can still create users using radosgw-admin and make swift v1.0 calls 
from swift-client.

How should I go about resolving this issue? Please help.
Thanks,
Lakshmi.
 




On Wednesday, October 15, 2014 2:58 PM, Mark Kirkwood 
 wrote:
 


On 16/10/14 10:37, Mark Kirkwood wrote:
> On 16/10/14 09:08, lakshmi k s wrote:
>> I am trying to integrate Openstack keystone with radosgw. I have
>> followed the instructions as per the link -
>> http://ceph.com/docs/master/radosgw/keystone/. But for some reason,
>> keystone flags under [client.radosgw.gateway] section are not being
>> honored. That means, presence of these flags never attempt to use
>> keystone. Hence, any swift v2.0 calls results in 401-Authorization
>> problem. But If I move the keystone url outside under global section, I
>> see that there is initial keystone handshake between keystone and
>> gateway nodes.
>>
>> Please note that swift v1 calls (without using keystone) work great.
>> Any thoughts on how to resolve this problem?
>>
>> ceph.conf
>>
>> [global]
>> fsid = f216cbe1-fa49-42ed-b28a-322aa3d48fff
>> mon_initial_members = node1
>> mon_host = 192.168.122.182
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>>
>> [client.admin]
>> keyring = /etc/ceph/ceph.client.admin.keyring
>>
>> [client.radosgw.gateway]
>> host = radosgw
>> keyring = /etc/ceph/ceph.client.radosgw.keyring
>> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
>> log file = /var/log/ceph/client.radosgw.gateway.log
>> rgw dns name = radosgw
>>
>> rgw keystone url = http://192.168.122.165:5000
>> rgw keystone admin token = faedf7bc53e3371924e7b3ddb9d13ddd
>> rgw keystone accepted roles = admin Member _member_
>> rgw keystone token cache size = 500
>> rgw keystone revocation interval = 500
>> rgw s3 auth use keystone = true
>> nss db path = /var/ceph/nss
>>
>>
>
> I have managed to to reproduce this:
>
> If I copy your [client.radosgw.gateway] section and amend the obvious
> differences (hostnames and ips, and socket paths), then I too see auth
> failed and no sign of any attempt to use keystone auth logged. Making
> the following change:
>
> - rgw keystone url = http://192.168.122.165:5000
> + rgw keystone url = http://192.168.122.165:35357
>
> makes it work again. I'm guessing it is tied up with with the fact we
> needed to add WSGI Chunked encoding... and we did that only for the
> 35357 keystone virtualhost (I guess I can add it to 5000 too and see if
> that fixes it). I does seem odd that there is no log entry on the rgw...
> but it may be failing before the call gets logged (will look).
>
>


So amending the keystone site config:


 ...
 WSGIChunkedRequest On
 ...


makes the original keystone url with port 5000 work too.

The logging business is a bit more tricky - I'd copied your 
[client.radosgw.gateway] section which lacks

debug rgw = 20

line, which explains *my* lack of seeing the keystone auth log lines. 
When I add that line I'm seeing the debug auth info (even if I remove 
the WSGI chunking for 5000 and make it fail again).

So Lakshmi, can you add the 'WSGIChunkedRequest On' as inidicated, and 
make sure you have the debug line in there and retest?


Regards

Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] converting legacy puppet-ceph configured OSDs to look like ceph-deployed OSDs

2014-10-15 Thread Loic Dachary
Hi Dan,

Great story and congratulations on the successful conversion :-) There are two 
minor pitfalls left but they are only an inconvenience when testing the 
ceph-disk prepare / udev logic ( https://github.com/ceph/ceph/pull/2717 and 
https://github.com/ceph/ceph/pull/2648 ). 

Cheers

On 15/10/2014 13:20, Dan van der Ster wrote:
> Hi Ceph users,
> 
> (sorry for the novel, but perhaps this might be useful for someone)
> 
> During our current project to upgrade our cluster from disks-only to SSD 
> journals, we've found it useful to convert our legacy puppet-ceph deployed 
> cluster (using something like the enovance module) to one that looks like it 
> has had its OSD created with ceph-disk prepare. It's been educational for me, 
> and I thought it would be good experience to share.
> 
> To start, the "old" puppet-ceph configures OSDs explicitly in ceph.conf, like 
> this:
> 
> [osd.211]
>host = p05151113489275
>devs = /dev/disk/by-path/pci-:02:00.0-sas-...-lun-0-part1
> 
> and ceph-disk list says this about the disks:
> 
> /dev/sdh :
>  /dev/sdh1 other, xfs, mounted on /var/lib/ceph/osd/osd.211
> 
> In other words, ceph-disk doesn't know anything about the OSD living on that 
> disk.
> 
> Before deploying our SSD journals I was trying to find the best way to map 
> OSDs to SSD journal partitions (in puppet!), but basically there is no good 
> way to do this with the legacy puppet-ceph module. (What we'd have to do is 
> puppetize the partitioning of SSDs, then manually map OSDs to SSD partitions. 
> This would be tedious, and also error prone after disk replacements and 
> reboots).
> 
> However, I've found that by using ceph-deploy, i.e ceph-disk, to prepare and 
> activate OSDs, this becomes very simple, trivial even. Using ceph-disk we 
> keep the OSD/SSD mapping out of puppet; instead the state is stored in the 
> OSD itself. (1.5 years ago when we deployed this cluster, ceph-deploy was 
> advertised as quick tool to spin up small clusters, so we didn't dare
> use it. I realize now that it (or the puppet/chef/... recipes based on it) is 
> _the_only_way_ to build a cluster if you're starting out today.)
> 
> Now our problem was that I couldn't go and re-ceph-deploy the whole cluster, 
> since we've got some precious user data there. Instead, I needed to learn how 
> ceph-disk is labeling and preparing disks, and modify our existing OSDs in 
> place to look like they'd been prepared and activated with ceph-disk.
> 
> In the end, I've worked out all the configuration and sgdisk magic and put 
> the recipes into a couple of scripts here [1]. Note that I do not expect 
> these to work for any other cluster unmodified. In fact, that would be 
> dangerous, so don't blame me if you break something. But they might helpful 
> for understanding how the ceph-disk udev magic works and could be a basis for 
> upgrading other clusters.
> 
> The scripts are:
> 
> ceph-deployifier/ceph-create-journals.sh:
>   - this script partitions SSDs (assuming sda to sdd) with 5 partitions each
>   - the only trick is to add the partition name 'ceph journal' and set the 
> typecode to the magic JOURNAL_UUID along with a random partition guid
> 
> ceph-deployifier/ceph-label-disks.sh:
>   - this script discovers the next OSD which is not prepared with ceph-disk, 
> finds an appropriate unused journal partition, and converts the OSD to a 
> ceph-disk prepared lookalike.
>   - aside from the discovery part, the main magic is to:
> - create the files active, sysvinit and journal_uuid on the OSD
> - rename the partition to 'ceph data', set the typecode to the magic 
> OSD_UUID, and the partition guid to the OSD's uuid.
> - link to the /dev/disk/by-partuuid/ journal symlink, and make the new 
> journal
>   - at the end, udev is triggered and the OSD is started (via the ceph-disk 
> activation magic)
> 
> The complete details are of course in the scripts. (I also have another 
> version of ceph-label-disks.sh that doesn't expect an SSD journal but instead 
> prepares the single disk 2 partitions scheme.)
> 
> After running these scripts you'll get a nice shiny ceph-disk list output:
> 
> /dev/sda :
>  /dev/sda1 ceph journal, for /dev/sde1
>  /dev/sda2 ceph journal, for /dev/sdf1
>  /dev/sda3 ceph journal, for /dev/sdg1
> ...
> /dev/sde :
>  /dev/sde1 ceph data, active, cluster ceph, osd.2, journal /dev/sda1
> /dev/sdf :
>  /dev/sdf1 ceph data, active, cluster ceph, osd.8, journal /dev/sda2
> /dev/sdg :
>  /dev/sdg1 ceph data, active, cluster ceph, osd.12, journal /dev/sda3
> ...
> 
> And all of the udev magic is working perfectly. I've tested all of the 
> reboot, failed OSD, and failed SSD scenarios and it all works as it should. 
> And the puppet-ceph manifest for osd's is now just a very simple wrapper 
> around ceph-disk prepare. (I haven't published ours to github yet, but it is 
> very similar to the stackforge puppet-ceph manifest).
> 
> There you go, sorry that was so long. I hope someone finds this u

Re: [ceph-users] Radosgw refusing to even attempt to use keystone auth

2014-10-15 Thread lakshmi k s
Has anyone seen this issue? Appreciate your time.


On Wednesday, October 15, 2014 4:50 PM, lakshmi k s  wrote:
 


I still think that there is problem with the way radosgw is setup. Two things I 
want to point out - 

1. rgw keystone url - If this flag is under radosgw section of ceph.conf file, 
I do not see the packets being exchanged between keystone and gateway node when 
radosgw is restarted. I tried to run tcpdump on both the nodes. 

2. rgw.keystone url - If this is in global section (which is wrong), I do see 
the packets being exchanged between the nodes when radosgw is restarted. 

I have tried my best to follow the instructions as per 
http://ceph.com/docs/master/radosgw/config/ to setup radosgw. Also with this 
setup, I can still create users using radosgw-admin and make swift v1.0 calls 
from swift-client.

How should I go about resolving this issue? Please help.
Thanks,
Lakshmi.
 




On Wednesday, October 15, 2014 2:58 PM, Mark Kirkwood 
 wrote:
 


On 16/10/14 10:37, Mark Kirkwood wrote:
> On 16/10/14 09:08, lakshmi k s wrote:
>> I am trying to integrate Openstack keystone with radosgw. I have
>> followed the instructions as per the link -
>> http://ceph.com/docs/master/radosgw/keystone/. But for some reason,
>> keystone flags under [client.radosgw.gateway] section are not being
>> honored. That means, presence of these flags never attempt
 to use
>> keystone. Hence, any swift v2.0 calls results in 401-Authorization
>> problem. But If I move the keystone url outside under global section, I
>> see that there is initial keystone handshake between keystone and
>> gateway nodes.
>>
>> Please note that swift v1 calls (without using keystone) work great.
>> Any thoughts on how to resolve this problem?
>>
>> ceph.conf
>>
>> [global]
>> fsid = f216cbe1-fa49-42ed-b28a-322aa3d48fff
>>
 mon_initial_members = node1
>> mon_host = 192.168.122.182
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> filestore_xattr_use_omap = true
>>
>> [client.admin]
>> keyring = /etc/ceph/ceph.client.admin.keyring
>>
>> [client.radosgw.gateway]
>> host = radosgw
>> keyring = /etc/ceph/ceph.client.radosgw.keyring
>> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
>> log file = /var/log/ceph/client.radosgw.gateway.log
>> rgw dns name = radosgw
>>
>> rgw keystone url = http://192.168.122.165:5000
>> rgw keystone admin token = faedf7bc53e3371924e7b3ddb9d13ddd
>> rgw keystone accepted roles = admin Member _member_
>> rgw keystone token cache size = 500
>> rgw keystone revocation interval = 500
>> rgw s3 auth use keystone = true
>> nss db path = /var/ceph/nss
>>
>>
>
> I have managed to to reproduce this:
>
> If I copy your [client.radosgw.gateway] section and amend the obvious
> differences (hostnames and ips, and socket paths), then I too see auth
> failed and no sign of any attempt to use keystone auth logged. Making
> the following change:
>
> - rgw keystone url = http://192.168.122.165:5000
> + rgw keystone url = http://192.168.122.165:35357
>
> makes it work again. I'm guessing it is tied up with with the fact we
> needed to add WSGI Chunked encoding... and we did that only for the
> 35357 keystone virtualhost (I guess I can add it to 5000 too and see if
> that fixes it). I does seem odd that there is no log entry on the rgw...
> but it may be failing before the call gets logged (will look).
>
>


So amending the keystone site config:


 ...
 
WSGIChunkedRequest On
 ...


makes the original keystone url with port 5000 work too.

The logging business is a bit more tricky - I'd copied your 
[client.radosgw.gateway] section which lacks

debug rgw = 20

line, which explains *my* lack of seeing the keystone auth log lines. 
When I add that line I'm seeing the debug auth info (even if I remove 
the WSGI chunking for 5000 and make it fail again).

So
 Lakshmi, can you add the 'WSGIChunkedRequest On' as inidicated, and 
make sure you have the debug line in there and retest?


Regards

Mark
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com