date:20141009

Re: [ceph-users] Mapping rbd with read permission

2014-10-09 Thread Ilya Dryomov

On Thu, Oct 9, 2014 at 9:32 AM, Ramakrishnan Periyasamy wrote:
> Hi,
>
> Thanks Ilya for the reply. I need some more clarification; correct me if
> I am wrong somewhere.
>
> I am able to map an rbd image with the --read-only option using the
> user-specific keyring for pool3, since it has "rwx", but I am unable to map
> from pool1, where the capabilities are "rx"/"r" (I tried both).
>
> User specific keyring for client8 as follows:
> client.client8
> key: AQB9bjVU4FWPMBAAeB8DBAU53LoYV+bIKSr7WQ==
> caps: [mds] allow
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow pool 
> pool1 r class-read, allow pool pool3 rwx
>
> server@node1:~$ sudo rbd map --read-only pool3img2 -p pool3 -n client.client8 
> -k /etc/ceph/client.client8.keyring
> 2014-10-09 16:11:51.781214 7f2934e58840  2 auth: KeyRing::load: loaded key 
> file /etc/ceph/client.client8.keyring
> /dev/rbd5
> server@node1:~$ sudo rbd map --read-only pool1img3 -p pool1 -n client.client8 
> -k /etc/ceph/client.client8.keyring
> 2014-10-09 16:13:06.670636 7fc80d68b840  2 auth: KeyRing::load: loaded key 
> file /etc/ceph/client.client8.keyring
> rbd: sysfs write failed
> rbd: map failed: (1) Operation not permitted
>
> As per this link,
> http://ceph.com/docs/master/man/8/ceph-authtool/?highlight=authtool, we can
> set read access on one pool. Is this read access allowed for objects, or
> only for classes in that pool?
> What exactly does the "allow pool pool1 r class-read" capability do?

OSD capabilities are enforced at the rados (lower) layer, while --read-only
is handled at the kernel rbd (higher) layer; the two have nothing in common.
As I said in my previous mail, to map an rbd image you need both write and
execute capabilities, *even* if you are going to map it with --read-only.
That's why mapping out of pool1 above fails with -EPERM.

As for class-read and rbd, I think with --read-only you can get away
with 'rw class-read' instead of 'rwx', but I haven't tried it, and since
you would still need 'w', the value of doing that is unclear.

With 'r class-read' you can read objects and execute read-only cls
methods.  Unfortunately, because of how watch osd ops work, that won't
work for rbd pools; you'll need to make it 'rw class-read' at least.
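
To make the rule above concrete, here is a minimal Python sketch. It is an illustration only, not Ceph's own caps parser: it splits the osd caps string from the quoted keyring into per-pool permissions and applies the stated requirement that mapping needs both 'w' and 'x'. The names parse_osd_caps and can_map are hypothetical, and the untested 'rw class-read' case is deliberately not treated as sufficient.

```python
def parse_osd_caps(osd_caps):
    """Split an osd caps string into {pool_name: set_of_permissions}.

    Illustrative only: this ignores grants that are not scoped to a pool
    (for example the object_prefix clause).
    """
    pools = {}
    for grant in osd_caps.split(","):
        tokens = grant.split()
        if len(tokens) < 4 or tokens[:2] != ["allow", "pool"]:
            continue
        perms = set()
        for token in tokens[3:]:
            if set(token) <= set("rwx"):
                perms.update(token)   # "rwx" becomes {"r", "w", "x"}
            else:
                perms.add(token)      # keep words such as "class-read" whole
        pools.setdefault(tokens[2], set()).update(perms)
    return pools


def can_map(perms):
    """Rule stated in the reply: mapping needs write and execute."""
    return {"w", "x"} <= perms


# The osd caps of client.client8 as quoted in the message.
caps = ("allow class-read object_prefix rbd_children, "
        "allow pool pool1 r class-read, allow pool pool3 rwx")

for pool, perms in sorted(parse_osd_caps(caps).items()):
    verdict = "mappable" if can_map(perms) else "not mappable"
    print(pool, sorted(perms), verdict)
```

Run against the quoted keyring, the sketch reports pool3 as mappable and pool1 as not mappable, which matches the successful map and the -EPERM failure shown in the question.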

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread Mark Kirkwood

I ran into this - I needed to actually be root via sudo -i or similar, 
*then* it worked. The unhelpful error message is, I think, referring to 
an uninitialized db.

On 09/10/14 16:36, lakshmi k s wrote:

Good workaround. But it did not work. Not sure what this error is all
about now.

gateway@gateway:~$ openssl x509 -in /home/gateway/ca.pem -pubkey |
certutil -d /var/lib/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
certutil: function failed: SEC_ERROR_LEGACY_DATABASE: The
certificate/key database is in an old, unsupported format.

On Wednesday, October 8, 2014 7:55 PM, Mark Kirkwood wrote:

As a workaround check if your rgw host has openssl and certutil
installed, if so you can copy the relevant unconverted certs over to it
and convert 'em there.

On 09/10/14 15:07, lakshmi k s wrote:
 > Tried aptitude as well, but no luck.
 >
 > Ceph users, have you tried to install libnss3-tools or certutil tool on
 > debian/ubuntu? If so, how did you go about this problem.
 >
 >
 > On Wednesday, October 8, 2014 7:01 PM, Mark Kirkwood wrote:
 >
 >
 > Ok, so that is the thing to get sorted. I'd suggest posting the error(s)
 > you are getting perhaps here (someone else might know), but definitely
 > to one of the Debian specific lists.
 >
 > In the meantime perhaps try installing the packages with aptitude rather
 > than apt-get - if there is some fancy footwork required it is fairly
 > smart about what needs to be done.
 >
 > Cheers
 >
 > Mark
 >
 > On 09/10/14 14:38, lakshmi k s wrote:
 >  > Thanks Mark. I have been trying to install this on the controller
 >  > node. But for some reason, I am unable to install certutil or
 >  > libnss3-tools on debian. I am not sure how to proceed.
 >  >
 >
 >
 >


Re: [ceph-users] RadosGW over HTTPS

2014-10-09 Thread Marco Garcês

Hi guys, thanks for the hints...
I was able to fix it by adding this line to nginx.conf (or the
fastcgi_params file):

fastcgi_param  SERVER_PORT_SECURE $server_port;
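
Why an explicit parameter helps can be sketched in a few lines of Python. Under the CGI convention (RFC 3875), request headers reach the application as HTTP_-prefixed variables, whereas a parameter set with fastcgi_param arrives under exactly the name given. The function names and the header name "Server-Port-Secure" below are hypothetical and used only for illustration; this is not radosgw or nginx code.

```python
def header_to_cgi_var(header_name):
    """CGI convention: upper-case the header, swap '-' for '_', add HTTP_."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def build_fcgi_env(request_headers, explicit_params):
    """Combine translated request headers with explicitly set parameters."""
    env = {header_to_cgi_var(name): value
           for name, value in request_headers.items()}
    env.update(explicit_params)   # explicit parameters keep their own names
    return env


# A value sent as a request header only shows up with the HTTP_ prefix...
from_header = build_fcgi_env({"Server-Port-Secure": "443"}, {})
print(sorted(from_header))

# ...while an explicitly set parameter arrives under the expected name.
from_param = build_fcgi_env({}, {"SERVER_PORT_SECURE": "443"})
print(sorted(from_param))
```

In the first case the application only sees HTTP_SERVER_PORT_SECURE; in the second it sees SERVER_PORT_SECURE, which is the variable the gateway is said to need.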


Thank you so much!

Marco Garcês
#sysadmin
Maputo - Mozambique


On Wed, Oct 8, 2014 at 6:25 PM, Yehuda Sadeh wrote:
> On Wed, Oct 8, 2014 at 9:21 AM, Marco Garcês wrote:
>> I believe so:
>> 2014-10-08 18:19:38.438133 7f9119b90700  2
>> RGWDataChangesLog::ChangesRenewThread: start
>> 2014-10-08 18:19:44.151527 7f90ea7fc700 20 enqueued request req=0x1b9e400
>> 2014-10-08 18:19:44.151558 7f90ea7fc700 20 RGWWQ:
>> 2014-10-08 18:19:44.151561 7f90ea7fc700 20 req: 0x1b9e400
>> 2014-10-08 18:19:44.151569 7f90ea7fc700 10 allocated request req=0x1b9e6f0
>> 2014-10-08 18:19:44.151595 7f90e97fa700 20 dequeued request req=0x1b9e400
>> 2014-10-08 18:19:44.151600 7f90e97fa700 20 RGWWQ: empty
>> 2014-10-08 18:19:44.151655 7f90e97fa700 20 CONTENT_LENGTH=
>> 2014-10-08 18:19:44.151659 7f90e97fa700 20 CONTENT_TYPE=
>> 2014-10-08 18:19:44.151660 7f90e97fa700 20 
>> DOCUMENT_ROOT=/usr/local/nginx/html
>> 2014-10-08 18:19:44.151662 7f90e97fa700 20 DOCUMENT_URI=/auth
>> 2014-10-08 18:19:44.151663 7f90e97fa700 20 FCGI_ROLE=RESPONDER
>> 2014-10-08 18:19:44.151665 7f90e97fa700 20 GATEWAY_INTERFACE=CGI/1.1
>> 2014-10-08 18:19:44.151666 7f90e97fa700 20 HTTP_ACCEPT=*/*
>> 2014-10-08 18:19:44.151668 7f90e97fa700 20 HTTP_HOST=gateway.local
>> 2014-10-08 18:19:44.151669 7f90e97fa700 20 HTTP_SERVER_PORT_SECURE=443
>
> This is not what we expect. The server translates it into
> HTTP_SERVER_PORT_SECURE, whereas we need it to be SERVER_PORT_SECURE.
> Maybe there's a way to configure the web server to send the needed
> header?
>
> Yehuda
>
>> 2014-10-08 18:19:44.151670 7f90e97fa700 20 HTTP_USER_AGENT=curl/7.30.0
>> 2014-10-08 18:19:44.151672 7f90e97fa700 20
>> HTTP_X_AUTH_KEY=QoakiyY0tg8jULacsJLsmAbyZHJbY5g/Rc/dOHK3
>> 2014-10-08 18:19:44.151673 7f90e97fa700 20 HTTP_X_AUTH_USER=frontend:swf0002
>> 2014-10-08 18:19:44.151675 7f90e97fa700 20 HTTPS=on
>> 2014-10-08 18:19:44.151676 7f90e97fa700 20 QUERY_STRING=
>> 2014-10-08 18:19:44.151677 7f90e97fa700 20 REDIRECT_STATUS=200
>> 2014-10-08 18:19:44.151678 7f90e97fa700 20 REMOTE_ADDR=10.5.5.222
>> 2014-10-08 18:19:44.151679 7f90e97fa700 20 REMOTE_PORT=64145
>> 2014-10-08 18:19:44.151680 7f90e97fa700 20 REQUEST_METHOD=GET
>> 2014-10-08 18:19:44.151681 7f90e97fa700 20 REQUEST_URI=/auth
>> 2014-10-08 18:19:44.151682 7f90e97fa700 20 SCRIPT_NAME=/auth
>> 2014-10-08 18:19:44.151683 7f90e97fa700 20 SERVER_ADDR=10.2.27.80
>> 2014-10-08 18:19:44.151684 7f90e97fa700 20 SERVER_NAME=gateway.local
>> 2014-10-08 18:19:44.151685 7f90e97fa700 20 SERVER_PORT=443
>> 2014-10-08 18:19:44.151686 7f90e97fa700 20 SERVER_PROTOCOL=HTTP/1.1
>> 2014-10-08 18:19:44.151687 7f90e97fa700 20 SERVER_SOFTWARE=nginx/1.4.7
>> 2014-10-08 18:19:44.151690 7f90e97fa700  1 == starting new request
>> req=0x1b9e400 =
>> 2014-10-08 18:19:44.151711 7f90e97fa700  2 req 2:0.22::GET
>> /auth::initializing
>> 2014-10-08 18:19:44.151718 7f90e97fa700 10 host=gateway.local
>> rgw_dns_name=gateway.local
>> 2014-10-08 18:19:44.151757 7f90e97fa700  2 req
>> 2:0.68:swift-auth:GET /auth::getting op
>> 2014-10-08 18:19:44.151763 7f90e97fa700  2 req
>> 2:0.75:swift-auth:GET /auth:swift_auth_get:authorizing
>> 2014-10-08 18:19:44.151767 7f90e97fa700  2 req
>> 2:0.78:swift-auth:GET /auth:swift_auth_get:reading permissions
>> 2014-10-08 18:19:44.151770 7f90e97fa700  2 req
>> 2:0.82:swift-auth:GET /auth:swift_auth_get:init op
>> 2014-10-08 18:19:44.151773 7f90e97fa700  2 req
>> 2:0.85:swift-auth:GET /auth:swift_auth_get:verifying op mask
>> 2014-10-08 18:19:44.151797 7f90e97fa700 20 required_mask= 0 user.op_mask=7
>> 2014-10-08 18:19:44.151799 7f90e97fa700  2 req
>> 2:0.000111:swift-auth:GET /auth:swift_auth_get:verifying op
>> permissions
>> 2014-10-08 18:19:44.151803 7f90e97fa700  2 req
>> 2:0.000115:swift-auth:GET /auth:swift_auth_get:verifying op params
>> 2014-10-08 18:19:44.151806 7f90e97fa700  2 req
>> 2:0.000117:swift-auth:GET /auth:swift_auth_get:executing
>> 2014-10-08 18:19:44.151874 7f90e97fa700 20 get_obj_state:
>> rctx=0x7f90d8018380 obj=.users.swift:frontend:swf0002
>> state=0x7f90d8022c18 s->prefetch_data=0
>> 2014-10-08 18:19:44.151895 7f90e97fa700 10 cache get:
>> name=.users.swift+frontend:swf0002 : type miss (requested=6, cached=3)
>> 2014-10-08 18:19:44.153757 7f90e97fa700 10 cache put:
>> name=.users.swift+frontend:swf0002
>> 2014-10-08 18:19:44.153763 7f90e97fa700 10 moving
>> .users.swift+frontend:swf0002 to cache LRU end
>> 2014-10-08 18:19:44.153770 7f90e97fa700 20 get_obj_state: s->obj_tag
>> was set empty
>> 2014-10-08 18:19:44.153780 7f90e97fa700 10 cache get:
>> name=.users.swift+frontend:swf0002 : hit
>> 2014-10-08 18:19:44.153828 7f90e97fa700 20 get_obj_state:
>> rctx=0x7f90d8018380 obj=.users.uid:frontend state=0x7f90d8023578
>> s->prefetch_data=0
>> 2014-10-08 18:19:44.153837 7f90e97fa700 10 cache get:
>> name=.users.uid+frontend

Re: [ceph-users] RadosGW over HTTPS

2014-10-09 Thread Marco Garcês

I spoke too soon...
Now if I use HTTP I get errors!
Let me try to debug, and post back.

Thanks,

Marco Garcês
#sysadmin
Maputo - Mozambique
[Phone] +258 84 4105579
[Skype] marcogarces



[ceph-users] python ceph-deploy problem

2014-10-09 Thread Roman


Hi All,

Does anybody know how to fix a ceph-deploy problem like this?

[root@ceph01 ceph-new-2]# ceph-deploy osd activate ceph03:/var/local/osd0 ceph04:/var/local/osd1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.17): /usr/bin/ceph-deploy osd activate ceph03:/var/local/osd0 ceph04:/var/local/osd1
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ceph03:/var/local/osd0: ceph04:/var/local/osd1:
[ceph03][DEBUG ] connected to host: ceph03
[ceph03][DEBUG ] detect platform information from remote host
[ceph03][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] activating host ceph03 disk /var/local/osd0
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[ceph03][INFO  ] Running command: ceph-disk -v activate --mark-init sysvinit --mount /var/local/osd0
[ceph03][DEBUG ] === osd.0 ===
[ceph03][DEBUG ] Starting Ceph osd.0 on ceph03...already running
[ceph03][WARNIN] DEBUG:ceph-disk:Cluster uuid is f948a85c-cc63-498e-908b-d461085538dd
[ceph03][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[ceph03][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
[ceph03][WARNIN] DEBUG:ceph-disk:OSD uuid is e880f969-c5a6-4fdc-ba68-abdd4db6a97d
[ceph03][WARNIN] DEBUG:ceph-disk:OSD id is 0
[ceph03][WARNIN] DEBUG:ceph-disk:Marking with init system sysvinit
[ceph03][WARNIN] DEBUG:ceph-disk:ceph osd.0 data dir is ready at /var/local/osd0
[ceph03][WARNIN] DEBUG:ceph-disk:Starting ceph osd.0...
[ceph03][WARNIN] INFO:ceph-disk:Running command: /sbin/service ceph --cluster ceph start osd.0
[ceph03][INFO  ] checking OSD status...
[ceph03][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy][ERROR ] Traceback (most recent call last):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.6/site-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
[ceph_deploy][ERROR ] return f(*a, **kw)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.6/site-packages/ceph_deploy/cli.py", line 160, in _main
[ceph_deploy][ERROR ] return args.func(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 603, in osd
[ceph_deploy][ERROR ] activate(args, cfg)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.6/site-packages/ceph_deploy/osd.py", line 387, in activate
[ceph_deploy][ERROR ] system.enable_service(distro.conn)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.6/site-packages/ceph_deploy/util/system.py", line 41, in enable_service
[ceph_deploy][ERROR ] if is_systemd(conn):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.6/site-packages/ceph_deploy/util/system.py", line 30, in is_systemd
[ceph_deploy][ERROR ] '/proc/1/comm'
[ceph_deploy][ERROR ]   File "/usr/lib/python2.6/site-packages/ceph_deploy/lib/vendor/remoto/connection.py", line 98, in wrapper
[ceph_deploy][ERROR ] self.channel.send("%s(%s)" % (name, arguments))
[ceph_deploy][ERROR ]   File "/usr/lib/python2.6/site-packages/ceph_deploy/lib/vendor/remoto/lib/vendor/execnet/gateway_base.py", line 684, in send
[ceph_deploy][ERROR ] self.gateway._send(Message.CHANNEL_DATA, self.id, dumps_internal(item))
[ceph_deploy][ERROR ]   File "/usr/lib/python2.6/site-packages/ceph_deploy/lib/vendor/remoto/lib/vendor/execnet/gateway_base.py", line 953, in _send
[ceph_deploy][ERROR ] raise IOError("cannot send (already closed?)")
[ceph_deploy][ERROR ] IOError: cannot send (already closed?)
[ceph_deploy][ERROR ]

Thanks,
Roman

Re: [ceph-users] RadosGW over HTTPS

2014-10-09 Thread Marco Garcês

Fixed. Here is the server part of my nginx/tengine config file:

server {
    listen 80;
    server_name gateway.local;
    error_log logs/error_http.log debug;
    client_max_body_size 100m;

    fastcgi_request_buffering off;

    location / {
        fastcgi_pass_header Authorization;
        fastcgi_pass_request_headers on;

        if ($request_method = PUT) {
            rewrite ^ /PUT$request_uri;
        }
        include fastcgi_params;

        fastcgi_pass unix:/var/run/ceph/ceph.radosgw.gateway.fastcgi.sock;
    }

    location /PUT/ {
        internal;
        fastcgi_pass_header Authorization;
        fastcgi_pass_request_headers on;

        include fastcgi_params;
        fastcgi_param  CONTENT_LENGTH   $content_length;
        fastcgi_param HTTPS on;

        fastcgi_pass unix:/var/run/ceph/ceph.radosgw.gateway.fastcgi.sock;
    }

}
server {
    listen 10.2.27.80:443 ssl default;

    server_name gateway.local;
    error_log logs/error_https.log debug;
    client_max_body_size 100m;

    fastcgi_request_buffering off;

    ssl_certificate  /etc/pki/tls/certs/ca_rgw.crt;
    ssl_certificate_key  /etc/pki/tls/private/ca_rgw.key;

    ssl_session_timeout  5m;

    ssl_protocols  SSLv2 SSLv3 TLSv1;
    ssl_ciphers  HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers   on;
    location / {
        fastcgi_pass_header Authorization;
        fastcgi_pass_request_headers on;
        fastcgi_param HTTPS on;
        fastcgi_param  SERVER_PORT_SECURE $server_port;

        if ($request_method = PUT) {
            rewrite ^ /PUT$request_uri;
        }
        include fastcgi_params;

        fastcgi_pass unix:/var/run/ceph/ceph.radosgw.gateway.fastcgi.sock;
    }

    location /PUT/ {
        internal;
        fastcgi_pass_header Authorization;
        fastcgi_pass_request_headers on;

        include fastcgi_params;
        fastcgi_param  CONTENT_LENGTH   $content_length;
        fastcgi_param HTTPS on;
        fastcgi_param  SERVER_PORT_SECURE $server_port;

        fastcgi_pass unix:/var/run/ceph/ceph.radosgw.gateway.fastcgi.sock;
    }

}
}

I had a single server block listening on both 80 and 443, and I just had
to split it into two separate blocks and include the "fastcgi_param
SERVER_PORT_SECURE $server_port;" line in the 443 listener.
I hope this helps someone some day! :)

Thank you once again!


Marco Garcês
#sysadmin
Maputo - Mozambique


Re: [ceph-users] rbd and libceph kernel api

2014-10-09 Thread Ilya Dryomov

On Wed, Oct 8, 2014 at 9:13 PM, Shawn Edwards wrote:
> On Wed, Oct 8, 2014 at 2:35 AM, Ilya Dryomov wrote:
>>
>> On Wed, Oct 8, 2014 at 2:19 AM, Shawn Edwards wrote:
>> > Are there any docs on what is possible by writing/reading from the rbd
>> > driver's sysfs paths?  Is it documented anywhere?
>> >
>> > I've seen at least one blog post:
>> > http://www.sebastien-han.fr/blog/2012/06/24/use-rbd-on-a-client/ about
>> > how you can attach to an rbd using the sysfs interface, but I haven't
>> > found much else.
>>
>> It's in the kernel tree, Documentation/ABI/testing/sysfs-bus-rbd.
>>
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-bus-rbd
>>
>> But keep in mind that rbd map and rbd unmap commands exist for a reason
>> and do a bit more than just writing stuff into sysfs.  If you are
>> concerned about fetching tons of packages, I think there is work
>> underway to fix the packaging so that there is a relatively small
>> package containing just rbd binary and ceph mount helpers that can be
>> installed.
>>
>
> That's the problem I'm running into, where I need the rbd command on a
> machine which has horribly old tools but a modern kernel.  Is the
> simple-rbd-install effort somewhere I could see/help?  I could see this as
> being very interesting to folks.

Hmm, if it's horribly old, your best bet is probably doing it by hand.
The rbd binary has a bunch of dependencies: libblkid, libudev, libkeyutils
and then everything librados and librbd depend on, so packaging changes
will probably be irrelevant.  IIRC libblkid should be at least 2.17, and
I'm sure other constraints will pop up as well.
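
For readers who want to see what "writing stuff into sysfs" amounts to, here is a hedged Python sketch that only builds the line one would write to /sys/bus/rbd/add. The field order (monitor address, options, pool name, image name, optional snapshot name) is as recalled from Documentation/ABI/testing/sysfs-bus-rbd and should be checked against the kernel actually in use. The function name rbd_add_line and the sample address and names are hypothetical, and, as noted in the thread, rbd map does more than this single write.

```python
def rbd_add_line(mon_addr, options, pool, image, snap=None):
    """Build the space-separated line for /sys/bus/rbd/add.

    Assumed field order: <mon addr> <options> <pool> <image> [<snap>].
    """
    fields = [mon_addr, options, pool, image]
    if snap is not None:
        fields.append(snap)
    return " ".join(fields)


# Hypothetical monitor address, user, pool and image names.
line = rbd_add_line("192.168.0.1", "name=admin", "rbd", "foo")
print(line)

# Actually mapping would need root and a loaded rbd module, roughly:
#     with open("/sys/bus/rbd/add", "w") as f:
#         f.write(line)
```

Keeping the string construction separate from the write makes it possible to inspect the line before touching the kernel interface.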

Thanks,

Ilya

[ceph-users] ceph-dis prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread SCHAER Frederic

Hi,

I am setting up a test ceph cluster on decommissioned hardware (hence not 
optimal, I know).
I have installed CentOS 7, installed and set up the ceph mon and OSD machines 
using puppet, and now I'm trying to add OSDs using the servers' OSD disks... 
and I have issues (of course ;) )
I used the Ceph RHEL7 RPMs (ceph-0.80.6-0.el7.x86_64)

When I run "ceph-disk prepare" for a disk, most of the time (but not always) 
the partitions get created but not activated:

[root@ceph4 ~]# ceph-disk list|grep sdh
WARNING:ceph-disk:Old blkid does not support ID_PART_ENTRY_* fields, trying 
sgdisk; may not correctly identify ceph volumes with dmcrypt
/dev/sdh :
/dev/sdh1 ceph data, prepared, cluster ceph, journal /dev/sdh2
/dev/sdh2 ceph journal, for /dev/sdh1

I tried to debug udev rules thinking they were not launched to activate the 
OSD, but they are, and they fail on this error :

+ ln -sf ../../sdh2 /dev/disk/by-partuuid/5b3bde8f-ccad-4093-a8a5-ad6413ae8931
+ mkdir -p /dev/disk/by-parttypeuuid
+ ln -sf ../../sdh2 
/dev/disk/by-parttypeuuid/45b0969e-9b03-4f30-b4c6-b4b80ceff106.5b3bde8f-ccad-4093-a8a5-ad6413ae8931
+ case $ID_PART_ENTRY_TYPE in
+ /usr/sbin/ceph-disk -v activate-journal /dev/sdh2
INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid 
--osd-journal /dev/sdh2
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
DEBUG:ceph-disk:Journal /dev/sdh2 has OSD UUID 
00000000-0000-0000-0000-000000000000
INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- 
/dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such file 
or directory
ceph-disk: Cannot discover filesystem type: device 
/dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command 
'/sbin/blkid' returned non-zero exit status 2
+ exit
+ exec

You'll notice the zeroed UUID...
Because of this, I looked at the output of ceph-disk prepare, and saw that 
partx complains at the end (this is the partx -a command) :

Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
partx: /dev/sdh: error adding partitions 1-2

And indeed, running "partx -a /dev/sdh" does not change anything.
But I just discovered that running "partx -u /dev/sdh" fixes everything: 
right after I send this update command to the kernel, my debug logs show 
that the udev rule does everything fine and the OSD starts up.

I'm therefore wondering what I did wrong.
Is it CentOS 7 that is misbehaving, or the kernel, or...?
Is there any reason why partx -a is used instead of partx -u?
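
The workaround described above (try adding the partitions, and fall back to an update when that fails) could be sketched in Python as follows. This is a hypothetical helper, not ceph-disk's actual code; the function name refresh_partitions and the injectable run parameter are assumptions made so the logic can be exercised without touching a real disk.

```python
import subprocess


def refresh_partitions(device, run=subprocess.call):
    """Ask the kernel to pick up the partition table of `device`.

    Tries `partx -a` (add) first and falls back to `partx -u` (update).
    Returns the flag that succeeded, or None if both commands failed.
    `run` must accept an argument list and return an exit status.
    """
    for flag in ("-a", "-u"):
        if run(["partx", flag, device]) == 0:
            return flag
    return None


# Example (needs root and a real device), not executed here:
#     refresh_partitions("/dev/sdh")
```

Returning the flag that worked makes it easy to log whether the fallback was needed on a given machine.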

I'd be glad to hear others' advice on this!
Thanks && regards

Frederic Schaer


Re: [ceph-users] python ceph-deploy problem

2014-10-09 Thread Alfredo Deza

Hi Roman,

This was a recent change in ceph-deploy to enable Ceph services on
CentOS/RHEL/Fedora distros after deploying a daemon (an OSD in your
case).

There was an issue where the remote connection was closed before ceph-deploy
was able to enable a service when creating an OSD, and this just got fixed
yesterday (ticket: http://tracker.ceph.com/issues/9698).

This should not affect your OSD deployment; a new ceph-deploy release
that fixes this should be coming up.
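
The traceback in the original report shows the failure happening while is_systemd was looking at '/proc/1/comm' over the remote connection. As a rough local stand-in (not ceph-deploy's own code, which runs the check remotely), the same idea can be sketched in Python; the function names and the comm_path parameter are assumptions made for illustration.

```python
def init_process_name(comm_path="/proc/1/comm"):
    """Return the command name of PID 1 as reported by the kernel."""
    with open(comm_path) as f:
        return f.read().strip()


def is_systemd(comm_path="/proc/1/comm"):
    """True if PID 1 is systemd; False otherwise or if the file is unreadable."""
    try:
        return init_process_name(comm_path) == "systemd"
    except (IOError, OSError):
        return False
```

On a sysvinit host such as the CentOS 6.5 machine in the log, a check like this would be expected to return False, so the failure in the report comes from the closed connection and not from the check itself.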




Re: [ceph-users] ceph-dis prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread Loic Dachary

Bonjour,

I'm not familiar with RHEL7 but willing to learn ;-) I recently ran into 
confusing situations regarding the content of /dev/disk/by-partuuid because 
partprobe was not called when it should have been (on Ubuntu). On RHEL, kpartx 
is used instead because partprobe reboots, apparently. What is the content of 
/dev/disk/by-partuuid on your machine?

ls -l /dev/disk/by-partuuid 
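
For scripted checks, the same listing can be gathered programmatically. This is only a minimal sketch: the function name `list_partuuid_links` is made up for illustration and is not part of ceph-disk.

```python
import os


def list_partuuid_links(directory="/dev/disk/by-partuuid"):
    """Return {partition UUID: resolved device path} for each symlink found.

    Hypothetical helper for illustration; it only mirrors `ls -l`.
    """
    links = {}
    if not os.path.isdir(directory):
        # The directory is absent when udev has not created any links yet.
        return links
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            links[name] = os.path.realpath(path)
    return links


if __name__ == "__main__":
    for partuuid, device in list_partuuid_links().items():
        print(partuuid, "->", device)
```

An empty result, or a missing entry for a freshly prepared partition, would suggest that the kernel or udev has not picked up the new partition table yet.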

Cheers

On 09/10/2014 12:24, SCHAER Frederic wrote:
> Hi,
> 
>  
> 
> I am setting up a test ceph cluster, on decommissioned  hardware (hence : not 
> optimal, I know).
> 
> I have installed CentOS7, installed and setup ceph mons and OSD machines 
> using puppet, and now I’m trying to add OSDs with the servers OSD disks… and 
> I have issues (of course ;) )
> 
> I used the Ceph RHEL7 RPMs (ceph-0.80.6-0.el7.x86_64)
> 
>  
> 
> When I run “ceph-disk prepare” for a disk, I most of the time (but not 
> always) get the partitions created, but not activated :
> 
>  
> 
> [root@ceph4 ~]# ceph-disk list|grep sdh
> 
> WARNING:ceph-disk:Old blkid does not support ID_PART_ENTRY_* fields, trying 
> sgdisk; may not correctly identify ceph volumes with dmcrypt
> 
> /dev/sdh :
> 
> /dev/sdh1 ceph data, prepared, cluster ceph, journal /dev/sdh2
> 
> /dev/sdh2 ceph journal, for /dev/sdh1
> 
>  
> 
> I tried to debug udev rules thinking they were not launched to activate the 
> OSD, but they are, and they fail on this error :
> 
>  
> 
> + ln -sf ../../sdh2 /dev/disk/by-partuuid/5b3bde8f-ccad-4093-a8a5-ad6413ae8931
> 
> + mkdir -p /dev/disk/by-parttypeuuid
> 
> + ln -sf ../../sdh2 
> /dev/disk/by-parttypeuuid/45b0969e-9b03-4f30-b4c6-b4b80ceff106.5b3bde8f-ccad-4093-a8a5-ad6413ae8931
> 
> + case $ID_PART_ENTRY_TYPE in
> 
> + /usr/sbin/ceph-disk -v activate-journal /dev/sdh2
> 
> INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid 
> --osd-journal /dev/sdh2
> 
> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 
> DEBUG:ceph-disk:Journal /dev/sdh2 has OSD UUID 
> 00000000-0000-0000-0000-000000000000
> 
> INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- 
> /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
> 
> error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such 
> file or directory
> 
> ceph-disk: Cannot discover filesystem type: device 
> /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command 
> '/sbin/blkid' returned non-zero exit status 2
> 
> + exit
> 
> + exec
> 
>  
> 
> You’ll notice the zeroed UUID…
> 
> Because of this, I looked at the output of ceph-disk prepare, and saw that 
> partx complains at the end (this is the partx -a command) :
> 
>  
> 
> Warning: The kernel is still using the old partition table.
> 
> The new table will be used at the next reboot.
> 
> The operation has completed successfully.
> 
> partx: /dev/sdh: error adding partitions 1-2
> 
>  
> 
> And indeed, running "partx -a /dev/sdh" does not change anything.
> 
> But I just discovered that running "partx -u /dev/sdh" will fix everything...
> 
> I.e : right after I send this update command to the kernel, my debug logs 
> show that the udev rule does everything fine and the OSD starts up.
> 
>  
> 
> I’m therefore wondering what I did wrong ?
> 
> is this CentOS 7 that is misbehaving, or the kernel, or…?
> 
> Any reason why partx -a is used instead of partx -u ?
> 
>  
> 
> I’d be glad to hear others advice on this !
> 
> Thanks && regards
> 
>  
> 
> Frederic Schaer
> 
>  
> 
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre




Re: [ceph-users] ceph-dis prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread SCHAER Frederic

Hi Loic,

With this example disk/machine that I left untouched until now :

/dev/sdb :
 /dev/sdb1 ceph data, prepared, cluster ceph, osd.44, journal /dev/sdb2
 /dev/sdb2 ceph journal, for /dev/sdb1

[root@ceph1 ~]# ll /dev/disk/by-partuuid/
total 0
lrwxrwxrwx 1 root root 10 Oct  9 15:09 2c27dbda-fbe3-48d6-80fe-b513e1c11702 -> 
../../sdb1
lrwxrwxrwx 1 root root 10 Oct  9 15:09 d2352e3b-f7f2-40c7-8273-8bfa8ab4206a -> 
../../sdb2

This is the blkid output :

[root@ceph1 ~]# blkid  /dev/sdb2
[root@ceph1 ~]# blkid  /dev/sdb1
/dev/sdb1: UUID="c8feaaad-bd83-41a3-a82a-0a8727d0b067" TYPE="xfs" 
PARTLABEL="ceph data" PARTUUID="2c27dbda-fbe3-48d6-80fe-b513e1c11702"

If I run "partx -u /dev/sdb", then the filesystem will get activated and the 
OSD started.
And sometimes, it just works without intervention, but that's the exception.

I modified the udev script this morning, so I can give you the output of what 
happens when things go wrong: the links are created, but somewhere the UUID is 
wrongly detected by ceph-osd, as far as I understand:

Thu Oct  9 11:15:13 CEST 2014
+ PARTNO=2
+ NAME=sde2
+ PARENT_NAME=sde
++ /usr/sbin/sgdisk --info=2 /dev/sde
++ grep 'Partition GUID code'
++ awk '{print $4}'
++ tr '[:upper:]' '[:lower:]'
+ ID_PART_ENTRY_TYPE=45b0969e-9b03-4f30-b4c6-b4b80ceff106
+ '[' -z 45b0969e-9b03-4f30-b4c6-b4b80ceff106 ']'
++ /usr/sbin/sgdisk --info=2 /dev/sde
++ grep 'Partition unique GUID'
++ awk '{print $4}'
++ tr '[:upper:]' '[:lower:]'
+ ID_PART_ENTRY_UUID=a9e8d490-82a7-48c1-8ef1-aff92351c69c
+ mkdir -p /dev/disk/by-partuuid
+ ln -sf ../../sde2 /dev/disk/by-partuuid/a9e8d490-82a7-48c1-8ef1-aff92351c69c
+ mkdir -p /dev/disk/by-parttypeuuid
+ ln -sf ../../sde2 
/dev/disk/by-parttypeuuid/45b0969e-9b03-4f30-b4c6-b4b80ceff106.a9e8d490-82a7-48c1-8ef1-aff92351c69c
+ case $ID_PART_ENTRY_TYPE in
+ /usr/sbin/ceph-disk -v activate-journal /dev/sde2
INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid 
--osd-journal /dev/sde2
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
DEBUG:ceph-disk:Journal /dev/sde2 has OSD UUID 
00000000-0000-0000-0000-000000000000
INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- 
/dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such file 
or directory
ceph-disk: Cannot discover filesystem type: device 
/dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command 
'/sbin/blkid' returned non-zero exit status 2
+ exit
+ exec

regards

Frederic.

P.S.: in your puppet module, it seems impossible to specify OSD disks by path, 
e.g.:
ceph::profile::params::osds:
  '/dev/disk/by-path/pci-\:0a\:00.0-scsi-0\:2\:':
(I tried without the backslashes too)
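
The GUID extraction that the traced udev script performs with sgdisk, grep, awk and tr can be sketched in Python as follows. This is only an illustration: `parse_sgdisk_info` is a hypothetical name, and the sample text is rebuilt from the values in the trace above rather than captured from a real run.

```python
def parse_sgdisk_info(text):
    """Extract the lowercase type GUID and unique GUID from `sgdisk --info=N` output.

    Hypothetical helper; it mirrors the grep/awk/tr pipeline in the trace.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # The type line ends with a description such as "(Unknown)", so keep
    # only the first whitespace-separated token.
    type_guid = fields["Partition GUID code"].split()[0].lower()
    unique_guid = fields["Partition unique GUID"].split()[0].lower()
    return type_guid, unique_guid


# Illustrative input assembled from the GUIDs shown in the trace (assumption).
SAMPLE = """Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
Partition unique GUID: A9E8D490-82A7-48C1-8EF1-AFF92351C69C
Partition name: 'ceph journal'
"""

if __name__ == "__main__":
    print(parse_sgdisk_info(SAMPLE))
```

The two values correspond to ID_PART_ENTRY_TYPE and ID_PART_ENTRY_UUID in the trace, which suggests the partition table itself is read correctly and the zeroed UUID comes from a later step.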


Re: [ceph-users] ceph-dis prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread Loic Dachary


What do sgdisk --info=1 /dev/sde and sgdisk --info=2 /dev/sde print?

It looks like the journal points to an incorrect location (you should see this 
by mounting /dev/sde1). Here is what I have on a cluster

root@bm0015:~# ls -l /var/lib/ceph/osd/ceph-1/
total 56
-rw-r--r--   1 root root  192 Nov  2  2013 activate.monmap
-rw-r--r--   1 root root    3 Nov  2  2013 active
-rw-r--r--   1 root root   37 Nov  2  2013 ceph_fsid
drwxr-xr-x 114 root root 8192 Sep 14 11:01 current
-rw-r--r--   1 root root   37 Nov  2  2013 fsid
lrwxrwxrwx   1 root root   58 Nov  2  2013 journal -> 
/dev/disk/by-partuuid/7e811295-1b45-477d-907a-41c4c90d9687
-rw-r--r--   1 root root   37 Nov  2  2013 journal_uuid
-rw-------   1 root root   56 Nov  2  2013 keyring
-rw-r--r--   1 root root   21 Nov  2  2013 magic
-rw-r--r--   1 root root    6 Nov  2  2013 ready
-rw-r--r--   1 root root    4 Nov  2  2013 store_version
-rw-r--r--   1 root root   42 Dec 27  2013 superblock
-rw-r--r--   1 root root    0 May  2 14:01 upstart
-rw-r--r--   1 root root    2 Nov  2  2013 whoami
root@bm0015:~# cat /var/lib/ceph/osd/ceph-1/journal_uuid
7e811295-1b45-477d-907a-41c4c90d9687
root@bm0015:~#

I guess in your case the content of journal_uuid is all zeros 
(00000000-0000-0000-0000-000000000000) for some reason.

Do you know where that

SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

comes from ?
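
A quick way to test whether a journal_uuid value is the all-zero UUID is Python's standard uuid module. This is only a sketch; `is_nil_uuid` is a name chosen for illustration.

```python
import uuid


def is_nil_uuid(value):
    """Return True if value parses as the all-zero (nil) UUID.

    Surrounding whitespace is stripped, so the raw file content
    (which ends with a newline) can be passed in directly.
    """
    return uuid.UUID(value.strip()).int == 0


if __name__ == "__main__":
    # Path taken from the listing above; adjust the OSD id as needed.
    with open("/var/lib/ceph/osd/ceph-1/journal_uuid") as handle:
        print(is_nil_uuid(handle.read()))
```

`uuid.UUID` raises ValueError on malformed input, so an empty or corrupted file is reported as an error rather than silently treated as zero.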


Re: [ceph-users] ceph-dis prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread SCHAER Frederic

Hi Loic,

Back on sdb, as the sde output was from another machine on which I ran partx -u 
afterwards.
To answer your last question first: I think the SG_IO error comes from the fact 
that the disks are each exported as a single-disk RAID0 volume on a PERC 6/E, 
which does not support JBOD. This is decommissioned hardware on which I'd like 
to test and validate that we can use ceph for our use case...

So, back to the zeroed UUID.
It's funny: I retried and ceph-disk prepare worked this time. I tried on 
another disk, and it failed.
There is a difference in the output from ceph-disk: on the failing disk, I 
have these extra lines after the disks are prepared:

(...)
realtime =none   extsz=4096   blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
partx: /dev/sdc: error adding partitions 1-2

I didn't have the warning about the old partition tables on the disk that 
worked. 
So on this new disk, I have :

[root@ceph1 ~]# mount /dev/sdc1 /mnt
[root@ceph1 ~]# ll /mnt/
total 16
-rw-r--r-- 1 root root 37 Oct  9 15:58 ceph_fsid
-rw-r--r-- 1 root root 37 Oct  9 15:58 fsid
lrwxrwxrwx 1 root root 58 Oct  9 15:58 journal -> 
/dev/disk/by-partuuid/5e50bb8b-0b99-455f-af71-10815a32bfbc
-rw-r--r-- 1 root root 37 Oct  9 15:58 journal_uuid
-rw-r--r-- 1 root root 21 Oct  9 15:58 magic

[root@ceph1 ~]# cat /mnt/journal_uuid
5e50bb8b-0b99-455f-af71-10815a32bfbc

[root@ceph1 ~]# sgdisk --info=1 /dev/sdc
Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
Partition unique GUID: 244973DE-7472-421C-BB25-4B09D3F8D441
First sector: 10487808 (at 5.0 GiB)
Last sector: 1952448478 (at 931.0 GiB)
Partition size: 1941960671 sectors (926.0 GiB)
Attribute flags: 
Partition name: 'ceph data'

[root@ceph1 ~]# sgdisk --info=2 /dev/sdc
Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
Partition unique GUID: 5E50BB8B-0B99-455F-AF71-10815A32BFBC
First sector: 2048 (at 1024.0 KiB)
Last sector: 10485760 (at 5.0 GiB)
Partition size: 10483713 sectors (5.0 GiB)
Attribute flags: 
Partition name: 'ceph journal'

Puzzling, isn't it ?
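
The on-disk values above can be cross-checked mechanically. This is a minimal sketch: `journal_is_consistent` is a hypothetical helper, not something ceph-disk provides, and it only compares the three values shown in this message.

```python
def journal_is_consistent(journal_uuid, journal_link_target, partition_guid):
    """Check that journal_uuid, the journal symlink and the GPT GUID agree.

    sgdisk prints GUIDs in upper case while the by-partuuid links use
    lower case, so the comparison is case-insensitive.
    """
    wanted = journal_uuid.strip().lower()
    expected_link = "/dev/disk/by-partuuid/" + wanted
    return (journal_link_target.strip() == expected_link
            and partition_guid.strip().lower() == wanted)


if __name__ == "__main__":
    print(journal_is_consistent(
        "5e50bb8b-0b99-455f-af71-10815a32bfbc",
        "/dev/disk/by-partuuid/5e50bb8b-0b99-455f-af71-10815a32bfbc",
        "5E50BB8B-0B99-455F-AF71-10815A32BFBC",
    ))
```

With the values from /dev/sdc the three agree, which is what makes the zeroed UUID reported at activation time puzzling: the metadata written by ceph-disk prepare looks correct.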



Re: [ceph-users] ceph-dis prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread Loic Dachary



On 09/10/2014 16:04, SCHAER Frederic wrote:
> Hi Loic,
> 
> Back on sdb, as the sde output was from another machine on which I ran partx 
> -u afterwards.
> To reply your last question first : I think the SG_IO error comes from the 
> fact that disks are exported as a single disks RAID0 on a PERC 6/E, which 
> does not support JBOD - this is decommissioned hardware on which I'd like to 
> test and validate we can use ceph for our use case...
> 
> So back on the  UUID.
> It's funny : I retried and ceph-disk prepare worked this time. I tried on 
> another disk, and it failed.
> There is a difference in the output from ceph-disk : on the failing disk, I 
> have these extra lines after disks are prepared :
> 
> (...)
> realtime =none   extsz=4096   blocks=0, rtextents=0
> Warning: The kernel is still using the old partition table.
> The new table will be used at the next reboot.
> The operation has completed successfully.
> partx: /dev/sdc: error adding partitions 1-2
> 
> I didn't have the warning about the old partition tables on the disk that 
> worked. 
> So on this new disk, I have :
> 
> [root@ceph1 ~]# mount /dev/sdc1 /mnt
> [root@ceph1 ~]# ll /mnt/
> total 16
> -rw-r--r-- 1 root root 37 Oct  9 15:58 ceph_fsid
> -rw-r--r-- 1 root root 37 Oct  9 15:58 fsid
> lrwxrwxrwx 1 root root 58 Oct  9 15:58 journal -> 
> /dev/disk/by-partuuid/5e50bb8b-0b99-455f-af71-10815a32bfbc
> -rw-r--r-- 1 root root 37 Oct  9 15:58 journal_uuid
> -rw-r--r-- 1 root root 21 Oct  9 15:58 magic
> 
> [root@ceph1 ~]# cat /mnt/journal_uuid
> 5e50bb8b-0b99-455f-af71-10815a32bfbc
> 
> [root@ceph1 ~]# sgdisk --info=1 /dev/sdc
> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> Partition unique GUID: 244973DE-7472-421C-BB25-4B09D3F8D441
> First sector: 10487808 (at 5.0 GiB)
> Last sector: 1952448478 (at 931.0 GiB)
> Partition size: 1941960671 sectors (926.0 GiB)
> Attribute flags: 
> Partition name: 'ceph data'
> 
> [root@ceph1 ~]# sgdisk --info=2 /dev/sdc
> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
> Partition unique GUID: 5E50BB8B-0B99-455F-AF71-10815A32BFBC
> First sector: 2048 (at 1024.0 KiB)
> Last sector: 10485760 (at 5.0 GiB)
> Partition size: 10483713 sectors (5.0 GiB)
> Attribute flags: 
> Partition name: 'ceph journal'
> 
> Puzzling, isn't it ?
> 
> 

Yes :-) Just to be 100% sure: when you try to activate this /dev/sdc, does it 
show an error and complain that the journal UUID is all zeros 
(00000000-0000-...)? If so, could you copy your udev debug output?

Cheers



Re: [ceph-users] ceph-dis prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread SCHAER Frederic



-----Original Message-----
From: Loic Dachary [mailto:l...@dachary.org] 
Sent: Thursday, October 9, 2014 16:20
To: SCHAER Frederic; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-dis prepare : 
UUID=00000000-0000-0000-0000-000000000000



On 09/10/2014 16:04, SCHAER Frederic wrote:
> Hi Loic,
> 
> Back on sdb, as the sde output was from another machine on which I ran partx 
> -u afterwards.
> To reply your last question first : I think the SG_IO error comes from the 
> fact that disks are exported as a single disks RAID0 on a PERC 6/E, which 
> does not support JBOD - this is decommissioned hardware on which I'd like to 
> test and validate we can use ceph for our use case...
> 
> So back on the  UUID.
> It's funny : I retried and ceph-disk prepare worked this time. I tried on 
> another disk, and it failed.
> There is a difference in the output from ceph-disk : on the failing disk, I 
> have these extra lines after disks are prepared :
> 
> (...)
> realtime =none   extsz=4096   blocks=0, rtextents=0
> Warning: The kernel is still using the old partition table.
> The new table will be used at the next reboot.
> The operation has completed successfully.
> partx: /dev/sdc: error adding partitions 1-2
> 
> I didn't have the warning about the old partition tables on the disk that 
> worked. 
> So on this new disk, I have :
> 
> [root@ceph1 ~]# mount /dev/sdc1 /mnt
> [root@ceph1 ~]# ll /mnt/
> total 16
> -rw-r--r-- 1 root root 37 Oct  9 15:58 ceph_fsid
> -rw-r--r-- 1 root root 37 Oct  9 15:58 fsid
> lrwxrwxrwx 1 root root 58 Oct  9 15:58 journal -> 
> /dev/disk/by-partuuid/5e50bb8b-0b99-455f-af71-10815a32bfbc
> -rw-r--r-- 1 root root 37 Oct  9 15:58 journal_uuid
> -rw-r--r-- 1 root root 21 Oct  9 15:58 magic
> 
> [root@ceph1 ~]# cat /mnt/journal_uuid
> 5e50bb8b-0b99-455f-af71-10815a32bfbc
> 
> [root@ceph1 ~]# sgdisk --info=1 /dev/sdc
> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
> Partition unique GUID: 244973DE-7472-421C-BB25-4B09D3F8D441
> First sector: 10487808 (at 5.0 GiB)
> Last sector: 1952448478 (at 931.0 GiB)
> Partition size: 1941960671 sectors (926.0 GiB)
> Attribute flags: 
> Partition name: 'ceph data'
> 
> [root@ceph1 ~]# sgdisk --info=2 /dev/sdc
> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
> Partition unique GUID: 5E50BB8B-0B99-455F-AF71-10815A32BFBC
> First sector: 2048 (at 1024.0 KiB)
> Last sector: 10485760 (at 5.0 GiB)
> Partition size: 10483713 sectors (5.0 GiB)
> Attribute flags: 
> Partition name: 'ceph journal'
> 
> Puzzling, isn't it ?
> 
> 

Yes :-) Just to be 100% sure: when you try to activate this /dev/sdc, does it 
show an error and complain that the journal UUID is all zeros 
(00000000-0000-...)? If so, could you copy your udev debug output?

Cheers

[>- FS : -<]  

No, when I manually activate the disk instead of attempting to go the udev way, 
it seems to work :
[root@ceph1 ~]# ceph-disk activate /dev/sdc1
got monmap epoch 1
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2014-10-09 16:21:43.286288 7f2be6a027c0 -1 journal check: ondisk fsid 
00000000-0000-0000-0000-000000000000 doesn't match expected 
244973de-7472-421c-bb25-4b09d3f8d441, invalid (someone else's?) journal
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2014-10-09 16:21:43.301957 7f2be6a027c0 -1 
filestore(/var/lib/ceph/tmp/mnt.4lJlzP) could not find 
23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
2014-10-09 16:21:43.305941 7f2be6a027c0 -1 created object store 
/var/lib/ceph/tmp/mnt.4lJlzP journal /var/lib/ceph/tmp/mnt.4lJlzP/journal for 
osd.47 fsid 70ac4a78-46c0-45e6-8ff9-878b37f50fa1
2014-10-09 16:21:43.305992 7f2be6a027c0 -1 auth: error reading file: 
/var/lib/ceph/tmp/mnt.4lJlzP/keyring: can't open 
/var/lib/ceph/tmp/mnt.4lJlzP/keyring: (2) No such file or directory
2014-10-09 16:21:43.306099 7f2be6a027c0 -1 created new key in keyring 
/var/lib/ceph/tmp/mnt.4lJlzP/keyring
added key for osd.47
=== osd.47 ===
create-or-move updating item name 'osd.47' weight 0.9 at location 
{host=ceph1,root=default} to crush map
Starting Ceph osd.47 on ceph1...
Running as unit run-12392.service.

The osd then appeared in the osd tree...
I attached the logs to this email (I just added a set -x in the script called 
by udev, and redirected the output)

Regards


udev_ceph.log.out
Description: udev_ceph.log.out

Re: [ceph-users] ceph-dis prepare : UUID=00000000-0000-0000-0000-000000000000

2014-10-09 Thread Loic Dachary



On 09/10/2014 16:29, SCHAER Frederic wrote:
> 
> 
> -----Original Message-----
> From: Loic Dachary [mailto:l...@dachary.org] 
> Sent: Thursday, October 9, 2014 16:20
> To: SCHAER Frederic; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] ceph-dis prepare : 
> UUID=00000000-0000-0000-0000-000000000000
> 
> 
> 
> On 09/10/2014 16:04, SCHAER Frederic wrote:
>> Hi Loic,
>>
>> Back on sdb, as the sde output was from another machine on which I ran partx 
>> -u afterwards.
>> To reply your last question first : I think the SG_IO error comes from the 
>> fact that disks are exported as a single disks RAID0 on a PERC 6/E, which 
>> does not support JBOD - this is decommissioned hardware on which I'd like to 
>> test and validate we can use ceph for our use case...
>>
>> So back on the  UUID.
>> It's funny : I retried and ceph-disk prepare worked this time. I tried on 
>> another disk, and it failed.
>> There is a difference in the output from ceph-disk : on the failing disk, I 
>> have these extra lines after disks are prepared :
>>
>> (...)
>> realtime =none   extsz=4096   blocks=0, rtextents=0
>> Warning: The kernel is still using the old partition table.
>> The new table will be used at the next reboot.
>> The operation has completed successfully.
>> partx: /dev/sdc: error adding partitions 1-2
>>
>> I didn't have the warning about the old partition tables on the disk that 
>> worked. 
>> So on this new disk, I have :
>>
>> [root@ceph1 ~]# mount /dev/sdc1 /mnt
>> [root@ceph1 ~]# ll /mnt/
>> total 16
>> -rw-r--r-- 1 root root 37 Oct  9 15:58 ceph_fsid
>> -rw-r--r-- 1 root root 37 Oct  9 15:58 fsid
>> lrwxrwxrwx 1 root root 58 Oct  9 15:58 journal -> 
>> /dev/disk/by-partuuid/5e50bb8b-0b99-455f-af71-10815a32bfbc
>> -rw-r--r-- 1 root root 37 Oct  9 15:58 journal_uuid
>> -rw-r--r-- 1 root root 21 Oct  9 15:58 magic
>>
>> [root@ceph1 ~]# cat /mnt/journal_uuid
>> 5e50bb8b-0b99-455f-af71-10815a32bfbc
>>
>> [root@ceph1 ~]# sgdisk --info=1 /dev/sdc
>> Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
>> Partition unique GUID: 244973DE-7472-421C-BB25-4B09D3F8D441
>> First sector: 10487808 (at 5.0 GiB)
>> Last sector: 1952448478 (at 931.0 GiB)
>> Partition size: 1941960671 sectors (926.0 GiB)
>> Attribute flags: 
>> Partition name: 'ceph data'
>>
>> [root@ceph1 ~]# sgdisk --info=2 /dev/sdc
>> Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
>> Partition unique GUID: 5E50BB8B-0B99-455F-AF71-10815A32BFBC
>> First sector: 2048 (at 1024.0 KiB)
>> Last sector: 10485760 (at 5.0 GiB)
>> Partition size: 10483713 sectors (5.0 GiB)
>> Attribute flags: 
>> Partition name: 'ceph journal'
>>
>> Puzzling, isn't it ?
>>
>>
> 
> Yes :-) Just to be 100% sure: when you try to activate this /dev/sdc, does it 
> show an error and complain that the journal UUID is all zeros 
> (00000000-0000-...)? If so, could you copy your udev debug output?
> 
> Cheers
> 
> [>- FS : -<]  
> 
> No, when I manually activate the disk instead of attempting to go the udev 
> way, it seems to work :
> [root@ceph1 ~]# ceph-disk activate /dev/sdc1
> got monmap epoch 1
> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 2014-10-09 16:21:43.286288 7f2be6a027c0 -1 journal check: ondisk fsid 
> 00000000-0000-0000-0000-000000000000 doesn't match expected 
> 244973de-7472-421c-bb25-4b09d3f8d441, invalid (someone else's?) journal
> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 2014-10-09 16:21:43.301957 7f2be6a027c0 -1 
> filestore(/var/lib/ceph/tmp/mnt.4lJlzP) could not find 
> 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
> 2014-10-09 16:21:43.305941 7f2be6a027c0 -1 created object store 
> /var/lib/ceph/tmp/mnt.4lJlzP journal /var/lib/ceph/tmp/mnt.4lJlzP/journal for 
> osd.47 fsid 70ac4a78-46c0-45e6-8ff9-878b37f50fa1
> 2014-10-09 16:21:43.305992 7f2be6a027c0 -1 auth: error reading file: 
> /var/lib/ceph/tmp/mnt.4lJlzP/keyring: can't open 
> /var/lib/ceph/tmp/mnt.4lJlzP/keyring: (2) No such file or directory
> 2014-10-09 16:21:43.306099 7f2be6a027c0 -1 created new key in keyring 
> /var/lib/ceph/tmp/mnt.4lJlzP/keyring
> added key for osd.47
> === osd.47 ===
> create-or-move updating item name 'osd.47' weight 0.9 at location 
> {host=ceph1,root=default} to crush map
> Starting Ceph osd.47 on ceph1...
> Running as unit run-12392.service.
> 
> The osd then appeared in the osd tree...
> I attached the logs to this email (I just added a set -x in the script called 
> by udev, and redirected the output)

The failure 

journal check:

Re: [ceph-users] Ceph RBD map debug: error -22 on auth protocol 2 init

2014-10-09 Thread Christopher Armstrong

Hey guys,

Good news!! Ilya investigated the ticket and gave me a hint as to the issue
- we need to use `--net host` on the consuming container so that the
network context is what Ceph expects. I am now running my test container
like so:

docker run -i -v /sys:/sys --net host
172.21.12.100:5000/deis/store-base:git-3d4ca8f /bin/bash

Note that we also had to bind-mount /sys so that it's not read-only
within the container. And I can confirm that it works!



*Chris Armstrong*Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Tue, Oct 7, 2014 at 11:06 AM, Christopher Armstrong 
wrote:

> Thank you Ilya! Please let me know if I can help. To give you some
> background, I'm one of the core maintainers of Deis, an open-source PaaS
> built on Docker and CoreOS. We have Ceph running quite successfully as
> implemented in https://github.com/deis/deis/pull/1910 based on Seán
> McCord's containerized Ceph work: https://github.com/ulexus/docker-ceph
>
> We are currently only using radosgw. We really need shared volume support,
> which is why we're interested in getting RBD mapping working.
>
> Thanks for helping with this!
>
>
> *Chris Armstrong*Head of Services
> OpDemand / Deis.io
>
> GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/
>
>
> On Tue, Oct 7, 2014 at 4:05 AM, Ilya Dryomov 
> wrote:
>
>> On Tue, Oct 7, 2014 at 9:46 AM, Christopher Armstrong
>>  wrote:
>> > Hi folks,
>> >
>> > I'm trying to gather additional information surrounding
>> > http://tracker.ceph.com/issues/9355 so we can hopefully find the root
>> of
>> > what's preventing us from successfully mapping RBD volumes inside a
>> Linux
>> > container.
>> >
>> > With the RBD kernel module debugging enabled (and cephx authentication
>> > disabled so I can echo to the RBD bus) as instructed by joshd, I notice
>> this
>> > error in my dmesg:
>> >
>> > [ 1005.143340] libceph: error -22 on auth protocol 2 init
>> >
>> > Not sure this is the root of the issues, but it's certainly a lead.
>> This may
>> > just be caused by the fact that we've disabled authentication in
>> ceph.conf
>> > so we can debug this, but was hoping someone from the list could shed
>> some
>> > light.
>>
>> Hi Christopher,
>>
>> I'll try to setup docker and have a look.
>>
>> Thanks,
>>
>> Ilya
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] accept: got bad authorizer

2014-10-09 Thread Nathan Stratton

Yep, that was it. My concern tho is that one node with a bad clock was able
to lock the whole 16 node cluster, should that be the case?
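
As a rough illustration of the clock check suggested in the quoted reply below, here is a minimal Python sketch that flags nodes whose clock offset strays from the rest of the cluster. The node names, the offsets, and the 0.05 s threshold are all made-up inputs for illustration; in practice the offsets would have to come from whatever time-sync tooling is in use.

```python
from statistics import median


def find_skewed_nodes(offsets, max_skew=0.05):
    """Return nodes whose clock offset differs from the cluster median
    by more than max_skew seconds.

    offsets:  dict of node name -> measured clock offset in seconds
              (hypothetical input, e.g. gathered from an NTP client).
    max_skew: tolerated deviation in seconds (an assumed value).
    """
    mid = median(offsets.values())
    return sorted(node for node, off in offsets.items()
                  if abs(off - mid) > max_skew)


if __name__ == "__main__":
    # Fifteen well-behaved nodes and one with a badly drifting clock.
    sample = {"osd-node-%02d" % i: 0.002 * i for i in range(1, 16)}
    sample["osd-node-16"] = 37.5
    print(find_skewed_nodes(sample))
```

Comparing against the median rather than the mean keeps a single wildly wrong clock from dragging the reference point along with it.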


><>
nathan stratton | vp technology | broadsoft, inc | +1-240-404-6580 |
www.broadsoft.com

On Wed, Oct 8, 2014 at 6:48 PM, Gregory Farnum  wrote:

> Check your clock sync on that node. That's the usual cause of this issue.
> -Greg
>
>
> On Wednesday, October 8, 2014, Nathan Stratton 
> wrote:
>
>> I have one out of 16 of my OSDs doing something odd. The logs show some
>> sort of authentication issue. If I restart the OSD things are fine, but in
>> a few hours it happens again and I have to restart it to get things back up.
>>
>> 2014-10-08 06:46:46.858260 7f43f62a0700  0 auth: could not find
>> secret_id=221
>> 2014-10-08 06:46:46.858276 7f43f62a0700  0 cephx: verify_authorizer could
>> not get service secret for service osd secret_id=221
>> 2014-10-08 06:46:46.858302 7f43f62a0700  0 -- 10.71.1.26:6800/22284 >>
>> 10.71.0.218:0/1002562 pipe(0x7c92800 sd=73 :6800 s=0 pgs=0 cs=0 l=1
>> c=0x87b44c0).accept: got bad authorizer
>>
>>
>> ><>
>> nathan stratton | vp technology | broadsoft, inc | +1-240-404-6580 |
>> www.broadsoft.com
>>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>

[ceph-users] Monitor segfaults when updating the crush map

2014-10-09 Thread Stephen Jahl

Hi All,

I'm trying to add a crush rule to my map, which looks like this:

rule rack_ruleset {
ruleset 1
type replicated
min_size 1
max_size 10
step take default
step choose firstn 2 type rack
step chooseleaf firstn 2 type host
step emit
}

I'm not configuring any pools to use the ruleset at this time. When I
recompile the map, and test the rule with crushtool --test, everything
seems fine, and I'm not noticing anything out of the ordinary.

But, when I try to inject the compiled crush map back into the cluster like
this:

ceph osd setcrushmap -i /path/to/compiled-crush-map

The monitor process appears to stop, and I see a monitor election
happening. Things hang until I ^C the setcrushmap command, and I need to
restart the monitor processes to make things happy again (and the crush map
never ends up getting updated).

In the monitor logs, I see several segfaults that look like this:
http://pastebin.com/K1XqPpbF

I'm running ceph 0.80.5-1trusty on Ubuntu 14.04 with
kernel 3.13.0-35-generic.

Anyone have any ideas as to what is happening?

-Steve

Re: [ceph-users] Rados Gateway and Swift create containers/buckets that cannot be opened

2014-10-09 Thread Yehuda Sadeh

I have a trivial fix for the issue that I'd like to check and get this
one cleared, but never got to it due to some difficulties with a
proper keystone setup in my environment. If you can and would like to
test it so that we could get it merged it would be great.

Thanks,
Yehuda

On Wed, Oct 8, 2014 at 6:18 PM, Mark Kirkwood
 wrote:
> Yes. I ran into that as well - I used
>
> WSGIChunkedRequest On
>
> in the virtualhost config for the *keystone* server [1] as indicated in
> issue 7796.
>
> Cheers
>
> Mark
>
> [1] i.e, not the rgw.
>
> On 08/10/14 22:58, Ashish Chandra wrote:
>>
>> Hi Mark,
>> Good you got the solution. But since you have already done
>> authenticating RadosGW with Keystone, I am having one issue that you can
>> help with. For me I get an error "411 Length Required" with Keystone
>> token authentication.
>> To fix this I use "WSGIChunkedRequest On" in rgw.conf as mentioned in
>> http://tracker.ceph.com/issues/7796.
>>
>> Did you face the issue, if yes what was your solution.
>>
>>
>
>

Re: [ceph-users] Ceph RBD map debug: error -22 on auth protocol 2 init

2014-10-09 Thread Ilya Dryomov

On Thu, Oct 9, 2014 at 8:34 PM, Christopher Armstrong
 wrote:
> Hey guys,
>
> Good news!! Ilya investigated the ticket and gave me a hint as to the issue
> - we need to use `--net host` on the consuming container so that the network
> context is what Ceph expects. I am now running my test container like so:
>
> docker run -i -v /sys:/sys --net host
> 172.21.12.100:5000/deis/store-base:git-3d4ca8f /bin/bash
>
> Note that we also had to bind-mount /sys so that it's not read-only within
> the container. And I can confirm that it works!

What are you doing about /dev?  /dev/rbdX won't show up.

Thanks,

Ilya

Re: [ceph-users] Ceph RBD map debug: error -22 on auth protocol 2 init

2014-10-09 Thread Christopher Armstrong

Good point. I'll have to play around with it - was just excited to get past
the blocking map issue.


*Chris Armstrong*Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Thu, Oct 9, 2014 at 11:20 AM, Ilya Dryomov 
wrote:

> On Thu, Oct 9, 2014 at 8:34 PM, Christopher Armstrong
>  wrote:
> > Hey guys,
> >
> > Good news!! Ilya investigated the ticket and gave me a hint as to the
> issue
> > - we need to use `--net host` on the consuming container so that the
> network
> > context is what Ceph expects. I am now running my test container like so:
> >
> > docker run -i -v /sys:/sys --net host
> > 172.21.12.100:5000/deis/store-base:git-3d4ca8f /bin/bash
> >
> > Note that we also had to bind-mount /sys so that it's not read-only
> within
> > the container. And I can confirm that it works!
>
> What are you doing about /dev?  /dev/rbdX won't show up.
>
> Thanks,
>
> Ilya
>

Re: [ceph-users] Rados Gateway and Swift create containers/buckets that cannot be opened

2014-10-09 Thread M Ranga Swami Reddy

Hi Yehuda,
Please share the fix/patch, we could test and confirm the fix status.

Thanks
Swami

On Thu, Oct 9, 2014 at 10:42 PM, Yehuda Sadeh  wrote:
> I have a trivial fix for the issue that I'd like to check and get this
> one cleared, but never got to it due to some difficulties with a
> proper keystone setup in my environment. If you can and would like to
> test it so that we could get it merged it would be great.
>
> Thanks,
> Yehuda
>
> On Wed, Oct 8, 2014 at 6:18 PM, Mark Kirkwood
>  wrote:
>> Yes. I ran into that as well - I used
>>
>> WSGIChunkedRequest On
>>
>> in the virtualhost config for the *keystone* server [1] as indicated in
>> issue 7796.
>>
>> Cheers
>>
>> Mark
>>
>> [1] i.e, not the rgw.
>>
>> On 08/10/14 22:58, Ashish Chandra wrote:
>>>
>>> Hi Mark,
>>> Good you got the solution. But since you have already done
>>> authenticating RadosGW with Keystone, I am having one issue that you can
>>> help with. For me I get an error "411 Length Required" with Keystone
>>> token authentication.
>>> To fix this I use "WSGIChunkedRequest On" in rgw.conf as mentioned in
>>> http://tracker.ceph.com/issues/7796.
>>>
>>> Did you face the issue, if yes what was your solution.
>>>
>>>
>>
>>

Re: [ceph-users] Ceph RBD map debug: error -22 on auth protocol 2 init

2014-10-09 Thread Ilya Dryomov

On Thu, Oct 9, 2014 at 9:23 PM, Christopher Armstrong
 wrote:
> Good point. I'll have to play around with it - was just excited to get past
> the blocking map issue.

This could be a docker bug - my understanding is that all devices have
to show up if running with --privileged, which I do on my test box.
I'll poke around some more as well.

Thanks,

Ilya

Re: [ceph-users] Ceph RBD map debug: error -22 on auth protocol 2 init

2014-10-09 Thread Christopher Armstrong

Adding `-v /dev:/dev` works as expected - after mapping, the device shows
up as /dev/rbd0. Agreed, though - I thought --privileged should do this.
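
For anyone scripting this, here is a minimal Python sketch that assembles the `docker run` invocation discussed in this thread: `-i`, bind mounts for /sys and /dev, and `--net host`. The helper name `build_docker_run_args` is hypothetical, and the sketch only builds the argument list; it does not run Docker or map anything.

```python
def build_docker_run_args(image, command, bind_mounts=("/sys", "/dev")):
    """Assemble an argv list for `docker run` with the options from this thread.

    image, command: the container image and the command to start in it.
    bind_mounts:    host paths mounted at the same path inside the container.
    """
    args = ["docker", "run", "-i"]
    for path in bind_mounts:
        # Bind-mount the host path at the same location in the container.
        args += ["-v", "%s:%s" % (path, path)]
    # Share the host's network namespace with the container.
    args += ["--net", "host", image, command]
    return args


if __name__ == "__main__":
    argv = build_docker_run_args(
        "172.21.12.100:5000/deis/store-base:git-3d4ca8f", "/bin/bash")
    print(" ".join(argv))
```

Returning a list instead of a single string means the result can be handed to `subprocess.run` without any shell quoting concerns.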


*Chris Armstrong*Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Thu, Oct 9, 2014 at 11:36 AM, Ilya Dryomov 
wrote:

> On Thu, Oct 9, 2014 at 9:23 PM, Christopher Armstrong
>  wrote:
> > Good point. I'll have to play around with it - was just excited to get
> past
> > the blocking map issue.
>
> This could be a docker bug - my understanding is that all devices have
> to show up if running with --privileged, which I do on my test box.
> I'll poke around some more as well.
>
> Thanks,
>
> Ilya
>

[ceph-users] Regarding Primary affinity configuration

2014-10-09 Thread Johnu George (johnugeo)

Hi All,
  I have a few questions regarding primary affinity. In the original
blueprint
(https://wiki.ceph.com/Planning/Blueprints/Firefly/osdmap%3A_primary_role_affinity),
one example is given:

For PG x, CRUSH returns [a, b, c]
If a has a primary_affinity of .5, and b and c have 1, then with 50% probability
we will choose b or c instead of a (25% for b, 25% for c).

A) I was browsing through the code, but I could not find this logic of 
splitting the rest of configured primary affinity value between other osds. How 
is this handled?

    if (a < CEPH_OSD_MAX_PRIMARY_AFFINITY &&
        (crush_hash32_2(CRUSH_HASH_RJENKINS1,
                        seed, o) >> 16) >= a) {
      // we chose not to use this primary.  note it anyway as a
      // fallback in case we don't pick anyone else, but keep looking.
      if (pos < 0)
        pos = i;
    } else {
      pos = i;
      break;
    }
  }

B) Since primary affinity values are configured independently, there can be a 
situation such as [0.1,0.1,0.1] where the values don't add up to 1. How is 
this taken care of?

C) Slightly confused. What happens in a situation with [1,0.5,1]? Is osd.0 
always returned?

D) After calculating the primary based on the affinity values, I see a shift of 
OSDs so that the primary comes to the front. Why is this needed? I thought the 
primary affinity value affects only reads, and hence the OSD ordering need not 
be changed.
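
One way to explore questions A to C is to model the quoted loop directly. The Python sketch below does that under loudly stated assumptions: `stand_in_hash` is a SHA-256 based substitute for `crush_hash32_2` (not the real hash), `MAX_PRIMARY_AFFINITY` is assumed to be 0x10000, and the helper names are made up for illustration. It is a reading of the snippet above, not a statement of how the full OSDMap code behaves.

```python
import hashlib

MAX_PRIMARY_AFFINITY = 0x10000  # assumed value of CEPH_OSD_MAX_PRIMARY_AFFINITY


def stand_in_hash(seed, osd):
    """Deterministic 32-bit hash; a stand-in for crush_hash32_2, NOT the real one."""
    digest = hashlib.sha256(("%d:%d" % (seed, osd)).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def pick_primary(osds, affinity, seed):
    """Return the index in `osds` chosen as primary, following the quoted loop.

    affinity maps osd id -> integer in [0, MAX_PRIMARY_AFFINITY].
    """
    pos = -1
    for i, osd in enumerate(osds):
        a = affinity[osd]
        if a < MAX_PRIMARY_AFFINITY and (stand_in_hash(seed, osd) >> 16) >= a:
            # Rejected for now; remember the first reject as a fallback.
            if pos < 0:
                pos = i
        else:
            pos = i
            break
    return pos


def primary_share(osds, fractions, trials=20000):
    """Fraction of seeds for which each position ends up as primary."""
    affinity = {o: int(f * MAX_PRIMARY_AFFINITY) for o, f in zip(osds, fractions)}
    counts = [0] * len(osds)
    for seed in range(trials):
        counts[pick_primary(osds, affinity, seed)] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    for fractions in ([0.5, 1, 1], [0.1, 0.1, 0.1], [1, 0.5, 1]):
        print(fractions, primary_share([0, 1, 2], fractions))
```

In this model each OSD is accepted or rejected on its own, so the affinities never need to sum to 1, and an OSD at the front of the list with full affinity is always picked. For a fixed ordering [a, b, c], a rejected `a` hands the role to the next OSD that is accepted, so any even split between b and c would have to come from different PGs getting different orderings from CRUSH.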


Thanks,
Johnu

Re: [ceph-users] Monitor segfaults when updating the crush map

2014-10-09 Thread Loic Dachary

Hi Stephen,

It looks like you're hitting http://tracker.ceph.com/issues/9492 which has been 
fixed but is not yet available in firefly. The simplest workaround is to 
set min_size to 4 in this case.

Cheers

On 09/10/2014 19:31, Stephen Jahl wrote:
> Hi All,
> 
> I'm trying to add a crush rule to my map, which looks like this:
> 
> rule rack_ruleset {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take default
> step choose firstn 2 type rack
> step chooseleaf firstn 2 type host
> step emit
> }
> 
> I'm not configuring any pools to use the ruleset at this time. When I 
> recompile the map, and test the rule with crushtool --test, everything seems 
> fine, and I'm not noticing anything out of the ordinary.
> 
> But, when I try to inject the compiled crush map back into the cluster like 
> this:
> 
> ceph osd setcrushmap -i /path/to/compiled-crush-map
> 
> The monitor process appears to stop, and I see a monitor election happening. 
> Things hang until I ^C the setcrushmap command, and I need to restart the 
> monitor processes to make things happy again (and the crush map never ends up 
> getting updated).
> 
> In the monitor logs, I see several segfaults that look like this: 
> http://pastebin.com/K1XqPpbF
> 
> I'm running ceph 0.80.5-1trusty on Ubuntu 14.04 with kernel 3.13.0-35-generic.
> 
> Anyone have any ideas as to what is happening?
> 
> -Steve
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre




Re: [ceph-users] Monitor segfaults when updating the crush map

2014-10-09 Thread Stephen Jahl

Thanks Loic,

In my case, I actually only have three replicas for my pools -- with this
rule, I'm trying to ensure that OSDs in at least two racks are selected.
Since the replica size is only 3, I think I'm still affected by the bug
(unless of course I set my replica size to 4).
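
To make the intent of the rule concrete, here is a toy Python model of it. It is only a sketch: `random.Random` stands in for CRUSH's hashing, the rack and host names are invented, and it assumes that the four candidates the rule emits are simply cut off at the pool's replica count. It is not the real placement algorithm.

```python
import random


def toy_rack_rule(topology, replicas, seed):
    """Toy model of: choose firstn 2 type rack / chooseleaf firstn 2 type host.

    topology: dict of rack name -> list of host names (hypothetical layout).
    Returns up to `replicas` (rack, host) pairs.
    """
    rng = random.Random(seed)
    picked = []
    for rack in rng.sample(sorted(topology), 2):      # choose firstn 2 type rack
        for host in rng.sample(topology[rack], 2):    # chooseleaf firstn 2 type host
            picked.append((rack, host))
    # The rule yields four candidates; only the first `replicas` are kept.
    return picked[:replicas]


if __name__ == "__main__":
    racks = {
        "rack1": ["host1a", "host1b", "host1c"],
        "rack2": ["host2a", "host2b", "host2c"],
        "rack3": ["host3a", "host3b"],
    }
    for seed in range(3):
        print(toy_rack_rule(racks, replicas=3, seed=seed))
```

With three replicas, every placement in this model lands two copies in one rack and the third in another, which is the "at least two racks" property I'm after.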

Is there a better way I can express what I want in the crush rule,
preferably in a way not hit by that bug ;) ? Is there an ETA on when that
bugfix might land in firefly?

Best,
-Steve

On Thu, Oct 9, 2014 at 1:59 PM, Loic Dachary  wrote:

> Hi Stephen,
>
> It looks like you're hitting http://tracker.ceph.com/issues/9492 which
> has been fixed but is not yet available in firefly. The simplest workaround
> is to min_size 4 in this case.
>
> Cheers
>
> On 09/10/2014 19:31, Stephen Jahl wrote:> Hi All,
> >
> > I'm trying to add a crush rule to my map, which looks like this:
> >
> > rule rack_ruleset {
> > ruleset 1
> > type replicated
> > min_size 1
> > max_size 10
> > step take default
> > step choose firstn 2 type rack
> > step chooseleaf firstn 2 type host
> > step emit
> > }
> >
> > I'm not configuring any pools to use the ruleset at this time. When I
> recompile the map, and test the rule with crushtool --test, everything
> seems fine, and I'm not noticing anything out of the ordinary.
> >
> > But, when I try to inject the compiled crush map back into the cluster
> like this:
> >
> > ceph osd setcrushmap -i /path/to/compiled-crush-map
> >
> > The monitor process appears to stop, and I see a monitor election
> happening. Things hang until I ^C the setcrushmap command, and I need to
> restart the monitor processes to make things happy again (and the crush map
> never ends up getting updated).
> >
> > In the monitor logs, I see several segfaults that look like this:
> http://pastebin.com/K1XqPpbF
> >
> > I'm running ceph 0.80.5-1trusty on Ubuntu 14.04 with kernel
> 3.13.0-35-generic.
> >
> > Anyone have any ideas as to what is happening?
> >
> > -Steve
> >
> >
> >
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
>

[ceph-users] [ANN] ceph-deploy 1.5.18 released

2014-10-09 Thread Alfredo Deza

Hi All,

There is a new release of ceph-deploy that includes a fix for an issue where
enabling the OSD service would fail on certain distros.

There is also an improvement for creating a monitor keyring if one is not
found when deploying monitors.

The full changelog can be seen here:
http://ceph.com/ceph-deploy/docs/changelog.html#id1

Make sure you upgrade!


-Alfredo

Re: [ceph-users] Rados Gateway and Swift create containers/buckets that cannot be opened

2014-10-09 Thread Yehuda Sadeh

Here's the fix, let me know if you need any help with that.

Thanks,
Yehuda

diff --git a/src/rgw/rgw_swift.cc b/src/rgw/rgw_swift.cc
index d9654a7..2445e17 100644
--- a/src/rgw/rgw_swift.cc
+++ b/src/rgw/rgw_swift.cc
@@ -505,6 +505,8 @@ int RGWSwift::validate_keystone_token(RGWRados *store, const string& token, stru

 validate.append_header("X-Auth-Token", admin_token);

+validate.set_send_length(0);
+
 int ret = validate.process(url.c_str());
 if (ret < 0)
   return ret;



On Thu, Oct 9, 2014 at 10:30 AM, M Ranga Swami Reddy
 wrote:
> Hi Yehuda,
> Please share the fix/patch, we could test and confirm the fix status.
>
> Thanks
> Swami
>
> On Thu, Oct 9, 2014 at 10:42 PM, Yehuda Sadeh  wrote:
>> I have a trivial fix for the issue that I'd like to check and get this
>> one cleared, but never got to it due to some difficulties with a
>> proper keystone setup in my environment. If you can and would like to
>> test it so that we could get it merged it would be great.
>>
>> Thanks,
>> Yehuda
>>
>> On Wed, Oct 8, 2014 at 6:18 PM, Mark Kirkwood
>>  wrote:
>>> Yes. I ran into that as well - I used
>>>
>>> WSGIChunkedRequest On
>>>
>>> in the virtualhost config for the *keystone* server [1] as indicated in
>>> issue 7796.
>>>
>>> Cheers
>>>
>>> Mark
>>>
>>> [1] i.e, not the rgw.
>>>
>>> On 08/10/14 22:58, Ashish Chandra wrote:

>>>> Hi Mark,
>>>> Good you got the solution. But since you have already done
>>>> authenticating RadosGW with Keystone, I am having one issue that you can
>>>> help with. For me I get an error "411 Length Required" with Keystone
>>>> token authentication.
>>>> To fix this I use "WSGIChunkedRequest On" in rgw.conf as mentioned in
>>>> http://tracker.ceph.com/issues/7796.
>>>>
>>>> Did you face the issue, if yes what was your solution.


>>>
>>>

Re: [ceph-users] Monitor segfaults when updating the crush map

2014-10-09 Thread Loic Dachary

The patch is already in the firefly maintenance branch:

https://github.com/ceph/ceph/commits/firefly
https://github.com/ceph/ceph/commit/548be0b2aea18ed3196ef8f0ab5f58a66e3a9af4

but I'm not sure when the 0.80.7 release will be published. 
http://ceph.com/releases/v0-80-6-firefly-released/ was only a few days ago.

On 09/10/2014 20:11, Stephen Jahl wrote:
> Thanks Loic,
> 
> In my case, I actually only have three replicas for my pools -- with this 
> rule, I'm trying to ensure that at OSDs in at least two racks are selected. 
> Since the replica size is only 3, I think I'm still affected by the bug 
> (unless of course I set my replica size to 4).
> 
> Is there a better way I can express what I want in the crush rule, preferably 
> in a way not hit by that bug ;) ? Is there an ETA on when that bugfix might 
> land in firefly?
> 
> Best,
> -Steve
> 
> On Thu, Oct 9, 2014 at 1:59 PM, Loic Dachary wrote:
> 
> Hi Stephen,
> 
> It looks like you're hitting http://tracker.ceph.com/issues/9492 which 
> has been fixed but is not yet available in firefly. The simplest workaround 
> is to min_size 4 in this case.
> 
> Cheers
> 
> On 09/10/2014 19:31, Stephen Jahl wrote:> Hi All,
> >
> > I'm trying to add a crush rule to my map, which looks like this:
> >
> > rule rack_ruleset {
> > ruleset 1
> > type replicated
> > min_size 1
> > max_size 10
> > step take default
> > step choose firstn 2 type rack
> > step chooseleaf firstn 2 type host
> > step emit
> > }
> >
> > I'm not configuring any pools to use the ruleset at this time. When I 
> recompile the map, and test the rule with crushtool --test, everything seems 
> fine, and I'm not noticing anything out of the ordinary.
> >
> > But, when I try to inject the compiled crush map back into the cluster 
> like this:
> >
> > ceph osd setcrushmap -i /path/to/compiled-crush-map
> >
> > The monitor process appears to stop, and I see a monitor election 
> happening. Things hang until I ^C the setcrushmap command, and I need to 
> restart the monitor processes to make things happy again (and the crush map 
> never ends up getting updated).
> >
> > In the monitor logs, I see several segfaults that look like this: 
> http://pastebin.com/K1XqPpbF
> >
> > I'm running ceph 0.80.5-1trusty on Ubuntu 14.04 with kernel 
> 3.13.0-35-generic.
> >
> > Anyone have any ideas as to what is happening?
> >
> > -Steve
> >
> >
> >
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre




Re: [ceph-users] Monitor segfaults when updating the crush map

2014-10-09 Thread Johnu George (johnugeo)

Stephen,
 You are right. A crash can happen if the replica size doesn't match 
the number of OSDs. I am not sure whether any other solution exists for your 
problem: "choose the first 2 replicas from one rack and the third replica from 
any other rack".

Some different thoughts:


1) If you have 3 racks, you can try choosing 3 racks with chooseleaf 1 host, 
ensuring three separate racks and three replicas.


2) Another thought:

step take rack1
step chooseleaf firstn 2 type host
step emit
step take rack2
step chooseleaf firstn 1 type host
step emit

This of course restricts the first 2 replicas to rack1 and may become 
unbalanced. (Ensure there is enough storage in rack1.)

Thanks,
Johnu
From: Stephen Jahl <stephenj...@gmail.com>
Date: Thursday, October 9, 2014 at 11:11 AM
To: Loic Dachary <l...@dachary.org>
Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Monitor segfaults when updating the crush map

Thanks Loic,

In my case, I actually only have three replicas for my pools -- with this rule, 
I'm trying to ensure that at OSDs in at least two racks are selected. Since the 
replica size is only 3, I think I'm still affected by the bug (unless of course 
I set my replica size to 4).

Is there a better way I can express what I want in the crush rule, preferably 
in a way not hit by that bug ;) ? Is there an ETA on when that bugfix might 
land in firefly?

Best,
-Steve

On Thu, Oct 9, 2014 at 1:59 PM, Loic Dachary <l...@dachary.org> wrote:
Hi Stephen,

It looks like you're hitting http://tracker.ceph.com/issues/9492 which has been 
fixed but is not yet available in firefly. The simplest workaround is to 
min_size 4 in this case.

Cheers

On 09/10/2014 19:31, Stephen Jahl wrote:> Hi All,
>
> I'm trying to add a crush rule to my map, which looks like this:
>
> rule rack_ruleset {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take default
> step choose firstn 2 type rack
> step chooseleaf firstn 2 type host
> step emit
> }
>
> I'm not configuring any pools to use the ruleset at this time. When I 
> recompile the map, and test the rule with crushtool --test, everything seems 
> fine, and I'm not noticing anything out of the ordinary.
>
> But, when I try to inject the compiled crush map back into the cluster like 
> this:
>
> ceph osd setcrushmap -i /path/to/compiled-crush-map
>
> The monitor process appears to stop, and I see a monitor election happening. 
> Things hang until I ^C the setcrushmap command, and I need to restart the 
> monitor processes to make things happy again (and the crush map never ends up 
> getting updated).
>
> In the monitor logs, I see several segfaults that look like this: 
> http://pastebin.com/K1XqPpbF
>
> I'm running ceph 0.80.5-1trusty on Ubuntu 14.04 with kernel 3.13.0-35-generic.
>
> Anyone have any ideas as to what is happening?
>
> -Steve
>
>
>

--
Loïc Dachary, Artisan Logiciel Libre



Re: [ceph-users] Monitor segfaults when updating the crush map

2014-10-09 Thread Stephen Jahl

So, I _do_ have three racks, but unfortunately, one of them has fewer OSDs
in it. Weighting takes care of a little bit of that, but I do end up with
an uneven distribution (according to the utilization numbers from crushtool
--test). That is how I ended up going down the "at least two racks" route.

I'll have to play around with various rules and see what works. Adding more
OSDs to the third rack to even things up might be on the roadmap now as
well :)
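
A back-of-the-envelope Python sketch of why the small rack fills up first when a rule puts exactly one replica in each of three racks. The rack capacities and data size are made-up numbers, and the model ignores everything except raw capacity per rack.

```python
def rack_utilization(rack_capacity_tb, data_tb):
    """Per-rack utilization when every rack stores exactly one replica.

    rack_capacity_tb: dict of rack name -> raw capacity in TB (made-up numbers).
    data_tb:          logical data size in TB; each rack holds one full copy.
    """
    return {rack: data_tb / cap for rack, cap in rack_capacity_tb.items()}


if __name__ == "__main__":
    racks = {"rack1": 40.0, "rack2": 40.0, "rack3": 20.0}
    for rack, used in sorted(rack_utilization(racks, data_tb=10.0).items()):
        print("%s: %.0f%% full" % (rack, used * 100))
```

Each rack stores the same amount of data no matter how it is weighted, so a rack with half the capacity ends up twice as full.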

On Thu, Oct 9, 2014 at 2:37 PM, Johnu George (johnugeo) 
wrote:

>  Stephen,
>  You are right. Crash can happen if replica size doesn’t
> match the no of osds.  I am not sure if there exists any other solution for
> your problem " choose first 2 replicas from a rack and choose third replica
> from any other rack different from one”.
>
>  Some different thoughts:
>
>
>  1)If you have 3 racks, you can try for choose 3 racks and chooseleaf 1
> host ensuring three separate racks and three replicas
>
>
>  2)Another thought
>
>  Take rack1
> Chooseleaf firstn 2 type host
> Emit
>  Take rack2
> Chooseleaf firstn 1 type host
> Emit
>
>  This of course restricts first 2 replicas in rack1 and may become
> unbalanced.(Ensure enough storage in rack1)
>
>  Thanks,
> Johnu
>  From: Stephen Jahl 
> Date: Thursday, October 9, 2014 at 11:11 AM
> To: Loic Dachary 
> Cc: "ceph-users@lists.ceph.com" 
> Subject: Re: [ceph-users] Monitor segfaults when updating the crush map
>
>   Thanks Loic,
>
>  In my case, I actually only have three replicas for my pools -- with
> this rule, I'm trying to ensure that at OSDs in at least two racks are
> selected. Since the replica size is only 3, I think I'm still affected by
> the bug (unless of course I set my replica size to 4).
>
>  Is there a better way I can express what I want in the crush rule,
> preferably in a way not hit by that bug ;) ? Is there an ETA on when that
> bugfix might land in firefly?
>
>  Best,
> -Steve
>
> On Thu, Oct 9, 2014 at 1:59 PM, Loic Dachary  wrote:
>
>> Hi Stephen,
>>
>> It looks like you're hitting http://tracker.ceph.com/issues/9492 which
>> has been fixed but is not yet available in firefly. The simplest workaround
>> is to min_size 4 in this case.
>>
>> Cheers
>>
>> On 09/10/2014 19:31, Stephen Jahl wrote:> Hi All,
>>  >
>> > I'm trying to add a crush rule to my map, which looks like this:
>> >
>> > rule rack_ruleset {
>> > ruleset 1
>> > type replicated
>> > min_size 1
>> > max_size 10
>> > step take default
>> > step choose firstn 2 type rack
>> > step chooseleaf firstn 2 type host
>> > step emit
>> > }
>> >
>> > I'm not configuring any pools to use the ruleset at this time. When I
>> recompile the map, and test the rule with crushtool --test, everything
>> seems fine, and I'm not noticing anything out of the ordinary.
>> >
>> > But, when I try to inject the compiled crush map back into the cluster
>> like this:
>> >
>> > ceph osd setcrushmap -i /path/to/compiled-crush-map
>> >
>> > The monitor process appears to stop, and I see a monitor election
>> happening. Things hang until I ^C the setcrushmap command, and I need to
>> restart the monitor processes to make things happy again (and the crush map
>> never ends up getting updated).
>> >
>> > In the monitor logs, I see several segfaults that look like this:
>> http://pastebin.com/K1XqPpbF
>> >
>> > I'm running ceph 0.80.5-1trusty on Ubuntu 14.04 with kernel
>> 3.13.0-35-generic.
>> >
>> > Anyone have any ideas as to what is happening?
>> >
>> > -Steve
>> >
>> >
>>  > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread lakshmi k s

Thanks Mark. I got past this error by running as root. So essentially, I copied the 
certs from the openstack controller node to the gateway node, did the conversion using 
certutil, and copied the files back to the controller node under the /var/lib/ceph/nss 
directory. Is this the correct directory? The Ceph doc says /var/ceph/nss, though. 

But after this, I tried the curl GET command again, in vain. Same old 401 - 
Authorization failure. 

curl -i -X GET http://gateway.ex.com/swift/v1/AUTH_bad9e2232b304f89acb03436635b80cc -H "X-Auth-Token: a510edb22f074946940cd4c07aafcd9d"


HTTP/1.1 401 Unauthorized
Date: Thu, 09 Oct 2014 19:17:31 GMT
Server: Apache/2.4.7 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 12
Content-Type: text/plain; charset=utf-8

AccessDenied


Not much difference in the radosgw logs either. Note that the token used above is the 
same one as in the ceph.conf file. Please help.

[client.radosgw.gateway]
rgw keystone url = http://192.0.8.2:5000
rgw keystone admin token = a510edb22f074946940cd4c07aafcd9d
rgw keystone accepted roles = admim Member _member_ swiftoperator
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = false
nss db path = /var/lib/ceph/nss
debug rgw = 20
host = gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw dns name = gateway
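
One thing that could be ruled out with a section like the one above is a configured role name that does not match any role defined in Keystone. The following is only a minimal sketch under stated assumptions: `unknown_roles` is a helper name made up for this example, `KEYSTONE_ROLES` is a hypothetical list (the real one would have to come from the Keystone server), and the sample text reuses two lines of the section above.

```python
import configparser
import re


def unknown_roles(conf_text, section, known_roles):
    """List configured accepted roles that are not present in known_roles."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    raw = parser[section]["rgw keystone accepted roles"]
    # Accept both whitespace- and comma-separated role lists.
    configured = [r for r in re.split(r"[,\s]+", raw.strip()) if r]
    return [r for r in configured if r not in known_roles]


SAMPLE = """
[client.radosgw.gateway]
rgw keystone url = http://192.0.8.2:5000
rgw keystone accepted roles = admim Member _member_ swiftoperator
"""

# Hypothetical role list; in practice it would come from the Keystone server.
KEYSTONE_ROLES = {"admin", "Member", "_member_", "swiftoperator"}

print(unknown_roles(SAMPLE, "client.radosgw.gateway", KEYSTONE_ROLES))
```

Any name the helper returns is configured in ceph.conf but absent from the assumed Keystone role list, so it is worth a second look.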





On Thursday, October 9, 2014 1:15 AM, Mark Kirkwood 
 wrote:
 


I ran into this - needed to actually be root via sudo -i or similar, 
*then* it worked. The unhelpful error message is, I think, referring to a 
db that has not been initialized.

On 09/10/14 16:36, lakshmi k s wrote:
> Good workaround. But it did not work. Not sure what this error is all
> about now.
>
> gateway@gateway:~$ openssl x509 -in /home/gateway/ca.pem -pubkey |
> certutil -d /var/lib/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
> certutil: function failed: SEC_ERROR_LEGACY_DATABASE: The
> certificate/key database is in an old, unsupported format.
>
>
>
> On Wednesday, October 8, 2014 7:55 PM, Mark Kirkwood
>  wrote:
>
>
> As a workaround check if your rgw host has openssl and certutil
> installed, if so you can copy the relevant unconverted certs over to it
> and convert 'em there.
>
> On 09/10/14 15:07, lakshmi k s wrote:
>  > Tried aptitude as well, but no luck.
>  >
>  > Ceph users, have you tried to install libnss3-tools or certutil tool on
>  > debian/ubuntu? If so, how did you go about this problem.
>  >
>  >
>  > On Wednesday, October 8, 2014 7:01 PM, Mark Kirkwood
>  >  > wrote:

>  >
>  >
>  > Ok, so that is the thing to get sorted. I'd suggest posting the error(s)
>  > you are getting perhaps here (someone else might know), but definitely
>  > to one of the Debian specific lists.
>  >
>  > In the meantime perhaps try installing the packages with aptitude rather
>  > than apt-get - if there is some fancy footwork required it is fairly
>  > smart about what needs to be done.
>  >
>  > Cheers
>  >
>  > Mark
>  >
>  > On 09/10/14 14:38, lakshmi k s wrote:
>  >  > Thanks Mark. I have been trying to install this on controller
> node. But
>  >  > for some reason, I am unable to install certutil or libnss3-tools on
>  >  > debian. I am not sure how to proceed.
>  >  >
>  >
>  >
>  >
>
>

Re: [ceph-users] Ceph RBD map debug: error -22 on auth protocol 2 init

2014-10-09 Thread Christopher Armstrong

So I can successfully map within the container, but when I try to
`mkfs.ext4 -m0 /dev/rbd0` I get:

Oct 09 19:31:03 deis-2 sh[1569]: mke2fs 1.42.9 (4-Feb-2014)
Oct 09 19:31:03 deis-2 sh[1569]: mkfs.ext4: Operation not permitted while trying to determine filesystem size

Once the device is mapped within the container, though, I can successfully
format the volume on the host.


*Chris Armstrong*Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Thu, Oct 9, 2014 at 11:54 AM, Christopher Armstrong 
wrote:

> Adding `-v /dev:/dev` works as expected - after mapping, the device shows
> up as /dev/rbd0. Agreed, though - I thought --privileged should do this.
>
>
> *Chris Armstrong*Head of Services
> OpDemand / Deis.io
>
> GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/
>
>
> On Thu, Oct 9, 2014 at 11:36 AM, Ilya Dryomov 
> wrote:
>
>> On Thu, Oct 9, 2014 at 9:23 PM, Christopher Armstrong
>>  wrote:
>> > Good point. I'll have to play around with it - was just excited to get
>> past
>> > the blocking map issue.
>>
>> This could be a docker bug - my understanding is that all devices have
>> to show up if running with --privileged, which I do on my test box.
>> I'll poke around some more as well.
>>
>> Thanks,
>>
>> Ilya
>>
>
>

Re: [ceph-users] Ceph RBD map debug: error -22 on auth protocol 2 init

2014-10-09 Thread Christopher Armstrong

Turns out we need to explicitly list --privileged in addition to the other
flags. Here's how it runs now:

docker run --name deis-store-volume --rm -e HOST=$COREOS_PRIVATE_IPV4 \
  --net host --privileged -v /dev:/dev -v /sys:/sys -v /data:/data $IMAGE


*Chris Armstrong*Head of Services
OpDemand / Deis.io

GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/


On Thu, Oct 9, 2014 at 1:47 PM, Christopher Armstrong 
wrote:

> So I can successfully map within the container, but when I try to
> `mkfs.ext4 -m0 /dev/rbd0` I get:
>
> Oct 09 19:31:03 deis-2 sh[1569]: mke2fs 1.42.9 (4-Feb-2014)
> Oct 09 19:31:03 deis-2 sh[1569]: mkfs.ext4: Operation not permitted while
> trying to determine filesystem size
>
> Once the device is mapped within the container, though, I can successfully
> format the volume on the host.
>
>
> *Chris Armstrong*Head of Services
> OpDemand / Deis.io
>
> GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/
>
>
> On Thu, Oct 9, 2014 at 11:54 AM, Christopher Armstrong  > wrote:
>
>> Adding `-v /dev:/dev` works as expected - after mapping, the device shows
>> up as /dev/rbd0. Agreed, though - I thought --privileged should do this.
>>
>>
>> *Chris Armstrong*Head of Services
>> OpDemand / Deis.io
>>
>> GitHub: https://github.com/deis/deis -- Docs: http://docs.deis.io/
>>
>>
>> On Thu, Oct 9, 2014 at 11:36 AM, Ilya Dryomov 
>> wrote:
>>
>>> On Thu, Oct 9, 2014 at 9:23 PM, Christopher Armstrong
>>>  wrote:
>>> > Good point. I'll have to play around with it - was just excited to get
>>> past
>>> > the blocking map issue.
>>>
>>> This could be a docker bug - my understanding is that all devices have
>>> to show up if running with --privileged, which I do on my test box.
>>> I'll poke around some more as well.
>>>
>>> Thanks,
>>>
>>> Ilya
>>>
>>
>>
>

[ceph-users] rbd map vsmpool_hp1/rbd9 --id admin -->rbd: add failed: (5) Input/output error

2014-10-09 Thread Aquino, Ben O

Hello Ceph Users:

A bare-metal Ceph client fails to map a volume through the kernel RBD driver; 
the map attempt ends with an I/O error.
This host is a Ceph client only, with no MDS, OSD or MON running. See the I/O error output below.


Client Host Linux Kernel Version :
[root@root ceph]# uname -a
Linux root 3.10.25-11.el6.centos.alt.x86_64 #1 SMP Fri Dec 27 21:44:15 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Ceph Version:
[root@root ceph]# ceph -v
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)

Check Kernel RBD driver:
[root@root ceph]# locate rbd.ko
/lib/modules/3.10.25-11.el6.centos.alt.x86_64/kernel/drivers/block/rbd.ko
/lib/modules/3.10.25-11.el6.centos.alt.x86_64/kernel/drivers/block/drbd/drbd.ko

Check Client to Ceph-Server Connections:
[root@root ceph]# ceph osd lspools
0 data,1 metadata,2 rbd,3 vsmpool_hp1,4 vsmpool_perf1,5 vsmpool_vperf1,6 
openstack_hp1,7 openstack_perf1,8 openstack_vperf1,9 vsmpool_perf2,10 
vsmpool_hp2,11 vsmpool_vperf2,12 testopnstack,13 ec_perf_pool,14 
ec_perf_pool1,15 ec_perf_pool2,16 ec_hiperf_pool1,17 ec_valperf_pool1,

Created RBD:
[root@root ceph]# rbd create rbd9  --size 104800 --pool vsmpool_hp1 --id admin

Check RBD:
[root@root ceph]# rbd ls vsmpool_hp1
rbd1
rbd2
rbd3
rbd4
rbd5
rbd6
rbd7
rbd8
rbd9

Display RBD INFO:
[root@root ceph]# rbd info vsmpool_hp1/rbd9
rbd image 'rbd9':
size 102 GB in 26200 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.227915.238e1f29
format: 1
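
As a sanity check on the `rbd info` output above, the object count and the reported size follow from the `--size` value and the object order. This is a small sketch, assuming `--size` is interpreted in megabytes (MiB); `rbd_layout` is a name made up for this example.

```python
def rbd_layout(size_mb: int, order: int):
    """Return (object size in KiB, object count, size in whole GiB) for an image."""
    object_bytes = 1 << order                  # order 22 -> 4 MiB objects
    size_bytes = size_mb * 1024 * 1024         # assumes --size is given in MiB
    objects = -(-size_bytes // object_bytes)   # ceiling division
    return object_bytes // 1024, objects, size_bytes // (1024 ** 3)


print(rbd_layout(104800, 22))  # -> (4096, 26200, 102)
```

The numbers agree with the image metadata, so the image itself was created as requested and the failure is specific to the map step.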

Map RBD:
[root@root ceph]# rbd map vsmpool_hp1/rbd9 --id admin
rbd: add failed: (5) Input/output error


Thank you in advance for any possible solution to this error.

Regards,
-Ben

Re: [ceph-users] Regarding Primary affinity configuration

2014-10-09 Thread Gregory Farnum

On Thu, Oct 9, 2014 at 10:55 AM, Johnu George (johnugeo)
 wrote:
> Hi All,
>   I have few questions regarding the Primary affinity.  In the
> original blueprint
> (https://wiki.ceph.com/Planning/Blueprints/Firefly/osdmap%3A_primary_role_affinity
> ), one example has been given.
>
> For PG x, CRUSH returns [a, b, c]
> If a has primary_affinity of .5, b and c have 1 , with 50% probability, we
> will choose b or c instead of a. (25% for b, 25% for c)
>
> A) I was browsing through the code, but I could not find this logic of
> splitting the rest of configured primary affinity value between other osds.
> How is this handled?
>
> if (a < CEPH_OSD_MAX_PRIMARY_AFFINITY &&
> (crush_hash32_2(CRUSH_HASH_RJENKINS1,
> seed, o) >> 16) >= a) {
>   // we chose not to use this primary.  note it anyway as a
>   // fallback in case we don't pick anyone else, but keep looking.
>   if (pos < 0)
> pos = i;
> } else {
>   pos = i;
>   break;
> }
>   }

It's a fallback mechanism: if the chosen primary for a PG has a primary
affinity less than the default (max), we (probabilistically) look for
a different OSD to be the primary. We decide whether to offload by
running a hash and discarding the OSD if the output value is greater
than or equal to the OSD's affinity, and we go through the list and run
that calculation in order (obviously, if the affinity is 1, the OSD passes
without needing to run the hash).
If no OSD in the list passes that check, we take the
originally-chosen primary.
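
To make the walk through the list concrete, here is a minimal Python sketch of the loop quoted above. It is an illustration under stated assumptions, not the actual Ceph code: `stand_in_hash` is a hypothetical replacement for `crush_hash32_2` (any uniform 32-bit hash serves the purpose here), `MAX_AFFINITY = 0x10000` is assumed to be the fixed-point value for an affinity of 1, and `choose_primary` and `primary_shares` are names made up for this example.

```python
import hashlib

MAX_AFFINITY = 0x10000  # assumed fixed-point value for an affinity of 1.0


def stand_in_hash(seed: int, osd: int) -> int:
    """Hypothetical stand-in for crush_hash32_2: a uniform 32-bit hash."""
    digest = hashlib.sha256(f"{seed}:{osd}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def choose_primary(osds, affinity, seed):
    """Walk the PG's OSD list in order and return the OSD picked as primary."""
    pos = -1
    for i, osd in enumerate(osds):
        a = affinity[osd]
        if a < MAX_AFFINITY and (stand_in_hash(seed, osd) >> 16) >= a:
            # Rejected: remember the first rejected OSD as the fallback.
            if pos < 0:
                pos = i
        else:
            # Accepted: this OSD becomes primary, stop looking.
            pos = i
            break
    return osds[pos] if pos >= 0 else None


def primary_shares(osds, affinity, trials=20000):
    """Fraction of seeds (standing in for PGs) in which each OSD is primary."""
    counts = {osd: 0 for osd in osds}
    for seed in range(trials):
        counts[choose_primary(osds, affinity, seed)] += 1
    return {osd: n / trials for osd, n in counts.items()}


if __name__ == "__main__":
    print(primary_shares([0, 1, 2], {0: 0x8000, 1: 0x10000, 2: 0x10000}))
```

In this sketch, with affinities [0.5, 1, 1] and a fixed ordering [a, b, c], a stays primary for roughly half of the seeds and b takes the rest, because b is the first OSD after a that passes the check; c would only pick up a share in PGs where it is listed ahead of b.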

> B) Since, primary affinity value is configured independently, there can be a
> situation with [0.1,0.1,0.1]  with total value that don’t add to 1.  How is
> this taken care of?

These primary affinity values are just compared against the hash
output I mentioned, so the sum doesn't matter. In general we simply
expect that OSDs which don't have the max weight value will be chosen
as primary in proportion to their share of the total weight of their
PG membership (ie, if they have a weight of .5 and everybody else has
weight 1, they will be primary in half the normal number of PGs. If
everybody has a weight of .5, they will be primary in the normal
proportions. Etc).

>
> C) Slightly confused. What happens for a situation with [1,0.5,1] ? Is osd.0
> always returned?

If the first OSD in the PG list has primary affinity of 1 then it is
always the primary for that PG, yes. That's not osd.0, though; just
the first OSD in the PG list. ;)

> D) After calculating primary based on the affinity values, I see a shift of
> osds so that primary comes to the front. Why is this needed?. I thought,
> primary affinity value affects only reads and hence, osd ordering need not
> be changed.

Primary affinity impacts which OSD is chosen to be primary; the
primary is the ordering point for *all* access to the PG. That
includes writes as well as reads, plus coordination of the cluster on
map changes. We move the primary to the front of the list...well, I
think it's just because we were lazy and there are a bunch of places
that assume the first OSD in a replicated pool is the primary.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread Mark Kirkwood

Almost - the converted certs need to be saved on your *rgw* host in 
nss_db_path (default is /var/ceph/nss but wherever you have it 
configured should be ok). Then restart the gateway.


What is happening is that the rgw needs these certs to speak with 
encryption to the keystone server (the latter does not need anything 
changed, as it is already using encryption).


Regards

Mark

On 10/10/14 08:31, lakshmi k s wrote:

Thanks Mark. I got past this error being root. So essentially, I copied
the certs from openstack controller node to gateway node. Did the
conversion using certutil and copied the files back to controller node
under /var/lib/ceph/nss directory. Is this the correct directory? Ceph
doc says /var/ceph/nss though.

But after this, I tried to use curl GET command, but in vain.Same old
401 - Authorization failure.

curl -i -X GET
http://gateway.ex.com/swift/v1/AUTH_bad9e2232b304f89acb03436635b80cc -H
"X-Auth-
Token: a510edb22f074946940cd4c07aafcd9d"

HTTP/1.1 401 Unauthorized
Date: Thu, 09 Oct 2014 19:17:31 GMT
Server: Apache/2.4.7 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 12
Content-Type: text/plain; charset=utf-8
AccessDeniedroot

Not much difference in radosgw logs too. Note that the token used above
is same one in ceph.conf file too. Please help.

[client.radosgw.gateway]
rgw keystone url = http://192.0.8.2:5000
rgw keystone admin token = a510edb22f074946940cd4c07aafcd9d
rgw keystone accepted roles = admim Member _member_ swiftoperator
rgw keystone token cache size = 500
rgw keystone revocation interval = 500
rgw s3 auth use keystone = false
nss db path = /var/lib/ceph/nss
debug rgw = 20
host = gateway
keyring = /etc/ceph/ceph.client.radosgw.keyring
rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
log file = /var/log/ceph/client.radosgw.gateway.log
rgw dns name = gateway





On Thursday, October 9, 2014 1:15 AM, Mark Kirkwood
 wrote:


I ran into this - needed to actually be root via sudo -i or similar,
*then* it worked. Unhelpful error message is I think referring to no
intialized db.

On 09/10/14 16:36, lakshmi k s wrote:
 > Good workaround. But it did not work. Not sure what this error is all
 > about now.
 >
 > gateway@gateway :~$ openssl x509 -in
/home/gateway/ca.pem -pubkey |
 > certutil -d /var/lib/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
 > certutil: function failed: SEC_ERROR_LEGACY_DATABASE: The
 > certificate/key database is in an old, unsupported format.
 >
 >
 >
 > On Wednesday, October 8, 2014 7:55 PM, Mark Kirkwood
 > mailto:mark.kirkw...@catalyst.net.nz>> wrote:
 >
 >
 > As a workaround check if your rgw host has openssl and certutil
 > installed, if so you can copy the relevant unconverted certs over to it
 > and convert 'em there.
 >
 > On 09/10/14 15:07, lakshmi k s wrote:
 >  > Tried aptitude as well, but no luck.
 >  >
 >  > Ceph users, have you tried to install libnss3-tools or certutil
tool on
 >  > debian/ubuntu? If so, how did you go about this problem.
 >  >
 >  >
 >  > On Wednesday, October 8, 2014 7:01 PM, Mark Kirkwood
 >  > mailto:mark.kirkw...@catalyst.net.nz>
 > >> wrote:

 >  >
 >  >
 >  > Ok, so that is the thing to get sorted. I'd suggest posting the
error(s)
 >  > you are getting perhaps here (someone else might know), but definitely
 >  > to one of the Debian specific lists.
 >  >
 >  > In the meantime perhaps try installing the packages with aptitude
rather
 >  > than apt-get - if there is some fancy footwork required it is fairly
 >  > smart about what needs to be done.
 >  >
 >  > Cheers
 >  >
 >  > Mark
 >  >
 >  > On 09/10/14 14:38, lakshmi k s wrote:
 >  >  > Thanks Mark. I have been trying to install this on controller
 > node. But
 >  >  > for some reason, I am unable to install certutil or
libnss3-tools on
 >  >  > debian. I am not sure how to proceed.
 >  >  >
 >  >
 >  >
 >  >
 >
 >
 >






Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread lakshmi k s

Right, I have these certs on both nodes - keystone node and rgw gateway node. 
Not sure where I am going wrong. And what about SSL? Should the following be in 
rgw.conf in gateway node? I am not using this as it was optional.

SSLEngine on
SSLCertificateFile /etc/apache2/ssl/apache.crt
SSLCertificateKeyFile /etc/apache2/ssl/apache.key
SetEnv SERVER_PORT_SECURE 443




On Thursday, October 9, 2014 2:48 PM, Mark Kirkwood 
 wrote:
 


Almost - the converted certs need to be saved on your *rgw* host in 
nss_db_path (default is /var/ceph/nss but wherever you have it 
configured should be ok). Then restart the gateway.

What is happening is the the rgw needs these certs to speak with 
encryption to the keystone server (the latter does not need anything 
changed, as it is already using encryption).

Regards

Mark

On 10/10/14 08:31, lakshmi k s wrote:
> Thanks Mark. I got past this error being root. So essentially, I copied
> the certs from openstack controller node to gateway node. Did the
> conversion using certutil and copied the files back to controller node
> under /var/lib/ceph/nss directory. Is this the correct directory? Ceph
> doc says /var/ceph/nss though.
>
> But after this, I tried to use curl GET command, but in vain.Same old
> 401 - Authorization failure.
>
> curl -i -X GET
> http://gateway.ex.com/swift/v1/AUTH_bad9e2232b304f89acb03436635b80cc -H
> "X-Auth-
> Token: a510edb22f074946940cd4c07aafcd9d"
>
> HTTP/1.1 401 Unauthorized
> Date: Thu, 09 Oct 2014 19:17:31 GMT
> Server: Apache/2.4.7 (Ubuntu)
> Accept-Ranges: bytes
> Content-Length: 12
> Content-Type: text/plain; charset=utf-8
> AccessDeniedroot
>
> Not much difference in radosgw logs too. Note that the token used above
> is same one in ceph.conf file too. Please help.
>
> [client.radosgw.gateway]
> rgw keystone url = http://192.0.8.2:5000
> rgw keystone admin token = a510edb22f074946940cd4c07aafcd9d
> rgw keystone accepted roles = admim Member _member_ swiftoperator
> rgw keystone token cache size = 500
> rgw keystone revocation interval = 500
> rgw s3 auth use keystone = false
> nss db path = /var/lib/ceph/nss
> debug rgw = 20
> host = gateway
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
> log file = /var/log/ceph/client.radosgw.gateway.log
> rgw dns name = gateway
>
>
>
>
>
> On Thursday, October 9, 2014 1:15 AM, Mark Kirkwood
>  wrote:
>
>
> I ran into this - needed to actually be root via sudo -i or similar,
> *then* it worked. Unhelpful error message is I think referring to no
> intialized db.
>
> On 09/10/14 16:36, lakshmi k s wrote:
>  > Good workaround. But it did not work. Not sure what this error is all
>  > about now.
>  >
>  > gateway@gateway :~$ openssl x509 -in
> /home/gateway/ca.pem -pubkey |
>  > certutil -d /var/lib/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
>  > certutil: function failed: SEC_ERROR_LEGACY_DATABASE: The
>  > certificate/key database is in an old, unsupported format.
>  >
>  >
>  >
>  > On Wednesday, October 8, 2014 7:55 PM, Mark Kirkwood
>  >  > wrote:
>  >
>  >
>  > As a workaround check if your rgw host has openssl and certutil
>  > installed, if so you can copy the relevant unconverted certs over to it
>  > and convert 'em there.
>  >
>  > On 09/10/14 15:07, lakshmi k s wrote:
>  >  > Tried aptitude as well, but no luck.
>  >  >
>  >  > Ceph users, have you tried to install libnss3-tools or certutil
> tool on
>  >  > debian/ubuntu? If so, how did you go about this problem.
>  >  >
>  >  >
>  >  > On Wednesday, October 8, 2014 7:01 PM, Mark Kirkwood
>  >  > mailto:mark.kirkw...@catalyst.net.nz>
>  >  >> wrote:
>
>  >  >
>  >  >
>  >  > Ok, so that is the thing to get sorted. I'd suggest posting the
> error(s)
>  >  > you are getting perhaps here (someone else might know), but definitely
>  >  > to one of the Debian specific lists.
>  >  >
>  >  > In the meantime perhaps try installing the packages with aptitude
> rather
>  >  > than apt-get - if there is some fancy footwork required it is fairly
>  >  > smart about what needs to be done.
>  >  >
>  >  > Cheers
>  >  >
>  >  > Mark
>  >  >
>  >  > On 09/10/14 14:38, lakshmi k s wrote:
>  >  >  > Thanks Mark. I have been trying to install this on controller
>  > node. But
>  >  >  > for some reason, I am unable to install certutil or
> libnss3-tools on
>  >  >  > debian. I am not sure how to proceed.
>  >  >  >
>  >  >
>  >  >
>  >  >
>  >
>  >
>  >
>
>

Re: [ceph-users] Rados Gateway and Swift create containers/buckets that cannot be opened

2014-10-09 Thread Mark Kirkwood

That certainly fixes the issue for me. Removing the WSGIChunkedRequest 
On directive from my keystone config and restarting brought back the 
original error. Installing a new patched radosgw binary and restarting 
got back a working swift.


Cheers

Mark

On 10/10/14 07:19, Yehuda Sadeh wrote:

Here's the fix, let me know if you need any help with that.

Thanks,
Yehuda

diff --git a/src/rgw/rgw_swift.cc b/src/rgw/rgw_swift.cc
index d9654a7..2445e17 100644
--- a/src/rgw/rgw_swift.cc
+++ b/src/rgw/rgw_swift.cc
@@ -505,6 +505,8 @@ int RGWSwift::validate_keystone_token(RGWRados *store, const string& token, stru
 
   validate.append_header("X-Auth-Token", admin_token);
 
+  validate.set_send_length(0);
+
   int ret = validate.process(url.c_str());
   if (ret < 0)
     return ret;



On Thu, Oct 9, 2014 at 10:30 AM, M Ranga Swami Reddy
 wrote:

Hi Yehuda,
Please share the fix/patch, we could test and confirm the fix status.

Thanks
Swami

On Thu, Oct 9, 2014 at 10:42 PM, Yehuda Sadeh  wrote:

I have a trivial fix for the issue that I'd like to check and get this
one cleared, but never got to it due to some difficulties with a
proper keystone setup in my environment. If you can and would like to
test it so that we could get it merged it would be great.

Thanks,
Yehuda

On Wed, Oct 8, 2014 at 6:18 PM, Mark Kirkwood
 wrote:

Yes. I ran into that as well - I used

WSGIChunkedRequest On

in the virtualhost config for the *keystone* server [1] as indicated in
issue 7796.

Cheers

Mark

[1] i.e, not the rgw.

On 08/10/14 22:58, Ashish Chandra wrote:


Hi Mark,
Good you got the solution. But since you have already done
authenticating RadosGW with Keystone, I am having one issue that you can
help with. For me I get an error "411 Length Required" with Keystone
token authentication.
To fix this I use "WSGIChunkedRequest On" in rgw.conf as mentioned in
http://tracker.ceph.com/issues/7796.

Did you face the issue, if yes what was your solution.






Re: [ceph-users] Rados Gateway and Swift create containers/buckets that cannot be opened

2014-10-09 Thread Yehuda Sadeh

Great, I'll prepare it upstream.

Thanks,
Yehuda

On Thu, Oct 9, 2014 at 3:39 PM, Mark Kirkwood
 wrote:
> That certainly fixes the issue for me. Removing the WSGIChunkedRequest On
> directive from my keystone config and restarting brought back the original
> error. Installing a new patched radosgw binary and restarting got back a
> working swift.
>
> Cheers
>
> Mark
>
>
> On 10/10/14 07:19, Yehuda Sadeh wrote:
>>
>> Here's the fix, let me know if you need any help with that.
>>
>> Thanks,
>> Yehuda
>>
>> diff --git a/src/rgw/rgw_swift.cc b/src/rgw/rgw_swift.cc
>> index d9654a7..2445e17 100644
>> --- a/src/rgw/rgw_swift.cc
>> +++ b/src/rgw/rgw_swift.cc
>> @@ -505,6 +505,8 @@ int RGWSwift::validate_keystone_token(RGWRados
>> *store, const string& token, stru
>>
>>   validate.append_header("X-Auth-Token", admin_token);
>>
>> +validate.set_send_length(0);
>> +
>>   int ret = validate.process(url.c_str());
>>   if (ret < 0)
>> return ret;
>>
>>
>>
>> On Thu, Oct 9, 2014 at 10:30 AM, M Ranga Swami Reddy
>>  wrote:
>>>
>>> Hi Yehuda,
>>> Please share the fix/patch, we could test and confirm the fix status.
>>>
>>> Thanks
>>> Swami
>>>
>>> On Thu, Oct 9, 2014 at 10:42 PM, Yehuda Sadeh  wrote:

>>>> I have a trivial fix for the issue that I'd like to check and get this
>>>> one cleared, but never got to it due to some difficulties with a
>>>> proper keystone setup in my environment. If you can and would like to
>>>> test it so that we could get it merged it would be great.
>>>>
>>>> Thanks,
>>>> Yehuda
>>>>
>>>> On Wed, Oct 8, 2014 at 6:18 PM, Mark Kirkwood
>>>>  wrote:
>>>>>
>>>>> Yes. I ran into that as well - I used
>>>>>
>>>>> WSGIChunkedRequest On
>>>>>
>>>>> in the virtualhost config for the *keystone* server [1] as indicated in
>>>>> issue 7796.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Mark
>>>>>
>>>>> [1] i.e., not the rgw.
>>>>>
>>>>> On 08/10/14 22:58, Ashish Chandra wrote:
>>>>>>
>>>>>>
>>>>>> Hi Mark,
>>>>>> Good you got the solution. But since you have already done
>>>>>> authenticating RadosGW with Keystone, I am having one issue that you
>>>>>> can
>>>>>> help with. For me I get an error "411 Length Required" with Keystone
>>>>>> token authentication.
>>>>>> To fix this I use "WSGIChunkedRequest On" in rgw.conf as mentioned in
>>>>>> http://tracker.ceph.com/issues/7796.
>>>>>>
>>>>>> Did you face the issue, if yes what was your solution.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> ___
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>

Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread Mark Kirkwood


No, I don't have any explicit ssl enabled in the rgw site.

Now you might be running into http://tracker.ceph.com/issues/7796, so 
check whether you have enabled


WSGIChunkedRequest On

in your keystone virtualhost setup (explained in the issue).

Cheers

Mark


On 10/10/14 11:03, lakshmi k s wrote:

Right, I have these certs on both nodes - keystone node and rgw gateway
node. Not sure where I am going wrong. And what about SSL? Should the
following be in rgw.conf in gateway node? I am not using this as it was
optional.

SSLEngine on
SSLCertificateFile /etc/apache2/ssl/apache.crt
SSLCertificateKeyFile /etc/apache2/ssl/apache.key
SetEnv SERVER_PORT_SECURE 443





On Thursday, October 9, 2014 2:48 PM, Mark Kirkwood
 wrote:


Almost - the converted certs need to be saved on your *rgw* host in
nss_db_path (default is /var/ceph/nss but wherever you have it
configured should be ok). Then restart the gateway.

What is happening is the the rgw needs these certs to speak with
encryption to the keystone server (the latter does not need anything
changed, as it is already using encryption).

Regards

Mark

On 10/10/14 08:31, lakshmi k s wrote:
 > Thanks Mark. I got past this error being root. So essentially, I copied
 > the certs from openstack controller node to gateway node. Did the
 > conversion using certutil and copied the files back to controller node
 > under /var/lib/ceph/nss directory. Is this the correct directory? Ceph
 > doc says /var/ceph/nss though.
 >
 > But after this, I tried to use curl GET command, but in vain.Same old
 > 401 - Authorization failure.
 >
 > curl -i -X GET
 > http://gateway.ex.com/swift/v1/AUTH_bad9e2232b304f89acb03436635b80cc
-H
 > "X-Auth-
 > Token: a510edb22f074946940cd4c07aafcd9d"
 >
 > HTTP/1.1 401 Unauthorized
 > Date: Thu, 09 Oct 2014 19:17:31 GMT
 > Server: Apache/2.4.7 (Ubuntu)
 > Accept-Ranges: bytes
 > Content-Length: 12
 > Content-Type: text/plain; charset=utf-8
 > AccessDeniedroot
 >
 > Not much difference in radosgw logs too. Note that the token used above
 > is same one in ceph.conf file too. Please help.
 >
 > [client.radosgw.gateway]
 > rgw keystone url = http://192.0.8.2:5000 
 > rgw keystone admin token = a510edb22f074946940cd4c07aafcd9d
 > rgw keystone accepted roles = admim Member _member_ swiftoperator
 > rgw keystone token cache size = 500
 > rgw keystone revocation interval = 500
 > rgw s3 auth use keystone = false
 > nss db path = /var/lib/ceph/nss
 > debug rgw = 20
 > host = gateway
 > keyring = /etc/ceph/ceph.client.radosgw.keyring
 > rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
 > log file = /var/log/ceph/client.radosgw.gateway.log
 > rgw dns name = gateway
 >
 >
 >
 >
 >
 > On Thursday, October 9, 2014 1:15 AM, Mark Kirkwood
 > mailto:mark.kirkw...@catalyst.net.nz>> wrote:
 >
 >
 > I ran into this - needed to actually be root via sudo -i or similar,
 > *then* it worked. Unhelpful error message is I think referring to no
 > intialized db.
 >
 > On 09/10/14 16:36, lakshmi k s wrote:
 >  > Good workaround. But it did not work. Not sure what this error is all
 >  > about now.
 >  >
 >  > gateway@gateway  >:~$ openssl x509 -in
 > /home/gateway/ca.pem -pubkey |
 >  > certutil -d /var/lib/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
 >  > certutil: function failed: SEC_ERROR_LEGACY_DATABASE: The
 >  > certificate/key database is in an old, unsupported format.
 >  >
 >  >
 >  >
 >  > On Wednesday, October 8, 2014 7:55 PM, Mark Kirkwood
 >  > mailto:mark.kirkw...@catalyst.net.nz>
 > >> wrote:
 >  >
 >  >
 >  > As a workaround check if your rgw host has openssl and certutil
 >  > installed, if so you can copy the relevant unconverted certs over
to it
 > > and convert 'em there.
 >  >
 >  > On 09/10/14 15:07, lakshmi k s wrote:
 >  >  > Tried aptitude as well, but no luck.
 >  >  >
 >  >  > Ceph users, have you tried to install libnss3-tools or certutil
 > tool on
 >  >  > debian/ubuntu? If so, how did you go about this problem.
 >  >  >
 >  >  >
 >  >  > On Wednesday, October 8, 2014 7:01 PM, Mark Kirkwood
 >  >  > wrote:
 >  >  >
 >  >  >
 >  >  > Ok, so that is the thing to get sorted. I'd suggest posting the
 > error(s)
 >  >  > you are getting perhaps here (someone else might know), but
definitely
 >  >  > to one of the Debian specific lists.
 >  >  >
 >  >  > In the meantime perhaps try installing the packages with aptitude
 > rather
 >  >  > than apt-get - if there is some fancy footwork required it is
fairly
 >  >  > smart about wha

[ceph-users] Blueprints

2014-10-09 Thread Robert LeBlanc

I have a question regarding submitting blueprints. Should only people who
intend to do the work of adding/changing features of Ceph submit
blueprints? I'm not primarily a programmer (but can do programming if
needed), but have a feature request for Ceph.

Thanks,
Robert LeBlanc

Re: [ceph-users] Regarding Primary affinity configuration

2014-10-09 Thread Johnu George (johnugeo)

Hi Greg,
 Thanks for your extremely informative post. My related questions
are posted inline

On 10/9/14, 2:21 PM, "Gregory Farnum"  wrote:

>On Thu, Oct 9, 2014 at 10:55 AM, Johnu George (johnugeo)
> wrote:
>> Hi All,
>>   I have few questions regarding the Primary affinity.  In the
>> original blueprint
>> 
>>(https://wiki.ceph.com/Planning/Blueprints/Firefly/osdmap%3A_primary_role
>>_affinity
>> ), one example has been given.
>>
>> For PG x, CRUSH returns [a, b, c]
>> If a has primary_affinity of .5, b and c have 1 , with 50% probability,
>>we
>> will choose b or c instead of a. (25% for b, 25% for c)
>>
>> A) I was browsing through the code, but I could not find this logic of
>> splitting the rest of configured primary affinity value between other
>>osds.
>> How is this handled?
>>
>> if (a < CEPH_OSD_MAX_PRIMARY_AFFINITY &&
>> (crush_hash32_2(CRUSH_HASH_RJENKINS1,
>> seed, o) >> 16) >= a) {
>>   // we chose not to use this primary.  note it anyway as a
>>   // fallback in case we don't pick anyone else, but keep looking.
>>   if (pos < 0)
>> pos = i;
>> } else {
>>   pos = i;
>>   break;
>> }
>>   }
>
>It's a fallback mechanism - if the chosen primary for a PG has primary
>affinity less than the default (max), we (probabilistically) look for
>a different OSD to be the primary. We decide whether to offload by
>running a hash and discarding the OSD if the output value is greater
>than the OSDs affinity, and then we go through the list and run that
>calculation in order (obviously if the affinity is 1, then it passes
>without needing to run the hash).
>If no OSD in the list has a high enough hash value, we take the
>originally-chosen primary.
 As in the example for [0.5,1,1], I got your point that with 50% probability
the first osd will be chosen. But how do we ensure that the second and third
osds get the remaining 25% and 25% respectively? I could see only
individual primary affinity values, but not a sum value anywhere to ensure
that.

>
>> B) Since, primary affinity value is configured independently, there can
>>be a
>> situation with [0.1,0.1,0.1]  with total value that don't add to 1.
>>How is
>> this taken care of?
>
>These primary affinity values are just compared against the hash
>output I mentioned, so the sum doesn't matter. In general we simply
>expect that OSDs which don't have the max weight value will be chosen
>as primary in proportion to their share of the total weight of their
>PG membership (ie, if they have a weight of .5 and everybody else has
>weight 1, they will be primary in half the normal number of PGs. If
>everybody has a weight of .5, they will be primary in the normal
>proportions. Etc).

I got your idea but I couldn't figure out that from the code. You said
that max weight value will be chosen as primary in proportion to their
share of the total weight of their
PG membership. But, from what I understood from code, if it is
[0.1,0.1,0.1], first osd will be chosen always. (Probabilistically for 10%
reads, it will choose first osd. However,first osd will still be chosen
for rest of the reads as part of fallback mechanism which is the
originally chosen primary.) Am I wrong?

>
>>
>> C) Slightly confused. What happens for a situation with [1,0.5,1] ? Is
>>osd.0
>> always returned?
>
>If the first OSD in the PG list has primary affinity of 1 then it is
>always the primary for that OSD, yes. That's not osd.0, though; just
>the first OSD in the PG list. ;)

Sorry. I meant the first OSD, but accidentally wrote as osd.0 . As you
said, if first osd is always selected in the PG list for this scenario,
doesn't it violate our assumption to have probabilistically  25%, 50%, 25%
reads for first ,second and third osd respectively?
>
>> D) After calculating primary based on the affinity values, I see a
>>shift of
>> osds so that primary comes to the front. Why is this needed?. I thought,
>> primary affinity value affects only reads and hence, osd ordering need
>>not
>> be changed.
>
>Primary affinity impacts which OSD is chosen to be primary; the
>primary is the ordering point for *all* access to the PG. That
>includes writes as well as reads, plus coordination of the cluster on
>map changes. We move the primary to the front of the list...well, I
>think it's just because we were lazy and there are a bunch of places
>that assume the first OSD in a replicated pool is the primary.

Does that mean that the osd set ordering keeps changing (in real time) for
various object reads in a pg if primary affinity is configured? Whenever
an osd set is returned from pg_to_up_acting_osds, can we always say that the
first osd is the current primary for reads and writes? Is it the same
for the osd set returned by ceph pg dump? However, I am surprised that the
ordering remains the same when I dump values at different times.

Thanks,
Johnu 

>-Greg
>Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread lakshmi k s

Have done this too, but in vain. I made changes to Horizon.conf as shown below. 
I had only I do not see the user being validated in radosgw log at all. 

root@overcloud-controller0-fjvtpqjip2hl:/etc/apache2/sites-available# ls
000-default.conf  default-ssl.conf  horizon.conf




WSGIScriptAlias / /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/wsgi/django.wsgi
WSGIDaemonProcess horizon user=horizon group=horizon processes=3 threads=10 home=/opt/stack/venvs/horizon python-path=/opt/stack/venvs/horizon:/opt/stack/venvs/horizon/lib/python2.7/site-packages/
WSGIApplicationGroup %{GLOBAL}

SetEnv APACHE_RUN_USER horizon
SetEnv APACHE_RUN_GROUP horizon
WSGIProcessGroup horizon
WSGIChunkedRequest On

DocumentRoot /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
Alias /static /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
Alias /media /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static


Options FollowSymLinks
AllowOverride None



Options Indexes FollowSymLinks MultiViews
Require all granted
AllowOverride None
Order allow,deny
allow from all



Options Indexes FollowSymLinks MultiViews
Require all granted
AllowOverride None
Order allow,deny
allow from all


ErrorLog /var/log/httpd/horizon_error.log
LogLevel debug
CustomLog /var/log/httpd/horizon_access.log combined


WSGISocketPrefix /var/run/httpd

--




On Thursday, October 9, 2014 3:51 PM, Mark Kirkwood 
 wrote:
 


No, I don't have any explicit ssl enabled in the rgw site.

Now you might be running into http://tracker.ceph.com/issues/7796 . So 
check if you have enabled

WSGIChunkedRequest On

In your keystone virtualhost setup (explained in the issue).

Cheers

Mark


On 10/10/14 11:03, lakshmi k s wrote:
> Right, I have these certs on both nodes - keystone node and rgw gateway
> node. Not sure where I am going wrong. And what about SSL? Should the
> following be in rgw.conf in gateway node? I am not using this as it was
> optional.
>
> SSLEngine on
> SSLCertificateFile /etc/apache2/ssl/apache.crt
> SSLCertificateKeyFile /etc/apache2/ssl/apache.key
> SetEnv SERVER_PORT_SECURE 443
>
>
>
>
>
> On Thursday, October 9, 2014 2:48 PM, Mark Kirkwood
>  wrote:
>
>
> Almost - the converted certs need to be saved on your *rgw* host in
> nss_db_path (default is /var/ceph/nss but wherever you have it
> configured should be ok). Then restart the gateway.
>
> What is happening is that the rgw needs these certs to speak with
> encryption to the keystone server (the latter does not need anything
> changed, as it is already using encryption).
>
> Regards
>
> Mark
>
> On 10/10/14 08:31, lakshmi k s wrote:
>  > Thanks Mark. I got past this error being root. So essentially, I copied
>  > the certs from openstack controller node to gateway node. Did the
>  > conversion using certutil and copied the files back to controller node
>  > under /var/lib/ceph/nss directory. Is this the correct directory? Ceph
>  > doc says /var/ceph/nss though.
>  >
>  > But after this, I tried to use curl GET command, but in vain.Same old
>  > 401 - Authorization failure.
>  >
>  > curl -i -X GET
>  > http://gateway.ex.com/swift/v1/AUTH_bad9e2232b304f89acb03436635b80cc
> -H
>  > "X-Auth-
>  > Token: a510edb22f074946940cd4c07aafcd9d"
>  >
>  > HTTP/1.1 401 Unauthorized
>  > Date: Thu, 09 Oct 2014 19:17:31 GMT
>  > Server: Apache/2.4.7 (Ubuntu)
>  > Accept-Ranges: bytes
>  > Content-Length: 12
>  > Content-Type: text/plain; charset=utf-8
>  > AccessDeniedroot
>  >
>  > Not much difference in radosgw logs too. Note that the token used above
>  > is same one in ceph.conf file too. Please help.
>  >
>  > [client.radosgw.gateway]
>  > rgw keystone url = http://192.0.8.2:5000 
>  > rgw keystone admin token = a510edb22f074946940cd4c07aafcd9d
>  > rgw keystone accepted roles = admim Member _member_ swiftoperator
>  > rgw keystone token cache size = 500
>  > rgw keystone revocation interval = 500
>  > rgw s3 auth use keystone = false
>  > nss db path = /var/lib/ceph/nss
>  > debug rgw = 20
>  > host = gateway
>  > keyring = /etc/ceph/ceph.client.radosgw.keyring
>  > rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
>  > log file = /var/log/ceph/client.radosgw.gateway.log
>  > rgw dns name = gateway
>  >
>  >
>  >
>  >
>  >
>  > On Thursday, October 9, 2014 1:15 AM, Mark Kirkwood
>  >  > wrote:
>  >
>  >
>  > I ran into this - needed to actually be root via sudo -i or similar,
>  > *then* it worked. Unhelpful error message is I think referring to no
>  > initialized db.
>  >
>  > On 09/10/14 16:

Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread Mark Kirkwood

Hmm - It looks to me like you added the chunked request into Horizon 
instead of Keystone. You want virtual host *:35357



On 10/10/14 12:32, lakshmi k s wrote:

Have done this too, but in vain. I made changes to Horizon.conf as shown
below. I had only I do not see the user being validated in radosgw log
at all.

root@overcloud-controller0-fjvtpqjip2hl:/etc/apache2/sites-available# ls
000-default.conf  default-ssl.conf  horizon.conf



 WSGIScriptAlias /
/opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/wsgi/django.wsgi
 WSGIDaemonProcess horizon user=horizon group=horizon processes=3
threads=10 home=/opt/stack/venvs/horizon
python-path=/opt/stack/venvs/horizon:/opt/stack/venvs/horizon/lib/python2.7/site-packages/
 WSGIApplicationGroup %{GLOBAL}

 SetEnv APACHE_RUN_USER horizon
 SetEnv APACHE_RUN_GROUP horizon
 WSGIProcessGroup horizon
   WSGIChunkedRequest On

 DocumentRoot
/opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
 Alias /static
/opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
 Alias /media
/opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static

 
 Options FollowSymLinks
 AllowOverride None
 

 
 Options Indexes FollowSymLinks MultiViews
 Require all granted
 AllowOverride None
 Order allow,deny
 allow from all
 

 
 Options Indexes FollowSymLinks MultiViews
 Require all granted
 AllowOverride None
 Order allow,deny
 allow from all
 

 ErrorLog /var/log/httpd/horizon_error.log
 LogLevel debug
 CustomLog /var/log/httpd/horizon_access.log combined


WSGISocketPrefix /var/run/httpd

--




On Thursday, October 9, 2014 3:51 PM, Mark Kirkwood
 wrote:


No, I don't have any explicit ssl enabled in the rgw site.

Now you might be running into http://tracker.ceph.com/issues/7796 . So
check if you have enabled

WSGIChunkedRequest On

In your keystone virtualhost setup (explained in the issue).

Cheers

Mark


On 10/10/14 11:03, lakshmi k s wrote:
 > Right, I have these certs on both nodes - keystone node and rgw gateway
 > node. Not sure where I am going wrong. And what about SSL? Should the
 > following be in rgw.conf in gateway node? I am not using this as it was
 > optional.
 >
 > SSLEngine on
 > SSLCertificateFile /etc/apache2/ssl/apache.crt
 > SSLCertificateKeyFile /etc/apache2/ssl/apache.key
 > SetEnv SERVER_PORT_SECURE 443
 >
 >
 >
 >
 >
 > On Thursday, October 9, 2014 2:48 PM, Mark Kirkwood
 > <mark.kirkw...@catalyst.net.nz> wrote:
 >
 >
 > Almost - the converted certs need to be saved on your *rgw* host in
 > nss_db_path (default is /var/ceph/nss but wherever you have it
 > configured should be ok). Then restart the gateway.
 >
 > What is happening is that the rgw needs these certs to speak with
 > encryption to the keystone server (the latter does not need anything
 > changed, as it is already using encryption).
 >
 > Regards
 >
 > Mark
 >
 > On 10/10/14 08:31, lakshmi k s wrote:
 >  > Thanks Mark. I got past this error being root. So essentially, I copied
 >  > the certs from openstack controller node to gateway node. Did the
 >  > conversion using certutil and copied the files back to controller node
 >  > under /var/lib/ceph/nss directory. Is this the correct directory? Ceph
 >  > doc says /var/ceph/nss though.
 >  >
 >  > But after this, I tried to use curl GET command, but in vain.Same old
 >  > 401 - Authorization failure.
 >  >
 >  > curl -i -X GET
 >  > http://gateway.ex.com/swift/v1/AUTH_bad9e2232b304f89acb03436635b80cc
 > -H
 >  > "X-Auth-
 >  > Token: a510edb22f074946940cd4c07aafcd9d"
 >  >
 >  > HTTP/1.1 401 Unauthorized
 >  > Date: Thu, 09 Oct 2014 19:17:31 GMT
 >  > Server: Apache/2.4.7 (Ubuntu)
 >  > Accept-Ranges: bytes
 >  > Content-Length: 12
 >  > Content-Type: text/plain; charset=utf-8
 >  > AccessDeniedroot
 >  >
 >  > Not much difference in radosgw logs too. Note that the token used above
 >  > is same one in ceph.conf file too. Please help.
 >  >
 >  > [client.radosgw.gateway]
 >  > rgw keystone url = http://192.0.8.2:5000

 >  > rgw keystone admin token = a510edb22f074946940cd4c07aafcd9d
 >  > rgw keystone accepted roles = admim Member _member_ swiftoperator
 >  > rgw keystone token cache size = 500
 >  > rgw keystone revocation interval = 500
 >  > rgw s3 auth use keystone = false
 >  > nss db path = /var/lib/ceph/nss
 >  > debug rgw = 20
 >  > host = gateway
 >  > keyring = /etc/ceph/ceph.client.radosgw.keyring
 >  > rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
 >  > log file = /var/log/ceph/client.radosgw.gateway.log
 >  > rgw dns name

Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread lakshmi k s

Yes Mark, but there is no keystone.conf in this modified Openstack code. There 
is only horizon.conf under /etc/apache2/sites-available folder. And that has 
virtual host 80 only. Should I simply add :35357?

 root@overcloud-controller0-fjvtpqjip2hl:/etc/apache2/sites-available# ls
000-default.conf  default-ssl.conf  horizon.conf





On Thursday, October 9, 2014 4:45 PM, Mark Kirkwood 
 wrote:
 


Hmm - It looks to me like you added the chunked request into Horizon 
instead of Keystone. You want virtual host *:35357


On 10/10/14 12:32, lakshmi k s wrote:
> Have done this too, but in vain. I made changes to Horizon.conf as shown
> below. I had only I do not see the user being validated in radosgw log
> at all.
>
> root@overcloud-controller0-fjvtpqjip2hl:/etc/apache2/sites-available# ls
> 000-default.conf  default-ssl.conf  horizon.conf
>
> 
> 
>  WSGIScriptAlias /
> /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/wsgi/django.wsgi
>  WSGIDaemonProcess horizon user=horizon group=horizon processes=3
> threads=10 home=/opt/stack/venvs/horizon
> python-path=/opt/stack/venvs/horizon:/opt/stack/venvs/horizon/lib/python2.7/site-packages/
>  WSGIApplicationGroup %{GLOBAL}
>
>  SetEnv APACHE_RUN_USER horizon
>  SetEnv APACHE_RUN_GROUP horizon
>  WSGIProcessGroup horizon
>WSGIChunkedRequest On
>
>  DocumentRoot
> /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
>  Alias /static
> /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
>  Alias /media
> /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
>
>  
>  Options FollowSymLinks
>  AllowOverride None
>  
>
>  <Directory /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static>
>  Options Indexes FollowSymLinks MultiViews
>  Require all granted
>  AllowOverride None
>  Order allow,deny
>  allow from all
>  </Directory>
>
>  <Directory /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard>
>  Options Indexes FollowSymLinks MultiViews
>  Require all granted
>  AllowOverride None
>  Order allow,deny
>  allow from all
>  </Directory>
>
>  ErrorLog /var/log/httpd/horizon_error.log
>  LogLevel debug
>  CustomLog /var/log/httpd/horizon_access.log combined
> 
>
> WSGISocketPrefix /var/run/httpd
>
> --
>
>
>
>
> On Thursday, October 9, 2014 3:51 PM, Mark Kirkwood
>  wrote:
>
>
> No, I don't have any explicit ssl enabled in the rgw site.
>
> Now you might be running into http://tracker.ceph.com/issues/7796 . So
> check if you have enabled
>
> WSGIChunkedRequest On
>
> In your keystone virtualhost setup (explained in the issue).
>
> Cheers
>
> Mark
>
>
> On 10/10/14 11:03, lakshmi k s wrote:
>  > Right, I have these certs on both nodes - keystone node and rgw gateway
>  > node. Not sure where I am going wrong. And what about SSL? Should the
>  > following be in rgw.conf in gateway node? I am not using this as it was
>  > optional.
>  >
>  > SSLEngine on
>  > SSLCertificateFile /etc/apache2/ssl/apache.crt
>  > SSLCertificateKeyFile /etc/apache2/ssl/apache.key
>  > SetEnv SERVER_PORT_SECURE 443
>  >
>  >
>  >
>  >
>  >
>  > On Thursday, October 9, 2014 2:48 PM, Mark Kirkwood
>  >  > wrote:
>  >
>  >
>  > Almost - the converted certs need to be saved on your *rgw* host in
>  > nss_db_path (default is /var/ceph/nss but wherever you have it
>  > configured should be ok). Then restart the gateway.
>  >
>  > What is happening is that the rgw needs these certs to speak with
>  > encryption to the keystone server (the latter does not need anything
>  > changed, as it is already using encryption).
>  >
>  > Regards
>  >
>  > Mark
>  >
>  > On 10/10/14 08:31, lakshmi k s wrote:
>  >  > Thanks Mark. I got past this error being root. So essentially, I copied
>  >  > the certs from openstack controller node to gateway node. Did the
>  >  > conversion using certutil and copied the files back to controller node
>  >  > under /var/lib/ceph/nss directory. Is this the correct directory? Ceph
>  >  > doc says /var/ceph/nss though.
>  >  >
>  >  > But after this, I tried to use curl GET command, but in vain.Same old
>  >  > 401 - Authorization failure.
>  >  >
>  >  > curl -i -X GET
>  >  > http://gateway.ex.com/swift/v1/AUTH_bad9e2232b304f89acb03436635b80cc
>  > -H
>  >  > "X-Auth-
>  >  > Token: a510edb22f074946940cd4c07aafcd9d"
>  >  >
>  >  > HTTP/1.1 401 Unauthorized
>  >  > Date: Thu, 09 Oct 2014 19:17:31 GMT
>  >  > Server: Apache/2.4.7 (Ubuntu)
>  >  > Accept-Ranges: bytes
>  >  > Content-Length: 12
>  >  > Content-Type: text/plain; charset=utf-8
>  >  > AccessDeniedroot
>  >  >
>  >

Re: [ceph-users] Regarding Primary affinity configuration

2014-10-09 Thread Gregory Farnum

On Thu, Oct 9, 2014 at 4:24 PM, Johnu George (johnugeo)
 wrote:
> Hi Greg,
>  Thanks for your extremely informative post. My related questions
> are posted inline
>
> On 10/9/14, 2:21 PM, "Gregory Farnum"  wrote:
>
>>On Thu, Oct 9, 2014 at 10:55 AM, Johnu George (johnugeo)
>> wrote:
>>> Hi All,
>>>   I have few questions regarding the Primary affinity.  In the
>>> original blueprint
>>>
>>>(https://wiki.ceph.com/Planning/Blueprints/Firefly/osdmap%3A_primary_role
>>>_affinity
>>> ), one example has been given.
>>>
>>> For PG x, CRUSH returns [a, b, c]
>>> If a has primary_affinity of .5, b and c have 1 , with 50% probability,
>>>we
>>> will choose b or c instead of a. (25% for b, 25% for c)
>>>
>>> A) I was browsing through the code, but I could not find this logic of
>>> splitting the rest of configured primary affinity value between other
>>>osds.
>>> How is this handled?
>>>
>>> if (a < CEPH_OSD_MAX_PRIMARY_AFFINITY &&
>>> (crush_hash32_2(CRUSH_HASH_RJENKINS1,
>>> seed, o) >> 16) >= a) {
>>>   // we chose not to use this primary.  note it anyway as a
>>>   // fallback in case we don't pick anyone else, but keep looking.
>>>   if (pos < 0)
>>> pos = i;
>>> } else {
>>>   pos = i;
>>>   break;
>>> }
>>>   }
>>
>>It's a fallback mechanism - if the chosen primary for a PG has primary
>>affinity less than the default (max), we (probabilistically) look for
>>a different OSD to be the primary. We decide whether to offload by
>>running a hash and discarding the OSD if the output value is greater
>>than the OSDs affinity, and then we go through the list and run that
>>calculation in order (obviously if the affinity is 1, then it passes
>>without needing to run the hash).
>>If no OSD in the list has a high enough hash value, we take the
>>originally-chosen primary.
>  As in example for [0.5,1,1], I got your point that with 50% probability,
> first osd will be chosen. But, how do we ensure that second and third osd
> will be having remaining 25% and 25% respectively?. I could see only
> individual primary affinity values but not a sum value anywhere to ensure
> that.

Well, for any given PG with that pattern, the second OSD in the list
is going to be chosen. But *which* osd is listed second is random, so
if you only have 3 OSDs 0,1,2 (with weights .5, 1, 1, respectively),
then the PGs in total will work in a 1:2:2 ratio because OSDs 1 and 2
will between themselves be first in half of the PG lists.
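
As a rough illustration of the per-PG behaviour described above (a sketch,
not Ceph code): if each hash comparison is treated as an independent draw
that accepts an OSD with probability equal to its primary affinity, the
chance that each position in a single PG's list ends up primary can be
worked out directly. The function name position_probabilities and the
independence assumption are made up for this example.

```python
def position_probabilities(affinities):
    """Probability that each list position becomes primary for ONE PG.

    Illustrative assumptions (not taken from the Ceph source):
    - each OSD is accepted independently with probability equal to its
      primary affinity, expressed as a float in [0, 1];
    - if every OSD in the list is rejected, the first one is the fallback.
    """
    probs = []
    all_rejected_so_far = 1.0
    for a in affinities:
        # This position wins only if everything before it was rejected
        # and it is accepted itself.
        probs.append(all_rejected_so_far * a)
        all_rejected_so_far *= (1.0 - a)
    if probs:
        # Nobody accepted: fall back to the originally chosen primary.
        probs[0] += all_rejected_so_far
    return probs


print(position_probabilities([0.5, 1.0, 1.0]))  # prints [0.5, 0.5, 0.0]
```

For a single list the second OSD takes over whenever the first is rejected;
it is the varying list order across PGs that spreads the primaries around.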

>
>>
>>> B) Since, primary affinity value is configured independently, there can
>>>be a
>>> situation with [0.1,0.1,0.1]  with total value that don't add to 1.
>>>How is
>>> this taken care of?
>>
>>These primary affinity values are just compared against the hash
>>output I mentioned, so the sum doesn't matter. In general we simply
>>expect that OSDs which don't have the max weight value will be chosen
>>as primary in proportion to their share of the total weight of their
>>PG membership (ie, if they have a weight of .5 and everybody else has
>>weight 1, they will be primary in half the normal number of PGs. If
>>everybody has a weight of .5, they will be primary in the normal
>>proportions. Etc).
>
> I got your idea but I couldn't figure out that from the code. You said
> that max weight value will be chosen as primary in proportion to their
> share of the total weight of their
> PG membership. But, from what I understood from code, if it is
> [0.1,0.1,0.1], first osd will be chosen always. (Probabilistically for 10%
> reads, it will choose first osd. However,first osd will still be chosen
> for rest of the reads as part of fallback mechanism which is the
> originally chosen primary.) Am I wrong?

If each OSD has affinity of 0.1, then the hash is run until its output
is <0.1 for one of the OSDs in the list. If *none* of the OSDs in the
list hashes out a number smaller than that, then the first one in the
list (which would be the primary by default!) will be selected.
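
The walk through the list can be sketched in Python, mirroring the C
fragment quoted earlier in this thread. This is only an illustration:
stand_in_hash is a hypothetical replacement for
crush_hash32_2(CRUSH_HASH_RJENKINS1, seed, o) >> 16, and the value 0x10000
for the maximum affinity is an assumption about the fixed-point scale.

```python
import hashlib

MAX_AFFINITY = 0x10000  # assumed scale: affinity 1.0 as 16-bit fixed point


def stand_in_hash(seed, osd):
    """Hypothetical 16-bit hash; NOT the rjenkins hash Ceph uses."""
    digest = hashlib.sha256(f"{seed}:{osd}".encode()).digest()
    return int.from_bytes(digest[:2], "big")  # value in 0 .. 65535


def choose_primary(osds, affinity, seed, hash_fn=stand_in_hash):
    """Return the OSD picked as primary from an ordered PG list."""
    pos = -1
    for i, osd in enumerate(osds):
        a = affinity.get(osd, MAX_AFFINITY)
        if a < MAX_AFFINITY and hash_fn(seed, osd) >= a:
            # Rejected: remember the first rejected OSD as the fallback,
            # but keep looking for one that passes.
            if pos < 0:
                pos = i
        else:
            # Accepted (max affinity, or the hash came out below it).
            pos = i
            break
    return osds[pos] if pos >= 0 else None


low = int(0.1 * MAX_AFFINITY)
affinity = {0: low, 1: low, 2: low}
# If every hash comes out above the affinity, the first OSD in the list
# is kept as the fallback primary.
print(choose_primary([2, 0, 1], affinity, seed=7,
                     hash_fn=lambda s, o: 0xFFFF))  # prints 2
```

Passing hash_fn explicitly makes it easy to try the all-rejected case by
hand; with the default stand-in hash the result depends on the seed.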

>
>>
>>>
>>> C) Slightly confused. What happens for a situation with [1,0.5,1] ? Is
>>>osd.0
>>> always returned?
>>
>>If the first OSD in the PG list has primary affinity of 1 then it is
>>always the primary for that OSD, yes. That's not osd.0, though; just
>>the first OSD in the PG list. ;)
>
> Sorry. I meant the first OSD, but accidentally wrote as osd.0 . As you
> said, if first osd is always selected in the PG list for this scenario,
> doesn't it violate our assumption to have probabilistically  25%, 50%, 25%
> reads for first ,second and third osd respectively?

Err, your numbers don't match the code here — we have two OSDs in that
list with affinity 1 and one with affinity 0.5. That would be a 2:1:2
ratio, or 40%, 20%, 40%. In this case the first OSD in the list is
selected because it's got the max affinity. And the ratios don't
actually work out like that if some of your OSDs have the max affinity
and others don't (because a max affinity OSD will happily take
whatever you throw at

Re: [ceph-users] Blueprints

2014-10-09 Thread Gregory Farnum

On Thu, Oct 9, 2014 at 4:01 PM, Robert LeBlanc  wrote:
> I have a question regarding submitting blueprints. Should only people who
> intend to do the work of adding/changing features of Ceph submit blueprints?
> I'm not primarily a programmer (but can do programming if needed), but have
> a feature request for Ceph.

Blueprints are documents *for* developers. If you as a user have
enough information about the feature you want, and the things it needs
to do in Ceph, to generate a reasonable description of the feature,
its user interface, and a skeleton of how it could be implemented,
we'd love a blueprint. Blueprints which are backed by developers are
more likely to get time at CDS, I think (Patrick/Sage could confirm),
but even just having them is helpful.

If that sounds intimidating, we take less detailed feature requests in
our Redmine at tracker.ceph.com too. ;)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

Re: [ceph-users] Basic Ceph questions

2014-10-09 Thread Marcus White

Thanks:)

Just curious, what kind of applications use RBD? It can't be
applications which need high speed SAN storage performance
characteristics?

For VMs, I am trying to visualize how the RBD device would be exposed.
Where does the driver live exactly? If it's exposed via libvirt and
QEMU, does the kernel driver run in the host OS, and communicate with
a backend Ceph cluster? If yes, does librbd provide a target (SCSI?)
interface which the kernel driver connects to? Trying to visualize
what the stack looks like, and the flow of IOs for block devices.

FUSE is probably for the Ceph file system.

MW





On Wed, Oct 8, 2014 at 6:37 PM, Craig Lewis  wrote:
> Comments inline.
>
> On Tue, Oct 7, 2014 at 5:51 PM, Marcus White 
> wrote:
>>
>> Hello,
>> Some basic Ceph questions, would appreciate your help:) Sorry about
>> the number and detail in advance!
>>
>> a. Ceph RADOS is strongly consistent and different from usual object,
>> does that mean all metadata also, container and account etc is all
>> consistent and everything is updated in the path of the client
>> operation itself, for a single site?
>
>
> Yes.  In a single site, it's CP out of CAP.


>
>>
>> b. If it is strongly consistent, is that the case across sites also?
>> How can it be performant across geo sites if that is the case? If its
>> choosing consistency over partitioning and availability...For object,
>> I read somewhere that it is now eventually consistent(local CP,
>> remotely AP) via DR. Gets a bit confusing with all the literature out
>> there. If it is DR, isnt that slightly different from the Swift case?
>
>
> If you're referring to RadosGW Federation, no.  That replication is async.
> The replication has several delays built in, so the fastest you could see
> your data show up in the secondary is about a minute.  Longer if the file
> takes a while to transfer, or you have a lot of activity to replicate.
>
> Each site is still CP.  There is just delay getting data from the primary to
> the secondary.
In that case, it is like Swift, only differently done. The async makes
it eventually consistent across sites, no?

>
>
> If you want CP in multiple locations, that's doable by creating one cluster
> that spans both locations, and tuning the CRUSH rules to make sure the
> object is written to both locations. You really want a low latency
> connection between the two sites.
>
> I tested one cluster in two colos with 20ms of latency between them.  It
> worked, but it was noticeably slow.  I went with two clusters and async
> replication.
>
>
>>
>>
>> c. For block, is it CP on a single site and then usual DR to another
>> site using snapshotting?
>
>
> Yes.
>
>
>>
>>
>> d. For block, is it just a linux block device or is it SCSI? Is it a
>> custom device driver running within Linux which hooks into the block
>> layer? Trying to understand the layering diagram.
>
>
> I'm a bit out of my element here, but there is a kernel module and a FUSE
> module.  The kernel module connects RBD images to a /dev/rbd/... block
> device.  It can then be used however you would use a block device.  Most
> people put a filesystem on it, but it's not required.  I'm really unfamiliar
> with the FUSE module.
>
> Several people are exporting RBD images via iSCSI and Fiber Channel.
>
>>
>> e. Do the snapshot, compression features come from the underlying file
>> system?
>
>
> It depends on the filesystem.  Ceph will emulate any required features that
> the FS doesn't support.  For example, ext4 and XFS have no snapshots, so
> Ceph has to track them itself.  On BtrFS, Ceph uses the native snapshots, and
> it's much quicker because of it.
>
>>
>>
>> f. What is the plan for deduplication? If that comes from the local
>> file system, how would it deduplicate across nodes to achieve the best
>> dedup ratio?
>>
>
> I don't believe Ceph does anything with de-dup.  If the FS underneath has it
> turned on, it can de-dup the stuff it sees, but there's no cluster-wide
> de-dup.

[ceph-users] scrub error with keyvalue backend

2014-10-09 Thread 廖建锋

Dear ceph,

 # ceph -s
cluster e1f18421-5d20-4c3e-83be-a74b77468d61
health HEALTH_ERR 4 pgs inconsistent; 4 scrub errors
monmap e2: 3 mons at 
{storage-1-213=10.1.0.213:6789/0,storage-1-214=10.1.0.214:6789/0,storage-1-215=10.1.0.215:6789/0},
 election epoch 16, quorum 0,1,2 storage-1-213,storage-1-214,storage-1-215
mdsmap e7: 1/1/1 up {0=storage-1-213=up:active}, 2 up:standby
osdmap e135: 18 osds: 18 up, 18 in
pgmap v84135: 1164 pgs, 3 pools, 801 GB data, 15264 kobjects
1853 GB used, 34919 GB / 36772 GB avail
1159 active+clean
4 active+clean+inconsistent
1 active+clean+scrubbing
client io 17400 kB/s wr, 611 op/s

[root@storage-1-213:~] [Fri Oct 10 - 13:30:19]
999 => # ceph -v
ceph version 0.80.6 (f93610a4421cb670b08e974c6550ee715ac528ae)


Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread Mark Kirkwood

Oh, I see. That complicates it a wee bit (looks back at your messages). 
I see you have:


rgw_keystone_url = http://192.0.8.2:5000

So you'll need to amend/create etc a



and put it in there. I suspect you might be better off changing your rgw 
keystone url to use port 35357 (the admin one). However I think that is 
a side issue.


Also just to double check - 192.0.8.2 *is* the server you are showing us 
the sites-available from?


Cheers

Mark

On 10/10/14 12:50, lakshmi k s wrote:

Yes Mark, but there is no keystone.conf in this modified Openstack code.
There is only horizon.conf under /etc/apache2/sites-available folder.
And that has virtual host 80 only. Should I simply add :35357?

root@overcloud-controller0-fjvtpqjip2hl:/etc/apache2/sites-available# ls
000-default.conf  default-ssl.conf  horizon.conf




On Thursday, October 9, 2014 4:45 PM, Mark Kirkwood
 wrote:


Hmm - It looks to me like you added the chunked request into Horizon
instead of Keystone. You want virtual host *:35357


On 10/10/14 12:32, lakshmi k s wrote:
 > Have done this too, but in vain. I made changes to Horizon.conf as shown
 > below. I had only I do not see the user being validated in radosgw log
 > at all.
 >
 > root@overcloud-controller0-fjvtpqjip2hl:/etc/apache2/sites-available# ls
 > 000-default.conf  default-ssl.conf  horizon.conf
 >
 > <VirtualHost *:80>
 >     WSGIScriptAlias / /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/wsgi/django.wsgi
 >     WSGIDaemonProcess horizon user=horizon group=horizon processes=3 threads=10 home=/opt/stack/venvs/horizon python-path=/opt/stack/venvs/horizon:/opt/stack/venvs/horizon/lib/python2.7/site-packages/
 >     WSGIApplicationGroup %{GLOBAL}
 >
 >     SetEnv APACHE_RUN_USER horizon
 >     SetEnv APACHE_RUN_GROUP horizon
 >     WSGIProcessGroup horizon
 >     WSGIChunkedRequest On
 >
 >     DocumentRoot /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
 >     Alias /static /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
 >     Alias /media /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static
 >
 >     <Directory />
 >         Options FollowSymLinks
 >         AllowOverride None
 >     </Directory>
 >
 >     <Directory /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard/static>
 >         Options Indexes FollowSymLinks MultiViews
 >         Require all granted
 >         AllowOverride None
 >         Order allow,deny
 >         allow from all
 >     </Directory>
 >
 >     <Directory /opt/stack/venvs/horizon/lib/python2.7/site-packages/openstack_dashboard>
 >         Options Indexes FollowSymLinks MultiViews
 >         Require all granted
 >         AllowOverride None
 >         Order allow,deny
 >         allow from all
 >     </Directory>
 >
 >     ErrorLog /var/log/httpd/horizon_error.log
 >     LogLevel debug
 >     CustomLog /var/log/httpd/horizon_access.log combined
 > </VirtualHost>
 >
 > WSGISocketPrefix /var/run/httpd
 >
 > --
 >
 >
 >
 >
 > On Thursday, October 9, 2014 3:51 PM, Mark Kirkwood
 > <mark.kirkw...@catalyst.net.nz> wrote:
 >
 >
 > No, I don't have any explicit ssl enabled in the rgw site.
 >
 > Now you might be running into http://tracker.ceph.com/issues/7796, so
 > check whether you have enabled
 >
 > WSGIChunkedRequest On
 >
 > in your keystone virtualhost setup (explained in the issue).
 >
 > Cheers
 >
 > Mark
 >
 >
 > On 10/10/14 11:03, lakshmi k s wrote:
 >  > Right, I have these certs on both nodes - keystone node and rgw gateway
 >  > node. Not sure where I am going wrong. And what about SSL? Should the
 >  > following be in rgw.conf in gateway node? I am not using this as it was
 >  > optional.
 >  >
 >  > SSLEngine on
 >  > SSLCertificateFile /etc/apache2/ssl/apache.crt
 >  > SSLCertificateKeyFile /etc/apache2/ssl/apache.key
 >  > SetEnv SERVER_PORT_SECURE 443
 >  >
 >  >
 >  >
 >  >
 >  >
 >  > On Thursday, October 9, 2014 2:48 PM, Mark Kirkwood
 >  > <mark.kirkw...@catalyst.net.nz> wrote:
 >  >
 >  >
 >  > Almost - the converted certs need to be saved on your *rgw* host in
 >  > nss_db_path (default is /var/ceph/nss but wherever you have it
 >  > configured should be ok). Then restart the gateway.
 >  >
 >  > What is happening is that the rgw needs these certs to speak with
 >  > encryption to the keystone server (the latter does not need anything
 >  > changed, as it is already using encryption).
 >  >
 >  > Regards
 >  >
 >  > Mark
 >  >
 >  > On 10/10/14 08:31, lakshmi k s wrote:
 >  >  > Thanks Mark. I got past this error being root. So essentially, I copied
 >  >  > the certs from openstack controller node to gateway node. Did the
 >  >  > conversion using certutil

Re: [ceph-users] Openstack keystone with Radosgw

2014-10-09 Thread Mark Kirkwood

Given that your setup appears to be non-standard, it might be useful to see
the output of the two commands below:


$ keystone service-list
$ keystone endpoint-list

So we can avoid advising you incorrectly.
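
For comparing that output against rgw_keystone_url, a small parser can pull the host and port out of each endpoint URL. This is a rough sketch only, assuming Python 3 and assuming the endpoint list is printed as an ASCII table with publicurl, internalurl and adminurl columns; the sample table is made up for illustration and the function names are hypothetical.

```python
from urllib.parse import urlparse


def parse_table(text):
    """Parse an ASCII table with '|' column separators into a list of dicts."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # Border lines start with '+', data and header lines with '|'.
        if stripped.startswith("|") and stripped.endswith("|"):
            rows.append([cell.strip() for cell in stripped[1:-1].split("|")])
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def endpoint_ports(table_text,
                   columns=("publicurl", "internalurl", "adminurl")):
    """Return (column, host, port) for every endpoint URL in the table."""
    found = []
    for row in parse_table(table_text):
        for column in columns:
            url = row.get(column, "")
            if not url:
                continue
            parsed = urlparse(url)
            found.append((column, parsed.hostname, parsed.port))
    return found


# Made-up sample, not real output from this thread.
SAMPLE = """\
+----+-----------+----------------------------+----------------------------+-----------------------------+------------+
| id | region    | publicurl                  | internalurl                | adminurl                    | service_id |
+----+-----------+----------------------------+----------------------------+-----------------------------+------------+
| e1 | regionOne | http://192.0.8.2:5000/v2.0 | http://192.0.8.2:5000/v2.0 | http://192.0.8.2:35357/v2.0 | s1         |
+----+-----------+----------------------------+----------------------------+-----------------------------+------------+
"""

if __name__ == "__main__":
    for column, host, port in endpoint_ports(SAMPLE):
        print(column, host, port)
```

Any port that shows up here but has no matching virtual host in sites-available would be a hint that keystone is being served from somewhere else.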

Regards

Mark

On 10/10/14 18:46, Mark Kirkwood wrote:

Also just to double check - 192.0.8.2 *is* the server you are showing us
the sites-available from?



