Re: [ceph-users] Ceph re-ip of OSD node

2017-09-05 Thread Morrice Ben
Hi all,

Thanks for your responses. I managed to re-IP the OSDs.

I did not need to set cluster network or public network in [global]; just 
changing the address in the [osd.#] section was sufficient.


In my environment, the catalyst was a misconfiguration on the network side. 
After I provided iperf results between servers on the OLD and NEW networks, our 
network team resolved the issue and then Ceph 'just worked'.


Cheers,


Ben




From: ceph-users  on behalf of Jake Young 

Sent: Wednesday, August 30, 2017 11:37 PM
To: Jeremy Hanmer; ceph-users
Subject: Re: [ceph-users] Ceph re-ip of OSD node

Hey Ben,

Take a look at the osd log for another OSD whose IP you did not change.

What errors does it show related to the re-IP'd OSD?

Is the other OSD trying to communicate with the re-IP'd OSD's old IP address?

Jake


On Wed, Aug 30, 2017 at 3:55 PM Jeremy Hanmer 
<jeremy.han...@dreamhost.com> wrote:
This is simply not true. We run quite a few ceph clusters with
rack-level layer2 domains (thus routing between racks) and everything
works great.

On Wed, Aug 30, 2017 at 10:52 AM, David Turner 
<drakonst...@gmail.com> wrote:
> ALL OSDs need to be running the same private network at the same time.  ALL
> clients, RGW, OSD, MON, MGR, MDS, etc, etc need to be running on the same
> public network at the same time.  You cannot do this as a one at a time
> migration to the new IP space.  Even if all of the servers can still
> communicate via routing, it just won't work.  Changing the public/private
> network addresses for a cluster requires full cluster down time.
>
> On Wed, Aug 30, 2017 at 11:09 AM Ben Morrice 
> <ben.morr...@epfl.ch> wrote:
>>
>> Hello
>>
>> We have a small cluster that we need to move to a different network in
>> the same datacentre.
>>
>> My workflow was the following (for a single OSD host), but I failed
>> (further details below)
>>
>> 1) ceph osd set noout
>> 2) stop ceph-osd processes
>> 3) change IP, gateway, domain (short hostname is the same), VLAN
>> 4) change references of OLD IP (cluster and public network) in
>> /etc/ceph/ceph.conf with NEW IP (see [1])
>> 5) start a single OSD process
>>
>> This seems to work as the NEW IP can communicate with mon hosts and osd
>> hosts on the OLD network, the OSD is booted and is visible via 'ceph -w'
>> however after a few seconds the OSD drops with messages such as the
>> below in its log file
>>
>> heartbeat_check: no reply from 10.1.1.100:6818 osd.14 ever on either
>> front or back, first ping sent 2017-08-30 16:42:14.692210 (cutoff
>> 2017-08-30 16:42:24.962245)
>>
>> There are logs like the above for every OSD server/process
>>
>> and then eventually a
>>
>> 2017-08-30 16:42:14.486275 7f6d2c966700  0 log_channel(cluster) log
>> [WRN] : map e85351 wrongly marked me down
>>
>>
>> Am I missing something obvious to reconfigure the network on an OSD host?
>>
>>
>>
>> [1]
>>
>> OLD
>> [osd.0]
>> host = sn01
>> devs = /dev/sdi
>> cluster addr = 10.1.1.101
>> public addr = 10.1.1.101
>> NEW
>> [osd.0]
>> host = sn01
>> devs = /dev/sdi
>> cluster addr = 10.1.2.101
>> public addr = 10.1.2.101
>>
>> --
>> Kind regards,
>>
>> Ben Morrice
>>
>> __
>> Ben Morrice | e: ben.morr...@epfl.ch | t: 
>> +41-21-693-9670
>> EPFL / BBP
>> Biotech Campus
>> Chemin des Mines 9
>> 1202 Geneva
>> Switzerland
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Command that lists all client connections (with ips)?

2017-09-05 Thread Marc Roos
What would be the best way to get an overview of all client connections? 
Something similar to the output of 'rbd lock list'.


  cluster:
1 clients failing to respond to capability release
1 MDSs report slow requests


ceph daemon mds.a dump_ops_in_flight
{
    "ops": [
        {
            "description": "client_request(client.2342664:12 create #0x11b9177/..discinfo.hJpqTF 2017-09-05 09:56:43.419636 caller_uid=500, caller_gid=500{500,1,2,3,4,6,10,})",
            "initiated_at": "2017-09-05 09:56:43.419708",
            "age": 5342.233837,
            "duration": 5342.233857,
            "type_data": {
                "flag_point": "failed to wrlock, waiting",
                "reqid": "client.2342664:12",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.2342664",
                    "tid": 12
                },
                "events": [
                    {
                        "time": "2017-09-05 09:56:43.419708",
                        "event": "initiated"
                    },
                    {
                        "time": "2017-09-05 09:56:43.419913",
                        "event": "failed to wrlock, waiting"
                    }
                ]
            }
        }
    ],
    "num_ops": 1
}

http://docs.ceph.com/docs/master/cephfs/troubleshooting/



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Command that lists all client connections (with ips)?

2017-09-05 Thread John Spray
On Tue, Sep 5, 2017 at 10:28 AM, Marc Roos  wrote:
> What would be the best way to get an overview of all client connetions.
> Something similar to the output of rbd lock list
>
>
>   cluster:
> 1 clients failing to respond to capability release
> 1 MDSs report slow requests
>
>
> ceph daemon mds.a dump_ops_in_flight
> {
> "ops": [
> {
> "description": "client_request(client.2342664:12 create
> #0x11b9177/..discinfo.hJpqTF 2017-09-05 09:56:43.419636
> caller_uid=500, caller_gid=500{500,1,2,3,4,6,10,})",
> "initiated_at": "2017-09-05 09:56:43.419708",
> "age": 5342.233837,
> "duration": 5342.233857,
> "type_data": {
> "flag_point": "failed to wrlock, waiting",
> "reqid": "client.2342664:12",
> "op_type": "client_request",
> "client_info": {
> "client": "client.2342664",
> "tid": 12
> },
> "events": [
> {
> "time": "2017-09-05 09:56:43.419708",
> "event": "initiated"
> },
> {
> "time": "2017-09-05 09:56:43.419913",
> "event": "failed to wrlock, waiting"
> }
> ]
> }
> }
> ],
> "num_ops": 1
> }
>
> http://docs.ceph.com/docs/master/cephfs/troubleshooting/

As of luminous, it's "ceph tell mds.0 client ls".

In earlier releases there is "ceph tell mds.0 session ls", and "ceph
daemon mds.<id> session ls" (some of these output non-pretty-printed
JSON that you'd probably need to pipe into a script).

John
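
For anyone wanting the client addresses specifically: a minimal sketch that
pulls the client instance (id plus IP:port) out of the session listing,
assuming the jq utility is installed and an MDS named mds.a as in the example
above (field names are from a jewel/luminous-era dump and may differ slightly
between releases):

# each session entry carries an "inst" field with the client id and address
ceph daemon mds.a session ls | jq -r '.[].inst'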

>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re: How to enable ceph-mgr dashboard

2017-09-05 Thread 许雪寒
Here is the log in /var/log/messages

Sep  5 19:01:55 rg1-ceph7 systemd: Started Ceph cluster manager daemon.
Sep  5 19:01:55 rg1-ceph7 systemd: Starting Ceph cluster manager daemon...
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STARTING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Started 
monitor thread '_TimeoutMonitor'.
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Error in HTTP 
server: shutting down
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 187, in 
_start_http_thread
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.httpserver.start()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 
1824, in start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise socket.error(msg)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: error: No socket could be created
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE HTTP Server 
cherrypy._cpwsgi_server.CPWSGIServer(('::', 7000)) already shut down
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Stopped 
thread '_TimeoutMonitor'.
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Exception in thread HTTPServer Thread-3:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File "/usr/lib64/python2.7/threading.py", 
line 811, in __bootstrap_inner
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.run()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File "/usr/lib64/python2.7/threading.py", 
line 764, in run
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.__target(*self.__args, **self.__kwargs)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 201, in 
_start_http_thread
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.bus.exit()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 276, in exit
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: os._exit(70) # EX_SOFTWARE
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: TypeError: os_exit_noop() takes no 
arguments (1 given)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Error in 
'start' listener >
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 197, in 
publish
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: output.append(listener(*args, **kwargs))
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/_cpserver.py", line 151, in start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ServerAdapter.start(self)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 174, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.wait()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 208, in 
wait
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise self.interrupt
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: error: No socket could be created
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Shutting down 
due to error in start listener:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 235, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.publish('start')
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 215, in 
publish
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise exc
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ChannelFailures: error('No socket could be 
created',)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE HTTP Server 
cherrypy._cpwsgi_server.CPWSGIServer(('::', 7000)) already shut down
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE No thread 
running for None.
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: 2017-09-05 19:01:56.858240 7f01a634e700 -1 
mgr serve dashboard.serve:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: 2017-09-05 19:01:56.858266 7f01a634e700 -1 
mgr serve Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib64/ceph/mgr/dashboard/module.py

[ceph-users] Re: How to enable ceph-mgr dashboard

2017-09-05 Thread 许雪寒
Sorry, for the miss formatting, here is the right one:

Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 187, in 
_start_http_thread
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.httpserver.start()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 
1824, in start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise socket.error(msg)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: error: No socket could be created
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE HTTP Server 
cherrypy._cpwsgi_server.CPWSGIServer(('::', 7000)) already shut down
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Stopped 
thread '_TimeoutMonitor'.
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Exception in thread HTTPServer Thread-3:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File "/usr/lib64/python2.7/threading.py", 
line 811, in __bootstrap_inner
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.run()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File "/usr/lib64/python2.7/threading.py", 
line 764, in run
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.__target(*self.__args, **self.__kwargs)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 201, in 
_start_http_thread
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.bus.exit()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 276, in exit
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: os._exit(70) # EX_SOFTWARE
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: TypeError: os_exit_noop() takes no 
arguments (1 given)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Error in 
'start' listener >
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 197, in 
publish
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: output.append(listener(*args, **kwargs))
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/_cpserver.py", line 151, in start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ServerAdapter.start(self)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 174, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.wait()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 208, in 
wait
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise self.interrupt
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: error: No socket could be created
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Shutting down 
due to error in start listener:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 235, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.publish('start')
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 215, in 
publish
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise exc
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ChannelFailures: error('No socket could be 
created',)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE HTTP Server 
cherrypy._cpwsgi_server.CPWSGIServer(('::', 7000)) already shut down
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE No thread 
running for None.
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: 2017-09-05 19:01:56.858240 7f01a634e700 -1 
mgr serve dashboard.serve:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: 2017-09-05 19:01:56.858266 7f01a634e700 -1 
mgr serve Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib64/ceph/mgr/dashboard/module.py", line 989, in serve
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: cherrypy.engine.start()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 250, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise e_info
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ChannelFailures: error('No socket could be 
created',)

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 许雪寒
Sent: September 5, 2017 19:05
To: ceph-users@lists.ceph.com
Subject: [ceph

Re: [ceph-users] Re: How to enable ceph-mgr dashboard

2017-09-05 Thread Henrik Korkuc

what is the output of "netstat -anp | grep 7000"?
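
For reference, a hedged sketch of what to check and how to move the dashboard
to an explicit address (config-key names as documented for the luminous
dashboard module; older builds may need "config-key put"; the IP and mgr
instance name are placeholders). Note that the log shows cherrypy trying to
bind to '::', so a host with IPv6 disabled can also fail this way:

# is something else already listening on port 7000?
netstat -anp | grep 7000

# bind the dashboard to an explicit IPv4 address/port and restart the mgr
ceph config-key set mgr/dashboard/server_addr 192.168.0.10
ceph config-key set mgr/dashboard/server_port 7000
systemctl restart ceph-mgr@rg1-ceph7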

On 17-09-05 14:19, 许雪寒 wrote:

Sorry, for the miss formatting, here is the right one:

Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 187, in 
_start_http_thread
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.httpserver.start()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 
1824, in start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise socket.error(msg)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: error: No socket could be created
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE HTTP Server 
cherrypy._cpwsgi_server.CPWSGIServer(('::', 7000)) already shut down
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Stopped 
thread '_TimeoutMonitor'.
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Exception in thread HTTPServer Thread-3:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File "/usr/lib64/python2.7/threading.py", 
line 811, in __bootstrap_inner
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.run()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File "/usr/lib64/python2.7/threading.py", 
line 764, in run
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.__target(*self.__args, **self.__kwargs)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 201, in 
_start_http_thread
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.bus.exit()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 276, in exit
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: os._exit(70) # EX_SOFTWARE
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: TypeError: os_exit_noop() takes no 
arguments (1 given)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Error in 'start' listener 
>
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 197, in 
publish
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: output.append(listener(*args, **kwargs))
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/_cpserver.py", line 151, in start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ServerAdapter.start(self)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 174, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.wait()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 208, in 
wait
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise self.interrupt
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: error: No socket could be created
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Shutting down 
due to error in start listener:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 235, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.publish('start')
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 215, in 
publish
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise exc
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ChannelFailures: error('No socket could be 
created',)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE HTTP Server 
cherrypy._cpwsgi_server.CPWSGIServer(('::', 7000)) already shut down
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE No thread 
running for None.
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: 2017-09-05 19:01:56.858240 7f01a634e700 -1 
mgr serve dashboard.serve:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: 2017-09-05 19:01:56.858266 7f01a634e700 -1 
mgr serve Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib64/ceph/mgr/dashboard/module.py", line 989, in serve
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: cherrypy.engine.start()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 250, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise e_info
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ChannelFailures: error('No socket could be 
created',)

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.

Re: [ceph-users] Ceph re-ip of OSD node

2017-09-05 Thread David Turner
Good to know. We must have misconfigured our router when we were testing
this.

On Tue, Sep 5, 2017, 3:00 AM Morrice Ben  wrote:

> Hi all,
>
> Thanks for your responses. I managed to re-ip the OSDs
>
> I did not need to set cluster network or public network in [global], just
> changing the address in the [osd.#] section was sufficient.
>
>
> In my environment, the catalyst was a misconfiguration on the network
> side. After I provided iperf results between servers in the OLD and NEW,
> our network team resolved the issue and then ceph 'just worked'.
>
>
> Cheers,
>
>
> Ben​
>
> --
>
> *From:* ceph-users  on behalf of Jake
> Young 
> *Sent:* Wednesday, August 30, 2017 11:37 PM
> *To:* Jeremy Hanmer; ceph-users
> *Subject:* Re: [ceph-users] Ceph re-ip of OSD node
>
> Hey Ben,
>
> Take a look at the osd log for another OSD who's ip you did not change.
>
> What errors does it show related the re-ip'd OSD?
>
> Is the other OSD trying to communicate with the re-ip'd OSD's old ip
> address?
>
> Jake
>
>
> On Wed, Aug 30, 2017 at 3:55 PM Jeremy Hanmer 
> wrote:
>
>> This is simply not true. We run quite a few ceph clusters with
>> rack-level layer2 domains (thus routing between racks) and everything
>> works great.
>>
>> On Wed, Aug 30, 2017 at 10:52 AM, David Turner 
>> wrote:
>> > ALL OSDs need to be running the same private network at the same time.
>> ALL
>> > clients, RGW, OSD, MON, MGR, MDS, etc, etc need to be running on the
>> same
>> > public network at the same time.  You cannot do this as a one at a time
>> > migration to the new IP space.  Even if all of the servers can still
>> > communicate via routing, it just won't work.  Changing the
>> public/private
>> > network addresses for a cluster requires full cluster down time.
>> >
>> > On Wed, Aug 30, 2017 at 11:09 AM Ben Morrice 
>> wrote:
>> >>
>> >> Hello
>> >>
>> >> We have a small cluster that we need to move to a different network in
>> >> the same datacentre.
>> >>
>> >> My workflow was the following (for a single OSD host), but I failed
>> >> (further details below)
>> >>
>> >> 1) ceph osd set noout
>> >> 2) stop ceph-osd processes
>> >> 3) change IP, gateway, domain (short hostname is the same), VLAN
>> >> 4) change references of OLD IP (cluster and public network) in
>> >> /etc/ceph/ceph.conf with NEW IP (see [1])
>> >> 5) start a single OSD process
>> >>
>> >> This seems to work as the NEW IP can communicate with mon hosts and osd
>> >> hosts on the OLD network, the OSD is booted and is visible via 'ceph
>> -w'
>> >> however after a few seconds the OSD drops with messages such as the
>> >> below in it's log file
>> >>
>> >> heartbeat_check: no reply from 10.1.1.100:6818 osd.14 ever on either
>> >> front or back, first ping sent 2017-08-30 16:42:14.692210 (cutoff
>> >> 2017-08-30 16:42:24.962245)
>> >>
>> >> There are logs like the above for every OSD server/process
>> >>
>> >> and then eventually a
>> >>
>> >> 2017-08-30 16:42:14.486275 7f6d2c966700  0 log_channel(cluster) log
>> >> [WRN] : map e85351 wrongly marked me down
>> >>
>> >>
>> >> Am I missing something obvious to reconfigure the network on a OSD
>> host?
>> >>
>> >>
>> >>
>> >> [1]
>> >>
>> >> OLD
>> >> [osd.0]
>> >> host = sn01
>> >> devs = /dev/sdi
>> >> cluster addr = 10.1.1.101
>> >> public addr = 10.1.1.101
>> >> NEW
>> >> [osd.0]
>> >> host = sn01
>> >> devs = /dev/sdi
>> >> cluster addr = 10.1.2.101
>> >> public addr = 10.1.2.101
>> >>
>> >> --
>> >> Kind regards,
>> >>
>> >> Ben Morrice
>> >>
>> >> __
>> >> Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670
>> >> EPFL / BBP
>> >> Biotech Campus
>> >> Chemin des Mines 9
>> >> 1202 Geneva
>> >> Switzerland
>> >>
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph OSD journal (with dmcrypt) replacement

2017-09-05 Thread David Turner
Did the journal drive fail during operation, or was it taken out during
pre-failure? If it fully failed, then most likely you can't guarantee the
consistency of the underlying osds. In that case, you just remove the affected
osds and add them back in as new osds.

In the case of having good data on the osds, you follow the standard
process: flush the old journal, create the new partition, set up all of the
partition metadata so that the ceph udev rules will know what the journal
is, and create a new dmcrypt volume on it. I would recommend using the
same uuid as the old journal so that you don't need to update the symlinks
and such on the osd. After everything is done, run the journal create
command for the osd and start the osd.
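
A rough sketch of that procedure for a filestore OSD, under the assumption
that the data disk is healthy (OSD id, device names, partition size and the
GPT type code are placeholders; the dmcrypt mapping itself -- cryptsetup with
the key ceph-disk stored under /etc/ceph/dmcrypt-keys/ -- is omitted here, and
the correct type code can be copied from a surviving journal partition with
'sgdisk --info'):

systemctl stop ceph-osd@7
ceph-osd -i 7 --flush-journal    # only if the old journal is still readable

# recreate the journal partition on the new device, reusing the old
# partition GUID so the journal symlink in /var/lib/ceph/osd/ceph-7/ resolves
sgdisk --new=1:0:+10G --change-name=1:'ceph journal' \
       --partition-guid=1:$OLD_JOURNAL_UUID \
       --typecode=1:$DMCRYPT_JOURNAL_TYPECODE /dev/sdX

# with the dmcrypt mapping opened on the new partition, rebuild the journal
ceph-osd -i 7 --mkjournal
systemctl start ceph-osd@7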

On Tue, Sep 5, 2017, 2:47 AM M Ranga Swami Reddy 
wrote:

> Hello,
> How to replace an OSD's journal created with dmcrypt, from one drive
> to another drive, in case of current journal drive failed.
>
> Thanks
> Swami
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Object gateway and LDAP Auth

2017-09-05 Thread Josh Haft
Thanks for your suggestions, Matt. ldapsearch functionality from the rados
gw machines works fine using the same parameters specified in ceph.conf
(uri, binddn, searchdn, ldap_secret). As expected I see network traffic
to/from the ldap host when performing a search as well.

The only configuration I have in /etc/openldap/ldap.conf is 'TLSREQCERT
demand' and TLS_CACERTDIR pointing at the location of my certdb... is there
something else required here for ceph-rgw or does it look elsewhere?

Josh
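
For comparison, a minimal sketch of how the LDAP token is normally generated
and consumed (the user name and password are placeholders; radosgw-token and
the environment variables are per the RGW LDAP documentation). The base64 blob
it prints is what boto3 should be handed as aws_access_key_id, with an
arbitrary string as the secret key:

export RGW_ACCESS_KEY_ID="ldapuser"
export RGW_SECRET_ACCESS_KEY="ldappassword"
radosgw-token --encode --ttype=ldap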




On Fri, Sep 1, 2017 at 11:15 PM, Matt Benjamin  wrote:

> Hi Josh,
>
> I'm not certain, but you might try disabling the searchfilter to start
> with.  If you're not seeing traffic, I would focus on verifying ldap
> search connectivity using the same credentials, using the openldap
> client, to rule out something low level.
>
> Matt
>
>
> On Thu, Aug 31, 2017 at 3:33 PM, Josh  wrote:
> > Hello!
> >
> > I've setup LDAP authentication on an object gateway and am attempting to
> > create a bucket via s3 using python's boto3. It works fine using the
> access
> > and secret key for a radosgw user, but access is denied using a token
> > generated via radosgw-token with the LDAP user's credentials. The user
> does
> > exist in the directory (I'm using Active Directory), and I am able to
> query
> > for that user using the creds specified in rgw_ldap_binddn and
> > rgw_ldap_secret.
> >
> > I've bumped the rgw logging to 20 and can see the request come in, but it
> > ultimately gets denied:
> > 2017-08-30 15:44:55.754721 7f4878ff9700  2 req 1:0.76:s3:PUT
> > /foobar:create_bucket:authorizing
> > 2017-08-30 15:44:55.754738 7f4878ff9700 10 v4 signature format = 
> > 2017-08-30 15:44:55.754746 7f4878ff9700 10 v4 credential format =
> > /20170830/us-east-1/s3/aws4_request
> > 2017-08-30 15:44:55.754750 7f4878ff9700 10 access key id = 
> > 2017-08-30 15:44:55.754755 7f4878ff9700 10 credential scope =
> > 20170830/us-east-1/s3/aws4_request
> > 2017-08-30 15:44:55.754769 7f4878ff9700 20 get_system_obj_state:
> > rctx=0x7f4878ff2060 obj=default.rgw.users.keys: state=0x7f48f40131a8
> > s->prefetch_data=0
> > 2017-08-30 15:44:55.754778 7f4878ff9700 10 cache get:
> > name=default.rgw.users.keys+ : miss
> > 2017-08-30 15:44:55.755312 7f4878ff9700 10 cache put:
> > name=default.rgw.users.keys+ info.flags=0
> > 2017-08-30 15:44:55.755321 7f4878ff9700 10 adding
> > default.rgw.users.keys+ to cache LRU end
> > 2017-08-30 15:44:55.755328 7f4878ff9700 10 error reading user info,
> uid=
> > can't authenticate
> > 2017-08-30 15:44:55.755330 7f4878ff9700 10 failed to authorize request
> > 2017-08-30 15:44:55.755331 7f4878ff9700 20 handler->ERRORHANDLER:
> > err_no=-2028 new_err_no=-2028
> > 2017-08-30 15:44:55.755393 7f4878ff9700  2 req 1:0.000747:s3:PUT
> > /foobar:create_bucket:op status=0
> > 2017-08-30 15:44:55.755398 7f4878ff9700  2 req 1:0.000752:s3:PUT
> > /foobar:create_bucket:http status=403
> > 2017-08-30 15:44:55.755402 7f4878ff9700  1 == req done
> > req=0x7f4878ff3710 op status=0 http_status=403 ==
> > 2017-08-30 15:44:55.755409 7f4878ff9700 20 process_request() returned
> -2028
> >
> > I am also running a tcpdump on the machine while I see these log
> messages,
> > but strangely I see no traffic destined for my configured LDAP server.
> > Here's some info on my setup. It seems like I'm missing something very
> > obvious; any help would be appreciated!
> >
> > # rpm -q ceph-radosgw
> > ceph-radosgw-10.2.9-0.el7.x86_64
> >
> > # grep rgw /etc/ceph/ceph.conf
> > [client.rgw.hostname]
> > rgw_frontends = civetweb port=8081s ssl_certificate=/path/to/
> private/key.pem
> > debug rgw = 20
> > rgw_s3_auth_use_ldap = true
> > rgw_ldap_secret = "/path/to/creds/file"
> > rgw_ldap_uri = "ldaps://hostname.domain.com:636"
> > rgw_ldap_binddn = "CN=valid_user,OU=Accounts,DC=domain,DC=com"
> > rgw_ldap_searchdn = "ou=Accounts,dc=domain,dc=com"
> > rgw_ldap_dnattr = "uid"
> > rgw_ldap_searchfilter = "objectclass=user"
> >
> >
> > Thanks,
> > Josh
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

2017-09-05 Thread Tyler Bishop
We had to change these in our cluster for some drives to come up.

_ 

Tyler Bishop 
Founder EST 2007 


O: 513-299-7108 x10 
M: 513-646-5809 
[ http://beyondhosting.net/ | http://BeyondHosting.net ] 


This email is intended only for the recipient(s) above and/or otherwise 
authorized personnel. The information contained herein and attached is 
confidential and the property of Beyond Hosting. Any unauthorized copying, 
forwarding, printing, and/or disclosing any information related to this email 
is prohibited. If you received this message in error, please contact the sender 
and destroy all copies of this email and any attachment(s).

- Original Message -
From: "Andreas Calminder" 
To: "Gregory Farnum" 
Cc: "Ceph Users" 
Sent: Tuesday, September 5, 2017 1:17:32 AM
Subject: Re: [ceph-users] Jewel (10.2.7) osd suicide timeout while deep-scrub

Hi!
Thanks for the pointer about leveldb_compact_on_mount, it took a while
to get everything compacted but after that the deep scrub of the
offending pg went smooth without any suicides. I'm considering using
the compact on mount feature for all our osd's in the cluster since
they're kind of large and thereby kind of slow, sas, but still.
Anyhow, thanks a lot for the help!

/andreas
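
For anyone hitting the same thing, a sketch of the settings discussed in this
thread, using the option names mentioned above (the timeout value is only
illustrative and is best reverted once the offending deep-scrub has
completed):

[osd]
leveldb compact on mount = true
# temporarily raised so the deep-scrub of the oversized PG can finish
osd op thread suicide timeout = 1200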

On 17 August 2017 at 23:48, Gregory Farnum  wrote:
> On Thu, Aug 17, 2017 at 1:02 PM, Andreas Calminder
>  wrote:
>> Hi!
>> Thanks for getting back to me!
>>
>> Clients access the cluster through rgw (s3), we had some big buckets
>> containing a lot of small files. Prior to this happening I removed a
>> semi-stale bucket with a rather large index, 2.5 million objects, all but 30
>> objects didn't actually exist which left the normal radosgw-admin bucket rm
>> command to fail so I had to remove the bucket instances and bucket metadata
>> by hand, leaving the remaining 30 objects floating around in the cluster.
>>
>> I don't have access to the logs at the moment, but I see the deep-scrub
>> starting in the log for osd.34, after a while it starts with
>>
>> 1 heartbeat_map is_healthy
>> 'OSD::osd_op_tp thread $THREADID' had timed out after 15
>>
>> the $THREADID seemingly is the same one as the deep scrub, after a while it
>> will suicide and a lot of operations will happen until the deep scrub tries
>> again for the same pg and the above repeats.
>>
>> The osd disk (we have 1 osd per disk) is rather large and pretty slow so it
>> might be that, but I think the behaviour should've been observed elsewhere
>> in the cluster as well since all osd disks are of the same type and size.
>>
>> One thought I had is to just kill the disk and re-add it since the data is
>> supposed to be replicated to 3 nodes in the cluster, but I kind of want to
>> find out what has happened and have it fixed.
>
> Ah. Some people have also found that compacting the leveldb store
> improves the situation a great deal. In most versions you can do this
> by setting "leveldb_compact_on_mount = true" in the OSD's config file
> and then restarting the daemon. You may also have admin socket
> commands available to trigger it.
>
> I'd try out those and then turn it on again with the high suicide
> timeout and see if things improve.
> -Greg
>
>
>>
>> /andreas
>>
>>
>> On 17 Aug 2017 20:21, "Gregory Farnum"  wrote:
>>
>> On Thu, Aug 17, 2017 at 12:14 AM Andreas Calminder
>>  wrote:
>>>
>>> Thanks,
>>> I've modified the timeout successfully, unfortunately it wasn't enough
>>> for the deep-scrub to finish, so I increased the
>>> osd_op_thread_suicide_timeout even higher (1200s), the deep-scrub
>>> command will however get killed before this timeout is reached, I
>>> figured it was osd_command_thread_suicide_timeout and adjusted it
>>> accordingly and restarted the osd, but it still got killed
>>> approximately 900s after starting.
>>>
>>> The log spits out:
>>> 2017-08-17 09:01:35.723865 7f062e696700  1 heartbeat_map is_healthy
>>> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>>> 2017-08-17 09:01:40.723945 7f062e696700  1 heartbeat_map is_healthy
>>> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>>> 2017-08-17 09:01:45.012105 7f05cceee700  1 heartbeat_map reset_timeout
>>> 'OSD::osd_op_tp thread 0x7f05cceee700' had timed out after 15
>>>
>>> I'm thinking having an osd in a cluster locked for ~900s maybe isn't
>>> the best thing, is there any way of doing this deep-scrub operation
>>> "offline" or in some way that wont affect or get affected by the rest
>>> of the cluster?
>>
>>
>> Deep scrub actually timing out a thread is pretty weird anyway — I think it
>> requires some combination of abnormally large objects/omap indexes and buggy
>> releases.
>>
>> Is there any more information in the log about the thread that's timing out?
>> What's leading you to believe it's the deep scrub? What kind of data is in
>> the pool?
>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/list

[ceph-users] OSD won't start, even created ??

2017-09-05 Thread Phil Schwarz

Hi,
I'm coming back with the same issue as seen in a previous thread (link given),

trying to add a 2TB SATA disk as an OSD:
Using the Proxmox GUI or CLI (commands given below) gives the same (bad) result.

I didn't want to use a direct 'ceph osd create', thus bypassing the pxmfs 
redundant filesystem.


I tried to build an OSD with the same disk on another machine (a stronger one 
with an Opteron quad-core), and it failed in the same way.



Sorry for crossposting, but I think I'm failing against the pveceph wrapper.


Any help or clue would be really useful.

Thanks
Best regards.










-- Link to previous thread (but same problem):
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg38897.html


-- commands :
fdisk /dev/sdc ( mklabel msdos, w, q)
ceph-disk zap /dev/sdc
pveceph createosd /dev/sdc

-- dpkg -l

 dpkg -l | grep ceph
ii  ceph           12.1.2-pve1  amd64  distributed storage and file system
ii  ceph-base      12.1.2-pve1  amd64  common ceph daemon libraries and management tools
ii  ceph-common    12.1.2-pve1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-mgr       12.1.2-pve1  amd64  manager for the ceph distributed storage system
ii  ceph-mon       12.1.2-pve1  amd64  monitor server for the ceph storage system
ii  ceph-osd       12.1.2-pve1  amd64  OSD server for the ceph storage system
ii  libcephfs1     10.2.5-7.2   amd64  Ceph distributed file system client library
ii  libcephfs2     12.1.2-pve1  amd64  Ceph distributed file system client library
ii  python-cephfs  12.1.2-pve1  amd64  Python 2 libraries for the Ceph libcephfs library


-- tail -f /var/log/ceph/ceph-osd.admin.log

2017-09-03 18:28:20.856641 7fad97e45e00  0 ceph version 12.1.2 
(cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process 
(unknown), pid 5493
2017-09-03 18:28:20.857104 7fad97e45e00 -1 bluestore(/dev/sdc2) 
_read_bdev_label unable to decode label at offset 102: 
buffer::malformed_input: void 
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode 
past end of struct encoding
2017-09-03 18:28:20.857200 7fad97e45e00  1 journal _open /dev/sdc2 fd 4: 
2000293007360 bytes, block size 4096 bytes, directio = 0, aio = 0

2017-09-03 18:28:20.857366 7fad97e45e00  1 journal close /dev/sdc2
2017-09-03 18:28:20.857431 7fad97e45e00  0 probe_block_device_fsid 
/dev/sdc2 is filestore, ----
2017-09-03 18:28:21.937285 7fa5766a5e00  0 ceph version 12.1.2 
(cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc), process 
(unknown), pid 5590
2017-09-03 18:28:21.944189 7fa5766a5e00 -1 bluestore(/dev/sdc2) 
_read_bdev_label unable to decode label at offset 102: 
buffer::malformed_input: void 
bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode 
past end of struct encoding
2017-09-03 18:28:21.944305 7fa5766a5e00  1 journal _open /dev/sdc2 fd 4: 
2000293007360 bytes, block size 4096 bytes, directio = 0, aio = 0

2017-09-03 18:28:21.944527 7fa5766a5e00  1 journal close /dev/sdc2
2017-09-03 18:28:21.944588 7fa5766a5e00  0 probe_block_device_fsid 
/dev/sdc2 is filestore, ----

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Mentors for next Outreachy Round

2017-09-05 Thread Ali Maredia
Hey Cephers,

Leo and I are coordinators for Ceph's participation in Outreachy 
(https://www.outreachy.org/), a program similar to the Google Summer of Code 
for groups that are traditionally underrepresented in tech. During the program, 
mentees work on a project for three months under a mentor and with the rest 
of the community.

Outreachy has two rounds each year, one of which is starting in December and
ending in March.

If you have any project ideas you would like to be a mentor for this round,
please send Leo and me a project title and a two to three sentence
description to start with. The deadline for proposing a project for Ceph this 
round is a week from today, Tuesday, September 12th.

If you would like a reference for this summer's Google Summer of Code projects
to get an idea of previous projects, you can see them here:

http://ceph.com/gsoc2017-ideas/

If you have any questions please don't hesitate to ask.

Thanks,

Ali & Leo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] a question about use of CEPH_IOC_SYNCIO in write

2017-09-05 Thread Gregory Farnum
On Fri, Sep 1, 2017 at 7:24 AM,   wrote:
> Hi:
> I want to ask a question about CEPH_IOC_SYNCIO flag.
> I know that when using O_SYNC flag or O_DIRECT flag, write call 
> executes in other two code paths different than using CEPH_IOC_SYNCIO flag.
> And I find the comments about CEPH_IOC_SYNCIO here:
>
> /*
>  * CEPH_IOC_SYNCIO - force synchronous IO
>  *
>  * This ioctl sets a file flag that forces the synchronous IO that
>  * bypasses the page cache, even if it is not necessary.  This is
>  * essentially the opposite behavior of IOC_LAZYIO.  This forces the
>  * same read/write path as a file opened by multiple clients when one
>  * or more of those clients is opened for write.
>  *
>  * Note that this type of sync IO takes a different path than a file
>  * opened with O_SYNC/D_SYNC (writes hit the page cache and are
>  * immediately flushed on page boundaries).  It is very similar to
>  * O_DIRECT (writes bypass the page cache) except that O_DIRECT writes
>  * are not copied (user page must remain stable) and O_DIRECT writes
>  * have alignment restrictions (on the buffer and file offset).
>  */
> #define CEPH_IOC_SYNCIO _IO(CEPH_IOCTL_MAGIC, 5)
>
> My question is:
> 1."This forces the same read/write path as a file opened by multiple 
> clients when one or more of those clients is opened for write." -- Does this 
> mean multiple clients can execute in the same code path when they all use the 
> CEPH_IOC_SYNCIO flag? Will the use of CEPH_IOC_SYNCIO in all clients bring 
> effects such as coherency and performance?

If you're just using the normal interfaces, you don't need to play
around with this. I *think* this ioctl is only so that if you are
using lazyio (which disables the usual cache coherence), you can still
get data IO which is coordinated with other clients.

> 2."...except that O_DIRECT writes are not copied (user page must 
> remain stable)" -- As I know when threads write with CEPH_IOC_SYNCIO flag, 
> the write call will block until ceph osd and mds send back responses. So even 
> with CEPH_IOC_SYNCIO flag(the user pages are not locked here, I guess), but 
> the user cannot use these pages. How can the use of CEPH_IOC_SYNCIO flag make 
> better use of user space memory?

I'm not very familiar with these mechanisms, but I think it's saying
that if you use CEPH_IOC_SYNCIO in an async IO interface, once the
async write returns then it will have done an internal copy and can
use the pages again?
Not really sure...
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD: How many snapshots is too many?

2017-09-05 Thread Florian Haas
Hi everyone,

with the Luminous release out the door and the Labor Day weekend over,
I hope I can kick off a discussion on another issue that has irked me
a bit for quite a while. There doesn't seem to be a good documented
answer to this: what are Ceph's real limits when it comes to RBD
snapshots?

For most people, any RBD image will have perhaps a single-digit number
of snapshots. For example, in an OpenStack environment we typically
have one snapshot per Glance image, a few snapshots per Cinder volume,
and perhaps a few snapshots per ephemeral Nova disk (unless clones are
configured to flatten immediately). Ceph generally performs well under
those circumstances.

However, things sometimes start getting problematic when RBD snapshots
are generated frequently, and in an automated fashion. I've seen Ceph
operators configure snapshots on a daily or even hourly basis,
typically when using snapshots as a backup strategy (where they
promise to allow for very short RTO and RPO). In combination with
thousands or maybe tens of thousands of RBDs, that's a lot of
snapshots. And in such scenarios (and only in those), users have been
bitten by a few nasty bugs in the past — here's an example where the
OSD snap trim queue went berserk in the event of lots of snapshots
being deleted:

http://tracker.ceph.com/issues/9487
https://www.spinics.net/lists/ceph-devel/msg20470.html

It seems to me that there still isn't a good recommendation along the
lines of "try not to have more than X snapshots per RBD image" or "try
not to have more than Y snapshots in the cluster overall". Or is the
"correct" recommendation actually "create as many snapshots as you
might possibly want, none of that is allowed to create any instability
nor performance degradation and if it does, that's a bug"?

Looking forward to your thoughts. Thanks in advance!

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EC pool as a tier/cache pool

2017-09-05 Thread Gregory Farnum
On Fri, Aug 25, 2017 at 3:20 AM, Henrik Korkuc  wrote:
> Hello,
>
> I tried creating tiering with EC pools (EC pool as a cache for another EC
> pool) and end up with "Error ENOTSUP: tier pool 'ecpool' is an ec pool,
> which cannot be a tier". Having overwrite support on EC pools with direct
> support by RBD and CephFS it may be worth having tiering using EC pools
> (e.g. on SSDs). What others think about it? Maybe it is already planned?

This still isn't supported because EC pools with overwrites still
don't support omap or many class operations.
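
For the workload that usually motivates this (RBD or CephFS data on fast EC
pools), a minimal sketch of the luminous-supported alternative to a cache
tier, with pool and image names as placeholders: keep metadata in a replicated
pool and point only the data at the EC pool (which must be backed by BlueStore
OSDs for overwrites):

ceph osd pool set ecpool allow_ec_overwrites true
rbd create --size 100G --data-pool ecpool replicated_pool/myimage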
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] (no subject)

2017-09-05 Thread Gregory Farnum
On Thu, Aug 31, 2017 at 11:51 AM, Marc Roos  wrote:
>
> Should these messages not be gone in 12.2.0?
>
> 2017-08-31 20:49:33.500773 7f5aa1756d40 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
> 2017-08-31 20:49:33.501026 7f5aa1756d40 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
> 2017-08-31 20:49:33.540667 7f5aa1756d40 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore
>
> ceph-selinux-12.2.0-0.el7.x86_64
> ceph-mon-12.2.0-0.el7.x86_64
> collectd-ceph-5.7.1-2.el7.x86_64
> ceph-base-12.2.0-0.el7.x86_64
> ceph-osd-12.2.0-0.el7.x86_64
> ceph-mgr-12.2.0-0.el7.x86_64
> ceph-12.2.0-0.el7.x86_64
> ceph-common-12.2.0-0.el7.x86_64
> ceph-mds-12.2.0-0.el7.x86_64

Yes. We actually produce that in a couple places -- where exactly are
you seeing them?
-Greg

>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs in peered state?

2017-09-05 Thread Gregory Farnum
On Mon, Aug 28, 2017 at 4:05 AM, Yuri Gorshkov  wrote:
> Hi.
>
> When trying to take down a host for maintenance purposes I encountered an
> I/O stall along with some PGs marked 'peered' unexpectedly.
>
> Cluster stats: 96/96 OSDs, healthy prior to incident, 5120 PGs, 4 hosts
> consisting of 24 OSDs each. Ceph version 11.2.0, using standard filestore
> (with LVM journals on SSD) and default crush map. All pools are size 3,
> min_size 2.
>
> Steps to reproduce the problem:
> 0. Cluster is healthy, HEALTH_OK
> 1. Set noout flag to prepare for host removal.
> 2. Begin taking OSDs on one of the hosts down: systemctl stop ceph-osd@$osd.
> 3. Notice the IO has stalled unexpectedly and about 100 PGs total are in
> degraded+undersized+peered state if the host is down.
>
> AFAIK the 'peered' state means that the PG has not been replicated to
> min_size yet, so there is something strange going on. Since we have 4 hosts
> and are using the default crush map, how is it possible that after taking
> one host (or even just some OSDs on that host) down some PGs in the cluster
> are left with less than 2 copies?
>
> Here's the snippet of 'ceph pg dump_stuck' when this happened. Sadly I don't
> have any more information yet...
>
> # ceph pg dump|grep peered
> dumped all in format plain
> 3.c80   173  0  346   692   0   715341824
> 1004110041 undersized+degraded+remapped+backfill_wait+peered 2017-08-02
> 19:12:39.319222  12124'104727   12409:62777 [62,76,44] 62[2]
> 21642'32485 2017-07-18 22:57:06.2637271008'135 2017-07-09
> 22:34:40.893182
> 3.204   184  0  368   649   0   769544192
> 1006510065 undersized+degraded+remapped+backfill_wait+peered 2017-08-02
> 19:12:39.334905   12124'13665   12409:37345  [75,52,1] 75[2]
> 2 1375'4316 2017-07-18 00:10:27.601548   1371'2740 2017-07-12
> 07:48:34.953831
> 11.19 25525  051050 78652   0 14829768529
> 1005910059 undersized+degraded+remapped+backfill_wait+peered 2017-08-02
> 19:12:39.311612  12124'156267  12409:137128 [56,26,14] 56   [18]
> 181375'28148 2017-07-17 20:27:04.916079 0'0 2017-07-10
> 16:12:49.270606

Well, are those listed OSDs all on different hosts, or are they on the
same host? Kind of sounds (and look) like your CRUSH map is separating
copies across hard drives rather than across hosts. (This could happen
if you initially created your cluster with only one host or
something.)
-Greg
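
A quick way to check this (the rule name and PG id below are examples from
this thread): confirm the replication rule actually separates on the host
bucket and see where one of the stuck PGs maps.

ceph osd crush rule dump replicated_ruleset   # look for "type": "host" in the chooseleaf step
ceph osd tree                                 # confirm OSDs are nested under per-host buckets
ceph pg map 3.c80                             # up/acting OSDs for one of the stuck PGs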

>
> --
> Sincerely,
> Yuri Gorshkov
> Systems Engineer
> SmartLabs LLC
> +7 (495) 645-44-46 ext. 6926
> ygorsh...@smartlabs.tv
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD: How many snapshots is too many?

2017-09-05 Thread Gregory Farnum
On Tue, Sep 5, 2017 at 1:44 PM, Florian Haas  wrote:
> Hi everyone,
>
> with the Luminous release out the door and the Labor Day weekend over,
> I hope I can kick off a discussion on another issue that has irked me
> a bit for quite a while. There doesn't seem to be a good documented
> answer to this: what are Ceph's real limits when it comes to RBD
> snapshots?
>
> For most people, any RBD image will have perhaps a single-digit number
> of snapshots. For example, in an OpenStack environment we typically
> have one snapshot per Glance image, a few snapshots per Cinder volume,
> and perhaps a few snapshots per ephemeral Nova disk (unless clones are
> configured to flatten immediately). Ceph generally performs well under
> those circumstances.
>
> However, things sometimes start getting problematic when RBD snapshots
> are generated frequently, and in an automated fashion. I've seen Ceph
> operators configure snapshots on a daily or even hourly basis,
> typically when using snapshots as a backup strategy (where they
> promise to allow for very short RTO and RPO). In combination with
> thousands or maybe tens of thousands of RBDs, that's a lot of
> snapshots. And in such scenarios (and only in those), users have been
> bitten by a few nasty bugs in the past — here's an example where the
> OSD snap trim queue went berserk in the event of lots of snapshots
> being deleted:
>
> http://tracker.ceph.com/issues/9487
> https://www.spinics.net/lists/ceph-devel/msg20470.html
>
> It seems to me that there still isn't a good recommendation along the
> lines of "try not to have more than X snapshots per RBD image" or "try
> not to have more than Y snapshots in the cluster overall". Or is the
> "correct" recommendation actually "create as many snapshots as you
> might possibly want, none of that is allowed to create any instability
> nor performance degradation and if it does, that's a bug"?

I think we're closer to "as many snapshots as you want", but there are
some known shortcomings there.

First of all, if you haven't seen my talk from the last OpenStack
summit on snapshots and you want a bunch of details, go watch that. :p
https://www.openstack.org/videos/boston-2017/ceph-snapshots-for-fun-and-profit-1

There are a few dimensions there can be failures with snapshots:
1) right now the way we mark snapshots as deleted is suboptimal — when
deleted they go into an interval_set in the OSDMap. So if you have a
bunch of holes in your deleted snapshots, it is possible to inflate
the osdmap to a size which causes trouble. But I'm not sure if we've
actually seen this be an issue yet — it requires both a large cluster,
and a large map, and probably some other failure causing osdmaps to be
generated very rapidly.

2) There may be issues with how rbd records what snapshots it is
associated with? No idea about this; haven't heard of any.

3) Trimming snapshots requires IO. This is where most (all?) of the
issues I've seen have come from; either in it being unscheduled IO
that the rest of the system doesn't account for or throttle (as in the
links you highlighted) or in admins overwhelming the IO capacity of
their clusters.
At this point I think we've got everything being properly scheduled so
it shouldn't break your cluster, but you can build up large queues of
deferred work.

-Greg
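
For reference, the knobs that bound that deferred trim work in luminous look
roughly like this (option names as shipped in luminous; the values are only
illustrative and the defaults are reasonable for most clusters):

[osd]
# pause between trimming individual snapshot objects, in seconds
osd snap trim sleep = 0.05
# how many PGs per OSD may trim concurrently
osd max trimming pgs = 2
# scheduler priority of snap trim work relative to client I/O
osd snap trim priority = 5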

>
> Looking forward to your thoughts. Thanks in advance!
>
> Cheers,
> Florian
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs in peered state?

2017-09-05 Thread Yuri Gorshkov
I'm using the default (host level) crush map so that shouldn't be the case.
Nothing was misplaced, etc.
And yes, judging by the pg dump output these OSDs were on different hosts.

I was thinking, maybe this has to do with OSDs not having a consistent
state somehow? Or some pgmap issues?


On 6 Sep 2017 at 12:31 AM, "Gregory Farnum"  
wrote:

On Mon, Aug 28, 2017 at 4:05 AM, Yuri Gorshkov 
wrote:
> Hi.
>
> When trying to take down a host for maintenance purposes I encountered an
> I/O stall along with some PGs marked 'peered' unexpectedly.
>
> Cluster stats: 96/96 OSDs, healthy prior to incident, 5120 PGs, 4 hosts
> consisting of 24 OSDs each. Ceph version 11.2.0, using standard filestore
> (with LVM journals on SSD) and default crush map. All pools are size 3,
> min_size 2.
>
> Steps to reproduce the problem:
> 0. Cluster is healthy, HEALTH_OK
> 1. Set noout flag to prepare for host removal.
> 2. Begin taking OSDs on one of the hosts down: systemctl stop ceph-osd@
$osd.
> 3. Notice the IO has stalled unexpectedly and about 100 PGs total are in
> degraded+undersized+peered state if the host is down.
>
> AFAIK the 'peered' state means that the PG has not been replicated to
> min_size yet, so there is something strange going on. Since we have 4
hosts
> and are using the default crush map, how is it possible that after taking
> one host (or even just some OSDs on that host) down some PGs in the
cluster
> are left with less than 2 copies?
>
> Here's the snippet of 'ceph pg dump_stuck' when this happened. Sadly I
don't
> have any more information yet...
>
> # ceph pg dump|grep peered
> dumped all in format plain
> 3.c80   173    0      346    692    0  715341824    10041  10041  undersized+degraded+remapped+backfill_wait+peered  2017-08-02 19:12:39.319222  12124'104727  12409:62777   [62,76,44]  62  [2]   2   1642'32485  2017-07-18 22:57:06.263727  1008'135   2017-07-09 22:34:40.893182
> 3.204   184    0      368    649    0  769544192    10065  10065  undersized+degraded+remapped+backfill_wait+peered  2017-08-02 19:12:39.334905  12124'13665   12409:37345   [75,52,1]   75  [2]   2   1375'4316   2017-07-18 00:10:27.601548  1371'2740  2017-07-12 07:48:34.953831
> 11.19   25525  0      51050  78652  0  14829768529  10059  10059  undersized+degraded+remapped+backfill_wait+peered  2017-08-02 19:12:39.311612  12124'156267  12409:137128  [56,26,14]  56  [18]  18  1375'28148  2017-07-17 20:27:04.916079  0'0        2017-07-10 16:12:49.270606

Well, are those listed OSDs all on different hosts, or are they on the
same host? It kind of sounds (and looks) like your CRUSH map is separating
copies across hard drives rather than across hosts. (This could happen
if you initially created your cluster with only one host or
something.)
-Greg
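
As a rough sketch of how to double-check which failure domain the rules
actually use (generic commands, nothing cluster-specific assumed):

  # Check which bucket type each CRUSH rule separates replicas across;
  # look for "type": "host" (rather than "osd") in the chooseleaf/choose step:
  ceph osd crush rule dump

  # And confirm the OSDs really are grouped under four host buckets:
  ceph osd tree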

>
> --
> Sincerely,
> Yuri Gorshkov
> Systems Engineer
> SmartLabs LLC
> +7 (495) 645-44-46 ext. 6926
> ygorsh...@smartlabs.tv
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous Upgrade KRBD

2017-09-05 Thread Ashley Merrick
Hello,

I have recently upgraded a cluster to Luminous (running Proxmox), and at the same
time upgraded the compute cluster to Proxmox 5.x, so we now run the latest
kernel version (Linux 4.10.15-1). I am looking to do the following:

ceph osd set-require-min-compat-client luminous

Below is the output of ceph features. The count of 4 next to the luminous row is
as expected for the 4 compute nodes. Are the other 4 clients, spread across
hammer and jewel, just records of when those nodes last connected before they
were upgraded to Proxmox 5.0, and am I safe to run the above command? No other
RBD resources are connected to this cluster.

  "client": {
"group": {
"features": "0x106b84a842a42",
"release": "hammer",
"num": 1
},
"group": {
"features": "0x40106b84a842a52",
"release": "jewel",
"num": 3
},
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 4
}
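
For what it's worth, a cautious sequence here is to re-check the connected
clients immediately before making the change; a rough sketch (the force flag
is only needed if pre-luminous clients are still listed):

  # See which client releases are currently connected:
  ceph features

  # If only luminous-capable clients remain, set the requirement:
  ceph osd set-require-min-compat-client luminous

  # The monitors will refuse if older clients are still connected; that can be
  # overridden (at the risk of locking those clients out) with:
  ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it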


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Developers Monthly - September

2017-09-05 Thread Leonardo Vaz
On Wed, Aug 30, 2017 at 01:04:51AM -0300, Leonardo Vaz wrote:
> Hey Cephers,
> 
> This is just a friendly reminder that the next Ceph Developer Monthly
> meeting is coming up:
> 
>  http://wiki.ceph.com/Planning
> 
> If you have work that you're doing that is feature work, significant
> backports, or anything you would like to discuss with the core team,
> please add it to the following page:
> 
>  http://wiki.ceph.com/CDM_06-SEP-2017
> 
> If you have questions or comments, please let us know.

Hey cephers,

The Ceph Developer Monthly is confirmed for tonight, September 6 at 9pm
Eastern Time (EDT), in an APAC-friendly time slot.

If you have any topic to discuss at this meeting, please add it
to the following pad:

   http://tracker.ceph.com/projects/ceph/wiki/CDM_06-SEP-2017

We will use the following Bluejeans URL for the video conference:

  https://bluejeans.com/707503600

Kindest regards,

Leo

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous BlueStore EC performance

2017-09-05 Thread Blair Bethwaite
Hi all,

(Sorry if this shows up twice - I got auto-unsubscribed and so first
attempt was blocked)

I'm keen to read up on some performance comparisons for replication versus
EC on HDD+SSD based setups. So far the only recent thing I've found is
Sage's Vault17 slides [1], which have a single slide showing 3X / EC42 /
EC51 for Kraken. I guess there is probably some of this data to be found in
the performance meeting threads, but it's hard to know the currency of
those (typically master or wip branch tests) with respect to releases. Can
anyone point out any other references or highlight something that's coming?

I'm sure there are piles of operators and architects out there at the
moment wondering how they could and should reconfigure their clusters once
upgraded to Luminous. A couple of things going around in my head at the
moment:

* We want to get to having the bulk of our online storage in CephFS on EC
pool/s (see the sketch below)...
*-- is overwrite performance on EC acceptable for near-line NAS use-cases?
*-- recovery implications (currently recovery on our Jewel RGW EC83 pool is
_way_ slower than on 3X pools; what does this do to reliability? maybe split
capacity into multiple pools if that helps to contain failures?)
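
For reference, the Luminous workflow for putting CephFS data on an EC pool
looks roughly like the sketch below; the pool name, profile, k/m values and
filesystem/mount paths are purely illustrative, and EC overwrites require
BlueStore OSDs:

  # Create an EC profile and pool (example values only):
  ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
  ceph osd pool create cephfs_ec 1024 1024 erasure ec42

  # Allow overwrites so CephFS (or RBD) can write to it directly:
  ceph osd pool set cephfs_ec allow_ec_overwrites true

  # Attach it as an additional data pool and direct a directory at it:
  ceph fs add_data_pool cephfs cephfs_ec
  setfattr -n ceph.dir.layout.pool -v cephfs_ec /mnt/cephfs/bulk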

[1]
https://www.slideshare.net/sageweil1/bluestore-a-new-storage-backend-for-ceph-one-year-in/37

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous Upgrade KRBD

2017-09-05 Thread Henrik Korkuc

On 17-09-06 07:33, Ashley Merrick wrote:

Hello,

I have recently upgraded a cluster to Luminous (running Proxmox), and at the
same time upgraded the compute cluster to Proxmox 5.x, so we now run the
latest kernel version (Linux 4.10.15-1). I am looking to do the following:


ceph osd set-require-min-compat-client luminous

Does the 4.10 kernel support luminous features? I am afraid (though I don't
have the info to back it up) that 4.10 is too old for Luminous features.


Below is the output of ceph features. The count of 4 next to the luminous row
is as expected for the 4 compute nodes. Are the other 4 clients, spread across
hammer and jewel, just records of when those nodes last connected before they
were upgraded to Proxmox 5.0, and am I safe to run the above command? No other
RBD resources are connected to this cluster.


  "client": {
        "group": {
            "features": "0x106b84a842a42",
            "release": "hammer",
            "num": 1
        },
        "group": {
            "features": "0x40106b84a842a52",
            "release": "jewel",
            "num": 3
        },
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 4
        }




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous Upgrade KRBD

2017-09-05 Thread Ashley Merrick
I was just going by docs.ceph.com/docs/master/start/os-recommendations/,
which states 4.9.

docs.ceph.com/docs/master/rados/operations/crush-map only goes as far as
Jewel and states 4.5.

Not sure where else I can find a concrete answer on whether 4.10 is new enough.
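
For reference, one way to get an empirical answer is to see what the running
kernel client actually reports once it connects; a rough sketch (pool/image
names are illustrative):

  # From a compute node, map any test image with the 4.10 kernel client:
  rbd map rbd/test-img

  # Then, from a monitor node, check which release bucket that session lands in:
  ceph features

  # If the kernel client is still counted under "jewel", raising
  # require-min-compat-client to luminous would lock it out.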


,Ashley


From: Henrik Korkuc 
Sent: 06 September 2017 06:58:52
To: Ashley Merrick; ceph-us...@ceph.com
Subject: Re: [ceph-users] Luminous Upgrade KRBD

On 17-09-06 07:33, Ashley Merrick wrote:
Hello,

I have recently upgraded a cluster to Luminous (running Proxmox), and at the
same time upgraded the compute cluster to Proxmox 5.x, so we now run the
latest kernel version (Linux 4.10.15-1). I am looking to do the following:

ceph osd set-require-min-compat-client luminous

Does the 4.10 kernel support luminous features? I am afraid (though I don't
have the info to back it up) that 4.10 is too old for Luminous features.

Below is the output of ceph features. The count of 4 next to the luminous row
is as expected for the 4 compute nodes. Are the other 4 clients, spread across
hammer and jewel, just records of when those nodes last connected before they
were upgraded to Proxmox 5.0, and am I safe to run the above command? No other
RBD resources are connected to this cluster.

  "client": {
"group": {
"features": "0x106b84a842a42",
"release": "hammer",
"num": 1
},
"group": {
"features": "0x40106b84a842a52",
"release": "jewel",
"num": 3
},
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 4
}





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com