Re: [ceph-users] Monitors - proactive questions about quantity, placement and protection

2015-12-12 Thread Wido den Hollander
On 12/11/2015 08:12 PM, Alex Gorbachev wrote:
> This is a proactive message to summarize best practices and options
> working with monitors, especially in a larger production environment
> (larger for me is > 3 racks).
> 
> I know MONs do not require a lot of resources, but it is preferable to run
> them on SSDs for response time.  Also, you need an odd number, as a simple
> majority must be present for quorum.  MONs use leveldb, and that data is
> constantly changing, so traditional backups are not relevant/useful.
> 
> There has been uncertainty about whether more than 3 MONs will cause any
> performance issues in a cluster.  To that end, may I ask both
> the Ceph development community and the excellent power-user contributors
> on this list:
> 
> - Is there any performance impact to running > 3 MONs?
> 

No, but as your cluster grows larger (beyond roughly 100k PGs) you might need
additional monitors to handle all the PG stats.
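
If you want a quick look at where you stand relative to that figure, the PG
count is part of the standard status output (exact formatting varies a bit
between releases):

$ ceph pg stat            # one-line PG summary (version, total pgs, states)
$ ceph -s | grep pgmap    # same count as part of the overall cluster status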

> - Is anyone running > 3 MONs in production and what are your experiences?
> 

Yes, running 5 in multiple setups with >1000 OSDs. Works just fine.
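
For what it's worth, with ceph-deploy going from 3 to 5 monitors is roughly
the following (mon4 and mon5 are placeholder hostnames):

$ ceph-deploy mon add mon4
$ ceph-deploy mon add mon5
$ ceph quorum_status --format json-pretty   # check that all 5 joined quorum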

> - Has anyone had a need to back up their MONs and any recovery
> experience, such
> as http://blog.widodh.nl/2014/03/safely-backing-up-your-ceph-monitors ?  
> 

I do it sometimes, purely for disaster-recovery purposes. I have never needed
the backups to recover a cluster.

What you could do every day is:

$ ceph osd dump -o osdmap
$ ceph pg dump -o pgmap
$ ceph mon getmap -o monmap
$ ceph osd getcrushmap -o crushmap

That will give you a copy of the core cluster maps (OSD, PG, monitor and CRUSH).
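
A rough sketch of wrapping that into a daily cron job (the backup path and
retention below are placeholders, pick your own):

#!/bin/sh
# Dump the core cluster maps into a dated directory.
DIR=/var/backups/ceph/$(date +%Y-%m-%d)
mkdir -p "$DIR"
ceph osd dump -o "$DIR/osdmap"
ceph pg dump -o "$DIR/pgmap"
ceph mon getmap -o "$DIR/monmap"
ceph osd getcrushmap -o "$DIR/crushmap"
# Keep two weeks of history.
find /var/backups/ceph -mindepth 1 -maxdepth 1 -type d -mtime +14 -exec rm -rf {} +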

> Our cluster has 8 racks right now, and I would love to place a MON at
> the top of the rack (maybe on SDN switches in the future - why not?). 
> Thank you for helping answer these questions.
> 
> --
> Alex Gorbachev
> Storcium
> 
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04

2015-12-12 Thread Claes Sahlström
Just to share with the rest of the list, my problems have been solved now.

I got this information from Sergey Malinin who had the same problem:

1. Stop OSD daemons on all nodes.

2. Check the output of "ceph osd tree". You will see some of the OSDs still
showing as "up" even though the daemons are stopped - mark those down using
"ceph osd down osd.X"

3. Start OSD daemons on all nodes - your cluster should now become operational
(the whole sequence is sketched as a script below).
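
In script form the three steps look roughly like this; it is only a sketch,
assuming upstart-managed daemons as on Ubuntu 14.04 (on systemd hosts the
stop/start lines would use systemctl instead), with osd.12 as an example ID:

# on every OSD node
$ sudo stop ceph-osd-all

# from an admin node: see which OSDs the monitors still consider up
$ ceph osd tree | grep up

# mark each of those down by hand, e.g.
$ ceph osd down osd.12

# on every OSD node again
$ sudo start ceph-osd-all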

It was probably the "mon_osd_min_up_ratio" setting that messed up my upgrade.
I am happily running Infernalis now.
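
If anyone wants to check what their monitors are actually using for that
setting, the admin socket will tell you (run this on a monitor host; mon.a is
a placeholder name, and 0.3 is the default as far as I know):

$ ceph daemon mon.a config get mon_osd_min_up_ratio
{ "mon_osd_min_up_ratio": "0.3" }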

Thanks all for the help and effort.

/Claes


Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu 14.04

2015-12-12 Thread Josef Johansson
Thanks for sharing your solution as well!

Happy holidays
/Josef
On 12 Dec 2015 12:56 pm, "Claes Sahlström"  wrote:

> Just to share with the rest of the list, my problems have been solved now.
>
>
>
> I got this information from Sergey Malinin who had the same problem:
>
> 1. Stop OSD daemons on all nodes.
>
> 2. Check the output of "ceph osd tree". You will see some of the OSDs still
> showing as "up" even though the daemons are stopped - mark those down using
> "ceph osd down osd.X"
>
> 3. Start OSD daemons on all nodes - your cluster should now become
> operational.
>
>
>
> It was probably the “mon_osd_min_up_ratio” setting that messed up my upgrade.
> I am happily running Infernalis now.
>
>
>
> Thanks all for the help and effort.
>
>
>
> /Claes
>


Re: [ceph-users] Kernel RBD hang on OSD Failure

2015-12-12 Thread Ilya Dryomov
On Sat, Dec 12, 2015 at 6:37 PM, Tom Christensen  wrote:
> We had a kernel map get hung up again last night/this morning.  The rbd is
> mapped but unresponsive, if I try to unmap it I get the following error:
> rbd: sysfs write failed
> rbd: unmap failed: (16) Device or resource busy
>
> Now that this has happened attempting to map another RBD fails, using lsblk
> fails as well, both of these tasks just hang forever.
>
> We have 1480 OSDs in the cluster, so posting the full osdmap seems excessive;
> here is the beginning (it didn't change across 5 runs):
> root@wrk-slc-01-02:~# cat
> /sys/kernel/debug/ceph/f3b7f409-e061-4e39-b4d0-ae380e29ae7e.client55440310/osdmap
> epoch 1284256
> flags
> pool 0 pg_num 2048 (2047) read_tier -1 write_tier -1
> pool 1 pg_num 512 (511) read_tier -1 write_tier -1
> pool 3 pg_num 2048 (2047) read_tier -1 write_tier -1
> pool 4 pg_num 512 (511) read_tier -1 write_tier -1
> pool 5 pg_num 32768 (32767) read_tier -1 write_tier -1
>
> Here is osdc output, it is not changed after 5 runs:
>
> root@wrk-slc-01-02:~# cat
> /sys/kernel/debug/ceph/f3b7f409-e061-4e39-b4d0-ae380e29ae7e.client55440310/osdc
> 93835   osd1206 5.6841959c  rbd_data.34df3ac703ced61.1dff
> read
> 9065810 osd1382 5.a50fa0ea  rbd_header.34df3ac703ced61
> 474103'5506530325561344 watch
> root@wrk-slc-01-02:~# cat
> /sys/kernel/debug/ceph/f3b7f409-e061-4e39-b4d0-ae380e29ae7e.client55440310/osdc
> 93835   osd1206 5.6841959c  rbd_data.34df3ac703ced61.1dff
> read
> 9067286 osd1382 5.a50fa0ea  rbd_header.34df3ac703ced61
> 474103'5506530325561344 watch
> root@wrk-slc-01-02:~# cat
> /sys/kernel/debug/ceph/f3b7f409-e061-4e39-b4d0-ae380e29ae7e.client55440310/osdc
> 93835   osd1206 5.6841959c  rbd_data.34df3ac703ced61.1dff
> read
> 9067831 osd1382 5.a50fa0ea  rbd_header.34df3ac703ced61
> 474103'5506530325561344 watch
> root@wrk-slc-01-02:~# ls /dev/rbd/rbd
> none  volume-daac5f12-e39b-4d64-a4fa-86c810aeb72d
> volume-daac5f12-e39b-4d64-a4fa-86c810aeb72d-part1
> root@wrk-slc-01-02:~# rbd info volume-daac5f12-e39b-4d64-a4fa-86c810aeb72d
> rbd image 'volume-daac5f12-e39b-4d64-a4fa-86c810aeb72d':
> size 61439 MB in 7680 objects
> order 23 (8192 kB objects)
> block_name_prefix: rbd_data.34df3ac703ced61
> format: 2
> features: layering
> flags:
> parent:
> rbd/volume-93d9a102-260e-4500-b87d-9696c7fc2b67@snapshot-9ba998b6-ca57-40dd-8895-265023132e99
> overlap: 61439 MB
>
> ceph status indicates the current osdmap epoch
> osdmap e1284866: 1480 osds: 1480 up, 1480 in
> pgmap v10231386: 37888 pgs, 5 pools, 745 TB data, 293 Mobjects
>
> root@wrk-slc-01-02:~# uname -r
> 3.19.0-25-generic
>
> So, the kernel driver is some 600 epochs behind current.  This does seem to
> be load related: we've been running 4 different kernels on our clients in
> our test environment and have not been able to reproduce it there in a
> little over a week, while our production environment has had 2 of these
> hangs in the last 4 days.  Unfortunately I wasn't able to grab data from
> the first one.

If you haven't already nuked it, what's the output of:

$ ceph osd map <pool> rbd_data.34df3ac703ced61.1dff
$ ceph osd map <pool> rbd_header.34df3ac703ced61

$ ceph daemon osd.1206 ops
$ ceph daemon osd.1206 objecter_requests
$ ceph daemon osd.1206 dump_ops_in_flight
$ ceph daemon osd.1206 dump_historic_ops

and repeat for osd.1382.
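
One note, since I'm assuming a multi-host layout: the "ceph daemon" calls go
through the local admin socket, so they have to be run on whichever host
carries the OSD. You can locate that host first with something like:

$ ceph osd find 1206                         # JSON output includes the host
$ ssh <that-host> ceph daemon osd.1206 dump_ops_in_flight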

Thanks,

Ilya