Sam, the logs are rather large. Where should I post them? 

Thanks 
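
In the meantime, in case it helps keep the upload small, here is a rough 
Python sketch for cutting a large ceph.log down to just the window around 
an osd-down event and gzipping it. The log path, the match strings and the 
10-minute window are all assumptions, so adjust them to your setup: 

#!/usr/bin/env python3
# Rough sketch: find the first "marked down" event in ceph.log and write a
# gzipped slice of the log around it, so only the relevant window needs to
# be posted.  Path, match strings and window size are assumptions.
import gzip
from datetime import datetime, timedelta

LOG = "/var/log/ceph/ceph.log"            # cluster log on a mon host (assumed)
OUT = "/tmp/ceph-osd-down-window.log.gz"
WINDOW = timedelta(minutes=10)            # context before/after the event

def stamp(line):
    # ceph.log lines start with "YYYY-MM-DD HH:MM:SS.ffffff"
    try:
        return datetime.strptime(" ".join(line.split()[:2]),
                                 "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None

with open(LOG) as f:
    lines = f.readlines()

# first moment an osd was reported/marked down
event = next((stamp(l) for l in lines
              if "marked down" in l or "wrongly marked me down" in l), None)
if event is None:
    raise SystemExit("no 'marked down' events found in %s" % LOG)

with gzip.open(OUT, "wt") as out:
    for l in lines:
        t = stamp(l)
        if t is not None and abs(t - event) <= WINDOW:
            out.write(l)

print("wrote %s covering +/- %s around %s" % (OUT, WINDOW, event))
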
----- Original Message -----

From: "Samuel Just" <sam.j...@inktank.com> 
To: "Andrei Mikhailovsky" <and...@arhont.com> 
Cc: ceph-users@lists.ceph.com 
Sent: Tuesday, 18 November, 2014 7:54:56 PM 
Subject: Re: [ceph-users] Giant upgrade - stability issues 

Ok, why is ceph marking osds down? Post your ceph.log from one of the 
problematic periods. 
-Sam 

On Tue, Nov 18, 2014 at 1:35 AM, Andrei Mikhailovsky <and...@arhont.com> wrote: 
> Hello cephers, 
> 
> I need your help and suggestions on what is going on with my cluster. A few 
> weeks ago I upgraded from Firefly to Giant. I've previously written about 
> having issues with Giant where, within a two-week period, the cluster's IO 
> froze three times after ceph marked two osds down. In total I have just 17 
> osds across two osd servers, plus 3 mons. The cluster is running on Ubuntu 
> 12.04 with the latest updates. 
> 
> I've got zabbix agents monitoring the osd servers and the cluster, and I 
> get alerts for any issues, such as problems with PGs. Since upgrading to 
> Giant, I am frequently seeing emails, around 10-15 per day, alerting that 
> the cluster has degraded PGs. The number of degraded PGs varies from a 
> couple to over a thousand. After several minutes the cluster repairs 
> itself. The total number of PGs in the cluster is 4412 across all pools. 
> 
> I am also seeing more alerts from vms reporting high IO wait and hung 
> tasks. Some vms are reporting over 50% io wait. 
> 
> This did not happen on Firefly or previous releases of ceph. Not much has 
> changed in the cluster since the upgrade to Giant: the networking and 
> hardware are the same, it is still running the same version of Ubuntu, and 
> the cluster load hasn't changed either. Thus, I think the issues above are 
> related to the upgrade to Giant. 
> 
> Here is the ceph.conf that I use: 
> 
> [global] 
> fsid = 51e9f641-372e-44ec-92a4-b9fe55cbf9fe 
> mon_initial_members = arh-ibstorage1-ib, arh-ibstorage2-ib, arh-cloud13-ib 
> mon_host = 192.168.168.200,192.168.168.201,192.168.168.13 
> auth_supported = cephx 
> osd_journal_size = 10240 
> filestore_xattr_use_omap = true 
> public_network = 192.168.168.0/24 
> rbd_default_format = 2 
> osd_recovery_max_chunk = 8388608 
> osd_recovery_op_priority = 1 
> osd_max_backfills = 1 
> osd_recovery_max_active = 1 
> osd_recovery_threads = 1 
> filestore_max_sync_interval = 15 
> filestore_op_threads = 8 
> filestore_merge_threshold = 40 
> filestore_split_multiple = 8 
> osd_disk_threads = 8 
> osd_op_threads = 8 
> osd_pool_default_pg_num = 1024 
> osd_pool_default_pgp_num = 1024 
> osd_crush_update_on_start = false 
> 
> [client] 
> rbd_cache = true 
> admin_socket = /var/run/ceph/$name.$pid.asok 
> 
> 
> I would like to get to the bottom of these issues. I am not sure whether 
> they could be fixed by changing some settings in ceph.conf or whether a 
> full downgrade back to Firefly is needed. Is a downgrade even possible on a 
> production cluster? 
> 
> Thanks for your help 
> 
> Andrei 
> 
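For reference, regarding the ceph.conf above: here is a minimal sketch to 
confirm which values the restarted Giant OSDs are actually running with. 
The OSD admin socket path is the assumed default and the option list is 
only illustrative; "config show" over the admin socket is the query used: 

#!/usr/bin/env python3
# Sketch: dump a few options of interest from each running OSD's admin
# socket to confirm what was picked up from ceph.conf after the upgrade.
# Socket glob and option list are assumptions.
import glob
import json
import subprocess

OPTIONS = ["osd_max_backfills", "osd_recovery_max_active",
           "osd_recovery_op_priority", "osd_heartbeat_grace"]

for sock in sorted(glob.glob("/var/run/ceph/ceph-osd.*.asok")):
    # full running config of this OSD as a JSON object
    out = subprocess.check_output(
        ["ceph", "--admin-daemon", sock, "config", "show"])
    cfg = json.loads(out.decode())
    print(sock)
    for opt in OPTIONS:
        print("  %s = %s" % (opt, cfg.get(opt)))
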

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
