Sam, 

I've done more network testing, this time over 2 days, and I believe I have 
enough evidence to conclude that the osd disconnects are not caused by the 
network. I ran about 140 million TCP connects against each osd and host 
server over the course of roughly two days, generating about 800-900 
connections per second. I did not have a single error or packet drop, and 
the latency and its standard deviation were minimal. 
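
For reference, this 2-day run used essentially the same nping invocation as 
the earlier 12-hour test quoted below, just with a larger connection count; 
something along these lines (the exact count and output filename here are 
illustrative and may have differed slightly): 

nping --tcp-connect -p 22 --delay 1ms <hostname> -v2 -c 144000000 \
  | gzip >/root/nping-hostname-2day.gz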

While the tests were running I did see a number of osds being marked down 
by other osds. According to the logs this happened at least 3 times over 
the two days. However, this time the cluster IO remained available: the 
osds simply rejoined with a message saying they had been wrongly marked 
down. 
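
In case it helps to quantify the flapping, a rough way I could count these 
events on the osd servers (assuming the default log locations, and that the 
"wrongly marked me down" wording is the right thing to grep for) would be 
something like: 

grep -c "wrongly marked me down" /var/log/ceph/ceph-osd.*.log

and then match the timestamps up against the failure reports in the mons' 
cluster log. 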

I was not able to enable full debug logging on the cluster as it would have 
consumed the available disk space in under 30 minutes, so I am not really 
sure how to debug this particular problem. 
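
One thing I could try, if you think it is worth it, is raising the debug 
levels you suggested on just one or two of the suspect osds (e.g. osd.12) 
for a short window via injectargs, then dropping them back afterwards; 
something along these lines (the revert values below are a guess at the 
defaults): 

ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 20 --debug-filestore 20'
# ...wait for the next flap, then revert:
ceph tell osd.12 injectargs '--debug-osd 0/5 --debug-ms 0/5 --debug-filestore 0/5'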

What I have done since is reboot both osd servers, and so far I've not seen 
any osd disconnects; the servers have been up for 3 days already. Perhaps 
the problem comes down to kernel stability, but if that were the case I 
would have expected to see similar issues on Firefly, which I did not. Not 
sure what to think now. 
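
If kernel stability does turn out to be the culprit, I guess the thing to 
do is note the running kernel version and keep an eye on the kernel logs on 
both osd servers, e.g. (the grep patterns are just examples): 

uname -r
grep -iE 'hung|error|ib0' /var/log/kern.log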

Andrei 
----- Original Message -----

> From: "Andrei Mikhailovsky" <and...@arhont.com>
> To: sj...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Sent: Thursday, 20 November, 2014 4:50:21 PM
> Subject: Re: [ceph-users] Giant upgrade - stability issues

> Thanks, I will try that.

> Andrei
> ----- Original Message -----

> From: "Samuel Just" <sam.j...@inktank.com>
> To: "Andrei Mikhailovsky" <and...@arhont.com>
> Cc: ceph-users@lists.ceph.com
> Sent: Thursday, 20 November, 2014 4:26:00 PM
> Subject: Re: [ceph-users] Giant upgrade - stability issues

> You can try to capture logging at

> debug osd = 20
> debug ms = 20
> debug filestore = 20

> while an osd is misbehaving.
> -Sam

> On Thu, Nov 20, 2014 at 7:34 AM, Andrei Mikhailovsky
> <and...@arhont.com> wrote:
> > Sam,
> >
> > further to your email I have done the following:
> >
> > 1. Upgraded both osd servers with the latest updates and restarted
> > each
> > server in turn
> > 2. fired up the nping utility to generate TCP connections (3-way
> > handshake) from each of the servers as well as from the host servers.
> > In total I've run 5 tests. The nping utility was establishing
> > connections on port 22 (as all servers have this port open) with a
> > delay of 1ms. The command used to generate the traffic was as follows:
> >
> > nping --tcp-connect -p 22 --delay 1ms <hostname> -v2 -c 36000000 |
> > gzip
> >>/root/nping-hostname-output.gz
> >
> > The tests took just over 12 hours to complete. The results did not
> > show any problems as far as I can see. Here is the tail of the output
> > from one of the runs:
> >
> >
> > SENT (37825.7303s) Starting TCP Handshake > arh-ibstorage1-ib:22
> > (192.168.168.200:22)
> > RECV (37825.7303s) Handshake with arh-ibstorage1-ib:22
> > (192.168.168.200:22)
> > completed
> >
> > Max rtt: 4.447ms | Min rtt: 0.008ms | Avg rtt: 0.008ms
> > TCP connection attempts: 36000000 | Successful connections:
> > 36000000 |
> > Failed: 0 (0.00%)
> > Tx time: 37825.72833s | Tx bytes/s: 76138.65 | Tx pkts/s: 951.73
> > Rx time: 37825.72939s | Rx bytes/s: 38069.33 | Rx pkts/s: 951.73
> > Nping done: 1 IP address pinged in 37844.55 seconds
> >
> >
> > As you can see from the above, there are no failed connections at
> > all out of the 36 million attempts. The average rtt is 0.008ms, and
> > it was sending almost 1000 packets per second on average. I've got
> > the same results from the other servers.
> >
> > Unless you have other tests in mind, I think there are no issues
> > with the
> > network.
> >
> > I will fire up another test, for 24 hours this time, to see if it
> > makes a difference.
> >
> > Thanks
> >
> > Andrei
> >
> >
> > ________________________________
> > From: "Samuel Just" <sam.j...@inktank.com>
> > To: "Andrei Mikhailovsky" <and...@arhont.com>
> > Cc: ceph-users@lists.ceph.com
> > Sent: Wednesday, 19 November, 2014 9:45:40 PM
> >
> > Subject: Re: [ceph-users] Giant upgrade - stability issues
> >
> > Well, the heartbeats are failing due to networking errors
> > preventing
> > the heartbeats from arriving. That is causing osds to go down, and
> > that is causing pgs to become degraded. You'll have to work out
> > what
> > is preventing the tcp connections from being stable.
> > -Sam
> >
> > On Wed, Nov 19, 2014 at 1:39 PM, Andrei Mikhailovsky
> > <and...@arhont.com>
> > wrote:
> >>
> >>>You indicated that osd 12 and 16 were the ones marked down, but it
> >>>looks like only 0,1,2,3,7 were marked down in the ceph.log you
> >>>sent.
> >>>The logs for 12 and 16 did indicate that they had been partitioned
> >>>from the other nodes. I'd bet that you are having intermittent
> >>>network trouble since the heartbeats are intermittently failing.
> >>>-Sam
> >>
> >> AM: I will check the logs further for the osds 12 and 16. Perhaps
> >> I've
> >> missed something, but the ceph osd tree output was showing 12 and
> >> 16 as
> >> down.
> >>
> >> Regarding the failing heartbeats, Wido has suggested that I should
> >> investigate the reason for their failure. The obvious thing to look
> >> at is the network, and that is what I did initially. However, I do
> >> not see any signs of network issues. There are no errors on the
> >> physical interface, and ifconfig is showing a very small number of
> >> dropped TX packets (0.00006%) and 0 errors:
> >>
> >>
> >> # ifconfig ib0
> >> ib0 Link encap:UNSPEC HWaddr
> >> 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
> >> inet addr:192.168.168.200 Bcast:192.168.168.255
> >> Mask:255.255.255.0
> >> inet6 addr: fe80::223:7dff:ff94:e2a5/64 Scope:Link
> >> UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
> >> RX packets:1812895801 errors:0 dropped:52 overruns:0 frame:0
> >> TX packets:1835002992 errors:0 dropped:1037 overruns:0 carrier:0
> >> collisions:0 txqueuelen:2048
> >> RX bytes:6252740293262 (6.2 TB) TX bytes:11343307665152 (11.3
> >> TB)
> >>
> >>
> >> How would I investigate what is happening with the heartbeats and
> >> the reason for their failures? I have a suspicion that this will
> >> solve the issues with the frequent reporting of degraded PGs on the
> >> cluster and the intermittent high levels of IO wait on vms.
> >>
> >> And also, as I've previously mentioned, the issues started to
> >> happen after the upgrade to Giant. I've not had these problems with
> >> the Firefly, Emperor or Dumpling releases on the same hardware and
> >> the same cluster loads.
> >>
> >> Thanks
> >>
> >> Andrei
> >>
> >>
> >>
> >>
> >> On Tue, Nov 18, 2014 at 3:34 PM, Andrei Mikhailovsky
> >> <and...@arhont.com>
> >> wrote:
> >>> Sam,
> >>>
> >>> Pastebin or similar will not take tens of megabytes worth of
> >>> logs. If we are talking about the debug_ms 10 setting, I've got
> >>> about 7GB worth of logs generated every half an hour or so. Not
> >>> really sure what to do with that much data. Anything more
> >>> constructive?
> >>>
> >>> Thanks
> >>> ________________________________
> >>> From: "Samuel Just" <sam.j...@inktank.com>
> >>> To: "Andrei Mikhailovsky" <and...@arhont.com>
> >>> Cc: ceph-users@lists.ceph.com
> >>> Sent: Tuesday, 18 November, 2014 8:53:47 PM
> >>>
> >>> Subject: Re: [ceph-users] Giant upgrade - stability issues
> >>>
> >>> pastebin or something, probably.
> >>> -Sam
> >>>
> >>> On Tue, Nov 18, 2014 at 12:34 PM, Andrei Mikhailovsky
> >>> <and...@arhont.com>
> >>> wrote:
> >>>> Sam, the logs are rather large in size. Where should I post
> >>>> them?
> >>>>
> >>>> Thanks
> >>>> ________________________________
> >>>> From: "Samuel Just" <sam.j...@inktank.com>
> >>>> To: "Andrei Mikhailovsky" <and...@arhont.com>
> >>>> Cc: ceph-users@lists.ceph.com
> >>>> Sent: Tuesday, 18 November, 2014 7:54:56 PM
> >>>> Subject: Re: [ceph-users] Giant upgrade - stability issues
> >>>>
> >>>>
> >>>> Ok, why is ceph marking osds down? Post your ceph.log from one
> >>>> of the
> >>>> problematic periods.
> >>>> -Sam
> >>>>
> >>>> On Tue, Nov 18, 2014 at 1:35 AM, Andrei Mikhailovsky
> >>>> <and...@arhont.com>
> >>>> wrote:
> >>>>> Hello cephers,
> >>>>>
> >>>>> I need your help and suggestions on what is going on with my
> >>>>> cluster. A few weeks ago I upgraded from Firefly to Giant. I've
> >>>>> previously written about having issues with Giant where, in a
> >>>>> two-week period, the cluster's IO froze three times after ceph
> >>>>> marked two osds down. I have just 17 osds in total across two
> >>>>> osd servers, plus 3 mons. The cluster is running on Ubuntu 12.04
> >>>>> with the latest updates.
> >>>>>
> >>>>> I've got zabbix agents monitoring the osd servers and the
> >>>>> cluster, and I get alerts on any issues, such as problems with
> >>>>> PGs, etc. Since upgrading to Giant, I am frequently receiving
> >>>>> emails alerting that the cluster has degraded PGs, around 10-15
> >>>>> such emails per day. The number of degraded PGs varies from a
> >>>>> couple to over a thousand. After several minutes the cluster
> >>>>> recovers on its own. The total number of PGs in the cluster is
> >>>>> 4412 across all the pools.
> >>>>>
> >>>>> I am also seeing more alerts from vms reporting high IO wait
> >>>>> and hung tasks. Some vms are reporting over 50% IO wait.
> >>>>>
> >>>>> This has not happened on Firefly or the previous releases of
> >>>>> ceph. Not much has changed in the cluster since the upgrade to
> >>>>> Giant. The networking and hardware are still the same, and the
> >>>>> servers are still running the same version of Ubuntu. The
> >>>>> cluster load hasn't changed either. Thus, I think the issues
> >>>>> above are related to the upgrade of ceph to Giant.
> >>>>>
> >>>>> Here is the ceph.conf that I use:
> >>>>>
> >>>>> [global]
> >>>>> fsid = 51e9f641-372e-44ec-92a4-b9fe55cbf9fe
> >>>>> mon_initial_members = arh-ibstorage1-ib, arh-ibstorage2-ib,
> >>>>> arh-cloud13-ib
> >>>>> mon_host = 192.168.168.200,192.168.168.201,192.168.168.13
> >>>>> auth_supported = cephx
> >>>>> osd_journal_size = 10240
> >>>>> filestore_xattr_use_omap = true
> >>>>> public_network = 192.168.168.0/24
> >>>>> rbd_default_format = 2
> >>>>> osd_recovery_max_chunk = 8388608
> >>>>> osd_recovery_op_priority = 1
> >>>>> osd_max_backfills = 1
> >>>>> osd_recovery_max_active = 1
> >>>>> osd_recovery_threads = 1
> >>>>> filestore_max_sync_interval = 15
> >>>>> filestore_op_threads = 8
> >>>>> filestore_merge_threshold = 40
> >>>>> filestore_split_multiple = 8
> >>>>> osd_disk_threads = 8
> >>>>> osd_op_threads = 8
> >>>>> osd_pool_default_pg_num = 1024
> >>>>> osd_pool_default_pgp_num = 1024
> >>>>> osd_crush_update_on_start = false
> >>>>>
> >>>>> [client]
> >>>>> rbd_cache = true
> >>>>> admin_socket = /var/run/ceph/$name.$pid.asok
> >>>>>
> >>>>>
> >>>>> I would like to get to the bottom of these issues. I am not
> >>>>> sure whether they could be fixed by changing some settings in
> >>>>> ceph.conf or whether a full downgrade back to Firefly is needed.
> >>>>> Is a downgrade even possible on a production cluster?
> >>>>>
> >>>>> Thanks for your help
> >>>>>
> >>>>> Andrei
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
