Re: [ovs-discuss] ovsdb-server unkillable, need some help

Jeff Bachtel Fri, 28 Feb 2014 12:00:48 -0800

Does anyone have any insight into this? As further data points, I built the 2.0 release and much more current Open vSwitch snapshots (most recently at commit bdeadfdd), which exhibited the same problems. The CentOS 6 kernel is 2.6.32. Because of a presumed incompatibility with the Linux bridge module, I made sure bridge.ko wasn't being loaded. On a host where ovsdb-server had not yet become unresponsive, ovs-vswitchd was unkillable, in state R<L. Could my problem be related to ovs-vswitchd becoming unresponsive under load and taking ovsdb-server with it?
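A thread in state R is running or runnable. A thread that keeps running inside the kernel generally cannot act on SIGKILL until it returns to user space, which would be consistent with both the unkillable process and an uninformative strace (strace only sees system call entry and exit). In ps's STAT column, `<` means high priority and `L` means pages are locked into memory. One possible way to get a kernel-side backtrace is to read the per-thread stacks exposed under /proc. The script below is a minimal sketch, assuming Python 3, root privileges, and a kernel that provides /proc/<pid>/task/<tid>/stack (added in 2.6.29, so it should be present on a 2.6.32 kernel built with stack tracing); the script itself is hypothetical, not an OVS tool.

```python
#!/usr/bin/env python3
"""Print the state and kernel stack of every thread of a process (sketch)."""
import os
import sys


def parse_stat(stat_text):
    """Return (comm, state) from the contents of a /proc/.../stat file.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the state field is taken from just after the *last* ')'.
    """
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    comm = stat_text[lpar + 1:rpar]
    state = stat_text[rpar + 1:].split()[0]
    return comm, state


def dump_kernel_stacks(pid):
    task_dir = "/proc/%d/task" % pid
    for tid in sorted(os.listdir(task_dir), key=int):
        base = os.path.join(task_dir, tid)
        try:
            with open(os.path.join(base, "stat")) as f:
                comm, state = parse_stat(f.read())
        except OSError:
            continue  # thread exited while we were iterating
        print("--- tid %s (%s) state %s" % (tid, comm, state))
        try:
            with open(os.path.join(base, "stack")) as f:
                print(f.read().rstrip())
        except OSError as e:
            # EACCES when not root, ENOENT if the kernel lacks the file
            print("    kernel stack unavailable: %s" % e)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: %s PID" % sys.argv[0])
    else:
        dump_kernel_stacks(int(sys.argv[1]))
```

Running it a few times against the PID of the hung ovsdb-server or ovs-vswitchd and comparing the output would show whether the threads are stuck at the same kernel function. As root, `echo t > /proc/sysrq-trigger` is a coarser alternative that writes every task's stack to the kernel log.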

I've received further confirmation that this is related to load in some way: a node inadvertently disconnected from the rest of the Ceph cluster had a record uptime with Open vSwitch. If anyone can give me pointers on getting a backtrace, I'm happy to run things until failure and collect better data; I've had trouble getting anything useful out of strace, at least. As it is, I've cron'd a restart of openvswitch every minute, which is obviously far from ideal.
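Rather than restarting unconditionally every minute, a watchdog could restart only when ovsdb-server actually stops answering. The sketch below sends the `echo` request defined by the OVSDB protocol (RFC 7047) over the unix socket and reports whether a reply arrives in time. It assumes Python 3; the default socket path is an assumption (a common default) and should match the `--remote=punix:...` argument ovsdb-server is started with. From a shell, `ovs-vsctl --timeout=5 show` is a rougher equivalent that fails if it cannot finish in time.

```python
#!/usr/bin/env python3
"""Liveness probe for ovsdb-server using the RFC 7047 'echo' method (sketch)."""
import json
import socket
import sys

DB_SOCK = "/var/run/openvswitch/db.sock"  # assumed default path


def ovsdb_echo(path=DB_SOCK, timeout=5.0):
    """Return True if ovsdb-server answers an echo request in time."""
    request = {"method": "echo", "params": ["probe"], "id": "probe"}
    decoder = json.JSONDecoder()
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)  # applies to connect, send and each recv
            s.connect(path)
            s.sendall(json.dumps(request).encode())
            buf = b""
            while True:
                chunk = s.recv(4096)
                if not chunk:
                    return False  # server closed the connection
                buf += chunk
                try:
                    reply, _ = decoder.raw_decode(buf.decode())
                except ValueError:
                    continue  # incomplete JSON so far; keep reading
                return reply.get("id") == "probe" and reply.get("error") is None
    except OSError:  # covers timeouts, refused connections, missing socket
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: ovsdb_probe.py /path/to/db.sock  (exit status 0 = responsive)
    ok = ovsdb_echo(sys.argv[1])
    print("responsive" if ok else "NOT responsive")
    sys.exit(0 if ok else 1)
```

A probe can only detect the hang; if the process is truly stuck in the kernel, a restart triggered by it may still fail to kill the process.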


Thanks for any help,

Jeff

On 02/20/2014 12:54 AM, Jeff Bachtel wrote:

I'm running Open vSwitch 1.11 from the RDO Havana repository. In addition, I'm running OpenStack Havana, Neutron, and Ceph Emperor, all on some CentOS 6.5 machines.
After installing Bacula on the previous OpenStack release (Grizzly), I noticed the networking had become somewhat load-sensitive: ovsdb-server was freezing, not responding to queries on its unix socket and becoming unkillable in process state R<. Believing it was probably due to running an older OVS version, I pushed ahead with an upgrade, only to find my stability problems became much, much worse. Every 20-30 minutes I can count on an ovsdb-server process freezing.
At https://drive.google.com/folderview?id=0B-wx2_T_hW-_OXZJWGJNc0l0MzQ&usp=sharing you'll find a folder with shared copies of diagnostic files from a machine with a hung ovsdb-server: a process list (the .ps file; apologies, I forgot that extension means PostScript until the upload was done), strace output, dmesg, and /var/log/messages.
The strace didn't reveal anything suspicious to me. To mitigate, I tried lowering log verbosity, completely recreating conf.db, compacting frequently (every minute), and putting the db on a ramdisk; none of it solved the problem.
The ovsdb-server processes most likely to lock up run on Ceph hosts running OSDs, meaning they see a lot of network traffic as well as disk I/O.
I don't understand what a simple database RPC server could be doing that would make it unkillable, especially after I tried to minimize disk I/O by putting the db file on a ramdisk.
I hope someone has some ideas about what I might do to test or mitigate the situation. Not running Ceph OSDs on these hosts is, unfortunately, not a solution I can use.
Thanks,
Jeff


_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
