Yikes. I personally found that the most problematic thing was HiveServer + ZooKeeper (zk) locking; if you do not need that, turn it off. Other than that, we just wrote a good Nagios check. It runs a query (one that does not invoke a MapReduce job). That seems to spot the problems quickly and allows our ops to restart the bad instance.
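
A check of this kind might look like the following minimal Python sketch. It is not the actual plugin described above, only an illustration under stated assumptions: the probe is run through beeline against a HiveServer2 instance, the JDBC URL and port are placeholders, and the names `run_probe`, `main`, and `DEFAULT_PROBE` are hypothetical. The probe is a metadata-only statement, so no MapReduce job is launched, and a hung server shows up as a timeout.

```python
import subprocess
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical probe command: a metadata-only statement, so no MapReduce
# job is started. The use of beeline, the JDBC URL, and the port are
# assumptions; adjust them to your own setup.
DEFAULT_PROBE = [
    "beeline", "-u", "jdbc:hive2://localhost:10000", "-e", "SHOW DATABASES;",
]


def run_probe(cmd, warn_after_s=10.0, timeout_s=30.0):
    """Run cmd once and return (nagios_exit_code, status_line)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,  # a hung server surfaces as TimeoutExpired
        )
    except subprocess.TimeoutExpired:
        return CRITICAL, "CRITICAL - probe hung for more than %.0fs" % timeout_s
    except OSError as exc:
        # The client binary is missing or could not be executed.
        return UNKNOWN, "UNKNOWN - could not start probe: %s" % exc

    elapsed = time.monotonic() - start
    if proc.returncode != 0:
        return CRITICAL, "CRITICAL - probe exited with status %d" % proc.returncode
    if elapsed > warn_after_s:
        return WARNING, "WARNING - probe took %.1fs" % elapsed
    return OK, "OK - probe answered in %.1fs" % elapsed


def main(cmd=None):
    """Print the one-line status Nagios expects and return the exit code."""
    code, line = run_probe(cmd or DEFAULT_PROBE)
    print(line)
    return code
```

To use it as a plugin, the script's entry point would call `sys.exit(main())`, so that Nagios sees the exit code and the single status line. The warning and timeout thresholds here are arbitrary and would need tuning against how long the probe normally takes.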

On Mon, Nov 18, 2013 at 5:11 PM, Roberto Congiu <roberto.con...@openx.com> wrote:

> We've also had issues with both hiveserver1 and 2 crashing because of heap
> exhaustion, but instead of restarting it periodically we took a different
> approach, that is, abstracting the part of the interface we needed, and
> implemented an adapter that implements the same method as thrift, but
> forking a shell, sending commands to it, and parsing the results.
> It is slow, but it's fast enough for our hourly process that loads data in
> hive EXTERNAL tables for which we need extra reliability.
>
> R.
>
> On Mon, Nov 18, 2013 at 1:24 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> Thanks for pointing out any issue. HiveServer1 is significantly less
>> robust. We have run HS1 behind a load balancer/proxy and rotated/restarted
>> "angry" instances.
>>
>> On Mon, Nov 18, 2013 at 3:59 PM, Stephen Sprague <sprag...@gmail.com> wrote:
>>
>>> A word of warning for users of HiveServer2 - version 0.11 at least. This
>>> puppy has the ability to crash and/or hang your server with a memory leak.
>>>
>>> Apparently it's not new, since googling shows this discussed before and I
>>> see reference to a workaround here:
>>>
>>> https://cwiki.apache.org/confluence/display/Hive/Setting+up+HiveServer2
>>>
>>> Anyhoo. Consider this a Public Service Announcement. Take heed.
>>>
>>> Regards,
>>> Stephen.
>
> --
> Roberto Congiu - Data Engineer - OpenX