Nothing specifically about our Hive setup although some of us at Forward have 
blogged bits and pieces about Hive + Hadoop and have a few Hadoop/Hive related 
libs on our GitHub account: https://github.com/forward.

I've blogged a few bits (http://www.oobaloo.co.uk/) as has one of my colleagues 
(http://blog.fingertap.org/post/1255463384/hive-thrift-client).

Another colleague also presented a little about our setup during a Hadoop 
meetup last summer 
(http://skillsmatter.com/podcast/home/hadoop-in-context-1591). The numbers Andy 
mentioned will be a little out of date but it does include some screenshots of 
a few of the surrounding apps we built that connect to Hive and Hadoop 
(including a web based Hive query tool + work queue).

I had a quick search through the mailing lists when we had connection problems 
but I think most of it was discussed/resolved during a chat I had with Shevek 
from Karmasphere at a London pub following a Hadoop meetup :)

If you're interested, I've posted a gist (https://gist.github.com/953926) that 
contains our HAProxy config; clients connect to port 10000 and are balanced 
across ports 10001 to 10005 on each of 2 servers (so actually 10 backend servers).
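
To make the arithmetic concrete, here's a rough sketch (not our actual config, 
that's in the gist) of how 2 servers x 5 ports gives 10 backends behind the one 
front-end port. The hostnames are made up, and round-robin is only an 
assumption for illustration; the balancing algorithm we really use is whatever 
the gist says.

```python
from itertools import cycle, product

# Hypothetical hostnames, for illustration only; the real ones are in the gist.
HOSTS = ["hive1.example.internal", "hive2.example.internal"]

# Each server runs a HiveServer instance on ports 10001..10005 inclusive.
PORTS = range(10001, 10006)

# Clients only ever see the front-end port; the load balancer fans out.
FRONTEND_PORT = 10000

# Every (host, port) pair is one backend HiveServer instance.
BACKENDS = [(host, port) for host, port in product(HOSTS, PORTS)]


def round_robin(backends):
    """Yield backends in turn, forever (assumed policy, not from the gist)."""
    return cycle(backends)


if __name__ == "__main__":
    picker = round_robin(BACKENDS)
    print(len(BACKENDS), "backends behind port", FRONTEND_PORT)
    for _ in range(3):
        print(next(picker))
```

The point is just that a client only needs to know about port 10000; which of 
the 10 instances handles a given connection is the balancer's business.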

Be happy to talk more about our experience; feel free to drop me an email 
off-list if you'd like.


On 3 May 2011, at 19:18, Matthew Rathbone wrote:

> Hey Paul,
> 
> I'd be very interested in reading about your hadoop/hive setup, do you have a 
> blog post or anything describing this setup, or some of the issues you've 
> had with hive?
> 
> -- 
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> matt...@foursquare.com | @rathboma | 4sq
> 
> On Tuesday, May 3, 2011 at 2:15 PM, Paul Ingles wrote:
>> HiveServer does seem to support multiple connections but I think it still has 
>> thread-safety problems (https://issues.apache.org/jira/browse/HIVE-80).
>> 
>> We've (www.forward.co.uk) certainly had instability problems with the thrift 
>> server in the past and now run 5 or so instances behind the HAProxy 
>> load-balancer (http://haproxy.1wt.eu/). Since we did that it's been 
>> significantly better. 
>> 
>> I think the JDBC server still operates using thrift to connect to the 
>> HiveServer so I would expect it to have similar problems (but I may have got 
>> that wrong :)
>> 
>> 
>> On 3 May 2011, at 18:59, Matthew Rathbone wrote:
>> 
>>> Even if it is single threaded it certainly seems to support multiple 
>>> connections. 
>>> 
>>> We run 5 workers all connected at the same time, each executing a different 
>>> query (with a different connection per worker).
>>> 
>>> Hope that helps
>>> 
>>> Matthew 
>>> On Tuesday, May 3, 2011 at 1:40 PM, V.Senthil Kumar wrote:
>>>> Thanks Matthew. The wiki page http://wiki.apache.org/hadoop/Hive/HiveServer 
>>>> says it's single threaded. I have a queue of queries which gets added to 
>>>> dynamically all the time. By the time I run 1 query using 1 JDBC connection, 
>>>> more queries have been added to the queue and a backlog builds up. So that's 
>>>> why I was wondering whether I can run two or more instances to avoid having 
>>>> a big backlog in the queue.
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message ----
>>>> From: Matthew Rathbone <matt...@foursquare.com>
>>>> To: user@hive.apache.org
>>>> Sent: Tue, May 3, 2011 7:46:49 AM
>>>> Subject: Re: HIVE Server multiple instances
>>>> 
>>>> Why would you want to run two? I think it is multithreaded, so you can 
>>>> query it from two different connections.
>>>> 
>>>> -- 
>>>> Matthew Rathbone
>>>> Foursquare | Software Engineer | Server Engineering Team
>>>> matt...@foursquare.com | @rathboma | 4sq
>>>> 
>>>> On Monday, May 2, 2011 at 6:41 PM, V.Senthil Kumar wrote:
>>>>> Hello, 
>>>>> 
>>>>> I have one instance of the Hive JDBC server running on port 10000. Can I 
>>>>> run another instance on a different port? Would it cause a concurrency 
>>>>> issue on the underlying data warehouse files? Please clarify.
>>>>> 
>>>>> Thanks,
>>>>> V.Senthil Kumar
>> 
> 
