Re: Hive server concurrent connection

Benyi Wang Tue, 01 May 2012 14:37:07 -0700

Thanks, Carl. This is clear.

When will HiveServer2 be implemented?


On Mon, Apr 30, 2012 at 12:15 PM, Carl Steinbach <[email protected]> wrote:

> Hi Benyi,
>
> The quote from the HiveServer2 proposal reads in full:
>
> "In fact, it's impossible for HiveServer to support concurrent connections
> using the current Thrift API, *a result of the fact that Thrift doesn't
> provide server-side access to connection handles*"
>
> The point I'm trying to make with this statement is that HiveServer
> maintains session state using thread-local variables and implicitly relies
> on Thrift consistently mapping the same connection to the same Thrift
> worker thread, but this isn't a valid assumption to make. For example, if a
> client executes "set mapred.reduce.tasks=1" followed by "select .....", you
> can't assume that both of these statements will be executed by the same
> worker thread. Furthermore, the Thrift API doesn't provide any mechanism
> for detecting client disconnects (see THRIFT-1195), which results in
> incorrect behavior like this:
>
> % hive -h localhost -p 10000
> [localhost:10000] hive> set x=1;
> set x=1;
> [localhost:10000] hive> set x;
> set x;
> x=1
> [localhost:10000] hive> quit;
> quit;
> % hive -h localhost -p 10000
> [localhost:10000] hive> set x;
> set x;
> x=1
> [localhost:10000] hive> quit;
> quit;
>
> In this example I opened a connection to HiveServer and modified my
> session state on the server by setting x=1. I then killed the connection,
> reconnected, and printed the value of x again. Since I'm creating a new
> connection/session, I expect x to be undefined; however, I actually see
> the value of x that I set in the previous connection. This happens because
> Thrift assigns the same worker thread to service the second connection, and
> since there's no way of detecting client disconnects, HiveServer was unable
> to clear the thread-local session state associated with that worker thread
> before Thrift reassigned it to the second connection.
>
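A minimal sketch of the failure mode described above, assuming a one-thread worker pool stands in for Thrift reusing the same worker thread for a new connection. The `handle` and `simulate_reconnect` functions are hypothetical illustrations, not HiveServer code; Python's `threading.local` plays the role of Java's `ThreadLocal`, and the output strings only loosely mimic the CLI.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for HiveServer's per-thread session state;
# threading.local plays the role of Java's ThreadLocal here.
_session = threading.local()


def handle(command: str) -> str:
    """Serve 'set k=v' (store) or 'set k' (read) from thread-local state."""
    if not hasattr(_session, "conf"):
        _session.conf = {}  # first request this worker thread has ever seen
    arg = command[len("set "):]
    if "=" in arg:
        key, value = arg.split("=", 1)
        _session.conf[key] = value
        return ""
    if arg in _session.conf:
        return f"{arg}={_session.conf[arg]}"
    return f"{arg} is undefined"


def simulate_reconnect() -> str:
    # A one-thread pool models Thrift handing the second connection to the
    # same worker thread that served the first one.
    with ThreadPoolExecutor(max_workers=1) as workers:
        # Connection 1: "set x=1;" then "quit;". No disconnect hook runs,
        # so nothing clears the worker's thread-local state.
        workers.submit(handle, "set x=1").result()
        # Connection 2: a brand-new client asks for x.
        return workers.submit(handle, "set x").result()


if __name__ == "__main__":
    print(simulate_reconnect())  # x=1, leaked from the previous connection
```

The second "connection" sees `x=1` because the state is keyed by thread, not by client, which is exactly the leak in the transcript.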
> While it's tempting to try to solve these problems by modifying Thrift to
> provide direct access to the connection handle (which would allow us to map
> connections to session state on the server side), this approach makes it
> really hard to support HA (high availability), since it depends on the
> physical connection lasting as long as the user session, which isn't a fair
> assumption to make for queries that can take many hours to complete.
>
> Instead, the approach we're taking with HiveServer2 is to provide explicit
> support for sessions in the client API, e.g., every RPC call references a
> session ID, which the server then maps to persistent session state. This
> makes it possible for any worker thread to service any request from any
> client connection.
>
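A minimal sketch of the session-ID approach, assuming an in-memory dict guarded by a lock is an acceptable stand-in for "persistent" session state (persistent here meaning it outlives any single thread or connection). `SessionManager`, `open_session`, `execute` and `close_session` are hypothetical names for illustration, not the HiveServer2 Thrift API.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


class SessionManager:
    """Hypothetical server-side map from session ID to session state.

    Method names are illustrative only; they are not the HiveServer2 API.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sessions: Dict[str, Dict[str, str]] = {}

    def open_session(self) -> str:
        session_id = str(uuid.uuid4())
        with self._lock:
            self._sessions[session_id] = {}
        return session_id

    def close_session(self, session_id: str) -> None:
        # Explicit close replaces the missing disconnect notification.
        with self._lock:
            self._sessions.pop(session_id, None)

    def execute(self, session_id: str, command: str) -> str:
        # Every call names its session, so any worker thread can serve it.
        with self._lock:
            conf = self._sessions[session_id]  # KeyError if unknown or closed
            arg = command[len("set "):]
            if "=" in arg:
                key, value = arg.split("=", 1)
                conf[key] = value
                return ""
            return f"{arg}={conf[arg]}" if arg in conf else f"{arg} is undefined"


if __name__ == "__main__":
    manager = SessionManager()
    with ThreadPoolExecutor(max_workers=4) as workers:
        first = manager.open_session()
        workers.submit(manager.execute, first, "set x=1").result()
        # Whichever thread runs this call, it reads the same session state.
        print(workers.submit(manager.execute, first, "set x").result())  # x=1
        manager.close_session(first)
        second = manager.open_session()
        print(workers.submit(manager.execute, second, "set x").result())  # x is undefined
```

Because the state lives in a shared map keyed by session ID rather than in the worker thread, ending a session is an explicit client call and does not depend on detecting a TCP disconnect.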
> I hope this clarifies the limitations of the current HiveServer
> implementation as well as the motivations for implementing HiveServer2.
> Please let me know if you have any more questions.
>
> Thanks.
>
> Carl
>
> On Thu, Apr 26, 2012 at 11:55 AM, Benyi Wang <[email protected]>
> wrote:
>
> > I'm a little confused by "In fact, it's impossible for HiveServer to
> > support concurrent connections using the current Thrift API" on the Hive
> > wiki page
> > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API.
> >
> > I started a Hive server on hostA using cdh3u3:
> >
> > hadoop-hive.noarch                  0.7.1+42.36-2                  installed
> >
> > Then I logged on to two nodes, hostB and hostC, and started a Hive client
> > on each:
> >
> > $ hive -h hostA -p 10000
> >
> > Both Hive clients seem to work normally.
> >
> > Am I wrong, or has the issue in the wiki page been resolved?
> >
>
