Thanks Carl. This is clear. When will HiveServer2 be implemented?
On Mon, Apr 30, 2012 at 12:15 PM, Carl Steinbach <c...@cloudera.com> wrote:
> Hi Benyi,
>
> The quote from the HiveServer2 proposal reads in full:
>
> "In fact, it's impossible for HiveServer to support concurrent connections
> using the current Thrift API, *a result of the fact that Thrift doesn't
> provide server-side access to connection handles*"
>
> The point I'm trying to make with this statement is that HiveServer
> maintains session state using thread-local variables and implicitly relies
> on Thrift consistently mapping the same connection to the same Thrift
> worker thread, but this isn't a valid assumption to make. For example, if a
> client executes "set mapred.reduce.tasks=1" followed by "select .....", you
> can't assume that both of these statements will be executed by the same
> worker thread. Furthermore, the Thrift API doesn't provide any mechanism
> for detecting client disconnects (see THRIFT-1195), which results in
> incorrect behavior like this:
>
> % hive -h localhost -p 10000
> [localhost:10000] hive> set x=1;
> set x=1;
> [localhost:10000] hive> set x;
> set x;
> x=1
> [localhost:10000] hive> quit;
> quit;
> % hive -h localhost -p 10000
> [localhost:10000] hive> set x;
> set x;
> x=1
> [localhost:10000] hive> quit;
> quit;
>
> In this example I opened a connection to HiveServer and modified my
> session state on the server by setting x=1. I then killed the connection
> and reconnected, and then printed the value of x again. Since I'm creating
> a new connection/session I expect x to be undefined, however I actually see
> the value of x which I set in the previous connection. This happens because
> Thrift assigns the same worker thread to service the second connection, and
> since there's no way of detecting client disconnects, HiveServer was unable
> to clear the thread-local session state associated with that worker thread
> before Thrift reassigned it to the second connection.
>
> While it's tempting to try to solve these problems by modifying Thrift to
> provide direct access to the connection handle (which would allow us to map
> connections to session state on the server-side), this approach makes it
> really hard to support HA since it depends on the physical connection
> lasting as long as the user session, which isn't a fair assumption to make
> in the context of queries that can take many hours to complete.
>
> Instead, the approach we're taking with HiveServer2 is to provide explicit
> support for sessions in the client API, e.g. every RPC call references a
> session ID which the server then maps to persistent session state. This
> makes it possible for any worker thread to service any request from any
> client connection.
>
> I hope this clarifies the limitations of the current HiveServer
> implementation as well as the motivations for implementing HiveServer2.
> Please let me know if you have any more questions.
>
> Thanks.
>
> Carl
>
> On Thu, Apr 26, 2012 at 11:55 AM, Benyi Wang <bewang.t...@gmail.com>
> wrote:
>
> > I'm a little confused by "In fact, it's impossible for HiveServer to
> > support concurrent connections using the current Thrift API" in hive wiki
> > page
> > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API.
> >
> > I started a hive server on hostA using cdh3u3
> >
> > hadoop-hive.noarch    0.7.1+42.36-2    installed
> >
> > Then I logged on to two nodes, hostB and hostC, and started a hive
> > client on each:
> >
> > $ hive -h hostA -p 10000
> >
> > It seems that both hive clients work normally.
> >
> > Am I wrong, or has the issue in the wiki page been resolved?
> >
>
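
For anyone reading this thread later, the two designs discussed above can be contrasted in a short Python sketch. This is not HiveServer or Thrift code. The names `SessionManager`, `set_var_thread_local`, `get_var_thread_local`, and `demo` are hypothetical, and a one-thread `ThreadPoolExecutor` is assumed here as a stand-in for Thrift handing the same worker thread to a second connection. The first half reproduces the stale `x=1` behavior with thread-local state; the second half keys state by an explicit session ID, so any worker thread can serve any request.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-worker-thread session state.
_thread_state = threading.local()


def set_var_thread_local(key, value):
    """Store a variable in the state of whichever worker thread runs this."""
    if not hasattr(_thread_state, "vars"):
        _thread_state.vars = {}
    _thread_state.vars[key] = value


def get_var_thread_local(key):
    """Read a variable from the current worker thread's state (None if unset)."""
    return getattr(_thread_state, "vars", {}).get(key)


class SessionManager:
    """Hypothetical registry mapping explicit session IDs to session state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def open_session(self):
        session_id = uuid.uuid4().hex
        with self._lock:
            self._sessions[session_id] = {}
        return session_id

    def close_session(self, session_id):
        with self._lock:
            self._sessions.pop(session_id, None)

    def set_var(self, session_id, key, value):
        with self._lock:
            self._sessions[session_id][key] = value

    def get_var(self, session_id, key):
        with self._lock:
            return self._sessions[session_id].get(key)


def demo():
    # One worker thread, so the second "connection" reuses the first one's thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(set_var_thread_local, "x", "1").result()      # connection 1
        leaked = pool.submit(get_var_thread_local, "x").result()  # connection 2

    # Explicit session IDs: state follows the session, not the thread.
    manager = SessionManager()
    with ThreadPoolExecutor(max_workers=4) as pool:
        first = pool.submit(manager.open_session).result()
        pool.submit(manager.set_var, first, "x", "1").result()
        same = pool.submit(manager.get_var, first, "x").result()
        pool.submit(manager.close_session, first).result()
        second = pool.submit(manager.open_session).result()
        fresh = pool.submit(manager.get_var, second, "x").result()
    return leaked, same, fresh


if __name__ == "__main__":
    print(demo())
```

In the thread-local half, the second caller sees the value left behind by the first because nothing cleared the worker thread's state. In the session-ID half, a new session starts empty regardless of which thread handles it.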