Hi Andrew,

I understand your policy on edge nodes. However, I'm wondering why you cannot require that Hive CLI run only on gateway nodes, as you do for HS2. In essence, Hive CLI is a client with an embedded Hive server, so it seems reasonable to apply the same requirement to it as to HS2.
I'm not pushing back on your request; rather, I'm interested in the rationale
behind your policy.

Thanks,
Xuefu

On Mon, Oct 19, 2015 at 9:12 PM, Andrew Lee <alee...@hotmail.com> wrote:

> Hi Xuefu,
>
> I agree for HS2, since HS2 usually runs on a gateway or service node inside
> the cluster environment.
> In my case, it is actually additional security.
> A separate edge node (not running HS2; HS2 runs on another box) is used
> for HiveCLI.
> We don't allow data/worker nodes to talk to the edge node on random ports.
> All ports must be registered or explicitly specified and monitored.
> That's why I am asking for this feature. Otherwise, opening up 1024-65535
> from the data/worker nodes to the edge node is actually a bad idea and bad
> practice for network security. :(
>
> ________________________________________
> From: Xuefu Zhang <xzh...@cloudera.com>
> Sent: Monday, October 19, 2015 1:12 PM
> To: dev@hive.apache.org
> Subject: Re: Hard Coded 0 to assign RPC Server port number when
> hive.execution.engine=spark
>
> Hi Andrew,
>
> RpcServer is an instance launched for each user session. In the case of
> Hive CLI, which is for a single user, what you said makes sense and the
> port number can be configurable. In the context of HS2, however, there are
> multiple user sessions and the total is unknown in advance. While a +1
> scheme works, there can still be a band of ports that might eventually be
> opened.
>
> From a different perspective, we expect that either Hive CLI or HS2 resides
> on a gateway node, which is on the same network as the data/worker nodes.
> In that configuration, the firewall issue you mentioned doesn't apply. Such
> a configuration is what we usually see with our enterprise customers, and
> it is what we recommend. I'm not sure why you would want your Hive users to
> launch Hive CLI anywhere outside your cluster, which doesn't seem secure if
> security is your concern.
>
> Thanks,
> Xuefu
>
> On Mon, Oct 19, 2015 at 7:20 AM, Andrew Lee <alee...@hotmail.com> wrote:
>
> > Hi All,
> >
> > I notice that in
> >
> > ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> >
> > the port number is assigned 0, which means it will be a random port
> > every time the RPC server is created to talk to Spark in the same
> > session.
> >
> > Is there any reason why this port number is not a configurable property
> > that follows the same +1 rule if the port is taken, just like Spark's
> > configuration for the Spark driver, etc.? Because of this, it is hard to
> > configure a firewall between the HiveCLI RPC server and Spark due to the
> > unpredictable port numbers. In other words, users need to open the whole
> > port range from the data nodes => HiveCLI (edge node).
> >
> >     this.channel = new ServerBootstrap()
> >       .group(group)
> >       .channel(NioServerSocketChannel.class)
> >       .childHandler(new ChannelInitializer<SocketChannel>() {
> >           @Override
> >           public void initChannel(SocketChannel ch) throws Exception {
> >             SaslServerHandler saslHandler = new SaslServerHandler(config);
> >             final Rpc newRpc = Rpc.createServer(saslHandler, config, ch,
> >                 group);
> >             saslHandler.rpc = newRpc;
> >
> >             Runnable cancelTask = new Runnable() {
> >               @Override
> >               public void run() {
> >                 LOG.warn("Timed out waiting for hello from client.");
> >                 newRpc.close();
> >               }
> >             };
> >             saslHandler.cancelTask = group.schedule(cancelTask,
> >                 RpcServer.this.config.getServerConnectTimeoutMs(),
> >                 TimeUnit.MILLISECONDS);
> >           }
> >       })
> >       .option(ChannelOption.SO_BACKLOG, 1)
> >       .option(ChannelOption.SO_REUSEADDR, true)
> >       .childOption(ChannelOption.SO_KEEPALIVE, true)
> >       .bind(0)
> >       .sync()
> >       .channel();
> >     this.port = ((InetSocketAddress) channel.localAddress()).getPort();
> >
> > I'd appreciate any feedback, and please let me know whether a JIRA is
> > needed to keep track of this conversation. Thanks.
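
To make the "+1" retry idea discussed above concrete, below is a minimal,
self-contained sketch that binds a Netty server to the first free port in a
configured range instead of calling bind(0). The class name, the example range
30000-30010, and the stripped-down channel initializer are assumptions for this
sketch, not Hive's actual RpcServer code or configuration.

    import java.net.InetSocketAddress;

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.Channel;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;

    public class PortRangeBindSketch {

      // Try each port in [minPort, maxPort]; on a bind failure, fall through to
      // port + 1 (the same retry behavior Spark offers via spark.port.maxRetries).
      static Channel bindInRange(ServerBootstrap bootstrap, int minPort, int maxPort)
          throws InterruptedException {
        for (int port = minPort; port <= maxPort; port++) {
          try {
            return bootstrap.bind(port).sync().channel();
          } catch (InterruptedException ie) {
            throw ie;                       // don't swallow interrupts
          } catch (Exception bindFailure) { // typically java.net.BindException
            System.err.println("Port " + port + " is taken, trying " + (port + 1));
          }
        }
        throw new IllegalStateException("No free port in " + minPort + "-" + maxPort);
      }

      public static void main(String[] args) throws Exception {
        NioEventLoopGroup group = new NioEventLoopGroup();
        try {
          ServerBootstrap bootstrap = new ServerBootstrap()
              .group(group)
              .channel(NioServerSocketChannel.class)
              .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(SocketChannel ch) {
                  // No pipeline handlers needed for this illustration.
                }
              });

          // With a known range, a firewall rule only has to allow 30000-30010
          // from the data/worker nodes to the edge node, not 1024-65535.
          Channel channel = bindInRange(bootstrap, 30000, 30010);
          int port = ((InetSocketAddress) channel.localAddress()).getPort();
          System.out.println("Server bound to predictable port " + port);

          channel.close().sync();
        } finally {
          group.shutdownGracefully();
        }
      }
    }

A Hive property (for example, something along the lines of a port or port-range
setting for the Spark client RPC server) could feed minPort/maxPort, much as
spark.driver.port plus spark.port.maxRetries constrain the Spark driver's port;
the exact key name is left open here and is not asserted to exist.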