Hi Andrew,

I understand your policy on edge nodes. However, I'm wondering why you cannot require that Hive CLI run only on gateway nodes, as you do for HS2. In essence, Hive CLI is a client with an embedded Hive server, so it seems reasonable to apply the same requirement to it as to HS2.
I'm not pushing back on your request; rather, I'm interested in the rationale
behind your policy.

Thanks,
Xuefu

On Mon, Oct 19, 2015 at 9:12 PM, Andrew Lee <alee...@hotmail.com> wrote:

> Hi Xuefu,
>
> I agree for HS2, since HS2 usually runs on a gateway or service node inside
> the cluster environment.
> In my case, it is actually additional security.
> A separate edge node (not running HS2; HS2 runs on another box) is used
> for HiveCLI.
> We don't allow data/worker nodes to talk to the edge node on random ports.
> All ports must be registered or explicitly specified and monitored.
> That's why I am asking for this feature. Otherwise, opening up 1024-65535
> from the data/worker nodes to the edge node is actually a bad idea and bad
> practice for network security. :(
>
> ________________________________________
> From: Xuefu Zhang <xzh...@cloudera.com>
> Sent: Monday, October 19, 2015 1:12 PM
> To: dev@hive.apache.org
> Subject: Re: Hard Coded 0 to assign RPC Server port number when
> hive.execution.engine=spark
>
> Hi Andrew,
>
> RpcServer is an instance launched for each user session. In the case of
> Hive CLI, which is for a single user, what you said makes sense and the
> port number can be configurable. In the context of HS2, however, there are
> multiple user sessions and the total is unknown in advance. While a +1
> scheme works, there can still be a band of ports that might eventually be
> opened.
>
> From a different perspective, we expect that either Hive CLI or HS2 resides
> on a gateway node, which is on the same network as the data/worker nodes.
> In that configuration, the firewall issue you mentioned doesn't apply. Such
> a configuration is what we usually see with our enterprise customers, and
> it is what we recommend. I'm not sure why you would want your Hive users to
> launch Hive CLI anywhere outside your cluster, which doesn't seem secure if
> security is your concern.
>
> Thanks,
> Xuefu
>
> On Mon, Oct 19, 2015 at 7:20 AM, Andrew Lee <alee...@hotmail.com> wrote:
>
> > Hi All,
> >
> > I notice that in
> >
> > ./spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcServer.java
> >
> > the port number is assigned 0, which means it will be a random port
> > every time the RPC server is created to talk to Spark in the same
> > session.
> >
> > Is there any reason why this port number is not a configurable property
> > that follows the same +1 rule if the port is taken, just like Spark's
> > configuration for the Spark driver, etc.? Because of this, it is hard to
> > configure a firewall between the HiveCLI RPC server and Spark due to the
> > unpredictable port numbers. In other words, users need to open the whole
> > port range from the data nodes => HiveCLI (edge node).
> >
> >     this.channel = new ServerBootstrap()
> >       .group(group)
> >       .channel(NioServerSocketChannel.class)
> >       .childHandler(new ChannelInitializer<SocketChannel>() {
> >           @Override
> >           public void initChannel(SocketChannel ch) throws Exception {
> >             SaslServerHandler saslHandler = new SaslServerHandler(config);
> >             final Rpc newRpc = Rpc.createServer(saslHandler, config, ch,
> >                 group);
> >             saslHandler.rpc = newRpc;
> >
> >             Runnable cancelTask = new Runnable() {
> >               @Override
> >               public void run() {
> >                 LOG.warn("Timed out waiting for hello from client.");
> >                 newRpc.close();
> >               }
> >             };
> >             saslHandler.cancelTask = group.schedule(cancelTask,
> >                 RpcServer.this.config.getServerConnectTimeoutMs(),
> >                 TimeUnit.MILLISECONDS);
> >           }
> >       })
> >       .option(ChannelOption.SO_BACKLOG, 1)
> >       .option(ChannelOption.SO_REUSEADDR, true)
> >       .childOption(ChannelOption.SO_KEEPALIVE, true)
> >       .bind(0)
> >       .sync()
> >       .channel();
> >     this.port = ((InetSocketAddress) channel.localAddress()).getPort();
> >
> > I'd appreciate any feedback, and please let me know whether a JIRA is
> > needed to keep track of this conversation. Thanks.
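
To make the "+1" retry idea discussed above concrete, below is a minimal,
self-contained sketch that binds a Netty server to the first free port in a
configured range instead of calling bind(0). The class name, the example range
30000-30010, and the stripped-down channel initializer are assumptions for this
sketch, not Hive's actual RpcServer code or configuration.

    import java.net.InetSocketAddress;

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.Channel;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;

    public class PortRangeBindSketch {

      // Try each port in [minPort, maxPort]; on a bind failure, fall through to
      // port + 1 (the same retry behavior Spark offers via spark.port.maxRetries).
      static Channel bindInRange(ServerBootstrap bootstrap, int minPort, int maxPort)
          throws InterruptedException {
        for (int port = minPort; port <= maxPort; port++) {
          try {
            return bootstrap.bind(port).sync().channel();
          } catch (InterruptedException ie) {
            throw ie;                       // don't swallow interrupts
          } catch (Exception bindFailure) { // typically java.net.BindException
            System.err.println("Port " + port + " is taken, trying " + (port + 1));
          }
        }
        throw new IllegalStateException("No free port in " + minPort + "-" + maxPort);
      }

      public static void main(String[] args) throws Exception {
        NioEventLoopGroup group = new NioEventLoopGroup();
        try {
          ServerBootstrap bootstrap = new ServerBootstrap()
              .group(group)
              .channel(NioServerSocketChannel.class)
              .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(SocketChannel ch) {
                  // No pipeline handlers needed for this illustration.
                }
              });

          // With a known range, a firewall rule only has to allow 30000-30010
          // from the data/worker nodes to the edge node, not 1024-65535.
          Channel channel = bindInRange(bootstrap, 30000, 30010);
          int port = ((InetSocketAddress) channel.localAddress()).getPort();
          System.out.println("Server bound to predictable port " + port);

          channel.close().sync();
        } finally {
          group.shutdownGracefully();
        }
      }
    }

A Hive property (for example, something along the lines of a port or port-range
setting for the Spark client RPC server) could feed minPort/maxPort, much as
spark.driver.port plus spark.port.maxRetries constrain the Spark driver's port;
the exact key name is left open here and is not asserted to exist.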