One possibility (which we use in Kudu) is to use SSL/TLS for encryption, but with a self-signed certificate, while keeping the existing SASL/GSSAPI handshake for authentication. The one important bit here, security-wise, is to implement channel binding (RFC 5056 and RFC 5929) to protect against man-in-the-middle (MITM) attacks. The Kudu protocol is described here: https://github.com/apache/kudu/blob/master/docs/design-docs/rpc.md#wire-protocol
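To illustrate what channel binding buys here, below is a minimal Python sketch of the "tls-server-end-point" binding value defined in RFC 5929, section 4.1. It assumes the caller already has the server certificate's DER bytes (on the client, Python's `ssl.SSLSocket.getpeercert(binary_form=True)` returns them) and already knows the hash used in the certificate's signature algorithm. The function names are hypothetical, and the sketch does not show how the value is carried through the SASL/GSSAPI exchange or confirm which binding type Kudu uses; check the Kudu protocol doc for that. Roughly, each side hashes the certificate it sees: if a MITM terminates TLS with its own self-signed certificate, the client's value differs from the server's, and because the binding is covered by the Kerberos-authenticated handshake, the mismatch is detected.

```python
import hashlib
import hmac

# This sketch only accepts SHA-2 names; RFC 5929 permits any single hash
# function other than MD5/SHA-1 (which are upgraded to SHA-256 below).
_ALLOWED = {"sha224", "sha256", "sha384", "sha512"}


def tls_server_end_point(cert_der: bytes, sig_hash_name: str) -> bytes:
    """Return the tls-server-end-point hash (RFC 5929, section 4.1).

    cert_der: server certificate in DER form (assumed already obtained).
    sig_hash_name: hash from the certificate's signatureAlgorithm,
        e.g. "sha256" for sha256WithRSAEncryption (caller must extract it).
    """
    name = sig_hash_name.lower().replace("-", "")
    if name in ("md5", "sha1"):
        # RFC 5929: certificates signed with MD5 or SHA-1 are hashed with SHA-256.
        name = "sha256"
    if name not in _ALLOWED:
        raise ValueError(f"unsupported signature hash: {sig_hash_name!r}")
    return hashlib.new(name, cert_der).digest()


def bindings_match(local: bytes, peer: bytes) -> bool:
    # Constant-time comparison of the two sides' channel-binding values.
    return hmac.compare_digest(local, peer)


if __name__ == "__main__":
    # Placeholder bytes standing in for real DER-encoded certificates.
    real_cert = b"stand-in for the server's self-signed certificate"
    mitm_cert = b"stand-in for an attacker's certificate"
    server_view = tls_server_end_point(real_cert, "sha256")
    print(bindings_match(server_view, tls_server_end_point(real_cert, "SHA-256")))  # True
    print(bindings_match(server_view, tls_server_end_point(mitm_cert, "sha256")))   # False
```

The design point is that the self-signed certificate never needs to be trusted by a CA: its identity is vouched for indirectly, because the client's view of it is bound into the already-authenticated GSSAPI exchange.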
If implemented correctly, this provides TLS encryption (with all of its performance and security benefits) without requiring the user to deploy a custom certificate.

-Todd

On Thu, Nov 1, 2018 at 7:14 PM Konstantin Shvachko <shv.had...@gmail.com> wrote:
> Hi Wei-Chiu,
>
> Thanks for starting the thread and summarizing the problem. Sorry for the slow response.
> We've been looking at encrypted performance as well and are interested in this effort.
> We ran some benchmarks locally. Our benchmarks also showed a substantial penalty for turning on wire encryption for RPC, although it was less drastic, more in the range of -40%. But we ran a different benchmark, NNThroughputBenchmark, and we ran it on 2.6 last year. I could have published the results, but we need to rerun on more recent versions.
>
> Three points from me on this discussion:
>
> 1. We should settle on the benchmarking tools.
> For development, RPCCallBenchmark is good, as it directly measures the improvement in the RPC layer. But for external consumption it is more important to know about, e.g., NameNode RPC performance. So we should probably run both benchmarks.
> 2. SASL vs SSL.
> Since the current implementation is based on SASL, I think it would make sense to make improvements in this direction. I assume switching to SSL would require configuration changes. Not sure if it will be compatible, since we don't have the details. At this point I would go with HADOOP-10768, provided all of Daryn's concerns are addressed.
> 3. Performance improvement expectations.
> Ideally we want less than a 10% penalty for encrypted communication. Anything over 30% will probably have very limited usability. And there is a gray area in between, which could be mitigated by allowing mixed encrypted and unencrypted RPCs on a single NameNode, as in HDFS-13566.
>
> Thanks,
> --Konstantin
>
> On Wed, Oct 31, 2018 at 7:39 AM Daryn Sharp <da...@oath.com.invalid> wrote:
>
> > Various KMS tasks have been delaying my RPC encryption work, which is 2nd on my TODO list. It's becoming a top priority for us, so I'll try my best to get a preliminary Netty server patch (sans TLS) up this week if that helps.
> >
> > The two cited JIRAs had some critical flaws. Skimming my comments, both use blocking IO (an obvious nonstarter). HADOOP-10768 is a hand-rolled, TLS-like encryption scheme, which I don't feel is something the community can or should maintain from a security standpoint.
> >
> > Daryn
> >
> > On Wed, Oct 31, 2018 at 8:43 AM Wei-Chiu Chuang <weic...@apache.org> wrote:
> >
> > > Ping. Anyone? Cloudera is interested in moving forward with the RPC encryption improvements, but I'd just like to get a consensus on which approach to go with.
> > >
> > > Otherwise I'll pick HADOOP-10768, since it's ready for commit and I've spent time testing it.
> > >
> > > On Thu, Oct 25, 2018 at 11:04 AM Wei-Chiu Chuang <weic...@apache.org> wrote:
> > >
> > > > Folks,
> > > >
> > > > I would like to invite everyone to discuss the various Hadoop RPC encryption performance improvements. As you probably know, Hadoop RPC encryption currently relies on Java SASL and has _really_ bad performance (in terms of RPCs per second, around 15~20% of the rate without SASL).
> > > >
> > > > There have been some attempts to address this, most notably HADOOP-10768 <https://issues.apache.org/jira/browse/HADOOP-10768> (Optimize Hadoop RPC encryption performance) and HADOOP-13836 <https://issues.apache.org/jira/browse/HADOOP-13836> (Securing Hadoop RPC using SSL). But it looks like neither attempt has been progressing.
> > > >
> > > > During the recent Hadoop contributor meetup, Daryn Sharp mentioned he's working on another approach that leverages Netty for its SSL encryption and then integrates Netty with Hadoop RPC, so that Hadoop RPC automatically benefits from Netty's SSL encryption performance.
> > > >
> > > > So, as I see it, there are at least 3 attempts to address this issue. Do we have a consensus on:
> > > > 1. whether this is an important problem
> > > > 2. which approach we want to move forward with
> > > >
> > > > --
> > > > A very happy Hadoop contributor
> > >
> > > --
> > > A very happy Hadoop contributor
> >
> > --
> > Daryn

--
Todd Lipcon
Software Engineer, Cloudera