Filed https://issues.apache.org/jira/browse/HADOOP-16119 to implement KMS on Hadoop RPC. It seems there are interested parties involved in the development of this feature.
On Fri, Nov 2, 2018 at 4:35 PM Wei-Chiu Chuang <weic...@apache.org> wrote:
>
> Thanks all for the inputs.
>
> To offer additional information (while Daryn is working on his stuff), optimizing RPC encryption opens up another possibility: migrating the KMS service to Hadoop RPC.
>
> Today's KMS uses HTTPS + a REST API, much like WebHDFS. It has very undesirable performance (a few thousand ops per second) compared to the NameNode. Unfortunately, each NameNode namespace operation also needs to access the KMS.
>
> Migrating the KMS to Hadoop RPC would greatly improve its performance (if implemented correctly), and RPC encryption would be a prerequisite, so please keep that in mind when discussing the Hadoop RPC encryption improvements. Cloudera is very interested in helping with the Hadoop RPC encryption project because a lot of our customers use at-rest encryption, and some of them are starting to hit the KMS performance limit.
>
> This whole "migrating KMS to Hadoop RPC" idea was Daryn's. I heard it at the meetup and I am very thrilled to see this happening, because it is a real issue bothering some of our customers, and I suspect it is the right solution to address this tech debt.
>
> On Fri, Nov 2, 2018 at 1:21 PM Todd Lipcon <t...@cloudera.com.invalid> wrote:
>
> > One possibility (which we use in Kudu) is to use SSL for encryption but with a self-signed certificate, maintaining the existing SASL/GSSAPI handshake for authentication. The one important bit here, security-wise, is to implement channel binding (RFC 5056 and RFC 5929) to protect against MITMs. The description of the Kudu protocol is here:
> >
> > https://github.com/apache/kudu/blob/master/docs/design-docs/rpc.md#wire-protocol
> >
> > If implemented correctly, this provides TLS encryption (with all of its performance and security benefits) without requiring the user to deploy a custom cert.
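[Editor's note: Todd's channel-binding point can be sketched as below. This is a minimal, hedged illustration of the RFC 5929 "tls-server-end-point" binding, not Kudu's or Hadoop's actual code; the class name is made up and the certificate bytes are a placeholder standing in for `X509Certificate.getEncoded()`.]

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch of RFC 5929 "tls-server-end-point" channel binding: each side
// hashes the server certificate observed during the TLS handshake and
// mixes that token into the SASL/GSSAPI exchange. A MITM terminating
// TLS with its own (different) cert produces a mismatched token and is
// detected, even though the cert itself is self-signed and untrusted.
public class ChannelBindingSketch {

    // RFC 5929 says to use the hash of the certificate's signature
    // algorithm, upgrading MD5/SHA-1 to SHA-256; SHA-256 covers the
    // common modern case.
    static byte[] tlsServerEndPoint(byte[] derEncodedCert) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return md.digest(derEncodedCert);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder bytes standing in for X509Certificate.getEncoded().
        byte[] fakeCert = "not-a-real-DER-certificate".getBytes(StandardCharsets.UTF_8);
        byte[] clientView = tlsServerEndPoint(fakeCert); // cert the client saw on the wire
        byte[] serverView = tlsServerEndPoint(fakeCert); // cert the server actually uses
        // Both ends must derive the identical binding token.
        System.out.println(MessageDigest.isEqual(clientView, serverView)); // prints "true"
    }
}
```

The point of the design is that authentication still comes from Kerberos via SASL/GSSAPI; TLS only has to supply confidentiality, which is why a self-signed cert suffices once the binding ties the two layers together.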
> >
> > -Todd
> >
> > On Thu, Nov 1, 2018 at 7:14 PM Konstantin Shvachko <shv.had...@gmail.com> wrote:
> >
> > > Hi Wei-Chiu,
> > >
> > > Thanks for starting the thread and summarizing the problem. Sorry for the slow response.
> > > We've been looking at encrypted performance as well and are interested in this effort. We ran some benchmarks locally, and they also showed a substantial penalty for turning on wire encryption on RPC, although it was less drastic, more in the range of -40%. But we ran a different benchmark (NNThroughputBenchmark), and we ran it on 2.6 last year. We could have published the results, but need to rerun on more recent versions.
> > >
> > > Three points from me on this discussion:
> > >
> > > 1. We should settle on the benchmarking tools. For development, RPCCallBenchmark is good, as it directly measures the improvement at the RPC layer. But for external consumption it is more important to know about, e.g., NameNode RPC performance. So we should probably run both benchmarks.
> > > 2. SASL vs. SSL. Since the current implementation is based on SASL, I think it would make sense to make improvements in this direction. I assume switching to SSL would require configuration changes; not sure if it would be compatible, since we don't have the details. At this point I would go with HADOOP-10768, given that all of Daryn's concerns are addressed.
> > > 3. Performance improvement expectations. Ideally we want a < 10% penalty for encrypted communication. Anything over 30% will probably have very limited usability. There is a gray area in between, which could be mitigated by allowing mixed encrypted and unencrypted RPCs on a single NameNode, as in HDFS-13566.
> > >
> > > Thanks,
> > > --Konstantin
> > >
> > > On Wed, Oct 31, 2018 at 7:39 AM Daryn Sharp <da...@oath.com.invalid> wrote:
> > >
> > > > Various KMS tasks have been delaying my RPC encryption work, which is 2nd on my TODO list. It's becoming a top priority for us, so I'll try my best to get a preliminary netty server patch (sans TLS) up this week if that helps.
> > > >
> > > > The two cited jiras had some critical flaws. Skimming my comments, both use blocking IO (an obvious nonstarter). HADOOP-10768 is a hand-rolled TLS-like encryption which I don't feel the community can or should maintain from a security standpoint.
> > > >
> > > > Daryn
> > > >
> > > > On Wed, Oct 31, 2018 at 8:43 AM Wei-Chiu Chuang <weic...@apache.org> wrote:
> > > >
> > > > > Ping. Anyone? Cloudera is interested in moving forward with the RPC encryption improvements, but I would just like to get a consensus on which approach to go with.
> > > > >
> > > > > Otherwise I'll pick HADOOP-10768, since it's ready for commit and I've spent time testing it.
> > > > >
> > > > > On Thu, Oct 25, 2018 at 11:04 AM Wei-Chiu Chuang <weic...@apache.org> wrote:
> > > > >
> > > > > > Folks,
> > > > > >
> > > > > > I would like to invite all to discuss the various Hadoop RPC encryption performance improvements.
> > > > > > As you probably know, Hadoop RPC encryption currently relies on Java SASL, and it has _really_ bad performance (in terms of the number of RPCs per second, around 15~20% of the rate without SASL).
> > > > > >
> > > > > > There have been some attempts to address this, most notably HADOOP-10768 <https://issues.apache.org/jira/browse/HADOOP-10768> (Optimize Hadoop RPC encryption performance) and HADOOP-13836 <https://issues.apache.org/jira/browse/HADOOP-13836> (Securing Hadoop RPC using SSL). But it looks like neither attempt has been progressing.
> > > > > >
> > > > > > During the recent Hadoop contributor meetup, Daryn Sharp mentioned he's working on another approach that leverages Netty for its SSL encryption and then integrates Netty with Hadoop RPC, so that Hadoop RPC automatically benefits from Netty's SSL encryption performance.
> > > > > >
> > > > > > So there are at least 3 attempts to address this issue as I see it. Do we have a consensus on:
> > > > > > 1. whether this is an important problem, and
> > > > > > 2. which approach we want to move forward with?
> > > > > >
> > > > > > --
> > > > > > A very happy Hadoop contributor
> > > > >
> > > > > --
> > > > > A very happy Hadoop contributor
> > > >
> > > > --
> > > > Daryn
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
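[Editor's note: the Netty/SSL approach discussed in the thread, and Daryn's remark that blocking IO is a nonstarter, both rest on the JDK's non-blocking TLS primitive. Below is a minimal, JDK-only sketch of that primitive, `javax.net.ssl.SSLEngine`, which Netty's `SslHandler` wraps; it is not the actual Hadoop or Netty patch, and the class name is invented for illustration.]

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// SSLEngine performs TLS on in-memory byte buffers and never touches a
// socket, so an NIO event loop (or Netty's SslHandler) can drive many
// encrypted connections from a few threads instead of blocking one
// thread per connection, which is what makes it viable for an RPC
// server like the NameNode.
public class NonBlockingTlsSketch {
    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getDefault();
        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(true); // this engine plays the client side
        engine.beginHandshake();
        // A real event loop would now repeatedly call
        //   engine.wrap(appData, netData)   when the socket is writable, and
        //   engine.unwrap(netData, appData) when the socket is readable,
        // consulting engine.getHandshakeStatus() between calls to decide
        // whether to wrap, unwrap, or run delegated tasks next.
        System.out.println(engine.getHandshakeStatus());
    }
}
```

Netty's `SslHandler` hides exactly this wrap/unwrap loop inside a channel pipeline, which is why integrating it into Hadoop RPC would deliver TLS performance without the hand-rolled encryption that drew security concerns on HADOOP-10768.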