Howdy Akhil,

Thanks - that did help! And it made me think about how the EC2 scripts [1] set up security. From my understanding of EC2 security groups [2], this just sets up external access, right? (It has no effect on internal communication between the instances, right?)

I am still confused as to why I am seeing the workers open up new ports for each job.

Jacob

[1] https://github.com/apache/spark/blob/master/ec2/spark_ec2.py#L230
[2] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#default-security-group

Jacob D. Eisinger
IBM Emerging Technologies
[email protected] - (512) 286-6075

From: Akhil Das <[email protected]>
To: [email protected]
Date: 04/25/2014 12:51 PM
Subject: Re: Securing Spark's Network
Sent by: [email protected]

Hi Jacob,

This post might give you a brief idea about the ports being used:
https://groups.google.com/forum/#!topic/spark-users/PN0WoJiB0TA

On Fri, Apr 25, 2014 at 8:53 PM, Jacob Eisinger <[email protected]> wrote:

Howdy,

We tried running Spark 0.9.1 stand-alone inside Docker containers distributed over multiple hosts. This is complicated by Spark opening ephemeral / dynamic ports for the workers and the CLI. To ensure our Docker solution doesn't break Spark in unexpected ways and maintains a secure cluster, I am interested in understanding more about Spark's network architecture. I'd appreciate it if you could point us to any documentation!

A couple of specific questions:

1. What are these ports being used for? From reading the code and running experiments, it looks like asynchronous communication for shuffling results around. Anything else?

2. How do you secure the network? Network administrators tend to secure and monitor the network at the port level. If these ports are dynamic and opened at random, firewalls are not easily configured and security alarms are raised. Is there a way to limit the range easily? (We did investigate setting the kernel parameter ip_local_reserved_ports, but this is broken [1] on some versions of Linux's cgroups.)

Thanks,
Jacob

[1] https://github.com/lxc/lxc/issues/97
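
For anyone following the thread, here is a minimal Python sketch of what "ephemeral / dynamic ports" means at the OS level. It is not taken from Spark's code, and it rests on an assumption: that the workers ask the OS for a free port by binding to port 0, which would explain seeing different ports for each job. The helper names `open_ephemeral_listener` and `ports_for_jobs` are made up for this illustration.

```python
import socket


def open_ephemeral_listener(host="127.0.0.1"):
    """Bind a TCP listener to port 0 so the OS picks a free ephemeral port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 means "let the OS choose"
    sock.listen(1)
    return sock


def ports_for_jobs(n_jobs):
    """Open one listener per hypothetical job and report the assigned ports."""
    listeners = [open_ephemeral_listener() for _ in range(n_jobs)]
    try:
        # getsockname() returns (host, port); the port is whatever the OS assigned.
        return [s.getsockname()[1] for s in listeners]
    finally:
        for s in listeners:
            s.close()


if __name__ == "__main__":
    # The printed port numbers differ from run to run and from machine to machine.
    print(ports_for_jobs(3))
```

The ports come from whatever range the OS reserves for ephemeral use (on Linux, the net.ipv4.ip_local_port_range setting), so a firewall rule cannot name them in advance unless the application itself is configured to use fixed ports.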
