Re: RDD operation examples with data?

2014-07-31 Thread Jacob Eisinger
I would check out the source examples on Spark's GitHub: https://github.com/apache/spark/tree/master/examples/src/main/scala/org/apache/spark/examples And, Zhen He put together a great web page with summaries and examples of each function: http://apache-spark-user-list.1001560.n3.nabble.com/A-new-

Re: spark with docker: errors with akka, NAT?

2014-06-17 Thread Jacob Eisinger
Long story [1] short, Akka opens up dynamic, random ports for each job [2]. So, simple NAT fails. You might try some trickery with a DNS server and Docker's --net=host. [1] http://apache-spark-user-list.1001560.n3.nabble.com/Comprehensive-Port-Configuration-reference-tt5384.html#none [2] http:

Re: Comprehensive Port Configuration reference?

2014-05-29 Thread Jacob Eisinger
: 05/28/2014 05:18 PM Subject: Re: Comprehensive Port Configuration reference? Hmm, those do look like 4 listening ports to me. PID 3404 is an executor and PID 4762 is a worker? This is a standalone cluster? On Wed, May 28, 2014 at 8:22 AM, Jacob Eisinger wrote: Howdy Andrew

Re: Comprehensive Port Configuration reference?

2014-05-28 Thread Jacob Eisinger
taking a look through! I also realized that I had a couple mistakes with the 0.9 to 1.0 transition so appropriately documented those now as well in the updated PR. Cheers! Andrew On Fri, May 23, 2014 at 2:43 PM, Jacob Eisinger wrote: Howdy Andrew, I noticed you have a configuration item

Re: Comprehensive Port Configuration reference?

2014-05-23 Thread Jacob Eisinger
014 at 10:19 AM, Mark Baker wrote: On Tue, May 6, 2014 at 9:09 AM, Jacob Eisinger wrote: > In a nutshell, Spark opens up a couple of well-known ports. And then the workers and the shell open up dynamic ports for each job. These dynamic ports make securing the Spark network di

Re: Local Dev Env with Mesos + Spark Streaming on Docker: Can't submit jobs.

2014-05-20 Thread Jacob Eisinger
On Tue, May 6, 2014 at 3:30 PM, Jacob Eisinger wrote: Howdy, You might find the discussion Andrew and I have been having about Docker and network security [1] applicable. Also, I posted an answer [2] to your stackoverflow question. [1] http://apache-spark-user-list.1001560.n3.nabbl

Re: Comprehensive Port Configuration reference?

2014-05-06 Thread Jacob Eisinger
Howdy Scott, Please see the discussions about securing the Spark network [1] [2]. In a nutshell, Spark opens up a couple of well-known ports. And then the workers and the shell open up dynamic ports for each job. These dynamic ports make securing the Spark network difficult. Jacob [1] http:
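
To illustrate why per-job dynamic ports are hard to firewall, here is a minimal Python sketch using plain sockets. It is not Spark code. The helper names (listen_on, bound_port, blocked_by_allowlist) are hypothetical, and the allowlist of 7077 and 8080 (the standalone master's default port and web UI port) is only an example of a "well-known ports" rule set.

```python
import socket


def listen_on(port=0, host="127.0.0.1"):
    """Open a listening TCP socket. Port 0 asks the OS to pick a free dynamic port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(1)
    return sock


def bound_port(sock):
    """Return the port number the socket actually ended up on."""
    return sock.getsockname()[1]


def blocked_by_allowlist(ports, allowed):
    """Return, sorted, the ports that a fixed firewall allowlist would not let through."""
    return sorted(p for p in ports if p not in allowed)


if __name__ == "__main__":
    # Assumption: a firewall that only admits two well-known ports.
    allowed = {7077, 8080}

    # Stand-ins for the listeners a worker or shell opens per job.
    job_sockets = [listen_on() for _ in range(3)]
    try:
        ports = [bound_port(s) for s in job_sockets]
        print("dynamic ports this run:", ports)
        print("blocked by allowlist:", blocked_by_allowlist(ports, allowed))
    finally:
        for s in job_sockets:
            s.close()
```

The port numbers change from run to run, so a static allowlist cannot name them in advance. That is the core of the difficulty described in this thread.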

Re: Local Dev Env with Mesos + Spark Streaming on Docker: Can't submit jobs.

2014-05-06 Thread Jacob Eisinger
Howdy, You might find the discussion Andrew and I have been having about Docker and network security [1] applicable. Also, I posted an answer [2] to your stackoverflow question. [1] http://apache-spark-user-list.1001560.n3.nabble.com/spark-shell-driver-interacting-with-Workers-in-YARN-mode-fire

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-06 Thread Jacob Eisinger
Howdy Andrew, Agreed - if that subnet is configured to only allow THOSE docker images onto it, then, yeah, I figure it would be secure. Great setup, in my opinion! (And, I think we both agree - a better one would be to have Spark only listen on well-known ports to allow for a secured firewall/n

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-05 Thread Jacob Eisinger
Howdy Andrew, I agree; the subnet idea is a good one... unfortunately, it doesn't really help to secure the network. You mentioned that the drivers need to talk to the workers. I think it is slightly broader - all of the workers and the driver/shell need to be addressable from/to each other on

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Jacob Eisinger
Howdy Andrew, I think I am running into the same issue [1] as you. It appears that Spark opens up dynamic / ephemeral [2] ports for each job on the shell and the workers. As you are finding out, this makes securing and managing the network for Spark very difficult. > Any idea how to restrict th

Re: Securing Spark's Network

2014-04-25 Thread Jacob Eisinger
https://groups.google.com/forum/#!topic/spark-users/PN0WoJiB0TA On Fri, Apr 25, 2014 at 8:53 PM, Jacob Eisinger wrote: Howdy, We tried running Spark 0.9.1 stand-alone inside docker containers distributed over multiple hosts. This is complicated due to Spark opening up ephemeral / dynamic ports fo

Securing Spark's Network

2014-04-25 Thread Jacob Eisinger
Howdy, We tried running Spark 0.9.1 stand-alone inside docker containers distributed over multiple hosts. This is complicated due to Spark opening up ephemeral / dynamic ports for the workers and the CLI.  To ensure our docker solution doesn't break Spark in unexpected ways and maintains a sec