Re: Spark 1.0.0 rc3

2014-05-03 Thread Nan Zhu
I ran

SPARK_HADOOP_VERSION=2.3.0 sbt/sbt assembly

and copied the generated jar to the lib/ directory of my application; it seems
that sbt cannot find the dependencies in the jar.

Everything works, though, with the pre-built jar files downloaded from the link
provided by Patrick.
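
(A minimal build.sbt sketch for this kind of setup, declaring the locally built
assembly explicitly; the jar file name below is an assumption based on the
Hadoop version above, not the exact name the build produces:)

// Hypothetical build.sbt: declare the locally built Spark assembly placed in
// lib/ as an unmanaged dependency (sbt also picks up lib/*.jar automatically).
name := "my-spark-app"

scalaVersion := "2.10.4"

unmanagedJars in Compile += Attributed.blank(file("lib/spark-assembly-1.0.0-hadoop2.3.0.jar"))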

Best, 

-- 
Nan Zhu


On Thursday, May 1, 2014 at 11:16 PM, Madhu wrote:

> I'm guessing EC2 support is not there yet?
> 
> I was able to build using the binary download on both Windows 7 and RHEL 6
> without issues.
> I tried to create an EC2 cluster, but saw this:
> 
> ~/spark-ec2
> Initializing spark
> ~ ~/spark-ec2
> ERROR: Unknown Spark version
> Initializing shark
> ~ ~/spark-ec2 ~/spark-ec2
> ERROR: Unknown Shark version
> 
> The spark dir on the EC2 master has only a conf dir, so it didn't deploy
> properly.
> 
> 
> 




Apache Spark running out of the spark shell

2014-05-03 Thread Ajay Nair
Hi,

I have written code that works just fine in the Spark shell on EC2.
The EC2 script helped me configure my master and worker nodes. Now I want to
run the Scala Spark code outside the interactive shell. How do I go about
doing it?

I was referring to the instructions mentioned here:
https://spark.apache.org/docs/0.9.1/quick-start.html

But this is confusing because it mentions a simple project jar file, which I am
not sure how to generate. I only have the file that runs directly in my Spark
shell. Any easy instructions to get this quickly running as a job?

Thanks
AJ





Re: Apache Spark running out of the spark shell

2014-05-03 Thread Sandy Ryza
Hi AJ,

You might find this helpful -
http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/

-Sandy


On Sat, May 3, 2014 at 8:42 AM, Ajay Nair  wrote:

> Hi,
>
> I have written code that works just fine in the Spark shell on EC2.
> The EC2 script helped me configure my master and worker nodes. Now I want to
> run the Scala Spark code outside the interactive shell. How do I go about
> doing it?
>
> I was referring to the instructions mentioned here:
> https://spark.apache.org/docs/0.9.1/quick-start.html
>
> But this is confusing because it mentions a simple project jar file, which I
> am not sure how to generate. I only have the file that runs directly in my
> Spark shell. Any easy instructions to get this quickly running as a job?
>
> Thanks
> AJ
>
>
>
>


Re: Apache Spark running out of the spark shell

2014-05-03 Thread Nicolas Garneau
Hey AJ,

I created a little sample app using Spark's quick start guide.
Have a look here.
Assuming you used Scala, sbt is good for running your application in
standalone mode.
The configuration file, "simple.sbt" in my repo, holds all the
dependencies needed to build your app.

Hope this helps!
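
(For anyone following along, a build definition of that kind looks roughly like
the sketch below. The versions and the Akka resolver are assumptions taken from
the Spark 0.9.1 quick start docs, not a copy of the simple.sbt in the repo.)

// Sketch of a minimal sbt build definition for a standalone Spark 0.9.1 app.
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

// Spark core as a managed dependency; adjust the version to match your cluster.
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"

// Resolver used by the 0.9.x quick start for Akka artifacts.
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"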

On 2014-05-03, at 11:42, Ajay Nair wrote:

> Hi,
> 
> I have written code that works just fine in the Spark shell on EC2.
> The EC2 script helped me configure my master and worker nodes. Now I want to
> run the Scala Spark code outside the interactive shell. How do I go about
> doing it?
> 
> I was referring to the instructions mentioned here:
> https://spark.apache.org/docs/0.9.1/quick-start.html
> 
> But this is confusing because it mentions a simple project jar file, which I am
> not sure how to generate. I only have the file that runs directly in my Spark
> shell. Any easy instructions to get this quickly running as a job?
> 
> Thanks
> AJ
> 
> 
> 
> 

Nicolas Garneau
ngarn...@ngarneau.com



Re: Apache Spark running out of the spark shell

2014-05-03 Thread Ajay Nair
Thank you for the reply. Have you posted a link from which I can follow the steps?





Re: Apache Spark running out of the spark shell

2014-05-03 Thread Nicolas Garneau
Sorry, the link went wrong. I meant here:
https://github.com/ngarneau/spark-standalone

On 2014-05-03, at 13:23, Nicolas Garneau wrote:

> Hey AJ,
> 
> I created a little sample app using Spark's quick start guide.
> Have a look here.
> Assuming you used Scala, sbt is good for running your application in
> standalone mode.
> The configuration file, "simple.sbt" in my repo, holds all the
> dependencies needed to build your app.
> 
> Hope this helps!
> 
> On 2014-05-03, at 11:42, Ajay Nair wrote:
> 
>> Hi,
>> 
>> I have written code that works just fine in the Spark shell on EC2.
>> The EC2 script helped me configure my master and worker nodes. Now I want to
>> run the Scala Spark code outside the interactive shell. How do I go about
>> doing it?
>> 
>> I was referring to the instructions mentioned here:
>> https://spark.apache.org/docs/0.9.1/quick-start.html
>> 
>> But this is confusing because it mentions a simple project jar file, which I am
>> not sure how to generate. I only have the file that runs directly in my Spark
>> shell. Any easy instructions to get this quickly running as a job?
>> 
>> Thanks
>> AJ
>> 
>> 
>> 
>> 
> 
> Nicolas Garneau
> ngarn...@ngarneau.com
> 

Nicolas Garneau
418.569.3097
ngarn...@ngarneau.com



Re: Apache Spark running out of the spark shell

2014-05-03 Thread Ajay Nair
Thank you. Let me try this quickly!





Re: Apache Spark running out of the spark shell

2014-05-03 Thread Ajay Nair
Quick question: where should I place your folder? Inside the Spark directory?
My Spark directory is /root/spark.
So currently I pulled your GitHub code into /root/spark/spark-examples
and modified my Spark home directory in the Scala code.
I copied the sbt folder into the spark-examples folder. But when I try
running this command:

$root/spark/spark-examples: sbt/sbt package

awk: cmd. line:1: fatal: cannot open file `./project/build.properties' for
reading (No such file or directory)
Launching sbt from sbt/sbt-launch-.jar
Error: Invalid or corrupt jarfile sbt/sbt-launch-.jar


However, sbt package runs fine (as expected) when I run it from the
/root/spark folder.

Am I doing anything wrong here?







Re: Apache Spark running out of the spark shell

2014-05-03 Thread Nicolas Garneau
Hey AJ,

As I can see, your path when running sbt is:

> $root/spark/spark-examples: sbt/sbt package

You should be within the app's folder that contains simple.sbt, which is
spark-standalone/:

> $root/spark/spark-examples/spark-standalone: sbt/sbt package
> $root/spark/spark-examples/spark-standalone: sbt/sbt run


Don't forget to move the sbt folder into your app's directory.

That being said, I think you can install sbt globally on your system so you'll 
be able to run the sbt command everywhere on your PC.
It'll be useful when creating multiple apps.

For example, the way I'm building it from A to Z:
$ git clone https://github.com/ngarneau/spark-standalone.git
$ cd spark-standalone
# change the path of Spark's home dir in the Scala code
$ sbt package (assuming sbt is installed globally)
$ sbt run (assuming sbt is installed globally)

Hope this helps!
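
(To illustrate the "change the path of Spark's home dir" step: a 0.9.x-style
standalone driver typically passes the master URL, the Spark home directory,
and the application jar to the SparkContext constructor. The sketch below is an
assumption about what such a file looks like, not the actual contents of the
spark-standalone repo; the master hostname and jar name are placeholders.)

// SimpleApp.scala -- hypothetical sketch of a 0.9.x-style standalone driver.
import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]) {
    val sc = new SparkContext(
      "spark://<master-hostname>:7077",  // cluster master URL (placeholder)
      "Simple App",                      // application name
      "/root/spark",                     // Spark home dir on the cluster
      Seq("target/scala-2.10/simple-project_2.10-1.0.jar"))  // jar built by sbt package

    // Any small computation visible to the whole cluster works as a smoke test.
    val lines = sc.textFile("/root/spark/README.md")
    println("Line count: " + lines.count())
    sc.stop()
  }
}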

On 2014-05-03, at 13:38, Ajay Nair wrote:

> Quick question: where should I place your folder? Inside the Spark directory?
> My Spark directory is /root/spark.
> So currently I pulled your GitHub code into /root/spark/spark-examples
> and modified my Spark home directory in the Scala code.
> I copied the sbt folder into the spark-examples folder. But when I try
> running this command:
> 
> $root/spark/spark-examples: sbt/sbt package
> 
> awk: cmd. line:1: fatal: cannot open file `./project/build.properties' for
> reading (No such file or directory)
> Launching sbt from sbt/sbt-launch-.jar
> Error: Invalid or corrupt jarfile sbt/sbt-launch-.jar
> 
> 
> However, sbt package runs fine (as expected) when I run it from the
> /root/spark folder.
> 
> Am I doing anything wrong here?
> 
> 
> 
> 
> 
> 

Nicolas Garneau
ngarn...@ngarneau.com



Re: Mailing list

2014-05-03 Thread Matei Zaharia
Hi Nicolas,

Good catches on these things.

> Your website seems a little bit incomplete. I have found this page [1] which 
> lists the two main mailing lists, users and dev. But I see a reference to a 
> mailing list about "issues" which tracked Spark's issues when it was hosted 
> at Atlassian. I guess it has moved? Where?
> And is there any mailing list about the commits?

Good catch, this was an old link and I’ve fixed it now. I also added the one 
for commits.

> Also, I found it weird that there is no page referencing the true source 
> repository, the Git repo at the ASF; I only found references to the one at GitHub.

The GitHub repo is actually a mirror managed by the ASF, but the “git tag” link 
at http://spark.apache.org/downloads.html also points to the source repo. The 
problem is that our contribution process is through GitHub so it’s easier to 
point people to something that they can use to contribute.

> I am also interested in your workflow, because Ant is moving from svn to git 
> and we're still a little bit in the grey about the workflow. I am thus 
> intrigued by how you work with GitHub pull requests.

Take a look at 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark and 
https://cwiki.apache.org/confluence/display/SPARK/Reviewing+and+Merging+Patches 
to see our contribution process. In a nutshell, it works as follows:

- Anyone can make a patch by forking the GitHub repo and sending a pull request 
(GitHub’s internal patch mechanism)
- Committers review the patch and ask for changes; contributors can push 
additional changes into their pull request to respond
- When the patch looks good, we use a script to merge it into the source Apache 
repo; this also squashes the changes into one commit, making the Git history 
sane and facilitating reverts, cherry-picks into other branches, etc.

Note by the way that using GitHub is not at all necessary for using Git. We 
happened to do our development on GitHub before moving to the ASF, and all our 
developers were used to its interface, so we stuck with it. It definitely beats 
attaching patches on JIRA but it may not be the first step you want to take in 
moving to Git.

Matei

> 
> Nicolas
> 
> [1] https://spark.apache.org/community.html
>