Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-09 Thread Ashish Dutt
of yarn and spark is that these binaries get compressed > and packaged with Java to be pushed to the worker node. > Regards, > On Sep 7, 2015 9:00 PM, "Ashish Dutt" wrote: > >> Hello Sasha, >> >> I have no answer for debian. My cluster is on Linux and I'm us

Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-07 Thread Ashish Dutt
packages > required to run app. > > If someone confirms that I need to build everything from source with > specific version of software I will do that, but at this point I am not > sure what to do to remedy this situation... > > --sasha > > > On Sun, Sep 6, 2015 at 8:27

Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-06 Thread Ashish Dutt
flow <http://stackoverflow.com/search?q=no+module+named+pyspark> website Sincerely, Ashish Dutt On Mon, Sep 7, 2015 at 7:17 AM, Sasha Kacanski wrote: > Hi, > I am successfully running python app via pyCharm in local mode > setMaster("local[*]") > > When I turn on Spa
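The thread above contrasts running the app locally with setMaster("local[*]") against pointing it at the cluster, which is where the failure appears. As a minimal sketch (pure Python, no Spark install needed), the two master URL forms look like this; the host name and port are hypothetical examples, not values from the thread:

```python
def master_url(mode, host="quickstart.cloudera", port=7077):
    """Return a Spark master URL string.

    'local' runs everything in-process (what the PyCharm local-mode run
    uses); 'standalone' points the driver at a cluster master, which is
    where remote-connection problems like this one surface.
    """
    if mode == "local":
        return "local[*]"  # use all local cores, no cluster involved
    if mode == "standalone":
        return "spark://%s:%d" % (host, port)
    raise ValueError("unknown mode: %r" % mode)

# In the driver: conf.setMaster(master_url("local"))
# versus:        conf.setMaster(master_url("standalone"))
```

Switching only this string is usually the cleanest way to move an app between local debugging and the cluster.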

How to connect to remote HDFS programmatically to retrieve data, analyse it and then write the data back to HDFS?

2015-08-05 Thread Ashish Dutt
y.java:214) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) Traceback (most recent

PySpark in Pycharm- unable to connect to remote server

2015-08-05 Thread Ashish Dutt
:214) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) Traceback (most recent call l

Re: Is it possible to change the default port number 7077 for spark?

2015-07-13 Thread Ashish Dutt
Hello Arun, Thank you for the descriptive response. And thank you for providing the sample file too. It certainly is a great help. Sincerely, Ashish On Mon, Jul 13, 2015 at 10:30 PM, Arun Verma wrote: > > PFA sample file > > On Mon, Jul 13, 2015 at 7:37 PM, Arun Verma > wrote: > >> Hi, >> >>
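For reference on the question in this thread's title: in Spark's standalone mode the master port is set via `conf/spark-env.sh` on the master node. A sketch of the relevant entries follows; 7078 and 8081 are arbitrary example values, and after the change workers and drivers must use `spark://<master-host>:7078`:

```
# conf/spark-env.sh on the master node (example values, not recommendations)
export SPARK_MASTER_PORT=7078        # default is 7077
export SPARK_MASTER_WEBUI_PORT=8081  # default is 8080
```

Restart the master after editing so the new port takes effect.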

Re: SparkR Error in sparkR.init(master="local") in RStudio

2015-07-13 Thread Ashish Dutt
n Windows environment? What I mean is how to setup .libPaths()? where is it in windows environment Thanks for your help Sincerely, Ashish Dutt On Mon, Jul 13, 2015 at 3:48 PM, Sun, Rui wrote: > Hi, Kachau, > > If you are using SparkR with RStudio, have you followed the guideli

Re: Connecting to nodes on cluster

2015-07-09 Thread Ashish Dutt
Hello Akhil, Thanks for the response. I will have to figure this out. Sincerely, Ashish On Thu, Jul 9, 2015 at 3:40 PM, Akhil Das wrote: > On Wed, Jul 8, 2015 at 7:31 PM, Ashish Dutt > wrote: > >> Hi, >> >> We have a cluster with 4 nodes. The cluster uses CDH 5.4

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
written something wrong here. I cannot seem to figure out what it is. Thank you for your help Sincerely, Ashish Dutt On Thu, Jul 9, 2015 at 11:53 AM, Sujit Pal wrote: > Hi Ashish, > > >> Nice post. > Agreed, kudos to the author of the post, Benjamin Benfort of District Labs. >

DLL load failed: %1 is not a valid win32 application on invoking pyspark

2015-07-08 Thread Ashish Dutt
your help. Sincerely, Ashish Dutt - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
N", MAVEN_HOME="D:\MAVEN\BIN", PYTHON_HOME="C:\PYTHON27\", SBT_HOME="C:\SBT\" Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia On Thu, Jul 9, 2015 at 4:56 AM, Sujit Pal wrote: >
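The message above lists the Windows environment variables involved in running PySpark outside of bin/pyspark. What the launcher scripts effectively do is put Spark's Python sources and the bundled py4j zip on `sys.path`; a sketch under the Spark 1.x directory layout follows (the paths are illustrative assumptions, not from the thread):

```python
import glob
import os
import sys

def add_pyspark_to_path(spark_home):
    """Mimic what bin/pyspark does so that `import pyspark` works in a
    plain interpreter or IDE: export SPARK_HOME and prepend Spark's
    python/ directory plus the bundled py4j source zip to sys.path.
    Assumes the Spark 1.x layout (python/lib/py4j-*-src.zip)."""
    os.environ["SPARK_HOME"] = spark_home
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    entries = [python_dir] + py4j_zips
    for p in entries:
        if p not in sys.path:
            sys.path.insert(0, p)
    return entries

# After this, `import pyspark` should resolve (given Spark really is there):
# add_pyspark_to_path(r"C:\spark-1.4.1-bin-hadoop2.6")
```

On Windows the same idea applies with the backslash paths listed in the message; only `spark_home` changes.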

Re: Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
The error is "JVM has not responded after 10 seconds". On 08-Jul-2015 10:54 PM, "ayan guha" wrote: > What's the error you are getting? > On 9 Jul 2015 00:01, "Ashish Dutt" wrote: > >> Hi, >> >> We have a cluster with 4 nodes. The cluster uses CD

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
k cluster. Not sure if that is possible. > > - Sooraj > > On 8 July 2015 at 15:31, Ashish Dutt wrote: >> >> My apologies for double posting but I missed the web links that I followed, which are 1, 2, 3 >> >> Thanks, >> Ashish >> >> On Wed, Jul 8

Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
connect to the nodes I am using SSH. Question 1: Would it be better if I work directly on the nodes rather than trying to connect my laptop to them? Question 2: If yes, then can you suggest any Python and R IDE that I can install on the nodes to make it work? Thanks for your help Sincerely, Ashish

Re: Getting started with spark-scala developemnt in eclipse.

2015-07-08 Thread Ashish Dutt
Hello Prateek, I started with getting the pre-built binaries so as to skip the hassle of building them from scratch. I am not familiar with Scala so I can't comment on it. I have documented my experiences on my blog, www.edumine.wordpress.com. Perhaps it might be useful to you. On 08-Jul-2015 9:39 PM,

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
My apologies for double posting but I missed the web links that I followed, which are 1, 2, 3

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
and hence not much help to me. I am able to launch IPython on localhost but cannot get it to work on the cluster. Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 5:49 PM, sooraj wrote: > That turned out to be a silly data type mistake. At one point in the > iterative call, I was passing an i

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thank you, Akhil, for the link. Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia On Wed, Jul 8, 2015 at 3:43 PM, Akhil Das wrote: > Have a look > http://alvinalexander.com/scala/how-to-create-java-

How to upgrade Spark version in CDH 5.4

2015-07-08 Thread Ashish Dutt
Hi, I need to upgrade Spark version 1.3 to version 1.4 on CDH 5.4. I checked the documentation here

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thanks for your reply, Akhil. How do you multi-thread it? Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 3:29 PM, Akhil Das wrote: > What's the point of creating them in parallel? You can multi-thread it and run > it in parallel, though. > > Thanks > Best Regards > > On Wed, J
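On Akhil's suggestion in this thread: Spark's scheduler is thread-safe, so loading several DataFrames concurrently usually means submitting each load from its own driver thread. A minimal sketch using the standard library follows; `load_table` is a stand-in for a real call such as `sqlContext.read.parquet(...)`, and the table names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def load_table(name):
    # Stand-in for a real Spark call, e.g. sqlContext.read.parquet(path);
    # in a real driver each call would trigger its own Spark job, and the
    # jobs would run concurrently because each comes from its own thread.
    return "df_" + name

tables = ["users", "orders", "events"]  # hypothetical table names
with ThreadPoolExecutor(max_workers=len(tables)) as pool:
    # pool.map preserves input order, so results line up with `tables`
    frames = dict(zip(tables, pool.map(load_table, tables)))
# frames == {'users': 'df_users', 'orders': 'df_orders', 'events': 'df_events'}
```

The threads only parallelize job *submission*; the actual work is still scheduled across the cluster by Spark.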

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
. All I want for now is to connect my laptop to the Spark cluster machine using either pyspark or SparkR. (I have Python 2.7.) On my laptop I am using winutils in place of Hadoop and have Spark 1.4 installed. Thank you Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
g4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 15/07/08 11:28:35 INFO SecurityManager: Changing view acls to: Ashish Dutt 15/07/08 11:28:35 INFO Securit

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
Thank you, Ayan, for your response. But I have just realised that Spark is configured to be a history server. Can somebody suggest how I can convert the Spark history server to be a master server? Thank you Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 12:28 PM, ayan guha wrote

How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
Hi, I have CDH 5.4 installed on a Linux server. It has 1 cluster in which Spark is deployed as a history server. I am trying to connect my laptop to the Spark history server. When I run spark-shell with the master IP and port number, I get the following output. How can I verify that the worker is connected to th
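For the distinction this thread keeps running into: a history server only replays the logs of finished applications, so nothing can "connect a worker" to it. A rough operational sketch for a standalone master/worker (Spark 1.x; paths, host name, and ports are examples and the exact start-slave.sh argument form varies across 1.x releases):

```
# On the cluster node:
$SPARK_HOME/sbin/start-master.sh                          # master web UI on :8080
$SPARK_HOME/sbin/start-slave.sh spark://master-host:7077  # register a worker
jps                                                       # should list Master and Worker

# By contrast, sbin/start-history-server.sh (UI on :18080) only serves
# event logs of completed apps; it accepts no workers and runs no jobs.
```

If the master web UI lists the worker under "Workers", the connection is verified.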

Re: JVM is not ready after 10 seconds

2015-07-06 Thread Ashish Dutt
# spark.driver.memory 5g # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three" Sincerely, Ashish Dutt On Tue, Jul 7, 2015 at 9:30 AM, Ashish Dutt wrote: > Hello Shivaram, > Thank you for your response. Being a novice at this st
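The `#`-prefixed lines quoted above are the commented-out template shipped as conf/spark-defaults.conf.template; they have no effect until copied into conf/spark-defaults.conf and uncommented. A sketch of the active form (the master host and the 5g value are examples, not recommendations):

```
# conf/spark-defaults.conf -- properties take effect only when uncommented
spark.master                     spark://master-host:7077
spark.driver.memory              5g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
```

Values passed explicitly to spark-submit or set in code override these file defaults.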

Re: JVM is not ready after 10 seconds

2015-07-06 Thread Ashish Dutt
Hello Shivaram, Thank you for your response. Being a novice at this stage, can you also tell me how to configure or set the execute permission for the spark-submit file? Thank you for your time. Sincerely, Ashish Dutt On Tue, Jul 7, 2015 at 9:21 AM, Shivaram Venkataraman < sh
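On the execute-permission question above: on Linux/OS X the fix is `chmod +x $SPARK_HOME/bin/spark-submit` (the execute bit does not apply on Windows). A self-contained Python sketch of the same operation, demonstrated on a throwaway file rather than a real Spark install:

```python
import os
import stat
import tempfile

def make_executable(path):
    """Add the owner/group/other execute bits, i.e. `chmod +x path`.
    In this thread the target would be $SPARK_HOME/bin/spark-submit."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Report whether the owner-execute bit is now set
    return bool(os.stat(path).st_mode & stat.S_IXUSR)

# Demonstrated on a temporary stand-in file:
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
print(make_executable(tmp.name))  # True once the execute bit is set
os.remove(tmp.name)
```

The shell one-liner is simpler in practice; the sketch just makes explicit which permission bits are involved.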

JVM is not ready after 10 seconds.

2015-07-06 Thread Ashish Dutt
I am using Windows 7 as the OS on the worker machine and I am invoking sparkR.init() from RStudio. Any help in this regard will be appreciated. Thank you, Ashish Dutt