Re: UnknownHostException: home

2015-01-19 Thread Rapelly Kartheek
is an empty host between the 2nd and 3rd. This is true > of most URI schemes with a host. > > On Mon, Jan 19, 2015 at 9:56 AM, Rapelly Kartheek > wrote: > > Yes yes.. hadoop/etc/hadoop/hdfs-site.xml file has the path like: > > "hdfs://home/..." > > > > On M

Re: UnknownHostException: home

2015-01-19 Thread Rapelly Kartheek
ne you mean it as a > root directory. > > On Mon, Jan 19, 2015 at 9:33 AM, Rapelly Kartheek > wrote: > > Hi, > > > > I get the following exception when I run my application: > > > > karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class > > org.apach

Re: UnknownHostException: home

2015-01-19 Thread Rapelly Kartheek
ssuming it's your local machine, add an entry in your /etc/hosts file > like and then run the program again (use sudo to edit the file) > > 127.0.0.1 home > > On Mon, Jan 19, 2015 at 3:03 PM, Rapelly Kartheek > wrote: > > Hi, > > > > I get the following ex

UnknownHostException: home

2015-01-19 Thread Rapelly Kartheek
Hi, I get the following exception when I run my application: karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class org.apache.spark.examples.SimpleApp001 --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar >out1.txt log4j:WARN No such propert
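The resolution, per the replies above: in "hdfs://home/...", the token after the second slash is parsed as a hostname, so "home" must either be made resolvable or moved into the path part of the URI. A sketch of both fixes from the thread (the file path below is hypothetical):

    # option 1: empty authority ("hdfs:///"), so /home/... is a path, not a host
    hdfs:///home/karthik/input.txt

    # option 2: make the host "home" resolvable (use sudo to edit /etc/hosts)
    127.0.0.1    home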

Re: Problem with building spark-1.2.0

2015-01-12 Thread Rapelly Kartheek
Yes, this proxy problem is resolved. *How does your build refer to https://github.com/ScrapCodes/sbt-pom-reader.git? I don't see this repo in the project code base.* I manually downloaded the sbt-pom-reader directory and moved it into .sbt/0.13/staging/*/ di

Re: Problem with building spark-1.2.0

2015-01-04 Thread Rapelly Kartheek
om for > cloning some dependencies as github is blocked in India. What are the other > possible ways for this problem?? > > Thank you! > > On Sun, Jan 4, 2015 at 9:45 PM, Rapelly Kartheek <[hidden email]>

Spark-1.2.0 build error

2015-01-02 Thread rapelly kartheek
Hi, I get the following error when I build spark using sbt: [error] Nonzero exit code (128): git clone https://github.com/ScrapCodes/sbt-pom-reader.git /home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader [error] Use 'last' for the full log. Any help please?
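The follow-up posts above describe the workaround that eventually worked: fetch sbt-pom-reader by whatever route the proxy allows, then drop it where the build expected the clone to land. A sketch, reusing the staging hash from the error message above:

    # clone on any machine that can reach github, then move it into place
    git clone https://github.com/ScrapCodes/sbt-pom-reader.git
    mv sbt-pom-reader /home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/
    sbt/sbt assembly    # re-run the build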

Re: NullPointerException

2014-12-31 Thread rapelly kartheek
whether you still see the same error? > > On Wed, Dec 31, 2014 at 10:35 PM, rapelly kartheek < > kartheek.m...@gmail.com> wrote: > >> spark-1.0.0 >> >> On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen wrote: >> >>> Which version of Spark are you using? &

Fwd: NullPointerException

2014-12-31 Thread rapelly kartheek
-- Forwarded message -- From: rapelly kartheek Date: Thu, Jan 1, 2015 at 12:05 PM Subject: Re: NullPointerException To: Josh Rosen , user@spark.apache.org spark-1.0.0 On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen wrote: > Which version of Spark are you using? > > On We

Re: NullPointerException

2014-12-31 Thread rapelly kartheek
spark-1.0.0 On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen wrote: > Which version of Spark are you using? > > On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek < > kartheek.m...@gmail.com> wrote: > >> Hi, >> I get this following Exception when I submit spark a

NullPointerException

2014-12-31 Thread rapelly kartheek
Hi, I get the following Exception when I submit a spark application that calculates the frequency of characters in a file. Especially when I increase the size of the data, I face this problem. Exception in thread "Thread-47" org.apache.spark.SparkException: Job aborted due to stage failure: Task 11.0:

Spark profiler

2014-12-29 Thread rapelly kartheek
Hi, I want to find the time taken for replicating an rdd in spark cluster along with the computation time on the replicated rdd. Can someone please suggest a suitable spark profiler? Thank you

Storage Locations of an rdd

2014-12-26 Thread rapelly kartheek
Hi, I need to find the storage locations (node Ids ) of each partition of a replicated rdd in spark. I mean, if an rdd is replicated twice, I want to find the two nodes for each partition where it is stored. Spark WebUI has a page wherein it depicts the data distribution of each rdd. But, I need to
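One way to pull per-partition block locations, sketched against Spark 1.x internals. BlockManagerMaster is private[spark], so this only compiles inside the org.apache.spark package tree (for example, in the tweaked scheduler code discussed elsewhere in this archive); sc and the input path are placeholders:

    import org.apache.spark.SparkEnv
    import org.apache.spark.storage.{RDDBlockId, StorageLevel}

    val rdd = sc.textFile("hdfs:///some/input").persist(StorageLevel.MEMORY_ONLY_2)
    rdd.count()  // materialize, so the blocks and their replicas exist

    val master = SparkEnv.get.blockManager.master
    for (i <- 0 until rdd.partitions.length) {
      val locs = master.getLocations(RDDBlockId(rdd.id, i))
      println("partition " + i + " -> " + locs.map(_.hostPort).mkString(", "))
    }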

Profiling a spark application.

2014-12-25 Thread rapelly kartheek
Hi, I want to find the time taken for replicating an rdd in spark cluster along with the computation time on the replicated rdd. Can someone please suggest some ideas? Thank you
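Short of a real profiler, a crude driver-side timing sketch can separate the compute-and-replicate pass from a later pass that reads the replicated rdd. Everything below (input path, the follow-up job) is a placeholder:

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.textFile("hdfs:///some/input").persist(StorageLevel.MEMORY_ONLY_2)

    val t0 = System.nanoTime()
    rdd.count()                        // first action: computes the rdd and writes both replicas
    val t1 = System.nanoTime()
    rdd.map(_.length).reduce(_ + _)    // later action: runs against the already-replicated blocks
    val t2 = System.nanoTime()

    println("compute + replicate: " + (t1 - t0) / 1e9 + " s, " +
            "compute on replicated rdd: " + (t2 - t1) / 1e9 + " s")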

Necessity for rdd replication.

2014-12-03 Thread rapelly kartheek
Hi, I was just thinking about the necessity for rdd replication. One category could be something like a large number of threads requiring the same rdd. Even though a single rdd can be shared by multiple threads belonging to the "same application", I believe we can extract better parallelism if the rdd is rep

Re: java.io.IOException: Filesystem closed

2014-12-02 Thread rapelly kartheek
ose threads are finishing quickly. > > Thanks > Best Regards > > On Tue, Dec 2, 2014 at 2:19 PM, rapelly kartheek > wrote: > >> But, somehow, if I run this application for the second time, I find that >> the application gets executed and the results are out rega

Re: java.io.IOException: Filesystem closed

2014-12-02 Thread rapelly kartheek
within it. >> >> Thanks >> Best Regards >> >> On Tue, Dec 2, 2014 at 11:59 AM, rapelly kartheek < >> kartheek.m...@gmail.com> wrote: >> >>> Hi, >>> >>> I face the following exception when I submit a spark application. The log >

Re: java.io.IOException: Filesystem closed

2014-12-02 Thread rapelly kartheek
> > On Tue, Dec 2, 2014 at 11:59 AM, rapelly kartheek > wrote: > >> Hi, >> >> I face the following exception when I submit a spark application. The log >> file shows: >> >> 14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener >&g

java.io.IOException: Filesystem closed

2014-12-01 Thread rapelly kartheek
Hi, I face the following exception when I submit a spark application. The log file shows: 14/12/02 11:52:58 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:689) at org.apache.
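A common cause of this particular stack trace: some user code closed the JVM-wide cached FileSystem handle that FileSystem.get returns, and Spark's event-logging listener later tripped over it. A hedged sketch of the usual defensive pattern, assuming a Hadoop version that provides FileSystem.newInstance (the namenode URI is a placeholder):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    // newInstance returns a private, non-cached handle, so closing it
    // cannot yank the shared cached instance out from under Spark
    val fs = FileSystem.newInstance(new URI("hdfs://localhost:9000"), new Configuration())
    try {
      // ... reads/writes ...
    } finally {
      fs.close()
    }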

How is the sequence of BlockManagerIds constructed in spark/*/storage/BlockManagerMasterActor.getPeers()?

2014-11-26 Thread rapelly kartheek
Hi, I've been fiddling with the spark/*/storage/BlockManagerMasterActor.getPeers() definition in the context of blockManagerMaster.askDriverWithReply() sending a GetPeers() request. 1) I couldn't understand what 'selfIndex' is used for. 2) Also, I tried modifying the 'peers' array by just elimin

How to access application name in the spark framework code.

2014-11-24 Thread rapelly kartheek
Hi, When I submit a spark application like this: ./bin/spark-submit --class org.apache.spark.examples.SparkKMeans --deploy-mode client --master spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar /k-means 4 0.001 Which part of the spark framework code deals with the name of t
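The application name ends up in the SparkConf under spark.app.name, so inside framework code it can usually be read off the active environment. A sketch against Spark 1.x, assuming a SparkEnv exists at that point in the scheduler:

    import org.apache.spark.SparkEnv

    val appName = SparkEnv.get.conf.get("spark.app.name")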

Re: Read a HDFS file from Spark using HDFS API

2014-11-14 Thread rapelly kartheek
Hi Akhil, I face error: " not found : value URI " On Fri, Nov 14, 2014 at 9:29 PM, rapelly kartheek wrote: > I'll just try out with object Akhil provided. > There was no problem working in shell with sc.textFile. > > Thank you Akhil and Tri. > > On Fri, N

Re: Read a HDFS file from Spark using HDFS API

2014-11-14 Thread rapelly kartheek
9:18 PM, Bui, Tri < > tri@verizonwireless.com.invalid> wrote: > >> It should be >> >> >> >> val file = sc.textFile("hdfs:///localhost:9000/sigmoid/input.txt") >> >> >> >> 3 “///” >> >> >> >> Thanks >&

Re: Read a HDFS file from Spark using HDFS API

2014-11-14 Thread rapelly kartheek
Akhil Das wrote: > like this? > > val file = sc.textFile("hdfs://localhost:9000/sigmoid/input.txt") > > Thanks > Best Regards > > On Fri, Nov 14, 2014 at 9:02 PM, rapelly kartheek > wrote: > >> Hi, >> I am trying to read a HDFS file from

Read a HDFS file from Spark using HDFS API

2014-11-14 Thread rapelly kartheek
Hi, I am trying to read a HDFS file from Spark "scheduler code". I could find how to write hdfs read/writes in java. But I need to access hdfs from spark using scala. Can someone please help me in this regard.
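The replies above converge on the plain Hadoop FileSystem API; note that the "not found : value URI" error further up is just the missing java.net.URI import. A sketch in Scala, with the namenode URI and path taken from the thread's example rather than verified:

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import scala.io.Source

    val uri = new URI("hdfs://localhost:9000/sigmoid/input.txt")
    val fs = FileSystem.get(uri, new Configuration())
    val in = fs.open(new Path(uri))
    try {
      Source.fromInputStream(in).getLines().foreach(println)
    } finally {
      in.close()
    }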

Re: Read a HDFS file from Spark source code

2014-11-11 Thread rapelly kartheek
2014 at 11:26 AM, Samarth Mailinglist < mailinglistsama...@gmail.com> wrote: > Instead of a file path, use a HDFS URI. > For example: (In Python) > > > > data = sc.textFile("hdfs://localhost/user/someuser/data") > > ​ > > On Wed, Nov 12, 2014 at 10:

Read a HDFS file from Spark source code

2014-11-11 Thread rapelly kartheek
Hi I am trying to access a file in HDFS from spark "source code". Basically, I am tweaking the spark source code. I need to access a file in HDFS from the source code of the spark. I am really not understanding how to go about doing this. Can someone please help me out in this regard. Thank you!!
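For the RDD route, the reply further up gives a Python example; translated to Scala (host and path are from that example, not verified):

    val data = sc.textFile("hdfs://localhost/user/someuser/data")
    data.take(5).foreach(println)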

Rdd replication

2014-11-09 Thread rapelly kartheek
Hi, I am trying to understand the rdd replication code. In the process, I frequently execute one spark application whenever I make a change to the code, to see the effect. My problem is that after a set of repeated executions of the same application, I find that my cluster behaves unusually. Ideally, when I

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread rapelly kartheek
anjiv Singh > > > Regards > Sanjiv Singh > Mob : +091 9990-447-339 > > On Sun, Oct 12, 2014 at 11:45 AM, rapelly kartheek <[hidden email]> wrote: > >> Hi, >> >> I am trying to write

How to convert a non-rdd data to rdd.

2014-10-11 Thread rapelly kartheek
Hi, I am trying to write a String that is not an rdd to HDFS. This data is a variable in the Spark Scheduler code. None of the spark file operations are working because my data is not an rdd. So, I tried using SparkContext.parallelize(data). But it throws an error: [error] /home/karthik/spark-1.0.0/core/s
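If the goal is simply to land a driver-side string in HDFS, one hedged guess at the intent: wrap the string in a Seq so parallelize accepts it as a one-record RDD (the variable and output path below are hypothetical):

    val data: String = "some scheduler-side value"
    sc.parallelize(Seq(data), 1).saveAsTextFile("hdfs://localhost:9000/tmp/out")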

Rdd repartitioning

2014-10-10 Thread rapelly kartheek
Hi, I was facing GC overhead errors while executing an application with 570MB data(with rdd replication). In order to fix the heap errors, I repartitioned the rdd to 10: val logData = sc.textFile("hdfs:/text_data/text data.txt").persist(StorageLevel.MEMORY_ONLY_2) val parts=logData.coalesce(1
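One caveat on the snippet above: coalesce without shuffle can only lower the partition count, so "repartitioning to 10" only takes effect if textFile produced more than 10 partitions. A sketch that forces 10 either way (the path is copied from the post):

    import org.apache.spark.storage.StorageLevel

    val logData = sc.textFile("hdfs:/text_data/text data.txt")
      .persist(StorageLevel.MEMORY_ONLY_2)
    val parts = logData.repartition(10)   // same as coalesce(10, shuffle = true)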

Re: rsync problem

2014-09-26 Thread rapelly kartheek
Pfeiffer wrote: > Hi, > > I assume you unintentionally did not reply to the list, so I'm adding it > back to CC. > > How do you submit your job to the cluster? > > Tobias > > > On Thu, Sep 25, 2014 at 2:21 AM, rapelly kartheek > wrote: > >> Ho

Re: rsync problem

2014-09-19 Thread rapelly kartheek
irectory > $SPARK_HOME/work is rsynced as well. > Try emptying the contents of the work folder on each node and try again. > > > > On Fri, Sep 19, 2014 at 4:53 AM, rapelly kartheek > wrote: > >> I >> * followed this command:rsync -avL --progress path/to/spark-1.0.0 >>

Re: rsync problem

2014-09-19 Thread rapelly kartheek
4 at 5:17 PM, rapelly kartheek > wrote: > >> > , >> >> * you have copied a lot of files from various hosts to >> username@slave3:path* >> only from one node to all the other nodes... >> > > I don't think rsync can do that in one command as you

Fwd: rsync problem

2014-09-19 Thread rapelly kartheek
-- Forwarded message -- From: rapelly kartheek Date: Fri, Sep 19, 2014 at 1:51 PM Subject: Re: rsync problem To: Tobias Pfeiffer any idea why the cluster is dying down??? On Fri, Sep 19, 2014 at 1:47 PM, rapelly kartheek wrote: > , > > > * you have copied a lot o

Re: rsync problem

2014-09-19 Thread rapelly kartheek
, * you have copied a lot of files from various hosts to username@slave3:path* only from one node to all the other nodes... On Fri, Sep 19, 2014 at 1:45 PM, rapelly kartheek wrote: > Hi Tobias, > > I've copied the files from master to all the slaves. > > On Fri, Sep

Re: rsync problem

2014-09-19 Thread rapelly kartheek
Hi Tobias, I've copied the files from master to all the slaves. On Fri, Sep 19, 2014 at 1:37 PM, Tobias Pfeiffer wrote: > Hi, > > On Fri, Sep 19, 2014 at 5:02 PM, rapelly kartheek > wrote: >> >> This worked perfectly. But, I wanted to simultaneously rsync all

rsync problem

2014-09-19 Thread rapelly kartheek
Hi, I'd made some modifications to the spark source code in the master and reflected them to the slaves using rsync. I followed this command: rsync -avL --progress path/to/spark-1.0.0 username@destinationhostname:path/to/destdirectory. This worked perfectly. But, I wanted to simultaneously rs
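For the "simultaneously" part, a shell sketch that fans the same rsync out to every slave listed in conf/slaves (one hostname per line), running the copies in parallel:

    for host in $(cat conf/slaves); do
      rsync -avL --progress path/to/spark-1.0.0 username@"$host":path/to/destdirectory &
    done
    wait    # block until all background rsyncs finish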

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
Can you please direct me to the right way of doing this? On Mon, Sep 15, 2014 at 10:18 PM, rapelly kartheek wrote: > I came across these APIs in one of the scala tutorials over the net. > > On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi > wrote: > >> But the above APIs are n

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
I came across these APIs in one of the scala tutorials over the net. On Mon, Sep 15, 2014 at 10:14 PM, Mohit Jaggi wrote: > But the above APIs are not for HDFS. > > On Mon, Sep 15, 2014 at 9:40 AM, rapelly kartheek > wrote: > >> Yes. I have HDFS. My cluster has 5 nodes.

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
The file gets created on the fly. So I don't know how to make sure that it's accessible to all nodes. On Mon, Sep 15, 2014 at 10:10 PM, rapelly kartheek wrote: > Yes. I have HDFS. My cluster has 5 nodes. When I run the above commands, I > see that the file gets created in the master nod

Re: File I/O in spark

2014-09-15 Thread rapelly kartheek
is > accessible on ALL executors. One way to do that is to use a distributed > filesystem like HDFS or GlusterFS. > > On Mon, Sep 15, 2014 at 8:51 AM, rapelly kartheek > wrote: > >> Hi >> >> I am trying to perform some read/write file operations in spark. Som

File I/O in spark

2014-09-15 Thread rapelly kartheek
Hi I am trying to perform some read/write file operations in spark. Somehow I am neither able to write to a file nor read. import java.io._ val writer = new PrintWriter(new File("test.txt")) writer.write("Hello Scala") Can someone please tell me how to perform file I/O in spark.
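The reply above pins down the issue: PrintWriter writes to the local filesystem of whichever JVM runs it, so the file is only visible on that node. A sketch of the HDFS route instead (the namenode URI is a placeholder):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new URI("hdfs://localhost:9000"), new Configuration())
    val out = fs.create(new Path("/test.txt"))
    out.writeBytes("Hello Scala\n")
    out.close()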

File operations on spark

2014-09-14 Thread rapelly kartheek
Hi I am trying to perform read/write file operations in spark by creating a Writable object. But I am not able to write to a file. The concerned data is not an rdd. Can someone please tell me how to perform read/write file operations on non-rdd data in spark. Regards karthik

replicate() method in BlockManager.scala choosing only one node for replication.

2014-09-11 Thread rapelly kartheek
Hi, I just wanted to see the flow of nodes getting allocated for rdd replication. I see that all the blocks are getting replicated in the same node. I was expecting that each block gets replicated over different nodes. I have a humble three node spark cluster :). Below is the trace of replicate()

Re: compiling spark source code

2014-09-11 Thread rapelly kartheek
I have been doing that, but none of my modifications to the code are being compiled. On Thu, Sep 11, 2014 at 10:45 PM, Daniil Osipov wrote: > In the spark source folder, execute `sbt/sbt assembly` > > On Thu, Sep 11, 2014 at 8:27 AM, rapelly kartheek > wrote: >> HI,

compiling spark source code

2014-09-11 Thread rapelly kartheek
Hi, Can someone please tell me how to compile the spark source code so that my changes take effect? I was trying to ship the jars to all the slaves, but in vain. -Karthik
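The reply above gives the build command; when changes still don't take effect, the missing step is usually distributing the rebuilt assembly jar. A sketch (the jar name varies by version and Hadoop profile):

    sbt/sbt assembly    # from the spark source root
    # then push the rebuilt jar to the same path on every slave and
    # restart the cluster, e.g.:
    # rsync assembly/target/scala-*/spark-assembly-*.jar to each node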

Re: How to profile a spark application

2014-09-08 Thread rapelly kartheek
hi Ted, Where do I find the licence keys that I need to copy to the licences directory. Thank you!! On Mon, Sep 8, 2014 at 8:25 PM, rapelly kartheek wrote: > Thank you Ted. > > regards > Karthik > > On Mon, Sep 8, 2014 at 3:33 PM, Ted Yu wrote: > >> See

Re: How to profile a spark application

2014-09-08 Thread rapelly kartheek
Thank you Ted. regards Karthik On Mon, Sep 8, 2014 at 3:33 PM, Ted Yu wrote: > See > https://cwiki.apache.org/confluence/display/SPARK/Profiling+Spark+Applications+Using+YourKit > > On Sep 8, 2014, at 2:48 AM, rapelly kartheek > wrote: > > Hi, > > Can someone tel

How to profile a spark application

2014-09-08 Thread rapelly kartheek
Hi, Can someone tell me how to profile a spark application. -Karthik

replicated rdd storage problem

2014-09-05 Thread rapelly kartheek
Hi, Whenever I replicate an rdd, I find that the rdd gets replicated only in one node. I have a 3 node cluster. I set rdd.persist(StorageLevel.MEMORY_ONLY_2) in my application. The webUI shows that it is replicated twice. But, the rdd storage details show that it is replicated only once and only in

question on replicate() in blockManager.scala

2014-09-05 Thread rapelly kartheek
Hi, var cachedPeers: Seq[BlockManagerId] = null private def replicate(blockId: String, data: ByteBuffer, level: StorageLevel) { val tLevel = StorageLevel(level.useDisk, level.useMemory, level.deserialized, 1) if (cachedPeers == null) { cachedPeers = master.getPeers(blockManagerId,

Fwd: RDDs

2014-09-03 Thread rapelly kartheek
-- Forwarded message -- From: rapelly kartheek Date: Thu, Sep 4, 2014 at 11:49 AM Subject: Re: RDDs To: "Liu, Raymond" Thank you Raymond. I am more clear now. So, if an rdd is replicated over multiple nodes (i.e. say two sets of nodes as it is a collection of chunk

RDDs

2014-09-03 Thread rapelly kartheek
Hi, Can someone tell me what kind of operations can be performed on a replicated rdd? What are the use-cases of a replicated rdd? One basic doubt that has been bothering me for a long time: what is the difference between an application and a job in Spark parlance? I am confused because of Hadoop jargon

operations on replicated RDD

2014-09-01 Thread rapelly kartheek
Hi, An RDD replicated by an application is owned by only that application; no other applications can share it. Then what is the motive behind providing the rdd replication feature? What operations can be performed on the replicated RDD? Thank you!!! -karthik

Replicate RDDs

2014-08-27 Thread rapelly kartheek
Hi I have a three node spark cluster. I restricted the resources per application by setting appropriate parameters and I could run two applications simultaneously. Now, I want to replicate an RDD and run two applications simultaneously. Can someone help me with how to go about doing this? I replicated
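For the replication step itself, the one-liner that the later threads in this archive settle on (the input path is a placeholder):

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.textFile("hdfs:///some/input")
      .persist(StorageLevel.MEMORY_ONLY_2)   // the _2 suffix asks for two replicas of each block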

StorageLevel error.

2014-08-25 Thread rapelly kartheek
Hi, Can someone help me with the following error: scala> val rdd = sc.parallelize(Array(1,2,3,4)) rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at :12 scala> rdd.persist(StorageLevel.MEMORY_ONLY) :15: error: not found: value StorageLevel rdd.persist(S
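The "not found: value StorageLevel" is just a missing import in the shell; a minimal fix:

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.parallelize(Array(1, 2, 3, 4))
    rdd.persist(StorageLevel.MEMORY_ONLY)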

Hi

2014-08-20 Thread rapelly kartheek
Hi I have this doubt: I understand that each java process runs on a different JVM instance. Now, if I have a single executor on my machine and run several java processes, then there will be several JVM instances running. Now, process_local means the data is located on the same JVM as the task tha

Scheduling in spark

2014-07-08 Thread rapelly kartheek
Hi, I am a post graduate student, new to spark. I want to understand how the Spark scheduler works. I just have a theoretical understanding of the DAG scheduler and the underlying task scheduler. I want to know: given a job submitted to the framework, how does scheduling happen after the DAG scheduler phase? Can

hi

2014-06-22 Thread rapelly kartheek
Hi Can someone help me with the following error that I faced while setting up a single node spark framework. karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MASTER=spark://localhost:7077 sbin/spark-shell bash: sbin/spark-shell: No such file or directory karthik@karthik-OptiPlex-9020:~/spark-1.0.0$ MA
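In the 1.0.0 layout the shell lives under bin/, not sbin/ (sbin/ holds the cluster start/stop scripts), so the command should read:

    MASTER=spark://localhost:7077 bin/spark-shell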

Scheduling code for Spark

2014-06-07 Thread rapelly kartheek
Hi, I am new to the Spark framework. I understood the Spark framework to some extent. I have some experience with Hadoop as well. The concepts of in-memory computation and RDDs are extremely fascinating. I am trying to understand the scheduler of the Spark framework. Can someone help me out where to l