java.io.IOException: connection closed.

2015-01-24 Thread Kartheek.R
Hi,
While running a Spark application, I get the following exception, which leads
to several failed stages.

Exception in thread "Thread-46" org.apache.spark.SparkException: Job
aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most
recent failure: Lost task 0.3 in stage 11.0 (TID 262, s4):
java.io.IOException: Connection from s5/10.0.1.93:53496 closed
at
org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:98)
at
org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:81)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
at
io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
at
io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
at
io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:738)
at
io.netty.channel.AbstractChannel$AbstractUnsafe$6.run(AbstractChannel.java:606)
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Can someone please tell me how to resolve this issue?

Thanks




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-connection-closed-tp21348.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: java.io.IOException: connection closed.

2015-01-24 Thread Kartheek.R
When I increase the executor memory (spark.executor.memory), the application
runs smoothly without any errors.
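
For reference, this is roughly how I raise it (the 4g value below is only
illustrative; the same setting can also be passed to spark-submit via
--executor-memory):

import org.apache.spark.{SparkConf, SparkContext}

// Request more memory per executor before creating the context.
val conf = new SparkConf()
  .setAppName("MyApp")                    // hypothetical application name
  .set("spark.executor.memory", "4g")     // illustrative value
val sc = new SparkContext(conf)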



On Sat, Jan 24, 2015 at 9:29 PM, Rapelly Kartheek 
wrote:

> Hi,
> While running spark application,  I get the following Exception leading to
> several failed stages.
>
> Exception in thread "Thread-46" org.apache.spark.SparkException: Job
> aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 11.0 (TID 262, s4):
> java.io.IOException: Connection from s5/10.0.1.93:53496 closed
> [...]
>
>
> Can someone please tell me how to resolve this issue?
>
> Thanks
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Re-java-io-IOException-connection-closed-tp21349.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Connection closed/reset by peers error

2015-02-01 Thread Kartheek.R
Hi,

I keep facing this error when I run my application:

java.io.IOException: Connection from s1/:43741 closed
at 
org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:98)
at 
org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:81)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:738)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe$6.run(AbstractChannel.java:606)
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)


I have a 6-node cluster. I set the executor memory to 5 GB, as that is the
maximum memory the weakest machine in the cluster can afford.

Any help please?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Connection-closed-reset-by-peers-error-tp21459.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Question about recomputing lost partition of rdd ?

2015-02-06 Thread Kartheek.R
Hi,

I have this doubt: assume that an RDD is stored across multiple nodes and one
of the nodes fails, so a partition is lost. I know that when this node comes
back, it uses the lineage from its neighbours and recomputes only that
partition.

1) How does it get the source data (the original data, before any
transformations were applied) that was lost in the crash? Is it our
responsibility to restore the source data before the lineage can be used?
Only the lineage is stored on the other nodes.

2) Suppose the underlying HDFS uses a replication factor of 3. We know that
Spark doesn't replicate RDDs. When a partition is lost, is it possible to use
the second copy of the original data stored in HDFS and regenerate the
required partition using the lineage from the other nodes?

3) Does it make any difference to Spark if HDFS replicates its blocks more
than once?

Can someone please enlighten me on these fundamentals?
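
To make the scenario concrete, the kind of pipeline I have in mind looks
roughly like this (a minimal sketch; the path and names are only
placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

val sc = new SparkContext(new SparkConf().setAppName("LineageSketch"))

// Source data lives in HDFS, which already replicates its blocks.
val lines  = sc.textFile("hdfs://namenode:9000/data/input.txt")

// These transformations form the lineage that Spark records.
val words  = lines.flatMap(_.split(" "))
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

counts.cache()    // cached partitions are not replicated by default
counts.count()    // first action materialises the cache

// If a node holding cached partitions is lost, a later action recomputes
// only those partitions by replaying the lineage, re-reading the
// corresponding HDFS blocks from any surviving replica.
counts.collect()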

Thank you




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Question-about-recomputing-lost-partition-of-rdd-tp21535.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Need a spark application.

2015-02-09 Thread Kartheek.R
Hi,

Can someone please suggest a real-life application implemented in Spark
(something like gene sequencing) that has the shape of the code below?
Basically, the application should submit its jobs from as many threads as
possible. I need that kind of Spark application for benchmarking.


val threadA = new Thread(new Runnable {
  def run() {
    for (i <- 0 until end) {
      val numAs = logData.filter(line => line.contains("a"))
      // numAs.saveAsTextFile("hdfs:/t1")
      println("Lines with a: %s".format(numAs.count))
    }
  }
})

val threadB = new Thread(new Runnable {
  def run() {
    for (i <- 0 until end) {
      val numBs = logData.filter(line => line.contains("b"))
      // numBs.saveAsTextFile("hdfs:/t2")
      println("Lines with b: %s".format(numBs.count))
    }
  }
})

val threadC = new Thread(new Runnable {
  def run() {
    for (i <- 0 until end) {
      val numCs = logData.filter(line => line.contains("c"))
      // numCs.saveAsTextFile("hdfs:/t3")
      println("Lines with c: %s".format(numCs.count))
    }
  }
})

val threadD = new Thread(new Runnable {
  def run() {
    for (i <- 0 until end) {
      val numDs = logData.filter(line => line.contains("d"))
      // numDs.saveAsTextFile("hdfs:/t4")
      println("Lines with d: %s".format(numDs.count))
    }
  }
})
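
(In the full program the threads are then started and waited on, roughly
along these lines:)

// Kick off all four jobs concurrently and wait for them to finish.
threadA.start(); threadB.start(); threadC.start(); threadD.start()
threadA.join();  threadB.join();  threadC.join();  threadD.join()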

Regards
Karthik




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Need-a-spark-application-tp21552.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Inconsistent execution times for same application.

2015-02-15 Thread Kartheek.R
Hi,
My Spark cluster contains a mix of machines: Pentium 4, dual-core, and
quad-core. I am trying to run a character-frequency-count application. The
application contains several threads, each submitting a job (action) that
counts the frequency of a single character. My problem is that I get
different execution times each time I run the same application on the same
data (1 GB of text); sometimes the difference is as large as 10-15 minutes.
I think this is related to scheduling on a heterogeneous cluster. Can someone
please tell me how to tackle this issue? I need consistent results. Any
suggestions, please!

I cache() the RDD. There are 7 slave nodes in total; executor memory = 2500m.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Inconsistent-execution-times-for-same-application-tp21662.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Caching RDD

2015-02-19 Thread Kartheek.R
Hi,
I have an HDFS file of size 598 MB. I create an RDD over this file and cache
it in RAM on a 7-node cluster with 2 GB of RAM per node. I find that each
partition gets replicated three or even four times in the cluster, even
though I never ask for replication in the code. The RDD has 5 partitions in
total, but 17 cached partitions show up. Does Spark replicate an RDD
automatically whenever there is scope to do so (i.e. extra resources are
available)?
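
For reference, this is essentially how I cache it; as far as I understand,
replication normally has to be requested explicitly with a "_2" storage
level (a sketch — the path is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc  = new SparkContext(new SparkConf().setAppName("CacheTest"))
val rdd = sc.textFile("hdfs://namenode:9000/data/file-598mb")

rdd.persist(StorageLevel.MEMORY_ONLY)      // what I do now: no replication requested
// rdd.persist(StorageLevel.MEMORY_ONLY_2) // would keep each partition on two nodes
rdd.count()                                // materialise the cache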




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Caching-RDD-tp21728.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: java.io.IOException: Filesystem closed

2015-02-21 Thread Kartheek.R
Are you replicating any RDDs?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-Filesystem-closed-tp20150p21749.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Task not serializable exception

2015-02-23 Thread Kartheek.R
Hi,

I have a file containing data in the following format:
0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
Now I do the following:

val kPoints = data.takeSample(withReplacement = false, 4, 42).toArray
val thread1 = new Thread(new Runnable {
  def run() {
    val dist1 = data.map(x => squaredDistance(x, kPoints(0)))
  }
})
thread1.start

I am facing Task not serializable exception:
Exception in thread "Thread-32" org.apache.spark.SparkException: Task not
serializable
at
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1435)
at org.apache.spark.rdd.RDD.map(RDD.scala:271)
at org.apache.spark.examples.SparkKart$$anon$1.run(SparkKart.scala:67)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.NotSerializableException:
org.apache.spark.examples.SparkKart$$anon$1
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
at
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
at
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
... 5 more

Any help, please!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Task-not-serializable-exception-tp21776.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Task not serializable exception

2015-02-23 Thread Kartheek.R
I could trace where the problem is: if I run without any threads, it works
fine, but when I use threads I run into the "not serializable" problem.
However, I need to have threads in my code.

Any help, please!

This is my code:
object SparkKart {
  def parseVector(line: String): Vector[Double] = {
    DenseVector(line.split(' ').map(_.toDouble))
  }

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("SparkKart")
    val sc = new SparkContext(sparkConf)
    val lines = sc.textFile(args(0))
    val data = lines.map(parseVector _).persist(StorageLevel.MEMORY_ONLY_SER)
    val kPoints = data.takeSample(withReplacement = false, 4, 42).toArray

    val thread1 = new Thread(new Runnable {
      def run() {
        val dist1 = data.map(x => squaredDistance(x, kPoints(0)))
        dist1.saveAsTextFile("hdfs:/kart")
      }
    })
    thread1.start
  }
}



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Task-not-serializable-exception-tp21776p21778.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Task not serializable exception

2015-02-24 Thread Kartheek.R
Hi,
I run into a "Task not serializable" exception with the code below. When I
remove the threads, it works; with threads, I get the exception.

object SparkKart extends Serializable {
  def parseVector(line: String): Vector[Double] = {
    DenseVector(line.split(' ').map(_.toDouble))
  }

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("SparkKart")
    val sc = new SparkContext(sparkConf)
    val lines = sc.textFile(args(0))

    val data = lines.map(parseVector _)
    val kPoints = data.takeSample(withReplacement = false, 4, 42).toArray

    val thread1 = new Thread(new Runnable {
      def run() {
        val dist1 = data.map(squaredDistance(_, kPoints(0)))
        dist1.saveAsTextFile("hdfs:/kart3")
      }
    })
    val thread2 = new Thread(new Runnable {
      def run() {
        val dist1 = data.map(squaredDistance(_, kPoints(1)))
        dist1.saveAsTextFile("hdfs:/kart2")
      }
    })
    val thread3 = new Thread(new Runnable {
      def run() {
        val dist1 = data.map(squaredDistance(_, kPoints(2)))
        dist1.saveAsTextFile("hdfs:/kart1")
      }
    })

    thread1.start
    thread2.start
    thread3.start
  }
}

Any help please?
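
One workaround I have seen suggested for this kind of error (I have not
verified it against this exact program) is to copy the values a closure
needs into local vals inside run(), so that the map function captures only
those locals and not the enclosing anonymous Runnable. For the first thread,
that would look roughly like:

val thread1 = new Thread(new Runnable {
  def run() {
    // The lambda now captures only this local (a serializable vector),
    // not the non-serializable anonymous Runnable instance.
    val centre = kPoints(0)
    val dist1  = data.map(x => squaredDistance(x, centre))
    dist1.saveAsTextFile("hdfs:/kart3")
  }
})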




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Task-not-serializable-exception-tp21795.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Job submission via multiple threads

2015-02-26 Thread Kartheek.R
Hi,

I just wrote an application that submits its actions (jobs) from independent
threads, with this point from the job-scheduling docs in mind: "Second,
within each Spark application, multiple “jobs” (Spark actions) may be
running concurrently if they were submitted by different threads"
(https://spark.apache.org/docs/0.9.0/job-scheduling.html).

val a = sc.textFile("hdfs:\.")
val b = a.sometransformations

Thread1 = {
 /* An action on RDD b */
}

Thread2 = {
/* An action on RDD b */
}

and so on. Basically, different actions are performed on the same RDD. I just
want to know whether writing the code with threads like this makes any
difference in execution compared with writing it without threads (in which
case, I guess, all actions are submitted from a single thread). And first of
all, is this a valid way to submit actions from multiple threads?
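
Concretely, the shape I have in mind is something like this (a minimal
sketch; the input path is just a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

object MultiThreadedJobs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MultiThreadedJobs"))

    val a = sc.textFile("hdfs://namenode:9000/input")
    val b = a.filter(_.nonEmpty).cache()   // "some transformations"

    // Each thread submits its own action on the same cached RDD; with a
    // single thread the two counts would simply run one after the other.
    val t1 = new Thread(new Runnable {
      def run(): Unit = println("Lines with a: " + b.filter(_.contains("a")).count())
    })
    val t2 = new Thread(new Runnable {
      def run(): Unit = println("Lines with b: " + b.filter(_.contains("b")).count())
    })

    t1.start(); t2.start()
    t1.join();  t2.join()
    sc.stop()
  }
}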

Thanks




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Job-submission-via-multiple-threads-tp21825.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: RDDs

2015-03-03 Thread Kartheek.R
Hi TD,
"You can always run two jobs on the same cached RDD, and they can run in
parallel (assuming you launch the 2 jobs from two different threads)"

Is this a correct way to launch jobs from two different threads?

val threadA = new Thread(new Runnable {
  def run() {
    for (i <- 0 until end) {
      val numAs = logData.filter(line => line.contains("a"))
      println("Lines with a: %s".format(numAs.count))
    }
  }
})

val threadB = new Thread(new Runnable {
  def run() {
    for (i <- 0 until end) {
      val numBs = logData.filter(line => line.contains("b"))
      println("Lines with b: %s".format(numBs.count))
    }
  }
})




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-tp13343p21892.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Scheduling in spark

2014-07-14 Thread Kartheek.R
Thank you so much for the link, Sujeet.

regards
Karthik



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Scheduling-in-spark-tp9035p9716.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Scheduling in spark

2014-07-14 Thread Kartheek.R
Thank you Andrew for the updated link.

regards
Karthik



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Scheduling-in-spark-tp9035p9717.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


RE: RDDs

2014-09-03 Thread Kartheek.R
Thank you, Raymond and Tobias.
Yes, I am clear about what I was asking: I was talking about a "replicated"
RDD only. Now that my understanding of jobs and applications has been
validated, I want to know whether we can replicate an RDD and run two jobs of
an application (that need the same RDD) in parallel.

-Karthik




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-tp13343p13416.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: RDDs

2014-09-04 Thread Kartheek.R
Thank you yuanbosoft. 




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-tp13343p13444.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: replicate() method in BlockManager.scala choosing only one node for replication.

2014-09-12 Thread Kartheek.R
When I look at the storage details of the RDD in the web UI, I find that each
block is replicated twice, and not on a single node: all the nodes in the
cluster host some block or other.

Why this difference? The trace of the replicate() method shows only one node,
but the web UI shows multiple nodes.

Please correct me if my understanding is wrong.

-Karthik



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/replicate-method-in-BlockManager-scala-choosing-only-one-node-for-replication-tp14059p14072.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread Kartheek.R
Hi Sean,
I also tried it with sc, as in sc.parallelize(data), but I get the error:
value sc not found.

On Sun, Oct 12, 2014 at 1:47 PM, sowen [via Apache Spark User List] <
ml-node+s1001560n16233...@n3.nabble.com> wrote:

> It is a method of the class, not a static method of the object. Since a
> SparkContext is available as sc in the shell, or you have perhaps created
> one similarly in your app, write sc.parallelize(...)
> On Oct 12, 2014 7:15 AM, "rapelly kartheek" <[hidden email]
> > wrote:
>
>> Hi,
>>
>> I am trying to write a String that is not an rdd to HDFS. This data is a
>> variable in Spark Scheduler code. None of the spark File operations are
>> working because my data is not rdd.
>>
>> So, I tried using SparkContext.parallelize(data). But it throws error:
>>
>> [error]
>> /home/karthik/spark-1.0.0/core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala:265:
>> not found: value SparkContext
>> [error]  SparkContext.parallelize(result)
>> [error]  ^
>> [error] one error found
>>
>> I realized that this data is part of the Scheduler. So, the Sparkcontext
>> would not have got created yet.
>>
>> Any help in "writing scheduler variable data to HDFS" is appreciated!!
>>
>> -Karthik
>>
>
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-convert-a-non-rdd-data-to-rdd-tp16230p16234.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread Kartheek.R
Does the SparkContext exist yet when this part of the scheduler code
(AskDriverWithReply()) gets executed?
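
If it does not exist at that point, I am wondering whether I should skip RDDs
entirely and write the string with the Hadoop FileSystem API instead, roughly
like this (an untested sketch; the output path is a placeholder and `result`
is the scheduler variable from my earlier mail):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = new Configuration()
val outPath    = new Path("hdfs://namenode:9000/tmp/scheduler-debug.txt")
val fs         = outPath.getFileSystem(hadoopConf)

// Write the plain String without going through an RDD or a SparkContext.
val out = fs.create(outPath)
try {
  out.write(result.getBytes("UTF-8"))
} finally {
  out.close()
}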

On Sun, Oct 12, 2014 at 1:54 PM, rapelly kartheek 
wrote:

> Hi Sean,
> I tried even with sc as: sc.parallelize(data). But. I get the error: value
> sc not found.
> [...]




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-convert-a-non-rdd-data-to-rdd-tp16230p16235.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: How to access application name in the spark framework code.

2014-11-24 Thread Kartheek.R
Hi Deng,

Thank you. That works perfectly:)

Regards
Karthik.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-access-application-name-in-the-spark-framework-code-tp19719p19723.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Problem with building spark-1.2.0

2015-01-04 Thread Kartheek.R
Hi,

I get the following error when I build spark-1.2.0 using sbt:

[error] Nonzero exit code (128): git clone
https://github.com/ScrapCodes/sbt-pom-reader.git
/home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader
[error] Use 'last' for the full log.

Any help please?

Thanks




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-building-spark-1-2-0-tp20961.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Problem with building spark-1.2.0

2015-01-04 Thread Kartheek.R
The problem is that my network is not able to access github.com to clone some
dependencies, as GitHub is blocked in India. What other ways are there to
work around this problem?

Thank you!

On Sun, Jan 4, 2015 at 9:45 PM, Rapelly Kartheek 
wrote:

> Hi,
>
> I get the following error when I build spark-1.2.0 using sbt:
>
> [error] Nonzero exit code (128): git clone
> https://github.com/ScrapCodes/sbt-pom-reader.git
> /home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader
> [error] Use 'last' for the full log.
>
> Any help please?
>
> Thanks
>




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Re-Problem-with-building-spark-1-2-0-tp20963.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Problem with building spark-1.2.0

2015-01-12 Thread Kartheek.R
Hi,
This is what I am trying to do:

karthik@s4:~/spark-1.2.0$ SPARK_HADOOP_VERSION=2.3.0 sbt/sbt clean
Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
[info] Loading project definition from
/home/karthik/spark-1.2.0/project/project
Cloning into
'/home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader'...
fatal: unable to access 'https://github.com/ScrapCodes/sbt-pom-reader.git/':
Received HTTP code 407 from proxy after CONNECT
java.lang.RuntimeException: Nonzero exit code (128): git clone
https://github.com/ScrapCodes/sbt-pom-reader.git
/home/karthik/.sbt/0.13/staging/ad8e8574a5bcb2d22d23/sbt-pom-reader
at scala.sys.package$.error(package.scala:27)
at sbt.Resolvers$.run(Resolvers.scala:127)
at sbt.Resolvers$.run(Resolvers.scala:117)
at sbt.Resolvers$$anon$2.clone(Resolvers.scala:74)
at
sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11$$anonfun$apply$5.apply$mcV$sp(Resolvers.scala:99)
at sbt.Resolvers$.creates(Resolvers.scala:134)
at
sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11.apply(Resolvers.scala:98)
at
sbt.Resolvers$DistributedVCS$$anonfun$toResolver$1$$anonfun$apply$11.apply(Resolvers.scala:97)
at
sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$3.apply(BuildLoader.scala:88)
at
sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$3.apply(BuildLoader.scala:87)
at scala.Option.map(Option.scala:145)
at 
sbt.BuildLoader$$anonfun$componentLoader$1.apply(BuildLoader.scala:87)
at 
sbt.BuildLoader$$anonfun$componentLoader$1.apply(BuildLoader.scala:83)
at sbt.MultiHandler.apply(BuildLoader.scala:15)
at sbt.BuildLoader.apply(BuildLoader.scala:139)
at sbt.Load$.loadAll(Load.scala:334)
at sbt.Load$.loadURI(Load.scala:289)
at sbt.Load$.load(Load.scala:285)
at sbt.Load$.load(Load.scala:276)
at sbt.Load$.apply(Load.scala:130)
at sbt.Load$.buildPluginDefinition(Load.scala:819)
at sbt.Load$.buildPlugins(Load.scala:785)
at sbt.Load$.plugins(Load.scala:773)
at sbt.Load$.loadUnit(Load.scala:431)
at sbt.Load$$anonfun$18$$anonfun$apply$11.apply(Load.scala:281)
at sbt.Load$$anonfun$18$$anonfun$apply$11.apply(Load.scala:281)
at
sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:91)
at
sbt.BuildLoader$$anonfun$componentLoader$1$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(BuildLoader.scala:90)
at sbt.BuildLoader.apply(BuildLoader.scala:140)
at sbt.Load$.loadAll(Load.scala:334)
at sbt.Load$.loadURI(Load.scala:289)
at sbt.Load$.load(Load.scala:285)
at sbt.Load$.load(Load.scala:276)
at sbt.Load$.apply(Load.scala:130)
at sbt.Load$.defaultLoad(Load.scala:36)
at sbt.BuiltinCommands$.doLoadProject(Main.scala:481)
at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:475)
at sbt.BuiltinCommands$$anonfun$loadProjectImpl$2.apply(Main.scala:475)
at
sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:58)
at
sbt.Command$$anonfun$applyEffect$1$$anonfun$apply$2.apply(Command.scala:58)
at
sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:60)
at
sbt.Command$$anonfun$applyEffect$2$$anonfun$apply$3.apply(Command.scala:60)
at sbt.Command$.process(Command.scala:92)
at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:98)
at sbt.MainLoop$$anonfun$1$$anonfun$apply$1.apply(MainLoop.scala:98)
at sbt.State$$anon$1.process(State.scala:184)
at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:98)
at sbt.MainLoop$$anonfun$1.apply(MainLoop.scala:98)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
at sbt.MainLoop$.next(MainLoop.scala:98)
at sbt.MainLoop$.run(MainLoop.scala:91)
at sbt.MainLoop$$anonfun$runWithNewLog$1.apply(MainLoop.scala:70)
at sbt.MainLoop$$anonfun$runWithNewLog$1.apply(MainLoop.scala:65)
at sbt.Using.apply(Using.scala:24)
at sbt.MainLoop$.runWithNewLog(MainLoop.scala:65)
at sbt.MainLoop$.runAndClearLast(MainLoop.scala:48)
at sbt.MainLoop$.runLoggedLoop(MainLoop.scala:32)
at sbt.MainLoop$.runLogged(MainLoop.scala:24)
at sbt.StandardMain$.runManaged(Main.scala:53)
at sbt.xMain.run(Main.scala:28)
at xsbt.boot.Launch$$anonfun$run$1.apply(Launch.scala:109)
at xsbt.boot.Launch$.withContextLoader(Launch.scala:128)
at xsbt.boot.Launch$.run(Launch.scala:109)
at xsbt.boot.Launch$$anonfun$apply$1.apply(Launch.scala:35)
at xsbt.boot.Launch$.launch(Launch.scala:117)
at xsbt.boot.Launch$.apply(Launch.scala:18)
at xsbt.boot.Boot$.runImpl(Boot.scal

Fwd: UnknownhostException : home

2015-01-19 Thread Kartheek.R
-- Forwarded message --
From: Rapelly Kartheek 
Date: Mon, Jan 19, 2015 at 3:03 PM
Subject: UnknownhostException : home
To: "user@spark.apache.org" 


Hi,

I get the following exception when I run my application:

karthik@karthik:~/spark-1.2.0$ ./bin/spark-submit --class
org.apache.spark.examples.SimpleApp001 --deploy-mode client --master
spark://karthik:7077 $SPARK_HOME/examples/*/scala-*/spark-examples-*.jar
>out1.txt
log4j:WARN No such property [target] in org.apache.log4j.FileAppender.
Exception in thread "main" java.lang.IllegalArgumentException:
java.net.UnknownHostException: home
at
org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at
org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
at
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:569)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:512)
at
org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:142)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2316)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:366)
at org.apache.spark.util.FileLogger.(FileLogger.scala:90)
at
org.apache.spark.scheduler.EventLoggingListener.(EventLoggingListener.scala:63)
at org.apache.spark.SparkContext.(SparkContext.scala:352)
at org.apache.spark.examples.SimpleApp001$.main(SimpleApp001.scala:13)
at org.apache.spark.examples.SimpleApp001.main(SimpleApp001.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: home
... 20 more


I couldn't trace the cause of this exception. Any help in this regard?
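
(The only pattern I can see in the trace is that the failure happens while
the event logger opens its HDFS directory, and "UnknownHostException: home"
suggests that some HDFS URI has "home" in the host position, e.g.
"hdfs://home/..." instead of "hdfs://<namenode>:<port>/home/...". If so, the
relevant setting would need the full authority, roughly as below — host,
port and path here are only placeholders:)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("SimpleApp001")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs://namenode:9000/user/karthik/spark-events")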

Thanks




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-UnknownhostException-home-tp21230.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.