Re: ClassNotFoundException was thrown while trying to save RDD

2014-10-12 Thread Akhil Das
Adding your application jar to the SparkContext will resolve this issue, e.g. sparkContext.addJar("./target/scala-2.10/myTestApp_2.10-1.0.jar"). Thanks. Best Regards. On Mon, Oct 13, 2014 at 8:42 AM, Tao Xiao wrote: > In the beginning I tried to read HBase and found that exception was > thrown, th
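The suggestion above can be sketched as follows. This is a minimal illustration, not the poster's actual app: the object name and output path are placeholders, and the jar path is the one quoted in the reply.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical app; the jar path must point at your assembled application jar.
object SaveRddApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SaveRddApp"))
    // Ship the application jar to the executors, so that worker-side
    // deserialization of your closures can find your classes.
    sc.addJar("./target/scala-2.10/myTestApp_2.10-1.0.jar")
    sc.parallelize(List(1, 2, 3)).saveAsTextFile("/tmp/out") // placeholder path
    sc.stop()
  }
}
```

Note that spark-submit normally distributes the application jar for you; addJar matters when the context is created some other way, as in older standalone setups like the one in this thread.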

Re: What if I port Spark from TCP/IP to RDMA?

2014-10-12 Thread Josh Rosen
Hi Theo, Check out *spark-perf*, a suite of performance benchmarks for Spark: https://github.com/databricks/spark-perf. - Josh On Fri, Oct 10, 2014 at 7:27 PM, Theodore Si wrote: > Hi, > > Let's say that I managed to port Spark from TCP/IP to RDMA. > What tool or benchmark can I use to test th

Re: small bug in pyspark

2014-10-12 Thread Josh Rosen
Hi Andy, You may be interested in https://github.com/apache/spark/pull/2651, a recent pull request of mine which cleans up / simplifies the configuration of PySpark's Python executables. For instance, it makes it much easier to control which Python options are passed when launching the PySpark dr

Re: ClassNotFoundException was thrown while trying to save RDD

2014-10-12 Thread Tao Xiao
In the beginning I tried to read HBase and found that the exception was thrown, so I started to debug the app. I removed the code reading HBase and tried to save an RDD containing a list, and the exception was still thrown. So I'm sure the exception was not caused by reading HBase. While debugging I

Re: Spark job doesn't clean after itself

2014-10-12 Thread Rohit Pujari
Reviving this thread; any thoughts, experts? On Thu, Oct 9, 2014 at 3:47 PM, Rohit Pujari wrote: > Hello Folks: > > I'm running a Spark job on YARN. After the execution, I would expect the > Spark job to clean the staging area, but it seems every run creates a new > staging directory. Is there a way to
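One knob worth checking for the behavior described above is `spark.yarn.preserve.staging.files`; a sketch of the relevant spark-defaults.conf entry, assuming a Spark-on-YARN deployment:

```properties
# spark-defaults.conf (sketch)
# When false (the default), Spark is expected to delete the staged files
# (application jar, distributed cache files) when the application ends.
# If staging directories accumulate anyway, check that the application is
# shutting down cleanly (sc.stop()), since an unclean exit can skip cleanup.
spark.yarn.preserve.staging.files  false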

Re: Spark in cluster and errors

2014-10-12 Thread Jorge Simão
You have a connection refused error. You need to check: - that the master is listening on the specified host and port; - that no firewall is blocking access; - that your config points to the master's host and port. Check the host name from the web console. Send more details about the cluster layout for further help.
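The checklist above mostly comes down to making the driver's master URL match exactly what the master's web UI reports; a minimal sketch, where the host and port are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// The master URL must match what the master's web UI (default port 8080)
// shows at the top of the page, e.g. spark://<hostname>:7077.
// A hostname/IP mismatch, or a firewalled 7077, yields "connection refused".
val conf = new SparkConf()
  .setAppName("ClusterSmokeTest")
  .setMaster("spark://master-host:7077") // placeholder host:port
val sc = new SparkContext(conf)
```

Spark compares the configured master string literally, so even `spark://127.0.0.1:7077` vs. `spark://localhost:7077` can fail if the master registered under a different name.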

Nested Query using SparkSQL 1.1.0

2014-10-12 Thread shahab
Hi, Apparently it is possible to query nested JSON using Spark SQL, but, mainly due to the lack of proper documentation/examples, I did not manage to make it work. I would appreciate it if you could point me to any example or help with this issue. Here is my code: val anotherPeopleRDD = sc.paral
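For reference, nested JSON fields can be addressed with dot notation in Spark SQL 1.1; a sketch along the lines of the truncated code above, where the JSON content and field names are illustrative:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// An RDD of JSON strings, one document per element.
val anotherPeopleRDD = sc.parallelize(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val people = sqlContext.jsonRDD(anotherPeopleRDD) // infers a nested schema
people.registerTempTable("people")
// Nested fields are reached with dot notation in the query.
sqlContext.sql("SELECT address.city FROM people").collect()
```

The inferred schema can be inspected with `people.printSchema()` to confirm how the nesting was mapped.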

Spark in cluster and errors

2014-10-12 Thread Morbious
Hi, Can anyone explain how Spark works? Why is it trying to connect from master port A to master port ABCD in cluster mode with 6 workers? 14/10/09 19:37:19 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@...:7078] -> [akka.tcp://sparkExecutor@...:53757]: Error [Associat

NullPointerException when deploying JAR to standalone cluster..

2014-10-12 Thread Jorge Simão
Hi, everybody! I'm trying to deploy a simple app in a Spark standalone cluster with a single node (the localhost). Unfortunately, something goes wrong while processing the JAR file and a NullPointerException is thrown. I'm running everything on a single machine with Windows 8. Check below

Re: Interactive interface tool for spark

2014-10-12 Thread andy petrella
Yeah, if it allows crafting some Scala/Spark code in a shareable manner, it is another good option! Thanks for sharing. aℕdy ℙetrella about.me/noootsab On Sun, Oct 12, 2014 at 9:47 PM, Jaonary Rabarisoa wrote: > And what about Hue ht

Re: Interactive interface tool for spark

2014-10-12 Thread Jaonary Rabarisoa
And what about Hue http://gethue.com ? On Sun, Oct 12, 2014 at 1:26 PM, andy petrella wrote: > Dear Sparkers, > > As promised, I've just updated the repo with a new name (for the sake of > clarity), default branch but specially with a dedicated README containing: > > * explanations on how to lau

setting heap space

2014-10-12 Thread Chengi Liu
Hi, I am trying to use Spark but I am having a hard time configuring the SparkConf. My current conf is: conf = SparkConf().set("spark.executor.memory","10g").set("spark.akka.frameSize", "1").set("spark.driver.memory","16g") but I still see the Java heap size error: 14/10/12 09:54:50 ERROR
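One likely culprit in the snippet above: `spark.driver.memory` set programmatically takes effect only after the driver JVM has already started, so it cannot grow the driver's heap. Driver memory is normally fixed at launch time instead; a sketch of the launch configuration, with illustrative values and a placeholder script name:

```properties
# spark-submit launch configuration (sketch) -- driver memory must be set
# before the driver JVM starts, so pass it on the command line (or in
# spark-defaults.conf), not via SparkConf inside the program:
#
#   spark-submit --driver-memory 16g --executor-memory 10g my_app.py
#
# Equivalent spark-defaults.conf entries:
spark.driver.memory    16g
spark.executor.memory  10g
```

Also note that `spark.akka.frameSize` is specified in MB, so "1" is a very small frame size and is probably not what was intended.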

Re: ClassNotFoundException was thrown while trying to save RDD

2014-10-12 Thread Ted Yu
Your app is named scala.HBaseApp. Does it read/write to HBase? Just curious. On Sun, Oct 12, 2014 at 8:00 AM, Tao Xiao wrote: > Hi all, > > I'm using CDH 5.0.1 (Spark 0.9) and submitting a job in Spark Standalone > Cluster mode. > > The job is quite simple as follows: > > object HBaseApp {

ClassNotFoundException was thrown while trying to save RDD

2014-10-12 Thread Tao Xiao
Hi all, I'm using CDH 5.0.1 (Spark 0.9) and submitting a job in Spark Standalone Cluster mode. The job is quite simple as follows: object HBaseApp { def main(args:Array[String]) { testHBase("student", "/test/xt/saveRDD") } def testHBase(tableName: String, outFile:String)
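The snippet above is cut off by the archive; a hedged reconstruction of what the rest presumably looks like (the table name, path, and method names come from the message, the body is an assumption based on the later replies, which say even a plain list-backed RDD reproduced the exception):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HBaseApp {
  def main(args: Array[String]): Unit = {
    testHBase("student", "/test/xt/saveRDD")
  }

  def testHBase(tableName: String, outFile: String): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseApp"))
    // Per the thread, saving even a simple parallelized list triggered the
    // ClassNotFoundException, so HBase access is not needed to reproduce it.
    val rdd = sc.parallelize(List(1, 2, 3, 4))
    rdd.saveAsTextFile(outFile)
    sc.stop()
  }
}
```

This is consistent with the eventual diagnosis in the thread: the worker JVMs could not find the application's classes, which `sc.addJar(...)` fixes.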

Re: Interactive interface tool for spark

2014-10-12 Thread andy petrella
Dear Sparkers, As promised, I've just updated the repo with a new name (for the sake of clarity), a default branch, and especially a dedicated README containing: * explanations on how to launch and use it * an intro on each feature like Spark, Classpaths, SQL, Dynamic update, ... * pictures show

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread Kartheek.R
Does the SparkContext exist when this part (askDriverWithReply()) of the scheduler code gets executed? On Sun, Oct 12, 2014 at 1:54 PM, rapelly kartheek wrote: > Hi Sean, > I tried even with sc as: sc.parallelize(data). But I get the error: value > sc not found. > > On Sun, Oct 12, 2014 at 1:47 PM

RE: Spark SQL parser bug?

2014-10-12 Thread Cheng, Hao
Hi, I couldn’t reproduce the bug with the latest master branch. Which version are you using? Can you also list data in the table “x”? case class T(a:String, ts:java.sql.Timestamp) val sqlContext = new org.apache.spark.sql.SQLContext(sc) import sqlContext.createSchemaRDD val data = sc.parallelize(
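The repro snippet above is truncated; a sketch of how it plausibly continues, with the cut-off tail filled in as an assumption (the timestamp value and query are illustrative, not the original reporter's):

```scala
import org.apache.spark.sql.SQLContext

case class T(a: String, ts: java.sql.Timestamp)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD // implicit RDD[T] -> SchemaRDD conversion

val data = sc.parallelize(
  T("a", new java.sql.Timestamp(System.currentTimeMillis())) :: Nil)
data.registerTempTable("x")
// Querying the timestamp column, the kind of statement a parser bug
// report like this would exercise:
sqlContext.sql("SELECT a, ts FROM x WHERE ts < CAST('2014-10-13 00:00:00' AS timestamp)").collect()
```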

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread Kartheek.R
Hi Sean, I tried even with sc as: sc.parallelize(data). But I still get the error: value sc not found. On Sun, Oct 12, 2014 at 1:47 PM, sowen [via Apache Spark User List] < ml-node+s1001560n16233...@n3.nabble.com> wrote: > It is a method of the class, not a static method of the object. Since a > Spark

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread Sean Owen
It is a method of the class, not a static method of the object. Since a SparkContext is available as sc in the shell, or you have perhaps created one similarly in your app, write sc.parallelize(...) On Oct 12, 2014 7:15 AM, "rapelly kartheek" wrote: > Hi, > > I am trying to write a String that is
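The distinction Sean draws above, as a minimal sketch (app name and master are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// parallelize is an instance method of SparkContext, not a static method
// on a companion object, so it needs a live context to be called on:
val sc = new SparkContext(
  new SparkConf().setAppName("demo").setMaster("local"))
val rdd = sc.parallelize(Seq("hello", "spark"))

// SparkContext.parallelize(Seq(1, 2))  // would not compile: no such static method
```

In the spark-shell a context is pre-created and bound to the name `sc`; inside other code, such as the scheduler internals being discussed here, no such binding exists, which is why `value sc not found` appears.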

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread rapelly kartheek
It's a variable in the spark-1.0.0/*/storage/BlockManagerMaster.scala class: the data returned by the askDriverWithReply() method for the getPeers() method. Basically, it is a Seq[ArrayBuffer]: ArraySeq(ArrayBuffer(BlockManagerId(1, s1, 47006, 0), BlockManagerId(0, s1, 34625, 0)), ArrayBuffer(BlockManager

Re: How to convert a non-rdd data to rdd.

2014-10-12 Thread @Sanjiv Singh
Hi Karthik, Can you provide us more detail about the dataset "data" that you want to parallelize with SparkContext.parallelize(data)? Regards, Sanjiv Singh Mob: +091 9990-447-339 On Sun, Oct 12, 2014 at 11:45 AM, rapelly kartheek wrote: > Hi, > > I am trying to write a