Is there a way to get column names using hiveContext?

2014-12-07 Thread abhishek
Hi, I have iplRDD, which is JSON, and I do the steps below and query through hiveContext. I get the results, but without column headers. Is there a way to get the column names? val teamRDD = hiveContext.jsonRDD(iplRDD) teamRDD.registerTempTable("teams") hiveContext.cacheTable("teams") val res
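
A minimal sketch of how the column names can be read off the inferred schema rather than the query results (assuming the Spark 1.1-era SchemaRDD API, and that sc and iplRDD already exist):

  import org.apache.spark.sql.hive.HiveContext

  val hiveContext = new HiveContext(sc)
  val teamRDD = hiveContext.jsonRDD(iplRDD)   // iplRDD: RDD[String] of JSON records
  teamRDD.registerTempTable("teams")
  hiveContext.cacheTable("teams")

  // The inferred schema carries the column names, independent of any query.
  teamRDD.schema.fields.map(_.name).foreach(println)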

monitoring for spark standalone

2014-12-07 Thread Judy Nash
Hello, are there ways to programmatically get the health status of master and slave nodes, similar to Hadoop Ambari? The wiki seems to suggest there are only the web UI and instrumentation (http://spark.apache.org/docs/latest/monitoring.html). Thanks, Judy
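
One option, sketched below: the standalone master's web UI also serves its status as JSON under the /json path, which can be polled programmatically (host and port are hypothetical):

  import scala.io.Source

  // The master's web UI (default port 8080) exposes cluster state --
  // workers, their state, cores, memory -- as JSON at /json.
  val masterJsonUrl = "http://spark-master:8080/json"
  val status = Source.fromURL(masterJsonUrl).mkString
  println(status)  // feed this to a JSON parser to check node health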

spark Exception while performing saveAsTextFiles

2014-12-07 Thread Hafiz Mujadid
I am facing the following exception while saving a DStream to HDFS: 14/12/08 12:14:26 INFO DAGScheduler: Failed to run saveAsTextFile at DStream.scala:788 14/12/08 12:14:26 ERROR JobScheduler: Error running job streaming job 1418022865000 ms.0 org.apache.spark.SparkException: Job aborted due to stage f
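
For reference, a minimal sketch of the call involved (the source, paths, and batch interval are hypothetical; each batch is written out as prefix-[batchTime].suffix):

  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val ssc = new StreamingContext(sc, Seconds(10))
  val lines = ssc.socketTextStream("localhost", 9999)

  // Writes one directory per batch under the given HDFS prefix.
  lines.saveAsTextFiles("hdfs://namenode:8020/user/spark/out", "txt")

  ssc.start()
  ssc.awaitTermination()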

Re: Convert RDD[Map[String, Any]] to SchemaRDD

2014-12-07 Thread Jianshi Huang
I checked the source code for inferSchema. Looks like this is exactly what I want: val allKeys = rdd.map(allKeysWithValueTypes).reduce(_ ++ _) Then I can do createSchema(allKeys). Jianshi On Sun, Dec 7, 2014 at 2:50 PM, Jianshi Huang wrote: > Hmm.. > > I've created a JIRA: https://issues.ap
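
A rough sketch of that idea (simplified by treating every value as a string rather than inferring per-key types as inferSchema does; mapRDD and sqlContext are assumed to exist):

  import org.apache.spark.sql._

  // Union of all keys across the RDD[Map[String, Any]].
  val allKeys = mapRDD.map(_.keySet).reduce(_ ++ _).toSeq.sorted

  val schema = StructType(allKeys.map(k => StructField(k, StringType, nullable = true)))
  val rowRDD = mapRDD.map(m => Row(allKeys.map(k => m.get(k).map(_.toString).orNull): _*))

  val schemaRDD = sqlContext.applySchema(rowRDD, schema)
  schemaRDD.registerTempTable("records")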

Re: Spark SQL: How to get the hierarchical element with SQL?

2014-12-07 Thread Cheng Lian
You may access it via something like "SELECT filterIp.element FROM tb", just like in Hive. Or if you're using the Spark SQL DSL, you can use tb.select("filterIp.element".attr). On 12/8/14 1:08 PM, Xuelin Cao wrote: Hi, I'm generating a Spark SQL table from an offline Json file. The diff
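
Spelled out as a sketch (tb and the nested field names come from the thread; hiveContext is assumed):

  // SQL flavor: dot syntax for nested fields, as in Hive.
  val viaSql = hiveContext.sql("SELECT filterIp.element FROM tb")

  // DSL flavor (Spark 1.1-era SchemaRDD DSL; may require importing the
  // SQL context implicits for the .attr conversion).
  val viaDsl = tb.select("filterIp.element".attr)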

Spark SQL: How to get the hierarchical element with SQL?

2014-12-07 Thread Xuelin Cao
Hi, I'm generating a Spark SQL table from an offline JSON file. The difficulty is that the original JSON file has a hierarchical structure, and as a result this is what I get: scala> tb.printSchema root |-- budget: double (nullable = true) |-- filterIp: array (nullable = true) |

Print Node info. of Decision Tree

2014-12-07 Thread jake Lim
How can I print the node info of a DecisionTreeModel? I want to navigate and print all information in the decision tree model. Is there some kind of function/method to support this? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Print-Node-info-of-Decision-Tree-tp
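
There is no built-in pretty-printer in that generation of MLlib, but the tree can be walked from the root by hand. A sketch, assuming a trained DecisionTreeModel named model and the MLlib 1.1-era Node API:

  import org.apache.spark.mllib.tree.model.Node

  // Recursively print every node: id, prediction, and the split for internal nodes.
  def printNode(node: Node, indent: String = ""): Unit = {
    val desc = node.split
      .map(s => s"split on feature ${s.feature} at threshold ${s.threshold}")
      .getOrElse("leaf")
    println(s"$indent id=${node.id} predict=${node.predict} $desc")
    node.leftNode.foreach(printNode(_, indent + "  "))
    node.rightNode.foreach(printNode(_, indent + "  "))
  }

  printNode(model.topNode)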

Re: Does filter on an RDD scan every data item ?

2014-12-07 Thread 诺铁
there is a PartitionPruningRDD (:: DeveloperApi ::): an RDD used to prune RDD partitions so we can avoid launching tasks on all partitions. An example use case: if we know the RDD is partitioned by range, and the execution DAG has a filter on the key, we can avoid launching tasks on part
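
A sketch of how the pruning looks in user code (the partition predicate is hypothetical; in practice it comes from knowing how the RDD was partitioned):

  import org.apache.spark.rdd.PartitionPruningRDD

  // Suppose we know only these partitions can contain the filtered key range.
  val wantedPartitions = Set(0, 3, 7)

  // Tasks are launched only for partitions accepted by the predicate.
  val pruned = PartitionPruningRDD.create(rdd, idx => wantedPartitions.contains(idx))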

Re: Does filter on an RDD scan every data item ?

2014-12-07 Thread nsareen
@Sowen, I would appreciate it if you could explain how Spark SQL would help in my scenario. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p20571.html Sent from the Apache Spark User List mailing list archive at

RE: spark assembly jar caused "changed on src filesystem" error

2014-12-07 Thread Hu, Leo
If anybody knows the reason, please help me. Thanks a lot. Best Regards, LEO HU, CD&SP, SAP LABS CHINA. From: Hu, Leo [mailto:leo.h...@sap.com] Sent: Friday, December 05, 2014 10:23 AM To: u...@spark.incubator.apache.org Subject: spark assembly jar caused "changed on src filesystem" error Hi a

Re: spark-submit on YARN is slow

2014-12-07 Thread Tobias Pfeiffer
Hi, thanks for your responses! On Sat, Dec 6, 2014 at 4:22 AM, Sandy Ryza wrote: > > What version are you using? In some recent versions, we had a couple of > large hardcoded sleeps on the Spark side. > I am using Spark 1.1.1. As Andrew mentioned, I guess most of the 10 seconds waiting time p

Re: run JavaAPISuite with maven

2014-12-07 Thread Koert Kuipers
Thanks, that makes sense. I searched the mailing list but couldn't find any mention of it. I should have searched JIRA instead... On Sun, Dec 7, 2014 at 6:25 PM, Sean Owen wrote: > I think it's a known issue: > > https://issues.apache.org/jira/browse/SPARK-4159 > https://issues.apache.org/jira/br

Re: run JavaAPISuite with maven

2014-12-07 Thread Sean Owen
I think it's a known issue: https://issues.apache.org/jira/browse/SPARK-4159 https://issues.apache.org/jira/browse/SPARK-661 I got bit by this too recently and meant to look into it. On Sun, Dec 7, 2014 at 4:50 PM, Koert Kuipers wrote: > so as part of the official build the java api does not ge

Re: run JavaAPISuite with maven

2014-12-07 Thread Koert Kuipers
So as part of the official build the Java API does not get tested, then? I'm sure there is a good reason for it, but that's surprising to me. On Sun, Dec 7, 2014 at 12:19 PM, Ted Yu wrote: > Looking at the pom.xml, I think I found the reason - scalatest is used. > With the following diff: > > dif

RE: Is there a way to force spark to use specific ips?

2014-12-07 Thread Ashic Mahtab
Hi Matt, That's what I'm seeing too. I've reverted to creating a fact in the Vagrantfile + adding a host entry in Puppet. Saves having to have the vagrant plugin installed. Vagrant-hosts looks interesting for scenarios where I control all the machines. Cheers, Ashic. Subject: Re: Is there a way to
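
For the Spark side of this, a sketch: SPARK_LOCAL_IP in conf/spark-env.sh pins the address each daemon binds to, and the driver equivalent can be set in code (addresses below are hypothetical):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("pinned-ips")
    .setMaster("spark://192.168.33.10:7077")
    .set("spark.driver.host", "192.168.33.1")  // address executors connect back to
  val sc = new SparkContext(conf)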

saveAsParquetFile and DirectFileOutputCommitter Class not found Error

2014-12-07 Thread Addanki, Santosh Kumar
Hi, when we try to call saveAsParquetFile on a SchemaRDD, we get the following error: Py4JJavaError: An error occurred while calling o384.saveAsParquetFile. : java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/DirectFileOutputCommitter at org.apache.spark.sql.parqu

Re: NoClassDefFoundError

2014-12-07 Thread Ted Yu
See the following threads: http://search-hadoop.com/m/JW1q5kjNlK http://search-hadoop.com/m/JW1q5XqSDk Cheers On Sun, Dec 7, 2014 at 9:35 AM, Julius K wrote: > Hi everyone, > I am new to Spark and encountered a problem. > I want to use an external library in a java project and compiling > work

NoClassDefFoundError

2014-12-07 Thread Julius K
Hi everyone, I am new to Spark and encountered a problem. I want to use an external library in a Java project; compiling works fine with Maven, but at runtime (locally) I get a NoClassDefFoundError. Do I have to put the jars somewhere, or tell Spark where they are? I can send the pom.xml an
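
A common fix, sketched under the assumption that the jar is missing from the runtime classpath rather than from the Maven build: ship it explicitly with sc.addJar (or the --jars flag of spark-submit), since Maven's compile-time classpath is not distributed to executors at runtime. The path below is hypothetical.

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf().setAppName("external-lib-demo")
  val sc = new SparkContext(conf)

  // Makes the external library available on the executors' classpath.
  sc.addJar("/path/to/external-library.jar")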

Re: run JavaAPISuite with maven

2014-12-07 Thread Ted Yu
Looking at the pom.xml, I think I found the reason - scalatest is used. With the following diff:

diff --git a/pom.xml b/pom.xml
index b7df53d..b0da893 100644
--- a/pom.xml
+++ b/pom.xml
@@ -947,7 +947,7 @@
         <version>2.17</version>
-        <skipTests>true</skipTests>
+        <skipTests>false</skipTests>

RE: Bulk-load to HBase

2014-12-07 Thread fralken
Hello, you can have a look at the project hbase-rdd, which provides a simple method to bulk-load an RDD to HBase. fralken -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bulk-load-to-HBase-tp14667p20567.html Sent fr

Re: run JavaAPISuite with maven

2014-12-07 Thread Koert Kuipers
Hey guys, I was able to run the test just fine with:

$ sbt
> project core
> testOnly org.apache.spark.JavaAPISuite

However, I found it strange that it didn't run when I do "mvn test -pl core", or at least it didn't seem like it ran to me. This would mean that when someone tests/publishes with maven the

MLlib(Logistic Regression) + Spark Streaming.

2014-12-07 Thread Nasir Khan
I am new to Spark. Let's say I want to develop a machine learning model trained the normal (batch) way in MLlib. I want to use that model with a logistic regression classifier to predict streaming data coming from a file or socket. Streaming data -> Logistic Regression -> binary label predicti
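
A sketch of that pipeline (train offline, then score a socket stream; the file path, host/port, and the comma-separated feature format are all hypothetical):

  import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.util.MLUtils
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // 1. Train offline on historical labeled data (LIBSVM format assumed).
  val training = MLUtils.loadLibSVMFile(sc, "hdfs:///data/training.libsvm")
  val model = LogisticRegressionWithSGD.train(training, 100)

  // 2. Score the stream: each line is assumed to be comma-separated features.
  val ssc = new StreamingContext(sc, Seconds(5))
  val predictions = ssc.socketTextStream("localhost", 9999)
    .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
    .map(features => model.predict(features))  // 0.0 or 1.0

  predictions.print()
  ssc.start()
  ssc.awaitTermination()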