Re: Can't access remote Hive table from spark

2015-02-11 Thread guxiaobo1982
Thanks. Zhan Zhang On Feb 11, 2015, at 4:34 AM, guxiaobo1982 wrote: Hi Zhan, My single-node Hadoop cluster was installed by Ambari 1.7.0. I tried to create the /user/xiaobogu directory in hdfs, but it failed with both user xiaobogu and root [xiaobogu@lix1 current]$ hadoop d

Re: Can't access remote Hive table from spark

2015-02-11 Thread guxiaobo1982
ight permission to xiaobogu. Thanks. Zhan Zhang On Feb 7, 2015, at 8:15 AM, guxiaobo1982 wrote: Hi Zhan Zhang, With the pre-built version 1.2.0 of Spark against the yarn cluster installed by ambari 1.7.0, I got the following errors: [xiaobogu@lix1 spark]$ ./bin/sp

Re: Can't access remote Hive table from spark

2015-02-08 Thread guxiaobo1982
"user@spark.apache.org"; Subject: Re: Can't access remote Hive table from spark Please note that Spark 1.2.0 only supports Hive 0.13.1 or 0.12.0; no other versions are supported. Best, Cheng On 1/25/15 1

Re: Can't access remote Hive table from spark

2015-02-07 Thread guxiaobo1982
0 only supports Hive 0.13.1 or 0.12.0; no other versions are supported. Best, Cheng On 1/25/15 12:18 AM, guxiaobo1982 wrote: Hi, I built and started a single node standalone Spark 1.2.0 cluster along with a single node Hive 0.14.0 instance installed by Ambari 1.17.0. On the Sp

Is the pre-built version of Spark 1.2.0 built with the --hive option?

2015-02-07 Thread guxiaobo1982
Hi, After various problems with binaries I built myself, I want to try the pre-built binary, but I need to know whether it was built with the --hive option. Thanks.

Can we execute "create table" and "load data" commands against Hive inside HiveContext?

2015-02-05 Thread guxiaobo1982
Hi, I am playing with the following example code: public class SparkTest { public static void main(String[] args){ String appName= "This is a test application"; String master="spark://lix1.bh.com:7077"; SparkCo
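The pattern the snippet is building toward can be shown more compactly: in Spark 1.2 a HiveContext passes CREATE TABLE and LOAD DATA statements straight through to Hive via its sql() method. A minimal sketch, written here in Scala rather than the Java of the snippet (the table name and input path are hypothetical; the master URL is the one from the snippet):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Assumes hive-site.xml is on the classpath so the HiveContext can reach the metastore.
val conf = new SparkConf().setAppName("HiveDDLTest").setMaster("spark://lix1.bh.com:7077")
val sc = new SparkContext(conf)
val hiveCtx = new HiveContext(sc)

// DDL and DML statements are forwarded to Hive as-is (names and paths are examples).
hiveCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveCtx.sql("LOAD DATA LOCAL INPATH '/tmp/kv1.txt' INTO TABLE src")
hiveCtx.sql("SELECT COUNT(*) FROM src").collect().foreach(println)
```

This cannot run without a Spark/Hive installation; it only illustrates the call pattern.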

how to specify hive connection options for HiveContext

2015-02-02 Thread guxiaobo1982
Hi, I know two ways to set them, one for spark-submit and one for spark-shell, but how can I set them for programs running inside Eclipse? Regards,
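One route for code launched straight from an IDE, where spark-submit's option handling is unavailable: either put the directory containing hive-site.xml on the project classpath, or set the options programmatically on the HiveContext. A sketch, assuming a placeholder metastore URI:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Runs in local mode so it can be started and debugged from the IDE.
val conf = new SparkConf().setAppName("HiveFromIDE").setMaster("local[*]")
val sc = new SparkContext(conf)
val hiveCtx = new HiveContext(sc)

// Options normally read from hive-site.xml can be set in code instead
// (the hostname below is a placeholder for your metastore server).
hiveCtx.setConf("hive.metastore.uris", "thrift://metastore-host:9083")
```

Adding /etc/hive/conf to the Eclipse run configuration's classpath achieves the same effect without code changes.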

spark-shell can't import the default hive-site.xml options properly.

2015-02-01 Thread guxiaobo1982
Hi, In order to let a local spark-shell connect to a remote Spark standalone cluster and access hive tables there, I must put the hive-site.xml file into the local Spark installation's conf path, but spark-shell can't even import the default settings there; I found two errors: hi

Re: Can't access remote Hive table from spark

2015-02-01 Thread guxiaobo1982
A friend told me that I should add the hive-site.xml file to the --files option of the spark-submit command, but how can I run and debug my program inside Eclipse? -- Original -- From: "guxiaobo1982";; Send time: Sunday, Feb 1, 2015 4:18 PM To: &q

Re: RE: Can't access remote Hive table from spark

2015-01-31 Thread guxiaobo1982
The following line does not work either: export SPARK_CLASSPATH=/etc/hive/conf -- Original -- From: "guxiaobo1982";; Send time: Sunday, Feb 1, 2015 2:15 PM To: "Skanda Prasad"; "user@spark.apache.org"; Cc: "徐涛"<77044.

Re: RE: Can't access remote Hive table from spark

2015-01-31 Thread guxiaobo1982
nd it worked. You can try this approach. -Skanda From: guxiaobo1982 Sent: ‎25-‎01-‎2015 13:50 To: user@spark.apache.org Subject: Can't access remote Hive table from spark Hi, I built and started a single node standalone Spark 1.2.0 cluster along with a single node Hive 0.14.0 instance insta

Can't access remote Hive table from spark

2015-01-25 Thread guxiaobo1982
Hi, I built and started a single node standalone Spark 1.2.0 cluster along with a single node Hive 0.14.0 instance installed by Ambari 1.17.0. On the Spark and Hive node I can create and query tables inside Hive, and on remote machines I can submit the SparkPi example to the Spark master. But I

How to create distributed matrices from hive tables.

2015-01-18 Thread guxiaobo1982
Hi, We have large datasets in the format of a Spark MLlib matrix, but they are pre-computed by Hive and stored inside Hive. My question is: can we create a distributed matrix such as IndexedRowMatrix directly from Hive tables, avoiding reading data from Hive tables and feeding them into an emp
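This is possible without an intermediate dump: in Spark 1.2, HiveContext.sql() returns a SchemaRDD (an RDD of Rows), which can be mapped straight into IndexedRows. A sketch, assuming a hypothetical table matrix_table(row_id BIGINT, c1 DOUBLE, c2 DOUBLE, c3 DOUBLE):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("HiveToMatrix"))
val hiveCtx = new HiveContext(sc)

// Map each Hive row directly to an IndexedRow; the data never leaves the cluster.
val rows = hiveCtx.sql("SELECT row_id, c1, c2, c3 FROM matrix_table").map { r =>
  IndexedRow(r.getLong(0), Vectors.dense(r.getDouble(1), r.getDouble(2), r.getDouble(3)))
}
val mat = new IndexedRowMatrix(rows)
println(mat.numRows() + " x " + mat.numCols())
```

For wide tables, building the dense vector from a loop over the Row's columns avoids listing every column by hand.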

How to get the master URL at runtime inside driver program?

2015-01-17 Thread guxiaobo1982
Hi, Driver programs submitted by the spark-submit script will get the runtime Spark master URL, but how does the driver get the URL inside the main method when creating the SparkConf object? Regards,
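When an application is launched through spark-submit, the --master value is injected into the driver's configuration as the spark.master property, so the main method can build a SparkConf without hard-coding it. A sketch:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WhereAmI {
  def main(args: Array[String]) {
    // No setMaster() call: spark-submit supplies spark.master at launch time.
    val conf = new SparkConf().setAppName("WhereAmI")
    val sc = new SparkContext(conf)
    println("Running against master: " + sc.master)
    sc.stop()
  }
}
```

Run without spark-submit (e.g. from an IDE), this fails with a missing-master error unless setMaster() is called explicitly.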

Is cluster mode supported by the submit command for standalone clusters?

2015-01-17 Thread guxiaobo1982
Hi, The submitting applications guide at http://spark.apache.org/docs/latest/submitting-applications.html says: Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network laten

Re: Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0

2014-12-29 Thread guxiaobo1982
/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 --queue thequeue lib/spark-examples-1.2.0-hadoop2.6.0.jar 10 Got the same error by the above command, I think I missed the jar containin

Can't submit the SparkPi example to local Yarn 2.6.0 installed by ambari 1.7.0

2014-12-26 Thread guxiaobo1982
Hi, I built the 1.2.0 version of Spark against single node hadoop 2.6.0 installed by ambari 1.7.0. The ./bin/run-example SparkPi 10 command executes on my local Mac 10.9.5 and on the centos virtual machine which hosts hadoop, but I can't run the SparkPi example inside yarn; it seems there's somet

Re: How to build Spark against the latest

2014-12-25 Thread guxiaobo1982
The following command works ./make-distribution.sh --tgz -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests -- Original -- From: "guxiaobo1982";; Send time: Thursday, Dec 25, 2014 3:58 PM To: "";

Re: How to build Spark against the latest

2014-12-24 Thread guxiaobo1982
What options should I use when running the make-distribution.sh script? I tried ./make-distribution.sh --hadoop.version 2.6.0 --with-yarn -with-hive --with-tachyon --tgz but nothing came out. Regards -- Original -- From: "guxiaobo1982";;

Re: How to build Spark against the latest

2014-12-24 Thread guxiaobo1982
See http://search-hadoop.com/m/JW1q5Cew0j On Tue, Dec 23, 2014 at 8:00 PM, guxiaobo1982 wrote: Hi, The official pom.xml file only has a profile for hadoop version 2.4 as the latest version, but I installed hadoop version 2.6.0 with ambari; how can I buil

How to build Spark against the latest

2014-12-23 Thread guxiaobo1982
Hi, The official pom.xml file only has a profile for hadoop version 2.4 as the latest version, but I installed hadoop version 2.6.0 with ambari. How can I build Spark against it, just using mvn -Dhadoop.version=2.6.0, or how do I make a corresponding profile for it? Regards, Xiaobo

Re: What about implementing various hypothesis test for LogisticRegression in MLlib

2014-08-22 Thread guxiaobo1982
MLlib We implemented chi-squared tests in v1.1: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala#L166 and we will add more after v1.1. Feedback on which tests should come first would be greatly appreciated. -Xiangrui On Tue, Aug 19,
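The chi-squared tests referenced above can be called from spark-shell along these lines (the observed counts below are made-up illustration data):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

// Pearson's goodness-of-fit test of the observed counts
// against a uniform expected distribution.
val observed = Vectors.dense(43, 56, 31)
val result = Statistics.chiSqTest(observed)
println(result.statistic)
println(result.pValue)
```

An expected-frequency vector can be passed as a second argument to test against a non-uniform distribution.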

What about implementing various hypothesis test for Logistic Regression in MLlib

2014-08-19 Thread guxiaobo1982
Hi, From the documentation I think only the model-fitting part is implemented; what about the various hypothesis tests and performance measures used to evaluate the model fit? Regards, Xiaobo Gu

What's the best practice to deploy spark on Big SMP servers?

2014-06-26 Thread guxiaobo1982
Hi, We have a big SMP server (with 128G RAM and 32 CPU cores) to run small-scale analytical workloads. What's the best practice for deploying a standalone Spark on the server to achieve good performance? How many worker instances should be configured, and how much RAM and how many CPU cores should be allocated for ea
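A common standalone-mode answer is to run several smaller workers on the one box rather than a single giant JVM, via conf/spark-env.sh. The split below is only an illustration for a 32-core / 128G machine, not a recommendation:

```shell
# conf/spark-env.sh -- one possible split for a 32-core / 128G SMP box
# (the 4 x 8-core / 28g figures are assumptions for illustration).
export SPARK_WORKER_INSTANCES=4   # several moderate JVMs often GC better than one huge one
export SPARK_WORKER_CORES=8       # 4 workers x 8 cores = 32 cores total
export SPARK_WORKER_MEMORY=28g    # 4 x 28g = 112g, leaving headroom for the OS and the driver
```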

Re: Where Can I find the full documentation for Spark SQL?

2014-06-25 Thread guxiaobo1982
: "Gianluca Privitera";; Date: Jun 26, 2014 To: "user@spark.apache.org"; Subject: Re: Where Can I find the full documentation for Spark SQL? You can find something in the API, nothing more than that I think for now. Gianluca On 25 Jun 2014, at 23:36, guxiaobo1982 wrote:

Where Can I find the full documentation for Spark SQL?

2014-06-25 Thread guxiaobo1982
Hi, I want to know the full list of functions, syntax, and features that Spark SQL supports; is there any documentation? Regards, Xiaobo Gu