create a table for csv files

2015-11-19 Thread xiaohe lan
Hi, I have some csv files in HDFS with headers like col1, col2, col3. I want to add a column named id, so a record would be … How can I do this using Spark SQL? Can id be auto-increment? Thanks, Xiaohe
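One way to sketch this, assuming the DataFrame API (the `SparkSession`/`csv` reader shown here is from Spark 2.x; on the 1.x line of this thread you would use `SQLContext` with the spark-csv package instead; the HDFS paths are hypothetical):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.monotonically_increasing_id
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object AddIdColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AddIdColumn").getOrCreate()

    // Read the CSV using its header row for column names (path is hypothetical).
    val df = spark.read.option("header", "true").csv("hdfs:///data/input")

    // Option 1: unique but NOT consecutive ids -- cheap, no shuffle.
    val withId = df.withColumn("id", monotonically_increasing_id())

    // Option 2: true 0,1,2,... auto-increment via zipWithIndex on the RDD.
    val rows = df.rdd.zipWithIndex.map { case (row, idx) =>
      Row.fromSeq(idx +: row.toSeq)
    }
    val schema = StructType(
      StructField("id", LongType, nullable = false) +: df.schema.fields)
    val withSeqId = spark.createDataFrame(rows, schema)

    withSeqId.write.option("header", "true").csv("hdfs:///data/output")
    spark.stop()
  }
}
```

`monotonically_increasing_id` is the cheaper choice when the ids only need to be unique; `zipWithIndex` gives a genuine consecutive counter at the cost of an extra pass over the data.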

specify yarn-client for --master from a laptop

2015-10-27 Thread xiaohe lan
Hi, I have a hadoop 2.4 cluster running on some remote VMs. Can I start spark-shell or spark-submit from my laptop? For example: bin/spark-shell --master yarn-client. If this is possible, how can I do it? I have copied the same hadoop to my laptop (but I don't run hadoop on my laptop), and I have also set:
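The usual setup for this is just environment configuration: point Spark at the copied cluster config and let yarn-client mode do the rest. A minimal sketch, assuming the cluster's conf directory was copied to the laptop (the path below is hypothetical) and that the laptop can reach the ResourceManager and the HDFS datanodes:

```bash
# Point Spark at the *cluster's* configuration files copied to the laptop.
export HADOOP_CONF_DIR=/path/to/copied/hadoop/etc/hadoop   # hypothetical path

# No local hadoop daemons needed; the driver runs here, executors on the cluster.
bin/spark-shell --master yarn-client
```

If the VMs are on a private network, the executors must also be able to connect back to the laptop's driver, so firewalls and hostname resolution in both directions matter here.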

Re: SparkPi is getting java.lang.NoClassDefFoundError: scala/collection/Seq

2015-08-17 Thread xiaohe lan
The scope is provided; you need to change it to compile to run SparkPi in IntelliJ. As I remember, you also need to change the guava and jetty related libraries to compile too. On Mon, Aug 17, 2015 at 2:14 AM, xiaohe lan wrote: > Hi, > I am trying to run Spark
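The scope change the reply describes would look like this in a Maven POM (a sketch only: the artifact and version below are assumptions matching the Spark/Scala versions mentioned elsewhere in these threads, not taken from the original build file):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.4.1</version>
  <!-- "provided" keeps Spark (and its transitive scala-library) off the
       IDE run classpath, which is why scala/collection/Seq is missing.
       Switch to "compile" when launching SparkPi from IntelliJ. -->
  <scope>compile</scope>
</dependency>
```

The same idea applies to the guava and jetty dependencies the reply mentions: any jar scoped `provided` is assumed to exist at runtime, which holds on a cluster but not inside the IDE.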

SparkPi is geting java.lang.NoClassDefFoundError: scala/collection/Seq

2015-08-16 Thread xiaohe lan
Hi, I am trying to run SparkPi in IntelliJ and getting NoClassDefFoundError. Has anyone else seen this issue before? Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0

Re: getting WARN ReliableDeliverySupervisor

2015-07-02 Thread xiaohe lan
Changing the JDK from 1.8.0_45 to 1.7.0_79 solved this issue. I saw https://issues.apache.org/jira/browse/SPARK-6388, but it is not a problem, however. On Thu, Jul 2, 2015 at 1:30 PM, xiaohe lan wrote: > Hi Expert, > Hadoop version: 2.4 > Spark version: 1.3.1 > I am running t

getting WARN ReliableDeliverySupervisor

2015-07-01 Thread xiaohe lan
Hi Expert, Hadoop version: 2.4 Spark version: 1.3.1 I am running the SparkPi example application. bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 2G lib/spark-examples-1.3.1-hadoop2.4.0.jar 2 The same command sometimes gets WARN ReliableDeli

Re: number of executors

2015-05-18 Thread xiaohe lan
Awesome! It's documented here: https://spark.apache.org/docs/latest/submitting-applications.html -Sandy. On Mon, May 18, 2015 at 8:03 PM, xiaohe lan wrote: > Hi Sandy, > Thanks for your information. Yes, spark-submit --master y

Re: number of executors

2015-05-18 Thread xiaohe lan
Sandy Ryza wrote: > Hi Xiaohe, > All Spark options must go before the jar or they won't take effect. > -Sandy. On Sun, May 17, 2015 at 8:59 AM, xiaohe lan wrote: > Sorry, both of them are assigned tasks actually. > Aggreg
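The ordering rule quoted above is the whole fix for this thread: spark-submit stops parsing its own options at the application jar, and everything after the jar is handed to the app's main() as arguments. A sketch of the wrong and right invocations, reusing the jar and class names from the original question:

```bash
# Wrong: --class and --num-executors come after the jar, so spark-submit
# silently passes them to scala.SimpleApp as program arguments.
spark-submit --master yarn \
  target/scala-2.10/simple-project_2.10-1.0.jar \
  --class scala.SimpleApp --num-executors 5

# Right: all Spark options before the jar; only app arguments after it.
spark-submit --master yarn --class scala.SimpleApp --num-executors 5 \
  target/scala-2.10/simple-project_2.10-1.0.jar
```

This explains the symptom reported earlier in the thread: with the options ignored, YARN fell back to its default executor count instead of the requested five.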

Re: number of executors

2015-05-17 Thread xiaohe lan
[flattened Spark UI executor table: host addresses, task time, task counts, input sizes, and shuffle read/write totals] On Sun, May 17, 2015 at 11:50 PM, xiaohe lan wrote: > bash-4.1$ ps aux | grep SparkSubmit > xilan 1704 13.2 1.2 5275520 380244 pts/0 Sl+ 08:39 0:13 > /scratch/xilan/jdk1.8.0_45/bin/java -cp &

Re: number of executors

2015-05-17 Thread xiaohe lan
executor-cores param? While you submit the job, do a ps aux | grep spark-submit and see the exact command parameters. Thanks, Best Regards. On Sat, May 16, 2015 at 12:31 PM, xiaohe lan wrote: > Hi, > I have a 5-node yarn cluster, I used

println in spark-shell

2015-05-17 Thread xiaohe lan
Hi, when I start spark-shell by passing yarn to the --master option, println does not print the elements of an RDD: bash-4.1$ spark-shell --master yarn 15/05/17 01:50:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Welcome to
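This behavior is expected under yarn: `rdd.foreach(println)` runs the closure on the executors, so the output lands in executor stdout (visible in the YARN container logs), not in the shell. The standard fix is to bring the elements back to the driver first. A minimal spark-shell sketch:

```scala
// foreach(println) prints on the *executors* under yarn -- nothing appears
// in the shell. Collect (or take) to the driver before printing.
val rdd = sc.parallelize(1 to 5)

rdd.collect().foreach(println)   // fine for small RDDs
rdd.take(3).foreach(println)     // bounded alternative for large RDDs
```

With --master local the executor and driver share a JVM, which is why the same `foreach(println)` appears to work there and then "breaks" on a real cluster.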

number of executors

2015-05-16 Thread xiaohe lan
Hi, I have a 5-node yarn cluster and used spark-submit to submit a simple app: spark-submit --master yarn target/scala-2.10/simple-project_2.10-1.0.jar --class scala.SimpleApp --num-executors 5 I set the number of executors to 5, but from the Spark UI I could see only two executors, and it ran ve

Re: How to install spark in spark on yarn mode

2015-04-30 Thread xiaohe lan
http://mbonaci.github.io/mbo-spark/ You don't need to install spark on every node. Just install it on one node, or you can install it on a remote system and make a spark cluster. Thanks, Madhvi. On Thursday 30 April 2015 09:31 AM, xiaohe lan wrote: > Hi experts

How to install spark in spark on yarn mode

2015-04-29 Thread xiaohe lan
Hi experts, I see spark on yarn has yarn-client and yarn-cluster modes. I also have a 5-node hadoop cluster (hadoop 2.4). How do I install spark if I want to try the spark on yarn mode? Do I need to install spark on each node of the hadoop cluster? Thanks, Xiaohe
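As the reply above notes, for yarn mode Spark only needs to be unpacked on the submitting machine; YARN ships the Spark jars into the containers at launch. A sketch of the client-side setup, assuming the prebuilt-for-Hadoop-2.4 package and a hypothetical config path:

```bash
# On one node (or a laptop) only -- no per-node Spark install is needed.
tar xzf spark-1.3.1-bin-hadoop2.4.tgz
cd spark-1.3.1-bin-hadoop2.4

# Point Spark at the cluster's Hadoop/YARN configuration (hypothetical path).
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Smoke test: run the bundled SparkPi example against the cluster.
bin/spark-submit --master yarn-client \
  --class org.apache.spark.examples.SparkPi \
  lib/spark-examples-1.3.1-hadoop2.4.0.jar 10
```

yarn-client keeps the driver on the submitting machine (handy for interactive work), while yarn-cluster runs the driver inside a YARN container, which suits unattended jobs.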