RE: Yarn Spark on EMR

2015-11-20 Thread Bozeman, Christopher
Suraj, the Spark History Server is running on 18080 (http://spark.apache.org/docs/latest/monitoring.html), which is not going to give you a real-time update on a running Spark application. Given this is Spark on YARN, you will need to view the Spark UI from the Application Master URL which can
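
The snippet is cut off before it says where the Application Master URL comes from. As a rough sketch only, one common route is the YARN ResourceManager web proxy, whose path has the form /proxy/<application id>/. The helper name, host name, and application ID below are hypothetical, and port 8088 is only Hadoop's stock ResourceManager web port; an EMR cluster may be configured differently.

```python
def application_master_url(rm_host, application_id, rm_port=8088):
    """Build the ResourceManager proxy URL for a running YARN application.

    Assumptions: rm_host is the ResourceManager (master node) host name and
    rm_port is its web UI port. 8088 is the Hadoop default and may not match
    the port used on a given EMR AMI.
    """
    return "http://{0}:{1}/proxy/{2}/".format(rm_host, rm_port, application_id)


if __name__ == "__main__":
    # Hypothetical host and application ID, for illustration only.
    print(application_master_url("rm.example.internal",
                                 "application_1448000000000_0001"))
```

The application ID is the one YARN assigns at submission; `yarn application -list` shows it along with a tracking URL for each running application.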

RE: Spark Expand Cluster

2015-11-20 Thread Bozeman, Christopher
Dan, even though you may be adding more nodes to the cluster, the Spark application has to request additional executors in order to use the added resources. Or the Spark application can be using Dynamic Resource Allocation (http://spark.apache.org/docs/latest/job-scheduling.html#dyn
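
A minimal sketch of the two routes mentioned above, assuming the job is launched with spark-submit on YARN. The `--num-executors` flag and the `spark.dynamicAllocation.*` and `spark.shuffle.service.enabled` properties are documented Spark settings; the helper name, the jar name, and the executor counts are hypothetical.

```python
def spark_submit_args(app_jar, num_executors=None, dynamic=False,
                      min_executors=1, max_executors=20):
    """Build a spark-submit argument list (hypothetical helper).

    dynamic=True   -> let Spark grow and shrink the executor count itself.
    num_executors  -> otherwise, request a fixed number of executors.
    """
    # "yarn-cluster" is the Spark 1.x master string used in this thread.
    args = ["spark-submit", "--master", "yarn-cluster"]
    if dynamic:
        confs = [
            ("spark.dynamicAllocation.enabled", "true"),
            # Dynamic allocation relies on the external shuffle service,
            # which must also be set up on the YARN NodeManagers.
            ("spark.shuffle.service.enabled", "true"),
            ("spark.dynamicAllocation.minExecutors", str(min_executors)),
            ("spark.dynamicAllocation.maxExecutors", str(max_executors)),
        ]
        for key, value in confs:
            args += ["--conf", "{0}={1}".format(key, value)]
    elif num_executors is not None:
        args += ["--num-executors", str(num_executors)]
    args.append(app_jar)
    return args


if __name__ == "__main__":
    # Fixed request: resubmit with a larger count after adding nodes.
    print(" ".join(spark_submit_args("my-app.jar", num_executors=12)))
    # Dynamic allocation: the upper bound lets the job use added nodes.
    print(" ".join(spark_submit_args("my-app.jar", dynamic=True,
                                     max_executors=40)))
```

With a fixed request, a running application stays at the size it asked for, so new nodes sit idle until the job is resubmitted with a larger count. With dynamic allocation, the maximum bounds how far the job can expand.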

Re: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Bozeman, Christopher
We worked it out. There were multiple items (like the location of the remote metastore and db user auth) needed to make HiveContext happy in yarn-cluster mode. For reference https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/using-hivecontext-yarn-cluster.md -Christopher Bozeman On

Re: Spark on EMR

2015-06-19 Thread Bozeman, Christopher
You can use Spark 1.4 on EMR AMI 3.8.0 if you install Spark as a 3rd party application using the bootstrap action directly, rather than using the natively included Spark 1.3.1. See https://github.com/awslabs/emr-bootstrap-actions/tree/master/spark Refer to https://github.com/awslabs/emr-bootstrap-ac

RE: Where are my log4j or exception output on EMR?

2015-06-17 Thread Bozeman, Christopher
Sean, Spark on YARN (https://spark.apache.org/docs/latest/running-on-yarn.html) follows the logging conventions of YARN. If you are using cluster deployment mode on YARN (master=yarn-cluster) then the logging performed in the driver (your code) would be picked up by YARN’s logs in the Applicatio
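
As a hedged illustration of retrieving those YARN-managed logs: the `yarn logs -applicationId <id>` command prints the aggregated container logs for an application, including the driver's container in yarn-cluster mode. This sketch assumes YARN log aggregation is enabled and the application has finished. The helper name and the application ID are hypothetical.

```python
import subprocess


def yarn_logs_command(application_id):
    """Return the argv for fetching aggregated YARN logs of one application."""
    return ["yarn", "logs", "-applicationId", application_id]


def fetch_yarn_logs(application_id):
    """Run the command and return its output as text.

    Only works on a node where the `yarn` CLI is installed and configured.
    """
    output = subprocess.check_output(yarn_logs_command(application_id))
    return output.decode("utf-8", "replace")


if __name__ == "__main__":
    # Hypothetical application ID; print the command rather than running it.
    print(" ".join(yarn_logs_command("application_1434500000000_0007")))
```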

RE: --driver-memory parameter doesn't work for spark-submmit on yarn?

2015-04-01 Thread Bozeman, Christopher
Shuai, What did "ps aux | grep spark-submit" reveal? When you compare using _JAVA_OPTIONS and without using it, where do you see the difference? Thanks Christopher -Original Message- From: Shuai Zheng [mailto:szheng.c...@gmail.com] Sent: Wednesday, April 01, 2015 11:12 AM To: 'Sea

RE: Issue on Spark SQL insert or create table with Spark running on AWS EMR -- s3n.S3NativeFileSystem: rename never finished

2015-04-01 Thread Bozeman, Christopher
Teng, There is no need to alter hive.metastore.warehouse.dir. Leave it as is and just create external tables with location pointing to S3. What I suspect you are seeing is that spark-sql is writing to a temp directory within S3 then issuing a rename to the final location as would be done wi
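
A small sketch of the suggestion above: leave hive.metastore.warehouse.dir alone and point an external table at S3. The table name, columns, and bucket path are hypothetical, and the helper only builds the HiveQL text. It does not address the rename behaviour discussed in the rest of the message.

```python
def external_table_ddl(table, columns, location):
    """Build a HiveQL CREATE EXTERNAL TABLE statement.

    columns is a list of (name, hive_type) pairs; location is the S3 URI
    that holds (or will hold) the table's data.
    """
    cols = ", ".join("{0} {1}".format(name, hive_type)
                     for name, hive_type in columns)
    return ("CREATE EXTERNAL TABLE IF NOT EXISTS {0} ({1}) "
            "LOCATION '{2}'".format(table, cols, location))


if __name__ == "__main__":
    # Hypothetical table and bucket, for illustration only.
    print(external_table_ddl(
        "page_views",
        [("user_id", "STRING"), ("views", "BIGINT")],
        "s3://my-example-bucket/warehouse/page_views/"))
```

The resulting statement can be run from the spark-sql shell or passed to `HiveContext.sql(...)`.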

RE: JavaKinesisWordCountASLYARN Example not working on EMR

2015-03-27 Thread Bozeman, Christopher
Ankur, The JavaKinesisWordCountASLYARN example is no longer valid; it was added just to the EMR build back in 1.1.0 to demonstrate Spark Streaming with Kinesis in YARN. Just follow the stock example as seen in JavaKinesisWordCountASL, as it is better form anyway given it is best not to hard code the mas

RE: Issue with Parquet on Spark 1.2 and Amazon EMR

2015-01-15 Thread Bozeman, Christopher
Thanks to Aniket’s work there are two new options to the EMR install script for Spark. See https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/README.md The “-a” option can be used to bump the spark-assembly to the front of the classpath. -Christopher From: Aniket Bhatnagar [