Re: Running Spark on Yarn-Client/Cluster mode

2016-04-07 Thread JasmineGeorge

The logs are self-explanatory. They report "java.io.IOException: Incomplete HDFS URI, no host: hdfs:/user/hduser/share/lib/spark-assembly.jar". You need to specify the NameNode host in that HDFS URL. It should look something like the following: hdfs://<namenode-host>:8020/user/hduser/share/lib/spark-assembly.jar
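If it helps, here is a rough sketch of how you could check a path for this problem before submitting the job. It only uses Python's standard urllib.parse; the function name has_hdfs_host and the host namenode.example.com are made up for illustration and are not part of Spark or Hadoop.

```python
from urllib.parse import urlparse


def has_hdfs_host(uri: str) -> bool:
    """Return True if `uri` is an hdfs:// URI that names a host.

    Hypothetical helper for illustration only; it does not contact HDFS,
    it just inspects the string.
    """
    parsed = urlparse(uri)
    # "hdfs:/path" parses with an empty authority, so hostname is None.
    # "hdfs://:8020/path" has a port but still no host, so hostname is None too.
    return parsed.scheme == "hdfs" and parsed.hostname is not None


if __name__ == "__main__":
    # The URI from the error message: one slash after the scheme, so no host.
    print(has_hdfs_host("hdfs:/user/hduser/share/lib/spark-assembly.jar"))
    # Same path with an assumed placeholder host and port 8020.
    print(has_hdfs_host(
        "hdfs://namenode.example.com:8020/user/hduser/share/lib/spark-assembly.jar"
    ))
```

The first call should report that the host is missing and the second that it is present. Replace the placeholder host with whatever your cluster's NameNode address actually is.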

Only 60% of Total Spark Batch Application execution time spent in Task Processing

2016-04-07 Thread JasmineGeorge

We are running a batch job with the following specifications:
• Building RandomForest with config: maxbins=100, depth=19, num of trees = 20
• Multiple runs with different input data size: 2.8 GB, 10 million records
• We are running the Spark application on YARN in cluster mode, with 3