Just make sure there is no firewall or network rule blocking the requests: the DFSClient is complaining about a connection timeout to the datanode at 172.31.24.27:9200, so that port has to be reachable from the machine running your remote JVM.
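If it helps, a quick way to rule the network out is a plain socket connect from the remote-JVM machine to the datanode address shown in your stack trace. This is only a minimal, throwaway sketch (the host and port are copied from your log; the 5-second timeout is my choice instead of waiting out the 60000 ms HDFS default), not part of your engine code:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Throwaway probe: checks whether the datanode port is reachable from this JVM. */
public class DataNodeProbe {

    public static void main(String[] args) {
        // Address taken from the DFSClient warning in the stack trace below.
        String host = "172.31.24.27";
        int port = 9200;

        try (Socket socket = new Socket()) {
            // Fail fast with a 5 s timeout rather than the 60 s the HDFS client waits.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("Reachable: " + host + ":" + port);
        } catch (IOException e) {
            System.out.println("Blocked or unreachable: " + host + ":" + port + " (" + e + ")");
        }
    }
}

If this times out too, the problem is in the security groups / firewall (or the fact that the datanode advertises a private address), not in your Spark client code.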
Thanks
Best Regards

On Mon, Jul 20, 2015 at 1:14 AM, ankit tyagi <ankittyagi.mn...@gmail.com> wrote:

> Just to add more information: I have checked the status of this file, and
> not a single block is corrupted.
>
> [hadoop@ip-172-31-24-27 ~]$ hadoop fsck /ankit -files -blocks
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
>
> Connecting to namenode via http://ip-172-31-24-27.us-west-2.compute.internal:9101
> FSCK started by hadoop (auth:SIMPLE) from /172.31.24.27 for path /ankit
> at Sun Jul 19 19:11:37 UTC 2015
> /ankit <dir>
> /ankit/SPARKSQLPOC-0.0.1-SNAPSHOT-jar-with-dependencies.jar 103599417 bytes, 1 block(s): OK
> 0. BP-511626939-172.31.24.27-1436185102368:blk_1073741964_126634 len=103599417 repl=1
>
> Status: HEALTHY
>  Total size:                    103599417 B
>  Total dirs:                    1
>  Total files:                   1
>  Total symlinks:                0
>  Total blocks (validated):      1 (avg. block size 103599417 B)
>  Minimally replicated blocks:   1 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    1
>  Average block replication:     1.0
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          4
>  Number of racks:               1
> FSCK ended at Sun Jul 19 19:11:37 UTC 2015 in 1 milliseconds
>
>
> On Mon, Jul 20, 2015 at 12:33 AM, ankit tyagi <ankittyagi.mn...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using the code below to trigger a Spark job from a remote JVM.
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.spark.SparkConf;
>> import org.apache.spark.deploy.yarn.Client;
>> import org.apache.spark.deploy.yarn.ClientArguments;
>>
>> /**
>>  * @version 1.0, 15-Jul-2015
>>  * @author ankit
>>  */
>> public class QueryEngineImpl implements IQueryEngine {
>>
>>     SparkSqlEngine sqlEngine;
>>
>>     public QueryEngineImpl(SparkSqlEngine sparkSqlEngine) {
>>         this.sqlEngine = sparkSqlEngine;
>>     }
>>
>>     @Override
>>     public void executeQuery(String query, String resultLocation, String... parquetFileLocation) {
>>         String[] args = new String[] {
>>             // the name of your application
>>             "--name",
>>             "RemoteJVM",
>>
>>             // memory for the driver (optional)
>>             "--driver-memory",
>>             "1000M",
>>
>>             // path to your application's JAR file (required in yarn-cluster mode)
>>             "--jar",
>>             "hdfs://52.24.76.10:9000/ankit/SPARKSQLPOC-0.0.1-SNAPSHOT-jar-with-dependencies.jar",
>>
>>             // name of your application's main class (required)
>>             "--class",
>>             "SparkSqlEngine",
>>
>>             // argument 1 to your Spark program
>>             "--arg",
>>             query,
>>
>>             // argument 2 to your Spark program
>>             "--arg",
>>             resultLocation,
>>
>>             // argument 3 to your Spark program
>>             "--arg",
>>             parquetFileLocation[0],
>>
>>             "--arg",
>>             "yarn-cluster" };
>>
>>         Configuration conf = new Configuration();
>>         conf.set("yarn.resourcemanager.address", "52.24.76.10:9022");
>>         conf.set("HADOOP_HOME", "/home/hadoop");
>>
>>         System.setProperty("SPARK_YARN_MODE", "true");
>>         SparkConf sparkConf = new SparkConf();
>>         System.out.println("SPARK CONF" + sparkConf.toDebugString());
>>
>>         // create ClientArguments, which will be passed to Client
>>         ClientArguments cArgs = new ClientArguments(args, sparkConf);
>>
>>         // create an instance of the YARN client and run it
>>         Client client = new Client(cArgs, conf, sparkConf);
>>         client.run();
>>     }
>>
>>     public static void main(String[] args) {
>>         QueryEngineImpl impl = new QueryEngineImpl(null);
>>         impl.executeQuery("select count(*) from parquetTable", "/tmp/ankit.txt",
>>             "s3n://AKIAJPLOFN3DM27DIIUQ:zKsFTopwgmu4zNdAfZ5Xe+Qe0XtbegHLTgy629VB@hadoop-poc-ashish/parquet");
>>     }
>> }
>>
>>
>> But I am getting the exception below.
>>
>> 23:08:09.268 [main] WARN org.apache.hadoop.hdfs.DFSClient - Failed to
>> connect to /172.31.24.27:9200 for block, add to deadNodes and continue.
>> org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while
>> waiting for channel to be ready for connect. ch :
>> java.nio.channels.SocketChannel[connection-pending remote=/172.31.24.27:9200]
>> org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while
>> waiting for channel to be ready for connect. ch :
>> java.nio.channels.SocketChannel[connection-pending remote=/172.31.24.27:9200]
>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532) ~[hadoop-common-2.2.0.jar:na]
>>     at org.apache.hadoop.hdfs.DFSInputStream.newTcpPeer(DFSInputStream.java:955)
>>
>> 23:08:09.269 [main] WARN org.apache.hadoop.hdfs.DFSClient - DFS Read
>> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
>> BP-511626939-172.31.24.27-1436185102368:blk_1073741964_126634
>> file=/ankit/SPARKSQLPOC-0.0.1-SNAPSHOT-jar-with-dependencies.jar
>>     at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:838) [hadoop-hdfs-2.2.0.jar:na]
>>     at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:526) [hadoop-hdfs-2.2.0.jar:na]
>>     at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749) [hadoop-hdfs-2.2.0.jar:na]
>>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793) [hadoop-hdfs-2.2.0.jar:na]
>>     at java.io.DataInputStream.read(DataInputStream.java:100) [na:1.7.0_80]
>>
>> Please suggest: is there any other way to trigger the job from a remote JVM?