ABHISHEK KUMAR GUPTA created SPARK-24627: --------------------------------------------
Summary: [Spark 2.3.0] After HDFS delegation token expiry, kinit does not allow submitting jobs through beeline
Key: SPARK-24627
URL: https://issues.apache.org/jira/browse/SPARK-24627
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.0
Environment: OS: SUSE11, Spark 2.3.0, Hadoop 2.8.3
Reporter: ABHISHEK KUMAR GUPTA

Steps (a beeline session was already active):
1. Launch spark-beeline.
2. create table alt_s1 (time timestamp, name string, isright boolean, datetoday date, num binary, height double, score float, decimaler decimal(10,0), id tinyint, age int, license bigint, length smallint) row format delimited fields terminated by ',';
3. load data local inpath '/opt/typeddata60.txt' into table alt_s1;
4. show tables; (tables are listed successfully)
5. select * from alt_s1; throws an HDFS_DELEGATION_TOKEN exception:

0: jdbc:hive2://10.18.18.214:23040/default> select * from alt_s1;
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 22.0 failed 4 times, most recent failure: Lost task 1.3 in stage 22.0 (TID 106, blr1000023110, executor 1): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 7 for spark) can't be found in cache
	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:255)
	at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1226)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:306)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
	at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
	at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
	at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:257)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:256)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:214)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)

**Note: even after kinit, the Spark/Hadoop delegation token is not renewed.**

A freshly launched spark-sql session, however, runs the same query (select * from alt_s1) successfully:
1. Launch spark-sql.
2. spark-sql> select * from alt_s1;

2018-06-22 14:24:04 INFO HiveMetaStore:746 - 0: get_table : db=test_one tbl=alt_s1
2018-06-22 14:24:04 INFO audit:371 - ugi=spark/had...@hadoop.com ip=unknown-ip-addr cmd=get_table : db=test_one tbl=alt_s1
2018-06-22 14:24:04 INFO SQLStdHiveAccessController:95 - Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=2cf6aac4-91c6-4c2d-871b-4d7620d91f43, clientType=HIVECLI]
2018-06-22 14:24:04 INFO metastore:291 - Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
2018-06-22 14:24:04 INFO HiveMetaStore:746 - 0: Shutting down the object store...
2018-06-22 14:24:04 INFO audit:371 - ugi=spark/had...@hadoop.com ip=unknown-ip-addr cmd=Shutting down the object store...
2018-06-22 14:24:04 INFO HiveMetaStore:746 - 0: Metastore shutdown complete.
2018-06-22 14:24:04 INFO audit:371 - ugi=spark/had...@hadoop.com ip=unknown-ip-addr cmd=Metastore shutdown complete.
2018-06-22 14:24:04 INFO MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 437.9 KB, free 365.9 MB)
2018-06-22 14:24:04 INFO MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 33.6 KB, free 365.8 MB)
2018-06-22 14:24:04 INFO BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on blr1000023111:45858 (size: 33.6 KB, free: 366.3 MB)
2018-06-22 14:24:04 INFO SparkContext:54 - Created broadcast 0 from
2018-06-22 14:24:05 INFO DFSClient:1033 - Created HDFS_DELEGATION_TOKEN token 42 for spark on ha-hdfs:hacluster
2018-06-22 14:24:05 INFO TokenCache:125 - Got dt for hdfs://hacluster; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 42 for spark)
2018-06-22 14:24:05 INFO FileInputFormat:249 - Total input paths to process : 1
2018-06-22 14:24:05 INFO SparkContext:54 - Starting job: processCmd at CliDriver.java:376
2018-06-22 14:24:05 INFO DAGScheduler:54 - Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
2018-06-22 14:24:05 INFO DAGScheduler:54 - Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
2018-06-22 14:24:05 INFO DAGScheduler:54 - Parents of final stage: List()
2018-06-22 14:24:05 INFO DAGScheduler:54 - Missing parents: List()
2018-06-22 14:24:05 INFO DAGScheduler:54 - Submitting ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376), which has no missing parents
2018-06-22 14:24:05 INFO MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 10.6 KB, free 365.8 MB)
2018-06-22 14:24:05 INFO MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.3 KB, free 365.8 MB)
2018-06-22 14:24:05 INFO BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on blr1000023111:45858 (size: 5.3 KB, free: 366.3 MB)
2018-06-22 14:24:05 INFO SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1039
2018-06-22 14:24:05 INFO ContextCleaner:54 - Cleaned accumulator 0
2018-06-22 14:24:05 INFO DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[4] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))
2018-06-22 14:24:05 INFO TaskSchedulerImpl:54 - Adding task set 0.0 with 1 tasks
2018-06-22 14:24:05 INFO TaskSetManager:54 - Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, ANY, 7923 bytes)
2018-06-22 14:24:05 INFO Executor:54 - Running task 0.0 in stage 0.0 (TID 0)
2018-06-22 14:24:05 INFO HadoopRDD:54 - Input split: hdfs://hacluster/user/sparkhive/warehouse/test_one.db/alt_s1/typeddata60.txt:0+4802
2018-06-22 14:24:05 INFO CodeGenerator:54 - Code generated in 232.938395 ms
2018-06-22 14:24:06 INFO Executor:54 - Finished task 0.0 in stage 0.0 (TID 0). 2221 bytes result sent to driver
2018-06-22 14:24:06 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 (TID 0) in 660 ms on localhost (executor driver) (1/1)
2018-06-22 14:24:06 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose tasks have all completed, from pool
2018-06-22 14:24:06 INFO DAGScheduler:54 - ResultStage 0 (processCmd at CliDriver.java:376) finished in 0.769 s
2018-06-22 14:24:06 INFO DAGScheduler:54 - Job 0 finished: processCmd at CliDriver.java:376, took 0.829488 s
2016-10-12 12:00:00  abcd  true   2015-10-10  5  1600.0  26.0  123456789  9  999  999999  1
2015-11-13 12:10:22  jaie  true   2016-11-11  7  1800.0  29.0  455756789  8  888  888888  2
2014-09-11 22:10:22  suss  false  2013-01-21  8  2200.0  19.0  232344559  7  777  777777  3
... (rows 4-60 repeat the same three record patterns with incrementing row ids; all 60 rows are returned successfully)

Below is the delegation-token configuration in hdfs-site.xml (values in milliseconds):

<property>
  <name>dfs.namenode.delegation.key.update-interval</name>
  <value>18000000</value>
</property>
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>60000000</value>
</property>

Insert also fails with a job-aborted exception:

0: jdbc:hive2://10.18.18.214:23040/default> show tables;
+-----------+------------+--------------+--+
| database  | tableName  | isTemporary  |
+-----------+------------+--------------+--+
| test_one  | alt_s1     | false        |
| test_one  | csvtable   | false        |
| test_one  | csvtable1  | false        |
| test_one  | csvtable2  | false        |
| test_one  | emp        | false        |
| test_one  | tab1       | false        |
| test_one  | tab2       | false        |
+-----------+------------+--------------+--+
7 rows selected (0.049 seconds)
0: jdbc:hive2://10.18.18.214:23040/default> insert into emp values( 34 );
Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)
0: jdbc:hive2://10.18.18.214:23040/default>

JDBC server logs snapshot:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 7 for spark) can't be found in cache
	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy15.create(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy16.create(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
	at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
	at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
	at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:80)
	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
	... 16 more
2018-06-22 14:53:22,557 | ERROR | [HiveServer2-Background-Pool: Thread-2966] | Error running hive query: | org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:179)
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.SparkException: Job aborted.
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:269)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:175)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:185)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
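
Editor's note: kinit refreshes only the local Kerberos TGT; it does not replace the HDFS delegation tokens a running Thrift server obtained at startup, which is consistent with the "even after kinit" observation above. For long-running Spark services on YARN, Spark 2.3 can re-obtain tokens itself when started with a principal and keytab. A sketch only; the principal and keytab path below are hypothetical, not taken from this report:

```shell
# spark.yarn.principal / spark.yarn.keytab let Spark-on-YARN refresh
# delegation tokens for long-running applications such as the Thrift server.
# Principal and keytab path are placeholders.
./sbin/start-thriftserver.sh \
  --master yarn \
  --conf spark.yarn.principal=spark/_HOST@EXAMPLE.COM \
  --conf spark.yarn.keytab=/etc/security/keytabs/spark.keytab
```

This does not address whether the reported behaviour is a bug in token renewal itself; it only notes the documented mechanism for keeping tokens fresh in long-running services.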
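
Editor's note: the delegation-token intervals configured in the hdfs-site.xml snippet above are far shorter than the HDFS defaults (the default token max-lifetime is 7 days), which is why a long-lived Thrift server session outlives its token within a day. A quick sanity check of the configured millisecond values:

```python
# Millisecond values copied from the hdfs-site.xml snippet in the report above.
MS_PER_HOUR = 60 * 60 * 1000

key_update_interval_ms = 18_000_000  # dfs.namenode.delegation.key.update-interval
token_max_lifetime_ms = 60_000_000   # dfs.namenode.delegation.token.max-lifetime

# Convert to hours to see how quickly tokens age out.
key_update_hours = key_update_interval_ms / MS_PER_HOUR   # 5.0
max_lifetime_hours = token_max_lifetime_ms / MS_PER_HOUR  # ~16.67

print(f"key update interval: {key_update_hours:.1f} h")
print(f"token max lifetime:  {max_lifetime_hours:.1f} h")
```

So any HDFS delegation token obtained by the Thrift server at startup becomes unusable after roughly 16.7 hours under this configuration, matching the InvalidToken failures seen in the beeline session.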