Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
Looks like a permission issue. Can you give 'xiaobogu' access? One way to do that is sketched below. Cheers
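In yarn-cluster mode spark-submit stages the application jars in a .sparkStaging directory under the submitting user's HDFS home directory, which is why it tries to create a directory under /user. A minimal sketch of the usual fix, assuming the HDFS superuser account is named 'hdfs' and that the group should simply match the user (adjust both to your setup):

    # create a home directory for the user and hand it over to them
    sudo -u hdfs hdfs dfs -mkdir -p /user/xiaobogu
    sudo -u hdfs hdfs dfs -chown xiaobogu:xiaobogu /user/xiaobogu

This leaves /user itself owned by hdfs (drwxr-xr-x, as shown in the error) while giving xiaobogu a writable home directory to stage into.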
On Sat, Feb 7, 2015 at 8:15 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

> Hi Zhan Zhang,
>
> With the pre-built version 1.2.0 of Spark against the YARN cluster installed by Ambari 1.7.0, I get the following errors:
>
> [xiaobogu@lix1 spark]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
>
> Spark assembly has been built with Hive, including Datanucleus jars on classpath
> 15/02/08 00:11:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/02/08 00:11:54 INFO client.RMProxy: Connecting to ResourceManager at lix1.bh.com/192.168.100.3:8050
> 15/02/08 00:11:56 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
> 15/02/08 00:11:57 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
> 15/02/08 00:11:57 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
> 15/02/08 00:11:57 INFO yarn.Client: Setting up container launch context for our AM
> 15/02/08 00:11:57 INFO yarn.Client: Preparing resources for our AM container
> 15/02/08 00:11:58 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
>
> Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6515)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6497)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6449)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4251)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
>
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>   at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2555)
>   at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
>   at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823)
>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823)
>   at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816)
>   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
>   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:595)
>   at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:151)
>   at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:35)
>   at org.apache.spark.deploy.yarn.ClientBase$class.createContainerLaunchContext(ClientBase.scala:308)
>   at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:35)
>   at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:80)
>   at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:501)
>   at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
>   at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
>   at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6515)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6497)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6449)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4251)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
>
>   at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy17.mkdirs(Unknown Source)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>   at com.sun.proxy.$Proxy17.mkdirs(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500)
>   at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553)
>   ... 24 more
>
> [xiaobogu@lix1 spark]$
>
> ------------------ Original ------------------
> From: "Zhan Zhang" <zzh...@hortonworks.com>
> Send time: Friday, Feb 6, 2015 2:55 PM
> To: <guxiaobo1...@qq.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>; "Cheng Lian" <lian.cs....@gmail.com>
> Subject: Re: Can't access remote Hive table from spark
>
> Not sure about Spark standalone mode, but on spark-on-yarn it should work. You can check the following link:
>
> http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/
>
> Thanks.
>
> Zhan Zhang
>
> On Feb 5, 2015, at 5:02 PM, Cheng Lian <lian.cs....@gmail.com> wrote:
>
> Please note that Spark 1.2.0 *only* supports Hive 0.13.1 *or* 0.12.0; none of the other versions are supported.
>
> Best,
> Cheng
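For reference, the Hive version Spark talks to is fixed when the assembly is built. A rough sketch of the build invocation along the lines of the Spark 1.2 "Building Spark" page (the Hadoop profile and version here are assumptions to adjust for your cluster; please verify against that page):

    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package

The default Hive profile in 1.2 targets Hive 0.13.1; if I recall correctly there is also a hive-0.12.0 profile for clusters still on Hive 0.12.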
> On 1/25/15 12:18 AM, guxiaobo1982 wrote:
>
> Hi,
>
> I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.7.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:
>
> import java.util.List;
>
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.api.java.Row;
> import org.apache.spark.sql.hive.api.java.JavaHiveContext;
>
> public class SparkTest {
>
>     public static void main(String[] args) {
>
>         String appName = "This is a test application";
>         String master = "spark://lix1.bh.com:7077";
>
>         SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
>         JavaSparkContext sc = new JavaSparkContext(conf);
>         JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);
>
>         // sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
>         // sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");
>
>         // Queries are expressed in HiveQL.
>         List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
>         System.out.print("I got " + rows.size() + " rows \r\n");
>
>         sc.close();
>     }
> }
>
> Exception in thread "main" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
>   at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)
>   at org.apache.spark.sql.hive.HiveContext$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$super$lookupRelation(HiveContext.scala:253)
>   at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
>   at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)
>   at org.apache.spark.sql.hive.HiveContext$anon$2.lookupRelation(HiveContext.scala:253)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:143)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:138)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$4.apply(TreeNode.scala:162)
>   at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:61)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:59)
>   at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:59)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:51)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
>   at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
>   at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
>   at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
>   at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
>   at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
>   at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
>   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>   at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
>   at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
>   at com.blackhorse.SparkTest.main(SparkTest.java:27)
>
> [delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
> [delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - Shutdown hook called
>
> But if I change the query to "show tables", the program runs and gets 0 rows even though I have many tables inside Hive, so I suspect that my program (or the Spark instance) did not connect to my Hive instance and maybe started a local Hive instead. I have put the hive-site.xml file from the Hive installation into Spark's conf directory. Can you help figure out what's wrong here? Thanks.
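One thing worth checking for the "local hive" theory: when the driver is launched as a plain Java program rather than through spark-submit, Spark's conf directory is not automatically on its classpath, so HiveContext may never see hive-site.xml and will fall back to an embedded local metastore, which would match both the empty "show tables" result and "Table not found src". A minimal sketch, assuming hive-site.xml sits in /opt/spark/conf and the application jar is named sparktest.jar (both names are assumptions; the Spark assembly and Hadoop jars also need to be on the classpath and are elided here):

    # make sure the directory holding hive-site.xml is on the driver's classpath
    java -cp sparktest.jar:/opt/spark/conf:<spark-assembly-and-hadoop-jars> com.blackhorse.SparkTest

Also check that hive-site.xml sets hive.metastore.uris to the remote metastore (for example thrift://lix1.bh.com:9083, 9083 being the Hive default), since leaving it unset means an embedded metastore is used.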