You need to use the right HDFS account (e.g., hdfs) to create the directory and assign the permissions.
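For reference, a minimal sketch of how this is usually done on an Ambari-managed node (it assumes you can sudo to root on lix1; the group name xiaobogu below is an assumption, so substitute your user's real group):

# Run the commands as the hdfs superuser; no hdfs password is needed if you can sudo.
sudo -u hdfs hdfs dfs -mkdir -p /user/xiaobogu
# Hand ownership of the new home directory to your user (group name is assumed).
sudo -u hdfs hdfs dfs -chown xiaobogu:xiaobogu /user/xiaobogu
# Verify the result.
sudo -u hdfs hdfs dfs -ls /user

Once /user/xiaobogu exists and is owned by xiaobogu, spark-submit should be able to stage its files under it instead of failing with the WRITE permission error on /user.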
Thanks.

Zhan Zhang

On Feb 11, 2015, at 4:34 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

Hi Zhan,

My single-node Hadoop cluster was installed by Ambari 1.7.0. I tried to create the /user/xiaobogu directory in HDFS, but it failed with both user xiaobogu and root:

[xiaobogu@lix1 current]$ hadoop dfs -mkdir /user/xiaobogu
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
mkdir: Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x

[root@lix1 bin]# hadoop dfs -mkdir /user/xiaobogu
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
mkdir: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x

I notice there is an hdfs account created by Ambari, but what is the password for it? Should I use the hdfs account to create the directory?

------------------ Original ------------------
From: "Zhan Zhang" <zzh...@hortonworks.com>
Send time: Sunday, Feb 8, 2015 4:11 AM
To: "" <guxiaobo1...@qq.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>; "Cheng Lian" <lian.cs....@gmail.com>
Subject: Re: Can't access remote Hive table from spark

Yes. You need to create xiaobogu under /user and give xiaobogu the right permissions on it.

Thanks.

Zhan Zhang

On Feb 7, 2015, at 8:15 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

Hi Zhan Zhang,

With the pre-built version 1.2.0 of Spark run against the YARN cluster installed by Ambari 1.7.0, I get the following errors:

[xiaobogu@lix1 spark]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/02/08 00:11:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/08 00:11:54 INFO client.RMProxy: Connecting to ResourceManager at lix1.bh.com/192.168.100.3:8050
15/02/08 00:11:56 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
15/02/08 00:11:57 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
15/02/08 00:11:57 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/02/08 00:11:57 INFO yarn.Client: Setting up container launch context for our AM
15/02/08 00:11:57 INFO yarn.Client: Preparing resources for our AM container
15/02/08 00:11:58 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6515)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6497)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6449)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4251)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2555)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524)
	at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
	at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:595)
	at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:151)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:35)
	at org.apache.spark.deploy.yarn.ClientBase$class.createContainerLaunchContext(ClientBase.scala:308)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:35)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:80)
	at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:501)
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
	at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
	at org.apache.spark.deploy.yarn.Client.main(Client.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6515)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6497)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6449)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4251)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy17.mkdirs(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
	at com.sun.proxy.$Proxy17.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553)
	... 24 more
[xiaobogu@lix1 spark]$

------------------ Original ------------------
From: "Zhan Zhang" <zzh...@hortonworks.com>
Send time: Friday, Feb 6, 2015 2:55 PM
To: "" <guxiaobo1...@qq.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>; "Cheng Lian" <lian.cs....@gmail.com>
Subject: Re: Can't access remote Hive table from spark

Not sure about Spark standalone mode, but on Spark-on-YARN it should work. You can check the following link:
http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/

Thanks.

Zhan Zhang

On Feb 5, 2015, at 5:02 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

Please note that Spark 1.2.0 only supports Hive 0.13.1 or 0.12.0; no other versions are supported.

Best,
Cheng

On 1/25/15 12:18 AM, guxiaobo1982 wrote:

Hi,

I built and started a single-node standalone Spark 1.2.0 cluster along with a single-node Hive 0.14.0 instance installed by Ambari 1.7.0. On the Spark and Hive node I can create and query tables inside Hive, and from remote machines I can submit the SparkPi example to the Spark master. But I failed to run the following example code:

public class SparkTest {
    public static void main(String[] args) {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";
        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaHiveContext sqlCtx = new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);
        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");
        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");
        sc.close();
    }
}

Exception in thread "main" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)
	at org.apache.spark.sql.hive.HiveContext$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$super$lookupRelation(HiveContext.scala:253)
	at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
	at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)
	at org.apache.spark.sql.hive.HiveContext$anon$2.lookupRelation(HiveContext.scala:253)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:143)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:138)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
	at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$4.apply(TreeNode.scala:162)
	at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:61)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:59)
	at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
	at scala.collection.immutable.List.foldLeft(List.scala:84)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:59)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:51)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
	at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
	at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
	at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
	at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
	at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
	at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
	at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
	at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
	at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
	at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
	at com.blackhorse.SparkTest.main(SparkTest.java:27)
[delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
[delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - Shutdown hook called

But if I change the query to "show tables", the program runs and returns 0 rows even though I have many tables inside Hive. So I suspect that my program or the Spark instance did not connect to my Hive instance and may have started a local Hive metastore instead. I have put the hive-site.xml file from the Hive installation into Spark's conf directory. Can you help figure out what's wrong here? Thanks.
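One way to check that suspicion is to confirm the driver is really talking to the remote metastore. A minimal sketch, assuming the metastore service runs on lix1.bh.com, that the cluster's config lives in /etc/hive/conf/hive-site.xml (the usual Ambari location), and that 9083 is the thrift port (the path and the port are assumptions; take the real values from your installation):

# On the machine that runs the Spark driver:
# 1. Put the cluster's hive-site.xml into Spark's conf directory so it is on the driver's classpath.
cp /etc/hive/conf/hive-site.xml "$SPARK_HOME/conf/"
# 2. Confirm it points at the remote metastore rather than a local embedded one.
grep -A 1 "hive.metastore.uris" "$SPARK_HOME/conf/hive-site.xml"
#    Expect something like: <value>thrift://lix1.bh.com:9083</value>
# 3. Check that the metastore port is reachable from this machine.
nc -z lix1.bh.com 9083 && echo "metastore reachable"

If hive.metastore.uris is missing or empty on the driver's classpath, HiveContext falls back to an embedded local metastore, which would explain an empty "show tables" result.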