Badrinath is trying to write to Hive in a cluster where he doesn't have permission to submit Spark jobs and has no Hive/Spark metadata access. The only way to communicate with this third-party Hive cluster is through the JDBC protocol.
[Cloudera Data Hub - Hive Server] <-> [Spark Standalone]

The one creating this table is Spark, because he's using "overwrite" mode to test it:

df.write
  .format("jdbc")
  .option("url", "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0")
  .option("dbtable", "test.test")
  .option("user", "admin")
  .option("password", "admin")
  .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
  *.mode("overwrite")*
  .save

This error is odd: it looks like the third-party Hive server isn't able to recognize the SQL dialect coming from the [Spark Standalone] side's JDBC driver. Spark's generic JDBC writer generates ANSI-style DDL (double-quoted identifiers and a TEXT column type), which the HiveQL parser rejects.

1) I would execute the CREATE TABLE statement manually on that server
2) if that works, run the job again with the "append" save mode

(Both steps, plus a dialect-level alternative, are sketched at the end of this thread.)

I would also open a case with Cloudera and ask which driver you should use.

Thanks

On Mon, Jul 19, 2021 at 10:33 AM Artemis User <arte...@dtechspace.com> wrote:

> As Mich mentioned, there is no need to use the JDBC API; the
> DataFrameWriter's saveAsTable method is the way to go. The JDBC driver is
> for a JDBC client (a Java client, for instance) to access the Hive tables
> in Spark via the Thrift server interface.
>
> -- ND
>
> On 7/19/21 2:42 AM, Badrinath Patchikolla wrote:
>
> I have been trying to create a table in Hive from Spark itself. In local
> mode it works; what I am trying here is to create a managed table in Hive
> (another Spark cluster, basically CDH) from Spark standalone over JDBC.
> When I try that, the error below is what I am facing.
>
> On Thu, 15 Jul, 2021, 9:55 pm Mich Talebzadeh, <mich.talebza...@gmail.com>
> wrote:
>
>> Have you created that table in Hive, or are you trying to create it
>> from Spark itself?
>>
>> Your Hive is local. In that case you don't need a JDBC connection. Have
>> you tried:
>>
>> df2.write.mode("overwrite").saveAsTable("mydb.mytable")
>>
>> HTH
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary
>> damages arising from such loss, damage or destruction.
>>
>> On Thu, 15 Jul 2021 at 12:51, Badrinath Patchikolla <
>> pbadrinath1...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Trying to write data from Spark to Hive over JDBC; below is the sample
>>> code (Spark standalone, version 2.4.7):
>>>
>>> 21/07/15 08:04:07 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes
>>> where applicable
>>> Setting default log level to "WARN".
>>> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
>>> setLogLevel(newLevel).
>>> Spark context Web UI available at http://localhost:4040
>>> Spark context available as 'sc' (master = spark://localhost:7077,
>>> app id = app-20210715080414-0817).
>>> Spark session available as 'spark'.
>>> Welcome to
>>>       ____              __
>>>      / __/__  ___ _____/ /__
>>>     _\ \/ _ \/ _ `/ __/ '_/
>>>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.7
>>>       /_/
>>>
>>> Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
>>> Type in expressions to have them evaluated.
>>> Type :help for more information.
>>>
>>> scala> :paste
>>> // Entering paste mode (ctrl-D to finish)
>>>
>>> val df = Seq(
>>>   ("John", "Smith", "London"),
>>>   ("David", "Jones", "India"),
>>>   ("Michael", "Johnson", "Indonesia"),
>>>   ("Chris", "Lee", "Brazil"),
>>>   ("Mike", "Brown", "Russia")
>>> ).toDF("first_name", "last_name", "country")
>>>
>>> df.write
>>>   .format("jdbc")
>>>   .option("url", "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0")
>>>   .option("dbtable", "test.test")
>>>   .option("user", "admin")
>>>   .option("password", "admin")
>>>   .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
>>>   .mode("overwrite")
>>>   .save
>>>
>>> // Exiting paste mode, now interpreting.
>>>
>>> java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051) ERROR
>>> processing query/statement. Error Code: 40000, SQL state:
>>> TStatus(statusCode:ERROR_STATUS, infoMessages:[
>>> *org.apache.hive.service.cli.HiveSQLException:Error while compiling
>>> statement: FAILED: ParseException line 1:39 cannot recognize input near
>>> '"first_name"' 'TEXT' ',' in column name or primary key or foreign key:28:27,
>>> org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:329,
>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:207,
>>> org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:290,
>>> org.apache.hive.service.cli.operation.Operation:run:Operation.java:260,
>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:504,
>>> org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementAsync:HiveSessionImpl.java:490,
>>> sun.reflect.GeneratedMethodAccessor13:invoke::-1,
>>> sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43,
>>> java.lang.reflect.Method:invoke:Method.java:498,
>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78,
>>> org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36,
>>> org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63,
>>> java.security.AccessController:doPrivileged:AccessController.java:-2,
>>> javax.security.auth.Subject:doAs:Subject.java:422,
>>> org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1875,
>>> org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59,
>>> com.sun.proxy.$Proxy35:executeStatementAsync::-1,
>>> org.apache.hive.service.cli.CLIService:executeStatementAsync:CLIService.java:295,
>>> org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:507,
>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1437,
>>> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1422,
>>> org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39,
>>> org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39,
>>> org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56,
>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286,
>>> java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149,
>>> java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624,
>>> java.lang.Thread:run:Thread.java:748,
>>> *org.apache.hadoop.hive.ql.parse.ParseException:line 1:39 cannot
>>> recognize input near '"first_name"' 'TEXT' ',' in column name or
>>> primary key or foreign key:33:6,
>>> org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:221,
>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:75,
>>> org.apache.hadoop.hive.ql.parse.ParseUtils:parse:ParseUtils.java:68,
>>> org.apache.hadoop.hive.ql.Driver:compile:Driver.java:564,
>>> org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1425,
>>> org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1398,
>>> org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:205],
>>> sqlState:42000, errorCode:40000, errorMessage:Error while compiling
>>> statement: FAILED: ParseException line 1:39 cannot recognize input near
>>> '"first_name"' 'TEXT' ',' in column name or primary key or foreign key),
>>> Query: CREATE TABLE test.test("first_name" TEXT , "last_name" TEXT ,
>>> "country" TEXT ).
>>> at com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatementInternal(Unknown Source)
>>> at com.cloudera.hiveserver2.hivecommon.api.HS2Client.executeStatement(Unknown Source)
>>> at com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeHelper(Unknown Source)
>>> at com.cloudera.hiveserver2.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.execute(Unknown Source)
>>> at com.cloudera.hiveserver2.jdbc.common.SStatement.executeNoParams(Unknown Source)
>>> at com.cloudera.hiveserver2.jdbc.common.SStatement.executeAnyUpdate(Unknown Source)
>>> at com.cloudera.hiveserver2.jdbc.common.SStatement.executeUpdate(Unknown Source)
>>> at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createTable(JdbcUtils.scala:863)
>>> at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:81)
>>> at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>>> at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>> at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>> at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>> at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>>> at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
>>> at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>>> at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>>> at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>>> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>>> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>>> at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
>>> at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
>>> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
>>> ... 48 elided
>>> Caused by: com.cloudera.hiveserver2.support.exceptions.GeneralException:
>>> [Cloudera][HiveJDBCDriver](500051) ERROR processing query/statement.
>>> Error Code: 40000, SQL state: TStatus(statusCode:ERROR_STATUS,
>>> infoMessages:[... same HiveSQLException and ParseException stack as
>>> above ...], sqlState:42000, errorCode:40000, errorMessage:Error while
>>> compiling statement: FAILED: ParseException line 1:39 cannot recognize
>>> input near '"first_name"' 'TEXT' ',' in column name or primary key or
>>> foreign key),
>>> Query: CREATE TABLE profile_test.person_test ("first_name" TEXT ,
>>> "last_name" TEXT , "country" TEXT ).
>>> ... 77 more
>>>
>>> I found a similar issue in Jira:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-31614
>>>
>>> There are no comments on it and its resolution is "Incomplete". Is
>>> there any way in Spark to write data into Hive in JDBC mode?
>>>
>>> Thanks for any help.
>>>
>>> Thanks,
>>> Badrinath.

--
Daniel Mantovani
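
For reference, a minimal sketch of Daniel's suggestions (1) and (2), assuming the Cloudera JDBC driver jar is on the classpath and reusing the URL, credentials, table name, and `df` from the example above. The DDL here is an assumption: backtick-quoted identifiers and Hive's STRING type in place of the generated "col" TEXT:

import java.sql.DriverManager

// (1) Create the table manually with HiveQL-compatible DDL.
Class.forName("com.cloudera.hive.jdbc41.HS2Driver") // usually optional with a JDBC 4.x driver
val url = "jdbc:hive2://localhost:10000/foundation;AuthMech=2;UseNativeQuery=0"
val conn = DriverManager.getConnection(url, "admin", "admin")
try {
  val stmt = conn.createStatement()
  stmt.execute(
    "CREATE TABLE IF NOT EXISTS test.test " +
      "(`first_name` STRING, `last_name` STRING, `country` STRING)")
  stmt.close()
} finally {
  conn.close()
}

// (2) With the table already in place, "append" makes Spark skip DDL
// generation entirely and only issue INSERT statements.
df.write
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "test.test")
  .option("user", "admin")
  .option("password", "admin")
  .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
  .mode("append")
  .save()

Whether the subsequent INSERTs parse cleanly depends on the driver's translation layer (UseNativeQuery=0 asks the Cloudera driver to rewrite ANSI SQL into HiveQL), which is another reason to confirm the recommended driver with Cloudera.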
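
The mismatch can also be handled on the Spark side. The failing statement is generated by JdbcUtils.createTable (visible in the stack trace), which delegates identifier quoting and type mapping to a JdbcDialect. A sketch of that alternative, not raised in the thread, assuming Spark 2.4's developer API and untested against the Cloudera driver:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// A minimal Hive-aware dialect: Spark's JDBC writer asks the dialect how to
// quote identifiers and which database type name to emit for each Spark type.
object HiveDialect extends JdbcDialect {
  // Claim any HiveServer2 JDBC URL.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hive2")

  // HiveQL quotes identifiers with backticks, not double quotes.
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"

  // Emit STRING instead of the default TEXT; other types fall back to
  // Spark's built-in mappings.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("STRING", java.sql.Types.VARCHAR))
    case _          => None
  }
}

JdbcDialects.registerDialect(HiveDialect)

With the dialect registered before the write, the generated DDL becomes CREATE TABLE test.test (`first_name` STRING, ...), which the HiveQL parser accepts; the INSERTs that follow pick up the backtick quoting as well.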
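
Finally, Mich's and Artemis's saveAsTable route as a self-contained sketch. It applies only when the Spark session can reach the Hive metastore directly (e.g. hive-site.xml on the classpath), which is exactly what this third-party-cluster scenario lacks; "mydb.mytable" is Mich's placeholder name:

import org.apache.spark.sql.SparkSession

// Requires metastore access from this Spark session; no JDBC driver involved.
val spark = SparkSession.builder()
  .appName("hive-write")
  .enableHiveSupport()
  .getOrCreate()

import spark.implicits._

val df2 = Seq(
  ("John", "Smith", "London"),
  ("David", "Jones", "India")
).toDF("first_name", "last_name", "country")

df2.write.mode("overwrite").saveAsTable("mydb.mytable")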