GitHub user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/4446#discussion_r24467869
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -166,15 +160,82 @@ case class CreateMetastoreDataSourceAsSelect(
options
}
- // Create the relation based on the data of df.
- ResolvedDataSource(sqlContext, provider, optionsWithPath, df)
+ if (sqlContext.catalog.tableExists(Seq(tableName))) {
+ // Check if we need to throw an exception or just return.
+ mode match {
+ case SaveMode.ErrorIfExists =>
+ sys.error(s"Table $tableName already exists. " +
+ s"If you want to append into it, please set mode to SaveMode.Append. " +
+ s"Or, if you want to overwrite it, please set mode to SaveMode.Overwrite.")
+ case SaveMode.Ignore =>
+ // Since the table already exists and the save mode is Ignore, we will just return.
+ return Seq.empty[Row]
+ case SaveMode.Append =>
+ // Check if the specified data source match the data source of the existing table.
+ val resolved =
+ ResolvedDataSource(sqlContext, Some(query.schema), provider, optionsWithPath)
+ val createdRelation = LogicalRelation(resolved.relation)
+ EliminateAnalysisOperators(sqlContext.table(tableName).logicalPlan) match {
+ case l @ LogicalRelation(i: InsertableRelation) =>
+ if (l.schema != createdRelation.schema) {
+ val errorDescription =
+ s"Cannot append to table $tableName because the schema of this " +
+ s"DataFrame does not match the schema of table $tableName."
+ val errorMessage =
+ s"""
+ |$errorDescription
+ |== Schemas ==
+ |${sideBySide(
+ s"== Expected Schema ==" +:
+ l.schema.treeString.split("\\\n"),
+ s"== Actual Schema ==" +:
+ createdRelation.schema.treeString.split("\\\n")).mkString("\n")}
+ """.stripMargin
+ sys.error(errorMessage)
+ } else if (i != createdRelation.relation) {
+ val errorDescription =
+ s"Cannot append to table $tableName because the resolved relation does not " +
+ s"match the existing relation of $tableName. " +
+ s"You can use insertInto($tableName, false) to append this DataFrame to the " +
+ s"table $tableName and using its data source and options."
--- End diff --
Nit: I would prefer triple quotes with stripMargin here, as is done elsewhere.
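A minimal sketch of what the nit could look like, in plain Scala with no Spark dependency. The object and method names (StripMarginExample, appendMismatchMessage) are hypothetical. It assumes the message should stay on one line, as the current string concatenation produces, so the margin-stripped literal has its newlines replaced with spaces; drop the replaceAll if a multi-line message is acceptable. The wording is also tightened slightly ("using" instead of "and using").

```scala
object StripMarginExample {
  // Hypothetical helper: builds the "resolved relation does not match" error
  // with a triple-quoted interpolated literal instead of "..." + "..." pieces.
  // stripMargin removes the leading whitespace and '|' on each continuation
  // line; replaceAll joins the lines back into a single-line message.
  def appendMismatchMessage(tableName: String): String =
    s"""Cannot append to table $tableName because the resolved relation does not
       |match the existing relation of $tableName. You can use insertInto($tableName, false)
       |to append this DataFrame to the table $tableName using its data source and options."""
      .stripMargin
      .replaceAll("\n", " ")

  def main(args: Array[String]): Unit =
    println(appendMismatchMessage("logs"))
}
```

This mirrors the schema-mismatch message earlier in the same diff, which already uses a triple-quoted literal with stripMargin.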