Currently (in Spark 2.3.1) we cannot bucket DataFrames by nested columns, e.g.
df.write.bucketBy(10, "key.a").saveAsTable("junk")
will result in the following exception:
org.apache.spark.sql.AnalysisException: bucket column key.a is not defined in table junk, defined table columns are: key, value;
  at org.apache.spark.sql.catalyst.catalog.CatalogUtils$$anonfun$org$apache$spark$sql$catalyst$catalog$CatalogUtils$$normalizeColumnName$2.apply(ExternalCatalogUtils.scala:246)
  at org.apache.spark.sql.catalyst.catalog.CatalogUtils$$anonfun$org$apache$spark$sql$catalyst$catalog$CatalogUtils$$normalizeColumnName$2.apply(ExternalCatalogUtils.scala:246)
  at scala.Option.getOrElse(Option.scala:121)
…
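
As a possible workaround, flattening the nested field into a top-level column before bucketing seems to work for me; a rough sketch is below (the column name key_a and table name junk_bucketed are just placeholders I made up):

import org.apache.spark.sql.functions.col

// Promote the nested field to a top-level column so it can be used for bucketing.
val flattened = df.withColumn("key_a", col("key.a"))

// Bucket by the flattened column instead of the nested path.
flattened.write
  .bucketBy(10, "key_a")
  .saveAsTable("junk_bucketed")

Still, it would be nicer to bucket by the nested column directly.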
Are there plans to change this anytime soon?
Thanks, David