Spark threw a java.lang.IllegalArgumentException with the following message:
java.lang.IllegalArgumentException: requirement failed: Found fields with the same name.
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.sql.catalyst.types.StructType.<init>(dataTypes.scala:317)
    at org.apache.spark.sql.catalyst.types.StructType$.fromAttributes(dataTypes.scala:310)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:306)
    at org.apache.spark.sql.parquet.ParquetTableScan.execute(ParquetTableOperations.scala:83)
    at org.apache.spark.sql.execution.Filter.execute(basicOperators.scala:57)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:85)
    at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:433)
After some trial and error, it appears to be caused by duplicated columns in my
SELECT clause.
I duplicated those columns on purpose so that my code parses the result
correctly. I think users should be allowed to return duplicated columns.
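To make the failure mode concrete, here is a minimal, Spark-free sketch. The Field and Schema classes below are hypothetical stand-ins, not Spark's actual StructType code; they only mimic the uniqueness check that the trace shows failing in scala.Predef.require while a StructType is built from the projected attributes. The dedupNames helper is also hypothetical: it illustrates one possible workaround, giving each repeated column a distinct name (in SQL, presumably the equivalent of aliasing the second copy, e.g. `id AS id_1`) so every column survives but no two share a name.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-ins for a schema field and a schema (NOT Spark's classes).
final case class Field(name: String, dataType: String)

final case class Schema(fields: Seq[Field]) {
  // Mimics the check in the trace: Predef.require throws
  // IllegalArgumentException("requirement failed: <message>").
  require(fields.map(_.name).distinct.size == fields.size,
    "Found fields with the same name.")
}

object DuplicateColumns {
  // Hypothetical workaround: keep every column, but rename repeats
  // (id, id -> id, id_1), retrying suffixes until the name is unused.
  def dedupNames(names: Seq[String]): Seq[String] = {
    val used = mutable.Set.empty[String]
    names.toVector.map { n =>
      var candidate = n
      var i = 1
      while (used.contains(candidate)) {
        candidate = s"${n}_$i"
        i += 1
      }
      used += candidate
      candidate
    }
  }

  def main(args: Array[String]): Unit = {
    val selected = Seq("id", "name", "id") // "id" selected twice on purpose
    // Building a schema from the raw names fails like the report above.
    val rejected = Try(Schema(selected.map(Field(_, "string"))))
    println(rejected.failed.map(_.getMessage).getOrElse("accepted"))
    // Renaming the duplicate lets the schema be built.
    val renamed = dedupNames(selected)
    println(Schema(renamed.map(Field(_, "string"))).fields.map(_.name).mkString(", "))
  }
}
```

Running DuplicateColumns.main should print the "requirement failed: Found fields with the same name." message first, then "id, name, id_1". Renaming is only a client-side stopgap; it does not address the request to allow duplicated column names directly.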
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/