Thanks a lot Cheng. So it seems that even in Spark 1.3 and 1.4, Parquet ENUMs were treated as strings in Spark SQL, right? Does that mean partitioning by ENUM columns already works in those versions too, since they are just treated as strings?
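Concretely, what I'm hoping will work is something along these lines (just a rough sketch against Spark 1.4's DataFrame API from a spark-shell; the paths and the "ev" name are made up for illustration):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // read the existing thrift/parquet data (hypothetical path)
    val ev = sqlContext.read.parquet("/data/events")

    // write it back out partitioned by the enum-backed column, which
    // Spark SQL exposes as a plain string column
    ev.write.partitionBy("type").parquet("/data/events_by_type")

    // partition discovery should then pick "type" back up as a string
    // partition column
    val partitioned = sqlContext.read.parquet("/data/events_by_type")
    partitioned.printSchema()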
Also, is there a good way to verify that the partitioning is actually being used? I tried "explain" (where the data is partitioned by the "type" column):

scala> ev.filter("type = 'NON'").explain
== Physical Plan ==
Filter (type#4849 = NON)
 PhysicalRDD [...cols..], MapPartitionsRDD[103] at map at newParquet.scala:573

but the plan looks the same even with non-partitioned data. (A rough sanity check is sketched after the quoted thread below.)

On Tue, Jul 21, 2015 at 4:35 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

> Parquet support for Thrift/Avro/ProtoBuf ENUM types was just added to the
> master branch: https://github.com/apache/spark/pull/7048
>
> ENUM types are actually not in the Parquet format spec, which is why we
> didn't have them in the first place. Basically, ENUMs are always treated
> as UTF8 strings in Spark SQL now.
>
> Cheng
>
> On 7/22/15 3:41 AM, ankits wrote:
>
>> Hi, I am using a custom build of Spark 1.4 with the Parquet dependency
>> upgraded to 1.7. I have Thrift data encoded with Parquet that I want to
>> partition by a column of type ENUM. The Spark programming guide says
>> partition discovery is only supported for string and numeric columns, so
>> it seems partition discovery won't work out of the box here.
>>
>> Is there any workaround that will allow me to partition by ENUMs? Will
>> Hive partitioning help here? I am unfamiliar with Hive and how it plays
>> into Parquet, Thrift and Spark, so I would appreciate any pointers in
>> the right direction. Thanks.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Partition-parquet-data-by-ENUM-column-tp23939.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
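P.S. One rough sanity check I'm considering for the verification question above, assuming Spark 1.4 and data laid out with .partitionBy("type") as in the earlier sketch (paths are made up, and this is only an approximation since the plan itself doesn't show pruning):

    val part = sqlContext.read.parquet("/data/events_by_type")

    // If directory-level pruning kicks in, the filtered scan should be
    // built over far fewer splits (and read far less input, per the
    // "Input" column on the Spark UI stages page) than the full scan.
    println(part.rdd.partitions.length)
    println(part.filter("type = 'NON'").rdd.partitions.length)

Does comparing partition counts (or the input size reported in the UI after running an action like count()) sound like a reasonable way to confirm pruning, or is there a more direct signal?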