If you can reproduce the issue with Spark 2.0.2, I'd suggest opening a JIRA.
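Before filing, it may also help to narrow down whether the problem is in the Hive metastore integration or in the Parquet reader itself, by reading the files directly and comparing against the metastore path. A rough sketch (the path and table name below are placeholders based on your example):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("debug-null-arrays")
  .enableHiveSupport()
  .getOrCreate()

// 1. Read the Parquet files directly, bypassing the Hive metastore.
val direct = spark.read.parquet("/path/to/tablename")
direct.printSchema()
direct.select("packageIds").show(1, truncate = false)

// 2. Read the same data through the metastore for comparison.
val viaMetastore = spark.sql("select packageIds from tablename limit 1")
viaMetastore.printSchema()
viaMetastore.show(1, truncate = false)

If the direct read returns the arrays but the metastore read returns nulls, the problem is likely in the metastore schema reconciliation rather than the Parquet reader. Toggling spark.sql.hive.convertMetastoreParquet may also help tell apart Spark's built-in Parquet reader from the Hive SerDe path.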
On Fri, Nov 4, 2016 at 5:11 PM, Sam Goodwin <sam.goodwi...@gmail.com> wrote:

> I have a table with a few columns, some of which are arrays. Since
> upgrading from Spark 1.6 to Spark 2.0.1, the array fields are always null
> when reading in a DataFrame.
>
> When writing the Parquet files, the schema of the column is specified as
>
> StructField("packageIds", ArrayType(StringType))
>
> The schema of the column in the Hive Metastore is
>
> packageIds array<string>
>
> The schema used in the writer exactly matches the schema in the Metastore
> in every way (order, casing, types, etc.).
>
> The query is a simple "select *":
>
> spark.sql("select * from tablename limit 1").collect() // null columns in Row
>
> How can I begin debugging this issue? Notable things I've already
> investigated:
>
> - The files were written using Spark 1.6.
> - The DataFrame works in Spark 1.5 and 1.6.
> - I've inspected the Parquet files using parquet-tools and can see the
>   data.
> - I also have another table written in exactly the same way, and it
>   doesn't have the issue.