Re: [SparkSQL][Parquet] Read from nested parquet data

2016-01-01 Thread lin
Hi Cheng, thank you for your informative explanation; it is quite helpful. We'd like to try both approaches, and if we make any progress we will update this thread so that anyone interested can follow. Thanks again @yanboliang, @chenglian!

Re: [SparkSQL][Parquet] Read from nested parquet data

2015-12-31 Thread Cheng Lian
Hey Lin, This is a good question. The root cause of this issue lies in the analyzer. Currently, Spark SQL can only resolve a name to a top level column. (Hive suffers the same issue.) Take the SQL query and struct you provided as an example, col_b.col_d.col_g is resolved as two nested GetStru
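To make the resolution rule above concrete, here is a minimal sketch in plain Python. It is a hypothetical model, not Spark's analyzer code: `Resolved`, `resolve` and `columns_to_read` are illustrative names. It assumes, as the thread implies, that only the first segment of a dotted name is matched against the table's top-level columns, that the remaining segments become nested field extractions, and that the scan can therefore only prune whole top-level columns.

```python
# Hypothetical model of "resolve a name to a top-level column, then extract
# nested fields". This is NOT Spark's analyzer; it only illustrates the idea.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resolved:
    top_level: str                                        # column requested from the scan
    extractions: List[str] = field(default_factory=list)  # nested fields pulled out afterwards


def resolve(name: str, top_level_columns: List[str]) -> Resolved:
    """Match only the first dotted segment against top-level columns;
    every remaining segment becomes a field extraction on top of it."""
    head, *rest = name.split(".")
    if head not in top_level_columns:
        raise KeyError(f"cannot resolve {head!r}")
    return Resolved(head, rest)


def columns_to_read(names: List[str], top_level_columns: List[str]) -> List[str]:
    """Pruning at top-level granularity: each referenced name pulls in its
    whole top-level column, including sibling subfields such as col_c."""
    needed: List[str] = []
    for n in names:
        col = resolve(n, top_level_columns).top_level
        if col not in needed:
            needed.append(col)
    return needed


if __name__ == "__main__":
    r = resolve("col_b.col_d.col_g", ["col_a", "col_b"])
    print(r)  # Resolved(top_level='col_b', extractions=['col_d', 'col_g'])
    print(columns_to_read(["col_b.col_d.col_g"], ["col_a", "col_b"]))  # ['col_b']
```

Under this model, asking for `col_b.col_d.col_g` still means reading all of `col_b`, which is why there is no straightforward way to read only `col_g`.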

Re: [SparkSQL][Parquet] Read from nested parquet data

2015-12-30 Thread lin
Hi Yanbo, thanks for the quick response. It looks like we'll need some kind of workaround, but before that we'd like to dig into some related discussions. We've looked through the following URLs, but none of them seems helpful. Mailing list threads: http://search-hadoop.com/m/q3RTtLkgZl1K4oyx/v=thread

Re: [SparkSQL][Parquet] Read from nested parquet data

2015-12-30 Thread Yanbo Liang
This problem was discussed long ago, but I think there is no straightforward way to read only col_g.

2015-12-30 17:48 GMT+08:00 lin :
> Hi all,
>
> We are trying to read from nested parquet data. SQL is "select
> col_b.col_d.col_g from some_table" and the data schema for some_table is:
>

[SparkSQL][Parquet] Read from nested parquet data

2015-12-30 Thread lin
Hi all,

We are trying to read from nested parquet data. The SQL is "select col_b.col_d.col_g from some_table" and the data schema for some_table is:

root
 |-- col_a: long (nullable = false)
 |-- col_b: struct (nullable = true)
 |    |-- col_c: string (nullable = true)
 |    |-- col_d: array (nullable