Hi Cheng,
Thank you for your informative explanation; it is quite helpful.
We'd like to try both approaches; if we make any progress, we'll update
this thread so that anyone interested can follow.
Thanks again @yanboliang, @chenglian!
Hey Lin,
This is a good question. The root cause of this issue lies in the
analyzer: currently, Spark SQL can only resolve a name to a top-level
column. (Hive suffers from the same issue.) Taking the SQL query and
struct you provided as an example, col_b.col_d.col_g is resolved into two
nested GetStru
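To make the resolution step concrete, here is a minimal pure-Python sketch (hypothetical function and schema representation; Spark's actual analyzer is Scala) of how a dotted name like col_b.col_d.col_g binds: only the first part is matched against a top-level column, and each remaining part becomes a nested field-extraction step, analogous to Spark's GetStructField expressions:

```python
# Hypothetical sketch of dotted-name resolution: the analyzer binds only
# the first part to a top-level column, then wraps the rest in nested
# field extractions (roughly what GetStructField does in Spark SQL).

def resolve(name, schema):
    """Resolve a dotted column name against a schema (dict of dicts).

    Returns the list of steps, starting from the top-level column that
    the analyzer actually binds to.
    """
    parts = name.split(".")
    top = parts[0]
    if top not in schema:
        raise KeyError(f"unresolved column: {top}")
    steps = [("Column", top)]
    node = schema[top]
    for field in parts[1:]:
        if not isinstance(node, dict) or field not in node:
            raise KeyError(f"no field {field!r} under {top!r}")
        steps.append(("GetField", field))  # analogous to GetStructField
        node = node[field]
    return steps

# Schema mirroring the thread's example (col_d modeled as a plain struct
# for simplicity; in the real data it is an array of structs).
schema = {
    "col_a": "long",
    "col_b": {"col_c": "string", "col_d": {"col_g": "string"}},
}

print(resolve("col_b.col_d.col_g", schema))
# -> [('Column', 'col_b'), ('GetField', 'col_d'), ('GetField', 'col_g')]
```

The point of the sketch is that the Parquet reader only ever sees the bound top-level column (col_b here), so everything under it is read in full even though the query needs just col_g.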
Hi Yanbo, thanks for the quick response.
Looks like we'll need to find a workaround.
But before that, we'd like to dig into some related discussions first. We've
looked through the following URLs, but none seems helpful.
Mailing list threads:
http://search-hadoop.com/m/q3RTtLkgZl1K4oyx/v=thread
This problem was discussed quite a while ago, but I don't think there is a
straightforward way to read only col_g.
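For reference, the fallback we are considering is to select the whole top-level column (since only top-level pruning works) and extract the nested field afterwards. A minimal pure-Python sketch of that extraction, using the field names from this thread (the row layout is an assumption based on the schema, with col_d as an array of structs):

```python
# Sketch of the fallback: select col_b in full, then pull col_g out of
# each element of the nested col_d array ourselves. The row layout below
# is assumed from the schema shown in this thread.

def extract_col_g(row):
    """Return the list of col_g values from one row's col_b.col_d array."""
    col_b = row.get("col_b")
    if col_b is None:  # col_b is nullable in the schema
        return []
    return [elem["col_g"] for elem in col_b.get("col_d") or []]

rows = [
    {"col_a": 1,
     "col_b": {"col_c": "x", "col_d": [{"col_g": "g1"}, {"col_g": "g2"}]}},
    {"col_a": 2, "col_b": None},
]

print([extract_col_g(r) for r in rows])
# -> [['g1', 'g2'], []]
```

This of course still pays the I/O cost of reading all of col_b; it only trims the data after the fact.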
2015-12-30 17:48 GMT+08:00 lin :
> Hi all,
>
> We are trying to read from nested parquet data. SQL is "select
> col_b.col_d.col_g from some_table" and the data schema for some_table is:
>
Hi all,
We are trying to read from nested parquet data. SQL is "select
col_b.col_d.col_g from some_table" and the data schema for some_table is:
root
 |-- col_a: long (nullable = false)
 |-- col_b: struct (nullable = true)
 |    |-- col_c: string (nullable = true)
 |    |-- col_d: array (nullable