I use "LATERAL VIEW explode(...)" to read data from a parquet-file but the
full schema is requeseted by parquet instead just the used columns. When I
didn't use LATERAL VIEW the requested schema has just the two columns which
I use. Is it correct or is there place for an optimization or do I
understand there somthing wrong?
Here are my examples:
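For context, I create the table roughly like this (just a sketch; the path is a placeholder and sc is the usual SparkContext from the shell):

import org.apache.spark.sql.hive.HiveContext

// Read the Parquet file and register it as a table named "pef"
// so it can be queried with hiveContext.sql(...) below.
val hiveContext = new HiveContext(sc)
hiveContext.parquetFile("/path/to/pef.parquet").registerTempTable("pef")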
1) hiveContext.sql("SELECT userid FROM pef WHERE observeddays==20140509")
The requested schema is:
message root {
  optional group observedDays (LIST) {
    repeated int32 array;
  }
  required int64 userid;
}
This is what I expect, although the query itself does not return a result --
but that is not the problem here!
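(Presumably the comparison fails because observeddays is an array, which cannot be compared to an int directly; if so, I guess something like Hive's array_contains UDF would be needed instead, e.g.
hiveContext.sql("SELECT userid FROM pef WHERE array_contains(observeddays, 20140509)")
but again, that is beside the point.)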
2) hiveContext.sql("SELECT userid FROM pef LATERAL VIEW explode(observeddays) od AS observed WHERE observed==20140509")
The requested schema is:
message root {
  required int64 userid;
  optional int32 source;
  optional group observedDays (LIST) {
    repeated int32 array;
  }
  optional group placetobe (LIST) {
    repeated group bag {
      optional group array {
        optional binary palces (UTF8);
        optional group dates (LIST) {
          repeated int32 array;
        }
      }
    }
  }
}
Why does Parquet request the full schema? I only use two fields of the
table.
Can somebody please explain to me why this happens?
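In case it helps: a workaround I would try (just a sketch, I have not verified that it changes the requested schema) is to project the needed columns in a subquery before the LATERAL VIEW, in the hope that the Parquet scan then prunes the rest:
hiveContext.sql("SELECT userid FROM (SELECT userid, observeddays FROM pef) t LATERAL VIEW explode(observeddays) od AS observed WHERE observed==20140509")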
Thanks!