Hi! I have a question about spark left join pruning. For example, I have 2 tables: table A: create table A ( user_id int, gender string, email string, phone string, )
table B: create table B ( user_id int, jobs string, graduate_schools string ) If I select columns of A from sub-query with A left join B. Can spark optimize the plan to just scan table A? Query like this: select user_id, gender, email, phone from ( select A.user_id as user_id, A.gender as gender, A.email as email, A. phone as phone from A left join B on A.user_id = B.user_id ) More general question like this, If I have a view with table A1 left join table A2 left join... An Columns of Ai, Aj, Ak are selected. i, j k are subset of 1...n Can spark prune the unnecessary table during the table joining? -- *Shuo Chen* chenatu2...@gmail.com