Hi, I have a table sourced from *2 Parquet files*, one of which has a few extra columns. Simple queries work fine, but queries with a predicate on one of the extra columns fail with a "column not found" error.
*Column resp_party_type exists in only one of the Parquet files.*

a) Query that works:

    select resp_party_type from operational_analytics

b) Query that doesn't work (complains about the missing column *resp_party_type*):

    select category as Events, resp_party as Team, count(*) as Total
    from operational_analytics
    where application = 'PeopleMover' and resp_party_type = 'Team'
    group by category, resp_party

*Query plan for (b):*

    == Physical Plan ==
    TungstenAggregate(key=[category#30986,resp_party#31006], functions=[(count(1),mode=Final,isDistinct=false)], output=[Events#36266,Team#36267,Total#36268L])
     TungstenExchange hashpartitioning(category#30986,resp_party#31006)
      TungstenAggregate(key=[category#30986,resp_party#31006], functions=[(count(1),mode=Partial,isDistinct=false)], output=[category#30986,resp_party#31006,currentCount#36272L])
       Project [category#30986,resp_party#31006]
        Filter ((application#30983 = PeopleMover) && (resp_party_type#31007 = Team))
         Scan ParquetRelation[snackfs://tst:9042/aladdin_data_beta/operational_analytics/operational_analytics_peoplemover.parquet,snackfs://tst:9042/aladdin_data_beta/operational_analytics/operational_analytics_mis.parquet][category#30986,resp_party#31006,application#30983,resp_party_type#31007]

I have set spark.sql.parquet.mergeSchema = true and spark.sql.parquet.filterPushdown = true.
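As a minimal sketch for reproducing this setup, the two settings above can be applied per session with Spark SQL's `SET key=value` command. It assumes the statements run in the same session (SQLContext) in which `operational_analytics` is registered; because schema merging happens when the Parquet relation's schema is resolved, `mergeSchema` most likely needs to be set before the table is loaded.

```sql
-- Settings from this post, applied to the current Spark SQL session
SET spark.sql.parquet.mergeSchema=true;
SET spark.sql.parquet.filterPushdown=true;

-- SET with only a key shows its current value, to confirm the settings took effect
SET spark.sql.parquet.mergeSchema;
SET spark.sql.parquet.filterPushdown;
```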
When I set spark.sql.parquet.filterPushdown = false, query (b) starts working. Here is the execution plan for query (b) with filterPushdown = false; apart from the expression IDs, it is identical to the plan above:

    == Physical Plan ==
    TungstenAggregate(key=[category#30986,resp_party#31006], functions=[(count(1),mode=Final,isDistinct=false)], output=[Events#36313,Team#36314,Total#36315L])
     TungstenExchange hashpartitioning(category#30986,resp_party#31006)
      TungstenAggregate(key=[category#30986,resp_party#31006], functions=[(count(1),mode=Partial,isDistinct=false)], output=[category#30986,resp_party#31006,currentCount#36319L])
       Project [category#30986,resp_party#31006]
        Filter ((application#30983 = PeopleMover) && (resp_party_type#31007 = Team))
         Scan ParquetRelation[snackfs://tst:9042/aladdin_data_beta/operational_analytics/operational_analytics_peoplemover.parquet,snackfs://tst:9042/aladdin_data_beta/operational_analytics/operational_analytics_mis.parquet][category#30986,resp_party#31006,application#30983,resp_party_type#31007]

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-schema-evolution-tp26563.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
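For anyone hitting the same error, one hedged workaround consistent with the observation above is to disable pushdown only around the affected query and restore it afterwards. This is a sketch for a Spark SQL session (for example the spark-sql shell, issuing one statement at a time) that reuses the table and query from this post. The trade-off: while pushdown is off, the Parquet reader can no longer use the predicate to skip row groups, so the scan may read more data.

```sql
-- Workaround sketch: turn off Parquet filter pushdown just for query (b)
SET spark.sql.parquet.filterPushdown=false;

-- Query (b) from this post; the predicate is no longer handed to the
-- Parquet reader, only Spark's own Filter operator evaluates it
select category as Events, resp_party as Team, count(*) as Total
from operational_analytics
where application = 'PeopleMover' and resp_party_type = 'Team'
group by category, resp_party;

-- Restore pushdown for the other queries in this session
SET spark.sql.parquet.filterPushdown=true;
```

If the query is issued through `sqlContext.sql(...)` instead, make sure its result is materialized (for example with `collect()` or `show()`) before pushdown is switched back on: Spark plans queries lazily, so the setting in effect when the action runs is most likely the one that applies.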