Which version of Spark are you on? This sounds like a bug (one that may already have been fixed).

On Tue, Mar 22, 2016 at 6:34 AM, gtinside <gtins...@gmail.com> wrote:
> Hi,
>
> I have a table sourced from *2 parquet files*, with a few extra columns in
> one of the parquet files. Simple queries work fine, but queries with a
> predicate on an extra column fail with "column not found".
>
> *Column resp_party_type exists in just one parquet file.*
>
> a) Query that works:
>
>     select resp_party_type from operational_analytics
>
> b) Query that doesn't work (complains about the missing column
> *resp_party_type*):
>
>     select category as Events, resp_party as Team, count(*) as Total
>     from operational_analytics
>     where application = 'PeopleMover' and resp_party_type = 'Team'
>     group by category, resp_party
>
> *Query plan for (b):*
>
>     == Physical Plan ==
>     TungstenAggregate(key=[category#30986,resp_party#31006],
>       functions=[(count(1),mode=Final,isDistinct=false)],
>       output=[Events#36266,Team#36267,Total#36268L])
>      TungstenExchange hashpartitioning(category#30986,resp_party#31006)
>       TungstenAggregate(key=[category#30986,resp_party#31006],
>         functions=[(count(1),mode=Partial,isDistinct=false)],
>         output=[category#30986,resp_party#31006,currentCount#36272L])
>        Project [category#30986,resp_party#31006]
>         Filter ((application#30983 = PeopleMover) && (resp_party_type#31007 = Team))
>          Scan ParquetRelation[snackfs://tst:9042/aladdin_data_beta/operational_analytics/operational_analytics_peoplemover.parquet,snackfs://tst:9042/aladdin_data_beta/operational_analytics/operational_analytics_mis.parquet][category#30986,resp_party#31006,application#30983,resp_party_type#31007]
>
> I have set spark.sql.parquet.mergeSchema = true and
> spark.sql.parquet.filterPushdown = true. When I set
> spark.sql.parquet.filterPushdown = false, query (b) starts working.
> Execution plan for query (b) after setting filterPushdown = false:
>
>     == Physical Plan ==
>     TungstenAggregate(key=[category#30986,resp_party#31006],
>       functions=[(count(1),mode=Final,isDistinct=false)],
>       output=[Events#36313,Team#36314,Total#36315L])
>      TungstenExchange hashpartitioning(category#30986,resp_party#31006)
>       TungstenAggregate(key=[category#30986,resp_party#31006],
>         functions=[(count(1),mode=Partial,isDistinct=false)],
>         output=[category#30986,resp_party#31006,currentCount#36319L])
>        Project [category#30986,resp_party#31006]
>         Filter ((application#30983 = PeopleMover) && (resp_party_type#31007 = Team))
>          Scan ParquetRelation[snackfs://tst:9042/aladdin_data_beta/operational_analytics/operational_analytics_peoplemover.parquet,snackfs://tst:9042/aladdin_data_beta/operational_analytics/operational_analytics_mis.parquet][category#30986,resp_party#31006,application#30983,resp_party_type#31007]
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-schema-evolution-tp26563.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.