[ https://issues.apache.org/jira/browse/SPARK-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maciej Bryński updated SPARK-16320:
-----------------------------------
    Description:
I ran some tests on a Parquet file with many nested columns (about 30G in 400 partitions), and Spark 2.0 is sometimes slower than Spark 1.6.

I tested the following queries:
1) `select count(*) where id > some_id`
For this query, performance is similar in both versions (about 1 sec).
2) `select count(*) where nested_column.id > some_id`
Spark 1.6 -> 1.6 min
Spark 2.0 -> 2.1 min

Should I expect such a drop in performance?
I don't know how to prepare sample data to show the problem.
Any ideas? Or is there public data with many nested columns?

  was:
I ran some tests on a Parquet file with many nested columns (about 30G in 400 partitions), and Spark 2.0 is sometimes slower than Spark 1.6.

I tested the following queries:
1) select count(*) where id > some_id
For this query, performance is similar in both versions (about 1 sec).
2) select count(*) where nested_column.id > some_id
Spark 1.6 -> 1.6 min
Spark 2.0 -> 2.1 min

Should I expect such a drop in performance?
I don't know how to prepare sample data to show the problem.
Any ideas? Or is there public data with many nested columns?

> Spark 2.0 slower than 1.6 when querying nested columns
> ------------------------------------------------------
>
>                 Key: SPARK-16320
>                 URL: https://issues.apache.org/jira/browse/SPARK-16320
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Maciej Bryński
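On the question of preparing sample data, one possible starting point is a small generator for synthetic rows with many nested columns. The sketch below is only an assumption about what such data might look like: the column names (`nested_column_0`, `field_0`, ...), the number of structs and fields, and the value ranges are all hypothetical, and nothing here is known to reproduce the slowdown reported above.

```python
import json
import random


def make_record(row_id, n_structs=20, n_fields=10, rng=random):
    """Build one row: a top-level id plus n_structs nested struct columns.

    All column names are hypothetical placeholders, not the reporter's schema.
    """
    record = {"id": row_id}
    for s in range(n_structs):
        # Each nested column carries its own id, so a filter such as
        # nested_column_0.id > some_id has something to select on.
        nested = {"id": rng.randrange(1000000)}
        for f in range(n_fields):
            nested["field_%d" % f] = rng.random()
        record["nested_column_%d" % s] = nested
    return record


def write_sample(path, n_rows=1000, seed=0, n_structs=20, n_fields=10):
    """Write n_rows records to path as JSON Lines (one JSON object per line)."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w") as out:
        for row_id in range(n_rows):
            rec = make_record(row_id, n_structs=n_structs,
                              n_fields=n_fields, rng=rng)
            out.write(json.dumps(rec) + "\n")
```

Calling `write_sample("nested_sample.jsonl", n_rows=100000)` would produce a file that Spark's JSON reader should be able to load as a DataFrame with struct columns. It could then be written out as Parquet and queried on both versions. The row count would need to be far larger to approach the 30G dataset described above.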