Regarding question 1: it is possible but not supported yet. Please refer to https://github.com/apache/spark/pull/13775
Thanks!

2016-07-25 19:01 GMT+09:00 Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr>:

> Hi,
>
> Assuming I have some data in both ORC and Parquet formats, and a complex
> workflow that eventually combines the results of several queries on these
> datasets, I would like to get the best execution. Looking at the default
> configs, I noticed:
>
> 1) Vectorized query execution is possible with Parquet only; can you confirm
> whether it is possible with the ORC format?
>
> parameter: spark.sql.parquet.enableVectorizedReader
> [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
>
> Hive assumes ORC; parameter: hive.vectorized.execution.enabled
> [2] https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution
>
> 2) Filter pushdown is enabled by default for Parquet only; why not also for ORC?
> spark.sql.parquet.filterPushdown=true
> spark.sql.orc.filterPushdown=false
>
> 3) Should I even try to process the ORC format with Spark, given that there
> is native support for Parquet?
>
> Thank you!
>
> Best,
> Ovidiu
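For reference, the three settings discussed above can be adjusted in spark-defaults.conf (or passed via --conf to spark-submit, or set on the SparkSession at runtime). A minimal sketch, assuming the Spark 2.x defaults cited in the quoted message; enabling ORC filter pushdown here is an explicit opt-in, since it is off by default:

```
# spark-defaults.conf (sketch)

# Vectorized reader: Parquet only at the time of writing (default: true)
spark.sql.parquet.enableVectorizedReader  true

# Predicate pushdown: on by default for Parquet, off by default for ORC
spark.sql.parquet.filterPushdown          true
spark.sql.orc.filterPushdown              true
```

The same keys can be toggled per-session, e.g. `spark.conf.set("spark.sql.orc.filterPushdown", "true")` in a Scala or Python shell.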