One thing I forgot to add: you might also try running

SET spark.sql.hive.metastorePartitionPruning=true
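For what it's worth, a minimal sketch of doing this from PySpark (assuming a Spark 1.5-era HiveContext; the table and partition column names here are placeholders, not from the original thread):

    # Sketch only: hypothetical table/column names, Spark 1.5-era API.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="partition-pruning-example")
    sqlContext = HiveContext(sc)

    # Ask Spark to push partition predicates down to the Hive metastore
    # instead of fetching metadata for every partition up front.
    sqlContext.sql("SET spark.sql.hive.metastorePartitionPruning=true")

    # With pruning enabled, a query filtering on the partition column should
    # only need metadata for the matching partitions.
    sqlContext.sql(
        "SELECT * FROM my_partitioned_table WHERE ds = '2015-10-14' LIMIT 20"
    ).show()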
On Wed, Oct 14, 2015 at 2:23 PM, Michael Armbrust <mich...@databricks.com> wrote:

> No link to the original Stack Overflow post so I can up my reputation? :)
>
> This is likely not a difference between HiveContext and SQLContext, but
> instead a difference between a table whose metadata comes from the Hive
> metastore and one created through the Spark SQL Data Source API. I would
> guess that if you create the table the same way, the performance would be
> similar.
>
> In the Data Source API we have spent a fair amount of time optimizing the
> discovery and handling of many partitions, and in general I would say this
> path is easier to use and faster.
>
> The problem with the Hive table is likely downloading all of the partition
> metadata from the metastore and converting it to our internal format. We do
> this for all partitions, even though in this case you only want the first
> ~20 rows.
>
> On Wed, Oct 14, 2015 at 1:38 PM, charles.drotar
> <charles.dro...@capitalone.com> wrote:
>
>> I have duplicated my submission to Stack Overflow below, since it is
>> exactly the same question I would like to post here as well. Please don't
>> judge me too harshly for my laziness.
>>
>> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n25067/Screen_Shot_2015-10-14_at_3.png>
>>
>> The questions I am concerned with are the same ones listed in the
>> "QUESTIONS" section, namely:
>>
>> 1) Has anyone noticed anything similar to this?
>> 2) What is happening on the backend that could be causing this consumption
>> of resources, and what could I do to avoid it?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-Hive-Context-Does-Not-Return-Results-but-SQL-Context-Does-for-Similar-Query-tp25067.html
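As a rough illustration of "creating the table the same way" through the Data Source API, something like the following sketch is one option (paths, formats, and table names below are made up for the example; assumes partitioned Parquet data and a Spark 1.5-era HiveContext):

    # Sketch only: hypothetical paths and table names, Spark 1.5-era API.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="datasource-table-example")
    sqlContext = HiveContext(sc)

    # Reading the partitioned Parquet data directly goes through the Data
    # Source API, so partition discovery is handled by Spark itself rather
    # than by pulling every partition's metadata from the Hive metastore.
    df = sqlContext.read.parquet("/data/events")

    # Persisting it with saveAsTable registers a data source table in the
    # metastore, so later reads keep the optimized partition handling.
    df.write.format("parquet").partitionBy("ds").saveAsTable("events_datasource")

    sqlContext.sql("SELECT * FROM events_datasource LIMIT 20").show()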