Hi,
I did not explicitly create a HiveContext. I have been using the spark.sqlContext that gets created upon launching the spark-shell. Isn't this sqlContext the same as the HiveContext?
Thanks,
Rishikesh
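For reference, a minimal spark-shell sketch to confirm whether the session behind spark.sqlContext was built with Hive support (the Spark 2.x replacement for HiveContext); the app name here is just a placeholder:

import org.apache.spark.sql.SparkSession

// In Spark 2.x, HiveContext is deprecated; a SparkSession built with
// enableHiveSupport() plays that role, and spark.sqlContext is just a
// wrapper around the same session. spark-shell enables Hive support
// automatically when the Hive classes are on the classpath.
val spark = SparkSession.builder()
  .appName("hive-support-check") // hypothetical app name
  .enableHiveSupport()
  .getOrCreate()

// Prints "hive" if the session uses the Hive metastore catalog,
// "in-memory" if it is a plain (non-Hive) session.
println(spark.conf.get("spark.sql.catalogImplementation"))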
On Wed, Aug 7, 2019 at 12:43 PM Jörn Franke <jornfra...@gmail.com> wrote:

> Do you use the HiveContext in Spark? Do you configure the same options
> there? Can you share some code?
>
> On 07.08.2019 at 08:50, Rishikesh Gawade <rishikeshg1...@gmail.com> wrote:
>
> Hi.
> I am using Spark 2.3.2 and Hive 3.1.0.
> Even if I used Parquet files the result would be the same, because after
> all Spark SQL isn't able to descend into the subdirectories over which the
> table is created. Could there be any other way?
> Thanks,
> Rishikesh
>
> On Tue, Aug 6, 2019, 1:03 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Which versions of Spark and Hive are you using?
>>
>> What will happen if you use Parquet tables instead?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> http://talebzadehmich.wordpress.com
>>
>> On Tue, 6 Aug 2019 at 07:58, Rishikesh Gawade <rishikeshg1...@gmail.com>
>> wrote:
>>
>>> Hi.
>>> I have built a Hive external table on top of a directory 'A' in which
>>> data is stored in ORC format. This directory has several subdirectories,
>>> each of which contains the actual ORC files.
>>> These subdirectories are created by Spark jobs that ingest data from
>>> other sources and write it into this directory.
>>> I created the table and set its table properties to
>>> *hive.mapred.supports.subdirectories=TRUE* and
>>> *mapred.input.dir.recursive=TRUE*.
>>> As a result, when I fire the simplest query, *select count(*) from
>>> ExtTable*, via the Hive CLI, it successfully gives me the expected count
>>> of records in the table.
>>> However, when I fire the same query via Spark SQL, I get count = 0.
>>>
>>> I think Spark SQL isn't able to descend into the subdirectories to read
>>> the data, while Hive is able to do so.
>>> Are there any configurations that need to be set on the Spark side so
>>> that this works as it does via the Hive CLI?
>>> I am using Spark on YARN.
>>>
>>> Thanks,
>>> Rishikesh
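A minimal sketch of the Spark-side configuration that is commonly suggested for this situation (untested here; whether spark.sql.hive.convertMetastoreOrc needs to be disabled on Spark 2.3.2 is an assumption to verify). The idea is to make Spark read the table through the Hive SerDe path, which honours the Hadoop recursive-input flags:

// Sketch for spark-shell on Spark 2.3.x. Setting convertMetastoreOrc to
// false makes Spark read the ORC table through the Hive SerDe instead of
// its native ORC reader; the Hive path goes through the Hadoop input
// format, which honours the recursive-directory flags below.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
hadoopConf.set("mapred.input.dir.recursive", "true") // legacy alias

// Should now count rows from the nested subdirectories, as the Hive CLI does.
spark.sql("SELECT count(*) FROM ExtTable").show()

An alternative that sidesteps the metastore entirely is to point the DataFrame reader at the files with a glob, e.g. spark.read.orc("/path/to/A/*/") (path hypothetical), at the cost of losing the table definition kept in the metastore.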