Hello,

 

We have a number of non-partitioned Hive tables whose data is written into
subdirectories (the result of union queries run on the Tez execution engine).

 

E.g. the table location is "s3://table1/", with the actual data residing in:

 

s3://table1/1/data1

s3://table1/2/data2

s3://table1/3/data3

 

When using SparkSession (sql/HiveContext shows the same behavior) and
spark.sql to query the table, no records are returned because of these
subdirectories.



e.g.

val df = spark.sql("select * from db.table1")
df.show()



I've tried a number of setConf properties, e.g.
spark.hive.mapred.supports.subdirectories=true and
mapreduce.input.fileinputformat.input.dir.recursive=true, but none of them
appear to be honored.
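
For reference, this is roughly how I was setting them, both through the
session config and on the underlying Hadoop configuration (a sketch of my
attempts, none of which changed the result):

// Tried via the SQL SET command and the session conf...
spark.sql("SET mapreduce.input.fileinputformat.input.dir.recursive=true")
spark.conf.set("spark.hive.mapred.supports.subdirectories", "true")
// ...and directly on the Hadoop configuration backing the session.
spark.sparkContext.hadoopConfiguration
  .set("mapreduce.input.fileinputformat.input.dir.recursive", "true")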

 

Has anyone run into similar problems, or found a way to resolve it? Our
current alternative is reading the input path directly, e.g.:

spark.read.csv("s3://bucket-name/table1/bullseye_segments/*/*")


But this requires prior knowledge of the path, or an extra step to determine
it (sketched below).
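
The extra step would look something like this sketch: list the table
location with the Hadoop FileSystem API and hand every first-level
subdirectory to spark.read (the path and CSV format below are just carried
over from the example above):

import org.apache.hadoop.fs.{FileSystem, Path}

// Table location carried over from the example above (an assumption).
val tablePath = new Path("s3://bucket-name/table1/bullseye_segments")
val fs = FileSystem.get(tablePath.toUri, spark.sparkContext.hadoopConfiguration)
// Collect the first-level subdirectories (1/, 2/, 3/, ...) under the table.
val subDirs = fs.listStatus(tablePath).filter(_.isDirectory).map(_.getPath.toString)
// Read them all in one go, without hard-coding the directory names.
val df = spark.read.csv(subDirs: _*)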
 

Thanks,

Matt




