Hi,

I have data files (JSON in this example, but they could also be Avro) written in a directory structure like:
dataroot
+-- year=2015
    +-- month=06
        +-- day=01
            +-- data1.json
            +-- data2.json
            +-- data3.json
        +-- day=02
            +-- data1.json
            +-- data2.json
            +-- data3.json
    +-- month=07
        +-- day=20
            +-- data1.json
            +-- data2.json
            +-- data3.json
        +-- day=21
            +-- data1.json
            +-- data2.json
            +-- data3.json
        +-- day=22
            +-- data1.json
            +-- data2.json

Using spark-sql I create a temporary table:

CREATE TEMPORARY TABLE dataTable
USING org.apache.spark.sql.json
OPTIONS (
  path "dataroot/*"
)

Querying the table works, but so far I haven't been able to use the directory structure for pruning. Is there a way to register the directory structure as partitions (without using Hive), so the whole tree isn't scanned on every query -- say, when I want to compare data for the first day of the month?

Thanks,
Johan
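For reference, here is a sketch of what I was hoping would work, based on Spark's automatic partition discovery for key=value directory names (whether this applies to the JSON data source may depend on the Spark version, so treat it as an untested assumption): point the table at the root directory without the glob, and then filter on the discovered partition columns.

CREATE TEMPORARY TABLE dataTable
USING org.apache.spark.sql.json
OPTIONS (
  path "dataroot"
)

-- If year/month/day are discovered as partition columns,
-- a filter on them should prune to the matching directories
-- instead of scanning the whole tree:
SELECT * FROM dataTable WHERE day = 1;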