You can't do this now without writing a bunch of custom logic (see here for an example: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala)
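For illustration, here is a rough sketch of that kind of custom logic against the current API (untested; the paths, the partition list, and the day/region column names are all made up, and it assumes an existing SparkContext "sc" as in the shell):

import org.apache.spark.sql.{SchemaRDD, SQLContext}

val sqlContext = new SQLContext(sc)

// The (day, region) pairs the query actually needs. "Pruning" happens
// here, by simply not loading directories the query doesn't touch.
val partitions = Seq(
  ("2014-12-29", "Americas"),
  ("2014-12-29", "Asia"),
  ("2014-12-30", "Americas"))

val perDir: Seq[SchemaRDD] = partitions.zipWithIndex.map {
  case ((day, region), i) =>
    val raw = sqlContext.parquetFile(s"/data/$day/$region")
    raw.registerTempTable(s"raw_$i")
    // Re-attach the values encoded in the directory names as constant
    // columns so they are queryable like any other column.
    sqlContext.sql(s"SELECT *, '$day' AS day, '$region' AS region FROM raw_$i")
}

perDir.reduce(_ unionAll _).registerTempTable("events")

Note that once everything is unioned, predicates on day/region will not skip any files, so restricting the load to the matching directories has to happen up front.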
I would like to make this easier as part of the improvements to the data sources API that we are planning for Spark 1.3.

On Mon, Dec 29, 2014 at 2:19 AM, Mickalas <michael.belldav...@gmail.com> wrote:
> I see that there is already a request to add wildcard support to the
> SQLContext.parquetFile function:
> https://issues.apache.org/jira/browse/SPARK-3928
>
> What would be useful for our use case is to associate the directory
> structure with certain columns in the table, but it does not seem like
> this is supported.
>
> For example, we want to create Parquet files on a daily basis, associated
> with geographic regions, and so will create a set of files under
> directories such as:
>
> * 2014-12-29/Americas
> * 2014-12-29/Asia
> * 2014-12-30/Americas
> * ...
>
> Where queries have predicates that match the column values determinable
> from the directory structure, it would be good to only extract data from
> the matching files.
>
> Does anyone know if something like this is supported, or whether this is
> a reasonable thing to request?
>
> Mick