...@hortonworks.com
CC: user@spark.apache.org
Subject: RE: Directory / File Reading Patterns
Date: Sun, 18 Jan 2015 15:41:53 +
You may also want to keep an eye on SPARK-5182 / SPARK-5302, which may help if
you are using Spark SQL. Note that this is already possible with
HiveContext.
Cheers
Bob
Date: Sun, 18 Jan 2015 08:47:06 +
Subject: Re: Directory / File Reading Patterns
From: so...@cloudera.com
I think that putting part of the data (only) in a filename is an
anti-pattern, but we sometimes have to play these where they lie.
You can list all the directory paths containing the CSV files, map them
each to RDDs with textFile, transform the RDDs to include info from the
path, and then simply u
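A minimal plain-Python sketch of the list -> read -> tag-with-path -> combine steps described above, runnable without a Spark cluster. It assumes a hypothetical layout in which each sub-directory of a base directory is named after the value that belongs in the data (e.g. a date) and holds one or more .csv files. The helper names `tag_lines` and `read_tagged` are illustrative only. The comments give rough PySpark counterparts (`sc.textFile`, `RDD.map`, `sc.union`). Because the message above is cut off, using a union for the final combining step is an assumption.

```python
import os
import tempfile
from itertools import chain


def tag_lines(dir_path, tag):
    """Read every *.csv file in dir_path and prefix each non-empty line with tag.

    Rough PySpark counterpart: sc.textFile(dir_path).map(lambda line, t=tag: t + "," + line)
    (the default argument binds each tag at definition time, avoiding late binding).
    """
    rows = []
    for fname in sorted(os.listdir(dir_path)):
        if fname.endswith(".csv"):
            with open(os.path.join(dir_path, fname)) as f:
                rows.extend(f"{tag},{line.rstrip()}" for line in f if line.strip())
    return rows


def read_tagged(base_dir):
    """List the sub-directories, read and tag each one with its name, then combine."""
    dirs = sorted(
        d for d in os.listdir(base_dir) if os.path.isdir(os.path.join(base_dir, d))
    )
    per_dir = [tag_lines(os.path.join(base_dir, d), d) for d in dirs]
    # Rough PySpark counterpart of this combining step: sc.union(per_dir_rdds)
    return list(chain.from_iterable(per_dir))


if __name__ == "__main__":
    # Build a small hypothetical directory tree and read it back.
    with tempfile.TemporaryDirectory() as base:
        for day, body in [("2015-01-17", "a,1\nb,2\n"), ("2015-01-18", "c,3\n")]:
            os.makedirs(os.path.join(base, day))
            with open(os.path.join(base, day, "part-0.csv"), "w") as f:
                f.write(body)
        print(read_tagged(base))
```

Each output row carries the value recovered from its directory name as an extra leading column. After that, the combined data can be parsed as ordinary CSV.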