Anurag,
There is another method called newAPIHadoopRDD that takes in a
Configuration object rather than a path. Give that a shot?
https://spark.apache.org/docs/latest/api/core/index.html#org.apache.spark.SparkContext
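Untested sketch of how that could look from the spark shell; it assumes the
blog post's PatternInputFormat is on the classpath with LongWritable/Text
keys and values, and the input path is made up:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

// newAPIHadoopRDD takes no path argument, so the Configuration has to
// carry both the regex and the input path.
val job = Job.getInstance() // Hadoop 2.x; new Job() on older versions
job.getConfiguration.set("record.delimiter.regex",
  "^[A-Za-z]{3},\\s\\d{2}\\s[A-Za-z]{3}.*")
FileInputFormat.addInputPath(job, new Path("hdfs:///path/to/logs")) // made-up path

val records = sc.newAPIHadoopRDD(
  job.getConfiguration,
  classOf[PatternInputFormat], // the custom InputFormat from the blog post
  classOf[LongWritable],
  classOf[Text])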
On Tue, Apr 8, 2014 at 1:47 PM, Anurag wrote:
andrew/nick,
thanks for the input, got it to work:
sc.hadoopConfiguration.set("record.delimiter.regex",
"^[A-Za-z]{3},\\s\\d{2}\\s[A-Za-z]{3}.*")
:-)
-anurag
On Tue, Apr 8, 2014 at 1:47 PM, Anurag wrote:
andrew - yes, i am using the PatternInputFormat from the blog post you
referenced.
I know how to set the pattern in the configuration when writing an MR job;
how do I do that from the spark shell?
-anurag
On Tue, Apr 8, 2014 at 1:41 PM, Andrew Ash wrote:
Seems like you need to initialise a regex pattern for that InputFormat. How
is this done? Perhaps via a config option?
In which case you need to first create a Hadoop Configuration, set the
appropriate config option for the regex, and pass that into
newAPIHadoopFile, along the lines of the sketch below.
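Something like this, untested (key/value types assumed from the blog post,
path is an example):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}

// a fresh Configuration carrying whatever key the InputFormat reads the
// pattern from; the blog post appears to use "record.delimiter.regex"
val conf = new Configuration()
conf.set("record.delimiter.regex", "<your pattern>")

val rdd = sc.newAPIHadoopFile(
  "hdfs:///path/to/input", // example path
  classOf[PatternInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf)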
On Tue, Apr 8, 2014 at 10:36
Are you using the PatternInputFormat from this blog post?
https://hadoopi.wordpress.com/2013/05/31/custom-recordreader-processing-string-pattern-delimited-records/
If so, you need to set the pattern in the configuration before attempting to
read data with that InputFormat:
String regex = "^[A-Za-z]{3},\\s\\d{2}\\s[A-Za-z]{3}.*";
conf.set("record.delimiter.regex", regex);